Tianshou

Author	SHA1	Message	Date
Bernard Tan	bc53ead273	Implement CQLPolicy and offline_cql example (#506)	2022-01-16 05:30:21 +08:00
Yi Su	a59d96d041	Add Intrinsic Curiosity Module (#503)	2022-01-15 02:43:48 +08:00
Markus28	a2d76d1276	Remove reset_buffer() from reset method (#501)	2022-01-12 16:46:28 -08:00
Yi Su	3592f45446	Fix critic network for Discrete CRR (#485) - Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies; - Updates several offline policies to use `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic; - Add `writer.flush()` in TensorboardLogger to ensure real-time result; - Enable `test_collector=None` in 3 trainers to turn off testing during training; - Updates the Atari offline results in README.md; - Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments. [tag: v0.4.5]	2021-11-28 23:10:28 +08:00
Bernard Tan	5c5a3db94e	Implement BCQPolicy and offline_bcq example (#480) This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning). Example usage is in the examples/offline/offline_bcq.py.	2021-11-22 22:21:02 +08:00
Jiayi Weng	94d3b27db9	fix tqdm issue (#481)	2021-11-19 00:17:44 +08:00
Markus28	8f19a86966	Implements set_env_attr and get_env_attr for vector environments (#478) close #473	2021-11-03 00:08:00 +08:00
Jiayi Weng	098d466467	fix atari wrapper to be deterministic (#467)	2021-10-19 22:26:11 +08:00
Jiayi Weng	b9eedc516e	bump to 0.4.4 [tag: v0.4.4]	2021-10-13 12:22:24 -04:00
Ayush Chaurasia	63d752ee0b	W&B: Add usage in the docs (#463)	2021-10-13 23:28:25 +08:00
Jiayi Weng	926ec0b9b1	update save_fn in trainer (#459) - collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work is done in logger) - save_fn() will be called at the beginning of trainer	2021-10-13 21:25:24 +08:00
Jiayi Weng	e45e2096d8	add multi-GPU support (#461) add a new class DataParallelNet	2021-10-06 01:39:14 +08:00
Jiayi Weng	5df64800f4	final fix for actor_critic shared head parameters (#458)	2021-10-04 23:19:07 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
Jiayi Weng	e8f8cdfa41	fix logger.write error in atari script (#444) - fix a bug in #427: logger.write should pass a dict - change SubprocVectorEnv to ShmemVectorEnv in atari - increase logger interval for eps	2021-09-09 00:51:39 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check [tag: v0.4.3]	2021-09-03 05:05:04 +08:00
Ending Hsiao	a740496a51	fix dual clip implementation (#435) close #433	2021-09-02 21:43:14 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
n+e	e4f4f0e144	fix docs build failure and a bug in a2c/ppo optimizer (#428) * fix rtfd build * list + list -> set.union * change seed of test_qrdqn * add py39 test	2021-08-30 02:07:03 +08:00
Yi Su	291be08d43	Add Rainbow DQN (#386) - add RainbowPolicy - add `set_beta` method in prio_buffer - add NoisyLinear in utils/network	2021-08-29 23:34:59 +08:00
Andriy Drozdyuk	d161059c3d	Replaced indice by plural indices (#422)	2021-08-20 21:58:44 +08:00
deeplook	728b88b92d	Fix conda install command (#419)	2021-08-16 18:56:01 +08:00
n+e	5b7732a29b	make ppo discrete test script more general (#418)	2021-08-15 21:37:37 +08:00
n+e	bba30f83d1	fix sb2's coverage (#412)	2021-08-10 17:43:27 +08:00
Miguel Morales	42538f8e58	Update README.md (#410)	2021-08-10 09:14:20 +08:00
ChenDRAG	0674ff628a	Cite Tianshou's latest paper (#406) * Cite Tianshou's latest paper * update new version README * change order Co-authored-by: Jiayi Weng <wengj@sea.com>	2021-08-10 08:35:01 +08:00
Andriy Drozdyuk	18d2f25eff	Remove warnings about the use of save_fn across trainers (#408)	2021-08-04 09:56:00 +08:00
n+e	c19876179a	add env_id in preprocess fn (#391)	2021-07-05 09:50:39 +08:00
n+e	ebaca6f8da	add vizdoom example, bump version to 0.4.2 (#384) [tag: v0.4.2]	2021-06-26 18:08:41 +08:00
Yi Su	c0bc8e00ca	Add Fully-parameterized Quantile Function (#376)	2021-06-15 11:59:02 +08:00
Yi Su	21b2b22cd7	update iqn results and reward plots (#377)	2021-06-10 09:05:25 +08:00
Yi Su	f3169b4c1f	Add Implicit Quantile Network (#371)	2021-05-29 09:44:23 +08:00
n+e	458028a326	fix docs (#373) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Ark	655d5fb14f	Allow researchers to choose whether to use Double DQN (#368)	2021-05-21 10:53:34 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367)	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
Ark	84f58636eb	Make trainer resumable (#350) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
Yuge Zhang	f4e05d585a	Support deterministic evaluation for onpolicy algorithms (#354)	2021-04-27 21:22:39 +08:00
n+e	ff4d3cd714	Support different state size and fix exception in venv.__del__ (#352) - Batch: do not raise error when it finds list of np.array with different shape[0]. - Venv's obs: add try...except block for np.stack(obs_list) - remove venv.__del__ since it is buggy	2021-04-25 15:23:46 +08:00
ChenDRAG	bbc3c3e32d	Add numerical analysis tool and interactive plot (#341) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-22 12:49:54 +08:00
ChenDRAG	844d7703c3	NPG Mujoco benchmark release (#347)	2021-04-21 16:31:20 +08:00
ChenDRAG	1dcf65fe21	Add NPG policy (#344)	2021-04-21 09:52:15 +08:00
n+e	c059f98abf	fix atari_bcq (#345)	2021-04-20 22:59:21 +08:00
ChenDRAG	a57503c0aa	TRPO benchmark release (#340)	2021-04-19 17:05:06 +08:00
n+e	f68cb78ed7	Add self-hosted runner for GPU checks (#339)	2021-04-18 16:57:37 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337)	2021-04-16 20:37:12 +08:00
ChenDRAG	333b8fbd66	add plotter (#335) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-14 14:06:36 +08:00
ChenDRAG	dd4a01132c	Fix SAC loss explode (#333) * change SAC action_bound_method to "clip" (tanh is hardcoded in forward) * docstring update * modelbase -> modelbased [tag: v0.4.1]	2021-04-04 17:33:35 +08:00
n+e	825da9bc53	add cross-platform test and release 0.4.1 (#331) * bump to 0.4.1 * add cross-platform test	2021-03-31 15:14:22 +08:00
n+e	09692c84fe	fix numpy>=1.20 typing check (#323) Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).	2021-03-30 16:06:03 +08:00

295 Commits