Tianshou

Author	SHA1	Message	Date
ChenDRAG	c25926dd8f	Formalize variable names (#509 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-01-30 00:53:56 +08:00
Bernard Tan	bc53ead273	Implement CQLPolicy and offline_cql example (#506 )	2022-01-16 05:30:21 +08:00
Yi Su	a59d96d041	Add Intrinsic Curiosity Module (#503 )	2022-01-15 02:43:48 +08:00
Markus28	a2d76d1276	Remove reset_buffer() from reset method (#501 )	2022-01-12 16:46:28 -08:00
Yi Su	3592f45446	Fix critic network for Discrete CRR (#485 ) - Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies; - Updates several offline policies to use `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic; - Add `writer.flush()` in TensorboardLogger to ensure real-time result; - Enable `test_collector=None` in 3 trainers to turn off testing during training; - Updates the Atari offline results in README.md; - Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments. (tag: v0.4.5)	2021-11-28 23:10:28 +08:00
Bernard Tan	5c5a3db94e	Implement BCQPolicy and offline_bcq example (#480 ) This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning). Example usage is in the examples/offline/offline_bcq.py.	2021-11-22 22:21:02 +08:00
Jiayi Weng	94d3b27db9	fix tqdm issue (#481 )	2021-11-19 00:17:44 +08:00
Markus28	8f19a86966	Implements set_env_attr and get_env_attr for vector environments (#478 ) close #473	2021-11-03 00:08:00 +08:00
Jiayi Weng	098d466467	fix atari wrapper to be deterministic (#467 )	2021-10-19 22:26:11 +08:00
Jiayi Weng	b9eedc516e	bump to 0.4.4 (tag: v0.4.4)	2021-10-13 12:22:24 -04:00
Ayush Chaurasia	63d752ee0b	W&B: Add usage in the docs (#463 )	2021-10-13 23:28:25 +08:00
Jiayi Weng	926ec0b9b1	update save_fn in trainer (#459 ) - collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work is done in logger) - save_fn() will be called at the beginning of trainer	2021-10-13 21:25:24 +08:00
Jiayi Weng	e45e2096d8	add multi-GPU support (#461 ) add a new class DataParallelNet	2021-10-06 01:39:14 +08:00
Jiayi Weng	5df64800f4	final fix for actor_critic shared head parameters (#458 )	2021-10-04 23:19:07 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441 ) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
Jiayi Weng	e8f8cdfa41	fix logger.write error in atari script (#444 ) - fix a bug in #427: logger.write should pass a dict - change SubprocVectorEnv to ShmemVectorEnv in atari - increase logger interval for eps	2021-09-09 00:51:39 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check (tag: v0.4.3)	2021-09-03 05:05:04 +08:00
Ending Hsiao	a740496a51	fix dual clip implementation (#435 ) close #433	2021-09-02 21:43:14 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
n+e	e4f4f0e144	fix docs build failure and a bug in a2c/ppo optimizer (#428 ) * fix rtfd build * list + list -> set.union * change seed of test_qrdqn * add py39 test	2021-08-30 02:07:03 +08:00
Yi Su	291be08d43	Add Rainbow DQN (#386 ) - add RainbowPolicy - add `set_beta` method in prio_buffer - add NoisyLinear in utils/network	2021-08-29 23:34:59 +08:00
Andriy Drozdyuk	d161059c3d	Replaced indice by plural indices (#422 )	2021-08-20 21:58:44 +08:00
deeplook	728b88b92d	Fix conda install command (#419 )	2021-08-16 18:56:01 +08:00
n+e	5b7732a29b	make ppo discrete test script more general (#418 )	2021-08-15 21:37:37 +08:00
n+e	bba30f83d1	fix sb2's coverage (#412 )	2021-08-10 17:43:27 +08:00
Miguel Morales	42538f8e58	Update README.md (#410 )	2021-08-10 09:14:20 +08:00
ChenDRAG	0674ff628a	Cite Tianshou's latest paper (#406 ) * Cite Tianshou's latest paper * update new version README * change order Co-authored-by: Jiayi Weng <wengj@sea.com>	2021-08-10 08:35:01 +08:00
Andriy Drozdyuk	18d2f25eff	Remove warnings about the use of save_fn across trainers (#408 )	2021-08-04 09:56:00 +08:00
n+e	c19876179a	add env_id in preprocess fn (#391 )	2021-07-05 09:50:39 +08:00
n+e	ebaca6f8da	add vizdoom example, bump version to 0.4.2 (#384 ) (tag: v0.4.2)	2021-06-26 18:08:41 +08:00
Yi Su	c0bc8e00ca	Add Fully-parameterized Quantile Function (#376 )	2021-06-15 11:59:02 +08:00
Yi Su	21b2b22cd7	update iqn results and reward plots (#377 )	2021-06-10 09:05:25 +08:00
Yi Su	f3169b4c1f	Add Implicit Quantile Network (#371 )	2021-05-29 09:44:23 +08:00
n+e	458028a326	fix docs (#373 ) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Ark	655d5fb14f	Allow researchers to choose whether to use Double DQN (#368 )	2021-05-21 10:53:34 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367 )	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359 ) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
Yuge Zhang	f4e05d585a	Support deterministic evaluation for onpolicy algorithms (#354 )	2021-04-27 21:22:39 +08:00
n+e	ff4d3cd714	Support different state size and fix exception in venv.__del__ (#352 ) - Batch: do not raise error when it finds list of np.array with different shape[0]. - Venv's obs: add try...except block for np.stack(obs_list) - remove venv.__del__ since it is buggy	2021-04-25 15:23:46 +08:00
ChenDRAG	bbc3c3e32d	Add numerical analysis tool and interactive plot (#341 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-22 12:49:54 +08:00
ChenDRAG	844d7703c3	NPG Mujoco benchmark release (#347 )	2021-04-21 16:31:20 +08:00
ChenDRAG	1dcf65fe21	Add NPG policy (#344 )	2021-04-21 09:52:15 +08:00
n+e	c059f98abf	fix atari_bcq (#345 )	2021-04-20 22:59:21 +08:00
ChenDRAG	a57503c0aa	TRPO benchmark release (#340 )	2021-04-19 17:05:06 +08:00
n+e	f68cb78ed7	Add self-hosted runner for GPU checks (#339 )	2021-04-18 16:57:37 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337 )	2021-04-16 20:37:12 +08:00
ChenDRAG	333b8fbd66	add plotter (#335 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-14 14:06:36 +08:00
ChenDRAG	dd4a01132c	Fix SAC loss explode (#333 ) * change SAC action_bound_method to "clip" (tanh is hardcoded in forward) * docstring update * modelbase -> modelbased (tag: v0.4.1)	2021-04-04 17:33:35 +08:00
n+e	825da9bc53	add cross-platform test and release 0.4.1 (#331 ) * bump to 0.4.1 * add cross-platform test	2021-03-31 15:14:22 +08:00


496 Commits