Tianshou

Author	SHA1	Message	Date
Alex Nikulkov	92456cdb68	Add learning rate scheduler to BasePolicy (#598 )	2022-04-17 23:52:30 +08:00
Jiayi Weng	2a9c9289e5	rename save_fn to save_best_fn to avoid ambiguity (#575 ) This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.	2022-03-22 04:29:27 +08:00
Yi Su	9cb74e60c9	Add imitation baselines for offline RL (#566 ) add imitation baselines for offline RL; make the choice of env/task and D4RL dataset explicit; on expert datasets, IL easily outperforms; after reading the D4RL paper, I'll rerun the exps on medium data	2022-03-12 21:33:54 +08:00
Chengqi Duan	ad2e1eaea0	Fix WandbLogger import error in Atari examples (#562 )	2022-03-08 08:38:56 -05:00
Costa Huang	df3d7f582b	Update WandbLogger implementation (#558 ) * Use `global_step` as the x-axis for wandb * Use Tensorboard SummaryWritter as core with `wandb.init(..., sync_tensorboard=True)` * Update all atari examples with wandb Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-07 06:40:47 +08:00
Yi Su	2377f2f186	Implement Generative Adversarial Imitation Learning (GAIL) (#550 ) Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)	2022-03-06 23:57:15 +08:00
Yi Su	97df511a13	Add VizDoom PPO example and results (#533 ) * update vizdoom ppo example * update README with results	2022-02-25 09:33:34 +08:00
Chengqi Duan	23fbc3b712	upgrade gym version to >=0.21, fix related CI and update examples/atari (#534 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-02-25 07:40:33 +08:00
Yi Su	d29188ee77	update atari ppo slots (#529 )	2022-02-13 04:04:21 +08:00
Yi Su	40289b8b0e	Add atari ppo example (#523 ) I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters. Note that using lr=2.5e-4 will result in "Invalid Value" error for 2 games. The fix is to reduce the learning rate. That's why I set the default lr to 1e-4. See discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156.	2022-02-11 06:45:06 +08:00
ChenDRAG	c25926dd8f	Formalize variable names (#509 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-01-30 00:53:56 +08:00
Bernard Tan	bc53ead273	Implement CQLPolicy and offline_cql example (#506 )	2022-01-16 05:30:21 +08:00
Yi Su	a59d96d041	Add Intrinsic Curiosity Module (#503 )	2022-01-15 02:43:48 +08:00
Yi Su	3592f45446	Fix critic network for Discrete CRR (#485 ) - Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies; - Updates several offline policies to use `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic; - Add `writer.flush()` in TensorboardLogger to ensure real-time result; - Enable `test_collector=None` in 3 trainers to turn off testing during training; - Updates the Atari offline results in README.md; - Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments.	2021-11-28 23:10:28 +08:00
Bernard Tan	5c5a3db94e	Implement BCQPolicy and offline_bcq example (#480 ) This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning). Example usage is in the examples/offline/offline_bcq.py.	2021-11-22 22:21:02 +08:00
Jiayi Weng	098d466467	fix atari wrapper to be deterministic (#467 )	2021-10-19 22:26:11 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441 ) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
Jiayi Weng	e8f8cdfa41	fix logger.write error in atari script (#444 ) - fix a bug in #427: logger.write should pass a dict - change SubprocVectorEnv to ShmemVectorEnv in atari - increase logger interval for eps	2021-09-09 00:51:39 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
Yi Su	291be08d43	Add Rainbow DQN (#386 ) - add RainbowPolicy - add `set_beta` method in prio_buffer - add NoisyLinear in utils/network	2021-08-29 23:34:59 +08:00
n+e	ebaca6f8da	add vizdoom example, bump version to 0.4.2 (#384 )	2021-06-26 18:08:41 +08:00
Yi Su	c0bc8e00ca	Add Fully-parameterized Quantile Function (#376 )	2021-06-15 11:59:02 +08:00
Yi Su	21b2b22cd7	update iqn results and reward plots (#377 )	2021-06-10 09:05:25 +08:00
Yi Su	f3169b4c1f	Add Implicit Quantile Network (#371 )	2021-05-29 09:44:23 +08:00
n+e	458028a326	fix docs (#373 ) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367 )	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359 ) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
ChenDRAG	bbc3c3e32d	Add numerical analysis tool and interactive plot (#341 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-22 12:49:54 +08:00
ChenDRAG	844d7703c3	NPG Mujoco benchmark release (#347 )	2021-04-21 16:31:20 +08:00
n+e	c059f98abf	fix atari_bcq (#345 )	2021-04-20 22:59:21 +08:00
ChenDRAG	a57503c0aa	TRPO benchmark release (#340 )	2021-04-19 17:05:06 +08:00
ChenDRAG	333b8fbd66	add plotter (#335 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-14 14:06:36 +08:00
ChenDRAG	dd4a01132c	Fix SAC loss explode (#333 ) * change SAC action_bound_method to "clip" (tanh is hardcoded in forward) * docstring update * modelbase -> modelbased	2021-04-04 17:33:35 +08:00
ChenDRAG	6426a39796	ppo benchmark (#330 )	2021-03-30 11:50:35 +08:00
ChenDRAG	1730a9008a	A2C benchmark for mujoco (#325 )	2021-03-28 13:12:43 +08:00
ChenDRAG	3ac67d9974	refactor A2C/PPO, change behavior of value normalization (#321 )	2021-03-25 10:12:39 +08:00
ChenDRAG	47c77899d5	Add REINFORCE benchmark for mujoco (#320 )	2021-03-24 19:59:53 +08:00
ChenDRAG	4d92952a7b	Remap action to fit gym's action space (#313 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-03-21 16:45:50 +08:00
ChenDRAG	e605bdea94	MuJoCo Benchmark - DDPG, TD3, SAC (#305 ) Releasing Tianshou's SOTA benchmark of 9 out of 13 environments from the MuJoCo Gym task suite.	2021-03-07 19:21:02 +08:00
n+e	31e7f445d1	fix vecenv action_space randomness (#300 )	2021-03-01 15:44:03 +08:00
ChenDRAG	f22b539761	Remove reward_normaliztion option in offpolicy algorithm (#298 ) * remove rew_norm in nstep implementation * improve test * remove runnable/ * various doc fix Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-27 11:20:43 +08:00
ChenDRAG	3108b9db0d	Add Timelimit trick to optimize policies (#296 ) * consider timelimit.truncated in calculating returns by default * remove ignore_done	2021-02-26 13:23:18 +08:00
ChenDRAG	9b61bc620c	add logger (#295 ) This PR focus on refactor of logging method to solve bug of nan reward and log interval. After these two pr, hopefully fundamental change of tianshou/data is finished. We then can concentrate on building benchmarks of tianshou finally. Things changed: 1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer; 2. remove utils.SummaryWriter;	2021-02-24 14:48:42 +08:00
ChenDRAG	7036073649	Trainer refactor : some definition change (#293 ) This PR focus on some definition change of trainer to make it more friendly to use and be consistent with typical usage in research papers, typically change `collect-per-step` to `step-per-collect`, add `update-per-step` / `episode-per-collect` accordingly, and modify the documentation.	2021-02-21 13:06:02 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which features refactor of Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be more cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, bffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different version's numpy will result in different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
ChenDRAG	f528131da1	hotfix：fix test failure in cuda environment (#289 )	2021-02-09 17:13:40 +08:00
wizardsheng	1eb6137645	Add QR-DQN algorithm (#276 ) This is the PR for QR-DQN algorithm: https://arxiv.org/abs/1710.10044 1. add QR-DQN policy in tianshou/policy/modelfree/qrdqn.py. 2. add QR-DQN net in examples/atari/atari_network.py. 3. add QR-DQN atari example in examples/atari/atari_qrdqn.py. 4. add QR-DQN statement in tianshou/policy/init.py. 5. add QR-DQN unit test in test/discrete/test_qrdqn.py. 6. add QR-DQN atari results in examples/atari/results/qrdqn/. 7. add compute_q_value in DQNPolicy and C51Policy for simplify forward function. 8. move `with torch.no_grad():` from `_target_q` to BasePolicy By running "python3 atari_qrdqn.py --task "PongNoFrameskip-v4" --batch-size 64", get best_result': '19.8 ± 0.40', in epoch 8.	2021-01-28 09:27:05 +08:00
Jialu Zhu	a511cb4779	Add offline trainer and discrete BCQ algorithm (#263 ) The result needs to be tuned after `done` issue fixed. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-20 18:13:04 +08:00

1 2

83 Commits