Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							dd16818ce4 
							
						 
					 
					
						
						
							
							implement REDQ based on original contribution by @Jimenius ( #623 )  
						
						
						
						
						Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn> 
						
						
					 
					
						2022-05-01 00:06:00 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							41afc2584a 
							
						 
					 
					
						
						
							
							Convert RL Unplugged Atari datasets to tianshou ReplayBuffer ( #621 )  
						
						
						
						
					 
					
						2022-04-29 19:33:28 +08:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5eab7dc218 
							
						 
					 
					
						
						
							
							Add Atari Results ( #600 )  
						
						
						
						
					 
					
						2022-04-24 20:44:54 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5c9afe72f3 
							
						 
					 
					
						
						
							
							Update Mujoco Benchmark's webpage ( #606 )  
						
						
						
						
					 
					
						2022-04-24 01:11:33 +08:00 
						 
				 
			
				
					
						
							
							
								Alex Nikulkov 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							92456cdb68 
							
						 
					 
					
						
						
							
							Add learning rate scheduler to BasePolicy ( #598 )  
						
						
						
						
					 
					
						2022-04-17 23:52:30 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2a9c9289e5 
							
						 
					 
					
						
						
							
							rename save_fn to save_best_fn to avoid ambiguity ( #575 )  
						
						
						
						
						This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper. 
						
						
					 
					
						2022-03-22 04:29:27 +08:00 
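
The unified deprecation wrapper mentioned in this commit can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the helper `deprecation` and the `train` function with its fallback logic are assumptions made for illustration; only the `save_fn` to `save_best_fn` rename comes from the commit message.

```python
import warnings


def deprecation(msg: str) -> None:
    """Emit a DeprecationWarning (hypothetical helper, not the library's code)."""
    # stacklevel=3 attributes the warning to whoever called the deprecated API.
    warnings.warn(msg, category=DeprecationWarning, stacklevel=3)


def train(save_best_fn=None, save_fn=None):
    """Hypothetical trainer entry point that still accepts the old keyword."""
    if save_fn is not None:
        deprecation("save_fn is deprecated; use save_best_fn instead.")
        # Fall back to the old argument only if the new one was not given.
        if save_best_fn is None:
            save_best_fn = save_fn
    return save_best_fn
```

Keeping the old keyword alive for a release or two, while warning on every use, lets existing scripts keep running and still tells users which name to switch to.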
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9cb74e60c9 
							
						 
					 
					
						
						
							
							Add imitation baselines for offline RL ( #566 )  
						
						
						
						
						Add imitation baselines for offline RL; make the choice of env/task and D4RL dataset explicit. On expert datasets, IL easily outperforms; after reading the D4RL paper, I'll rerun the experiments on medium data. 
						
						
					 
					
						2022-03-12 21:33:54 +08:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
						
						
							
						
						
							ad2e1eaea0 
							
						 
					 
					
						
						
							
							Fix WandbLogger import error in Atari examples ( #562 )  
						
						
						
						
					 
					
						2022-03-08 08:38:56 -05:00 
						 
				 
			
				
					
						
							
							
								Costa Huang 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							df3d7f582b 
							
						 
					 
					
						
						
							
							Update WandbLogger implementation ( #558 )  
						
						
						
						
						* Use `global_step` as the x-axis for wandb
* Use TensorBoard SummaryWriter as core with `wandb.init(..., sync_tensorboard=True)`
* Update all atari examples with wandb
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-03-07 06:40:47 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2377f2f186 
							
						 
					 
					
						
						
							
							Implement Generative Adversarial Imitation Learning (GAIL) ( #550 )  
						
						
						
						
						Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531 , #173 ) 
						
						
					 
					
						2022-03-06 23:57:15 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							97df511a13 
							
						 
					 
					
						
						
							
							Add VizDoom PPO example and results ( #533 )  
						
						
						
						
						* update vizdoom ppo example
* update README with results 
						
						
					 
					
						2022-02-25 09:33:34 +08:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							23fbc3b712 
							
						 
					 
					
						
						
							
							upgrade gym version to >=0.21, fix related CI and update examples/atari ( #534 )  
						
						
						
						
						Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-02-25 07:40:33 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d29188ee77 
							
						 
					 
					
						
						
							
							update atari ppo slots ( #529 )  
						
						
						
						
					 
					
						2022-02-13 04:04:21 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							40289b8b0e 
							
						 
					 
					
						
						
							
							Add atari ppo example ( #523 )  
						
						
						
						
						I needed a policy gradient baseline myself, and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.
Note that using lr=2.5e-4 results in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate, which is why the default lr is set to 1e-4. See the discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156 . 
						
						
					 
					
						2022-02-11 06:45:06 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c25926dd8f 
							
						 
					 
					
						
						
							
							Formalize variable names ( #509 )  
						
						
						
						
						Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-01-30 00:53:56 +08:00 
						 
				 
			
				
					
						
							
							
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bc53ead273 
							
						 
					 
					
						
						
							
							Implement CQLPolicy and offline_cql example ( #506 )  
						
						
						
						
					 
					
						2022-01-16 05:30:21 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a59d96d041 
							
						 
					 
					
						
						
							
							Add Intrinsic Curiosity Module ( #503 )  
						
						
						
						
					 
					
						2022-01-15 02:43:48 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3592f45446 
							
						 
					 
					
						
						
							
							Fix critic network for Discrete CRR ( #485 )  
						
						
						
						
						- Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizer to eliminate randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger so that results are written in real time;
- Enables `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments. 
						
						
					 
					
						2021-11-28 23:10:28 +08:00 
						 
				 
			
				
					
						
							
							
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5c5a3db94e 
							
						 
					 
					
						
						
							
							Implement BCQPolicy and offline_bcq example ( #480 )  
						
						
						
						
						This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a D4RL environment for offline reinforcement learning, is provided.
Example usage is in examples/offline/offline_bcq.py. 
						
						
					 
					
						2021-11-22 22:21:02 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							098d466467 
							
						 
					 
					
						
						
							
							fix atari wrapper to be deterministic ( #467 )  
						
						
						
						
					 
					
						2021-10-19 22:26:11 +08:00 
						 
				 
			
				
					
						
							
							
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							22d7bf38c8 
							
						 
					 
					
						
						
							
							Improve W&B logger ( #441 )  
						
						
						
						
						- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update 
						
						
					 
					
						2021-09-24 21:52:23 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e8f8cdfa41 
							
						 
					 
					
						
						
							
							fix logger.write error in atari script ( #444 )  
						
						
						
						
						- fix a bug in #427 : logger.write should pass a dict
- change SubprocVectorEnv to ShmemVectorEnv in atari
- increase logger interval for eps 
						
						
					 
					
						2021-09-09 00:51:39 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fc251ab0b8 
							
						 
					 
					
						
						
							
							bump to v0.4.3 ( #432 )  
						
						
						
						
						* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check 
						
						
					 
					
						2021-09-03 05:05:04 +08:00 
						 
				 
			
				
					
						
							
							
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8a5e2190f7 
							
						 
					 
					
						
						
							
							Add Weights and Biases Logger ( #427 )  
						
						
						
						
						- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2021-08-30 22:35:02 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							291be08d43 
							
						 
					 
					
						
						
							
							Add Rainbow DQN ( #386 )  
						
						
						
						
						- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network 
						
						
					 
					
						2021-08-29 23:34:59 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ebaca6f8da 
							
						 
					 
					
						
						
							
							add vizdoom example, bump version to 0.4.2 ( #384 )  
						
						
						
						
					 
					
						2021-06-26 18:08:41 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c0bc8e00ca 
							
						 
					 
					
						
						
							
							Add Fully-parameterized Quantile Function ( #376 )  
						
						
						
						
					 
					
						2021-06-15 11:59:02 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							21b2b22cd7 
							
						 
					 
					
						
						
							
							update iqn results and reward plots ( #377 )  
						
						
						
						
					 
					
						2021-06-10 09:05:25 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f3169b4c1f 
							
						 
					 
					
						
						
							
							Add Implicit Quantile Network ( #371 )  
						
						
						
						
					 
					
						2021-05-29 09:44:23 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							458028a326 
							
						 
					 
					
						
						
							
							fix docs ( #373 )  
						
						
						
						
						- fix css style error
- fix mujoco benchmark result 
						
						
					 
					
						2021-05-23 12:43:03 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f7bc65ac7 
							
						 
					 
					
						
						
							
							Add discrete Critic Regularized Regression ( #367 )  
						
						
						
						
					 
					
						2021-05-19 13:29:56 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b5c3ddabfa 
							
						 
					 
					
						
						
							
							Add discrete Conservative Q-Learning for offline RL ( #359 )  
						
						
						
						
						Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com> 
						
						
					 
					
						2021-05-12 09:24:48 +08:00 
						 
				 
			
				
					
						
							
							
								Ark 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							84f58636eb 
							
						 
					 
					
						
						
							
							Make trainer resumable ( #350 )  
						
						
						
						
						- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
					 
					
						2021-05-06 08:53:53 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bbc3c3e32d 
							
						 
					 
					
						
						
							
							Add numerical analysis tool and interactive plot ( #341 )  
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
					 
					
						2021-04-22 12:49:54 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							844d7703c3 
							
						 
					 
					
						
						
							
							NPG Mujoco benchmark release ( #347 )  
						
						
						
						
					 
					
						2021-04-21 16:31:20 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c059f98abf 
							
						 
					 
					
						
						
							
							fix atari_bcq ( #345 )  
						
						
						
						
					 
					
						2021-04-20 22:59:21 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a57503c0aa 
							
						 
					 
					
						
						
							
							TRPO benchmark release ( #340 )  
						
						
						
						
					 
					
						2021-04-19 17:05:06 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							333b8fbd66 
							
						 
					 
					
						
						
							
							add plotter ( #335 )  
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
					 
					
						2021-04-14 14:06:36 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							dd4a01132c 
							
						 
					 
					
						
						
							
							Fix SAC loss explode ( #333 )  
						
						
						
						
						* change SAC action_bound_method to "clip" (tanh is hardcoded in forward)
* docstring update
* modelbase -> modelbased 
						
						
					 
					
						2021-04-04 17:33:35 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6426a39796 
							
						 
					 
					
						
						
							
							ppo benchmark ( #330 )  
						
						
						
						
					 
					
						2021-03-30 11:50:35 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1730a9008a 
							
						 
					 
					
						
						
							
							A2C benchmark for mujoco ( #325 )  
						
						
						
						
					 
					
						2021-03-28 13:12:43 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3ac67d9974 
							
						 
					 
					
						
						
							
							refactor A2C/PPO, change behavior of value normalization ( #321 )  
						
						
						
						
					 
					
						2021-03-25 10:12:39 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							47c77899d5 
							
						 
					 
					
						
						
							
							Add REINFORCE benchmark for mujoco ( #320 )  
						
						
						
						
					 
					
						2021-03-24 19:59:53 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4d92952a7b 
							
						 
					 
					
						
						
							
							Remap action to fit gym's action space ( #313 )  
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
					 
					
						2021-03-21 16:45:50 +08:00 
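
The commit message does not spell out how actions are remapped. One common approach, shown here purely as an assumption, is to clip a policy output to [-1, 1] and rescale it affinely to the environment's bounds. The function name `remap_action` and its list-based signature are hypothetical.

```python
def remap_action(action, low, high):
    """Map each action component from [-1, 1] to [low_i, high_i].

    Hypothetical helper: `low` and `high` stand in for the bounds of a
    box-shaped action space, given as plain Python sequences.
    """
    remapped = []
    for a, lo, hi in zip(action, low, high):
        a = max(-1.0, min(1.0, a))  # clip to the normalized range first
        remapped.append(lo + (a + 1.0) * 0.5 * (hi - lo))  # affine rescale
    return remapped
```

For example, with bounds [0, 2] a normalized action of 0 lands on the midpoint 1, and anything above 1 is clipped to the upper bound.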
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e605bdea94 
							
						 
					 
					
						
						
							
							MuJoCo Benchmark - DDPG, TD3, SAC ( #305 )  
						
						
						
						
						Releasing Tianshou's SOTA benchmark of 9 out of 13 environments from the MuJoCo Gym task suite. 
						
						
					 
					
						2021-03-07 19:21:02 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							31e7f445d1 
							
						 
					 
					
						
						
							
							fix vecenv action_space randomness ( #300 )  
						
						
						
						
					 
					
						2021-03-01 15:44:03 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f22b539761 
							
						 
					 
					
						
						
							
							Remove reward_normalization option in off-policy algorithms ( #298 )  
						
						
						
						
						* remove rew_norm in nstep implementation
* improve test
* remove runnable/
* various doc fix
Co-authored-by: n+e <trinkle23897@gmail.com> 
						
						
					 
					
						2021-02-27 11:20:43 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3108b9db0d 
							
						 
					 
					
						
						
							
							Add Timelimit trick to optimize policies ( #296 )  
						
						
						
						
						* consider timelimit.truncated in calculating returns by default
* remove ignore_done 
						
						
					 
					
						2021-02-26 13:23:18 +08:00 
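
The idea behind the time-limit trick can be sketched as a one-step target that keeps bootstrapping when an episode was cut off by a step limit instead of reaching a true terminal state. This is an illustrative sketch under that assumption, not the code from the PR; the function `td_target` and its arguments are hypothetical, and in older Gym versions the truncation flag would typically come from `info["TimeLimit.truncated"]`.

```python
def td_target(reward, next_value, done, truncated, gamma=0.99):
    """One-step return target that bootstraps through time-limit truncation.

    `done` is the environment's end-of-episode flag; `truncated` marks an
    episode that ended only because a step limit was hit.
    """
    # Only a genuine terminal state has zero future value.
    terminal = done and not truncated
    return reward + (0.0 if terminal else gamma * next_value)
```

Treating a truncated episode as terminal would teach the value function that the state at the time limit has no future reward, which biases value estimates in tasks that would otherwise continue.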
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
						
						
							
						
						
							9b61bc620c 
							
						 
					 
					
						
						
							
							add logger ( #295 )  
						
						
						
						
						This PR focuses on refactoring the logging method to fix the NaN reward and log interval bugs. After these two PRs, the fundamental changes to tianshou/data should hopefully be finished, so we can finally concentrate on building Tianshou's benchmarks.
Things changed:
1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer;
2. remove utils.SummaryWriter; 
						
						
					 
					
						2021-02-24 14:48:42 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7036073649 
							
						 
					 
					
						
						
							
							Trainer refactor: some definition changes ( #293 )  
						
						
						
						
						This PR changes some trainer definitions to make the trainer easier to use and consistent with typical usage in research papers: it renames `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and updates the documentation. 
						
						
					 
					
						2021-02-21 13:06:02 +08:00