Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bc53ead273 
							
						 
					 
					
						
						
							
							Implement CQLPolicy and offline_cql example ( #506 )  
						
						 
						
						
						
						
							
						
					 
					
						2022-01-16 05:30:21 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a59d96d041 
							
						 
					 
					
						
						
							
							Add Intrinsic Curiosity Module ( #503 )  
						
						 
						
						
						
						
							
						
					 
					
						2022-01-15 02:43:48 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Markus28 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a2d76d1276 
							
						 
					 
					
						
						
							
							Remove reset_buffer() from reset method ( #501 )  
						
						 
						
						
						
						
							
						
					 
					
						2022-01-12 16:46:28 -08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3592f45446 
							
						 
					 
					
						
						
							
							Fix critic network for Discrete CRR ( #485 )  
						
						 
						
						
						
						
						- Fixes an inconsistency in the implementation of Discrete CRR. It now uses the `Critic` class for its critic, following the convention of other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizers, eliminating randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger so that results appear in real time;
- Allows `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline` and their tests to `test/offline`, per review comments. 
						
						
							
  v0.4.5
 
						
					 
					
						2021-11-28 23:10:28 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5c5a3db94e 
							
						 
					 
					
						
						
							
							Implement BCQPolicy and offline_bcq example ( #480 )  
						
						 
						
						
						
						
						This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. It includes an experimental result on 'halfcheetah-expert-v1', a d4rl environment for offline reinforcement learning.
Example usage is in examples/offline/offline_bcq.py. 
						
						
							
						
					 
					
						2021-11-22 22:21:02 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							94d3b27db9 
							
						 
					 
					
						
						
							
							fix tqdm issue ( #481 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-11-19 00:17:44 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Markus28 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f19a86966 
							
						 
					 
					
						
						
							
							Implements set_env_attr and get_env_attr for vector environments ( #478 )  
						
						 
						
						
						
						
						close  #473  
						
						
							
						
					 
					
						2021-11-03 00:08:00 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							098d466467 
							
						 
					 
					
						
						
							
							fix atari wrapper to be deterministic ( #467 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-10-19 22:26:11 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
						
						
							
						
						
							b9eedc516e 
							
						 
					 
					
						
						
							
							bump to 0.4.4  
						
						 
						
						
						
						
							
  v0.4.4
 
						
					 
					
						2021-10-13 12:22:24 -04:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							63d752ee0b 
							
						 
					 
					
						
						
							
							W&B: Add usage in the docs ( #463 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-10-13 23:28:25 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							926ec0b9b1 
							
						 
					 
					
						
						
							
							update save_fn in trainer ( #459 )  
						
						 
						
						
						
						
						- collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work was done in the logger)
- save_fn() is now called at the beginning of the trainer 
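
The four extra keys are summary statistics over finished episodes. The following is a minimal plain-Python sketch of that kind of summary, not the collector's actual code: `summarize_episodes` is a hypothetical helper, and the use of population standard deviation and the zero fallback for empty input are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def summarize_episodes(rewards, lengths):
    """Hypothetical helper: mean/std of episode rewards and lengths.

    Uses population standard deviation (an assumption) and returns
    zeros when no episode has finished.
    """
    if not rewards:
        return {"rew": 0.0, "rew_std": 0.0, "len": 0.0, "len_std": 0.0}
    return {
        "rew": fmean(rewards),       # mean episode reward
        "rew_std": pstdev(rewards),  # spread of episode rewards
        "len": fmean(lengths),       # mean episode length
        "len_std": pstdev(lengths),  # spread of episode lengths
    }


# Two finished episodes with rewards 1.0 and 3.0, lengths 10 and 20.
stats = summarize_episodes([1.0, 3.0], [10, 20])
```

Computing these values where the episodes are collected means any consumer of the result can read them directly instead of recomputing them.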
						
						
							
						
					 
					
						2021-10-13 21:25:24 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e45e2096d8 
							
						 
					 
					
						
						
							
							add multi-GPU support ( #461 )  
						
						 
						
						
						
						
						add a new class DataParallelNet 
						
						
							
						
					 
					
						2021-10-06 01:39:14 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5df64800f4 
							
						 
					 
					
						
						
							
							final fix for actor_critic shared head parameters ( #458 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-10-04 23:19:07 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							22d7bf38c8 
							
						 
					 
					
						
						
							
							Improve W&B logger ( #441 )  
						
						 
						
						
						
						
						- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update 
						
						
							
						
					 
					
						2021-09-24 21:52:23 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e8f8cdfa41 
							
						 
					 
					
						
						
							
							fix logger.write error in atari script ( #444 )  
						
						 
						
						
						
						
						- fix a bug in #427: logger.write should be passed a dict
- change SubprocVectorEnv to ShmemVectorEnv in atari
- increase logger interval for eps 
						
						
							
						
					 
					
						2021-09-09 00:51:39 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fc251ab0b8 
							
						 
					 
					
						
						
							
							bump to v0.4.3 ( #432 )  
						
						 
						
						
						
						
						* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check 
						
						
							
  v0.4.3
 
						
					 
					
						2021-09-03 05:05:04 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ending Hsiao 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a740496a51 
							
						 
					 
					
						
						
							
							fix dual clip implementation ( #435 )  
						
						 
						
						
						
						
						close  #433  
						
						
							
						
					 
					
						2021-09-02 21:43:14 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8a5e2190f7 
							
						 
					 
					
						
						
							
							Add Weights and Biases Logger ( #427 )  
						
						 
						
						
						
						
						- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-08-30 22:35:02 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e4f4f0e144 
							
						 
					 
					
						
						
							
							fix docs build failure and a bug in a2c/ppo optimizer ( #428 )  
						
						 
						
						
						
						
						* fix rtfd build
* list + list -> set.union
* change seed of test_qrdqn
* add py39 test 
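
The "list + list -> set.union" item can be illustrated in plain Python. This is a sketch under assumptions, not the code from the commit: `Param` is a hypothetical stand-in for a trainable parameter object, and the scenario (an actor and a critic sharing part of their parameters) is assumed from the commit title's mention of the a2c/ppo optimizer.

```python
class Param:
    """Hypothetical stand-in for a trainable parameter (hashed by identity)."""

    def __init__(self, name):
        self.name = name


# Assumed scenario: actor and critic share one parameter object.
shared = Param("shared_trunk")
actor_params = [shared, Param("actor_head")]
critic_params = [shared, Param("critic_head")]

# Concatenating the lists keeps the shared object twice.
concatenated = actor_params + critic_params

# A set union keeps each object exactly once.
unique = set(actor_params).union(critic_params)
```

A set does not preserve order, so this only suits callers that do not depend on parameter ordering.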
						
						
							
						
					 
					
						2021-08-30 02:07:03 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							291be08d43 
							
						 
					 
					
						
						
							
							Add Rainbow DQN ( #386 )  
						
						 
						
						
						
						
						- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network 
						
						
							
						
					 
					
						2021-08-29 23:34:59 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d161059c3d 
							
						 
					 
					
						
						
							
							Replaced indice by plural indices ( #422 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-08-20 21:58:44 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								deeplook 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							728b88b92d 
							
						 
					 
					
						
						
							
							Fix conda install command ( #419 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-08-16 18:56:01 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5b7732a29b 
							
						 
					 
					
						
						
							
							make ppo discrete test script more general ( #418 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-08-15 21:37:37 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bba30f83d1 
							
						 
					 
					
						
						
							
							fix sb2's coverage ( #412 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-08-10 17:43:27 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Miguel Morales 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							42538f8e58 
							
						 
					 
					
						
						
							
							Update README.md ( #410 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-08-10 09:14:20 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0674ff628a 
							
						 
					 
					
						
						
							
							Cite Tianshou's latest paper ( #406 )  
						
						 
						
						
						
						
						* Cite Tianshou's latest paper
* update new version README
* change order
Co-authored-by: Jiayi Weng <wengj@sea.com> 
						
						
							
						
					 
					
						2021-08-10 08:35:01 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							18d2f25eff 
							
						 
					 
					
						
						
							
							Remove warnings about the use of save_fn across trainers ( #408 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-08-04 09:56:00 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c19876179a 
							
						 
					 
					
						
						
							
							add env_id in preprocess fn ( #391 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-07-05 09:50:39 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ebaca6f8da 
							
						 
					 
					
						
						
							
							add vizdoom example, bump version to 0.4.2 ( #384 )  
						
						 
						
						
						
						
							
  v0.4.2
 
						
					 
					
						2021-06-26 18:08:41 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c0bc8e00ca 
							
						 
					 
					
						
						
							
							Add Fully-parameterized Quantile Function ( #376 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-06-15 11:59:02 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							21b2b22cd7 
							
						 
					 
					
						
						
							
							update iqn results and reward plots ( #377 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-06-10 09:05:25 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f3169b4c1f 
							
						 
					 
					
						
						
							
							Add Implicit Quantile Network ( #371 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-05-29 09:44:23 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							458028a326 
							
						 
					 
					
						
						
							
							fix docs ( #373 )  
						
						 
						
						
						
						
						- fix css style error
- fix mujoco benchmark result 
						
						
							
						
					 
					
						2021-05-23 12:43:03 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ark 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							655d5fb14f 
							
						 
					 
					
						
						
							
							Allow researchers to choose whether to use Double DQN ( #368 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-05-21 10:53:34 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f7bc65ac7 
							
						 
					 
					
						
						
							
							Add discrete Critic Regularized Regression ( #367 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-05-19 13:29:56 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b5c3ddabfa 
							
						 
					 
					
						
						
							
							Add discrete Conservative Q-Learning for offline RL ( #359 )  
						
						 
						
						
						
						
						Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com> 
						
						
							
						
					 
					
						2021-05-12 09:24:48 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ark 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							84f58636eb 
							
						 
					 
					
						
						
							
							Make trainer resumable ( #350 )  
						
						 
						
						
						
						
						- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-05-06 08:53:53 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Yuge Zhang 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f4e05d585a 
							
						 
					 
					
						
						
							
							Support deterministic evaluation for onpolicy algorithms ( #354 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-27 21:22:39 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ff4d3cd714 
							
						 
					 
					
						
						
							
							Support different state size and fix exception in venv.__del__ ( #352 )  
						
						 
						
						
						
						
						- Batch: do not raise an error when it finds a list of np.array objects with different shape[0].
- Venv's obs: add a try...except block around np.stack(obs_list)
- remove venv.__del__ since it is buggy 
						
						
							
						
					 
					
						2021-04-25 15:23:46 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bbc3c3e32d 
							
						 
					 
					
						
						
							
							Add numerical analysis tool and interactive plot ( #341 )  
						
						 
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-04-22 12:49:54 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							844d7703c3 
							
						 
					 
					
						
						
							
							NPG Mujoco benchmark release ( #347 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-21 16:31:20 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1dcf65fe21 
							
						 
					 
					
						
						
							
							Add NPG policy ( #344 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-21 09:52:15 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c059f98abf 
							
						 
					 
					
						
						
							
							fix atari_bcq ( #345 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-20 22:59:21 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a57503c0aa 
							
						 
					 
					
						
						
							
							TRPO benchmark release ( #340 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-19 17:05:06 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f68cb78ed7 
							
						 
					 
					
						
						
							
							Add self-hosted runner for GPU checks ( #339 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-18 16:57:37 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5057b5c89e 
							
						 
					 
					
						
						
							
							Add TRPO policy ( #337 )  
						
						 
						
						
						
						
							
						
					 
					
						2021-04-16 20:37:12 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							333b8fbd66 
							
						 
					 
					
						
						
							
							add plotter ( #335 )  
						
						 
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-04-14 14:06:36 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							dd4a01132c 
							
						 
					 
					
						
						
							
							Fix SAC loss explode ( #333 )  
						
						 
						
						
						
						
						* change SAC action_bound_method to "clip" (tanh is hardcoded in forward)
* docstring update
* modelbase -> modelbased 
						
						
							
  v0.4.1
 
						
					 
					
						2021-04-04 17:33:35 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							825da9bc53 
							
						 
					 
					
						
						
							
							add cross-platform test and release 0.4.1 ( #331 )  
						
						 
						
						
						
						
						* bump to 0.4.1
* add cross-platform test 
						
						
							
						
					 
					
						2021-03-31 15:14:22 +08:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							09692c84fe 
							
						 
					 
					
						
						
							
							fix numpy>=1.20 typing check ( #323 )  
						
						 
						
						
						
						
						Change the behavior of to_numpy and to_torch: from now on, a dict is automatically converted to a Batch and a list is automatically converted to an np.ndarray. If the conversion fails, the exception is raised instead of converting each element in the list individually. 
						
						
							
						
					 
					
						2021-03-30 16:06:03 +08:00