Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d29188ee77 
							
						 
					 
					
						
						
							
							update atari ppo slots ( #529 )  
						
						
						
						
							
						
					 
					
						2022-02-13 04:04:21 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							40289b8b0e 
							
						 
					 
					
						
						
							
							Add atari ppo example ( #523 )  
						
						
						
						
						I needed a policy gradient baseline myself, and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.
Note that using lr=2.5e-4 results in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate, which is why the default lr is set to 1e-4. See the discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156
						
						
							
						
					 
					
						2022-02-11 06:45:06 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3d697aa4c6 
							
						 
					 
					
						
						
							
							make unit test faster ( #522 )  
						
						
						
						
						* test cache expert data in offline training
* faster cql test
* faster tests
* use dummy
* test ray dependency 
						
						
							
						
					 
					
						2022-02-09 00:24:52 +08:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9c100e0705 
							
						 
					 
					
						
						
							
							Enable venvs.reset() concurrent execution ( #517 )  
						
						
						
						
						- change the internal API names of the worker: send_action -> send, get_result -> recv (aligned with envpool)
- add a timing test for venvs.reset() to make sure it executes concurrently
- change venvs.reset() logic
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2022-02-08 00:40:01 +08:00 
						 
				 
			
				
					
						
							
							
								Kenneth Schröder 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							cd7654bfd5 
							
						 
					 
					
						
						
							
							Fixing casts to int by to_torch_as(...) calls in policies when using discrete actions ( #521 )  
						
						
						
						
							
						
					 
					
						2022-02-07 03:42:46 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c25926dd8f 
							
						 
					 
					
						
						
							
							Formalize variable names ( #509 )  
						
						
						
						
						Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2022-01-30 00:53:56 +08:00 
						 
				 
			
				
					
						
							
							
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bc53ead273 
							
						 
					 
					
						
						
							
							Implement CQLPolicy and offline_cql example ( #506 )  
						
						
						
						
							
						
					 
					
						2022-01-16 05:30:21 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a59d96d041 
							
						 
					 
					
						
						
							
							Add Intrinsic Curiosity Module ( #503 )  
						
						
						
						
							
						
					 
					
						2022-01-15 02:43:48 +08:00 
						 
				 
			
				
					
						
							
							
								Markus28 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a2d76d1276 
							
						 
					 
					
						
						
							
							Remove reset_buffer() from reset method ( #501 )  
						
						
						
						
							
						
					 
					
						2022-01-12 16:46:28 -08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3592f45446 
							
						 
					 
					
						
						
							
							Fix critic network for Discrete CRR ( #485 )  
						
						
						
						
						- Fixes an inconsistency in the implementation of Discrete CRR. It now uses the `Critic` class for its critic, following the convention in other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizers, eliminating randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger to ensure real-time results;
- Enables `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline` and tests to `test/offline`, per review comments.
						
						
							
 
						
					 
					
						2021-11-28 23:10:28 +08:00 
						 
				 
			
				
					
						
							
							
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5c5a3db94e 
							
						 
					 
					
						
						
							
							Implement BCQPolicy and offline_bcq example ( #480 )  
						
						
						
						
						This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided.
Example usage is in examples/offline/offline_bcq.py.
						
						
							
						
					 
					
						2021-11-22 22:21:02 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							94d3b27db9 
							
						 
					 
					
						
						
							
							fix tqdm issue ( #481 )  
						
						
						
						
							
						
					 
					
						2021-11-19 00:17:44 +08:00 
						 
				 
			
				
					
						
							
							
								Markus28 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f19a86966 
							
						 
					 
					
						
						
							
							Implements set_env_attr and get_env_attr for vector environments ( #478 )  
						
						
						
						
						close  #473  
					
						2021-11-03 00:08:00 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							098d466467 
							
						 
					 
					
						
						
							
							fix atari wrapper to be deterministic ( #467 )  
						
						
						
						
							
						
					 
					
						2021-10-19 22:26:11 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
						
						
							
						
						
							b9eedc516e 
							
						 
					 
					
						
						
							
							bump to 0.4.4  
						
						
						
						
							
 
						
					 
					
						2021-10-13 12:22:24 -04:00 
						 
				 
			
				
					
						
							
							
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							63d752ee0b 
							
						 
					 
					
						
						
							
							W&B: Add usage in the docs ( #463 )  
						
						
						
						
							
						
					 
					
						2021-10-13 23:28:25 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							926ec0b9b1 
							
						 
					 
					
						
						
							
							update save_fn in trainer ( #459 )  
						
						
						
						
						- collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work was done in the logger)
- save_fn() will be called when the trainer starts
						
						
							
						
					 
					
						2021-10-13 21:25:24 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e45e2096d8 
							
						 
					 
					
						
						
							
							add multi-GPU support ( #461 )  
						
						
						
						
						add a new class DataParallelNet 
						
						
							
						
					 
					
						2021-10-06 01:39:14 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5df64800f4 
							
						 
					 
					
						
						
							
							final fix for actor_critic shared head parameters ( #458 )  
						
						
						
						
							
						
					 
					
						2021-10-04 23:19:07 +08:00 
						 
				 
			
				
					
						
							
							
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							22d7bf38c8 
							
						 
					 
					
						
						
							
							Improve W&B logger ( #441 )  
						
						
						
						
						- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update 
						
						
							
						
					 
					
						2021-09-24 21:52:23 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e8f8cdfa41 
							
						 
					 
					
						
						
							
							fix logger.write error in atari script ( #444 )  
						
						
						
						
						- fix a bug in #427 : logger.write should pass a dict
- change SubprocVectorEnv to ShmemVectorEnv in atari
- increase logger interval for eps 
						
						
							
						
					 
					
						2021-09-09 00:51:39 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fc251ab0b8 
							
						 
					 
					
						
						
							
							bump to v0.4.3 ( #432 )  
						
						
						
						
						* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check 
						
						
							
 
						
					 
					
						2021-09-03 05:05:04 +08:00 
						 
				 
			
				
					
						
							
							
								Ending Hsiao 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a740496a51 
							
						 
					 
					
						
						
							
							fix dual clip implementation ( #435 )  
						
						
						
						
						close  #433  
					
						2021-09-02 21:43:14 +08:00 
						 
				 
			
				
					
						
							
							
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8a5e2190f7 
							
						 
					 
					
						
						
							
							Add Weights and Biases Logger ( #427 )  
						
						
						
						
						- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-08-30 22:35:02 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e4f4f0e144 
							
						 
					 
					
						
						
							
							fix docs build failure and a bug in a2c/ppo optimizer ( #428 )  
						
						
						
						
						* fix rtfd build
* list + list -> set.union
* change seed of test_qrdqn
* add py39 test 
						
						
							
						
					 
					
						2021-08-30 02:07:03 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							291be08d43 
							
						 
					 
					
						
						
							
							Add Rainbow DQN ( #386 )  
						
						
						
						
						- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network 
						
						
							
						
					 
					
						2021-08-29 23:34:59 +08:00 
						 
				 
			
				
					
						
							
							
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d161059c3d 
							
						 
					 
					
						
						
							
							Replaced indice by plural indices ( #422 )  
						
						
						
						
							
						
					 
					
						2021-08-20 21:58:44 +08:00 
						 
				 
			
				
					
						
							
							
								deeplook 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							728b88b92d 
							
						 
					 
					
						
						
							
							Fix conda install command ( #419 )  
						
						
						
						
							
						
					 
					
						2021-08-16 18:56:01 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5b7732a29b 
							
						 
					 
					
						
						
							
							make ppo discrete test script more general ( #418 )  
						
						
						
						
							
						
					 
					
						2021-08-15 21:37:37 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bba30f83d1 
							
						 
					 
					
						
						
							
							fix sb2's coverage ( #412 )  
						
						
						
						
							
						
					 
					
						2021-08-10 17:43:27 +08:00 
						 
				 
			
				
					
						
							
							
								Miguel Morales 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							42538f8e58 
							
						 
					 
					
						
						
							
							Update README.md ( #410 )  
						
						
						
						
							
						
					 
					
						2021-08-10 09:14:20 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0674ff628a 
							
						 
					 
					
						
						
							
							Cite Tianshou's latest paper ( #406 )  
						
						
						
						
						* Cite Tianshou's latest paper
* update new version README
* change order
Co-authored-by: Jiayi Weng <wengj@sea.com> 
						
						
							
						
					 
					
						2021-08-10 08:35:01 +08:00 
						 
				 
			
				
					
						
							
							
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							18d2f25eff 
							
						 
					 
					
						
						
							
							Remove warnings about the use of save_fn across trainers ( #408 )  
						
						
						
						
							
						
					 
					
						2021-08-04 09:56:00 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c19876179a 
							
						 
					 
					
						
						
							
							add env_id in preprocess fn ( #391 )  
						
						
						
						
							
						
					 
					
						2021-07-05 09:50:39 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ebaca6f8da 
							
						 
					 
					
						
						
							
							add vizdoom example, bump version to 0.4.2 ( #384 )  
						
						
						
						
							
 
						
					 
					
						2021-06-26 18:08:41 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c0bc8e00ca 
							
						 
					 
					
						
						
							
							Add Fully-parameterized Quantile Function ( #376 )  
						
						
						
						
							
						
					 
					
						2021-06-15 11:59:02 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							21b2b22cd7 
							
						 
					 
					
						
						
							
							update iqn results and reward plots ( #377 )  
						
						
						
						
							
						
					 
					
						2021-06-10 09:05:25 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f3169b4c1f 
							
						 
					 
					
						
						
							
							Add Implicit Quantile Network ( #371 )  
						
						
						
						
							
						
					 
					
						2021-05-29 09:44:23 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							458028a326 
							
						 
					 
					
						
						
							
							fix docs ( #373 )  
						
						
						
						
						- fix css style error
- fix mujoco benchmark result 
						
						
							
						
					 
					
						2021-05-23 12:43:03 +08:00 
						 
				 
			
				
					
						
							
							
								Ark 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							655d5fb14f 
							
						 
					 
					
						
						
							
							Allow researchers to choose whether to use Double DQN ( #368 )  
						
						
						
						
							
						
					 
					
						2021-05-21 10:53:34 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f7bc65ac7 
							
						 
					 
					
						
						
							
							Add discrete Critic Regularized Regression ( #367 )  
						
						
						
						
							
						
					 
					
						2021-05-19 13:29:56 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b5c3ddabfa 
							
						 
					 
					
						
						
							
							Add discrete Conservative Q-Learning for offline RL ( #359 )  
						
						
						
						
						Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com> 
						
						
							
						
					 
					
						2021-05-12 09:24:48 +08:00 
						 
				 
			
				
					
						
							
							
								Ark 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							84f58636eb 
							
						 
					 
					
						
						
							
							Make trainer resumable ( #350 )  
						
						
						
						
						- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-05-06 08:53:53 +08:00 
						 
				 
			
				
					
						
							
							
								Yuge Zhang 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f4e05d585a 
							
						 
					 
					
						
						
							
							Support deterministic evaluation for onpolicy algorithms ( #354 )  
						
						
						
						
							
						
					 
					
						2021-04-27 21:22:39 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ff4d3cd714 
							
						 
					 
					
						
						
							
							Support different state size and fix exception in venv.__del__ ( #352 )  
						
						
						
						
						- Batch: do not raise an error when it finds a list of np.array with different shape[0].
- Venv's obs: add a try...except block for np.stack(obs_list)
- remove venv.__del__ since it is buggy 
						
						
							
						
					 
					
						2021-04-25 15:23:46 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bbc3c3e32d 
							
						 
					 
					
						
						
							
							Add numerical analysis tool and interactive plot ( #341 )  
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
							
						
					 
					
						2021-04-22 12:49:54 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							844d7703c3 
							
						 
					 
					
						
						
							
							NPG Mujoco benchmark release ( #347 )  
						
						
						
						
							
						
					 
					
						2021-04-21 16:31:20 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1dcf65fe21 
							
						 
					 
					
						
						
							
							Add NPG policy ( #344 )  
						
						
						
						
							
						
					 
					
						2021-04-21 09:52:15 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c059f98abf 
							
						 
					 
					
						
						
							
							fix atari_bcq ( #345 )  
						
						
						
						
							
						
					 
					
						2021-04-20 22:59:21 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a57503c0aa 
							
						 
					 
					
						
						
							
							TRPO benchmark release ( #340 )  
						
						
						
						
							
						
					 
					
						2021-04-19 17:05:06 +08:00