Markus Krimmel 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b0c8d28a7d 
							
						 
					 
					
						
						
							
							Added pre-commit ( #752 )  
						
						
						
						
						- This PR adds the checks that are defined in the Makefile as pre-commit
hooks.
- Hopefully, the checks are equivalent to those from the Makefile, but I
can't guarantee it.
- CI remains as it is.
- As I pointed out on Discord, I experienced some conflicts between
flake8 and yapf, so it might be better to transition to another
combination of tools (e.g. black).
						
						
					 
					
						2022-10-02 08:57:45 -07:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							278c91a222 
							
						 
					 
					
						
						
							
							Update citation and contributor ( #721 )  
						
						
						
						
						* update citation
* update contributor
* pass lint 
						
						
					 
					
						2022-08-10 20:06:51 -07:00 
						 
				 
			
				
					
						
							
							
								Wenhao Chen 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f270e88461 
							
						 
					 
					
						
						
							
							Do not allow async simulation for test collector ( #705 )  
						
						
						
						
					 
					
						2022-07-22 16:23:55 -07:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							99c99bb09a 
							
						 
					 
					
						
						
							
							Fix 2 bugs and refactor RunningMeanStd to support dict obs norm ( #695 )  
						
						
						
						
						* fix  #689 
* fix  #672 
* refactor RMS class
* fix  #688  
						
						
					 
					
						2022-07-14 22:52:56 -07:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							df35718992 
							
						 
					 
					
						
						
							
							Implement TD3+BC for offline RL ( #660 )  
						
						
						
						
						- implement TD3+BC for offline RL;
- fix a trainer bug where the test reward was not logged because self.env_step is not set in the offline setting;
						
						
					 
					
						2022-06-07 00:39:37 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5ecea2402e 
							
						 
					 
					
						
						
							
							Fix save_checkpoint_fn return value ( #659 )  
						
						
						
						
						- Fix save_checkpoint_fn return value to checkpoint_path;
- Fix wrong link in doc;
- Fix an off-by-one bug in trainer iterator. 
						
						
					 
					
						2022-06-03 01:07:07 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6ad5b520fa 
							
						 
					 
					
						
						
							
							Fix sphinx build error ( #655 )  
						
						
						
						
					 
					
						2022-06-01 13:56:04 +08:00 
						 
				 
			
				
					
						
							
							
								Anas BELFADIL 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							53e6b0408d 
							
						 
					 
					
						
						
							
							Add BranchingDQN for large discrete action spaces ( #618 )  
						
						
						
						
					 
					
						2022-05-15 21:40:32 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bf8f63ffc3 
							
						 
					 
					
						
						
							
							use envpool in vizdoom example, update doc ( #634 )  
						
						
						
						
					 
					
						2022-05-09 00:42:16 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							dd16818ce4 
							
						 
					 
					
						
						
							
							implement REDQ based on original contribution by @Jimenius ( #623 )  
						
						
						
						
						Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>
						
						
					 
					
						2022-05-01 00:06:00 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7f23748347 
							
						 
					 
					
						
						
							
							Compare Atari results with dopamine and OpenAI Baselines ( #616 )  
						
						
						
						
					 
					
						2022-04-27 21:10:45 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
						
						
							
						
						
							876e6b186e 
							
						 
					 
					
						
						
							
							hot fix mujoco benchmark  
						
						
						
						
					 
					
						2022-04-24 16:49:40 -04:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5eab7dc218 
							
						 
					 
					
						
						
							
							Add Atari Results ( #600 )
						
						
						
						
					 
					
						2022-04-24 20:44:54 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5c9afe72f3 
							
						 
					 
					
						
						
							
							Update Mujoco Benchmark's webpage ( #606 )
						
						
						
						
					 
					
						2022-04-24 01:11:33 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							57ecebde38 
							
						 
					 
					
						
						
							
							Add jupyter notebook tutorials using Google Colaboratory ( #599 )  
						
						
						
						
					 
					
						2022-04-19 20:58:52 +08:00 
						 
				 
			
				
					
						
							
							
								Alex Nikulkov 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							92456cdb68 
							
						 
					 
					
						
						
							
							Add learning rate scheduler to BasePolicy ( #598 )  
						
						
						
						
					 
					
						2022-04-17 23:52:30 +08:00 
						 
				 
			
				
					
						
							
							
								Yifei Cheng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6fc6857812 
							
						 
					 
					
						
						
							
							Update Multi-agent RL docs, upgrade pettingzoo ( #595 )  
						
						
						
						
						* update multi-agent docs, upgrade pettingzoo
* avoid pettingzoo deprecation warning
* fix pistonball tests
* codestyle 
						
						
					 
					
						2022-04-16 23:17:53 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2a9c9289e5 
							
						 
					 
					
						
						
							
							rename save_fn to save_best_fn to avoid ambiguity ( #575 )  
						
						
						
						
						This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper. 
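A unified deprecation wrapper of this kind can be sketched as follows. This is a minimal illustration, not the code from this PR: the names `warn_deprecated` and `resolve_save_best_fn` are hypothetical, and the choice to prefer the new argument when both are given is an assumption.

```python
import warnings
from typing import Callable, Optional


def warn_deprecated(msg: str) -> None:
    """Emit a DeprecationWarning (hypothetical helper, not tianshou's API)."""
    # stacklevel=3 attributes the warning to the caller of the function
    # that invoked this helper, rather than to the helper itself.
    warnings.warn(msg, category=DeprecationWarning, stacklevel=3)


def resolve_save_best_fn(
    save_best_fn: Optional[Callable] = None,
    save_fn: Optional[Callable] = None,
) -> Optional[Callable]:
    """Map the old `save_fn` argument onto the new `save_best_fn` name."""
    if save_fn is not None:
        warn_deprecated("save_fn is deprecated; use save_best_fn instead.")
        # Assumption of this sketch: the new name wins if both are given.
        if save_best_fn is None:
            save_best_fn = save_fn
    return save_best_fn
```

Routing every renamed argument through one helper keeps the warning category and wording consistent across call sites.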
						
						
					 
					
						2022-03-22 04:29:27 +08:00 
						 
				 
			
				
					
						
							
							
								Jose Antonio Martin H 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							10d919052b 
							
						 
					 
					
						
						
							
							Add Trainers as generators ( #559 )  
						
						
						
						
						This PR proposes exposing trainers as generators.
The usage pattern is:
```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```
- epoch (int): the epoch number
- epoch_stat (dict): a large collection of metrics of the current epoch, including stat
- info (dict): the usual dict returned by the non-generator version of the trainer
You can even iterate over several different trainers at the same time:
```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
# zip accepts any number of trainers; two are shown here
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```
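The generator pattern described above can be illustrated with a minimal sketch. `ToyTrainer` is a hypothetical stand-in, not the trainer from this PR: it assumes only that each iteration yields an `(epoch, epoch_stat, info)` tuple and that iteration stops after `max_epoch` epochs. The metric values are placeholders.

```python
from typing import Dict, Iterator, Tuple

EpochResult = Tuple[int, Dict[str, float], Dict[str, float]]


class ToyTrainer:
    """Hypothetical iterator-style trainer, for illustration only."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch
        self.epoch = 0
        self.best_reward = float("-inf")

    def __iter__(self) -> Iterator[EpochResult]:
        return self

    def __next__(self) -> EpochResult:
        # Stop once the configured number of epochs has been produced.
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        reward = float(self.epoch)  # placeholder for a real training metric
        self.best_reward = max(self.best_reward, reward)
        epoch_stat = {"reward": reward}
        info = {"best_reward": self.best_reward}
        return self.epoch, epoch_stat, info


for epoch, epoch_stat, info in ToyTrainer(max_epoch=3):
    print(epoch, epoch_stat, info)
```

Because the object implements the iterator protocol, it also works with `zip`, which stops as soon as the shortest trainer is exhausted.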
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-03-18 00:26:14 +08:00 
						 
				 
			
				
					
						
							
							
								Andrea Boscolo Camiletto 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2336a7db1b 
							
						 
					 
					
						
						
							
							fixed typo in rainbow DQN paper reference ( #569 )  
						
						
						
						
						* fixed typo in rainbow DQN paper ref
* fix gym==0.23 ci failure
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-03-16 21:38:51 +08:00 
						 
				 
			
				
					
						
							
							
								Costa Huang 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							df3d7f582b 
							
						 
					 
					
						
						
							
							Update WandbLogger implementation ( #558 )  
						
						
						
						
						* Use `global_step` as the x-axis for wandb
* Use Tensorboard SummaryWriter as core with `wandb.init(..., sync_tensorboard=True)`
* Update all atari examples with wandb
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-03-07 06:40:47 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2377f2f186 
							
						 
					 
					
						
						
							
							Implement Generative Adversarial Imitation Learning (GAIL) ( #550 )  
						
						
						
						
						Implement GAIL based on PPO and provide an example script and sample (i.e., most likely not the best) results on Mujoco tasks. (#531, #173)
						
						
					 
					
						2022-03-06 23:57:15 +08:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d85bc19269 
							
						 
					 
					
						
						
							
							update dqn tutorial and add envpool to docs ( #526 )  
						
						
						
						
						Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-02-15 06:39:47 +08:00 
						 
				 
			
				
					
						
							
							
								Chengqi Duan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9c100e0705 
							
						 
					 
					
						
						
							
							Enable venvs.reset() concurrent execution ( #517 )  
						
						
						
						
						- change the internal API name of worker: send_action -> send, get_result -> recv (align with envpool)
- add a timing test for venvs.reset() to verify that resets execute concurrently
- change venvs.reset() logic
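The send/recv split listed above is what makes a concurrent reset possible: every worker is asked to reset first, and results are collected only afterwards. The following is a rough sketch under stated assumptions, not the code from this PR: `ToyWorker` and `reset_all` are hypothetical names, a thread pool stands in for the real worker backend, and `send(None)` is taken to mean "reset" purely for this illustration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional


class ToyWorker:
    """Hypothetical worker exposing a send/recv pair."""

    def __init__(self, pool: ThreadPoolExecutor, delay: float) -> None:
        self._pool = pool
        self._delay = delay
        self._future: Optional[Future] = None

    def _work(self, action: Optional[int]) -> int:
        time.sleep(self._delay)  # simulate a slow environment call
        # In this sketch, None means "reset" and yields observation 0.
        return 0 if action is None else action

    def send(self, action: Optional[int]) -> None:
        # Non-blocking: the work starts in the background.
        self._future = self._pool.submit(self._work, action)

    def recv(self) -> int:
        assert self._future is not None, "send() must be called before recv()"
        return self._future.result()  # blocks until the result is ready


def reset_all(workers: List[ToyWorker]) -> List[int]:
    # Issue all reset requests first so they overlap in time,
    # then gather the observations in worker order.
    for worker in workers:
        worker.send(None)
    return [worker.recv() for worker in workers]
```

A timing test in the spirit of the one mentioned above would check that resetting N workers takes noticeably less than N times the single-worker delay.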
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-02-08 00:40:01 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c25926dd8f 
							
						 
					 
					
						
						
							
							Formalize variable names ( #509 )  
						
						
						
						
						Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2022-01-30 00:53:56 +08:00 
						 
				 
			
				
					
						
							
							
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bc53ead273 
							
						 
					 
					
						
						
							
							Implement CQLPolicy and offline_cql example ( #506 )  
						
						
						
						
					 
					
						2022-01-16 05:30:21 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a59d96d041 
							
						 
					 
					
						
						
							
							Add Intrinsic Curiosity Module ( #503 )  
						
						
						
						
					 
					
						2022-01-15 02:43:48 +08:00 
						 
				 
			
				
					
						
							
							
								Bernard Tan 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5c5a3db94e 
							
						 
					 
					
						
						
							
							Implement BCQPolicy and offline_bcq example ( #480 )  
						
						
						
						
						This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for Offline Reinforcement Learning), is provided.
Example usage is in examples/offline/offline_bcq.py.
						
						
					 
					
						2021-11-22 22:21:02 +08:00 
						 
				 
			
				
					
						
							
							
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							63d752ee0b 
							
						 
					 
					
						
						
							
							W&B: Add usage in the docs ( #463 )  
						
						
						
						
					 
					
						2021-10-13 23:28:25 +08:00 
						 
				 
			
				
					
						
							
							
								Jiayi Weng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e45e2096d8 
							
						 
					 
					
						
						
							
							add multi-GPU support ( #461 )  
						
						
						
						
						add a new class DataParallelNet 
						
						
					 
					
						2021-10-06 01:39:14 +08:00 
						 
				 
			
				
					
						
							
							
								Ayush Chaurasia 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							22d7bf38c8 
							
						 
					 
					
						
						
							
							Improve W&B logger ( #441 )  
						
						
						
						
						- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update 
						
						
					 
					
						2021-09-24 21:52:23 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fc251ab0b8 
							
						 
					 
					
						
						
							
							bump to v0.4.3 ( #432 )  
						
						
						
						
						* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check 
						
						
					 
					
						2021-09-03 05:05:04 +08:00 
						 
				 
			
				
					
						
							
							
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8a5e2190f7 
							
						 
					 
					
						
						
							
							Add Weights and Biases Logger ( #427 )  
						
						
						
						
						- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com> 
						
						
					 
					
						2021-08-30 22:35:02 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e4f4f0e144 
							
						 
					 
					
						
						
							
							fix docs build failure and a bug in a2c/ppo optimizer ( #428 )  
						
						
						
						
						* fix rtfd build
* list + list -> set.union
* change seed of test_qrdqn
* add py39 test 
						
						
					 
					
						2021-08-30 02:07:03 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							291be08d43 
							
						 
					 
					
						
						
							
							Add Rainbow DQN ( #386 )  
						
						
						
						
						- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network 
						
						
					 
					
						2021-08-29 23:34:59 +08:00 
						 
				 
			
				
					
						
							
							
								Andriy Drozdyuk 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d161059c3d 
							
						 
					 
					
						
						
							
							Replaced indice by plural indices ( #422 )  
						
						
						
						
					 
					
						2021-08-20 21:58:44 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c19876179a 
							
						 
					 
					
						
						
							
							add env_id in preprocess fn ( #391 )  
						
						
						
						
					 
					
						2021-07-05 09:50:39 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c0bc8e00ca 
							
						 
					 
					
						
						
							
							Add Fully-parameterized Quantile Function ( #376 )  
						
						
						
						
					 
					
						2021-06-15 11:59:02 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f3169b4c1f 
							
						 
					 
					
						
						
							
							Add Implicit Quantile Network ( #371 )  
						
						
						
						
					 
					
						2021-05-29 09:44:23 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							458028a326 
							
						 
					 
					
						
						
							
							fix docs ( #373 )  
						
						
						
						
						- fix css style error
- fix mujoco benchmark result 
						
						
					 
					
						2021-05-23 12:43:03 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f7bc65ac7 
							
						 
					 
					
						
						
							
							Add discrete Critic Regularized Regression ( #367 )  
						
						
						
						
					 
					
						2021-05-19 13:29:56 +08:00 
						 
				 
			
				
					
						
							
							
								Yi Su 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b5c3ddabfa 
							
						 
					 
					
						
						
							
							Add discrete Conservative Q-Learning for offline RL ( #359 )  
						
						
						
						
						Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com> 
						
						
					 
					
						2021-05-12 09:24:48 +08:00 
						 
				 
			
				
					
						
							
							
								Ark 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							84f58636eb 
							
						 
					 
					
						
						
							
							Make trainer resumable ( #350 )  
						
						
						
						
						- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
					 
					
						2021-05-06 08:53:53 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ff4d3cd714 
							
						 
					 
					
						
						
							
							Support different state size and fix exception in venv.__del__ ( #352 )  
						
						
						
						
						- Batch: do not raise error when it finds list of np.array with different shape[0].
- Venv's obs: add try...except block for np.stack(obs_list)
- remove venv.__del__ since it is buggy 
						
						
					 
					
						2021-04-25 15:23:46 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							bbc3c3e32d 
							
						 
					 
					
						
						
							
							Add numerical analysis tool and interactive plot ( #341 )  
						
						
						
						
						Co-authored-by: Trinkle23897 <trinkle23897@gmail.com> 
						
						
					 
					
						2021-04-22 12:49:54 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1dcf65fe21 
							
						 
					 
					
						
						
							
							Add NPG policy ( #344 )  
						
						
						
						
					 
					
						2021-04-21 09:52:15 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5057b5c89e 
							
						 
					 
					
						
						
							
							Add TRPO policy ( #337 )  
						
						
						
						
					 
					
						2021-04-16 20:37:12 +08:00 
						 
				 
			
				
					
						
							
							
								ChenDRAG 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6426a39796 
							
						 
					 
					
						
						
							
							ppo benchmark ( #330 )  
						
						
						
						
					 
					
						2021-03-30 11:50:35 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8963a14327 
							
						 
					 
					
						
						
							
							fix exception in tutorials/dqn.rst ( #327 )  
						
						
						
						
					 
					
						2021-03-26 12:57:00 +08:00 
						 
				 
			
				
					
						
							
							
								n+e 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0c7117dd55 
							
						 
					 
					
						
						
							
							fix concepts.rst with regard to new buffer behavior ( #316 )  
						
						
						
						
						fix  #315  
					
						2021-03-20 21:46:36 +08:00