Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							ba1b3e54eb 
							
						 
					 
					
						
						
							
							fix   #69  
						
						
						
						
							
						
					 
					
						2020-06-01 08:30:09 +08:00 
						 
				 
			
				
					
						
							
							
								Alexis DUBURCQ 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1fce527c77 
							
						 
					 
					
						
						
							
							Fix 'to_tensor' dtype/device forwarding for Batch over Batch. ( #68 )  
						
						
						
						
						* Fix Batch to_torch method not updating dtype/device of already converted data.
* Fix dtype/device forwarding by to_tensor for Batch over Batch.
* Add Unit test to check to_torch dtype/device recursive forwarding.
* Batch UT check accessing data using both dict and class style.
* Fix utils to_tensor dtype/device forwarding. Add Unit tests.
* Fix UT.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Co-authored-by: n+e <463003665@qq.com> 
						
						
							
						
					 
					
						2020-05-30 21:40:31 +08:00 
						 
				 
			
				
					
						
							
							
								Alexis DUBURCQ 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							529a4cf44c 
							
						 
					 
					
						
						
							
							Add pickle support for Batch. Fix VectorEnv. ( #67 )  
						
						
						
						
						* Fix vecenv.
* Add pickle support for Batch class.
* Add Batch pickle Unit Test.
* Fix lint.
* Swap Batch UT.
* Fix lint.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu> 
						
						
							
						
					 
					
						2020-05-30 21:29:33 +08:00 
						 
				 
			
				
					
						
							
							
								Alexis DUBURCQ 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							dd3e2130bb 
							
						 
					 
					
						
						
							
							Infer the right dtype for replay buffers. ( #64 )  
						
						
						
						
							
						
					 
					
						2020-05-29 22:27:03 +08:00 
						 
				 
			
				
					
						
							
							
								Alexis DUBURCQ 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8af7196a9a 
							
						 
					 
					
						
						
							
							Robust conversion from/to numpy/pytorch ( #63 )  
						
						
						
						
						* Enable to convert Batch data back to torch.
* Add torch converter to collector.
* Fix
* Move to_numpy/to_torch convert in dedicated utils.py.
* Use to_numpy/to_torch to convert arrays.
* fix lint
* fix
* Add unit test to check Batch from/to numpy.
* Fix Batch over Batch.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu> 
						
						
							
						
					 
					
						2020-05-29 20:45:21 +08:00 
						 
				 
			
				
					
						
							
							
								Alexis DUBURCQ 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b5093ecb56 
							
						 
					 
					
						
						
							
							Minor refactor for Batch class. ( #61 )  
						
						
						
						
						* Minor refactor for Batch class.
* Fix.
* Add back key sorting.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu> 
						
						
							
						
					 
					
						2020-05-29 17:56:46 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							be9ce44290 
							
						 
					 
					
						
						
							
							fix   #59  
						
						
						
						
							
						
					 
					
						2020-05-29 11:49:47 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							d2b2fa87c0 
							
						 
					 
					
						
						
							
							fix   #56  
						
						
						
						
							
						
					 
					
						2020-05-29 08:03:37 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							de556fd22d 
							
						 
					 
					
						
						
							
							item3 of  #51  
						
						
						
						
							
						
					 
					
						2020-05-27 11:02:23 +08:00 
						 
				 
			
				
					
						
							
							
								magicly 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6237cc0d52 
							
						 
					 
					
						
						
							
							fix dqn zero eps ( #52 )  
						
						
						
						
						Co-authored-by: liyan <liyan1@digisky.com> 
						
						
							
						
					 
					
						2020-05-21 11:35:41 +08:00 
						 
				 
			
				
					
						
							
							
								Imone 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							57bca16f94 
							
						 
					 
					
						
						
							
							Fix log_prob and PPO dual_clip ( #49 )  
						
						
						
						
						* Added DiagGaussian to fix log_prob
* Disable PPO dual_clip 
						
						
							
						
					 
					
						2020-05-18 16:23:35 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							70122dc03d 
							
						 
					 
					
						
						
							
							orthogonal init with 0 bias  
						
						
						
						
							
						
					 
					
						2020-05-17 17:06:20 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							3271c92609 
							
						 
					 
					
						
						
							
							orthogonal init for ppo in test script  
						
						
						
						
							
						
					 
					
						2020-05-16 20:27:01 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							0eef0ca198 
							
						 
					 
					
						
						
							
							fix optional type syntax  
						
						
						
						
							
						
					 
					
						2020-05-16 20:08:32 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							3243484f8e 
							
						 
					 
					
						
						
							
							show stat in pytest  
						
						
						
						
							
						
					 
					
						2020-05-16 08:48:12 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							9b26137cd2 
							
						 
					 
					
						
						
							
							add type annotation  
						
						
						
						
							
						
					 
					
						2020-05-12 11:31:47 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							075825325e 
							
						 
					 
					
						
						
							
							add preprocess_fn ( #42 )  
						
						
						
						
							
						
					 
					
						2020-05-05 13:39:51 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							04b091d975 
							
						 
					 
					
						
						
							
							fix max-grad-norm err in a2c ( #46 )  
						
						
						
						
							
						
					 
					
						2020-05-04 12:33:04 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							c2a7caf806 
							
						 
					 
					
						
						
							
							add recurrent actor and critic  
						
						
						
						
							
						
					 
					
						2020-04-30 16:31:40 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							134f787e24 
							
						 
					 
					
						
						
							
							reserve 'policy' keyword in replay buffer  
						
						
						
						
							
						
					 
					
						2020-04-29 17:48:48 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							e58fc78546 
							
						 
					 
					
						
						
							
							build docs  
						
						
						
						
							
						
					 
					
						2020-04-29 14:16:38 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							bb2f833d0e 
							
						 
					 
					
						
						
							
							support Batch of Batch and fix bugs ( #38 )  
						
						
						
						
							
						
					 
					
						2020-04-29 12:14:53 +08:00 
						 
				 
			
				
					
						
							
							
								nicoguertler 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f718d9b13 
							
						 
					 
					
						
						
							
							Fix log_prob in SAC ( #41 )  
						
						
						
						
							
						
					 
					
						2020-04-28 23:44:15 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							69e4b3d301 
							
						 
					 
					
						
						
							
							fix setup err on building docs  
						
						
						
						
							
						
					 
					
						2020-04-28 21:11:40 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							80d661907e 
							
						 
					 
					
						
						
							
							Multimodal obs ( #38 ,  #27 ,  #25 )  
						
						
						
						
							
						
					 
					
						2020-04-28 20:56:02 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							959955fa2a 
							
						 
					 
					
						
						
							
							fix historical issues  
						
						
						
						
							
						
					 
					
						2020-04-26 16:13:51 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							6b96f124ae 
							
						 
					 
					
						
						
							
							fix pdqn  
						
						
						
						
							
 
						
					 
					
						2020-04-26 15:11:20 +08:00 
						 
				 
			
				
					
						
							
							
								rocknamx 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b23749463e 
							
						 
					 
					
						
						
							
							Prioritized DQN ( #30 )  
						
						
						
						
						* add sum_tree.py
* add prioritized replay buffer
* del sum_tree.py
* fix some format issues
* fix weight_update bug
* simply replace replaybuffer in test_dqn without weight update
* weight default set to 1
* fix sampling bug when buffer is not full
* rename parameter
* fix formula error, add accuracy check
* add PrioritizedDQN test
* add test_pdqn.py
* add update_weight() doc
* add ref of prio dqn in readme.md and index.rst
* restore test_dqn.py, fix args of test_pdqn.py 
						
						
							
						
					 
					
						2020-04-26 12:05:58 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							70290346ea 
							
						 
					 
					
						
						
							
							compatible with torch==1.5.0 ( fix   #37 )  
						
						
						
						
							
						
					 
					
						2020-04-26 11:04:45 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							8812eaa502 
							
						 
					 
					
						
						
							
							fix   #36  
						
						
						
						
							
						
					 
					
						2020-04-23 22:06:18 +08:00 
						 
				 
			
				
					
						
							
							
								Minghao Zhang 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							205698dd66 
							
						 
					 
					
						
						
							
							fix   #33  ( #34 )  
						
						
						
						
							
						
					 
					
						2020-04-21 15:36:08 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							4fd826761c 
							
						 
					 
					
						
						
							
							enable null buffer in test collector  
						
						
						
						
							
						
					 
					
						2020-04-20 11:50:18 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							815f3522bb 
							
						 
					 
					
						
						
							
							imitation with discrete action space  
						
						
						
						
							
						
					 
					
						2020-04-20 11:25:20 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							6bf1ea644d 
							
						 
					 
					
						
						
							
							fix ppo  
						
						
						
						
							
						
					 
					
						2020-04-19 14:30:42 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							680fc0ffbe 
							
						 
					 
					
						
						
							
							gae  
						
						
						
						
							
						
					 
					
						2020-04-14 21:11:06 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							7b65d43394 
							
						 
					 
					
						
						
							
							vanilla imitation learning  
						
						
						
						
							
						
					 
					
						2020-04-13 19:37:27 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							befdfb07e8 
							
						 
					 
					
						
						
							
							polish docs  
						
						
						
						
							
						
					 
					
						2020-04-11 19:29:46 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							6a244d1fbb 
							
						 
					 
					
						
						
							
							save_fn  
						
						
						
						
							
						
					 
					
						2020-04-11 16:54:27 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							74407e13da 
							
						 
					 
					
						
						
							
							env info log_fn ( #28 )  
						
						
						
						
							
						
					 
					
						2020-04-10 18:02:05 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							ecfcb9f295 
							
						 
					 
					
						
						
							
							fix docs  
						
						
						
						
							
						
					 
					
						2020-04-10 11:16:33 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							3cc22b7c0c 
							
						 
					 
					
						
						
							
							__call__ -> forward  
						
						
						
						
							
						
					 
					
						2020-04-10 10:47:16 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							13086b7f64 
							
						 
					 
					
						
						
							
							add ignore_obs_next in buffer  
						
						
						
						
							
						
					 
					
						2020-04-10 09:01:17 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							19f2cce294 
							
						 
					 
					
						
						
							
							seealso and change policy dir structure  
						
						
						
						
							
						
					 
					
						2020-04-09 21:36:53 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							6da80e045a 
							
						 
					 
					
						
						
							
							fix rnn ( #19 ), add __repr__, and  fix   #26  
						
						
						
						
							
						
					 
					
						2020-04-09 19:53:45 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							86572c66d4 
							
						 
					 
					
						
						
							
							maybe finished rnn?  
						
						
						
						
							
						
					 
					
						2020-04-08 21:13:15 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							d9d2763dad 
							
						 
					 
					
						
						
							
							first version with full documentation  
						
						
						
						
							
 
						
					 
					
						2020-04-07 11:50:34 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							6c8edf6a3a 
							
						 
					 
					
						
						
							
							codecov badge  
						
						
						
						
							
						
					 
					
						2020-04-07 11:17:10 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							e0809ff135 
							
						 
					 
					
						
						
							
							add policy docs ( #21 )  
						
						
						
						
							
						
					 
					
						2020-04-06 19:36:59 +08:00 
						 
				 
			
				
					
						
							
							
								Trinkle23897 
							
						 
					 
					
						
						
						
						
							
						
						
							610390c132 
							
						 
					 
					
						
						
							
							add docs of collector and trainer ( #20 )  
						
						
						
						
							
						
					 
					
						2020-04-05 18:34:45 +08:00 
						 
				 
			
				
					
						
							
							
								Oblivion 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4d4d0daf9e 
							
						 
					 
					
						
						
							
							Performance improve ( #18 )  
						
						
						
						
						* improve performance: set one thread for NN; replace detach() op with torch.no_grad()
* fix PEP 8 errors 
						
						
							
						
					 
					
						2020-04-05 09:10:21 +08:00