Alexis DUBURCQ
1fce527c77
Fix 'to_tensor' dtype/device forwarding for Batch over Batch. ( #68 )
* Fix Batch to_torch method not updating dtype/device of already converted data.
* Fix dtype/device not being forwarded by to_tensor for Batch over Batch.
* Add Unit test to check to_torch dtype/device recursive forwarding.
* Batch UT: check accessing data using both dict style and class (attribute) style.
* Fix utils to_tensor dtype/device forwarding. Add Unit tests.
* Fix UT.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Co-authored-by: n+e <463003665@qq.com>
2020-05-30 21:40:31 +08:00
Alexis DUBURCQ
529a4cf44c
Add pickle support for Batch. Fix VectorEnv. ( #67 )
* Fix vecenv.
* Add pickle support for Batch class.
* Add Batch pickle Unit Test.
* Fix lint.
* Swap Batch UT.
* Fix lint.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-30 21:29:33 +08:00
Alexis DUBURCQ
dd3e2130bb
Infer the right dtype for replay buffers. ( #64 )
2020-05-29 22:27:03 +08:00
Alexis DUBURCQ
8af7196a9a
Robust conversion from/to numpy/pytorch ( #63 )
* Enable to convert Batch data back to torch.
* Add torch converter to collector.
* Fix
* Move to_numpy/to_torch convert in dedicated utils.py.
* Use to_numpy/to_torch to convert arrays.
* fix lint
* fix
* Add unit test to check Batch from/to numpy.
* Fix Batch over Batch.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 20:45:21 +08:00
Alexis DUBURCQ
b5093ecb56
Minor refactor for Batch class. ( #61 )
* Minor refactor for Batch class.
* Fix.
* Add back key sorting.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 17:56:46 +08:00
Trinkle23897
be9ce44290
fix #59
2020-05-29 11:49:47 +08:00
Trinkle23897
d2b2fa87c0
fix #56
2020-05-29 08:03:37 +08:00
Trinkle23897
de556fd22d
item3 of #51
2020-05-27 11:02:23 +08:00
magicly
6237cc0d52
fix dqn zero eps ( #52 )
Co-authored-by: liyan <liyan1@digisky.com>
2020-05-21 11:35:41 +08:00
Imone
57bca16f94
Fix log_prob and PPO dual_clip ( #49 )
* Add DiagGaussian to fix log_prob
* Disable PPO dual_clip
2020-05-18 16:23:35 +08:00
Trinkle23897
70122dc03d
orthogonal init with 0 bias
2020-05-17 17:06:20 +08:00
Trinkle23897
3271c92609
orthogonal init for ppo in test script
2020-05-16 20:27:01 +08:00
Trinkle23897
0eef0ca198
fix optional type syntax
2020-05-16 20:08:32 +08:00
Trinkle23897
3243484f8e
show stat in pytest
2020-05-16 08:48:12 +08:00
Trinkle23897
9b26137cd2
add type annotation
2020-05-12 11:31:47 +08:00
Trinkle23897
075825325e
add preprocess_fn ( #42 )
2020-05-05 13:39:51 +08:00
Trinkle23897
04b091d975
fix max-grad-norm err in a2c ( #46 )
2020-05-04 12:33:04 +08:00
Trinkle23897
c2a7caf806
add recurrent actor and critic
2020-04-30 16:31:40 +08:00
Trinkle23897
134f787e24
reserve 'policy' keyword in replay buffer
2020-04-29 17:48:48 +08:00
Trinkle23897
e58fc78546
build docs
2020-04-29 14:16:38 +08:00
Trinkle23897
bb2f833d0e
support Batch of Batch and fix bugs ( #38 )
2020-04-29 12:14:53 +08:00
nicoguertler
8f718d9b13
Fix log_prob in SAC ( #41 )
2020-04-28 23:44:15 +08:00
Trinkle23897
69e4b3d301
fix setup err on building docs
2020-04-28 21:11:40 +08:00
Trinkle23897
80d661907e
Multimodal obs ( #38 , #27 , #25 )
2020-04-28 20:56:02 +08:00
Trinkle23897
959955fa2a
fix historical issues
2020-04-26 16:13:51 +08:00
Trinkle23897
6b96f124ae
fix pdqn
v0.2.2
2020-04-26 15:11:20 +08:00
rocknamx
b23749463e
Prioritized DQN ( #30 )
* add sum_tree.py
* add prioritized replay buffer
* del sum_tree.py
* fix some format issues
* fix weight_update bug
* simply replace replaybuffer in test_dqn without weight update
* weight default set to 1
* fix sampling bug when buffer is not full
* rename parameter
* fix formula error, add accuracy check
* add PrioritizedDQN test
* add test_pdqn.py
* add update_weight() doc
* add ref of prio dqn in readme.md and index.rst
* restore test_dqn.py, fix args of test_pdqn.py
2020-04-26 12:05:58 +08:00
Trinkle23897
70290346ea
compatible with torch==1.5.0 ( fix #37 )
2020-04-26 11:04:45 +08:00
Trinkle23897
8812eaa502
fix #36
2020-04-23 22:06:18 +08:00
Minghao Zhang
205698dd66
fix #33 ( #34 )
2020-04-21 15:36:08 +08:00
Trinkle23897
4fd826761c
enable null buffer in test collector
2020-04-20 11:50:18 +08:00
Trinkle23897
815f3522bb
imitation with discrete action space
2020-04-20 11:25:20 +08:00
Trinkle23897
6bf1ea644d
fix ppo
2020-04-19 14:30:42 +08:00
Trinkle23897
680fc0ffbe
gae
2020-04-14 21:11:06 +08:00
Trinkle23897
7b65d43394
vanilla imitation learning
2020-04-13 19:37:27 +08:00
Trinkle23897
befdfb07e8
polish docs
2020-04-11 19:29:46 +08:00
Trinkle23897
6a244d1fbb
save_fn
2020-04-11 16:54:27 +08:00
Trinkle23897
74407e13da
env info log_fn ( #28 )
2020-04-10 18:02:05 +08:00
Trinkle23897
ecfcb9f295
fix docs
2020-04-10 11:16:33 +08:00
Trinkle23897
3cc22b7c0c
__call__ -> forward
2020-04-10 10:47:16 +08:00
Trinkle23897
13086b7f64
add ignore_obs_next in buffer
2020-04-10 09:01:17 +08:00
Trinkle23897
19f2cce294
seealso and change policy dir structure
2020-04-09 21:36:53 +08:00
Trinkle23897
6da80e045a
fix rnn ( #19 ), add __repr__, and fix #26
2020-04-09 19:53:45 +08:00
Trinkle23897
86572c66d4
maybe finished rnn?
2020-04-08 21:13:15 +08:00
Trinkle23897
d9d2763dad
first version with full documentation
v0.2.1
2020-04-07 11:50:34 +08:00
Trinkle23897
6c8edf6a3a
codecov badge
2020-04-07 11:17:10 +08:00
Trinkle23897
e0809ff135
add policy docs ( #21 )
2020-04-06 19:36:59 +08:00
Trinkle23897
610390c132
add docs of collector and trainer ( #20 )
2020-04-05 18:34:45 +08:00
Oblivion
4d4d0daf9e
Performance improvement ( #18 )
* improve performance:
  set one thread for NN
  replace detach() op with torch.no_grad()
* fix PEP 8 errors
2020-04-05 09:10:21 +08:00
Trinkle23897
b6c9db6b0b
docs for env
2020-04-04 21:02:06 +08:00