176 Commits

Author SHA1 Message Date
Trinkle23897
a655334d00 change batch.append to batch.cat 2020-06-20 22:23:12 +08:00
Trinkle23897
aff0f9aee0 fix append batch over batch 2020-06-20 22:03:22 +08:00
Trinkle23897
81e4a16ef2 fix a bug in re-index replay buffer (fix #82) 2020-06-17 16:37:51 +08:00
danagi
c59ad40aef
Add auto alpha tuning and exploration noise for sac. (#80)
Add class BaseNoise and GaussianNoise for the concept of exploration noise.
Add new test for sac tested in MountainCarContinuous-v0,
which should benefit from the two new features above.
2020-06-16 22:17:28 +08:00
Trinkle23897
3774258cc7 fix unittest 2020-06-11 09:07:45 +08:00
Trinkle23897
1a914336f7 add random action in collector (fix #78) 2020-06-11 08:57:37 +08:00
Trinkle23897
f1951780ab fix a bug of storing batch over batch data into buffer 2020-06-09 18:46:14 +08:00
Trinkle23897
560116d0b2 cheat sheet 2020-06-08 21:53:00 +08:00
Alexis DUBURCQ
52be533d06
Enable getattr for SubprocVecEnv. (#74)
* Enable getattr for SubprocVecEnv.

* Consistent API between VectorEnv and SubprocVecEnv.

* Avoid code duplication. Add unit tests.

* Add docstring.

* Test more branches.

* Fix UT.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-05 17:17:43 +08:00
Trinkle23897
dc451dfe88 nstep all (fix #51) 2020-06-03 13:59:47 +08:00
Trinkle23897
ff81a18f42 compute_nstep_returns (item 2 of #51) 2020-06-02 22:29:50 +08:00
Trinkle23897
ba1b3e54eb fix #69 2020-06-01 08:30:09 +08:00
Alexis DUBURCQ
1fce527c77
Fix 'to_tensor' dtype/device forwarding for Batch over Batch. (#68)
* Fix Batch to_torch method not updating dtype/device of already converted data.

* Fix dtype/device forwarding by to_tensor for Batch over Batch.

* Add Unit test to check to_torch dtype/device recursive forwarding.

* Batch UT check accessing data using both dict and class style.

* Fix utils to_tensor dtype/device forwarding. Add Unit tests.

* Fix UT.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Co-authored-by: n+e <463003665@qq.com>
2020-05-30 21:40:31 +08:00
Alexis DUBURCQ
529a4cf44c
Add pickle support for Batch. Fix VectorEnv. (#67)
* Fix vecenv.

* Add pickle support for Batch class.

* Add Batch pickle Unit Test.

* Fix lint.

* Swap Batch UT.

* Fix lint.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-30 21:29:33 +08:00
Alexis DUBURCQ
8af7196a9a
Robust conversion from/to numpy/pytorch (#63)
* Enable to convert Batch data back to torch.

* Add torch converter to collector.

* Fix

* Move to_numpy/to_torch convert in dedicated utils.py.

* Use to_numpy/to_torch to convert arrays.

* fix lint

* fix

* Add unit test to check Batch from/to numpy.

* Fix Batch over Batch.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 20:45:21 +08:00
Trinkle23897
d2b2fa87c0 fix #56 2020-05-29 08:03:37 +08:00
Trinkle23897
de556fd22d item3 of #51 2020-05-27 11:02:23 +08:00
Imone
57bca16f94
Fix log_prob and PPO dual_clip (#49)
* Added DiagGaussian to fix log_prob

* Disable PPO dual_clip
2020-05-18 16:23:35 +08:00
Trinkle23897
70122dc03d orthogonal init with 0 bias 2020-05-17 17:06:20 +08:00
Trinkle23897
3271c92609 orthogonal init for ppo in test script 2020-05-16 20:27:01 +08:00
Trinkle23897
0eef0ca198 fix optional type syntax 2020-05-16 20:08:32 +08:00
Trinkle23897
075825325e add preprocess_fn (#42) 2020-05-05 13:39:51 +08:00
Trinkle23897
c2a7caf806 add recurrent actor and critic 2020-04-30 16:31:40 +08:00
Trinkle23897
bb2f833d0e support Batch of Batch and fix bugs (#38) 2020-04-29 12:14:53 +08:00
Trinkle23897
80d661907e Multimodal obs (#38, #27, #25) 2020-04-28 20:56:02 +08:00
Trinkle23897
959955fa2a fix historical issues 2020-04-26 16:13:51 +08:00
Trinkle23897
6b96f124ae fix pdqn 2020-04-26 15:11:20 +08:00
rocknamx
b23749463e
Prioritized DQN (#30)
* add sum_tree.py

* add prioritized replay buffer

* del sum_tree.py

* fix some format issues

* fix weight_update bug

* simply replace replaybuffer in test_dqn without weight update

* weight default set to 1

* fix sampling bug when buffer is not full

* rename parameter

* fix formula error, add accuracy check

* add PrioritizedDQN test

* add test_pdqn.py

* add update_weight() doc

* add ref of prio dqn in readme.md and index.rst

* restore test_dqn.py, fix args of test_pdqn.py
2020-04-26 12:05:58 +08:00
Trinkle23897
815f3522bb imitation with discrete action space 2020-04-20 11:25:20 +08:00
Trinkle23897
6bf1ea644d fix ppo 2020-04-19 14:30:42 +08:00
Trinkle23897
680fc0ffbe gae 2020-04-14 21:11:06 +08:00
Trinkle23897
7b65d43394 vanilla imitation learning 2020-04-13 19:37:27 +08:00
Trinkle23897
6a244d1fbb save_fn 2020-04-11 16:54:27 +08:00
Trinkle23897
74407e13da env info log_fn (#28) 2020-04-10 18:02:05 +08:00
Trinkle23897
3cc22b7c0c __call__ -> forward 2020-04-10 10:47:16 +08:00
Trinkle23897
13086b7f64 add ignore_obs_next in buffer 2020-04-10 09:01:17 +08:00
Trinkle23897
6da80e045a fix rnn (#19), add __repr__, and fix #26 2020-04-09 19:53:45 +08:00
Trinkle23897
86572c66d4 maybe finished rnn? 2020-04-08 21:13:15 +08:00
Trinkle23897
e0809ff135 add policy docs (#21) 2020-04-06 19:36:59 +08:00
Oblivion
4d4d0daf9e
Performance improve (#18)
* improve performance

set one thread for NN
replace detach() op with torch.no_grad()

* fix pep 8 errors
2020-04-05 09:10:21 +08:00
Trinkle23897
b6c9db6b0b docs for env 2020-04-04 21:02:06 +08:00
Trinkle23897
974ade8019 add some docs 2020-04-03 21:28:12 +08:00
ShenDezhou
4da857d86e
Fix Windows env setup bugs and other typos. (#11) 2020-03-31 17:22:32 +08:00
Trinkle23897
d9e4b9d16f upd doc 2020-03-29 10:22:03 +08:00
Trinkle23897
f68f23292e update readme and force flake8 2020-03-28 13:27:01 +08:00
Minghao Zhang
068c4068ec
fix atari/mujoco env (#7)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest

* add args "render"

* change the tensorboard writer

* change the tensorboard writer

* change device, render, tensorboard log location

* change device, render, tensorboard log location

* remove some wrong local files

* fix some tab mistakes and the envs name in continuous/test_xx.py

* add examples and point robot maze environment

* fix some bugs during testing examples

* add dqn network and fix some args

* change back the tensorboard writer's frequency to ensure ppo and a2c can write things normally

* add a warning to collector

* rm some unrelated files

* reformat

* fix a bug in test_dqn caused by selecting the wrong model

* change atari frame skip and observation to improve performance

* readd some files

* change import

* modified readme

* rm tensorboard log

* update atari and mujoco which are ignored

* rm the wrong lines
2020-03-28 12:03:49 +08:00
Trinkle23897
c42990c725 add rllib result and fix pep8 2020-03-28 09:43:35 +08:00
Minghao Zhang
77068af526
add examples, fix some bugs (#5)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest

* add args "render"

* change the tensorboard writer

* change the tensorboard writer

* change device, render, tensorboard log location

* change device, render, tensorboard log location

* remove some wrong local files

* fix some tab mistakes and the envs name in continuous/test_xx.py

* add examples and point robot maze environment

* fix some bugs during testing examples

* add dqn network and fix some args

* change back the tensorboard writer's frequency to ensure ppo and a2c can write things normally

* add a warning to collector

* rm some unrelated files

* reformat

* fix a bug in test_dqn caused by selecting the wrong model
2020-03-28 07:27:18 +08:00
Trinkle23897
44f911bc31 add pytorch drl result 2020-03-27 09:04:29 +08:00
Trinkle23897
519f9f20d0 update readme 2020-03-26 17:32:51 +08:00