29 Commits

Author SHA1 Message Date
youkaichao
e767de044b
Remove dummy net code (#123)
* remove dummy net; delete two files

* split code to have backbone and head

* rename class

* change torch.float to torch.float32

* use flatten(1) instead of view(batch, -1)

* remove dummy net in docs

* bugfix for rnn

* fix cuda error

* minor fix of docs

* do not change the example code in dqn tutorial, since it is for demonstration

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-09 22:57:01 +08:00
danagi
13828f6309
added noise param to collector for test phase, fixed examples to adapt modification (#86)
* Add auto alpha tuning and exploration noise for sac.
Add class BaseNoise and GaussianNoise for the concept of exploration noise.
Add new test for sac tested in MountainCarContinuous-v0,
which should benefits from the two above new feature.

* add exploration noise to collector, fix example to adapt modification
2020-06-23 07:20:51 +08:00
Trinkle23897
e8b44bbaf4 move sac_mcc to examples (runtime too long) 2020-06-22 21:39:00 +08:00
danagi
c59ad40aef
Add auto alpha tuning and exploration noise for sac. (#80)
Add class BaseNoise and GaussianNoise for the concept of exploration noise.
Add new test for sac tested in MountainCarContinuous-v0,
which should benefits from the two above new feature.
2020-06-16 22:17:28 +08:00
Trinkle23897
dc451dfe88 nstep all (fix #51) 2020-06-03 13:59:47 +08:00
Trinkle23897
ff81a18f42 compute_nstep_returns (item 2 of #51) 2020-06-02 22:29:50 +08:00
Trinkle23897
de556fd22d item3 of #51 2020-05-27 11:02:23 +08:00
Imone
57bca16f94
Fix log_prob and PPO dual_clip (#49)
* Added DiagGaussian to fix log_probg

* Disable PPO dual_clip
2020-05-18 16:23:35 +08:00
Trinkle23897
70122dc03d oinit with 0 bias 2020-05-17 17:06:20 +08:00
Trinkle23897
3271c92609 orthogonal init for ppo in test script 2020-05-16 20:27:01 +08:00
Trinkle23897
c2a7caf806 add recurrent actor and critic 2020-04-30 16:31:40 +08:00
Trinkle23897
815f3522bb imitation with discrete action space 2020-04-20 11:25:20 +08:00
Trinkle23897
6bf1ea644d fix ppo 2020-04-19 14:30:42 +08:00
Trinkle23897
680fc0ffbe gae 2020-04-14 21:11:06 +08:00
Trinkle23897
7b65d43394 vanilla imitation learning 2020-04-13 19:37:27 +08:00
Trinkle23897
6a244d1fbb save_fn 2020-04-11 16:54:27 +08:00
Oblivion
4d4d0daf9e
Performance improve (#18)
* improve performance

set one thread for NN
replace detach() op with torch.no_grad()

* fix pep 8 errors
2020-04-05 09:10:21 +08:00
Trinkle23897
974ade8019 add some docs 2020-04-03 21:28:12 +08:00
Trinkle23897
c42990c725 add rllib result and fix pep8 2020-03-28 09:43:35 +08:00
Minghao Zhang
77068af526
add examples, fix some bugs (#5)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest

* add args "render"

* change the tensorboard writter

* change the tensorboard writter

* change device, render, tensorboard log location

* change device, render, tensorboard log location

* remove some wrong local files

* fix some tab mistakes and the envs name in continuous/test_xx.py

* add examples and point robot maze environment

* fix some bugs during testing examples

* add dqn network and fix some args

* change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally

* add a warning to collector

* rm some unrelated files

* reformat

* fix a bug in test_dqn due to the model wrong selection
2020-03-28 07:27:18 +08:00
Trinkle23897
44f911bc31 add pytorch drl result 2020-03-27 09:04:29 +08:00
Trinkle23897
519f9f20d0 update readme 2020-03-26 17:32:51 +08:00
Trinkle23897
c505cd8205 update readme 2020-03-26 11:42:34 +08:00
Trinkle23897
fdc969b830 fix collector 2020-03-25 14:08:28 +08:00
Trinkle23897
e95218e295 sac 2020-03-23 17:17:41 +08:00
Trinkle23897
30a0fc079c td3 2020-03-23 11:34:52 +08:00
Trinkle23897
a87563b8e6 add demo of ppo continuous action task 2020-03-21 17:04:42 +08:00
Trinkle23897
c173f7bfbc fix ddpg 2020-03-21 15:31:31 +08:00
Trinkle23897
8bd8246b16 refract test code 2020-03-21 10:58:01 +08:00