16 Commits

Author SHA1 Message Date
youkaichao
3a08e27ed4 Standardized behavior of Batch.cat and misc code refactor (#137)
* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc
2020-07-20 15:54:18 +08:00
danagi
60cfc373f8
fix #98, support #99 (#102)
* Add auto alpha tuning and exploration noise for sac.
Add class BaseNoise and GaussianNoise for the concept of exploration noise.
Add new test for sac tested in MountainCarContinuous-v0,
which should benefits from the two above new feature.

* add exploration noise to collector, fix example to adapt modification

* fix #98

* enable off-policy to update multiple times in one step. (#99)
2020-06-27 21:40:09 +08:00
Trinkle23897
0eef0ca198 fix optional type syntax 2020-05-16 20:08:32 +08:00
Trinkle23897
9b26137cd2 add type annotation 2020-05-12 11:31:47 +08:00
Trinkle23897
075825325e add preprocess_fn (#42) 2020-05-05 13:39:51 +08:00
Trinkle23897
7b65d43394 vanilla imitation learning 2020-04-13 19:37:27 +08:00
Trinkle23897
6a244d1fbb save_fn 2020-04-11 16:54:27 +08:00
Trinkle23897
74407e13da env info log_fn (#28) 2020-04-10 18:02:05 +08:00
Trinkle23897
86572c66d4 maybe finished rnn? 2020-04-08 21:13:15 +08:00
Trinkle23897
e0809ff135 add policy docs (#21) 2020-04-06 19:36:59 +08:00
Trinkle23897
610390c132 add docs of collector and trainer (#20) 2020-04-05 18:34:45 +08:00
Oblivion
9380368ca3
add an example of bullet env (experiment from jiqizhixin) (#15)
* add_pybullet_ens_test

test on pybullet envs
modify some log config

* delete DS_Store file

* add pybullet_envs test

add HalfCheetahBulletEnv-v0 test
modify log config

* fix pep 8 errors

* add pybullet to dev

* delete a line

* by pass F401

* add log_interval to onpolicy_trainer

* add comments

* Update halfcheetahBullet_v0_sac.py
2020-04-04 11:46:18 +08:00
Trinkle23897
c42990c725 add rllib result and fix pep8 2020-03-28 09:43:35 +08:00
Minghao Zhang
77068af526
add examples, fix some bugs (#5)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest

* add args "render"

* change the tensorboard writter

* change the tensorboard writter

* change device, render, tensorboard log location

* change device, render, tensorboard log location

* remove some wrong local files

* fix some tab mistakes and the envs name in continuous/test_xx.py

* add examples and point robot maze environment

* fix some bugs during testing examples

* add dqn network and fix some args

* change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally

* add a warning to collector

* rm some unrelated files

* reformat

* fix a bug in test_dqn due to the model wrong selection
2020-03-28 07:27:18 +08:00
Trinkle23897
fdc969b830 fix collector 2020-03-25 14:08:28 +08:00
Trinkle23897
75364cd986 ppo and early stop 2020-03-20 19:52:29 +08:00