Tianshou

Author	SHA1	Message	Date
youkaichao	3a08e27ed4	Standardized behavior of Batch.cat and misc code refactor (#137 ) * code refactor; remove unused kwargs; add reward_normalization for dqn * bugfix for __setitem__ with torch.Tensor; add Batch.condense * minor fix * support cat with empty Batch * remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases * support stack with empty Batch * remove condense * refactor code to reflect the shared / partial / reserved categories of keys * add is_empty(recursive=False) * doc fix * docfix and bugfix for _is_batch_set * add doc for key reservation * bugfix for algebra operators * fix cat with lens hint * code refactor * bugfix for storing None * use ValueError instead of exception * hide lens away from users * add comment for __cat * move the computation of the initial value of lens in cat_ itself. * change the place of doc string * doc fix for Batch doc string * change recursive to recurse * doc string fix * minor fix for batch doc	2020-07-20 15:54:18 +08:00
danagi	60cfc373f8	fix #98 , support #99 (#102 ) * Add auto alpha tuning and exploration noise for sac. Add class BaseNoise and GaussianNoise for the concept of exploration noise. Add new test for sac tested in MountainCarContinuous-v0, which should benefits from the two above new feature. * add exploration noise to collector, fix example to adapt modification * fix #98 * enable off-policy to update multiple times in one step. (#99)	2020-06-27 21:40:09 +08:00
Trinkle23897	0eef0ca198	fix optional type syntax	2020-05-16 20:08:32 +08:00
Trinkle23897	9b26137cd2	add type annotation	2020-05-12 11:31:47 +08:00
Trinkle23897	075825325e	add preprocess_fn (#42 )	2020-05-05 13:39:51 +08:00
Trinkle23897	7b65d43394	vanilla imitation learning	2020-04-13 19:37:27 +08:00
Trinkle23897	6a244d1fbb	save_fn	2020-04-11 16:54:27 +08:00
Trinkle23897	74407e13da	env info log_fn (#28 )	2020-04-10 18:02:05 +08:00
Trinkle23897	86572c66d4	maybe finished rnn?	2020-04-08 21:13:15 +08:00
Trinkle23897	e0809ff135	add policy docs (#21 )	2020-04-06 19:36:59 +08:00
Trinkle23897	610390c132	add docs of collector and trainer (#20 )	2020-04-05 18:34:45 +08:00
Oblivion	9380368ca3	add an example of bullet env (experiment from jiqizhixin) (#15 ) * add_pybullet_ens_test test on pybullet envs modify some log config * delete DS_Store file * add pybullet_envs test add HalfCheetahBulletEnv-v0 test modify log config * fix pep 8 errors * add pybullet to dev * delete a line * by pass F401 * add log_interval to onpolicy_trainer * add comments * Update halfcheetahBullet_v0_sac.py	2020-04-04 11:46:18 +08:00
Trinkle23897	c42990c725	add rllib result and fix pep8	2020-03-28 09:43:35 +08:00
Minghao Zhang	77068af526	add examples, fix some bugs (#5 ) * update atari.py * fix setup.py pass the pytest * fix setup.py pass the pytest * add args "render" * change the tensorboard writter * change the tensorboard writter * change device, render, tensorboard log location * change device, render, tensorboard log location * remove some wrong local files * fix some tab mistakes and the envs name in continuous/test_xx.py * add examples and point robot maze environment * fix some bugs during testing examples * add dqn network and fix some args * change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally * add a warning to collector * rm some unrelated files * reformat * fix a bug in test_dqn due to the model wrong selection	2020-03-28 07:27:18 +08:00
Trinkle23897	fdc969b830	fix collector	2020-03-25 14:08:28 +08:00
Trinkle23897	75364cd986	ppo and early stop	2020-03-20 19:52:29 +08:00

16 Commits