Tianshou

Author	SHA1	Message	Date
Alexis DUBURCQ	3086b5c31d	Buffer refactoring to support batch over batch reliably (#93) * Fix support of batch over batch for Buffer. * Do not use internal __dict__ attribute to store batch data since it breaks inheritance. * Various fixes. * Improve robustness of Batch/Buffer by avoiding direct attribute assignment. Buffer refactoring. * Add axis optional argument to Batch stack method. * Add item assignment to Batch class. * Fix list support for Buffer. * Convert list to np.array by default for efficiency. * Add missing unit test for Batch. Fix unit tests. * Batch item assignment is now robust to key order. * Do not use getattr/setattr explicitly for simplicity. * More flexible __setitem__. * Fixes * Remove broadcasting at Batch level since it is unreliable. * Forbid item assignment for inconsistent batches. * Implement broadcasting at Buffer level. * Add more unit test for Batch item assignment. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-25 20:39:30 +08:00
rocknamx	506cc97ba5	fix #91 (#94)	2020-06-25 07:02:59 +08:00
Alexis DUBURCQ	49f43e9f1f	Fix Batch to numpy compatibility (#92) * Fix Batch to numpy compatibility. * Fix Batch unit tests. * Fix linter * Add Batch shape method. * Remove shape and add size. Enable to reserve keys using empty batch/list. * Fix linter and unit tests. * Batch init using list of Batch. * Add unit tests. * Fix Batch __len__. * Fix unit tests. * Fix slicing * Add missing slicing unit tests. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-24 21:43:48 +08:00
Alexis DUBURCQ	ebc551a25e	Fix support of 0-dim numpy array (#89) * Fix support of 0-dim numpy array. * Do not raise exception if Batch index does not make sense since it breaks existing code. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-24 06:55:24 +08:00
Alexis DUBURCQ	d7dd3105bc	Fix tuple support. (#88)	2020-06-23 23:37:26 +08:00
Alexis DUBURCQ	ec270759ab	Batch refactoring (#87) * Enable to stack Batch instances. Add Batch cat static method. Rename cat in cat_ since inplace. * Properly handle Batch init using np.array of dict. * WIP * Get rid of metadata. * Update UT. Replace cat by cat_ everywhere. * Do not sort Batch keys anymore for efficiency. Add items method. * Fix cat copy issue. * Add unit test to check cat and stack methods. * Remove used import. * Fix linter issues. * Fix unit tests. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-23 22:50:59 +08:00
danagi	13828f6309	added noise param to collector for test phase, fixed examples to adapt modification (#86) * Add auto alpha tuning and exploration noise for sac. Add class BaseNoise and GaussianNoise for the concept of exploration noise. Add new test for sac tested in MountainCarContinuous-v0, which should benefit from the two new features above. * add exploration noise to collector, fix example to adapt modification	2020-06-23 07:20:51 +08:00
Trinkle23897	a655334d00	change batch.append to batch.cat	2020-06-20 22:23:12 +08:00
Trinkle23897	aff0f9aee0	fix append batch over batch	2020-06-20 22:03:22 +08:00
Trinkle23897	81e4a16ef2	fix a bug in re-index replay buffer (fix #82 )	2020-06-17 16:37:51 +08:00
Trinkle23897	1a914336f7	add random action in collector (fix #78 )	2020-06-11 08:57:37 +08:00
Trinkle23897	f1951780ab	fix a bug of storing batch over batch data into buffer	2020-06-09 18:46:14 +08:00
Trinkle23897	560116d0b2	cheat sheet	2020-06-08 21:53:00 +08:00
Alexis DUBURCQ	66be5641b6	Fix to_numpy. (#73) Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-04 22:32:05 +08:00
Trinkle23897	dc451dfe88	nstep all (fix #51 )	2020-06-03 13:59:47 +08:00
Trinkle23897	ba1b3e54eb	fix #69	2020-06-01 08:30:09 +08:00
Alexis DUBURCQ	1fce527c77	Fix 'to_tensor' dtype/device forwarding for Batch over Batch. (#68) * Fix Batch to_torch method not updating dtype/device of already converted data. * Fix dtype/device to forwarded by to_tensor for Batch over Batch. * Add Unit test to check to_torch dtype/device recursive forwarding. * Batch UT check accessing data using both dict and class style. * Fix utils to_tensor dtype/device forwarding. Add Unit tests. * Fix UT. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu> Co-authored-by: n+e <463003665@qq.com>	2020-05-30 21:40:31 +08:00
Alexis DUBURCQ	529a4cf44c	Add pickle support for Batch. Fix VectorEnv. (#67) * Fix vecenv. * Add pickle support for Batch class. * Add Batch pickle Unit Test. * Fix lint. * Swap Batch UT. * Fix lint. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-30 21:29:33 +08:00
Alexis DUBURCQ	dd3e2130bb	Infer the right dtype for replay buffers. (#64)	2020-05-29 22:27:03 +08:00
Alexis DUBURCQ	8af7196a9a	Robust conversion from/to numpy/pytorch (#63) * Enable to convert Batch data back to torch. * Add torch converter to collector. * Fix * Move to_numpy/to_torch convert in dedicated utils.py. * Use to_numpy/to_torch to convert arrays. * fix lint * fix * Add unit test to check Batch from/to numpy. * Fix Batch over Batch. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 20:45:21 +08:00
Alexis DUBURCQ	b5093ecb56	Minor refactor for Batch class. (#61) * Minor refactor for Batch class. * Fix. * Add back key sorting. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 17:56:46 +08:00
Trinkle23897	be9ce44290	fix #59	2020-05-29 11:49:47 +08:00
Trinkle23897	d2b2fa87c0	fix #56	2020-05-29 08:03:37 +08:00
Trinkle23897	de556fd22d	item3 of #51	2020-05-27 11:02:23 +08:00
Trinkle23897	0eef0ca198	fix optional type syntax	2020-05-16 20:08:32 +08:00
Trinkle23897	9b26137cd2	add type annotation	2020-05-12 11:31:47 +08:00
Trinkle23897	075825325e	add preprocess_fn (#42)	2020-05-05 13:39:51 +08:00
Trinkle23897	c2a7caf806	add recurrent actor and critic	2020-04-30 16:31:40 +08:00
Trinkle23897	134f787e24	reserve 'policy' keyword in replay buffer	2020-04-29 17:48:48 +08:00
Trinkle23897	bb2f833d0e	support Batch of Batch and fix bugs (#38)	2020-04-29 12:14:53 +08:00
Trinkle23897	80d661907e	Multimodal obs (#38, #27, #25)	2020-04-28 20:56:02 +08:00
rocknamx	b23749463e	Prioritized DQN (#30) * add sum_tree.py * add prioritized replay buffer * del sum_tree.py * fix some format issues * fix weight_update bug * simply replace replaybuffer in test_dqn without weight update * weight default set to 1 * fix sampling bug when buffer is not full * rename parameter * fix formula error, add accuracy check * add PrioritizedDQN test * add test_pdqn.py * add update_weight() doc * add ref of prio dqn in readme.md and index.rst * restore test_dqn.py, fix args of test_pdqn.py	2020-04-26 12:05:58 +08:00
Trinkle23897	4fd826761c	enable null buffer in test collector	2020-04-20 11:50:18 +08:00
Trinkle23897	6bf1ea644d	fix ppo	2020-04-19 14:30:42 +08:00
Trinkle23897	7b65d43394	vanilla imitation learning	2020-04-13 19:37:27 +08:00
Trinkle23897	6a244d1fbb	save_fn	2020-04-11 16:54:27 +08:00
Trinkle23897	74407e13da	env info log_fn (#28)	2020-04-10 18:02:05 +08:00
Trinkle23897	13086b7f64	add ignore_obs_next in buffer	2020-04-10 09:01:17 +08:00
Trinkle23897	19f2cce294	seealso and change policy dir structure	2020-04-09 21:36:53 +08:00
Trinkle23897	6da80e045a	fix rnn (#19), add __repr__, and fix #26	2020-04-09 19:53:45 +08:00
Trinkle23897	86572c66d4	maybe finished rnn?	2020-04-08 21:13:15 +08:00
Trinkle23897	e0809ff135	add policy docs (#21)	2020-04-06 19:36:59 +08:00
Trinkle23897	610390c132	add docs of collector and trainer (#20)	2020-04-05 18:34:45 +08:00
Oblivion	4d4d0daf9e	Performance improve (#18) * improve performance set one thread for NN replace detach() op with torch.no_grad() * fix pep 8 errors	2020-04-05 09:10:21 +08:00
Trinkle23897	b6c9db6b0b	docs for env	2020-04-04 21:02:06 +08:00
Trinkle23897	974ade8019	add some docs	2020-04-03 21:28:12 +08:00
Trinkle23897	04208e6cce	update some tutorial	2020-03-30 22:52:25 +08:00
Trinkle23897	f23b0dfac9	add ListReplayBuffer	2020-03-28 15:14:41 +08:00
Trinkle23897	c42990c725	add rllib result and fix pep8	2020-03-28 09:43:35 +08:00
Minghao Zhang	77068af526	add examples, fix some bugs (#5) * update atari.py * fix setup.py pass the pytest * fix setup.py pass the pytest * add args "render" * change the tensorboard writer * change the tensorboard writer * change device, render, tensorboard log location * change device, render, tensorboard log location * remove some wrong local files * fix some tab mistakes and the envs name in continuous/test_xx.py * add examples and point robot maze environment * fix some bugs during testing examples * add dqn network and fix some args * change back the tensorboard writer's frequency to ensure ppo and a2c can write things normally * add a warning to collector * rm some unrelated files * reformat * fix a bug in test_dqn due to the model wrong selection	2020-03-28 07:27:18 +08:00

169 Commits