Tianshou

Author	SHA1	Message	Date
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
n+e	454c86c469	fix venv seed, add TOC in docs, and split buffer.py into several files (#303 ) Things changed in this PR: - various docs update, add TOC - split buffer into several files - fix venv action_space randomness	2021-03-02 12:28:28 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which features refactor of Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be more cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, bffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different version's numpy will result in different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
ChenDRAG	f0129f4ca7	Add CachedReplayBuffer and ReplayBufferManager (#278 ) This is the second commit of 6 commits mentioned in #274, which features minor refactor of ReplayBuffer and adding two new ReplayBuffer classes called CachedReplayBuffer and ReplayBufferManager. You can check #274 for more detail. 1. Add ReplayBufferManager (handle a list of buffers) and CachedReplayBuffer; 2. Make sure the reserved keys cannot be edited by methods like `buffer.done = xxx`; 3. Add `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle; 4. Add `sample_index` method, same as `sample` but only return index instead of both index and batch data; 5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose done==False); 6. Separate `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in; 7. Move buffer's documentation to `docs/tutorials/concepts.rst`. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-29 12:23:18 +08:00
n+e	c91def6cbc	code format and update function signatures (#213 ) Cherry-pick from #200 - update the function signature - format code-style - move _compile into separate functions - fix a bug in to_torch and to_numpy (Batch) - remove None in action_range In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))	2020-09-12 15:39:01 +08:00
n+e	140b1c2cab	Improve PER (#159 ) - use segment tree to rewrite the previous PrioReplayBuffer code, add the test - enable all Q-learning algorithms to use PER	2020-08-06 10:26:24 +08:00
Trinkle23897	dc451dfe88	nstep all (fix #51 )	2020-06-03 13:59:47 +08:00
Alexis DUBURCQ	8af7196a9a	Robust conversion from/to numpy/pytorch (#63 ) * Enable to convert Batch data back to torch. * Add torch converter to collector. * Fix * Move to_numpy/to_torch convert in dedicated utils.py. * Use to_numpy/to_torch to convert arrays. * fix lint * fix * Add unit test to check Batch from/to numpy. * Fix Batch over Batch. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 20:45:21 +08:00
Trinkle23897	f23b0dfac9	add ListReplayBuffer	2020-03-28 15:14:41 +08:00
Trinkle23897	f16e05c0e7	maybe finished collector?	2020-03-13 17:49:22 +08:00
Trinkle23897	f58c1397c6	half of collector	2020-03-12 22:20:33 +08:00
Trinkle23897	0dfb900e29	env and data	2020-03-11 09:09:56 +08:00

12 Commits