Tianshou

Author	SHA1	Message	Date
ChenDRAG	9b61bc620c	add logger (#295 ) This PR focuses on refactoring the logging method to fix the NaN-reward and log-interval bugs. After these two PRs, the fundamental changes to tianshou/data are hopefully finished, and we can finally concentrate on building Tianshou benchmarks. Things changed: 1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer; 2. remove utils.SummaryWriter;	2021-02-24 14:48:42 +08:00
Trinkle23897	e99e1b0fdd	Improve buffer.prev() & buffer.next() (#294 )	2021-02-22 19:19:22 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which features a refactor of Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different versions of numpy produce different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
n+e	c838f2f0e9	fix 2 bugs of batch (#284 ) 1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)` before: ```python TypeError: cannot concatenate with Batch() which is scalar ``` after: ```python Batch( a: Batch(), b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), ) ``` 2. creating keys in a batch's subkey, e.g. ```python a = Batch(info={"key1": [0, 1], "key2": [2, 3]}) a[0] = Batch(info={"key1": 2, "key3": 4}) print(a) ``` before: ```python Batch( info: Batch( key1: array([0, 1]), key2: array([0, 3]), ), ) ``` after: ```python ValueError: Creating keys is not supported by item assignment. ``` 3. small optimization for `Batch.stack_` and `Batch.cat_`, raise ValueError when receiving invalid data format.	2021-02-02 19:28:05 +08:00
ChenDRAG	f0129f4ca7	Add CachedReplayBuffer and ReplayBufferManager (#278 ) This is the second commit of 6 commits mentioned in #274, which features minor refactor of ReplayBuffer and adding two new ReplayBuffer classes called CachedReplayBuffer and ReplayBufferManager. You can check #274 for more detail. 1. Add ReplayBufferManager (handle a list of buffers) and CachedReplayBuffer; 2. Make sure the reserved keys cannot be edited by methods like `buffer.done = xxx`; 3. Add `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle; 4. Add `sample_index` method, same as `sample` but only return index instead of both index and batch data; 5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose done==False); 6. Separate `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in; 7. Move buffer's documentation to `docs/tutorials/concepts.rst`. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-29 12:23:18 +08:00
ChenDRAG	a633a6a028	update utils.network (#275 ) This is the first commit of 6 commits mentioned in #274, which features 1. Refactor of `Class Net` to support any form of MLP. 2. Enable type check in utils.network. 3. Relative change in docs/test/examples. 4. Move atari-related network to examples/atari/atari_network.py Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-01-20 16:54:13 +08:00
wizardsheng	c6f2648e87	Add C51 algorithm (#266 ) This is the PR for C51algorithm: https://arxiv.org/abs/1707.06887 1. add C51 policy in tianshou/policy/modelfree/c51.py. 2. add C51 net in tianshou/utils/net/discrete.py. 3. add C51 atari example in examples/atari/atari_c51.py. 4. add C51 statement in tianshou/policy/__init__.py. 5. add C51 test in test/discrete/test_c51.py. 6. add C51 atari results in examples/atari/results/c51/. By running "python3 atari_c51.py --task "PongNoFrameskip-v4" --batch-size 64", get best_result': '20.50 ± 0.50', in epoch 9. By running "python3 atari_c51.py --task "BreakoutNoFrameskip-v4" --n-step 1 --epoch 40", get best_reward: 407.400000 ± 31.155096 in epoch 39.	2021-01-06 10:17:45 +08:00
Nico Gürtler	5d13d8a453	Saving and loading replay buffer with HDF5 (#261 ) As mentioned in #260, this pull request is about an implementation of saving and loading the replay buffer with HDF5.	2020-12-17 08:58:43 +08:00
rocknamx	c97aa4065e	add singleton pattern version of summary_writter (#230 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2020-10-31 16:38:54 +08:00
n+e	5ed6c1c7aa	change the step in trainer (#235 ) This PR separates the `global_step` into `env_step` and `gradient_step`. In the future, the data from the collecting state will be stored under `env_step`, and the data from the updating state will be stored under `gradient_step`. Others: - add `rew_std` and `best_result` into the monitor - fix network unbounded in `test/continuous/test_sac_with_il.py` and `examples/box2d/bipedal_hardcore_sac.py` - change the dependency of ray to 1.0.0 since ray-project/ray#10134 has been resolved	2020-10-04 21:55:43 +08:00
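The `env_step` / `gradient_step` split described in commit 5ed6c1c7aa can be pictured with a minimal sketch. Everything here (`StepCounters`, `run_training`, and the parameter names) is hypothetical and only illustrates keeping two independent counters; it is not the trainer's actual code.

```python
from dataclasses import dataclass


@dataclass
class StepCounters:
    """Hypothetical container for the two counters named in the commit."""
    env_step: int = 0       # advanced while collecting data from environments
    gradient_step: int = 0  # advanced while updating the policy


def run_training(counters: StepCounters, n_collects: int,
                 steps_per_collect: int, updates_per_collect: int) -> StepCounters:
    """Alternate a collecting phase and an updating phase n_collects times."""
    for _ in range(n_collects):
        # Collecting phase: data gathered here would be logged under env_step.
        counters.env_step += steps_per_collect
        # Updating phase: losses computed here would be logged under gradient_step.
        counters.gradient_step += updates_per_collect
    return counters


counters = run_training(StepCounters(), n_collects=5,
                        steps_per_collect=10, updates_per_collect=2)
```

With a single `global_step`, collection statistics and update statistics would share one axis even though they advance at different rates; two counters keep each series on its own axis.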
n+e	b86d78766b	fix docs and add docstring check (#210 ) - fix broken links and out-of-the-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
Trinkle23897	34f714a677	Numba acceleration (#193 ) Training FPS improvement (base commit is 94bfb32): test_pdqn: 1660 (without numba) -> 1930 discrete/test_ppo: 5100 -> 5170 since nstep has little impact on overall performance, the unit test result is: GAE: 4.1s -> 0.057s nstep: 0.3s -> 0.15s (little improvement) Others: - fix a bug in ttt set_eps - keep only sumtree in segment tree implementation - dirty fix for asyncVenv check_id test	2020-09-02 13:03:32 +08:00
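For context on the GAE timing in commit 34f714a677, the following is a minimal plain-Python sketch of the backward recursion used by generalized advantage estimation. The function name and signature are assumptions made for illustration, and the sketch does not use Numba; a sequential loop of this shape is simply the kind of code that a JIT compiler is typically applied to.

```python
from typing import List, Sequence


def gae_advantages(rew: Sequence[float], v_s: Sequence[float],
                   v_s_next: Sequence[float], done: Sequence[bool],
                   gamma: float = 0.99, gae_lambda: float = 0.95) -> List[float]:
    """Compute GAE advantages for one flat sequence of transitions.

    Hypothetical helper: walks the sequence backwards, resetting the
    running sum whenever a transition ends an episode.
    """
    adv = [0.0] * len(rew)
    running = 0.0
    for i in range(len(rew) - 1, -1, -1):
        not_done = 0.0 if done[i] else 1.0
        # One-step TD error; the bootstrap term is dropped at episode ends.
        delta = rew[i] + gamma * v_s_next[i] * not_done - v_s[i]
        # Discounted accumulation of later TD errors within the same episode.
        running = delta + gamma * gae_lambda * not_done * running
        adv[i] = running
    return adv
```

Each step depends on the result of the next one, so the loop cannot be vectorized directly, which is why compiling it tends to pay off.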
yingchengyang	5b49192a48	DQN Atari examples (#187 ) This PR aims to provide the script of Atari DQN setting: - A speedrun of PongNoFrameskip-v4 (finished, about half an hour in i7-8750 + GTX1060 with 1M environment steps) - A general script for all atari game Since we use multiple env for simulation, the result is slightly different from the original paper, but consider to be acceptable. It also adds another parameter save_only_last_obs for replay buffer in order to save the memory. Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-30 05:48:09 +08:00
n+e	94bfb32cc1	optimize training procedure and improve code coverage (#189 ) 1. add policy.eval() in all test scripts' "watch performance" 2. remove dict return support for collector preprocess_fn 3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)` 4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184) 5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard 6. add test_returns (both GAE and nstep) 7. change the type-checking order in batch.py and converter.py in order to meet the most often case first 8. fix shape inconsistency for torch.Tensor in replay buffer 9. remove `**kwargs` in ReplayBuffer 10. remove default value in batch.split() and add merge_last argument (#185) 11. improve nstep efficiency 12. add max_batchsize in onpolicy algorithms 13. potential bugfix for subproc.wait 14. fix RecurrentActorProb 15. improve the code-coverage (from 90% to 95%) and remove the dead code 16. fix some incorrect type annotation The above improvement also increases the training FPS: on my computer, the previous version is only ~1800 FPS and after that, it can reach ~2050 (faster than v0.2.4.post1).	2020-08-27 12:15:18 +08:00
youkaichao	a9f9940d17	code refactor for venv (#179 ) - Refactor code to remove duplicate code - Enable async simulation for all vector envs - Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv` The abstraction of vector env changed. Prior to this PR, each vector env was almost independent. After this PR, each env is wrapped into a worker, and vector envs differ by their worker type. In fact, users can just use `BaseVectorEnv` with different workers; I keep `SubprocVectorEnv` and `ShmemVectorEnv` for backward compatibility. Co-authored-by: n+e <463003665@qq.com> Co-authored-by: magicly <magicly007@gmail.com>	2020-08-19 15:00:24 +08:00
n+e	311a2beafb	Pickle compatible for replay buffer and improve buffer.get (#182 ) fix #84 and make buffer more efficient	2020-08-16 16:26:23 +08:00
youkaichao	7f3b817b24	add policy.update to enable post process and remove collector.sample (#180 ) * add policy.update to enable post process and remove collector.sample * update doc in policy concept * remove collector.sample in doc * doc update of concepts * docs * polish * polish policy * remove collector.sample in docs * minor fix * Apply suggestions from code review just a test * doc fix Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-15 16:10:42 +08:00
n+e	140b1c2cab	Improve PER (#159 ) - use segment tree to rewrite the previous PrioReplayBuffer code, add the test - enable all Q-learning algorithms to use PER	2020-08-06 10:26:24 +08:00
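Commit 140b1c2cab rewrites prioritized experience replay (PER) on top of a segment tree. As a rough illustration of why that helps, here is a minimal sum-tree sketch in plain Python. The class and method names are hypothetical and are not taken from the repository; the sketch only shows the general technique of O(log n) priority updates and prefix-sum lookups.

```python
class SumTree:
    """Hypothetical array-backed sum tree over `size` non-negative priorities."""

    def __init__(self, size: int) -> None:
        bound = 1
        while bound < size:  # round the leaf count up to a power of two
            bound *= 2
        self._size = size
        self._bound = bound
        self._tree = [0.0] * (2 * bound)  # node 1 is the root, leaves start at bound

    def __setitem__(self, index: int, value: float) -> None:
        if not 0 <= index < self._size:
            raise IndexError(index)
        pos = index + self._bound
        self._tree[pos] = float(value)
        pos //= 2
        while pos >= 1:  # refresh every ancestor sum up to the root
            self._tree[pos] = self._tree[2 * pos] + self._tree[2 * pos + 1]
            pos //= 2

    def __getitem__(self, index: int) -> float:
        return self._tree[index + self._bound]

    def total(self) -> float:
        return self._tree[1]

    def find_prefix_index(self, mass: float) -> int:
        """Return the smallest i with sum(priorities[:i + 1]) > mass.

        Expects 0 <= mass < total().
        """
        pos = 1
        while pos < self._bound:
            left = 2 * pos
            if mass < self._tree[left]:
                pos = left
            else:
                mass -= self._tree[left]
                pos = left + 1
        return pos - self._bound


tree = SumTree(4)
for i, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree[i] = priority
```

Drawing `mass` uniformly from `[0, tree.total())` and calling `find_prefix_index(mass)` then selects index `i` with probability proportional to its priority.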
ChenDRAG	f2bcc55a25	ShmemVectorEnv Implementation (#174 ) * add shmem vecenv, some add&fix in test_env * generalize test_env IO * pep8 fix * comment update * style change * pep8 fix * style fix * minor fix * fix a bug * test fix * change env * testenv bug fix& shmem support recurse dict * bugfix * pep8 fix * _NP_TO_CT enhance * doc update * docstring update * pep8 fix * style change * style fix * remove assert * minor Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-04 13:39:05 +08:00
youkaichao	ad395b5235	bugfix for test_async_env (#171 )	2020-07-28 20:06:01 +08:00
Alexis DUBURCQ	e024afab8c	Asynchronous sampling vector environment (#134 ) Fix #103 Co-authored-by: youkaichao <youkaichao@126.com> Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-26 18:01:21 +08:00
n+e	38a95c19da	Yet another 3 fix (#160 ) 1. DQN learn should keep eps=0 2. Add a warning of env.seed in VecEnv 3. fix #162 of multi-dim action	2020-07-24 17:38:12 +08:00
youkaichao	bfeffe1f97	unify single-env and multi-env in collector (#157 ) Unify the implementation with multi-environments (wrap a single environment in a multi-environment with one envs) to greatly simplify the code. This changed the behavior of single-environment. Prior to this pr, for single environment, collector.collect(n_step=n) will step n steps. After this pr, for single environment, collector.collect(n_step=n) will step m episodes until the steps are greater than n. That is to say, collectors now always collect full episodes.	2020-07-23 16:40:53 +08:00
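The behavior change described in commit bfeffe1f97 can be sketched without any environment: given the lengths of successive episodes, keep taking whole episodes until the step budget is met. The function below is hypothetical, and the exact stopping boundary (stop once the total reaches `n_step`) is an assumption; the commit message only says collection continues until the steps are greater than n.

```python
from typing import Iterable, Tuple


def collect_full_episodes(episode_lengths: Iterable[int],
                          n_step: int) -> Tuple[int, int]:
    """Return (episodes collected, steps taken) for a budget of n_step.

    Hypothetical model: an episode is never cut short, so the number of
    steps taken can overshoot n_step by up to one episode length minus one.
    """
    episodes = 0
    steps = 0
    for length in episode_lengths:
        if steps >= n_step:  # budget met: do not start another episode
            break
        steps += length
        episodes += 1
    return episodes, steps


result = collect_full_episodes([4, 4, 4, 4], n_step=10)
```

Under this model a request for 10 steps over 4-step episodes takes 12 steps across 3 episodes, whereas the old single-environment behavior would have stopped after exactly 10 steps, in the middle of the third episode.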
Alexis DUBURCQ	865ef6c693	Improve to_torch/to_numpy converters (#147 ) * Enable converting list/tuple back and forth from/to numpy/torch. * Add fallbacks. * Fix PEP8 * Update unit tests. * Type annotation. Robust dtype check. * List of object are converted individually, as a single tensor otherwise. * Improve robustness of _to_array_with_correct_type * Add unit tests. * Do not catch exception at _to_array_with_correct_type level. * Use _parse_value * Fix PEP8 * Fix _parse_value list output type fallback. * Catch torch exception. * Do not convert torch tensor during fallback. * Improve unit tests. * Add unit tests. * FIx missing import * Remove support of numpy arrays of tensors for Batch value parser. * Forbid numpy arrays of tensors. * Fix PEP8. * Fix comment. * Reduce _parse_value branch number. * Fix None value. * Forward error message for debugging purpose. * Fix _is_scalar. * More specific try/catch blocks. * Fix exception chaining. * Fix PEP8. * Fix _is_scalar. * Fix missing corner case. * Fix PEP8. * Allow Batch empty key. * Fix multi-dim array datatype check. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-07-21 16:47:56 +08:00
ChenDRAG	d09b69e594	buffer update bug fix (#154 ) * buffer update bug fix * some fix in buffer update * polish Co-authored-by: n+e <463003665@qq.com>	2020-07-20 22:12:57 +08:00
youkaichao	fe5555d2a1	write tutorials to specify the standard of Batch (#142 ) * add doc for len exceptions * doc move; unify is_scalar_value function * remove some issubclass check * bugfix for shape of Batch(a=1) * keep moving doc * keep writing batch tutorial * draft version of Batch tutorial done * improving doc * keep improving doc * batch tutorial done * rename _is_number * rename _is_scalar * shape property do not raise exception * restore some doc string * grammarly [ci skip] * grammarly + fix warning of building docs * polish docs * trim and re-arrange batch tutorial * go straight to the point * minor fix for batch doc * add shape / len in basic usage * keep improving tutorial * unify _to_array_with_correct_type to remove duplicate code * delegate type convertion to Batch.__init__ * further delegate type convertion to Batch.__init__ * bugfix for setattr * add a _parse_value function * remove dummy function call * polish docs Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-20 15:54:18 +08:00
youkaichao	3a08e27ed4	Standardized behavior of Batch.cat and misc code refactor (#137 ) * code refactor; remove unused kwargs; add reward_normalization for dqn * bugfix for __setitem__ with torch.Tensor; add Batch.condense * minor fix * support cat with empty Batch * remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases * support stack with empty Batch * remove condense * refactor code to reflect the shared / partial / reserved categories of keys * add is_empty(recursive=False) * doc fix * docfix and bugfix for _is_batch_set * add doc for key reservation * bugfix for algebra operators * fix cat with lens hint * code refactor * bugfix for storing None * use ValueError instead of exception * hide lens away from users * add comment for __cat * move the computation of the initial value of lens in cat_ itself. * change the place of doc string * doc fix for Batch doc string * change recursive to recurse * doc string fix * minor fix for batch doc	2020-07-20 15:54:18 +08:00
youkaichao	26fb87433d	Improve collector (#125 ) * remove multibuf * reward_metric * make fields with empty Batch rather than None after reset * many fixes and refactor Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-13 17:33:01 +08:00
youkaichao	5599a6d1a6	Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130 ) * re-implement Batch.stack and add testcases * add doc for Batch.stack * reuse _create_values and refactor stack_ & cat_ * fix pep8 * fix docs * raise exception for stacking with partial keys and axis!=0 * minor fix * minor fix Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-13 17:33:01 +08:00
youkaichao	affeec13de	Improve Batch (#128 ) * minor polish * improve and implement Batch.cat_ * bugfix for buffer.sample with field impt_weight * restore the usage of a.cat_(b) * fix 2 bugs in batch and add corresponding unittest * code fix for update * update is_empty to recognize empty over empty; bugfix for len * bugfix for update and add testcase * add testcase of update * fix docs * fix docs * fix docs [ci skip] * fix docs [ci skip] Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-13 17:33:01 +08:00
youkaichao	2564e989fb	Improve Batch (#126 ) * make sure the key type of Batch is string, and add unit tests * add is_empty() function and unit tests * enable cat of mixing dict and Batch, just like stack	2020-07-13 17:33:01 +08:00
youkaichao	ff99662fe6	bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer (#120 ) * bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer * point out that ListReplayBuffer cannot be sampled * remove useless _amortization_counter variable	2020-07-10 08:24:11 +08:00
Alexis DUBURCQ	aa3c453f42	Raise exception for Batch __getitem__. (#119 ) * Raise exception for Batch __getitem__. * Try fixing access to reserved key. * Simpler patch. * Add unit test to check indexing empty Batch raises an exception. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-07-08 22:29:37 +08:00
youkaichao	7f9a1f1328	add type check for each element rather than the first element (#112 ) This PR does the following: - improvement: dramatic reduce of the call to _is_batch_set - bugfix: list(Batch()) fail; Batch(a=[torch.ones(3), torch.ones(3)]) fail; - misc: add type check for each element rather than the first element; add test case; _create_value with torch.Tensor does not have np.object type;	2020-07-08 21:00:00 +08:00
Alexis DUBURCQ	69caf89908	Fix to_torch converters (#111 ) * Fix to_torch converters. * to_torch now convert any object Torch Tensor-compatible. * Fix linter. * Fix Batch to_torch to convert any Torch Tensor-compatible data. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-07-07 18:40:55 +08:00
youkaichao	8913bf36b1	change Batch.empty to in-place fill; add copy option for Batch construction (#110 ) * in-place empty_ for Batch * change Batch.empty to in-place fill; add copy option for Batch construction * type signiture & remove shadow names for copy * add doc for data type (only support numbers and object data type) * add unit test for Batch copy * fix pep8 * add test case for Batch.empty * doc fix * fix pep8 * use object to test Batch * test commit * refact * change Batch(copy) testcase * minor fix Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-06 20:30:15 +08:00
n+e	db0e2e5cd2	Advanced Batch slicing & minor fix of RNN support (#106 ) * add shape property and modify __getitem__ * change Batch.size to Batch.shape * setattr * Batch.empty * remove scalar in advanced slicing * modify empty_ and __getitem__ * missing testcase * fix empty	2020-06-30 18:02:44 +08:00
Trinkle23897	e0f4862d01	store RNN hidden states in policy._state and add sample_avail in buffer (#19 )	2020-06-29 12:18:52 +08:00
Alexis DUBURCQ	a951a32487	Enable partial stacking at Batch level (#100 ) * Enable stacking of partially matching Batch instances. * Fix list support for getitem. * Fix Batch 'size' method. * Update Batch documentation.	2020-06-27 09:06:40 +08:00
Alexis DUBURCQ	70aa7bf93e	Use lower-level API to reduce overhead. (#97 ) * Use lower-level API to reduce overhead. * Further improvements. * Buffer _add_to_buffer improvement. * Do not use _data field to store Batch data to avoid overhead. Add back _meta field in Buffer. * Restore metadata attribute to store batch in Buffer. * Move out nested methods. * Use try/catch instead of actual check for efficiency. * Remove unused branches for efficiency. * Use np.array over list when possible for efficiency. * Final performance improvement. * Add unit tests for Batch size method. * Add missing stack unit tests. * Enforce Buffer initialization to zero. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-26 18:37:50 +08:00
Alexis DUBURCQ	3086b5c31d	Buffer refactoring to support batch over batch reliably (#93 ) * Fix support of batch over batch for Buffer. * Do not use internal __dict__ attribute to store batch data since it breaks inheritance. * Various fixes. * Improve robustness of Batch/Buffer by avoiding direct attribute assignment. Buffer refactoring. * Add axis optional argument to Batch stack method. * Add item assignment to Batch class. * Fix list support for Buffer. * Convert list to np.array by default for efficiency. * Add missing unit test for Batch. Fix unit tests. * Batch item assignment is now robust to key order. * Do not use getattr/setattr explicitly for simplicity. * More flexible __setitem__. * Fixes * Remove broadcasting at Batch level since it is unreliable. * Forbid item assignment for inconsistent batches. * Implement broadcasting at Buffer level. * Add more unit test for Batch item assignment. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-25 20:39:30 +08:00
Alexis DUBURCQ	49f43e9f1f	Fix Batch to numpy compatibility (#92 ) * Fix Batch to numpy compatibility. * Fix Batch unit tests. * Fix linter * Add Batch shape method. * Remove shape and add size. Enable to reserve keys using empty batch/list. * Fix linter and unit tests. * Batch init using list of Batch. * Add unit tests. * Fix Batch __len__. * Fix unit tests. * Fix slicing * Add missing slicing unit tests. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-24 21:43:48 +08:00
Alexis DUBURCQ	ebc551a25e	Fix support of 0-dim numpy array (#89 ) * Fix support of 0-dim numpy array. * Do not raise exception if Batch index does not make sense since it breaks existing code. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-24 06:55:24 +08:00
Alexis DUBURCQ	d7dd3105bc	Fix tuple support. (#88 )	2020-06-23 23:37:26 +08:00
Alexis DUBURCQ	ec270759ab	Batch refactoring (#87 ) * Enable to stack Batch instances. Add Batch cat static method. Rename cat to cat_ since it is in-place. * Properly handle Batch init using np.array of dict. * WIP * Get rid of metadata. * Update UT. Replace cat by cat_ everywhere. * Do not sort Batch keys anymore for efficiency. Add items method. * Fix cat copy issue. * Add unit test to check cat and stack methods. * Remove unused import. * Fix linter issues. * Fix unit tests. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-06-23 22:50:59 +08:00
Trinkle23897	a655334d00	change batch.append to batch.cat	2020-06-20 22:23:12 +08:00
Trinkle23897	aff0f9aee0	fix append batch over batch	2020-06-20 22:03:22 +08:00
Trinkle23897	81e4a16ef2	fix a bug in re-index replay buffer (fix #82 )	2020-06-17 16:37:51 +08:00
Trinkle23897	3774258cc7	fix unittest	2020-06-11 09:07:45 +08:00
Trinkle23897	1a914336f7	add random action in collector (fix #78 )	2020-06-11 08:57:37 +08:00

74 Commits