Tianshou

Author	SHA1	Message	Date
Michael Panchenko	b12983622b	Docs: added sorting order for autogenerated toc	2023-12-04 13:52:46 +01:00
Michael Panchenko	a5685619ce	Docs: generate all api docs automatically Reinstate the -W option Several overall improvements in docs Fixed multiple links	2023-12-04 11:48:09 +01:00
Michael Panchenko	006577da08	WIP - restructure doc files	2023-12-04 11:48:09 +01:00
Michael Panchenko	d4b6d9b250	WIP - restructure doc files	2023-12-04 11:47:40 +01:00
carlocagnetta	fa55217118	Remove get_started.rst page with links to outdated notebooks	2023-12-04 11:47:09 +01:00
carlocagnetta	ca2e4a1d96	Fix docs index	2023-12-04 11:45:52 +01:00
carlocagnetta	8f0c62ace3	Documentation update: jupyter-book running on ReadTheDocs including tutorial notebooks	2023-12-04 11:45:51 +01:00
Dominik Jain	89ce40edc0	Docs: Add tianshou.highlevel to docs build via auto-generated .rst files	2023-10-18 22:45:23 +02:00
Jiayi Weng	6449a43261	Fix documentation build (#951 ) Close #941 rtfd build link: https://readthedocs.org/projects/tianshou/builds/22019877/ Also -- fix two small issues reported by users, see #928 and #930 Note: I created the branch in thu-ml:tianshou instead of Trinkle23897:tianshou to quickly check the rtfd build. It's not a good process since every commit would trigger twice CI pipelines :(	2023-09-26 08:24:08 -07:00
Jiayi Weng	61182450b6	add py.typed, drop 3.6/3.7, support 3.11 (#910 ) closing #892 #901	2023-08-10 14:13:46 -07:00
Jiayi Weng	d5d521b329	fix conda installation command (#830 ) close #828	2023-03-19 17:40:47 -07:00
Juno T	d42a5fb354	Hindsight Experience Replay as a replay buffer (#753 ) ## implementation I implemented HER solely as a replay buffer. It is done by temporarily directly re-writing transitions storage (`self._meta`) during the `sample_indices()` call. The original transitions are cached and will be restored at the beginning of the next sampling or when other methods is called. This will make sure that. for example, n-step return calculation can be done without altering the policy. There is also a problem with the original indices sampling. The sampled indices are not guaranteed to be from different episodes. So I decided to perform re-writing based on the episode. This guarantees that the sampled transitions from the same episode will have the same re-written goal. This also make the re-writing ratio calculation slightly differ from the paper, but it won't be too different if there are many episodes in the buffer. In the current commit, HER replay buffer only support 'future' strategy and online sampling. This is the best of HER in term of performance and memory efficiency. I also add a few more convenient replay buffers (`HERVectorReplayBuffer`, `HERReplayBufferManager`), test env (`MyGoalEnv`), gym wrapper (`TruncatedAsTerminated`), unit tests, and a simple example (examples/offline/fetch_her_ddpg.py). ## verification I have added unit tests for almost everything I have implemented. HER replay buffer was also tested using DDPG on [`FetchReach-v3` env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used default DDPG parameters from mujoco example and didn't tune anything further to get this good result! (train script: examples/offline/fetch_her_ddpg.py). ![Screen Shot 2022-10-02 at 19 22 53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)	2022-10-30 16:54:54 -07:00
Yi Su	df35718992	Implement TD3+BC for offline RL (#660 ) - implement TD3+BC for offline RL; - fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;	2022-06-07 00:39:37 +08:00
Anas BELFADIL	53e6b0408d	Add BranchingDQN for large discrete action spaces (#618 )	2022-05-15 21:40:32 +08:00
Yi Su	dd16818ce4	implement REDQ based on original contribution by @Jimenius (#623 ) Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>	2022-05-01 00:06:00 +08:00
ChenDRAG	57ecebde38	Add jupyter notebook tutorials using Google Colaboratory (#599 )	2022-04-19 20:58:52 +08:00
Andrea Boscolo Camiletto	2336a7db1b	fixed typo in rainbow DQN paper reference (#569 ) * fixed typo in rainbow DQN paper ref * fix gym==0.23 ci failure Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-16 21:38:51 +08:00
Yi Su	2377f2f186	Implement Generative Adversarial Imitation Learning (GAIL) (#550 ) Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)	2022-03-06 23:57:15 +08:00
Chengqi Duan	d85bc19269	update dqn tutorial and add envpool to docs (#526 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-02-15 06:39:47 +08:00
Bernard Tan	bc53ead273	Implement CQLPolicy and offline_cql example (#506 )	2022-01-16 05:30:21 +08:00
Yi Su	a59d96d041	Add Intrinsic Curiosity Module (#503 )	2022-01-15 02:43:48 +08:00
Bernard Tan	5c5a3db94e	Implement BCQPolicy and offline_bcq example (#480 ) This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning). Example usage is in the examples/offline/offline_bcq.py.	2021-11-22 22:21:02 +08:00
Ayush Chaurasia	63d752ee0b	W&B: Add usage in the docs (#463 )	2021-10-13 23:28:25 +08:00
Jiayi Weng	e45e2096d8	add multi-GPU support (#461 ) add a new class DataParallelNet	2021-10-06 01:39:14 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441 ) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
Yi Su	291be08d43	Add Rainbow DQN (#386 ) - add RainbowPolicy - add `set_beta` method in prio_buffer - add NoisyLinear in utils/network	2021-08-29 23:34:59 +08:00
Yi Su	c0bc8e00ca	Add Fully-parameterized Quantile Function (#376 )	2021-06-15 11:59:02 +08:00
Yi Su	f3169b4c1f	Add Implicit Quantile Network (#371 )	2021-05-29 09:44:23 +08:00
n+e	458028a326	fix docs (#373 ) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367 )	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359 ) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
ChenDRAG	1dcf65fe21	Add NPG policy (#344 )	2021-04-21 09:52:15 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337 )	2021-04-16 20:37:12 +08:00
ChenDRAG	6426a39796	ppo benchmark (#330 )	2021-03-30 11:50:35 +08:00
wizardsheng	1eb6137645	Add QR-DQN algorithm (#276 ) This is the PR for QR-DQN algorithm: https://arxiv.org/abs/1710.10044 1. add QR-DQN policy in tianshou/policy/modelfree/qrdqn.py. 2. add QR-DQN net in examples/atari/atari_network.py. 3. add QR-DQN atari example in examples/atari/atari_qrdqn.py. 4. add QR-DQN statement in tianshou/policy/init.py. 5. add QR-DQN unit test in test/discrete/test_qrdqn.py. 6. add QR-DQN atari results in examples/atari/results/qrdqn/. 7. add compute_q_value in DQNPolicy and C51Policy for simplify forward function. 8. move `with torch.no_grad():` from `_target_q` to BasePolicy By running "python3 atari_qrdqn.py --task "PongNoFrameskip-v4" --batch-size 64", get best_result': '19.8 ± 0.40', in epoch 8.	2021-01-28 09:27:05 +08:00
Jialu Zhu	a511cb4779	Add offline trainer and discrete BCQ algorithm (#263 ) The result needs to be tuned after `done` issue fixed. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-20 18:13:04 +08:00
wizardsheng	c6f2648e87	Add C51 algorithm (#266 ) This is the PR for C51algorithm: https://arxiv.org/abs/1707.06887 1. add C51 policy in tianshou/policy/modelfree/c51.py. 2. add C51 net in tianshou/utils/net/discrete.py. 3. add C51 atari example in examples/atari/atari_c51.py. 4. add C51 statement in tianshou/policy/__init__.py. 5. add C51 test in test/discrete/test_c51.py. 6. add C51 atari results in examples/atari/results/c51/. By running "python3 atari_c51.py --task "PongNoFrameskip-v4" --batch-size 64", get best_result': '20.50 ± 0.50', in epoch 9. By running "python3 atari_c51.py --task "BreakoutNoFrameskip-v4" --n-step 1 --epoch 40", get best_reward: 407.400000 ± 31.155096 in epoch 39.	2021-01-06 10:17:45 +08:00
n+e	d87d31a705	Update Anaconda support (#228 ) conda install -c conda-forge tianshou Related PR: conda-forge/staged-recipes#12719	2020-09-25 15:07:36 +08:00
Yao Feng	dcfcbb37f4	add PSRL policy (#202 ) Add PSRL policy in tianshou/policy/modelbase/psrl.py. Co-authored-by: n+e <trinkle23897@cmu.edu>	2020-09-23 20:57:33 +08:00
rocknamx	bf39b9ef7d	clarify updating state (#224 ) Add an indicator(i.e. `self.learning`) of learning will be convenient for distinguishing state of policy. Meanwhile, the state of `self.training` will be undisputed in the training stage. Related issue: #211 Others: - fix a bug in DDQN: target_q could not be sampled from np.random.rand - fix a bug in DQN atari net: it should add a ReLU before the last layer - fix a bug in collector timing Co-authored-by: n+e <463003665@qq.com>	2020-09-22 16:28:46 +08:00
n+e	b284ace102	type check in unit test (#200 ) Fix #195: Add mypy test in .github/workflows/docs_and_lint.yml. Also remove the out-of-the-date api	2020-09-13 19:31:50 +08:00
n+e	b86d78766b	fix docs and add docstring check (#210 ) - fix broken links and out-of-the-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
n+e	64af7ea839	fix critical bugs in MAPolicy and docs update (#207 ) - fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy) - polish examples/box2d/bipedal_hardcore_sac.py - several docs update - format setup.py and bump version to 0.2.7	2020-09-08 21:10:48 +08:00
n+e	8bb8ecba6e	set policy.eval() before collector.collect (#204 ) * fix #203 * no_grad argument in collector.collect	2020-09-06 16:20:16 +08:00
n+e	140b1c2cab	Improve PER (#159 ) - use segment tree to rewrite the previous PrioReplayBuffer code, add the test - enable all Q-learning algorithms to use PER	2020-08-06 10:26:24 +08:00
yingchengyang	99a1d40e85	Dueling DQN (#170 ) Co-authored-by: n+e <463003665@qq.com>	2020-07-29 19:44:42 +08:00
n+e	bd9c3c7f8d	docs fix and v0.2.5 (#156 ) * pre * update docs * update docs * $ in bash * size -> hidden_layer_size * doctest * doctest again * filter a warning * fix bug * fix examples * test fail * test succ	2020-07-22 14:42:08 +08:00
youkaichao	8c32d99c65	Add multi-agent example: tic-tac-toe (#122 ) * make fileds with empty Batch rather than None after reset * dummy code * remove dummy * add reward_length argument for collector * Improve Batch (#126) * make sure the key type of Batch is string, and add unit tests * add is_empty() function and unit tests * enable cat of mixing dict and Batch, just like stack * bugfix for reward_length * add get_final_reward_fn argument to collector to deal with marl * minor polish * remove multibuf * minor polish * improve and implement Batch.cat_ * bugfix for buffer.sample with field impt_weight * restore the usage of a.cat_(b) * fix 2 bugs in batch and add corresponding unittest * code fix for update * update is_empty to recognize empty over empty; bugfix for len * bugfix for update and add testcase * add testcase of update * make fileds with empty Batch rather than None after reset * dummy code * remove dummy * add reward_length argument for collector * bugfix for reward_length * add get_final_reward_fn argument to collector to deal with marl * make sure the key type of Batch is string, and add unit tests * add is_empty() function and unit tests * enable cat of mixing dict and Batch, just like stack * dummy code * remove dummy * add multi-agent example: tic-tac-toe * move TicTacToeEnv to a separate file * remove dummy MANet * code refactor * move tic-tac-toe example to test * update doc with marl-example * fix docs * reduce the threshold * revert * update player id to start from 1 and change player to agent; keep coding * add reward_length argument for collector * Improve Batch (#128) * minor polish * improve and implement Batch.cat_ * bugfix for buffer.sample with field impt_weight * restore the usage of a.cat_(b) * fix 2 bugs in batch and add corresponding unittest * code fix for update * update is_empty to recognize empty over empty; bugfix for len * bugfix for update and add testcase * add testcase of update * fix docs * fix docs * fix docs [ci skip] * fix docs [ci skip] Co-authored-by: Trinkle23897 <463003665@qq.com> * refact * re-implement Batch.stack and add testcases * add doc for Batch.stack * reward_metric * modify flag * minor fix * reuse _create_values and refactor stack_ & cat_ * fix pep8 * fix reward stat in collector * fix stat of collector, simplify test/base/env.py * fix docs * minor fix * raise exception for stacking with partial keys and axis!=0 * minor fix * minor fix * minor fix * marl-examples * add condense; bugfix for torch.Tensor; code refactor * marl example can run now * enable tic tac toe with larger board size and win-size * add test dependency * Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130) * re-implement Batch.stack and add testcases * add doc for Batch.stack * reuse _create_values and refactor stack_ & cat_ * fix pep8 * fix docs * raise exception for stacking with partial keys and axis!=0 * minor fix * minor fix Co-authored-by: Trinkle23897 <463003665@qq.com> * stash * let agent learn to play as agent 2 which is harder * code refactor * Improve collector (#125) * remove multibuf * reward_metric * make fileds with empty Batch rather than None after reset * many fixes and refactor Co-authored-by: Trinkle23897 <463003665@qq.com> * marl for tic-tac-toe and general gomoku * update default gamma to 0.1 for tic tac toe to win earlier * fix name typo; change default game config; add rew_norm option * fix pep8 * test commit * mv test dir name * add rew flag * fix torch.optim import error and madqn rew_norm * remove useless kwargs * Vector env enable select worker (#132) * Enable selecting worker for vector env step method. * Update collector to match new vecenv selective worker behavior. * Bug fix. * Fix rebase Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu> * show the last move of tictactoe by capital letters * add multi-agent tutorial * fix link * Standardized behavior of Batch.cat and misc code refactor (#137) * code refactor; remove unused kwargs; add reward_normalization for dqn * bugfix for __setitem__ with torch.Tensor; add Batch.condense * minor fix * support cat with empty Batch * remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases * support stack with empty Batch * remove condense * refactor code to reflect the shared / partial / reserved categories of keys * add is_empty(recursive=False) * doc fix * docfix and bugfix for _is_batch_set * add doc for key reservation * bugfix for algebra operators * fix cat with lens hint * code refactor * bugfix for storing None * use ValueError instead of exception * hide lens away from users * add comment for __cat * move the computation of the initial value of lens in cat_ itself. * change the place of doc string * doc fix for Batch doc string * change recursive to recurse * doc string fix * minor fix for batch doc * write tutorials to specify the standard of Batch (#142) * add doc for len exceptions * doc move; unify is_scalar_value function * remove some issubclass check * bugfix for shape of Batch(a=1) * keep moving doc * keep writing batch tutorial * draft version of Batch tutorial done * improving doc * keep improving doc * batch tutorial done * rename _is_number * rename _is_scalar * shape property do not raise exception * restore some doc string * grammarly [ci skip] * grammarly + fix warning of building docs * polish docs * trim and re-arrange batch tutorial * go straight to the point * minor fix for batch doc * add shape / len in basic usage * keep improving tutorial * unify _to_array_with_correct_type to remove duplicate code * delegate type convertion to Batch.__init__ * further delegate type convertion to Batch.__init__ * bugfix for setattr * add a _parse_value function * remove dummy function call * polish docs Co-authored-by: Trinkle23897 <463003665@qq.com> * bugfix for mapolicy * pretty code * remove debug code; remove condense * doc fix * check before get_agents in tutorials/tictactoe * tutorial * fix * minor fix for batch doc * minor polish * faster test_ttt * improve tic-tac-toe environment * change default epoch and step-per-epoch for tic-tac-toe * fix mapolicy * minor polish for mapolicy * 90% to 80% (need to change the tutorial) * win rate * show step number at board * simplify mapolicy * minor polish for mapolicy * remove MADQN * fix pep8 * change legal_actions to mask (need to update docs) * simplify maenv * fix typo * move basevecenv to single file * separate RandomAgent * update docs * grammarly * fix pep8 * win rate typo * format in cheatsheet * use bool mask directly * update doc for boolean mask Co-authored-by: Trinkle23897 <463003665@qq.com> Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com> Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-07-21 14:59:49 +08:00
youkaichao	fe5555d2a1	write tutorials to specify the standard of Batch (#142 ) * add doc for len exceptions * doc move; unify is_scalar_value function * remove some issubclass check * bugfix for shape of Batch(a=1) * keep moving doc * keep writing batch tutorial * draft version of Batch tutorial done * improving doc * keep improving doc * batch tutorial done * rename _is_number * rename _is_scalar * shape property do not raise exception * restore some doc string * grammarly [ci skip] * grammarly + fix warning of building docs * polish docs * trim and re-arrange batch tutorial * go straight to the point * minor fix for batch doc * add shape / len in basic usage * keep improving tutorial * unify _to_array_with_correct_type to remove duplicate code * delegate type convertion to Batch.__init__ * further delegate type convertion to Batch.__init__ * bugfix for setattr * add a _parse_value function * remove dummy function call * polish docs Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-20 15:54:18 +08:00
Trinkle23897	513573ea82	add link	2020-06-08 22:20:52 +08:00

1 2

71 Commits