57 Commits

Author SHA1 Message Date
Mohammad Mahdi Rahimi
c7e2e56fac
Pettingzoo support (#494)
Co-authored-by: Rodrigo de Lazcano <r.l.p.v96@gmail.com>
Co-authored-by: J K Terry <justinkterry@gmail.com>
2022-02-15 22:56:45 +08:00
Chengqi Duan
9c100e0705
Enable venvs.reset() concurrent execution (#517)
- change the internal API name of worker: send_action -> send, get_result -> recv (align with envpool)
- add a timing test for venvs.reset() to make sure the concurrent execution
- change venvs.reset() logic

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-08 00:40:01 +08:00
ChenDRAG
c25926dd8f
Formalize variable names (#509)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-01-30 00:53:56 +08:00
Markus28
8f19a86966
Implements set_env_attr and get_env_attr for vector environments (#478)
close #473
2021-11-03 00:08:00 +08:00
n+e
fc251ab0b8
bump to v0.4.3 (#432)
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
2021-09-03 05:05:04 +08:00
Yi Su
b5c3ddabfa
Add discrete Conservative Q-Learning for offline RL (#359)
Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com>
2021-05-12 09:24:48 +08:00
n+e
ff4d3cd714
Support different state size and fix exception in venv.__del__ (#352)
- Batch: do not raise error when it finds list of np.array with different shape[0].

- Venv's obs: add try...except block for np.stack(obs_list)

- remove venv.__del__ since it is buggy
2021-04-25 15:23:46 +08:00
n+e
09692c84fe
fix numpy>=1.20 typing check (#323)
Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).
2021-03-30 16:06:03 +08:00
n+e
ec23c7efe9
fix qvalue mask_action error for obs_next (#310)
* fix #309
* remove for-loop in dqn expl_noise
2021-03-15 08:06:24 +08:00
ChenDRAG
243ab43b3c
support observation normalization in BaseVectorEnv (#308)
add RunningMeanStd
2021-03-11 20:50:20 +08:00
n+e
454c86c469
fix venv seed, add TOC in docs, and split buffer.py into several files (#303)
Things changed in this PR:

- various docs update, add TOC
- split buffer into several files
- fix venv action_space randomness
2021-03-02 12:28:28 +08:00
n+e
31e7f445d1
fix vecenv action_space randomness (#300) 2021-03-01 15:44:03 +08:00
rocknamx
c97aa4065e
add singleton pattern version of summary_writter (#230)
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2020-10-31 16:38:54 +08:00
n+e
b284ace102
type check in unit test (#200)
Fix #195: Add mypy test in .github/workflows/docs_and_lint.yml.

Also remove the out-of-the-date api
2020-09-13 19:31:50 +08:00
n+e
c91def6cbc
code format and update function signatures (#213)
Cherry-pick from #200 

- update the function signature
- format code-style
- move _compile into separate functions
- fix a bug in to_torch and to_numpy (Batch)
- remove None in action_range

In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))
2020-09-12 15:39:01 +08:00
n+e
b86d78766b
fix docs and add docstring check (#210)
- fix broken links and out-of-the-date content
- add pydocstyle and doc8 check
- remove collector.seed and collector.render
2020-09-11 07:55:37 +08:00
n+e
94bfb32cc1
optimize training procedure and improve code coverage (#189)
1. add policy.eval() in all test scripts' "watch performance"
2. remove dict return support for collector preprocess_fn
3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)`
4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184)
5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard
6. add test_returns (both GAE and nstep)
7. change the type-checking order in batch.py and converter.py in order to meet the most often case first
8. fix shape inconsistency for torch.Tensor in replay buffer
9. remove `**kwargs` in ReplayBuffer
10. remove default value in batch.split() and add merge_last argument (#185)
11. improve nstep efficiency
12. add max_batchsize in onpolicy algorithms
13. potential bugfix for subproc.wait
14. fix RecurrentActorProb
15. improve the code-coverage (from 90% to 95%) and remove the dead code
16. fix some incorrect type annotation

The above improvement also increases the training FPS: on my computer, the previous version is only ~1800 FPS and after that, it can reach ~2050 (faster than v0.2.4.post1).
2020-08-27 12:15:18 +08:00
youkaichao
a9f9940d17
code refactor for venv (#179)
- Refacor code to remove duplicate code

- Enable async simulation for all vector envs

- Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv`

The abstraction of vector env changed.

Prior to this pr, each vector env is almost independent.

After this pr, each env is wrapped into a worker, and vector envs differ with their worker type. In fact, users can just use `BaseVectorEnv` with different workers, I keep `SubprocVectorEnv`, `ShmemVectorEnv` for backward compatibility.

Co-authored-by: n+e <463003665@qq.com>
Co-authored-by: magicly <magicly007@gmail.com>
2020-08-19 15:00:24 +08:00
ChenDRAG
f2bcc55a25
ShmemVectorEnv Implementation (#174)
* add shmem vecenv, some add&fix in test_env

* generalize test_env IO

* pep8 fix

* comment update

* style change

* pep8 fix

* style fix

* minor fix

* fix a bug

* test fix

* change env

* testenv bug fix& shmem support recurse dict

* bugfix

* pep8 fix

* _NP_TO_CT enhance

* doc update

* docstring update

* pep8 fix

* style change

* style fix

* remove assert

* minor

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-08-04 13:39:05 +08:00
Alexis DUBURCQ
e024afab8c
Asynchronous sampling vector environment (#134)
Fix #103

Co-authored-by: youkaichao <youkaichao@126.com>
Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-26 18:01:21 +08:00
n+e
38a95c19da
Yet another 3 fix (#160)
1. DQN learn should keep eps=0

2. Add a warning of env.seed in VecEnv

3. fix #162 of multi-dim action
2020-07-24 17:38:12 +08:00
youkaichao
8c32d99c65
Add multi-agent example: tic-tac-toe (#122)
* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* Improve Batch (#126)

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* minor polish

* remove multibuf

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* dummy code

* remove dummy

* add multi-agent example: tic-tac-toe

* move TicTacToeEnv to a separate file

* remove dummy MANet

* code refactor

* move tic-tac-toe example to test

* update doc with marl-example

* fix docs

* reduce the threshold

* revert

* update player id to start from 1 and change player to agent; keep coding

* add reward_length argument for collector

* Improve Batch (#128)

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>

* refact

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reward_metric

* modify flag

* minor fix

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix reward stat in collector

* fix stat of collector, simplify test/base/env.py

* fix docs

* minor fix

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

* minor fix

* marl-examples

* add condense; bugfix for torch.Tensor; code refactor

* marl example can run now

* enable tic tac toe with larger board size and win-size

* add test dependency

* Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130)

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>

* stash

* let agent learn to play as agent 2 which is harder

* code refactor

* Improve collector (#125)

* remove multibuf

* reward_metric

* make fileds with empty Batch rather than None after reset

* many fixes and refactor
Co-authored-by: Trinkle23897 <463003665@qq.com>

* marl for tic-tac-toe and general gomoku

* update default gamma to 0.1 for tic tac toe to win earlier

* fix name typo; change default game config; add rew_norm option

* fix pep8

* test commit

* mv test dir name

* add rew flag

* fix torch.optim import error and madqn rew_norm

* remove useless kwargs

* Vector env enable select worker (#132)

* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>

* show the last move of tictactoe by capital letters

* add multi-agent tutorial

* fix link

* Standardized behavior of Batch.cat and misc code refactor (#137)

* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc

* write tutorials to specify the standard of Batch (#142)

* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type convertion to Batch.__init__

* further delegate type convertion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>

* bugfix for mapolicy

* pretty code

* remove debug code; remove condense

* doc fix

* check before get_agents in tutorials/tictactoe

* tutorial

* fix

* minor fix for batch doc

* minor polish

* faster test_ttt

* improve tic-tac-toe environment

* change default epoch and step-per-epoch for tic-tac-toe

* fix mapolicy

* minor polish for mapolicy

* 90% to 80% (need to change the tutorial)

* win rate

* show step number at board

* simplify mapolicy

* minor polish for mapolicy

* remove MADQN

* fix pep8

* change legal_actions to mask (need to update docs)

* simplify maenv

* fix typo

* move basevecenv to single file

* separate RandomAgent

* update docs

* grammarly

* fix pep8

* win rate typo

* format in cheatsheet

* use bool mask directly

* update doc for boolean mask

Co-authored-by: Trinkle23897 <463003665@qq.com>
Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com>
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-21 14:59:49 +08:00
Alexis DUBURCQ
09e10e384f Vector env enable select worker (#132)
* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-20 15:54:18 +08:00
n+e
47e8e2686c
move atari wrapper to examples and publish v0.2.4 (#124)
* move atari wrapper to examples

* consistency

* change drqn seed since it is quite unstable in current seed

* minor fix

* 0.2.4
2020-07-10 17:20:39 +08:00
youkaichao
268f9d0533
type signature correction (#83) 2020-06-20 09:57:16 +08:00
Trinkle23897
513573ea82 add link 2020-06-08 22:20:52 +08:00
Alexis DUBURCQ
52be533d06
Enable getattr for SubprocVecEnv. (#74)
* Enable getattr for SubprovVecEnv.

* Consistent API between VectorEnv and SubprocVecEnv.

* Avoid code duplication. Add unit tests.

* Add docstring.

* Test more branches.

* Fix UT.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-05 17:17:43 +08:00
Alexis DUBURCQ
529a4cf44c
Add pickle support for Batch. Fix VectorEnv. (#67)
* Fix vecenv.

* Add pickle support for Batch class.

* Add Batch pickle Unit Test.

* Fix lint.

* Swap Batch UT.

* Fix lint.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-30 21:29:33 +08:00
Trinkle23897
9b26137cd2 add type annotation 2020-05-12 11:31:47 +08:00
Trinkle23897
8812eaa502 fix #36 2020-04-23 22:06:18 +08:00
Minghao Zhang
205698dd66
fix #33 (#34) 2020-04-21 15:36:08 +08:00
Trinkle23897
19f2cce294 seealso and change policy dir structure 2020-04-09 21:36:53 +08:00
Trinkle23897
e0809ff135 add policy docs (#21) 2020-04-06 19:36:59 +08:00
Trinkle23897
610390c132 add docs of collector and trainer (#20) 2020-04-05 18:34:45 +08:00
Trinkle23897
b6c9db6b0b docs for env 2020-04-04 21:02:06 +08:00
Trinkle23897
7cb5146611 add docs of trick 2020-04-02 21:57:26 +08:00
Trinkle23897
0acd0d164c test api doc 2020-04-02 09:07:04 +08:00
Minghao Zhang
0b08a41610
move mujoco to examples (#12)
* move mujoco to examples

* fix the import mujoco bug

* flake8

* flake8

* rm __init__.py
2020-04-02 08:49:19 +08:00
Minghao Zhang
eb7fb37806
fix PointMaze (#8)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest

* add args "render"

* change the tensorboard writter

* change the tensorboard writter

* change device, render, tensorboard log location

* change device, render, tensorboard log location

* remove some wrong local files

* fix some tab mistakes and the envs name in continuous/test_xx.py

* add examples and point robot maze environment

* fix some bugs during testing examples

* add dqn network and fix some args

* change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally

* add a warning to collector

* rm some unrelated files

* reformat

* fix a bug in test_dqn due to the model wrong selection

* change atari frame skip and observation to improve performance

* readd some files

* change import

* modified readme

* rm tensorboard log

* update atari and mujoco which are ignored

* rm the wrong lines

* readd the import of PointMaze

* fix a typo in test/discrete/net.py

* add a class declaration to pass the flake8

* fix flake8 errors
2020-03-28 14:36:12 +08:00
Trinkle23897
f68f23292e update readme and force flake8 2020-03-28 13:27:01 +08:00
Minghao Zhang
068c4068ec
fix atari/mujoco env (#7)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest

* add args "render"

* change the tensorboard writter

* change the tensorboard writter

* change device, render, tensorboard log location

* change device, render, tensorboard log location

* remove some wrong local files

* fix some tab mistakes and the envs name in continuous/test_xx.py

* add examples and point robot maze environment

* fix some bugs during testing examples

* add dqn network and fix some args

* change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally

* add a warning to collector

* rm some unrelated files

* reformat

* fix a bug in test_dqn due to the model wrong selection

* change atari frame skip and observation to improve performance

* readd some files

* change import

* modified readme

* rm tensorboard log

* update atari and mujoco which are ignored

* rm the wrong lines
2020-03-28 12:03:49 +08:00
Minghao Zhang
3c0a09fefd
minor reformat (#2)
* update atari.py

* fix setup.py
pass the pytest

* fix setup.py
pass the pytest
2020-03-26 09:01:20 +08:00
Trinkle23897
fdc969b830 fix collector 2020-03-25 14:08:28 +08:00
Trinkle23897
75364cd986 ppo and early stop 2020-03-20 19:52:29 +08:00
Trinkle23897
64bab0b6a0 ddpg 2020-03-18 21:45:41 +08:00
Trinkle23897
6e563fe61a a2c 2020-03-17 20:22:37 +08:00
Trinkle23897
39de63592f finish pg 2020-03-17 11:37:31 +08:00
Trinkle23897
cef5de8b83 fix some bugs 2020-03-16 11:11:29 +08:00
Trinkle23897
5983c6b33d finish dqn 2020-03-15 17:41:00 +08:00
Trinkle23897
543e57cdbd clear 2020-03-13 21:47:17 +08:00