370 Commits

Author SHA1 Message Date
sunkafei
bc222e87a6
Fix #811 (#817) 2023-03-03 16:57:04 -08:00
Jiayi Weng
c8be85b240 fix readthedocs build error 2023-02-03 14:55:53 -08:00
Jiayi Weng
e8acf0dd46
Fix readthedocs build failure (#803) 2023-02-03 14:40:05 -08:00
Markus Krimmel
6c6c872523
Gymnasium Integration (#789)
Changes:
- Disclaimer in README
- Replaced all occurences of Gym with Gymnasium
- Removed code that is now dead since we no longer need to support the
old step API
- Updated type hints to only allow new step API
- Increased required version of envpool to support Gymnasium
- Increased required version of PettingZoo to support Gymnasium
- Updated `PettingZooEnv` to only use the new step API, removed hack to
also support old API
- I had to add some `# type: ignore` comments, due to new type hinting
in Gymnasium. I'm not that familiar with type hinting but I believe that
the issue is on the Gymnasium side and we are looking into it.
- Had to update `MyTestEnv` to support `options` kwarg
- Skip NNI tests because they still use OpenAI Gym
- Also allow `PettingZooEnv` in vector environment
- Updated doc page about ReplayBuffer to also talk about terminated and
truncated flags.

Still need to do: 
- Update the Jupyter notebooks in docs
- Check the entire code base for more dead code (from compatibility
stuff)
- Check the reset functions of all environments/wrappers in code base to
make sure they use the `options` kwarg
- Someone might want to check test_env_finite.py
- Is it okay to allow `PettingZooEnv` in vector environments? Might need
to update docs?
2023-02-03 11:57:27 -08:00
Jose Antonio Martin H
6019406cff
Add "act" to preprocess_fn call in collector. (#801)
This allows, for instance, to change the action registered into the
buffer when the environment modify the action.

Useful in offline learning for instance, since the true actions are in a
dataset and the actions of the agent are ignored.

- [ ] I have marked all applicable categories:
    + [ ] exception-raising fix
    + [ ] algorithm implementation fix
    + [ ] documentation modification
    + [X] new feature
- [X ] I have reformatted the code using `make format` (**required**)
- [X] I have checked the code using `make commit-checks` (**required**)
- [] If applicable, I have mentioned the relevant/related issue(s)
- [X] If applicable, I have listed every items in this Pull Request
below
2023-02-03 11:19:38 -08:00
janofsssun
774d3d8e83
Implement args/kwargs for init of norm_layers and activation (#788)
As mentioned in #770 , I have fixed the mismatch of args between the Net
and MLP. Also, in order to initialize the norm_layers and activations,
norm_args and act_args are added to the miniblock and related classes.
2022-12-26 19:58:03 -08:00
Jiayi Weng
1037627a5b
fix info not pass issue in PGPolicy (#787)
close #775
v0.4.11
2022-12-24 13:06:54 -08:00
Markus Krimmel
4c3791a459
Updated atari wrappers, fixed pre-commit (#781)
This PR addresses #772 (updates Atari wrappers to work with new Gym API)
and some additional issues:

- Pre-commit was using gitlab for flake8, which as of recently requires
authentication -> Replaced with GitHub
- Yapf was quietly failing in pre-commit. Changed it such that it fixes
formatting in-place
- There is an incompatibility between flake8 and yapf where yapf puts
binary operators after the line break and flake8 wants it before the
break. I added an exception for flake8.
- Also require `packaging` in setup.py

My changes shouldn't change the behaviour of the wrappers for older
versions, but please double check.
Idk whether it's just me, but there are always some incompatibilities
between yapf and flake8 that need to resolved manually. It might make
sense to try black instead.
2022-12-04 13:00:53 -08:00
Yi Su
662af52820
Fix Atari PPO example (#780)
- [x] I have marked all applicable categories:
    + [ ] exception-raising fix
    + [x] algorithm implementation fix
    + [ ] documentation modification
    + [ ] new feature
- [x] I have reformatted the code using `make format` (**required**)
- [x] I have checked the code using `make commit-checks` (**required**)
- [x] If applicable, I have mentioned the relevant/related issue(s)
- [x] If applicable, I have listed every items in this Pull Request
below

While trying to debug Atari PPO+LSTM, I found significant gap between
our Atari PPO example vs [CleanRL's Atari PPO w/
EnvPool](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy).
I tried to align our implementation with CleaRL's version, mostly in
hyper parameter choices, and got significant gain in Breakout, Qbert,
SpaceInvaders while on par in other games. After this fix, I would
suggest updating our [Atari
Benchmark](https://tianshou.readthedocs.io/en/master/tutorials/benchmark.html)
PPO experiments.

A few interesting findings:

- Layer initialization helps stabilize the training and enable the use
of larger learning rates; without it, larger learning rates will trigger
NaN gradient very quickly;
- ppo.py#L97-L101: this change helps training stability for reasons I do
not understand; also it makes the GPU usage higher.

Shoutout to [CleanRL](https://github.com/vwxyzjn/cleanrl) for a
well-tuned Atari PPO reference implementation!
2022-12-04 12:23:18 -08:00
ChenDRAG
929508ba77
Update experiment details of MuJoCo benchmark (#779)
Update the downloading url of the training logs and saved checkpoints
for MuJoCo tasks.
2022-11-26 10:18:22 -08:00
Will Dudley
b9a6d8b5f0
bugfixes: gym->gymnasium; render() update (#769)
Credits (names from the Farama Discord):

- @nrwahl2
- @APN-Pucky
- chattershuts
2022-11-11 12:25:35 -08:00
Yi Su
06aaad460e
Fix a bug in loading offline data (#768)
This PR fixes #766 .

Co-authored-by: Yi Su <yi_su@apple.com>
2022-11-03 16:12:33 -07:00
fzyzcjy
7ff12b909d
Tiny change since the tests are more than unit tests (#765)
IMHO, unit tests, compared with integration tests or end-to-end tests or
other tests, often means something that only tests a single
method/function/class/etc, and often has a lot of stubs and mocks so it
is far from a typical/real usage scenario. On the other hand,
integration tests or e2e tests mock less and are more like the real
case.

Tianshou says:

> ... tests include the full agent training procedure for all of the
implemented algorithms

It seems that this is more than unit test, and falls into the category
of integration or even e2e tests.
2022-11-01 07:20:20 -07:00
Juno T
d42a5fb354
Hindsight Experience Replay as a replay buffer (#753)
## implementation
I implemented HER solely as a replay buffer. It is done by temporarily
directly re-writing transitions storage (`self._meta`) during the
`sample_indices()` call. The original transitions are cached and will be
restored at the beginning of the next sampling or when other methods is
called. This will make sure that. for example, n-step return calculation
can be done without altering the policy.

There is also a problem with the original indices sampling. The sampled
indices are not guaranteed to be from different episodes. So I decided
to perform re-writing based on the episode. This guarantees that the
sampled transitions from the same episode will have the same re-written
goal. This also make the re-writing ratio calculation slightly differ
from the paper, but it won't be too different if there are many episodes
in the buffer.

In the current commit, HER replay buffer only support 'future' strategy
and online sampling. This is the best of HER in term of performance and
memory efficiency.

I also add a few more convenient replay buffers
(`HERVectorReplayBuffer`, `HERReplayBufferManager`), test env
(`MyGoalEnv`), gym wrapper (`TruncatedAsTerminated`), unit tests, and a
simple example (examples/offline/fetch_her_ddpg.py).

## verification
I have added unit tests for almost everything I have implemented.
HER replay buffer was also tested using DDPG on [`FetchReach-v3`
env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used
default DDPG parameters from mujoco example and didn't tune anything
further to get this good result! (train script:
examples/offline/fetch_her_ddpg.py).


![Screen Shot 2022-10-02 at 19 22
53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)
2022-10-30 16:54:54 -07:00
Jiayi Weng
41ae3461f6
bump version to 0.4.10 (#757) v0.4.10 2022-10-16 22:15:20 -07:00
Zodan Jodan
0181fe79a5
fix docs tictactoc dummy vector env #669 (#749)
a fix for #669
2022-10-03 17:41:31 -07:00
Markus Krimmel
128feb677f
Added support for new PettingZoo API (#751) 2022-10-02 09:33:12 -07:00
Markus Krimmel
b0c8d28a7d
Added pre-commit (#752)
- This PR adds the checks that are defined in the Makefile as pre-commit
hooks.
- Hopefully, the checks are equivalent to those from the Makefile, but I
can't guarantee it.
- CI remains as it is.
- As I pointed out on discord, I experienced some conflicts between
flake8 and yapf, so it might be better to transition to some other
combination (e.g. black).
2022-10-02 08:57:45 -07:00
Yuge Zhang
65c4e3d4cd
Fix NNI tests upon v2.9 upgrade (#750)
* Fix NNI tests upon v2.9 upgrade

* Un-ignore

* fix
2022-09-26 13:55:26 -07:00
Markus Krimmel
ea36dc5195
Changes to support Gym 0.26.0 (#748)
* Changes to support Gym 0.26.0

* Replace map by simpler list comprehension

* Use syntax that is compatible with python 3.7

* Format code

* Fix environment seeding in test environment, fix buffer_profile test

* Remove self.seed() from __init__

* Fix random number generation

* Fix throughput tests

* Fix tests

* Removed done field from Buffer, fixed throughput test, turned off wandb, fixed formatting, fixed type hints, allow preprocessing_fn with truncated and terminated arguments, updated docstrings

* fix lint

* fix

* fix import

* fix

* fix mypy

* pytest --ignore='test/3rd_party'

* Use correct step API in _SetAttrWrapper

* Format

* Fix mypy

* Format

* Fix pydocstyle.
2022-09-26 09:31:23 -07:00
Jiayi Weng
278c91a222
Update citation and contributor (#721)
* update citation

* update contributor

* pass lint
2022-08-10 20:06:51 -07:00
Jiayi Weng
0f59e38b12
Fix venv wrapper reset retval error with gym env (#712)
* Fix venv wrapper reset retval error with gym env

* fix lint
2022-07-31 11:00:38 -07:00
Wenhao Chen
f270e88461
Do not allow async simulation for test collector (#705) 2022-07-22 16:23:55 -07:00
Jiayi Weng
99c99bb09a
Fix 2 bugs and refactor RunningMeanStd to support dict obs norm (#695)
* fix #689

* fix #672

* refactor RMS class

* fix #688
2022-07-14 22:52:56 -07:00
Jiayi Weng
65054847ef
bump version to 0.4.9 (#684) v0.4.9 2022-07-05 01:07:16 +08:00
Yifei Cheng
43792bf5ab
Upgrade gym (#613)
fixes some deprecation warnings due to new changes in gym version 0.23:
- use `env.np_random.integers` instead of `env.np_random.randint`
- support `seed` and `return_info` arguments for reset (addresses https://github.com/thu-ml/tianshou/issues/605)
2022-06-28 06:52:21 +08:00
Anas BELFADIL
aba2d01d25
MultiDiscrete to discrete gym action space wrapper (#664)
Has been tested to work with DQN and a custom MultiDiscrete gym env.
2022-06-13 06:18:22 +08:00
Yifei Cheng
21b15803ac
Fix exception with watching pistonball environments (#663) 2022-06-12 03:12:48 +08:00
Yi Su
df35718992
Implement TD3+BC for offline RL (#660)
- implement TD3+BC for offline RL;
- fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;
2022-06-07 00:39:37 +08:00
Yi Su
9ce0a554dc
Add Atari SAC examples (#657)
- Add Atari (discrete) SAC examples;
- Fix a bug in Discrete SAC evaluation; default to deterministic mode.
2022-06-04 13:26:08 +08:00
Jiayi Weng
5ecea2402e
Fix save_checkpoint_fn return value (#659)
- Fix save_checkpoint_fn return value to checkpoint_path;
- Fix wrong link in doc;
- Fix an off-by-one bug in trainer iterator.
2022-06-03 01:07:07 +08:00
Jiayi Weng
6ad5b520fa
Fix sphinx build error (#655) 2022-06-01 13:56:04 +08:00
Jiayi Weng
109875d43d
Fix num_envs=test_num (#653)
* fix num_envs=test_num

* fix mypy
2022-05-30 12:38:47 +08:00
Michal Gregor
277138ca5b
Added support for clipping to DQNPolicy (#642)
* When clip_loss_grad=True is passed, Huber loss is used instead of the MSE loss.
* Made the argument's name more descriptive;
* Replaced the smooth L1 loss with the Huber loss, which has an identical form to the default parametrization, but seems to be better known in this context;
* Added a fuller description to the docstring;
2022-05-18 19:33:37 +08:00
Michal Gregor
c87b9f49bc
Add show_progress option for trainer (#641)
- A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar;
- Added a show_progress argument to the base trainer: when show_progress == True, dummy_tqdm is used in place of tqdm.
2022-05-17 23:41:59 +08:00
Anas BELFADIL
53e6b0408d
Add BranchingDQN for large discrete action spaces (#618) 2022-05-15 21:40:32 +08:00
Jiayi Weng
a03f19af72
fix pytest error on non-linux system (#638) 2022-05-12 20:52:55 +08:00
Jiayi Weng
bf8f63ffc3
use envpool in vizdoom example, update doc (#634) 2022-05-09 00:42:16 +08:00
Jiayi Weng
2a7c151738
Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628)
- add VectorEnvWrapper and VectorEnvNormObs
- obs_rms store in policy save/load
- align mujoco scripts with atari: obs_norm, envpool, wandb and README
v0.4.8
2022-05-05 19:55:15 +08:00
Yi Su
a7c789f851
Improve data loading from D4RL and convert RL Unplugged to D4RL format (#624) 2022-05-04 04:37:52 +08:00
Yi Su
dd16818ce4
implement REDQ based on original contribution by @Jimenius (#623)
Co-authored-by: Minhui Li
 <limh@lamda.nju.edu.cn>
2022-05-01 00:06:00 +08:00
Yi Su
41afc2584a
Convert RL Unplugged Atari datasets to tianshou ReplayBuffer (#621) 2022-04-29 19:33:28 +08:00
ChenDRAG
7f23748347
Compare Atari results with dopamine and OpenAI Baselines (#616) 2022-04-27 21:10:45 +08:00
Jiayi Weng
876e6b186e hot fix mujoco benchmark 2022-04-24 16:49:40 -04:00
Chengqi Duan
5eab7dc218
Add Atari Results (#600) 2022-04-24 20:44:54 +08:00
ChenDRAG
5c9afe72f3
Update Mujoco Bemchmark's webpage (#606) 2022-04-24 01:11:33 +08:00
Squeemos
e01385ea30
Change action_dim to action_shape (#602)
Noticed that in IQN and FQF there were some mismatches in the docstrings. Figured I would make a pull request to make it match.
2022-04-22 08:09:57 +08:00
ChenDRAG
57ecebde38
Add jupyter notebook tutorials using Google Colaboratory (#599) 2022-04-19 20:58:52 +08:00
Alex Nikulkov
92456cdb68
Add learning rate scheduler to BasePolicy (#598) 2022-04-17 23:52:30 +08:00
Yifei Cheng
6fc6857812
Update Multi-agent RL docs, upgrade pettingzoo (#595)
* update multi-agent docs, upgrade pettingzoo

* avoid pettingzoo deprecation warning

* fix pistonball tests

* codestyle
2022-04-16 23:17:53 +08:00