217 Commits

Markus Krimmel
6c6c872523
Gymnasium Integration (#789)
Changes:
- Disclaimer in README
- Replaced all occurrences of Gym with Gymnasium
- Removed code that is now dead since we no longer need to support the
old step API
- Updated type hints to only allow new step API
- Increased required version of envpool to support Gymnasium
- Increased required version of PettingZoo to support Gymnasium
- Updated `PettingZooEnv` to only use the new step API, removed hack to
also support old API
- I had to add some `# type: ignore` comments due to the new type hints
in Gymnasium. I'm not very familiar with type hinting, but I believe
the issue is on the Gymnasium side and we are looking into it.
- Had to update `MyTestEnv` to support `options` kwarg
- Skip NNI tests because they still use OpenAI Gym
- Also allow `PettingZooEnv` in vector environment
- Updated the ReplayBuffer doc page to also cover the terminated and
truncated flags (the new step API is sketched right after this list).
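
A minimal sketch of the reset/step API the items above refer to (standard Gymnasium usage, not Tianshou-specific code):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
# reset() now takes seed/options keyword arguments and returns (obs, info)
obs, info = env.reset(seed=0, options={})
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    # step() returns separate terminated/truncated flags instead of a single done
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```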

Still need to do: 
- Update the Jupyter notebooks in docs
- Check the entire code base for more dead code (from compatibility
stuff)
- Check the reset functions of all environments/wrappers in code base to
make sure they use the `options` kwarg
- Someone might want to check test_env_finite.py
- Is it okay to allow `PettingZooEnv` in vector environments? Might need
to update docs?
2023-02-03 11:57:27 -08:00
Will Dudley
b9a6d8b5f0
bugfixes: gym->gymnasium; render() update (#769)
Credits (names from the Farama Discord):

- @nrwahl2
- @APN-Pucky
- chattershuts
2022-11-11 12:25:35 -08:00
Juno T
d42a5fb354
Hindsight Experience Replay as a replay buffer (#753)
## implementation
I implemented HER solely as a replay buffer. It works by temporarily
re-writing the transition storage (`self._meta`) directly during the
`sample_indices()` call. The original transitions are cached and
restored at the beginning of the next sampling or when other methods are
called. This ensures that, for example, n-step return calculation
can be done without altering the policy.

There is also a problem with the original index sampling: the sampled
indices are not guaranteed to come from different episodes, so I decided
to perform the re-writing per episode. This guarantees that sampled
transitions from the same episode share the same re-written goal. It
also makes the re-writing ratio calculation differ slightly from the
paper, but the difference is small when there are many episodes in the
buffer.

In the current commit, the HER replay buffer only supports the 'future'
strategy and online sampling. This is the best variant of HER in terms
of performance and memory efficiency.
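
For illustration only, a rough sketch of the 'future' relabeling idea described above; the transition layout and helper names here are made up and do not mirror the actual buffer internals:

```python
import numpy as np

def relabel_future(episode, compute_reward_fn, future_k=8, rng=None):
    """Re-write the goals of one episode using the 'future' strategy.

    `episode` is a list of dicts with keys 'achieved_goal', 'desired_goal', 'rew'.
    Roughly future_k/(future_k+1) of the transitions get a goal sampled from a
    later step of the same episode, and their reward is recomputed.
    """
    rng = rng or np.random.default_rng()
    relabeled = []
    for t, tr in enumerate(episode):
        new_tr = dict(tr)
        if rng.random() < future_k / (future_k + 1):
            future_t = rng.integers(t, len(episode))
            new_tr["desired_goal"] = episode[future_t]["achieved_goal"]
            new_tr["rew"] = compute_reward_fn(new_tr["achieved_goal"], new_tr["desired_goal"])
        relabeled.append(new_tr)
    return relabeled
```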

I also added a few more convenience replay buffers
(`HERVectorReplayBuffer`, `HERReplayBufferManager`), a test env
(`MyGoalEnv`), a gym wrapper (`TruncatedAsTerminated`), unit tests, and a
simple example (examples/offline/fetch_her_ddpg.py).

## verification
I have added unit tests for almost everything I have implemented.
The HER replay buffer was also tested using DDPG on the [`FetchReach-v3`
env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used
the default DDPG parameters from the mujoco example and didn't tune
anything further to get this good result! (train script:
examples/offline/fetch_her_ddpg.py)


![Screen Shot 2022-10-02 at 19 22 53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)
2022-10-30 16:54:54 -07:00
Jiayi Weng
41ae3461f6
bump version to 0.4.10 (#757) 2022-10-16 22:15:20 -07:00
Markus Krimmel
128feb677f
Added support for new PettingZoo API (#751) 2022-10-02 09:33:12 -07:00
Yuge Zhang
65c4e3d4cd
Fix NNI tests upon v2.9 upgrade (#750)
* Fix NNI tests upon v2.9 upgrade

* Un-ignore

* fix
2022-09-26 13:55:26 -07:00
Markus Krimmel
ea36dc5195
Changes to support Gym 0.26.0 (#748)
* Changes to support Gym 0.26.0

* Replace map by simpler list comprehension

* Use syntax that is compatible with python 3.7

* Format code

* Fix environment seeding in test environment, fix buffer_profile test

* Remove self.seed() from __init__

* Fix random number generation

* Fix throughput tests

* Fix tests

* Removed done field from Buffer, fixed throughput test, turned off wandb, fixed formatting, fixed type hints, allow preprocessing_fn with truncated and terminated arguments, updated docstrings

* fix lint

* fix

* fix import

* fix

* fix mypy

* pytest --ignore='test/3rd_party'

* Use correct step API in _SetAttrWrapper

* Format

* Fix mypy

* Format

* Fix pydocstyle.
2022-09-26 09:31:23 -07:00
Jiayi Weng
0f59e38b12
Fix venv wrapper reset retval error with gym env (#712)
* Fix venv wrapper reset retval error with gym env

* fix lint
2022-07-31 11:00:38 -07:00
Jiayi Weng
99c99bb09a
Fix 2 bugs and refactor RunningMeanStd to support dict obs norm (#695)
* fix #689

* fix #672

* refactor RMS class

* fix #688
2022-07-14 22:52:56 -07:00
Jiayi Weng
65054847ef
bump version to 0.4.9 (#684) 2022-07-05 01:07:16 +08:00
Yifei Cheng
43792bf5ab
Upgrade gym (#613)
Fixes some deprecation warnings due to changes in gym 0.23:
- use `env.np_random.integers` instead of `env.np_random.randint`
- support the `seed` and `return_info` arguments in reset (addresses https://github.com/thu-ml/tianshou/issues/605); a minimal sketch follows below
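
A minimal sketch of the updated reset signature for a custom env under gym 0.23+ (the env itself is hypothetical):

```python
import numpy as np

class MyEnv:
    def __init__(self):
        self.np_random = np.random.default_rng()

    def reset(self, *, seed=None, return_info=False, options=None):
        if seed is not None:
            self.np_random = np.random.default_rng(seed)
        # np_random is a Generator, so use integers() rather than randint()
        obs = self.np_random.integers(0, 10, size=(4,))
        return (obs, {}) if return_info else obs
```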
2022-06-28 06:52:21 +08:00
Anas BELFADIL
aba2d01d25
MultiDiscrete to discrete gym action space wrapper (#664)
Has been tested to work with DQN and a custom MultiDiscrete gym env.
2022-06-13 06:18:22 +08:00
Yifei Cheng
21b15803ac
Fix exception with watching pistonball environments (#663) 2022-06-12 03:12:48 +08:00
Yi Su
df35718992
Implement TD3+BC for offline RL (#660)
- implement TD3+BC for offline RL;
- fix a bug in the trainer where the test reward was not logged because self.env_step is not set in the offline setting;
2022-06-07 00:39:37 +08:00
Yi Su
9ce0a554dc
Add Atari SAC examples (#657)
- Add Atari (discrete) SAC examples;
- Fix a bug in Discrete SAC evaluation: default to deterministic mode.
2022-06-04 13:26:08 +08:00
Jiayi Weng
5ecea2402e
Fix save_checkpoint_fn return value (#659)
- Fix save_checkpoint_fn return value to checkpoint_path;
- Fix wrong link in doc;
- Fix an off-by-one bug in trainer iterator.
2022-06-03 01:07:07 +08:00
Michal Gregor
c87b9f49bc
Add show_progress option for trainer (#641)
- A DummyTqdm class added to utils: it replicates the interface used by the trainers but does not show the progress bar (a minimal sketch follows below);
- Added a show_progress argument to the base trainer: when show_progress == False, DummyTqdm is used in place of tqdm.
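
A minimal sketch of the idea; the methods shown only cover what a trainer typically calls, and the `progress_bar` helper is hypothetical:

```python
from tqdm import tqdm

class DummyTqdm:
    """Stand-in that mimics the tqdm interface but renders nothing."""
    def __init__(self, total=None, **kwargs):
        self.total = total
        self.n = 0
    def update(self, n=1):
        self.n += n
    def set_postfix(self, **kwargs):
        pass
    def set_description(self, desc=None):
        pass
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

def progress_bar(show_progress: bool, total: int):
    # tqdm when progress should be shown, the silent dummy otherwise
    return tqdm(total=total) if show_progress else DummyTqdm(total=total)
```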
2022-05-17 23:41:59 +08:00
Anas BELFADIL
53e6b0408d
Add BranchingDQN for large discrete action spaces (#618) 2022-05-15 21:40:32 +08:00
Jiayi Weng
a03f19af72
fix pytest error on non-linux system (#638) 2022-05-12 20:52:55 +08:00
Jiayi Weng
2a7c151738
Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628)
- add VectorEnvWrapper and VectorEnvNormObs (usage sketched below)
- store obs_rms in policy save/load
- align the mujoco scripts with atari: obs_norm, envpool, wandb and README
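
A hedged usage sketch: the wrapper name follows the PR description, but the `get_obs_rms`/`set_obs_rms` helpers and the `update_obs_rms` flag are assumptions:

```python
import gym
from tianshou.env import DummyVectorEnv, VectorEnvNormObs

train_envs = VectorEnvNormObs(
    DummyVectorEnv([lambda: gym.make("HalfCheetah-v3") for _ in range(4)])
)
test_envs = VectorEnvNormObs(
    DummyVectorEnv([lambda: gym.make("HalfCheetah-v3")]), update_obs_rms=False
)
# reuse the training running mean/std for evaluation
test_envs.set_obs_rms(train_envs.get_obs_rms())
```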
2022-05-05 19:55:15 +08:00
Yi Su
a7c789f851
Improve data loading from D4RL and convert RL Unplugged to D4RL format (#624) 2022-05-04 04:37:52 +08:00
Yi Su
dd16818ce4
implement REDQ based on original contribution by @Jimenius (#623)
Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>
2022-05-01 00:06:00 +08:00
Alex Nikulkov
92456cdb68
Add learning rate scheduler to BasePolicy (#598) 2022-04-17 23:52:30 +08:00
Yifei Cheng
6fc6857812
Update Multi-agent RL docs, upgrade pettingzoo (#595)
* update multi-agent docs, upgrade pettingzoo

* avoid pettingzoo deprecation warning

* fix pistonball tests

* codestyle
2022-04-16 23:17:53 +08:00
Jiayi Weng
18277497ed
fix py39 ci venv test failure (#593) 2022-04-12 22:29:39 +08:00
Jiayi Weng
2a9c9289e5
rename save_fn to save_best_fn to avoid ambiguity (#575)
This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.
2022-03-22 04:29:27 +08:00
Jose Antonio Martin H
10d919052b
Add Trainers as generators (#559)
The new proposed feature is to have trainers as generators.
The usage pattern is:

```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- epoch (int): the epoch number
- epoch_stat (dict): a large collection of metrics of the current epoch, including stat
- info (dict): the usual dict returned by the non-generator version of the trainer

You can even iterate on several different trainers at the same time:

```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-18 00:26:14 +08:00
Costa Huang
df3d7f582b
Update WandbLogger implementation (#558)
* Use `global_step` as the x-axis for wandb
* Use the Tensorboard SummaryWriter as the core, with `wandb.init(..., sync_tensorboard=True)` (see the snippet below)
* Update all atari examples with wandb
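
A minimal sketch of the sync_tensorboard pattern (project and run names are placeholders):

```python
import wandb
from torch.utils.tensorboard import SummaryWriter

wandb.init(project="tianshou-example", sync_tensorboard=True)
writer = SummaryWriter("log/example-run")
writer.add_scalar("train/reward", 123.0, global_step=1000)  # mirrored to wandb
writer.close()
wandb.finish()
```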

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-07 06:40:47 +08:00
Yi Su
2377f2f186
Implement Generative Adversarial Imitation Learning (GAIL) (#550)
Implement GAIL based on PPO and provide an example script and sample (i.e., most likely not the best) results on Mujoco tasks. (#531, #173)
2022-03-06 23:57:15 +08:00
Anas BELFADIL
d976a5aa91
Fixed hardcoded reward_threshold (#548) 2022-03-04 10:35:39 +08:00
Jiayi Weng
c248b4f87e
fix conda support and keep API compatibility (#536)
* loosen constraints

* fix nni issue (#478)

* fix coverage
2022-02-26 00:05:02 +08:00
Chengqi Duan
23fbc3b712
upgrade gym version to >=0.21, fix related CI and update examples/atari (#534)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-25 07:40:33 +08:00
Mohammad Mahdi Rahimi
c7e2e56fac
Pettingzoo support (#494)
Co-authored-by: Rodrigo de Lazcano <r.l.p.v96@gmail.com>
Co-authored-by: J K Terry <justinkterry@gmail.com>
2022-02-15 22:56:45 +08:00
Jiayi Weng
3d697aa4c6
make unit test faster (#522)
* test cache expert data in offline training

* faster cql test

* faster tests

* use dummy

* test ray dependency
2022-02-09 00:24:52 +08:00
Chengqi Duan
9c100e0705
Enable venvs.reset() concurrent execution (#517)
- change the internal API names of the worker: send_action -> send, get_result -> recv (aligned with envpool); a sketch of the renamed interface follows below
- add a timing test for venvs.reset() to make sure it executes concurrently
- change the venvs.reset() logic
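
An illustrative sketch of the renamed worker interface (only the renamed methods, not the full class):

```python
class EnvWorker:
    def send(self, action):
        """Dispatch an action to the wrapped environment (replaces send_action)."""
        raise NotImplementedError

    def recv(self):
        """Collect the result of the last send() (replaces get_result)."""
        raise NotImplementedError
```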

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-08 00:40:01 +08:00
ChenDRAG
c25926dd8f
Formalize variable names (#509)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-01-30 00:53:56 +08:00
Bernard Tan
bc53ead273
Implement CQLPolicy and offline_cql example (#506) 2022-01-16 05:30:21 +08:00
Yi Su
a59d96d041
Add Intrinsic Curiosity Module (#503) 2022-01-15 02:43:48 +08:00
Yi Su
3592f45446
Fix critic network for Discrete CRR (#485)
- Fixes an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the conventions of other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizers to eliminate randomness caused by parameter sharing between actor and critic (see the sketch below);
- Adds `writer.flush()` in TensorboardLogger to ensure real-time results;
- Enables `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves the Atari offline RL examples to `examples/offline` and the tests to `test/offline`, per review comments.
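
A minimal sketch of the shared-optimizer change; the networks below are placeholders standing in for the real actor/critic, and the import path is assumed:

```python
import torch
from torch import nn
from tianshou.utils.net.common import ActorCritic

actor = nn.Linear(4, 2)   # placeholder actor network
critic = nn.Linear(4, 1)  # placeholder critic network
# ActorCritic collects the parameters of both modules in one nn.Module,
# so a single optimizer covers the (possibly shared) parameters
actor_critic = ActorCritic(actor, critic)
optim = torch.optim.Adam(actor_critic.parameters(), lr=1e-3)
```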
2021-11-28 23:10:28 +08:00
Bernard Tan
5c5a3db94e
Implement BCQPolicy and offline_bcq example (#480)
This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning).
Example usage is in the examples/offline/offline_bcq.py.
2021-11-22 22:21:02 +08:00
Markus28
8f19a86966
Implements set_env_attr and get_env_attr for vector environments (#478)
close #473
2021-11-03 00:08:00 +08:00
Jiayi Weng
e45e2096d8
add multi-GPU support (#461)
add a new class DataParallelNet
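
A hedged usage sketch of the wrapping pattern; the Net arguments are placeholders and the exact signatures are assumptions:

```python
import torch
from tianshou.utils.net.common import Net, DataParallelNet

net = Net(state_shape=(4,), action_shape=2, hidden_sizes=[128, 128])
if torch.cuda.is_available():
    net = DataParallelNet(net.cuda())  # replicate the network across available GPUs
```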
2021-10-06 01:39:14 +08:00
Jiayi Weng
5df64800f4
final fix for actor_critic shared head parameters (#458) 2021-10-04 23:19:07 +08:00
Ayush Chaurasia
22d7bf38c8
Improve W&B logger (#441)
- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update
2021-09-24 21:52:23 +08:00
n+e
fc251ab0b8
bump to v0.4.3 (#432)
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
2021-09-03 05:05:04 +08:00
Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger (#427)
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
n+e
e4f4f0e144
fix docs build failure and a bug in a2c/ppo optimizer (#428)
* fix rtfd build

* list + list -> set.union

* change seed of test_qrdqn

* add py39 test
2021-08-30 02:07:03 +08:00
Yi Su
291be08d43
Add Rainbow DQN (#386)
- add RainbowPolicy
- add a `set_beta` method in prio_buffer
- add NoisyLinear in utils/network (see the sketch below)
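
A minimal sketch of a noisy Q-head built from NoisyLinear; the import path and layer sizes are assumptions:

```python
import torch
from torch import nn
from tianshou.utils.net.discrete import NoisyLinear

q_head = nn.Sequential(
    NoisyLinear(128, 128), nn.ReLU(inplace=True),
    NoisyLinear(128, 6),  # 6 = number of discrete actions
)
q_values = q_head(torch.zeros(1, 128))
```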
2021-08-29 23:34:59 +08:00
Andriy Drozdyuk
d161059c3d
Replaced indice by plural indices (#422) 2021-08-20 21:58:44 +08:00
n+e
5b7732a29b
make ppo discrete test script more general (#418) 2021-08-15 21:37:37 +08:00