Tianshou

Author	SHA1	Message	Date
Yi Su	662af52820	Fix Atari PPO example (#780) While trying to debug Atari PPO+LSTM, I found a significant gap between our Atari PPO example and [CleanRL's Atari PPO w/ EnvPool](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy). I tried to align our implementation with CleanRL's version, mostly in hyperparameter choices, and got significant gains in Breakout, Qbert, and SpaceInvaders while staying on par in other games. After this fix, I would suggest updating our [Atari Benchmark](https://tianshou.readthedocs.io/en/master/tutorials/benchmark.html) PPO experiments. A few interesting findings: - Layer initialization helps stabilize training and enables the use of larger learning rates; without it, larger learning rates trigger NaN gradients very quickly; - ppo.py#L97-L101: this change helps training stability for reasons I do not understand; it also makes the GPU usage higher. Shoutout to [CleanRL](https://github.com/vwxyzjn/cleanrl) for a well-tuned Atari PPO reference implementation!	2022-12-04 12:23:18 -08:00
Wenhao Chen	f270e88461	Do not allow async simulation for test collector (#705)	2022-07-22 16:23:55 -07:00
Yi Su	df35718992	Implement TD3+BC for offline RL (#660) - implement TD3+BC for offline RL; - fix a trainer bug where the test reward was not logged because self.env_step is not set in the offline setting;	2022-06-07 00:39:37 +08:00
Jiayi Weng	5ecea2402e	Fix save_checkpoint_fn return value (#659) - Fix save_checkpoint_fn return value to checkpoint_path; - Fix wrong link in doc; - Fix an off-by-one bug in trainer iterator.	2022-06-03 01:07:07 +08:00
Michal Gregor	c87b9f49bc	Add show_progress option for trainer (#641) - Added a DummyTqdm class to utils: it replicates the tqdm interface used by trainers but does not show a progress bar; - Added a show_progress argument to the base trainer: when show_progress == False, DummyTqdm is used in place of tqdm.	2022-05-17 23:41:59 +08:00
Jiayi Weng	2a9c9289e5	rename save_fn to save_best_fn to avoid ambiguity (#575) This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.	2022-03-22 04:29:27 +08:00
Jose Antonio Martin H	10d919052b	Add Trainers as generators (#559) The new proposed feature is to have trainers as generators. The usage pattern is: ```python trainer = OnPolicyTrainer(...) for epoch, epoch_stat, info in trainer: print(f"Epoch: {epoch}") print(epoch_stat) print(info) do_something_with_policy() query_something_about_policy() make_a_plot_with(epoch_stat) display(info) ``` - epoch (int): the epoch number - epoch_stat (dict): a large collection of metrics of the current epoch, including stat - info (dict): the usual dict out of the non-generator version of the trainer You can even iterate on several different trainers at the same time: ```python trainer1 = OnPolicyTrainer(...) trainer2 = OnPolicyTrainer(...) for result1, result2, ... in zip(trainer1, trainer2, ...): compare_results(result1, result2, ...) ``` Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-18 00:26:14 +08:00
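
The "trainers as generators" pattern from #559 can be illustrated with a minimal, self-contained sketch. `ToyTrainer` below is a hypothetical stand-in, not Tianshou's `OnPolicyTrainer`: its constructor, the placeholder reward, and the keys in `epoch_stat` and `info` are assumptions made for illustration. It only shows how an object can yield `(epoch, epoch_stat, info)` once per epoch while still offering a blocking `run()`.

```python
from typing import Any, Dict, Tuple


class ToyTrainer:
    """Hypothetical trainer that can be iterated one epoch at a time."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch
        self.epoch = 0
        self.best_reward = float("-inf")

    def __iter__(self) -> "ToyTrainer":
        # Reset state so the trainer can be iterated from the start.
        self.epoch = 0
        self.best_reward = float("-inf")
        return self

    def __next__(self) -> Tuple[int, Dict[str, Any], Dict[str, Any]]:
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        # Placeholder for real training and testing: reward grows with epoch.
        reward = float(self.epoch)
        self.best_reward = max(self.best_reward, reward)
        epoch_stat = {"epoch": self.epoch, "reward": reward}
        info = {"best_reward": self.best_reward}
        return self.epoch, epoch_stat, info

    def run(self) -> Dict[str, Any]:
        """Non-generator usage: consume all epochs, return the last info."""
        info: Dict[str, Any] = {}
        for _, _, info in self:
            pass
        return info


# Step through one trainer epoch by epoch.
results = list(ToyTrainer(max_epoch=3))

# Advance two trainers in lockstep and pair up their per-epoch results.
paired = [
    (stat1["reward"], stat2["reward"])
    for (_, stat1, _), (_, stat2, _) in zip(ToyTrainer(2), ToyTrainer(2))
]
```

Implementing `__iter__` and `__next__` on the trainer itself, rather than writing a separate generator function, keeps the per-epoch state on one object, so the same instance supports both the loop form and the blocking `run()` call.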

7 Commits
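
The `show_progress` option from #641 relies on a no-op object that mimics the progress-bar interface. The sketch below is an assumption-laden illustration rather than the code in `tianshou.utils`: the method set (`update`, `set_postfix`, context-manager support), the counter attribute `n`, and the helper `pick_progress` are chosen here for demonstration only.

```python
from typing import Any, Type


class DummyTqdm:
    """No-op stand-in for a progress bar: same calls, nothing displayed."""

    def __init__(self, total: int, **kwargs: Any) -> None:
        self.total = total
        self.n = 0  # steps counted so far, kept so callers can still read it

    def set_postfix(self, **kwargs: Any) -> None:
        # A real progress bar would render these values; here they are ignored.
        pass

    def update(self, n: int = 1) -> None:
        self.n += n

    def __enter__(self) -> "DummyTqdm":
        return self

    def __exit__(self, *args: Any) -> None:
        pass


def pick_progress(show_progress: bool, real_bar: Type[Any]) -> Type[Any]:
    """Hypothetical helper: use the real bar only when show_progress is True."""
    return real_bar if show_progress else DummyTqdm


# With show_progress=False the loop body is unchanged, but nothing is printed.
# `object` stands in for the real progress-bar class, which is not needed here.
bar_cls = pick_progress(False, real_bar=object)
with bar_cls(total=3, desc="Epoch #1") as bar:
    for _ in range(3):
        bar.update(1)
        bar.set_postfix(loss=0.0)
    steps_done = bar.n
```

Because the stand-in accepts the same calls as the real bar, the training loop needs no branching on `show_progress` beyond the single class selection.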