Commit Graph

  • b5c3ddabfa
    Add discrete Conservative Q-Learning for offline RL (#359) Yi Su 2021-05-11 18:24:48 -07:00
  • 84f58636eb
    Make trainer resumable (#350) Ark 2021-05-06 08:53:53 +08:00
  • f4e05d585a
    Support deterministic evaluation for onpolicy algorithms (#354) Yuge Zhang 2021-04-27 21:22:39 +08:00
  • ff4d3cd714
    Support different state size and fix exception in venv.__del__ (#352) n+e 2021-04-25 15:23:46 +08:00
  • bbc3c3e32d
    Add numerical analysis tool and interactive plot (#341) ChenDRAG 2021-04-22 12:49:54 +08:00
  • 844d7703c3
    NPG Mujoco benchmark release (#347) ChenDRAG 2021-04-21 16:31:20 +08:00
  • 1dcf65fe21
    Add NPG policy (#344) ChenDRAG 2021-04-21 09:52:15 +08:00
  • c059f98abf
    fix atari_bcq (#345) n+e 2021-04-20 22:59:21 +08:00
  • a57503c0aa
    TRPO benchmark release (#340) ChenDRAG 2021-04-19 17:05:06 +08:00
  • f68cb78ed7
    Add self-hosted runner for GPU checks (#339) n+e 2021-04-18 16:57:37 +08:00
  • 5057b5c89e
    Add TRPO policy (#337) ChenDRAG 2021-04-16 20:37:12 +08:00
  • 333b8fbd66
    add plotter (#335) ChenDRAG 2021-04-14 14:06:36 +08:00
  • dd4a01132c
    Fix SAC loss explode (#333) v0.4.1 ChenDRAG 2021-04-04 17:33:35 +08:00
  • 825da9bc53
    add cross-platform test and release 0.4.1 (#331) n+e 2021-03-31 15:14:22 +08:00
  • 09692c84fe
    fix numpy>=1.20 typing check (#323) n+e 2021-03-30 16:06:03 +08:00
  • 6426a39796
    ppo benchmark (#330) ChenDRAG 2021-03-30 11:50:35 +08:00
  • 5d580c3662
    refactor ppo (#329) ChenDRAG 2021-03-28 18:28:36 +08:00
  • 1730a9008a
    A2C benchmark for mujoco (#325) ChenDRAG 2021-03-28 13:12:43 +08:00
  • 105b277b87
    hotfix: keep statistics of buffer when resetting buffer in on-policy trainer (#328) ChenDRAG 2021-03-27 16:58:48 +08:00
  • 8963a14327
    fix exception in tutorials/dqn.rst (#327) n+e 2021-03-26 12:57:00 +08:00
  • 7db21f3df6
    Test on finite vector env (#324) Yuge Zhang 2021-03-25 22:59:34 +08:00
  • 3ac67d9974
    refactor A2C/PPO, change behavior of value normalization (#321) ChenDRAG 2021-03-25 10:12:39 +08:00
  • 47c77899d5
    Add REINFORCE benchmark for mujoco (#320) ChenDRAG 2021-03-24 19:59:53 +08:00
  • e27b5a26f3
    Refactor PG algorithm and change behavior of compute_episodic_return (#319) ChenDRAG 2021-03-23 22:05:48 +08:00
  • 2c11b6e43b
    Add lr_scheduler option for Onpolicy algorithm (#318) ChenDRAG 2021-03-22 16:57:24 +08:00
  • 4d92952a7b
    Remap action to fit gym's action space (#313) ChenDRAG 2021-03-21 16:45:50 +08:00
  • 0c7117dd55
    fix concepts.rst with regard to new buffer behavior (#316) n+e 2021-03-20 21:46:36 +08:00
  • ec23c7efe9
    fix qvalue mask_action error for obs_next (#310) n+e 2021-03-15 08:06:24 +08:00
  • 243ab43b3c
    support observation normalization in BaseVectorEnv (#308) ChenDRAG 2021-03-11 20:50:20 +08:00
  • 5c53f8c1f8
    fix reward_metric & n_episode bug in on policy algorithm (#306) ChenDRAG 2021-03-08 14:35:30 +08:00
  • e605bdea94
    MuJoCo Benchmark - DDPG, TD3, SAC (#305) ChenDRAG 2021-03-07 19:21:02 +08:00
  • 389bdb7ed3
    Merge pull request #302 from thu-ml/dev v0.4.0 n+e 2021-03-02 20:28:29 +08:00
  • 454c86c469
    fix venv seed, add TOC in docs, and split buffer.py into several files (#303) n+e 2021-03-02 12:28:28 +08:00
  • 31e7f445d1
    fix vecenv action_space randomness (#300) n+e 2021-03-01 15:44:03 +08:00
  • f22b539761
    Remove reward_normalization option in offpolicy algorithm (#298) ChenDRAG 2021-02-27 11:20:43 +08:00
  • 3108b9db0d
    Add Timelimit trick to optimize policies (#296) ChenDRAG 2021-02-26 13:23:18 +08:00
  • 9b61bc620c
    add logger (#295) ChenDRAG 2021-02-24 14:48:42 +08:00
  • e99e1b0fdd
    Improve buffer.prev() & buffer.next() (#294) Trinkle23897 2021-02-22 19:19:22 +08:00
  • 7036073649
    Trainer refactor: some definition change (#293) ChenDRAG 2021-02-21 13:06:02 +08:00
  • 150d0ec51b
    Step collector implementation (#280) ChenDRAG 2021-02-19 10:33:49 +08:00
  • d918022ce9
    merge master into dev Trinkle23897 2021-02-18 12:46:55 +08:00
  • cb65b56b13
    v0.3.2 (#292) v0.3.2 n+e 2021-02-16 09:31:46 +08:00
  • d003c8e566
    fix 2 bugs of batch (#284) n+e 2021-02-16 09:01:54 +08:00
  • f528131da1
    hotfix: fix test failure in CUDA environment (#289) ChenDRAG 2021-02-09 17:13:40 +08:00
  • e3ee415b1a
    temporary fix numpy<1.20.0 (#281) Trinkle23897 2021-02-08 12:59:37 +08:00
  • c838f2f0e9
    fix 2 bugs of batch (#284) n+e 2021-02-02 19:28:05 +08:00
  • f0129f4ca7
    Add CachedReplayBuffer and ReplayBufferManager (#278) ChenDRAG 2021-01-29 12:23:18 +08:00
  • 1eb6137645
    Add QR-DQN algorithm (#276) wizardsheng 2021-01-28 09:27:05 +08:00
  • a511cb4779
    Add offline trainer and discrete BCQ algorithm (#263) v0.3.1 Jialu Zhu 2021-01-20 02:13:04 -08:00
  • a633a6a028
    update utils.network (#275) ChenDRAG 2021-01-20 16:54:13 +08:00
  • 866e35d550
    fix readme (#273) 蔡舒起 2021-01-16 19:27:35 +08:00
  • c6f2648e87
    Add C51 algorithm (#266) wizardsheng 2021-01-06 10:17:45 +08:00
  • 5d13d8a453
    Saving and loading replay buffer with HDF5 (#261) Nico Gürtler 2020-12-17 01:58:43 +01:00 (see the usage sketch after this list)
  • cd481423dc
    sac mujoco result (#246) Trinkle23897 2020-11-09 16:43:55 +08:00
  • c97aa4065e
    add singleton pattern version of summary_writer (#230) rocknamx 2020-10-31 16:38:54 +08:00
  • b364f1a26f
    specify the meaning of logits in documentation (#238) v0.3.0.post1 Trinkle23897 2020-10-08 23:16:15 +08:00
  • 5ed6c1c7aa
    change the step in trainer (#235) n+e 2020-10-04 21:55:43 +08:00
  • 710966eda7
    change API of train_fn and test_fn (#229) v0.3.0 n+e 2020-09-26 16:35:37 +08:00
  • d87d31a705
    Update Anaconda support (#228) n+e 2020-09-25 15:07:36 +08:00
  • 83bd1ec9e2
    Add MANIFEST.in to include license file in source distribution (#227) Joshua Adelman 2020-09-24 20:15:24 -04:00
  • dcfcbb37f4
    add PSRL policy (#202) v0.3.0rc0 Yao Feng 2020-09-23 20:57:33 +08:00
  • bf39b9ef7d
    clarify updating state (#224) rocknamx 2020-09-22 16:28:46 +08:00
  • eec0826fd3
    change PER update interface in BasePolicy (#217) n+e 2020-09-16 17:43:19 +08:00
  • 623bf24f0c
    fix unittest (#218) n+e 2020-09-14 15:59:32 +08:00
  • a6ee979609
    implement sac for discrete action settings (#216) danagi 2020-09-14 14:59:23 +08:00
  • b284ace102
    type check in unit test (#200) n+e 2020-09-13 19:31:50 +08:00
  • c91def6cbc
    code format and update function signatures (#213) n+e 2020-09-12 15:39:01 +08:00
  • 16d8e9b051
    SAC implementation update (#212) danagi 2020-09-12 08:44:50 +08:00
  • b86d78766b
    fix docs and add docstring check (#210) n+e 2020-09-11 07:55:37 +08:00
  • 64af7ea839
    fix critical bugs in MAPolicy and docs update (#207) v0.2.7 n+e 2020-09-08 21:10:48 +08:00
  • 380e9e911d
    fix atari examples (#206) n+e 2020-09-06 23:05:33 +08:00
  • 8bb8ecba6e
    set policy.eval() before collector.collect (#204) n+e 2020-09-06 16:20:16 +08:00
  • 34f714a677
    Numba acceleration (#193) Trinkle23897 2020-09-02 13:03:32 +08:00
  • 5b49192a48
    DQN Atari examples (#187) yingchengyang 2020-08-30 05:48:09 +08:00
  • 94bfb32cc1
    optimize training procedure and improve code coverage (#189) n+e 2020-08-27 12:15:18 +08:00
  • a9f9940d17
    code refactor for venv (#179) v0.2.6 youkaichao 2020-08-19 15:00:24 +08:00
  • 311a2beafb
    Pickle compatible for replay buffer and improve buffer.get (#182) n+e 2020-08-16 16:26:23 +08:00
  • 7f3b817b24
    add policy.update to enable post process and remove collector.sample (#180) youkaichao 2020-08-15 16:10:42 +08:00
  • 140b1c2cab
    Improve PER (#159) n+e 2020-08-06 10:26:24 +08:00
  • 312b7551cc
    Add BipedalWalkerHardcore-v3 SAC example (#177) Imone 2020-08-05 10:29:41 +08:00
  • f2bcc55a25
    ShmemVectorEnv Implementation (#174) ChenDRAG 2020-08-04 13:39:05 +08:00
  • 996e2f7c9b
    Add profile workflow (#143) ChenDRAG 2020-08-02 18:24:40 +08:00
  • 32df0567bb
    use nn.Sequential in DQN (#176) youkaichao 2020-08-02 15:14:44 +08:00
  • 99a1d40e85
    Dueling DQN (#170) yingchengyang 2020-07-29 19:44:42 +08:00
  • ad395b5235
    bugfix for test_async_env (#171) youkaichao 2020-07-28 20:06:01 +08:00
  • b7a4015db7
    doc update and do not force save 'policy' in np format (#168) Trinkle23897 2020-07-27 16:54:14 +08:00
  • e024afab8c
    Asynchronous sampling vector environment (#134) Alexis DUBURCQ 2020-07-26 12:01:21 +02:00
  • 30368c29a6
    Replay buffer allows stack_num = 1 (#165) Alexis DUBURCQ 2020-07-25 13:33:44 +02:00
  • 38a95c19da
    Yet another 3 fix (#160) n+e 2020-07-24 17:38:12 +08:00
  • bfeffe1f97
    unify single-env and multi-env in collector (#157) youkaichao 2020-07-23 16:40:53 +08:00
  • 352a518399
    3 fix (#158) n+e 2020-07-23 15:12:02 +08:00
  • bd9c3c7f8d
    docs fix and v0.2.5 (#156) v0.2.5 n+e 2020-07-22 14:42:08 +08:00
  • 089b85b6a2
    Fix shape inconsistency in A2CPolicy and PPOPolicy (#155) n+e 2020-07-21 22:24:06 +08:00
  • 865ef6c693
    Improve to_torch/to_numpy converters (#147) Alexis DUBURCQ 2020-07-21 10:47:56 +02:00
  • 8c32d99c65
    Add multi-agent example: tic-tac-toe (#122) youkaichao 2020-07-21 14:59:49 +08:00
  • d09b69e594
    buffer update bug fix (#154) ChenDRAG 2020-07-20 22:12:57 +08:00
  • fe5555d2a1
    write tutorials to specify the standard of Batch (#142) youkaichao 2020-07-19 15:20:35 +08:00
  • 3a08e27ed4
    Standardized behavior of Batch.cat and misc code refactor (#137) youkaichao 2020-07-16 19:36:32 +08:00
  • 09e10e384f
    Vector env enable select worker (#132) Alexis DUBURCQ 2020-07-13 16:38:42 +02:00
  • 26fb87433d
    Improve collector (#125) v0.2.4.post1 youkaichao 2020-07-13 00:24:31 +08:00
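
The HDF5 persistence added in #261 above is straightforward to exercise. The following is a minimal sketch, assuming a recent tianshou release (0.3.1 or later) with h5py available; the buffer size, the toy transition, and the file name buffer.h5 are illustrative choices, not part of the original change.

    import numpy as np
    from tianshou.data import Batch, ReplayBuffer

    # Build a tiny buffer and store one toy transition.
    # (tianshou >= 0.4 takes a Batch here; 0.3.x used keyword
    # arguments, e.g. buf.add(obs=..., act=..., rew=..., done=...).)
    buf = ReplayBuffer(size=16)
    buf.add(Batch(
        obs=np.zeros(4, dtype=np.float32),       # toy observation
        act=1,
        rew=0.5,
        done=False,
        obs_next=np.ones(4, dtype=np.float32),
        info={"env_id": 0},                      # illustrative metadata
    ))

    # Persist to disk and restore later (the feature added in #261).
    buf.save_hdf5("buffer.h5")                   # path is illustrative
    restored = ReplayBuffer.load_hdf5("buffer.h5")
    print(len(restored))                         # -> 1

A buffer serialized this way can be reloaded later, for example as the dataset consumed by the offline trainer and the discrete BCQ/CQL algorithms that appear earlier in this log (#263, #359).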