Commit Graph

  • 5599a6d1a6 Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130) youkaichao 2020-07-12 23:45:42 +08:00
  • affeec13de Improve Batch (#128) youkaichao 2020-07-11 21:46:01 +08:00
  • 2564e989fb Improve Batch (#126) youkaichao 2020-07-11 09:44:47 +08:00
  • 47e8e2686c move atari wrapper to examples and publish v0.2.4 (#124) v0.2.4 n+e 2020-07-10 17:20:39 +08:00
  • ff99662fe6 bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer (#120) youkaichao 2020-07-10 08:24:11 +08:00
  • e767de044b Remove dummy net code (#123) youkaichao 2020-07-09 22:57:01 +08:00
  • aa3c453f42 Raise exception for Batch __getitem__. (#119) Alexis DUBURCQ 2020-07-08 16:29:37 +02:00
  • 7f9a1f1328 add type check for each element rather than the first element (#112) youkaichao 2020-07-08 21:00:00 +08:00
  • 481015932c bugfix for hang in list(Batch()) (#117) youkaichao 2020-07-08 17:09:27 +08:00
  • f5e007932f fix Batch init for types other than number and bool (#115) youkaichao 2020-07-08 13:45:29 +08:00
  • dbbb859ec5 doc fix (#113) youkaichao 2020-07-08 08:30:01 +08:00
  • 9c7d31e5d6 bugfix for empty_ (#114) youkaichao 2020-07-08 08:10:34 +08:00
  • 69caf89908 Fix to_torch converters (#111) Alexis DUBURCQ 2020-07-07 12:40:55 +02:00
  • 8913bf36b1 change Batch.empty to in-place fill; add copy option for Batch construction (#110) youkaichao 2020-07-06 20:30:15 +08:00
  • 5b1373924e doc fix; policy train/eval signiture fix (#109) youkaichao 2020-07-06 10:44:34 +08:00
  • db0e2e5cd2 Advanced Batch slicing & minor fix of RNN support (#106) n+e 2020-06-30 18:02:44 +08:00
  • c639446c66 add copybutton Trinkle23897 2020-06-29 13:45:01 +08:00
  • e0f4862d01 store RNN hidden states in policy._state and add sample_avail in buffer (#19) Trinkle23897 2020-06-29 12:18:52 +08:00
  • 60cfc373f8 fix #98, support #99 (#102) danagi 2020-06-27 21:40:09 +08:00
  • a951a32487 Enable partial stacking at Batch level (#100) Alexis DUBURCQ 2020-06-27 03:06:40 +02:00
  • 70aa7bf93e Use lower-level API to reduce overhead. (#97) Alexis DUBURCQ 2020-06-26 12:37:50 +02:00
  • 5ac9f9b144 Do not check bounds since it is always valid when everything is fine. (#95) Alexis DUBURCQ 2020-06-25 15:06:35 +02:00
  • 3086b5c31d Buffer refactoring to support batch over batch reliably (#93) Alexis DUBURCQ 2020-06-25 14:39:30 +02:00
  • 506cc97ba5 fix #91 (#94) rocknamx 2020-06-25 07:02:59 +08:00
  • 49f43e9f1f Fix Batch to numpy compatibility (#92) Alexis DUBURCQ 2020-06-24 15:43:48 +02:00
  • ebc551a25e Fix support of 0-dim numpy array (#89) Alexis DUBURCQ 2020-06-24 00:55:24 +02:00
  • d7dd3105bc Fix tuple support. (#88) Alexis DUBURCQ 2020-06-23 17:37:26 +02:00
  • ec270759ab Batch refactoring (#87) Alexis DUBURCQ 2020-06-23 16:50:59 +02:00
  • 13828f6309 added noise param to collector for test phase, fixed examples to adapt modification (#86) danagi 2020-06-23 07:20:51 +08:00
  • e8b44bbaf4 move sac_mcc to examples (runtime too long) Trinkle23897 2020-06-22 21:39:00 +08:00
  • 6a2963bd64 fix #85 Trinkle23897 2020-06-22 17:11:26 +08:00
  • a655334d00 change batch.append to batch.cat Trinkle23897 2020-06-20 22:23:12 +08:00
  • aff0f9aee0 fix append batch over batch Trinkle23897 2020-06-20 22:03:22 +08:00
  • 268f9d0533 type signature correction (#83) youkaichao 2020-06-20 09:57:16 +08:00
  • 81e4a16ef2 fix a bug in re-index replay buffer (fix #82) Trinkle23897 2020-06-17 16:37:51 +08:00
  • c59ad40aef Add auto alpha tuning and exploration noise for sac. (#80) danagi 2020-06-16 22:17:28 +08:00
  • 263e490b76 fix #79 Trinkle23897 2020-06-16 16:54:16 +08:00
  • 5f2f05a570 fix #40 Trinkle23897 2020-06-13 17:06:08 +08:00
  • 3774258cc7 fix unittest Trinkle23897 2020-06-11 09:07:45 +08:00
  • 1a914336f7 add random action in collector (fix #78) Trinkle23897 2020-06-11 08:57:37 +08:00
  • 397e92b0fc fix #77 Trinkle23897 2020-06-10 12:06:56 +08:00
  • f1951780ab fix a bug of storing batch over batch data into buffer Trinkle23897 2020-06-09 18:46:14 +08:00
  • b32b96cd3e seperate flake8 lint Trinkle23897 2020-06-09 10:33:48 +08:00
  • 513573ea82 add link Trinkle23897 2020-06-08 22:20:52 +08:00
  • 560116d0b2 cheat sheet Trinkle23897 2020-06-08 21:53:00 +08:00
  • 52be533d06 Enable getattr for SubprocVecEnv. (#74) Alexis DUBURCQ 2020-06-05 11:17:43 +02:00
  • 66be5641b6 Fix to_numpy. (#73) Alexis DUBURCQ 2020-06-04 16:32:05 +02:00
  • 7bf202f195 polish docs Trinkle23897 2020-06-03 17:04:26 +08:00
  • dc451dfe88 nstep all (fix #51) Trinkle23897 2020-06-03 13:59:47 +08:00
  • ff81a18f42 compute_nstep_returns (item 2 of #51) Trinkle23897 2020-06-02 22:29:50 +08:00
  • f818a2467b zh_CN docs Trinkle23897 2020-06-02 08:51:14 +08:00
  • 5f2c5347df v0.2.3 v0.2.3 Trinkle23897 2020-06-01 09:37:30 +08:00
  • ba1b3e54eb fix #69 Trinkle23897 2020-06-01 08:30:09 +08:00
  • 1fce527c77 Fix 'to_tensor' dtype/device forwarding for Batch over Batch. (#68) Alexis DUBURCQ 2020-05-30 15:40:31 +02:00
  • 529a4cf44c Add pickle support for Batch. Fix VectorEnv. (#67) Alexis DUBURCQ 2020-05-30 15:29:33 +02:00
  • dd3e2130bb Infer the right dtype for replay buffers. (#64) Alexis DUBURCQ 2020-05-29 16:27:03 +02:00
  • 8af7196a9a Robust conversion from/to numpy/pytorch (#63) Alexis DUBURCQ 2020-05-29 14:45:21 +02:00
  • b5093ecb56 Minor refactor for Batch class. (#61) Alexis DUBURCQ 2020-05-29 11:56:46 +02:00
  • be9ce44290 fix #59 Trinkle23897 2020-05-29 11:49:47 +08:00
  • d2b2fa87c0 fix #56 Trinkle23897 2020-05-29 08:03:37 +08:00
  • de556fd22d item3 of #51 Trinkle23897 2020-05-27 11:02:23 +08:00
  • 6237cc0d52 fix dqn zero eps (#52) magicly 2020-05-21 11:35:41 +08:00
  • 57bca16f94 Fix log_prob and PPO dual_clip (#49) Imone 2020-05-18 16:23:35 +08:00
  • 70122dc03d oinit with 0 bias Trinkle23897 2020-05-17 17:06:20 +08:00
  • 3271c92609 orthogonal init for ppo in test script Trinkle23897 2020-05-16 20:27:01 +08:00
  • 0eef0ca198 fix optional type syntax Trinkle23897 2020-05-16 20:08:32 +08:00
  • 3243484f8e show stat in pytest Trinkle23897 2020-05-16 08:48:12 +08:00
  • 9b26137cd2 add type annotation Trinkle23897 2020-05-12 11:31:47 +08:00
  • 075825325e add preprocess_fn (#42) Trinkle23897 2020-05-05 13:39:51 +08:00
  • 04b091d975 fix max-grad-norm err in a2c (#46) Trinkle23897 2020-05-04 12:33:04 +08:00
  • c2a7caf806 add recurrent actor and critic Trinkle23897 2020-04-30 16:31:40 +08:00
  • 134f787e24 reserve 'policy' keyword in replay buffer Trinkle23897 2020-04-29 17:48:48 +08:00
  • e58fc78546 build docs Trinkle23897 2020-04-29 14:16:38 +08:00
  • bb2f833d0e support Batch of Batch and fix bugs (#38) Trinkle23897 2020-04-29 12:14:53 +08:00
  • 8f718d9b13 Fix log_prob in SAC (#41) nicoguertler 2020-04-28 17:44:15 +02:00
  • 69e4b3d301 fix setup err on building docs Trinkle23897 2020-04-28 21:11:40 +08:00
  • 80d661907e Multimodal obs (#38, #27, #25) Trinkle23897 2020-04-28 20:56:02 +08:00
  • 959955fa2a fix historical issues Trinkle23897 2020-04-26 16:13:51 +08:00
  • 6b96f124ae fix pdqn v0.2.2 Trinkle23897 2020-04-26 15:11:20 +08:00
  • b23749463e Prioritized DQN (#30) rocknamx 2020-04-26 12:05:58 +08:00
  • 70290346ea compatible with torch==1.5.0 (fix #37) Trinkle23897 2020-04-26 11:04:45 +08:00
  • 8812eaa502 fix #36 Trinkle23897 2020-04-23 22:06:18 +08:00
  • 205698dd66 fix #33 (#34) Minghao Zhang 2020-04-21 15:36:08 +08:00
  • 4fd826761c enable null buffer in test collector Trinkle23897 2020-04-20 11:50:18 +08:00
  • 815f3522bb imitation with discrete action space Trinkle23897 2020-04-20 11:25:20 +08:00
  • 6bf1ea644d fix ppo Trinkle23897 2020-04-19 14:30:42 +08:00
  • 680fc0ffbe gae Trinkle23897 2020-04-14 21:11:06 +08:00
  • 7b65d43394 vanilla imitation learning Trinkle23897 2020-04-13 19:37:27 +08:00
  • befdfb07e8 polish docs Trinkle23897 2020-04-11 19:29:46 +08:00
  • 6a244d1fbb save_fn Trinkle23897 2020-04-11 16:54:27 +08:00
  • 74407e13da env info log_fn (#28) Trinkle23897 2020-04-10 18:02:05 +08:00
  • ecfcb9f295 fix docs Trinkle23897 2020-04-10 11:16:33 +08:00
  • 3cc22b7c0c __call__ -> forward Trinkle23897 2020-04-10 10:47:16 +08:00
  • 13086b7f64 add ignore_obs_next in buffer Trinkle23897 2020-04-10 09:01:17 +08:00
  • 19f2cce294 seealso and change policy dir structure Trinkle23897 2020-04-09 21:36:53 +08:00
  • 6da80e045a fix rnn (#19), add __repr__, and fix #26 Trinkle23897 2020-04-09 19:53:45 +08:00
  • 86572c66d4 maybe finished rnn? Trinkle23897 2020-04-08 21:13:15 +08:00
  • d9d2763dad first version with full documentation v0.2.1 Trinkle23897 2020-04-07 11:50:34 +08:00
  • 6c8edf6a3a codecov badge Trinkle23897 2020-04-07 11:17:10 +08:00
  • e0809ff135 add policy docs (#21) Trinkle23897 2020-04-06 19:36:59 +08:00