7 Commits

Author SHA1 Message Date
Trinkle23897
ff81a18f42 compute_nstep_returns (item 2 of #51) 2020-06-02 22:29:50 +08:00
Trinkle23897
c2a7caf806 add recurrent actor and critic 2020-04-30 16:31:40 +08:00
Trinkle23897
6bf1ea644d fix ppo 2020-04-19 14:30:42 +08:00
Trinkle23897
7b65d43394 vanilla imitation learning 2020-04-13 19:37:27 +08:00
Trinkle23897
fdc969b830 fix collector 2020-03-25 14:08:28 +08:00
Trinkle23897
a87563b8e6 add demo of ppo continuous action task 2020-03-21 17:04:42 +08:00
Trinkle23897
8bd8246b16 refract test code 2020-03-21 10:58:01 +08:00