diff --git a/README.md b/README.md
index bc2414a..7f72b02 100644
--- a/README.md
+++ b/README.md
@@ -83,12 +83,6 @@ Try to use full names. Don't use abbrevations for class/function/variable names
 
 The """xxx""" comment should be written right after class/function. Also comment the part that's not intuitive during the code. We must comment, but for now we don't need to polish them.
 
-# High Priority TODO
-
-For Haosheng and Tongzheng: separate actor and critic, rewrite the interfaces for policy
-
-Others can still focus on the task below.
-
 ## TODO
 
 Search based method parallel.
@@ -106,6 +100,18 @@ Note: install openai/gym first to run the Atari environment; note that interface
 
 Without preprocessing and other tricks, this example will not train to any meaningful results. Codes should past two tests: individual module test and run through this example code.
 
+## Some bug to fix
+
+For DQN and other deterministic policy: $\epsilon$-greedy or other exploration during collection?
+
+In Batch.py, notice that we cannot stop by setting num_timestep
+
+Magic numbers
+
+## One idea
+
+Like zhusuan, we can register losses background so that we need not claim it in the example.
+
 ## Dependency
 Tensorflow (Version >= 1.4)
 Gym