modified README

rtz19970824 2018-02-24 16:26:19 +08:00
parent f3aee448e0
commit a40e5aec54

@@ -83,12 +83,6 @@ Try to use full names. Don't use abbreviations for class/function/variable names
The """xxx""" docstring should be written right after the class/function definition. Also comment any part of the code that is not intuitive. We must comment, but for now the comments do not need to be polished.
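A minimal sketch of this commenting convention, for illustration only. The class `ReplayBuffer` and its methods are hypothetical names and are not taken from this repository.

```python
class ReplayBuffer(object):
    """Store transitions collected from the environment (hypothetical example)."""

    def __init__(self, capacity):
        """Create an empty buffer holding at most `capacity` transitions."""
        self.capacity = capacity
        self.transitions = []

    def add(self, transition):
        """Append a transition, discarding the oldest one when the buffer is full."""
        self.transitions.append(transition)
        # Not obvious at a glance: trim from the front so the newest data is kept.
        if len(self.transitions) > self.capacity:
            self.transitions.pop(0)
```

The docstring sits directly under each `class`/`def` line, and the one inline comment marks the only step whose intent is not clear from the code itself.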
# High Priority TODO
For Haosheng and Tongzheng: separate actor and critic, rewrite the interfaces for policy
Others can still focus on the task below.
## TODO
Parallelize search-based methods.
@@ -106,6 +100,18 @@ Note: install openai/gym first to run the Atari environment; note that interface
Without preprocessing and other tricks, this example will not train to any meaningful results. Code should pass two tests: the individual module test and a run through this example code.
## Some bugs to fix
For DQN and other deterministic policies: should we use $\epsilon$-greedy or other exploration during collection?
In Batch.py, note that we cannot stop collection by setting num_timestep
Magic numbers
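
For the exploration question above, $\epsilon$-greedy action selection could look roughly like the sketch below. It is a standalone illustration, not the project's collection code: the function name `epsilon_greedy_action` and its arguments are assumptions, and `q_values` stands for the per-action values a DQN would output for one state.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy action.

    All names here are hypothetical; q_values is a sequence of action values.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest estimated value.
    return max(range(len(q_values)), key=lambda action: q_values[action])
```

With `epsilon=0.0` this reduces to the deterministic greedy policy, so the same function could serve both data collection and evaluation.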
## One idea
Like zhusuan, we can register losses in the background so that we need not declare them in the example.
## Dependency
TensorFlow (version >= 1.4)
Gym