modified README
parent f3aee448e0
commit a40e5aec54
README.md: 18 lines changed
@@ -83,12 +83,6 @@ Try to use full names. Don't use abbreviations for class/function/variable names
 
 The """xxx""" comment should be written right after the class/function definition. Also comment the parts of the code that are not intuitive. We must comment, but for now we don't need to polish the comments.
 
-# High Priority TODO
-
-For Haosheng and Tongzheng: separate actor and critic, rewrite the interfaces for policy
-
-Others can still focus on the task below.
-
 ## TODO
 Search based method parallel.
 
@@ -106,6 +100,18 @@ Note: install openai/gym first to run the Atari environment; note that interface
 
 Without preprocessing and other tricks, this example will not train to any meaningful results. Code should pass two tests: the individual module tests and a run through this example code.
 
+## Some bugs to fix
+
+For DQN and other deterministic policies: $\epsilon$-greedy or other exploration during collection?
+
+In Batch.py, notice that we cannot stop by setting num_timestep
+
+Magic numbers
+
+## One idea
+
+Like zhusuan, we can register losses in the background so that we need not declare them in the example.
+
 ## Dependency
 TensorFlow (version >= 1.4)
 Gym
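
The added note on DQN asks whether $\epsilon$-greedy or some other exploration scheme should be applied while collecting data. As a point of reference, here is a minimal sketch of $\epsilon$-greedy action selection over discrete Q-values. The function name `epsilon_greedy_action` and its parameters are illustrative assumptions and are not part of this repository's policy interface.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: sequence of estimated action values, one per discrete action.
    epsilon: exploration probability in [0, 1].
    rng: object providing random() and randrange(); a hypothetical hook
        that allows a seeded random.Random instance to be passed in.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        # Explore: sample uniformly over all actions.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda action: q_values[action])
```

With `epsilon=0.0` the policy stays deterministic, so the same function could serve both collection (with exploration) and evaluation (without), assuming the collection code is allowed to pass its own `epsilon`.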