2017-12-15 14:24:08 +08:00
# TODO
Separate the actor and the critic. (Important: we need to focus on this in the near term.)
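
One possible shape for the separation, as a rough sketch only: the actor and the critic are two classes that own their parameters independently. `LinearActor`, `LinearCritic`, and the linear models are hypothetical placeholders written in plain NumPy, not the project's existing code.

```python
import numpy as np


class LinearActor:
    """Hypothetical actor: maps an observation to action probabilities."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Parameters owned by the actor only.
        self.weights = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def action_probs(self, observation):
        logits = np.asarray(observation, dtype=float) @ self.weights
        logits = logits - logits.max()  # stabilize the softmax
        exp = np.exp(logits)
        return exp / exp.sum()


class LinearCritic:
    """Hypothetical critic: maps an observation to a scalar state value."""

    def __init__(self, obs_dim, seed=1):
        rng = np.random.default_rng(seed)
        # Parameters owned by the critic only; nothing is shared with the actor.
        self.weights = rng.normal(scale=0.1, size=obs_dim)

    def value(self, observation):
        return float(np.asarray(observation, dtype=float) @ self.weights)
```

With this split, each component can be trained, saved, or swapped out without touching the other.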
# policy
YongRen
### base, stochastic
Write Gaussian following OnehotCategorical; it can live in the same file, stochastic.py.
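
A minimal sketch of what a Gaussian policy might look like, assuming a diagonal covariance and a state-independent log standard deviation. The interface (`act`, `log_prob`, the `mean_fn` callable) is an assumption for illustration; it is not copied from OnehotCategorical, whose actual methods should be mirrored instead.

```python
import numpy as np


class Gaussian:
    """Hypothetical diagonal-Gaussian policy over continuous actions."""

    def __init__(self, mean_fn, log_std, seed=None):
        self.mean_fn = mean_fn  # assumed: callable, observation -> action mean
        self.log_std = np.asarray(log_std, dtype=float)
        self.rng = np.random.default_rng(seed)

    def act(self, observation):
        # Sample an action: mean + std * standard normal noise.
        mean = np.asarray(self.mean_fn(observation), dtype=float)
        noise = self.rng.standard_normal(mean.shape)
        return mean + np.exp(self.log_std) * noise

    def log_prob(self, observation, action):
        # Log-density of a diagonal Gaussian, summed over action dimensions.
        mean = np.asarray(self.mean_fn(observation), dtype=float)
        z = (np.asarray(action, dtype=float) - mean) / np.exp(self.log_std)
        per_dim = -0.5 * z ** 2 - self.log_std - 0.5 * np.log(2.0 * np.pi)
        return float(np.sum(per_dim))
```

Storing the log standard deviation rather than the standard deviation keeps the scale positive without an explicit constraint.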
### deterministic
Not sure how to write this yet, but it should at least have an act() method to interact with the environment.

Use QValuePolicy in base.py as the reference; it should have at least the methods listed there.
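
As a rough sketch of the minimum interface, a deterministic policy could be an abstract base class with a single required act() method. `DeterministicPolicy`, `GreedyQPolicy`, and `q_fn` are hypothetical names; the real method list should come from QValuePolicy in base.py, which is not reproduced here.

```python
from abc import ABC, abstractmethod

import numpy as np


class DeterministicPolicy(ABC):
    """Hypothetical base class: the same observation always gives the same action."""

    @abstractmethod
    def act(self, observation):
        """Return the action to send to the environment."""


class GreedyQPolicy(DeterministicPolicy):
    """Illustrative subclass: pick the action with the highest Q-value."""

    def __init__(self, q_fn):
        self.q_fn = q_fn  # assumed: callable, observation -> per-action Q-values

    def act(self, observation):
        return int(np.argmax(self.q_fn(observation)))
```

A caller only needs act(), for example `action = policy.act(observation)` inside the environment loop.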
# losses
TongzhengRen

These seem to be plain Python functions, though how placeholders are managed may require some discussion. They could also be written in a functional form.
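
To illustrate the functional form, here is a sketch in which each loss is a pure function of its inputs. It uses plain NumPy arrays, so it sidesteps the placeholder question rather than answering it; the function names and the two losses chosen are assumptions for illustration.

```python
import numpy as np


def policy_gradient_loss(log_probs, advantages):
    """Surrogate loss: negative mean of log-probability times advantage."""
    log_probs = np.asarray(log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    return float(-np.mean(log_probs * advantages))


def value_mse_loss(predicted_values, target_values):
    """Mean squared error between predicted and target state values."""
    predicted = np.asarray(predicted_values, dtype=float)
    target = np.asarray(target_values, dtype=float)
    return float(np.mean((predicted - target) ** 2))
```

Because the functions hold no state, whoever calls them decides where the inputs come from, which is one way to keep placeholder management out of the loss code.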