#TODO:
Separate actor and critic. (Important: we need to focus on this soon.)
policy
YongRen
base, stochastic
Follow OnehotCategorical to write Gaussian; it can live in stochastic.py as well.
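A minimal sketch of what a Gaussian policy might need to provide, written in plain Python rather than against the project's real OnehotCategorical interface (which is not shown here). The class name `Gaussian`, the linear mean, the fixed per-dimension `log_std`, and the `act`/`log_prob` methods are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


class Gaussian:
    """Hypothetical diagonal Gaussian policy (not the project's actual API).

    Assumption for illustration: the mean is a linear function of the
    observation, and the standard deviation is a fixed per-dimension
    parameter stored in log space.
    """

    def __init__(self, weights: List[List[float]], bias: List[float],
                 log_std: List[float], seed: int = 0) -> None:
        self.weights = weights  # shape: [action_dim][obs_dim]
        self.bias = bias        # shape: [action_dim]
        self.log_std = log_std  # shape: [action_dim]
        self._rng = random.Random(seed)

    def mean(self, observation: Sequence[float]) -> List[float]:
        # Linear mean: mu = W @ obs + b, one entry per action dimension.
        return [sum(w * o for w, o in zip(row, observation)) + b
                for row, b in zip(self.weights, self.bias)]

    def act(self, observation: Sequence[float]) -> List[float]:
        # Sample each action dimension independently from N(mu, std^2).
        return [self._rng.gauss(m, math.exp(ls))
                for m, ls in zip(self.mean(observation), self.log_std)]

    def log_prob(self, observation: Sequence[float],
                 action: Sequence[float]) -> float:
        # Diagonal covariance: the log density is a sum over dimensions.
        total = 0.0
        for a, m, ls in zip(action, self.mean(observation), self.log_std):
            std = math.exp(ls)
            total += (-0.5 * ((a - m) / std) ** 2 - ls
                      - 0.5 * math.log(2 * math.pi))
        return total
```

Keeping `act` (sampling) separate from `log_prob` (scoring) mirrors what a categorical policy typically offers, so losses can consume either kind of stochastic policy the same way.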
deterministic
Not sure how to write this yet, but it should at least have an act() method to interact with the environment.
Referencing QValuePolicy in base.py, it should have at least the methods listed there.
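A minimal sketch of a deterministic policy exposing only `act()`, plus a loop showing how it would interact with an environment. `DeterministicPolicy`, `ToyEnv`, and `run_episode` are hypothetical; the toy environment's `reset()`/`step()` shape is an assumption, and the other methods QValuePolicy lists are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


class DeterministicPolicy:
    """Hypothetical: maps each observation to exactly one action."""

    def __init__(self, fn: Callable[[Sequence[float]], List[float]]) -> None:
        self._fn = fn

    def act(self, observation: Sequence[float]) -> List[float]:
        # No sampling: the same observation always yields the same action.
        return self._fn(observation)


class ToyEnv:
    """Hypothetical 1-D environment: the action shifts the state, and the
    reward is minus the distance to 1.0. Episodes last 3 steps."""

    def __init__(self) -> None:
        self.state, self.t = 0.0, 0

    def reset(self) -> List[float]:
        self.state, self.t = 0.0, 0
        return [self.state]

    def step(self, action: Sequence[float]) -> Tuple[List[float], float, bool]:
        self.state += action[0]
        self.t += 1
        reward = -abs(self.state - 1.0)
        return [self.state], reward, self.t >= 3


def run_episode(policy: DeterministicPolicy, env: ToyEnv) -> float:
    # Roll out one episode using only policy.act() and return the total reward.
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy.act(obs))
        total += reward
    return total


policy = DeterministicPolicy(lambda obs: [1.0 - obs[0]])  # move straight to 1.0
print(run_episode(policy, ToyEnv()))  # prints 0.0
```

Because the environment loop only touches `act()`, stochastic and deterministic policies could share the same interaction code.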
losses
TongzhengRen
These seem to be plain Python functions, though managing placeholders may require some discussion. They could also be written in a functional form.
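A minimal sketch of losses as plain Python functions. It assumes the caller has already evaluated log-probabilities and Q-values, so the loss receives plain sequences instead of placeholders and sidesteps the placeholder question for now. The function names and the `make_td_loss` factory (one reading of "functional form") are hypothetical.

```python
from typing import Callable, Sequence


def vanilla_policy_gradient_loss(log_probs: Sequence[float],
                                 returns: Sequence[float]) -> float:
    # REINFORCE surrogate: minimize -mean(log pi(a|s) * G).
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(log_probs)


def qlearning_td_loss(q_values: Sequence[float], rewards: Sequence[float],
                      next_max_q: Sequence[float], dones: Sequence[bool],
                      gamma: float = 0.99) -> float:
    # Mean squared TD error against target r + gamma * max_a' Q(s', a'),
    # with the bootstrap term dropped on terminal transitions.
    errors = []
    for q, r, nq, d in zip(q_values, rewards, next_max_q, dones):
        target = r + (0.0 if d else gamma * nq)
        errors.append((target - q) ** 2)
    return sum(errors) / len(errors)


def make_td_loss(gamma: float) -> Callable[..., float]:
    # Functional form: bind hyperparameters once, return a pure loss function.
    def loss(q_values, rewards, next_max_q, dones):
        return qlearning_td_loss(q_values, rewards, next_max_q, dones, gamma)
    return loss


td = make_td_loss(gamma=0.5)
print(td([1.0], [1.0], [2.0], [False]))  # prints 1.0
```

Keeping the losses free of framework state means they can be unit-tested in isolation; how they are wired to placeholders can then be decided separately.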