policy
YongRen
base, stochastic
Write a Gaussian policy modeled on OnehotCategorical; it can go in the same file, stochastic.py.
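A rough sketch of what the Gaussian counterpart might look like. The interface of OnehotCategorical is not written down in these notes, so the class name `DiagonalGaussian` and the methods `sample()` and `log_prob()` are assumptions for illustration only. It uses plain Python lists in place of tensors.

```python
import math
import random


class DiagonalGaussian:
    """Hypothetical Gaussian policy with independent action dimensions."""

    def __init__(self, mean, log_std):
        # mean and log_std are equal-length lists, one entry per action dimension.
        if len(mean) != len(log_std):
            raise ValueError("mean and log_std must have the same length")
        self.mean = list(mean)
        self.log_std = list(log_std)

    def sample(self, rng=random):
        # Draw one action: mean + std * standard normal noise, per dimension.
        return [m + math.exp(s) * rng.gauss(0.0, 1.0)
                for m, s in zip(self.mean, self.log_std)]

    def log_prob(self, action):
        # Sum of the per-dimension log densities of N(mean, std^2).
        total = 0.0
        for a, m, s in zip(action, self.mean, self.log_std):
            z = (a - m) / math.exp(s)
            total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
        return total
```

Parameterizing by log_std rather than std keeps the standard deviation positive without any clipping, which is one common choice; the real implementation may prefer something else.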
deterministic
Not sure how to write this yet, but it should at least have an act() method to interact with the environment.
DQN should have an effective argmax_{actions}() method so that it can be used as a value network.
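One possible shape for the deterministic side, as a sketch only. The names `GreedyQPolicy` and `argmax_action()` are hypothetical, and the Q-function is assumed to be any callable that maps an observation to one Q-value per discrete action.

```python
class GreedyQPolicy:
    """Hypothetical deterministic policy wrapping a Q-function."""

    def __init__(self, q_function):
        # q_function(observation) -> sequence of Q-values, one per discrete action.
        self.q_function = q_function

    def argmax_action(self, observation):
        # Return (best action index, its Q-value) from one Q-function evaluation.
        q_values = self.q_function(observation)
        best = max(range(len(q_values)), key=lambda i: q_values[i])
        return best, q_values[best]

    def act(self, observation):
        # Deterministic: always the greedy action; ties go to the lowest index.
        return self.argmax_action(observation)[0]
```

Returning the maximal Q-value together with the action means the same call can serve both act() and the value estimate, so the network is evaluated once per observation.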
losses
TongzhengRen
These seem to be plain Python functions, though the management of placeholders may require some discussion. They could also be written in a functional form.
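To make the "functional form" idea concrete, here is a minimal sketch of a loss written as a plain function. The notes do not say which losses are planned, so the policy-gradient surrogate and the name `policy_gradient_loss` are just an example, and plain lists stand in for whatever tensors or placeholders the caller would pass.

```python
def policy_gradient_loss(log_probs, advantages):
    """Negative mean of log-probability times advantage over a batch."""
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    if not log_probs:
        raise ValueError("batch must not be empty")
    # The function owns no inputs of its own; the caller supplies both sequences.
    weighted = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -weighted / len(log_probs)
```

In this style the function creates no placeholders itself, so the question of who creates and feeds them is pushed to the caller, which may or may not be the arrangement the discussion settles on.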