policy

YongRen

Follow OnehotCategorical to write a Gaussian policy; it can live in the same file (stochastic.py).
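
A rough sketch of what the Gaussian policy could look like, in plain Python so the shape of the class is easy to see. The interface here (constructor arguments `mean` and `logstd`, a `log_prob()` method, the `seed` argument) is an assumption for illustration and is not copied from OnehotCategorical; the real version would presumably work on tensors.

```python
import math
import random


class Gaussian:
    """Hypothetical diagonal Gaussian policy over continuous actions."""

    def __init__(self, mean, logstd, seed=None):
        # Assumed layout: one mean and one log standard deviation per action dimension.
        if len(mean) != len(logstd):
            raise ValueError("mean and logstd must have the same length")
        self.mean = list(mean)
        self.logstd = list(logstd)
        self._rng = random.Random(seed)

    def act(self):
        """Sample one action: an independent normal draw per dimension."""
        return [self._rng.gauss(m, math.exp(s))
                for m, s in zip(self.mean, self.logstd)]

    def log_prob(self, action):
        """Log density of `action` under the diagonal Gaussian."""
        total = 0.0
        for a, m, s in zip(action, self.mean, self.logstd):
            z = (a - m) / math.exp(s)
            total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
        return total


# Usage: a 2-dimensional policy with unit standard deviation.
policy = Gaussian(mean=[0.0, 1.0], logstd=[0.0, 0.0], seed=0)
action = policy.act()
```

Parameterizing by log standard deviation keeps the standard deviation positive without extra constraints, which is one common choice but not the only one.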

Not sure how to write this yet, but it should at least have an act() method to interact with the environment.

DQN should have an effective argmax_{actions}() method so that it can be used as a value network.
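
One possible shape for the argmax over actions, again as a plain-Python sketch. The class layout, the `q_function` argument, and the method names `argmax_actions()` and `value()` are assumptions; `argmax_actions` stands in for argmax_{actions}() since braces are not valid in a Python name.

```python
class DQN:
    """Hypothetical wrapper around a Q-function over discrete actions."""

    def __init__(self, q_function):
        # Assumed contract: q_function(state) returns a list of Q-values,
        # one entry per discrete action.
        self.q_function = q_function

    def argmax_actions(self, state):
        """Return (best_action_index, best_q_value) for `state`."""
        q_values = self.q_function(state)
        if not q_values:
            raise ValueError("q_function returned no Q-values")
        # max() keeps the first index on ties.
        best = max(range(len(q_values)), key=q_values.__getitem__)
        return best, q_values[best]

    def act(self, state):
        """Greedy action, for interacting with the environment."""
        return self.argmax_actions(state)[0]

    def value(self, state):
        """max_a Q(state, a), so the same object can serve as a value network."""
        return self.argmax_actions(state)[1]


# Usage with a toy Q-function over three actions.
dqn = DQN(lambda s: [s, 2.0 * s, -s])
greedy_action = dqn.act(1.0)
state_value = dqn.value(1.0)
```

Evaluating the Q-function once and returning both the index and the value avoids a second forward pass when both are needed.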

losses

TongzhengRen

These seem to be plain Python functions, though the management of placeholders may require some discussion. They could also be written in a functional form.
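
To make the functional form concrete, here is a minimal sketch of a loss written as a plain function whose inputs are all passed in explicitly, so the function itself creates no placeholders and the caller decides where the inputs come from. The function name and arguments are hypothetical, and plain Python lists stand in for tensors.

```python
def policy_gradient_loss(log_probs, advantages):
    """Hypothetical vanilla policy gradient loss: -mean(log_prob * advantage).

    Both arguments are per-sample sequences of floats of equal length.
    """
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    if not log_probs:
        raise ValueError("need at least one sample")
    total = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -total / len(log_probs)


# Usage: two samples with their advantages.
loss = policy_gradient_loss([-1.0, -2.0], [1.0, 0.5])
```

With this form, who owns the placeholders (the policy, the loss, or a separate training routine) stays an open question, but the loss function does not need to change once that is decided.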