Tianshou/tianshou/core/policy/base.py

from __future__ import absolute_import
from __future__ import division

import tensorflow as tf


class PolicyBase(object):
    """
    Base class for policies. It only defines the `act` method, which selects
    an action with exploration, and a `reset` hook for stateful exploration.
    """
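    def act(self, observation, my_feed_dict):
        """
        Choose an action for the given observation, with exploration.
        Subclasses must override this method.

        :param observation: the current observation from the environment.
        :param my_feed_dict: extra feed dict entries, presumably forwarded
            to the TensorFlow session when the policy is evaluated.
        """
        raise NotImplementedError()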

    def reset(self):
        """
        Reset the internal state of the exploration process. Needed for
        temporally correlated random-process exploration, as in DDPG.
        Does nothing by default.
        """
        pass
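
The `reset` hook is easiest to see with a stateful exploration process. Below is a minimal sketch, not code from this repository: `OUNoisePolicy`, its parameters (`mean_action`, `theta`, `sigma`, `seed`) and the stand-in copy of `PolicyBase` are all hypothetical names introduced for illustration. It assumes a scalar action and a discretized Ornstein-Uhlenbeck process with unit time step, and it ignores `my_feed_dict` entirely.

```python
import random


class PolicyBase(object):
    """Stand-in copy of the base class, so this sketch runs on its own."""
    def act(self, observation, my_feed_dict):
        raise NotImplementedError()

    def reset(self):
        pass


class OUNoisePolicy(PolicyBase):
    """Hypothetical subclass: fixed mean action plus Ornstein-Uhlenbeck noise."""
    def __init__(self, mean_action=0.0, theta=0.15, sigma=0.2, seed=None):
        self.mean_action = mean_action
        self.theta = theta    # strength of the pull back toward zero
        self.sigma = sigma    # scale of the random perturbation
        self._rng = random.Random(seed)
        self._noise = 0.0     # noise state carried across calls to act()

    def act(self, observation, my_feed_dict=None):
        # A real policy would evaluate a network on `observation` here;
        # this sketch ignores both arguments and returns mean + noise.
        self._noise += (-self.theta * self._noise
                        + self.sigma * self._rng.gauss(0.0, 1.0))
        return self.mean_action + self._noise

    def reset(self):
        # Clear the correlated noise state, e.g. at the start of an episode.
        self._noise = 0.0


policy = OUNoisePolicy(seed=0)
actions = [policy.act(observation=None) for _ in range(3)]
policy.reset()  # the next episode starts from uncorrelated noise
```

Because each noise sample depends on the previous one, skipping `reset` between episodes would leak exploration state from one episode into the next; policies with stateless exploration can keep the default no-op.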