Tianshou/tianshou/core/policy/base.py

from __future__ import absolute_import
from __future__ import division

import tensorflow as tf


class PolicyBase(object):
    """
    Base class for policy. Mandatory methods for a policy class are:

    - :func:`act`. It's used interacting with the environment during training, \
    so exploration noise should be added in this method.

    - :func:`act_test`. Since RL usually adds additional exploration noise during training, a different method\
    for testing the policy should be defined with different exploration specification.\
    Generally, DQN uses different :math:`\epsilon` in :math:`\epsilon`-greedy and\
    DDPG removes exploration noise during test.

    - :func:`reset`. It's mainly to reset the states of the exploration random process, or if your policy has\
    some internal states that should be reset at the beginning of each new episode. Otherwise, this method\
    does nothing.
    """
    def act(self, observation, my_feed_dict):
        """
        Return action given observation, when interacting with the environment during training.

        :param observation: An array-like with rank the same as a single observation of the environment.
            Its "batch_size" is 1, but should not be explicitly set. This method will add the dimension
            of "batch_size" to the first dimension.
        :param my_feed_dict: A dict. Specifies placeholders such as dropout and batch_norm except observation.

        :return: A numpy array. Action given the single observation. Its "batch_size" is 1,
            but should not be explicitly set.
        """
        raise NotImplementedError()

    def act_test(self, observation, my_feed_dict):
        """
        Return action given observation, when interacting with the environment during test.

        :param observation: An array-like with rank the same as a single observation of the environment.
            Its "batch_size" is 1, but should not be explicitly set. This method will add the dimension
            of "batch_size" to the first dimension.
        :param my_feed_dict: A dict. Specifies placeholders such as dropout and batch_norm except observation.

        :return: A numpy array. Action given the single observation. Its "batch_size" is 1,
            but should not be explicitly set.
        """
        raise NotImplementedError

    def reset(self):
        """
        Reset the internal states of the policy. Does nothing by default.
        """
        pass
model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs 2017-12-08 21:09:23 +08:00			`from __future__ import absolute_import`
			`from __future__ import division`

			`import tensorflow as tf`

implement dqn loss and dpg loss, add TODO for separate actor and critic 2017-12-15 14:24:08 +08:00
fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development. 2017-12-23 15:36:10 +08:00			`class PolicyBase(object):`
			`"""`
part of API doc 2018-04-12 21:10:50 +08:00			`Base class for policy. Mandatory methods for a policy class are:`

			- :func:`act`. It's used interacting with the environment during training, \
			`so exploration noise should be added in this method.`

			- :func:`act_test`. Since RL usually adds additional exploration noise during training, a different method\
			`for testing the policy should be defined with different exploration specification.\`
			Generally, DQN uses different :math:`\epsilon` in :math:`\epsilon`-greedy and\
			`DDPG removes exploration noise during test.`

			- :func:`reset`. It's mainly to reset the states of the exploration random process, or if your policy has\
			`some internal states that should be reset at the beginning of each new episode. Otherwise, this method\`
			`does nothing.`
fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development. 2017-12-23 15:36:10 +08:00			`"""`
initial data_collector. working on examples/dqn_replay.py to run 2018-03-04 21:29:58 +08:00			`def act(self, observation, my_feed_dict):`
part of API doc 2018-04-12 21:10:50 +08:00			`"""`
			`Return action given observation, when interacting with the environment during training.`

			`:param observation: An array-like with rank the same as a single observation of the environment.`
			`Its "batch_size" is 1, but should not be explicitly set. This method will add the dimension`
			`of "batch_size" to the first dimension.`
			`:param my_feed_dict: A dict. Specifies placeholders such as dropout and batch_norm except observation.`

			`:return: A numpy array. Action given the single observation. Its "batch_size" is 1,`
			`but should not be explicitly set.`
			`"""`
finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but still need refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensive; slightly refactored the code style of the codebase; more comments and todos will be in the next commit 2017-12-17 12:52:00 +08:00			`raise NotImplementedError()`
preliminary design of dqn_example, dqn interface. identify the assign of networks 2017-12-13 20:47:45 +08:00
part of API doc 2018-04-12 21:10:50 +08:00			`def act_test(self, observation, my_feed_dict):`
			`"""`
			`Return action given observation, when interacting with the environment during test.`

			`:param observation: An array-like with rank the same as a single observation of the environment.`
			`Its "batch_size" is 1, but should not be explicitly set. This method will add the dimension`
			`of "batch_size" to the first dimension.`
			`:param my_feed_dict: A dict. Specifies placeholders such as dropout and batch_norm except observation.`

			`:return: A numpy array. Action given the single observation. Its "batch_size" is 1,`
			`but should not be explicitly set.`
			`"""`
			`raise NotImplementedError`

finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check! 2018-03-11 17:47:42 +08:00			`def reset(self):`
			`"""`
part of API doc 2018-04-12 21:10:50 +08:00			`Reset the internal states of the policy. Does nothing by default.`
finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check! 2018-03-11 17:47:42 +08:00			`"""`
			`pass`