169 Commits

Author SHA1 Message Date
Trinkle23897
d918022ce9 merge master into dev 2021-02-18 12:46:55 +08:00
n+e
cb65b56b13
v0.3.2 (#292)
Throw a warning in ListReplayBuffer.

This version update is needed because of #289: the previous v0.3.1 does not work correctly with torch<=1.6.0 in a CUDA environment.
2021-02-16 09:31:46 +08:00
n+e
d003c8e566
fix 2 bugs of batch (#284)
1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)`

before:
```python
TypeError: cannot concatenate with Batch() which is scalar
```
after:
```python
Batch(
    a: Batch(),
    b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
)
```

2. creating keys in a batch's subkey, e.g. 
```python
a = Batch(info={"key1": [0, 1], "key2": [2, 3]})
a[0] = Batch(info={"key1": 2, "key3": 4})
print(a)
```
before:
```python
Batch(
    info: Batch(
              key1: array([0, 1]),
              key2: array([0, 3]),
          ),
)
```
after:
```python
ValueError: Creating keys is not supported by item assignment.
```

3. small optimization for `Batch.stack_` and `Batch.cat_`
2021-02-16 09:01:54 +08:00
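The second fix in #284 above (item assignment must not create keys) can be sketched without the library. This is a minimal illustration on nested dicts, not the actual `Batch` code: `assign_row`, `_validate`, and `_write` are hypothetical helpers, and leaving keys that are absent from the assigned value untouched is an assumption of this sketch.

```python
from typing import Any, Dict

Tree = Dict[str, Any]  # nested dict whose leaves are lists (one entry per row)


def _validate(target: Tree, value: Tree) -> None:
    """Raise if value would introduce a key that target does not have."""
    extra = set(value) - set(target)
    if extra:
        raise ValueError("Creating keys is not supported by item assignment.")
    for key, sub in value.items():
        if isinstance(target[key], dict):
            if not isinstance(sub, dict):
                raise ValueError(f"Expected a mapping for key {key!r}.")
            _validate(target[key], sub)


def _write(target: Tree, index: int, value: Tree) -> None:
    """Copy every leaf of value into row `index` of target."""
    for key, sub in value.items():
        if isinstance(target[key], dict):
            _write(target[key], index, sub)
        else:
            target[key][index] = sub


def assign_row(target: Tree, index: int, value: Tree) -> None:
    _validate(target, value)  # fail before mutating anything
    _write(target, index, value)


data = {"info": {"key1": [0, 1], "key2": [2, 3]}}
assign_row(data, 0, {"info": {"key1": 2}})  # existing key: allowed
try:
    assign_row(data, 0, {"info": {"key1": 5, "key3": 4}})  # new key: rejected
    rejected = False
except ValueError:
    rejected = True
```

Validating before writing means a rejected assignment leaves the data unchanged, which avoids the partially overwritten result shown in the "before" output.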
n+e
c838f2f0e9
fix 2 bugs of batch (#284)
1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)`

before:
```python
TypeError: cannot concatenate with Batch() which is scalar
```
after:
```python
Batch(
    a: Batch(),
    b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
)
```

2. creating keys in a batch's subkey, e.g. 
```python
a = Batch(info={"key1": [0, 1], "key2": [2, 3]})
a[0] = Batch(info={"key1": 2, "key3": 4})
print(a)
```
before:
```python
Batch(
    info: Batch(
              key1: array([0, 1]),
              key2: array([0, 3]),
          ),
)
```
after:
```python
ValueError: Creating keys is not supported by item assignment.
```

3. small optimization for `Batch.stack_` and `Batch.cat_`, raise ValueError when receiving invalid data format.
2021-02-02 19:28:05 +08:00
ChenDRAG
f0129f4ca7
Add CachedReplayBuffer and ReplayBufferManager (#278)
This is the second of the 6 commits mentioned in #274. It makes a minor refactor of ReplayBuffer and adds two new ReplayBuffer classes, CachedReplayBuffer and ReplayBufferManager. See #274 for more detail.

1. Add ReplayBufferManager (handle a list of buffers) and CachedReplayBuffer;
2. Make sure the reserved keys cannot be edited by methods like `buffer.done = xxx`;
3. Add `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle;
4. Add `sample_index` method, same as `sample` but returning only the index instead of both index and batch data;
5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose done==False);
6. Separate `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in;
7. Move buffer's documentation to `docs/tutorials/concepts.rst`.

Co-authored-by: n+e <trinkle23897@gmail.com>
2021-01-29 12:23:18 +08:00
Nico Gürtler
5d13d8a453
Saving and loading replay buffer with HDF5 (#261)
As mentioned in #260, this pull request is about an implementation of saving and loading the replay buffer with HDF5.
2020-12-17 08:58:43 +08:00
n+e
5ed6c1c7aa
change the step in trainer (#235)
This PR separates the `global_step` into `env_step` and `gradient_step`. In the future, the data from the collecting state will be stored under `env_step`, and the data from the updating state will be stored under `gradient_step`.

Others:
- add `rew_std` and `best_result` into the monitor
- fix network unbounded in `test/continuous/test_sac_with_il.py` and `examples/box2d/bipedal_hardcore_sac.py`
- change the dependency of ray to 1.0.0 since ray-project/ray#10134 has been resolved
2020-10-04 21:55:43 +08:00
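The split of `global_step` into `env_step` and `gradient_step` described in #235 above can be sketched as two independent counters. This is a minimal sketch, not the trainer's code: `StepLogger`, `log_collect`, `log_update`, and the record keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StepLogger:
    """Hypothetical logger keeping a separate x-axis for each phase."""

    env_step: int = 0
    gradient_step: int = 0
    records: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)

    def log_collect(self, n_step: int, reward: float) -> None:
        # Collection statistics are indexed by environment steps.
        self.env_step += n_step
        self.records.setdefault("train/rew", []).append((self.env_step, reward))

    def log_update(self, loss: float) -> None:
        # Update statistics are indexed by gradient steps.
        self.gradient_step += 1
        self.records.setdefault("train/loss", []).append((self.gradient_step, loss))


logger = StepLogger()
for _ in range(2):
    logger.log_collect(n_step=10, reward=1.0)  # one collection of 10 env steps
    for _ in range(3):  # followed by 3 gradient updates
        logger.log_update(loss=0.5)
```

Keeping the counters apart means the two kinds of curves stay comparable even when the ratio of updates to collected steps changes.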
n+e
710966eda7
change API of train_fn and test_fn (#229)
train_fn(epoch) -> train_fn(epoch, num_env_step)
test_fn(epoch) -> test_fn(epoch, num_env_step)
2020-09-26 16:35:37 +08:00
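The signature change in #229 above can be illustrated with a schedule hook. This is a minimal sketch under stated assumptions: `make_train_fn`, `set_eps`, and the decay constants are hypothetical; only the `(epoch, num_env_step)` argument pair comes from the commit message.

```python
from typing import Callable, List


def make_train_fn(
    set_eps: Callable[[float], None],
    eps_start: float = 1.0,
    eps_end: float = 0.05,
    decay_steps: int = 10_000,
) -> Callable[[int, int], None]:
    """Build a hook with the new (epoch, num_env_step) signature."""

    def train_fn(epoch: int, num_env_step: int) -> None:
        # Anneal epsilon linearly on environment steps rather than on epochs.
        frac = min(num_env_step / decay_steps, 1.0)
        set_eps(eps_start + frac * (eps_end - eps_start))

    return train_fn


# Hypothetical stand-in for a policy's epsilon setter: record each value.
history: List[float] = []
train_fn = make_train_fn(history.append)
train_fn(1, 0)        # start of training
train_fn(1, 5_000)    # halfway through the decay window
train_fn(2, 20_000)   # past the window, clamped at eps_end
```

Receiving the environment step count lets a hook schedule values at a finer granularity than once per epoch.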
rocknamx
bf39b9ef7d
clarify updating state (#224)
Adding an indicator of learning (i.e. `self.learning`) makes it convenient to distinguish the state of the policy.
Meanwhile, the state of `self.training` stays unambiguous during the training stage.
Related issue: #211 

Others:
- fix a bug in DDQN: target_q could not be sampled from np.random.rand
- fix a bug in DQN atari net: it should add a ReLU before the last layer
- fix a bug in collector timing

Co-authored-by: n+e <463003665@qq.com>
2020-09-22 16:28:46 +08:00
n+e
623bf24f0c
fix unittest (#218) 2020-09-14 15:59:32 +08:00
n+e
b284ace102
type check in unit test (#200)
Fix #195: Add mypy test in .github/workflows/docs_and_lint.yml.

Also remove the out-of-date API
2020-09-13 19:31:50 +08:00
n+e
c91def6cbc
code format and update function signatures (#213)
Cherry-pick from #200 

- update the function signature
- format code-style
- move _compile into separate functions
- fix a bug in to_torch and to_numpy (Batch)
- remove None in action_range

In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))
2020-09-12 15:39:01 +08:00
n+e
b86d78766b
fix docs and add docstring check (#210)
- fix broken links and out-of-date content
- add pydocstyle and doc8 check
- remove collector.seed and collector.render
2020-09-11 07:55:37 +08:00
n+e
8bb8ecba6e
set policy.eval() before collector.collect (#204)
* fix #203

* no_grad argument in collector.collect
2020-09-06 16:20:16 +08:00
Trinkle23897
34f714a677 Numba acceleration (#193)
Training FPS improvement (base commit is 94bfb32):
test_pdqn: 1660 (without numba) -> 1930
discrete/test_ppo: 5100 -> 5170

since nstep has little impact on overall performance, the unit test result is:
GAE: 4.1s -> 0.057s
nstep: 0.3s -> 0.15s (little improvement)

Others:
- fix a bug in ttt set_eps
- keep only sumtree in segment tree implementation
- dirty fix for asyncVenv check_id test
2020-09-02 13:03:32 +08:00
yingchengyang
5b49192a48
DQN Atari examples (#187)
This PR aims to provide the script of Atari DQN setting:
- A speedrun of PongNoFrameskip-v4 (finished, about half an hour in i7-8750 + GTX1060 with 1M environment steps)
- A general script for all Atari games
Since we use multiple envs for simulation, the result differs slightly from the original paper, but is considered acceptable.

It also adds another parameter, save_only_last_obs, to the replay buffer in order to save memory.

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-08-30 05:48:09 +08:00
n+e
94bfb32cc1
optimize training procedure and improve code coverage (#189)
1. add policy.eval() in all test scripts' "watch performance"
2. remove dict return support for collector preprocess_fn
3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)`
4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184)
5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard
6. add test_returns (both GAE and nstep)
7. change the type-checking order in batch.py and converter.py in order to meet the most often case first
8. fix shape inconsistency for torch.Tensor in replay buffer
9. remove `**kwargs` in ReplayBuffer
10. remove default value in batch.split() and add merge_last argument (#185)
11. improve nstep efficiency
12. add max_batchsize in onpolicy algorithms
13. potential bugfix for subproc.wait
14. fix RecurrentActorProb
15. improve the code-coverage (from 90% to 95%) and remove the dead code
16. fix some incorrect type annotation

The above improvement also increases the training FPS: on my computer, the previous version is only ~1800 FPS and after that, it can reach ~2050 (faster than v0.2.4.post1).
2020-08-27 12:15:18 +08:00
youkaichao
a9f9940d17
code refactor for venv (#179)
- Refactor code to remove duplicate code

- Enable async simulation for all vector envs

- Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv`

The abstraction of vector env changed.

Prior to this pr, each vector env is almost independent.

After this pr, each env is wrapped into a worker, and vector envs differ by their worker type. In fact, users can just use `BaseVectorEnv` with different workers; I keep `SubprocVectorEnv` and `ShmemVectorEnv` for backward compatibility.

Co-authored-by: n+e <463003665@qq.com>
Co-authored-by: magicly <magicly007@gmail.com>
2020-08-19 15:00:24 +08:00
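The worker abstraction described in #179 above can be sketched as follows. This is an illustration of the idea only, not the library's classes: `EnvWorker`, `InProcessWorker`, `SimpleVectorEnv`, and `CounterEnv` are hypothetical names, and the toy env returns a bare integer instead of a full transition.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Sequence, Type


class EnvWorker(ABC):
    """Hypothetical worker interface: owns one env and decides how to run it."""

    @abstractmethod
    def reset(self) -> Any:
        ...

    @abstractmethod
    def step(self, action: Any) -> Any:
        ...


class InProcessWorker(EnvWorker):
    """Runs the env in the calling process; other workers could use subprocesses."""

    def __init__(self, env_fn: Callable[[], Any]) -> None:
        self.env = env_fn()

    def reset(self) -> Any:
        return self.env.reset()

    def step(self, action: Any) -> Any:
        return self.env.step(action)


class SimpleVectorEnv:
    """The vector env only fans calls out; its behavior depends on the worker type."""

    def __init__(
        self,
        env_fns: Sequence[Callable[[], Any]],
        worker_cls: Type[EnvWorker],
    ) -> None:
        self.workers: List[EnvWorker] = [worker_cls(fn) for fn in env_fns]

    def reset(self) -> List[Any]:
        return [w.reset() for w in self.workers]

    def step(self, actions: Sequence[Any]) -> List[Any]:
        return [w.step(a) for w, a in zip(self.workers, actions)]


class CounterEnv:
    """Toy env whose observation is the running sum of actions."""

    def __init__(self) -> None:
        self.total = 0

    def reset(self) -> int:
        self.total = 0
        return self.total

    def step(self, action: int) -> int:
        self.total += action
        return self.total


venv = SimpleVectorEnv([CounterEnv for _ in range(3)], InProcessWorker)
first_obs = venv.reset()
second_obs = venv.step([1, 2, 3])
```

Swapping `InProcessWorker` for another `EnvWorker` subclass would change how each env is executed without touching the vector env itself.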
n+e
311a2beafb
Pickle compatible for replay buffer and improve buffer.get (#182)
fix #84 and make buffer more efficient
2020-08-16 16:26:23 +08:00
youkaichao
7f3b817b24
add policy.update to enable post process and remove collector.sample (#180)
* add policy.update to enable post process and remove collector.sample

* update doc in policy concept

* remove collector.sample in doc

* doc update of concepts

* docs

* polish

* polish policy

* remove collector.sample in docs

* minor fix

* Apply suggestions from code review

just a test

* doc fix

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-08-15 16:10:42 +08:00
n+e
140b1c2cab
Improve PER (#159)
- use segment tree to rewrite the previous PrioReplayBuffer code, add the test

- enable all Q-learning algorithms to use PER
2020-08-06 10:26:24 +08:00
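The segment-tree idea behind #159 above can be sketched with a sum tree, which supports priority updates and prefix-sum lookups in logarithmic time. This is a minimal sketch, not the repository's implementation: `SumTree` and `find_prefix_sum_index` are names chosen for illustration.

```python
import random
from typing import List


class SumTree:
    """Array-backed binary tree; leaves hold priorities, parents hold sums."""

    def __init__(self, size: int) -> None:
        # Round the leaf count up to a power of two so the tree is complete.
        self.bound = 1
        while self.bound < size:
            self.bound *= 2
        self.size = size
        self.tree: List[float] = [0.0] * (2 * self.bound)

    def __setitem__(self, index: int, priority: float) -> None:
        # Write the leaf, then refresh every ancestor up to the root.
        pos = index + self.bound
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def __getitem__(self, index: int) -> float:
        return self.tree[index + self.bound]

    def total(self) -> float:
        return self.tree[1]

    def find_prefix_sum_index(self, value: float) -> int:
        """Return the smallest i with sum(priorities[: i + 1]) > value."""
        pos = 1
        while pos < self.bound:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.bound


tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree[i] = p

# Sample proportionally to priority: draw a point in [0, total) and descend.
sampled = tree.find_prefix_sum_index(random.random() * tree.total())
```

With priorities 1, 2, 3, 4 the last index is drawn about four times as often as the first, which is the behavior prioritized replay needs.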
Trinkle23897
b7a4015db7 doc update and do not force save 'policy' in np format (#168) 2020-07-27 16:54:14 +08:00
Alexis DUBURCQ
e024afab8c
Asynchronous sampling vector environment (#134)
Fix #103

Co-authored-by: youkaichao <youkaichao@126.com>
Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-26 18:01:21 +08:00
Alexis DUBURCQ
30368c29a6
Replay buffer allows stack_num = 1 (#165)
* stack_num starts at 1 (for no stacking) instead of 0.

* Use getter/setter for stack_num.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-25 19:33:44 +08:00
youkaichao
bfeffe1f97
unify single-env and multi-env in collector (#157)
Unify the implementation with multi-environments (wrap a single environment in a multi-environment with one env) to greatly simplify the code.

This changed the behavior of single-environment.
Prior to this pr, for single environment, collector.collect(n_step=n) will step n steps.
After this pr, for single environment, collector.collect(n_step=n) will step m episodes until the steps are greater than n.

That is to say, collectors now always collect full episodes.
2020-07-23 16:40:53 +08:00
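The new single-environment behavior described in #157 above can be sketched as a loop over whole episodes. This is a minimal sketch, not the collector's code: `collect_full_episodes` and `run_episode` are hypothetical, and stopping once the step count is at least `n_step` is an assumption about the exact boundary condition.

```python
from typing import Callable, List, Tuple


def collect_full_episodes(
    run_episode: Callable[[], int], n_step: int
) -> Tuple[int, int]:
    """Run whole episodes until at least n_step steps have been taken.

    Returns (episodes_run, steps_taken). steps_taken may overshoot n_step
    because an episode is never cut short.
    """
    episodes, steps = 0, 0
    while steps < n_step:
        steps += run_episode()  # run_episode returns the episode length
        episodes += 1
    return episodes, steps


# Hypothetical environment whose episodes last 7, 4, 9, 5 steps in turn.
lengths: List[int] = [7, 4, 9, 5]
episode_iter = iter(lengths)
episodes, steps = collect_full_episodes(lambda: next(episode_iter), n_step=10)
```

Here a request for 10 steps runs two full episodes and yields 11 steps, whereas the old behavior would have stopped after exactly 10.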
Alexis DUBURCQ
865ef6c693
Improve to_torch/to_numpy converters (#147)
* Enable converting list/tuple back and forth from/to numpy/torch.

* Add fallbacks.

* Fix PEP8

* Update unit tests.

* Type annotation. Robust dtype check.

* Lists of objects are converted individually; other lists are converted as a single tensor.

* Improve robustness of _to_array_with_correct_type

* Add unit tests.

* Do not catch exception at _to_array_with_correct_type level.

* Use _parse_value

* Fix PEP8

* Fix _parse_value list output type fallback.

* Catch torch exception.

* Do not convert torch tensor during fallback.

* Improve unit tests.

* Add unit tests.

* Fix missing import

* Remove support of numpy arrays of tensors for Batch value parser.

* Forbid numpy arrays of tensors.

* Fix PEP8.

* Fix comment.

* Reduce _parse_value branch number.

* Fix None value.

* Forward error message for debugging purpose.

* Fix _is_scalar.

* More specific try/catch blocks.

* Fix exception chaining.

* Fix PEP8.

* Fix _is_scalar.

* Fix missing corner case.

* Fix PEP8.

* Allow Batch empty key.

* Fix multi-dim array datatype check.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-21 16:47:56 +08:00
youkaichao
8c32d99c65
Add multi-agent example: tic-tac-toe (#122)
* make fields with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* Improve Batch (#126)

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* minor polish

* remove multibuf

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* make fields with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* dummy code

* remove dummy

* add multi-agent example: tic-tac-toe

* move TicTacToeEnv to a separate file

* remove dummy MANet

* code refactor

* move tic-tac-toe example to test

* update doc with marl-example

* fix docs

* reduce the threshold

* revert

* update player id to start from 1 and change player to agent; keep coding

* add reward_length argument for collector

* Improve Batch (#128)

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>

* refact

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reward_metric

* modify flag

* minor fix

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix reward stat in collector

* fix stat of collector, simplify test/base/env.py

* fix docs

* minor fix

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

* minor fix

* marl-examples

* add condense; bugfix for torch.Tensor; code refactor

* marl example can run now

* enable tic tac toe with larger board size and win-size

* add test dependency

* Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130)

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>

* stash

* let agent learn to play as agent 2 which is harder

* code refactor

* Improve collector (#125)

* remove multibuf

* reward_metric

* make fields with empty Batch rather than None after reset

* many fixes and refactor
Co-authored-by: Trinkle23897 <463003665@qq.com>

* marl for tic-tac-toe and general gomoku

* update default gamma to 0.1 for tic tac toe to win earlier

* fix name typo; change default game config; add rew_norm option

* fix pep8

* test commit

* mv test dir name

* add rew flag

* fix torch.optim import error and madqn rew_norm

* remove useless kwargs

* Vector env enable select worker (#132)

* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>

* show the last move of tictactoe by capital letters

* add multi-agent tutorial

* fix link

* Standardized behavior of Batch.cat and misc code refactor (#137)

* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc

* write tutorials to specify the standard of Batch (#142)

* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type conversion to Batch.__init__

* further delegate type conversion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>

* bugfix for mapolicy

* pretty code

* remove debug code; remove condense

* doc fix

* check before get_agents in tutorials/tictactoe

* tutorial

* fix

* minor fix for batch doc

* minor polish

* faster test_ttt

* improve tic-tac-toe environment

* change default epoch and step-per-epoch for tic-tac-toe

* fix mapolicy

* minor polish for mapolicy

* 90% to 80% (need to change the tutorial)

* win rate

* show step number at board

* simplify mapolicy

* minor polish for mapolicy

* remove MADQN

* fix pep8

* change legal_actions to mask (need to update docs)

* simplify maenv

* fix typo

* move basevecenv to single file

* separate RandomAgent

* update docs

* grammarly

* fix pep8

* win rate typo

* format in cheatsheet

* use bool mask directly

* update doc for boolean mask

Co-authored-by: Trinkle23897 <463003665@qq.com>
Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com>
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-21 14:59:49 +08:00
ChenDRAG
d09b69e594
buffer update bug fix (#154)
* buffer update bug fix

* some fix in buffer update

* polish

Co-authored-by: n+e <463003665@qq.com>
2020-07-20 22:12:57 +08:00
youkaichao
fe5555d2a1 write tutorials to specify the standard of Batch (#142)
* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type conversion to Batch.__init__

* further delegate type conversion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-20 15:54:18 +08:00
youkaichao
3a08e27ed4 Standardized behavior of Batch.cat and misc code refactor (#137)
* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc
2020-07-20 15:54:18 +08:00
Alexis DUBURCQ
09e10e384f Vector env enable select worker (#132)
* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-20 15:54:18 +08:00
youkaichao
26fb87433d Improve collector (#125)
* remove multibuf

* reward_metric

* make fields with empty Batch rather than None after reset

* many fixes and refactor
Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-13 17:33:01 +08:00
youkaichao
5599a6d1a6 Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130)
* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-13 17:33:01 +08:00
youkaichao
affeec13de Improve Batch (#128)
* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-13 17:33:01 +08:00
youkaichao
2564e989fb Improve Batch (#126)
* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack
2020-07-13 17:33:01 +08:00
youkaichao
ff99662fe6
bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer (#120)
* bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer

* point out that ListReplayBuffer cannot be sampled

* remove useless _amortization_counter variable
2020-07-10 08:24:11 +08:00
Alexis DUBURCQ
aa3c453f42
Raise exception for Batch __getitem__. (#119)
* Raise exception for Batch __getitem__.

* Try fixing access to reserved key.

* Simpler patch.

* Add unit test to check indexing empty Batch raises an exception.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-08 22:29:37 +08:00
youkaichao
7f9a1f1328
add type check for each element rather than the first element (#112)
This PR does the following:
- improvement: dramatically reduce the number of calls to _is_batch_set
- bugfix: list(Batch()) fails; Batch(a=[torch.ones(3), torch.ones(3)]) fails;
- misc: add type check for each element rather than the first element; add test case; _create_value with torch.Tensor does not have np.object type;
2020-07-08 21:00:00 +08:00
youkaichao
481015932c
bugfix for hang in list(Batch()) (#117) 2020-07-08 17:09:27 +08:00
youkaichao
f5e007932f
fix Batch init for types other than number and bool (#115)
* fix Batch init for types other than number and bool

* change doc to involve bool type

* use type check

* Batch type check complete
2020-07-08 13:45:29 +08:00
youkaichao
dbbb859ec5
doc fix (#113)
* doc fix

* change line

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-08 08:30:01 +08:00
youkaichao
9c7d31e5d6
bugfix for empty_ (#114)
* bugfix for empty_

* use v.__class__(0) for scalar
2020-07-08 08:10:34 +08:00
Alexis DUBURCQ
69caf89908
Fix to_torch converters (#111)
* Fix to_torch converters.

* to_torch now convert any object Torch Tensor-compatible.

* Fix linter.

* Fix Batch to_torch to convert any Torch Tensor-compatible data.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-07 18:40:55 +08:00
youkaichao
8913bf36b1
change Batch.empty to in-place fill; add copy option for Batch construction (#110)
* in-place empty_ for Batch

* change Batch.empty to in-place fill; add copy option for Batch construction

* type signature & remove shadow names for copy

* add doc for data type (only support numbers and object data type)

* add unit test for Batch copy

* fix pep8

* add test case for Batch.empty

* doc fix

* fix pep8

* use object to test Batch

* test commit

* refact

* change Batch(copy) testcase

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-06 20:30:15 +08:00
youkaichao
5b1373924e
doc fix; policy train/eval signature fix (#109)
* doc fix; policy train/eval signature fix

* change train/eval behavior according to pytorch

* change train/eval behavior according to pytorch
2020-07-06 10:44:34 +08:00
n+e
db0e2e5cd2
Advanced Batch slicing & minor fix of RNN support (#106)
* add shape property and modify __getitem__

* change Batch.size to Batch.shape

* setattr

* Batch.empty

* remove scalar in advanced slicing

* modify empty_ and __getitem__

* missing testcase

* fix empty
2020-06-30 18:02:44 +08:00
Trinkle23897
e0f4862d01 store RNN hidden states in policy._state and add sample_avail in buffer (#19) 2020-06-29 12:18:52 +08:00
Alexis DUBURCQ
a951a32487
Enable partial stacking at Batch level (#100)
* Enable stacking of partially matching Batch instances.

* Fix list support for getitem.

* Fix Batch 'size' method.

* Update Batch documentation.
2020-06-27 09:06:40 +08:00
Alexis DUBURCQ
70aa7bf93e
Use lower-level API to reduce overhead. (#97)
* Use lower-level API to reduce overhead.

* Further improvements.

* Buffer _add_to_buffer improvement.

* Do not use _data field to store Batch data to avoid overhead. Add back _meta field in Buffer.

* Restore metadata attribute to store batch in Buffer.

* Move out nested methods.

* Use try/catch instead of an explicit check for efficiency.

* Remove unused branches for efficiency.

* Use np.array over list when possible for efficiency.

* Final performance improvement.

* Add unit tests for Batch size method.

* Add missing stack unit tests.

* Enforce Buffer initialization to zero.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-26 18:37:50 +08:00
Alexis DUBURCQ
5ac9f9b144
Do not check bounds since it is always valid when everything is fine. (#95) 2020-06-25 21:06:35 +08:00