69 Commits

Author SHA1 Message Date
n+e
454c86c469
fix venv seed, add TOC in docs, and split buffer.py into several files (#303)
Things changed in this PR:

- various docs update, add TOC
- split buffer into several files
- fix venv action_space randomness
2021-03-02 12:28:28 +08:00
ChenDRAG
150d0ec51b
Step collector implementation (#280)
This is the third PR of 6 commits mentioned in #274, which features refactor of Collector to fix #245. You can check #274 for more detail.

Things changed in this PR:

1. refactor collector to be more cleaner, split AsyncCollector to support asyncvenv;
2. change buffer.add api to add(batch, bffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.)
3. add policy.exploration_noise(act, batch) -> act
4. small change in BasePolicy.compute_*_returns
5. move reward_metric from collector to trainer
6. fix np.asanyarray issue (different version's numpy will result in different output)
7. flake8 maxlength=88
8. polish docs and fix test

Co-authored-by: n+e <trinkle23897@gmail.com>
2021-02-19 10:33:49 +08:00
n+e
c838f2f0e9
fix 2 bugs of batch (#284)
1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)`

before:
```python
TypeError: cannot concatenate with Batch() which is scalar
```
after:
```python
Batch(
    a: Batch(),
    b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
)
```

2. creating keys in a batch's subkey, e.g. 
```python
a = Batch(info={"key1": [0, 1], "key2": [2, 3]})
a[0] = Batch(info={"key1": 2, "key3": 4})
print(a)
```
before:
```python
Batch(
    info: Batch(
              key1: array([0, 1]),
              key2: array([0, 3]),
          ),
)
```
after:
```python
ValueError: Creating keys is not supported by item assignment.
```

3. small optimization for `Batch.stack_` and `Batch.cat_`, raise ValueError when receiving invalid data format.
2021-02-02 19:28:05 +08:00
n+e
b284ace102
type check in unit test (#200)
Fix #195: Add mypy test in .github/workflows/docs_and_lint.yml.

Also remove the out-of-the-date api
2020-09-13 19:31:50 +08:00
n+e
c91def6cbc
code format and update function signatures (#213)
Cherry-pick from #200 

- update the function signature
- format code-style
- move _compile into separate functions
- fix a bug in to_torch and to_numpy (Batch)
- remove None in action_range

In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))
2020-09-12 15:39:01 +08:00
n+e
b86d78766b
fix docs and add docstring check (#210)
- fix broken links and out-of-the-date content
- add pydocstyle and doc8 check
- remove collector.seed and collector.render
2020-09-11 07:55:37 +08:00
n+e
94bfb32cc1
optimize training procedure and improve code coverage (#189)
1. add policy.eval() in all test scripts' "watch performance"
2. remove dict return support for collector preprocess_fn
3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)`
4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184)
5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard
6. add test_returns (both GAE and nstep)
7. change the type-checking order in batch.py and converter.py in order to meet the most often case first
8. fix shape inconsistency for torch.Tensor in replay buffer
9. remove `**kwargs` in ReplayBuffer
10. remove default value in batch.split() and add merge_last argument (#185)
11. improve nstep efficiency
12. add max_batchsize in onpolicy algorithms
13. potential bugfix for subproc.wait
14. fix RecurrentActorProb
15. improve the code-coverage (from 90% to 95%) and remove the dead code
16. fix some incorrect type annotation

The above improvement also increases the training FPS: on my computer, the previous version is only ~1800 FPS and after that, it can reach ~2050 (faster than v0.2.4.post1).
2020-08-27 12:15:18 +08:00
Alexis DUBURCQ
865ef6c693
Improve to_torch/to_numpy converters (#147)
* Enable converting list/tuple back and forth from/to numpy/torch.

* Add fallbacks.

* Fix PEP8

* Update unit tests.

* Type annotation. Robust dtype check.

* List of object are converted individually, as a single tensor otherwise.

* Improve robustness of _to_array_with_correct_type

* Add unit tests.

* Do not catch exception at _to_array_with_correct_type level.

* Use _parse_value

* Fix PEP8

* Fix _parse_value list output type fallback.

* Catch torch exception.

* Do not convert torch tensor during fallback.

* Improve unit tests.

* Add unit tests.

* FIx missing import

* Remove support of numpy arrays of tensors for Batch value parser.

* Forbid numpy arrays of tensors.

* Fix PEP8.

* Fix comment.

* Reduce _parse_value branch number.

* Fix None value.

* Forward error message for debugging purpose.

* Fix _is_scalar.

* More specific try/catch blocks.

* Fix exception chaining.

* Fix PEP8.

* Fix _is_scalar.

* Fix missing corner case.

* Fix PEP8.

* Allow Batch empty key.

* Fix multi-dim array datatype check.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-21 16:47:56 +08:00
youkaichao
8c32d99c65
Add multi-agent example: tic-tac-toe (#122)
* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* Improve Batch (#126)

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* minor polish

* remove multibuf

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* make fileds with empty Batch rather than None after reset

* dummy code

* remove dummy

* add reward_length argument for collector

* bugfix for reward_length

* add get_final_reward_fn argument to collector to deal with marl

* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack

* dummy code

* remove dummy

* add multi-agent example: tic-tac-toe

* move TicTacToeEnv to a separate file

* remove dummy MANet

* code refactor

* move tic-tac-toe example to test

* update doc with marl-example

* fix docs

* reduce the threshold

* revert

* update player id to start from 1 and change player to agent; keep coding

* add reward_length argument for collector

* Improve Batch (#128)

* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>

* refact

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reward_metric

* modify flag

* minor fix

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix reward stat in collector

* fix stat of collector, simplify test/base/env.py

* fix docs

* minor fix

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

* minor fix

* marl-examples

* add condense; bugfix for torch.Tensor; code refactor

* marl example can run now

* enable tic tac toe with larger board size and win-size

* add test dependency

* Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130)

* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>

* stash

* let agent learn to play as agent 2 which is harder

* code refactor

* Improve collector (#125)

* remove multibuf

* reward_metric

* make fileds with empty Batch rather than None after reset

* many fixes and refactor
Co-authored-by: Trinkle23897 <463003665@qq.com>

* marl for tic-tac-toe and general gomoku

* update default gamma to 0.1 for tic tac toe to win earlier

* fix name typo; change default game config; add rew_norm option

* fix pep8

* test commit

* mv test dir name

* add rew flag

* fix torch.optim import error and madqn rew_norm

* remove useless kwargs

* Vector env enable select worker (#132)

* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>

* show the last move of tictactoe by capital letters

* add multi-agent tutorial

* fix link

* Standardized behavior of Batch.cat and misc code refactor (#137)

* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc

* write tutorials to specify the standard of Batch (#142)

* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type convertion to Batch.__init__

* further delegate type convertion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>

* bugfix for mapolicy

* pretty code

* remove debug code; remove condense

* doc fix

* check before get_agents in tutorials/tictactoe

* tutorial

* fix

* minor fix for batch doc

* minor polish

* faster test_ttt

* improve tic-tac-toe environment

* change default epoch and step-per-epoch for tic-tac-toe

* fix mapolicy

* minor polish for mapolicy

* 90% to 80% (need to change the tutorial)

* win rate

* show step number at board

* simplify mapolicy

* minor polish for mapolicy

* remove MADQN

* fix pep8

* change legal_actions to mask (need to update docs)

* simplify maenv

* fix typo

* move basevecenv to single file

* separate RandomAgent

* update docs

* grammarly

* fix pep8

* win rate typo

* format in cheatsheet

* use bool mask directly

* update doc for boolean mask

Co-authored-by: Trinkle23897 <463003665@qq.com>
Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com>
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-21 14:59:49 +08:00
youkaichao
fe5555d2a1 write tutorials to specify the standard of Batch (#142)
* add doc for len exceptions

* doc move; unify is_scalar_value function

* remove some issubclass check

* bugfix for shape of Batch(a=1)

* keep moving doc

* keep writing batch tutorial

* draft version of Batch tutorial done

* improving doc

* keep improving doc

* batch tutorial done

* rename _is_number

* rename _is_scalar

* shape property do not raise exception

* restore some doc string

* grammarly [ci skip]

* grammarly + fix warning of building docs

* polish docs

* trim and re-arrange batch tutorial

* go straight to the point

* minor fix for batch doc

* add shape / len in basic usage

* keep improving tutorial

* unify _to_array_with_correct_type to remove duplicate code

* delegate type convertion to Batch.__init__

* further delegate type convertion to Batch.__init__

* bugfix for setattr

* add a _parse_value function

* remove dummy function call

* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-20 15:54:18 +08:00
youkaichao
3a08e27ed4 Standardized behavior of Batch.cat and misc code refactor (#137)
* code refactor; remove unused kwargs; add reward_normalization for dqn

* bugfix for __setitem__ with torch.Tensor; add Batch.condense

* minor fix

* support cat with empty Batch

* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases

* support stack with empty Batch

* remove condense

* refactor code to reflect the shared / partial / reserved categories of keys

* add is_empty(recursive=False)

* doc fix

* docfix and bugfix for _is_batch_set

* add doc for key reservation

* bugfix for algebra operators

* fix cat with lens hint

* code refactor

* bugfix for storing None

* use ValueError instead of exception

* hide lens away from users

* add comment for __cat

* move the computation of the initial value of lens in cat_ itself.

* change the place of doc string

* doc fix for Batch doc string

* change recursive to recurse

* doc string fix

* minor fix for batch doc
2020-07-20 15:54:18 +08:00
Alexis DUBURCQ
09e10e384f Vector env enable select worker (#132)
* Enable selecting worker for vector env step method.

* Update collector to match new vecenv selective worker behavior.

* Bug fix.

* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-20 15:54:18 +08:00
youkaichao
5599a6d1a6 Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130)
* re-implement Batch.stack and add testcases

* add doc for Batch.stack

* reuse _create_values and refactor stack_ & cat_

* fix pep8

* fix docs

* raise exception for stacking with partial keys and axis!=0

* minor fix

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-13 17:33:01 +08:00
youkaichao
affeec13de Improve Batch (#128)
* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-13 17:33:01 +08:00
youkaichao
2564e989fb Improve Batch (#126)
* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack
2020-07-13 17:33:01 +08:00
Alexis DUBURCQ
aa3c453f42
Raise exception for Batch __getitem__. (#119)
* Raise exception for Batch __getitem__.

* Try fixing access to reserved key.

* Simpler patch.

* Add unit test to check indexing empty Batch raises an exception.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-08 22:29:37 +08:00
youkaichao
7f9a1f1328
add type check for each element rather than the first element (#112)
This PR does the following:
- improvement: dramatic reduce of the call to _is_batch_set
- bugfix: list(Batch()) fail; Batch(a=[torch.ones(3), torch.ones(3)]) fail;
- misc: add type check for each element rather than the first element; add test case; _create_value with torch.Tensor does not have np.object type;
2020-07-08 21:00:00 +08:00
youkaichao
481015932c
bugfix for hang in list(Batch()) (#117) 2020-07-08 17:09:27 +08:00
youkaichao
f5e007932f
fix Batch init for types other than number and bool (#115)
* fix Batch init for types other than number and bool

* change doc to involve bool type

* use type check

* Batch type check complete
2020-07-08 13:45:29 +08:00
youkaichao
dbbb859ec5
doc fix (#113)
* doc fix

* change line

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-08 08:30:01 +08:00
youkaichao
9c7d31e5d6
bugfix for empty_ (#114)
* bugfix for empty_

* use v.__class__(0) for scalar
2020-07-08 08:10:34 +08:00
Alexis DUBURCQ
69caf89908
Fix to_torch converters (#111)
* Fix to_torch converters.

* to_torch now convert any object Torch Tensor-compatible.

* Fix linter.

* Fix Batch to_torch to convert any Torch Tensor-compatible data.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-07 18:40:55 +08:00
youkaichao
8913bf36b1
change Batch.empty to in-place fill; add copy option for Batch construction (#110)
* in-place empty_ for Batch

* change Batch.empty to in-place fill; add copy option for Batch construction

* type signiture & remove shadow names for copy

* add doc for data type (only support numbers and object data type)

* add unit test for Batch copy

* fix pep8

* add test case for Batch.empty

* doc fix

* fix pep8

* use object to test Batch

* test commit

* refact

* change Batch(copy) testcase

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-06 20:30:15 +08:00
n+e
db0e2e5cd2
Advanced Batch slicing & minor fix of RNN support (#106)
* add shape property and modify __getitem__

* change Batch.size to Batch.shape

* setattr

* Batch.empty

* remove scalar in advanced slicing

* modify empty_ and __getitem__

* missing testcase

* fix empty
2020-06-30 18:02:44 +08:00
Trinkle23897
e0f4862d01 store RNN hidden states in policy._state and add sample_avail in buffer (#19) 2020-06-29 12:18:52 +08:00
Alexis DUBURCQ
a951a32487
Enable partial stacking at Batch level (#100)
* Enable stacking of partially matching Batch instances.

* Fix list support for getitem.

* Fix Batch 'size' method.

* Update Batch documentation.
2020-06-27 09:06:40 +08:00
Alexis DUBURCQ
70aa7bf93e
Use lower-level API to reduce overhead. (#97)
* Use lower-level API to reduce overhead.

* Further improvements.

* Buffer _add_to_buffer improvement.

* Do not use _data field to store Batch data to avoid overhead. Add back _meta field in Buffer.

* Restore metadata attribute to store batch in Buffer.

* Move out nested methods.

* Update try/catch instead of actual check to efficiency.

* Remove unsed branches for efficiency.

* Use np.array over list when possible for efficiency.

* Final performance improvement.

* Add unit tests for Batch size method.

* Add missing stack unit tests.

* Enforce Buffer initialization to zero.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-26 18:37:50 +08:00
Alexis DUBURCQ
5ac9f9b144
Do not check bounds since it is always valid when everything is fine. (#95) 2020-06-25 21:06:35 +08:00
Alexis DUBURCQ
3086b5c31d
Buffer refactoring to support batch over batch reliably (#93)
* Fix support of batch over batch for Buffer.

* Do not use internal __dict__ attribute to store batch data since it breaks inheritance.

* Various fixes.

* Improve robustness of Batch/Buffer by avoiding direct attribute assignment. Buffer refactoring.

* Add axis optional argument to Batch stack method.

* Add item assignment to Batch class.

* Fix list support for Buffer.

* Convert list to np.array by default for efficiency.

* Add missing unit test for Batch. Fix unit tests.

* Batch item assignment is now robust to key order.

* Do not use getattr/setattr explicity for simplicity.

* More flexible __setitem__.

* Fixes

* Remove broacasting at Batch level since it is unreliable.

* Forbid item assignement for inconsistent batches.

* Implement broadcasting at Buffer level.

* Add more unit test for Batch item assignment.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-25 20:39:30 +08:00
Alexis DUBURCQ
49f43e9f1f
Fix Batch to numpy compatibility (#92)
* Fix Batch to numpy compatibility.

* Fix Batch unit tests.

* Fix linter

* Add Batch shape method.

* Remove shape and add size. Enable to reserve keys using empty batch/list.

* Fix linter and unit tests.

* Batch init using list of Batch.

* Add unit tests.

* Fix Batch __len__.

* Fix unit tests.

* Fix slicing

* Add missing slicing unit tests.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-24 21:43:48 +08:00
Alexis DUBURCQ
ebc551a25e
Fix support of 0-dim numpy array (#89)
* Fix support of 0-dim numpy array.

* Do not raise exception if Batch index does not make sense since it breaks existing code.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-24 06:55:24 +08:00
Alexis DUBURCQ
d7dd3105bc
Fix tuple support. (#88) 2020-06-23 23:37:26 +08:00
Alexis DUBURCQ
ec270759ab
Batch refactoring (#87)
* Enable to stack Batch instances. Add Batch cat static method. Rename cat in cat_ since inplace.

* Properly handle Batch init using np.array of dict.

* WIP

* Get rid of metadata.

* Update UT. Replace cat by cat_ everywhere.

* Do not sort Batch keys anymore for efficiency. Add items method.

* Fix cat copy issue.

* Add unit test to chack cat and stack methods.

* Remove used import.

* Fix linter issues.

* Fix unit tests.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-23 22:50:59 +08:00
Trinkle23897
a655334d00 change batch.append to batch.cat 2020-06-20 22:23:12 +08:00
Trinkle23897
aff0f9aee0 fix append batch over batch 2020-06-20 22:03:22 +08:00
Alexis DUBURCQ
66be5641b6
Fix to_numpy. (#73)
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-04 22:32:05 +08:00
Trinkle23897
ba1b3e54eb fix #69 2020-06-01 08:30:09 +08:00
Alexis DUBURCQ
1fce527c77
Fix 'to_tensor' dtype/device forwarding for Batch over Batch. (#68)
* Fix Batch to_torch method not updating dtype/device of already converted data.

* Fix dtype/device to forwarded by to_tensor for Batch over Batch.

* Add Unit test to check to_torch dtype/device recursive forwarding.

* Batch UT check accessing data using both dict and class style.

* Fix utils to_tensor dtype/device forwarding. Add Unit tests.

* Fix UT.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Co-authored-by: n+e <463003665@qq.com>
2020-05-30 21:40:31 +08:00
Alexis DUBURCQ
529a4cf44c
Add pickle support for Batch. Fix VectorEnv. (#67)
* Fix vecenv.

* Add pickle support for Batch class.

* Add Batch pickle Unit Test.

* Fix lint.

* Swap Batch UT.

* Fix lint.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-30 21:29:33 +08:00
Alexis DUBURCQ
8af7196a9a
Robust conversion from/to numpy/pytorch (#63)
* Enable to convert Batch data back to torch.

* Add torch converter to collector.

* Fix

* Move to_numpy/to_torch convert in dedicated utils.py.

* Use to_numpy/to_torch to convert arrays.

* fix lint

* fix

* Add unit test to check Batch from/to numpy.

* Fix Batch over Batch.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 20:45:21 +08:00
Alexis DUBURCQ
b5093ecb56
Minor refactor for Batch class. (#61)
* Minor refactor for Batch class.

* Fix.

* Add back key sorting.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 17:56:46 +08:00
Trinkle23897
d2b2fa87c0 fix #56 2020-05-29 08:03:37 +08:00
Trinkle23897
de556fd22d item3 of #51 2020-05-27 11:02:23 +08:00
Trinkle23897
0eef0ca198 fix optional type syntax 2020-05-16 20:08:32 +08:00
Trinkle23897
9b26137cd2 add type annotation 2020-05-12 11:31:47 +08:00
Trinkle23897
075825325e add preprocess_fn (#42) 2020-05-05 13:39:51 +08:00
Trinkle23897
134f787e24 reserve 'policy' keyword in replay buffer 2020-04-29 17:48:48 +08:00
Trinkle23897
bb2f833d0e support Batch of Batch and fix bugs (#38) 2020-04-29 12:14:53 +08:00
Trinkle23897
80d661907e Multimodal obs (#38, #27, #25) 2020-04-28 20:56:02 +08:00
Trinkle23897
6bf1ea644d fix ppo 2020-04-19 14:30:42 +08:00