86 Commits

Author SHA1 Message Date
youkaichao
affeec13de Improve Batch (#128)
* minor polish

* improve and implement Batch.cat_

* bugfix for buffer.sample with field impt_weight

* restore the usage of a.cat_(b)

* fix 2 bugs in batch and add corresponding unittest

* code fix for update

* update is_empty to recognize empty over empty; bugfix for len

* bugfix for update and add testcase

* add testcase of update

* fix docs

* fix docs

* fix docs [ci skip]

* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-13 17:33:01 +08:00
youkaichao
2564e989fb Improve Batch (#126)
* make sure the key type of Batch is string, and add unit tests

* add is_empty() function and unit tests

* enable cat of mixing dict and Batch, just like stack
2020-07-13 17:33:01 +08:00
youkaichao
ff99662fe6
bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer (#120)
* bugfix for update with empty buffer; remove duplicate variable _weight_sum in PrioritizedReplayBuffer

* point out that ListReplayBuffer cannot be sampled

* remove useless _amortization_counter variable
2020-07-10 08:24:11 +08:00
Alexis DUBURCQ
aa3c453f42
Raise exception for Batch __getitem__. (#119)
* Raise exception for Batch __getitem__.

* Try fixing access to reserved key.

* Simpler patch.

* Add unit test to check indexing empty Batch raises an exception.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-08 22:29:37 +08:00
youkaichao
7f9a1f1328
add type check for each element rather than the first element (#112)
This PR does the following:
- improvement: dramatically reduce the number of calls to _is_batch_set
- bugfix: list(Batch()) fails; Batch(a=[torch.ones(3), torch.ones(3)]) fails;
- misc: add type check for each element rather than the first element; add test case; _create_value with torch.Tensor does not have np.object type;
2020-07-08 21:00:00 +08:00
youkaichao
481015932c
bugfix for hang in list(Batch()) (#117) 2020-07-08 17:09:27 +08:00
youkaichao
f5e007932f
fix Batch init for types other than number and bool (#115)
* fix Batch init for types other than number and bool

* change doc to involve bool type

* use type check

* Batch type check complete
2020-07-08 13:45:29 +08:00
youkaichao
dbbb859ec5
doc fix (#113)
* doc fix

* change line

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-08 08:30:01 +08:00
youkaichao
9c7d31e5d6
bugfix for empty_ (#114)
* bugfix for empty_

* use v.__class__(0) for scalar
2020-07-08 08:10:34 +08:00
Alexis DUBURCQ
69caf89908
Fix to_torch converters (#111)
* Fix to_torch converters.

* to_torch now converts any Torch Tensor-compatible object.

* Fix linter.

* Fix Batch to_torch to convert any Torch Tensor-compatible data.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-07-07 18:40:55 +08:00
youkaichao
8913bf36b1
change Batch.empty to in-place fill; add copy option for Batch construction (#110)
* in-place empty_ for Batch

* change Batch.empty to in-place fill; add copy option for Batch construction

* type signature & remove shadow names for copy

* add doc for data type (only support numbers and object data type)

* add unit test for Batch copy

* fix pep8

* add test case for Batch.empty

* doc fix

* fix pep8

* use object to test Batch

* test commit

* refact

* change Batch(copy) testcase

* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-07-06 20:30:15 +08:00
youkaichao
5b1373924e
doc fix; policy train/eval signature fix (#109)
* doc fix; policy train/eval signature fix

* change train/eval behavior according to pytorch

* change train/eval behavior according to pytorch
2020-07-06 10:44:34 +08:00
n+e
db0e2e5cd2
Advanced Batch slicing & minor fix of RNN support (#106)
* add shape property and modify __getitem__

* change Batch.size to Batch.shape

* setattr

* Batch.empty

* remove scalar in advanced slicing

* modify empty_ and __getitem__

* missing testcase

* fix empty
2020-06-30 18:02:44 +08:00
Trinkle23897
e0f4862d01 store RNN hidden states in policy._state and add sample_avail in buffer (#19) 2020-06-29 12:18:52 +08:00
Alexis DUBURCQ
a951a32487
Enable partial stacking at Batch level (#100)
* Enable stacking of partially matching Batch instances.

* Fix list support for getitem.

* Fix Batch 'size' method.

* Update Batch documentation.
2020-06-27 09:06:40 +08:00
Alexis DUBURCQ
70aa7bf93e
Use lower-level API to reduce overhead. (#97)
* Use lower-level API to reduce overhead.

* Further improvements.

* Buffer _add_to_buffer improvement.

* Do not use _data field to store Batch data to avoid overhead. Add back _meta field in Buffer.

* Restore metadata attribute to store batch in Buffer.

* Move out nested methods.

* Use try/catch instead of actual check for efficiency.

* Remove unused branches for efficiency.

* Use np.array over list when possible for efficiency.

* Final performance improvement.

* Add unit tests for Batch size method.

* Add missing stack unit tests.

* Enforce Buffer initialization to zero.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-26 18:37:50 +08:00
Alexis DUBURCQ
5ac9f9b144
Do not check bounds since it is always valid when everything is fine. (#95) 2020-06-25 21:06:35 +08:00
Alexis DUBURCQ
3086b5c31d
Buffer refactoring to support batch over batch reliably (#93)
* Fix support of batch over batch for Buffer.

* Do not use internal __dict__ attribute to store batch data since it breaks inheritance.

* Various fixes.

* Improve robustness of Batch/Buffer by avoiding direct attribute assignment. Buffer refactoring.

* Add axis optional argument to Batch stack method.

* Add item assignment to Batch class.

* Fix list support for Buffer.

* Convert list to np.array by default for efficiency.

* Add missing unit test for Batch. Fix unit tests.

* Batch item assignment is now robust to key order.

* Do not use getattr/setattr explicitly for simplicity.

* More flexible __setitem__.

* Fixes

* Remove broadcasting at Batch level since it is unreliable.

* Forbid item assignment for inconsistent batches.

* Implement broadcasting at Buffer level.

* Add more unit tests for Batch item assignment.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-25 20:39:30 +08:00
rocknamx
506cc97ba5
fix #91 (#94) 2020-06-25 07:02:59 +08:00
Alexis DUBURCQ
49f43e9f1f
Fix Batch to numpy compatibility (#92)
* Fix Batch to numpy compatibility.

* Fix Batch unit tests.

* Fix linter

* Add Batch shape method.

* Remove shape and add size. Enable to reserve keys using empty batch/list.

* Fix linter and unit tests.

* Batch init using list of Batch.

* Add unit tests.

* Fix Batch __len__.

* Fix unit tests.

* Fix slicing

* Add missing slicing unit tests.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-24 21:43:48 +08:00
Alexis DUBURCQ
ebc551a25e
Fix support of 0-dim numpy array (#89)
* Fix support of 0-dim numpy array.

* Do not raise exception if Batch index does not make sense since it breaks existing code.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-24 06:55:24 +08:00
Alexis DUBURCQ
d7dd3105bc
Fix tuple support. (#88) 2020-06-23 23:37:26 +08:00
Alexis DUBURCQ
ec270759ab
Batch refactoring (#87)
* Enable to stack Batch instances. Add Batch cat static method. Rename cat in cat_ since inplace.

* Properly handle Batch init using np.array of dict.

* WIP

* Get rid of metadata.

* Update UT. Replace cat by cat_ everywhere.

* Do not sort Batch keys anymore for efficiency. Add items method.

* Fix cat copy issue.

* Add unit test to check cat and stack methods.

* Remove unused import.

* Fix linter issues.

* Fix unit tests.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-23 22:50:59 +08:00
danagi
13828f6309
added noise param to collector for test phase, fixed examples to adapt modification (#86)
* Add auto alpha tuning and exploration noise for sac.
Add class BaseNoise and GaussianNoise for the concept of exploration noise.
Add new test for sac, tested in MountainCarContinuous-v0,
which should benefit from the two new features above.

* add exploration noise to collector, fix example to adapt modification
2020-06-23 07:20:51 +08:00
Trinkle23897
a655334d00 change batch.append to batch.cat 2020-06-20 22:23:12 +08:00
Trinkle23897
aff0f9aee0 fix append batch over batch 2020-06-20 22:03:22 +08:00
Trinkle23897
81e4a16ef2 fix a bug in re-index replay buffer (fix #82) 2020-06-17 16:37:51 +08:00
Trinkle23897
1a914336f7 add random action in collector (fix #78) 2020-06-11 08:57:37 +08:00
Trinkle23897
f1951780ab fix a bug of storing batch over batch data into buffer 2020-06-09 18:46:14 +08:00
Trinkle23897
560116d0b2 cheat sheet 2020-06-08 21:53:00 +08:00
Alexis DUBURCQ
66be5641b6
Fix to_numpy. (#73)
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-06-04 22:32:05 +08:00
Trinkle23897
dc451dfe88 nstep all (fix #51) 2020-06-03 13:59:47 +08:00
Trinkle23897
ba1b3e54eb fix #69 2020-06-01 08:30:09 +08:00
Alexis DUBURCQ
1fce527c77
Fix 'to_tensor' dtype/device forwarding for Batch over Batch. (#68)
* Fix Batch to_torch method not updating dtype/device of already converted data.

* Fix dtype/device forwarding by to_tensor for Batch over Batch.

* Add Unit test to check to_torch dtype/device recursive forwarding.

* Batch UT check accessing data using both dict and class style.

* Fix utils to_tensor dtype/device forwarding. Add Unit tests.

* Fix UT.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
Co-authored-by: n+e <463003665@qq.com>
2020-05-30 21:40:31 +08:00
Alexis DUBURCQ
529a4cf44c
Add pickle support for Batch. Fix VectorEnv. (#67)
* Fix vecenv.

* Add pickle support for Batch class.

* Add Batch pickle Unit Test.

* Fix lint.

* Swap Batch UT.

* Fix lint.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-30 21:29:33 +08:00
Alexis DUBURCQ
dd3e2130bb
Infer the right dtype for replay buffers. (#64) 2020-05-29 22:27:03 +08:00
Alexis DUBURCQ
8af7196a9a
Robust conversion from/to numpy/pytorch (#63)
* Enable to convert Batch data back to torch.

* Add torch converter to collector.

* Fix

* Move to_numpy/to_torch convert in dedicated utils.py.

* Use to_numpy/to_torch to convert arrays.

* fix lint

* fix

* Add unit test to check Batch from/to numpy.

* Fix Batch over Batch.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 20:45:21 +08:00
Alexis DUBURCQ
b5093ecb56
Minor refactor for Batch class. (#61)
* Minor refactor for Batch class.

* Fix.

* Add back key sorting.

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>
2020-05-29 17:56:46 +08:00
Trinkle23897
be9ce44290 fix #59 2020-05-29 11:49:47 +08:00
Trinkle23897
d2b2fa87c0 fix #56 2020-05-29 08:03:37 +08:00
Trinkle23897
de556fd22d item3 of #51 2020-05-27 11:02:23 +08:00
Trinkle23897
0eef0ca198 fix optional type syntax 2020-05-16 20:08:32 +08:00
Trinkle23897
9b26137cd2 add type annotation 2020-05-12 11:31:47 +08:00
Trinkle23897
075825325e add preprocess_fn (#42) 2020-05-05 13:39:51 +08:00
Trinkle23897
c2a7caf806 add recurrent actor and critic 2020-04-30 16:31:40 +08:00
Trinkle23897
134f787e24 reserve 'policy' keyword in replay buffer 2020-04-29 17:48:48 +08:00
Trinkle23897
bb2f833d0e support Batch of Batch and fix bugs (#38) 2020-04-29 12:14:53 +08:00
Trinkle23897
80d661907e Multimodal obs (#38, #27, #25) 2020-04-28 20:56:02 +08:00
rocknamx
b23749463e
Prioritized DQN (#30)
* add sum_tree.py

* add prioritized replay buffer

* del sum_tree.py

* fix some format issues

* fix weight_update bug

* simply replace replaybuffer in test_dqn without weight update

* weight default set to 1

* fix sampling bug when buffer is not full

* rename parameter

* fix formula error, add accuracy check

* add PrioritizedDQN test

* add test_pdqn.py

* add update_weight() doc

* add ref of prio dqn in readme.md and index.rst

* restore test_dqn.py, fix args of test_pdqn.py
2020-04-26 12:05:58 +08:00
Trinkle23897
4fd826761c enable null buffer in test collector 2020-04-20 11:50:18 +08:00