Fixes a small bug where `np.inf` was used instead of torch-based infinity.
Closes #963
---------
Co-authored-by: ivan.rodriguez <ivan.rodriguez@unternehmertum.de>
This PR adds a new method for getting actions from an env's observation
and info. This is useful for standard inference and stands in contrast
to batch-based methods that are currently used in training and
evaluation. Without this, users have to do some kind of gymnastics to
actually perform inference with a trained policy. I have also added a
test for the new method.
In future PRs, this method should be included in the examples (in the
"watch" section).
Adding this required several typing improvements and, importantly,
_simplifying the signature of `forward` in many policies!_ This is a
**breaking change**, but it will likely affect no users. The `input`
parameter of `forward` was a rather hacky mechanism; I believe it is
good that it's gone now. It will also help with #948.
The main functional change is the addition of `compute_action` to
`BasePolicy`.
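For illustration, inference with a trained policy can then look roughly
like this (a minimal sketch: `policy` is assumed to be a trained
`BasePolicy`, and the exact keyword arguments of `compute_action` are an
assumption here):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
done = False
while not done:
    # Single observation in, single action out; no Batch gymnastics.
    action = policy.compute_action(obs, info=info)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```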
Other minor changes:
- improvements in typing
- updated PR and Issue templates
- improved handling of `max_action_num`
Closes #981
- [X] I have marked all applicable categories:
+ [X] exception-raising fix
+ [ ] algorithm implementation fix
+ [ ] documentation modification
+ [ ] new feature
- [X] I have reformatted the code using `make format` (**required**)
- [X] I have checked the code using `make commit-checks` (**required**)
- [ ] If applicable, I have mentioned the relevant/related issue(s)
- [ ] If applicable, I have listed every item in this Pull Request
below
The cause was the use of a lambda function in the state of a generated
object.
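As a general Python illustration (not this repository's actual code):
lambdas are pickled by qualified-name lookup, which fails for inline
definitions, so any object carrying one in its state cannot be
persisted:

```python
import pickle

def add_one(x):
    return x + 1

class Fixed:
    def __init__(self):
        self.fn = add_one  # module-level function: picklable by name

class Broken:
    def __init__(self):
        self.fn = lambda x: x + 1  # lambda in state: not picklable

pickle.dumps(Fixed())     # works
# pickle.dumps(Broken())  # raises a pickling error
```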
which stores the entire policy (new default), supporting applications
where it is desired to be able to load the policy without having
to instantiate an environment or recreate a corresponding policy
object.
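Roughly the contrast, as a sketch assuming plain torch persistence
(file names are illustrative, and the repository's actual persistence
helpers may differ):

```python
import torch

# Weights-only persistence: loading requires re-instantiating the
# policy first, which typically also needs an env to derive the spaces.
torch.save(policy.state_dict(), "policy_weights.pt")

# Whole-object persistence (the new default described above):
torch.save(policy, "policy.pt")
# weights_only=False permits full-object unpickling (newer torch
# versions default to weights_only=True).
restored = torch.load("policy.pt", weights_only=False)
```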
(should be dev dependency only) by introducing a new
place where jsonargparse can be configured:
`logging.run_cli`, which is also slightly more convenient.
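The helper presumably wraps jsonargparse's standard entry point,
roughly as in the sketch below (the exact configuration `run_cli`
applies is an assumption, and the parameters shown are illustrative):

```python
from jsonargparse import CLI

def main(task: str = "PongNoFrameskip-v4", seed: int = 42) -> None:
    """Function parameters become CLI arguments automatically."""
    ...

if __name__ == "__main__":
    CLI(main)  # run_cli is assumed to wrap this, adding logging setup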
of the number of environments in `SamplingConfig` is used
(values are now passed to the factory method).
This is clearer and removes the need to pass otherwise
unnecessary configuration to environment factories at
construction.
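A hypothetical before/after sketch (class and method names are
illustrative only, not the repository's actual interfaces):

```python
class EnvFactory:
    # Before: num_training_envs/num_test_envs had to be passed here,
    # forcing sampling configuration into construction.
    def __init__(self, task: str) -> None:
        self.task = task

    # After: the counts from SamplingConfig arrive at creation time.
    def create_envs(self, num_training_envs: int, num_test_envs: int):
        ...
```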
* Add persistence/restoration of Experiment instance
* Add file logging in experiment
* Allow all persistence/logging to be disabled
* Disable persistence in tests
* Add example `atari_iqn_hl`
* Factor out trainer callbacks to new module `atari_callbacks`
* Extract base class for DQN-based agent factories
* Improved module factory interface design, achieving higher generality