Yi Su
							
						 
					 | 
					
						
						
							
							
						
						
						
							
						
						
							40289b8b0e
							
						
					 | 
					
						
						
							
							Add atari ppo example (#523)
						
						
						
						
						
						
						
						I needed a policy gradient baseline myself, and one has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for the hyperparameters.
Note that using lr=2.5e-4 results in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate, which is why the default lr is set to 1e-4. See the discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156.
						
						
					 | 
					
						2022-02-11 06:45:06 +08:00 | 
					
					
						
						
							
							
							
						
					 |