I work with the TensorFlow 2 framework.
- Snake_v1
  - Observation space: (RGB array or Extracted features)
  - Action space: Discrete(3), [Go Straight / Turn Left / Turn Right]
  - Reward scheme: (Dense or Sparse)
- Snake_v2
  - Observation space: (RGB array)
  - Action space: Box(2), [Speed (0.0 ~ 1.0), Angle (-1.0 ~ 1.0)]
  - Reward scheme: (Dense or Sparse)
- Randomly
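
To make the two action spaces listed above concrete, here is a minimal sketch of how a Discrete(3) action and a Box(2) action might be interpreted. It uses only the Python standard library and is not the environments' actual code. The names `HEADINGS`, `TURN_DELTAS`, `step_discrete`, and `clip_continuous` are hypothetical, and the index order (0 = go straight, 1 = turn left, 2 = turn right) is an assumption taken from the order in which the actions are listed.

```python
# Illustrative sketch only; all names and the action index order are assumptions.

# Grid headings in clockwise order: up, right, down, left (y grows downward).
HEADINGS = [(0, -1), (1, 0), (0, 1), (-1, 0)]

# Assumed mapping for Discrete(3): 0 = go straight, 1 = turn left, 2 = turn right.
TURN_DELTAS = {0: 0, 1: -1, 2: 1}


def step_discrete(heading_idx, action):
    """Snake_v1-style action: return the new index into HEADINGS."""
    if action not in TURN_DELTAS:
        raise ValueError(f"action must be 0, 1, or 2, got {action!r}")
    # Turning left moves counter-clockwise, turning right moves clockwise.
    return (heading_idx + TURN_DELTAS[action]) % len(HEADINGS)


def clip_continuous(action):
    """Snake_v2-style action: clamp (speed, angle) to the stated ranges."""
    speed, angle = action
    speed = min(max(speed, 0.0), 1.0)   # Speed in [0.0, 1.0]
    angle = min(max(angle, -1.0), 1.0)  # Angle in [-1.0, 1.0]
    return speed, angle
```

For example, `step_discrete(0, 1)` turns a snake that is heading up to the left, and `clip_continuous((1.5, -2.0))` clamps an out-of-range action to the bounds of the Box(2) space.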
- TODOs
  - PPO (Proximal Policy Optimization)
  - DDPG (Deep Deterministic Policy Gradient)
  - A3C (Asynchronous Advantage Actor-Critic)
  - DQN (Deep Q-Network)
- TODOs
  - BC (Behavior Cloning)
  - GAIL (Generative Adversarial Imitation Learning)
  - VAIL (Variational GAIL)
  - DI-GAIL (Directed-Info GAIL)