Deep RL on Google Football

Vishal Bidawatka

Hyderabad, Telangana

We set up the Google Football environment to test the A3C reinforcement learning algorithm. We implemented the RL algorithms with the ChainerRL library, since it includes an optimized implementation of A3C. All of the agent's training code is in a single file, a3c.py.

Project status: Concept

Artificial Intelligence

Intel Technologies
AI DevCloud / Xeon

Overview / Usage

A3C (Asynchronous Advantage Actor-Critic) runs multiple workers, each interacting with its own copy of the environment and choosing actions independently of the others. The following steps summarize how it works in plain terms and highlight its advantages over other RL algorithms:

  1. Each worker acts independently in its own copy of the environment.
  2. Because the workers run in parallel, they explore more of the environment, and the experience they collect is less correlated than that of a single agent.
  3. Each worker collects a short rollout of transitions, each containing the current state, the action taken, the reward obtained, the next state and a done flag (a Boolean indicating whether the episode has ended). A rollout lasts a fixed number of steps or until the episode ends.
  4. From its rollout, each worker computes advantage estimates and the gradients of the actor (policy) and critic (value) losses.
  5. The worker applies these gradients asynchronously to the shared global agent, without waiting for the other workers.
  6. The worker then copies the latest weights of the global agent into its local network.
  7. The worker continues taking actions with the updated policy.
  8. These steps repeat until the global agent converges.
  9. Because the workers run in parallel, training is faster than with a single agent.
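The asynchronous update loop at the heart of A3C can be sketched in plain Python. This is a minimal toy sketch, not the project's a3c.py or ChainerRL's implementation: the environment is a hypothetical three-armed bandit (`ARM_MEANS`), so every episode lasts a single step, and the actor and critic are reduced to a table of action preferences and a scalar baseline V. Python threads stand in for the workers and a lock guards the shared parameters for simplicity, whereas the original A3C applies its updates lock-free.

```python
import math
import random
import threading

# Hypothetical toy environment (NOT Google Football): a 3-armed bandit.
ARM_MEANS = [0.2, 0.5, 0.8]


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class GlobalAgent:
    """Shared parameters: action preferences (actor) and a baseline V (critic)."""

    def __init__(self, n_actions):
        self.theta = [0.0] * n_actions
        self.v = 0.0
        self.lock = threading.Lock()

    def snapshot(self):
        # Workers copy the latest global weights into their local copy.
        with self.lock:
            return list(self.theta), self.v

    def apply_gradients(self, g_theta, g_v, lr=0.05):
        # A worker pushes its gradients without waiting for the others.
        with self.lock:
            for k, g in enumerate(g_theta):
                self.theta[k] += lr * g
            self.v += lr * g_v


def worker(agent, seed, n_updates=400, t_max=5):
    rng = random.Random(seed)  # each worker has its own environment/RNG
    for _ in range(n_updates):
        theta, v = agent.snapshot()  # sync local weights with the global agent
        pi = softmax(theta)
        g_theta = [0.0] * len(theta)
        g_v = 0.0
        for _ in range(t_max):  # short rollout of one-step episodes
            a = rng.choices(range(len(pi)), weights=pi)[0]
            r = ARM_MEANS[a] + rng.gauss(0.0, 0.1)
            advantage = r - v  # episode ends immediately, so A = r - V(s)
            for k in range(len(theta)):
                grad_log_pi = (1.0 if k == a else 0.0) - pi[k]
                g_theta[k] += advantage * grad_log_pi  # policy-gradient ascent
            g_v += advantage  # moves V(s) toward the observed return
        agent.apply_gradients(g_theta, g_v)


agent = GlobalAgent(len(ARM_MEANS))
threads = [threading.Thread(target=worker, args=(agent, seed)) for seed in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

probs = softmax(agent.theta)
print([round(p, 3) for p in probs])
```

After training, the shared policy should put most of its probability on the arm with the highest mean reward (index 2). Each worker computes gradients from its own rollouts and pushes them to the global agent, rather than sending raw transitions to a central learner.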

Advantage Function

The Q value can be broken down into two parts:

  1. The state value function V(s)
  2. The advantage function A(s, a)

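The advantage function can be derived as follows, where r is the immediate reward, γ is the discount factor and s' is the next state:

$$Q(s, a) = V(s) + A(s, a)$$
$$A(s, a) = Q(s, a) - V(s)$$
$$A(s, a) \approx r + \gamma V(s') - V(s)$$

The last step replaces Q(s, a) with its one-step estimate r + γV(s'), which is why it holds only approximately.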

The advantage function measures how much better or worse an action is than the other actions available in a given state, while the value function captures how good it is to be in that state.
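The one-step advantage estimate above, and the multi-step version a worker computes over a rollout, can be written as two small functions. This is a sketch with hypothetical function names. The n-step form, which accumulates discounted rewards backwards from the last state's value V(s_n), is the return A3C typically bootstraps from, but it is an addition not described above. When the episode has ended (done is true), there is no next state, so the bootstrap value is taken as 0.

```python
def one_step_advantage(r, v_s, v_next, gamma=0.99, done=False):
    """A(s, a) ~= r + gamma * V(s') - V(s); V(s') is treated as 0 at episode end."""
    bootstrap = 0.0 if done else v_next
    return r + gamma * bootstrap - v_s


def n_step_advantages(rewards, values, v_last, gamma=0.99, done=False):
    """Advantages for a rollout of length n (hypothetical helper).

    rewards[t] is r_t and values[t] is V(s_t) for t = 0..n-1; v_last is V(s_n).
    Returns are accumulated backwards: R_t = r_t + gamma * R_{t+1}, starting
    from R_n = V(s_n) (or 0 if the episode ended), and A_t = R_t - V(s_t).
    """
    ret = 0.0 if done else v_last
    advantages = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages


# Worked example: r = 1.0, V(s) = 0.5, V(s') = 0.8, gamma = 0.9
# A ~= 1.0 + 0.9 * 0.8 - 0.5 = 1.22 (up to floating-point rounding)
print(one_step_advantage(1.0, 0.5, 0.8, gamma=0.9))
print(n_step_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.8, gamma=0.9))
```

With a rollout of length one, `n_step_advantages` reduces to `one_step_advantage`, which is the estimate in the last equation above.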

Methodology / Approach

We implemented the RL algorithms with the ChainerRL library, since it includes an optimized implementation of A3C. All of the agent's training code is in a single file, a3c.py. We trained the agent on Intel AI DevCloud resources.

Technologies Used

ChainerRL

Intel AI DevCloud

Python

Repository

https://github.com/Ujwal2910/Deep-RL-on-Gfootabll-Google-football-OpenAI-style-environment
