<img width="800px" src="../fidle/img/header.svg"></img>

# <!-- TITLE --> [DRL2] - RL Baselines3 Zoo: Training in Colab
<!-- DESC --> Demo of Stable-Baselines3 with Colab
<!-- AUTHOR : Nathan Cassereau (IDRIS) and Bertrand Cabot (IDRIS) -->


Demo of Stable-Baselines3, adapted by Nathan Cassereau (IDRIS) and Bertrand Cabot (IDRIS).


GitHub Repo: [https://github.com/DLR-RM/rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

Stable-Baselines3 Repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


# Install Dependencies



In [None]:
!apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb

In [None]:
!apt-get install -y \
    libgl1-mesa-dev \
    libgl1-mesa-glx \
    libglew-dev \
    libosmesa6-dev \
    software-properties-common

!apt-get install -y patchelf

## Clone RL Baselines3 Zoo Repo

In [None]:
!git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo

In [None]:
%cd /content/rl-baselines3-zoo/

### Install pip dependencies

In [None]:
!pip install -r requirements.txt

In [None]:
!pip install free-mujoco-py

## Pretrained model

Gym environments: https://gym.openai.com/envs/

In [None]:
%cd /content/rl-baselines3-zoo/

### Record a Video

In [None]:
# Set up display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [None]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay

def show_videos(video_path='', prefix=''):
  """
  Taken from https://github.com/eleurent/highway-env

  :param video_path: (str) Path to the folder containing videos
  :param prefix: (str) Filter the videos, showing only those whose file name contains this prefix
  """
  html = []
  for mp4 in Path(video_path).glob("**/*{}*.mp4".format(prefix)):
      video_b64 = base64.b64encode(mp4.read_bytes())
      html.append('''{} <br> <video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(mp4, mp4, video_b64.decode('ascii')))
  ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))
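
The glob pattern used by `show_videos` matches any `.mp4` file whose name contains `prefix`, at any depth below `video_path`. The sketch below reproduces only that matching step; `find_videos`, the temporary directory, and the file names are assumptions made up for illustration, not files produced by the zoo.

```python
import tempfile
from pathlib import Path


def find_videos(video_path, prefix=""):
    """Return sorted names of the .mp4 files matched by the glob used in show_videos."""
    pattern = "**/*{}*.mp4".format(prefix)
    return sorted(p.name for p in Path(video_path).glob(pattern))


# Hypothetical folder layout, created only for this illustration
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "dqn" / "videos"
    nested.mkdir(parents=True)
    for name in [
        "PongNoFrameskip-v4-step-0.mp4",
        "final-PongNoFrameskip-v4.mp4",
        "Ant-v3-step-0.mp4",
    ]:
        (nested / name).touch()
    matches = find_videos(tmp, prefix="PongNoFrameskip-v4")

# Both Pong files are matched, including the one where the name does not start with the prefix
print(matches)
```

Because the pattern places a wildcard on both sides of the prefix, passing an environment ID such as `Ant-v3` is enough to select its videos regardless of how the recorder names the files.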

### Discrete environments

In [None]:
%run scripts/all_plots.py -a dqn qrdqn a2c ppo --env PongNoFrameskip-v4 -f rl-trained-agents/

In [None]:
%run scripts/plot_train.py -a dqn -e PongNoFrameskip-v4 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a qrdqn -e PongNoFrameskip-v4 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a a2c -e PongNoFrameskip-v4 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a ppo -e PongNoFrameskip-v4 -f rl-trained-agents/ -x time

In [None]:
!python enjoy.py --algo dqn --env PongNoFrameskip-v4 --no-render --n-timesteps 5000

In [None]:
!python -m utils.record_video --algo dqn --env PongNoFrameskip-v4

In [None]:
show_videos(video_path='rl-trained-agents/dqn', prefix='PongNoFrameskip-v4')

### Continuous environments

In [None]:
%run scripts/all_plots.py -a ppo trpo sac td3 tqc --env Ant-v3 -f rl-trained-agents/

In [None]:
%run scripts/plot_train.py -a ppo -e Ant-v3 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a trpo -e Ant-v3 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a tqc -e Ant-v3 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a td3 -e Ant-v3 -f rl-trained-agents/ -x time

In [None]:
%run scripts/plot_train.py -a sac -e Ant-v3 -f rl-trained-agents/ -x time

In [None]:
!python enjoy.py --algo td3 --env Ant-v3 --no-render --n-timesteps 5000

In [None]:
!python -m utils.record_video --algo td3 --env Ant-v3

In [None]:
show_videos(video_path='rl-trained-agents/td3', prefix='Ant-v3')

## Train an RL Agent


The trained agent can be found in the `logs/` folder.

Here we train DQN on the PongNoFrameskip-v4 (Atari Pong) environment for 1,000,000 steps.

To train on a different environment, pass its ID with `--env`, for example `--env CartPole-v1`.

Note: To support a new environment, add an entry for it in `hyperparams/<algo>.yml` (for example `hyperparams/dqn.yml`). You can open and edit this file from the side panel of Google Colab (see https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory).

In [None]:
!python train.py --algo dqn --env PongNoFrameskip-v4 --n-timesteps 1000000

#### Evaluate trained agent


Remove the `--folder logs/` argument to evaluate the pretrained agent instead.

In [None]:
!python enjoy.py --algo dqn --env PongNoFrameskip-v4 --no-render --n-timesteps 5000 --folder logs/

#### Tune Hyperparameters

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.

The command below tunes the hyperparameters for DQN using a TPE sampler and a median pruner, with 5 parallel jobs,
a budget of 10 trials, and a maximum of 5000 steps per trial.

In [None]:
#!python train.py --algo dqn --env PongNoFrameskip-v4 -n 5000 -optimize --n-trials 10 --n-jobs 5 --sampler tpe --pruner median
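
For intuition about the median pruner: a trial is stopped early when its intermediate score is worse than the median score that earlier trials reported at the same step. The sketch below illustrates that rule in plain Python with made-up reward curves. It is a simplified illustration, not Optuna's implementation, and `should_prune` is a hypothetical helper.

```python
from statistics import median


def should_prune(value, step, finished_trials):
    """Simplified median rule for a metric that is being maximised.

    finished_trials is a list of per-step score lists from earlier trials.
    """
    # Scores that earlier trials reported at the same step
    peers = [scores[step] for scores in finished_trials if len(scores) > step]
    if not peers:
        return False  # nothing to compare against yet
    return value < median(peers)


# Made-up mean rewards reported at steps 0, 1 and 2 by three earlier trials
history = [
    [-20.0, -15.0, -5.0],
    [-21.0, -18.0, -10.0],
    [-19.0, -12.0, 3.0],
]

# The median at step 1 is -15.0
print(should_prune(-19.0, 1, history))  # below the median: pruned
print(should_prune(-10.0, 1, history))  # above the median: kept
```

Pruning weak trials early is what lets a small budget of trials cover more of the search space, since less time is spent on configurations that are already behind.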

### Display the video

### Continue Training

Here, we continue training the previous model.

In [None]:
#!python train.py --algo dqn --env PongNoFrameskip-v4  --n-timesteps 50000 -i logs/dqn/PongNoFrameskip-v4_1/PongNoFrameskip-v4.zip
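
The checkpoint passed to `-i` in the command above follows the layout `logs/<algo>/<env>_<run id>/<env>.zip`. Assuming each new training run is saved under the next integer run id (an assumption inferred from the `_1` suffix), a small helper could locate the most recent checkpoint. `latest_checkpoint` is a hypothetical function written for this notebook, not part of the zoo, and the folders below are created only for the demonstration.

```python
import tempfile
from pathlib import Path


def latest_checkpoint(log_root, algo, env_id):
    """Return the <env>.zip path of the highest-numbered run, or None if there is none."""
    algo_dir = Path(log_root) / algo
    if not algo_dir.is_dir():
        return None
    runs = []
    for run_dir in algo_dir.glob("{}_*".format(env_id)):
        suffix = run_dir.name[len(env_id) + 1:]
        # Keep only folders named <env>_<integer>
        if run_dir.is_dir() and suffix.isdigit():
            runs.append((int(suffix), run_dir))
    if not runs:
        return None
    _, newest = max(runs, key=lambda item: item[0])
    checkpoint = newest / "{}.zip".format(env_id)
    return checkpoint if checkpoint.exists() else None


# Hypothetical log folder with runs 1, 2 and 10, created only for this illustration
with tempfile.TemporaryDirectory() as tmp:
    for run_id in (1, 2, 10):
        run_dir = Path(tmp) / "dqn" / "PongNoFrameskip-v4_{}".format(run_id)
        run_dir.mkdir(parents=True)
        (run_dir / "PongNoFrameskip-v4.zip").touch()
    found = latest_checkpoint(tmp, "dqn", "PongNoFrameskip-v4")
    relative = found.relative_to(tmp).as_posix()
    missing = latest_checkpoint(tmp, "ppo", "PongNoFrameskip-v4")

print(relative)
```

Run ids are compared as integers rather than as strings, so run 10 is correctly treated as newer than run 2.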

In [None]:
#!python enjoy.py --algo dqn --env PongNoFrameskip-v4 --no-render --n-timesteps 1000 --folder logs/