<img width="800px" src="../fidle/img/header.svg"></img>

# <!-- TITLE --> [PYTORCH1] - Practical Lab : PyTorch
<!-- DESC --> PyTorch is one of the main frameworks used in Deep Learning
<!-- AUTHOR : Kamel Guerda (CNRS/IDRIS) -->

## Objectives :
 - Understand PyTorch

## **Introduction**

**PyTorch** is an open-source machine learning library developed by Facebook's AI Research lab. It offers an imperative and dynamic computational model, making it particularly easy and intuitive for researchers. Its primary feature is the tensor, a multi-dimensional array similar to NumPy's ndarray, but with GPU acceleration.

### **Installation and usage**

Whether you're working on the supercomputer Jean Zay or your own machine, getting your environment ready is the first step. Here's how to proceed:

#### **On Jean Zay**

For those accessing the Jean Zay supercomputer (you should already be at step 3):

1. **Access JupyterHub**: Go to [https://jupyterhub.idris.fr](https://jupyterhub.idris.fr). The login credentials are the same as those used to access the Jean Zay machine. Ensure your IP address is whitelisted (add a new IP via the account management form if needed).
2. **Create a JupyterLab Instance**: Choose to create the instance either on a frontend node (e.g., for internet access) or on a compute node by reserving resources via Slurm. Select the appropriate options such as workspace, allocated resources, billing, etc.
3. **Choose the Kernel**: IDRIS provides kernels based on modules installed on Jean Zay. This includes various versions of Python, TensorFlow, and PyTorch. Create a new notebook with the desired kernel through the launcher or change the kernel on an existing notebook by clicking the kernel name at the top right of the screen.
4. For advanced features like TensorBoard, MLflow, custom kernel creation, etc., refer to the [JupyterHub technical documentation](https://jupyterhub.idris.fr/services/documentation/).


> **Task:** Verify your kernel in the top-right corner
>    - In JupyterLab, at the top right of your notebook, you should see the name of your current kernel.
>    - Ensure it matches "PyTorch 2.0" or a similar name indicating the PyTorch version.
>    - If it doesn't, click on the kernel name and select the appropriate kernel from the list.


#### **Elsewhere**


For users on other platforms:

1. Install PyTorch by following the official [installation guide](https://pytorch.org/get-started/locally/).
2. If you have a GPU, ensure you've installed the necessary CUDA toolkit and cuDNN libraries.
3. Launch your preferred Python environment, whether it's Jupyter notebook, an IDE like PyCharm, or just the terminal.

Once your setup is complete, you're ready to dive in. Let's explore the fascinating world of deep learning!

### **Version**

In [1]:
# Importing PyTorch
import torch

# TODO: Print the version of PyTorch being used


<details>
<summary>Hint (click to reveal)</summary>
To print the version of PyTorch you're using, you can access the <code>__version__</code> attribute of the <code>torch</code> module.    
    
```python
print(torch.__version__)
```
</details>

**Why PyTorch 2.0 is a Game-Changer**

PyTorch 2.0 represents a major step in the evolution of this popular deep learning library. As part of the transition to the 2-series, let's highlight some reasons why this version is pivotal:

1. **Performance**: With PyTorch 2.0, performance has been supercharged at the compiler level, offering faster execution and support for Dynamic Shapes and Distributed systems.
  
2. **torch.compile**: This introduces a more Pythonic approach, moving some parts of PyTorch from C++ back to Python. Notably, across a test set of 163 open-source models, the use of `torch.compile` resulted in a 43% speed increase during training on an NVIDIA A100 GPU.

3. **Innovative Technologies**: Technologies like TorchDynamo and TorchInductor, both written in Python, make PyTorch more flexible and developer-friendly.
  
4. **Staying Pythonic**: PyTorch 2.0 emphasizes Python-centric development, reducing barriers for developers and vendors.

As we progress in this lab, we'll dive deeper into some of these features, giving you hands-on experience with the power and flexibility of PyTorch 2.0.


## **PyTorch Fundamentals**

### **Tensors**

A **tensor** is a generalization of vectors and matrices and is easily understood as a multi-dimensional array. In the context of PyTorch:
- A 0-dimensional tensor is a scalar (a single number).
- A 1-dimensional tensor is a vector.
- A 2-dimensional tensor is a matrix.
- ... and so on for higher dimensions.

Tensors are fundamental to PyTorch not just as data containers but also for their compatibility with GPU acceleration, making operations on them extremely fast. This acceleration is vital for training large neural networks.

Let's start our journey with tensors by examining how PyTorch handles scalars.

#### **Scalars in PyTorch**

A scalar, being a 0-dimensional tensor, is simply a single number. While it might seem trivial, understanding scalars in PyTorch lays the foundation for grasping more complex tensor structures. Familiarize yourself with the `torch.tensor()` function from the [official documentation](https://pytorch.org/docs/stable/generated/torch.tensor.html) before proceeding.

> **Task**: Create a scalar tensor in PyTorch and examine its properties.


In [2]:
# TODO: Create a scalar tensor with the value 7.5
scalar_tensor = None  # Replace None with your code

# Print the scalar tensor
print("Scalar Tensor:", scalar_tensor)

# TODO: Print its dimension, shape, and type

<details>
<summary>Hint (click to reveal)</summary>
To create a scalar tensor, use the <code>torch.tensor()</code> function. To retrieve its dimension, shape, and type, you can use the <code>.dim()</code>, <code>.shape</code>, and <code>.dtype</code> attributes respectively. 

Here's how you can achieve that:

```python
scalar_tensor = torch.tensor(7.5)
print("Scalar Tensor:", scalar_tensor)
print("Dimension:", scalar_tensor.dim())
print("Shape:", scalar_tensor.shape)
print("Type:", scalar_tensor.dtype)
```
</details>

#### **Vectors in PyTorch**

A vector in PyTorch is a 1-dimensional tensor. It's essentially a list of numbers that can represent anything from a sequence of data points to the weights of a neural network layer.

In this section, we'll see how to create and manipulate vectors using PyTorch. We'll also look at some basic operations you can perform on them.

> **Task**: Create a 1-dimensional tensor (vector) with values `[1.5, 2.3, 3.1, 4.8, 5.2]` and print its dimension, shape, and type.

Start by referring to the `torch.tensor()` function in the [official documentation](https://pytorch.org/docs/stable/generated/torch.tensor.html) to understand how to create tensors of varying dimensions.


In [3]:
# TODO: Create a 1-dimensional tensor (vector) with values [1.5, 2.3, 3.1, 4.8, 5.2]
vector_tensor = None  # Replace None with your code

# Print the vector tensor
print("Vector Tensor:", vector_tensor)

# TODO: Print its dimension, shape, and type

<details>
<summary>Hint (click to reveal)</summary>
Creating a 1-dimensional tensor is similar to creating a scalar. Instead of a single number, you pass a list of numbers to the <code>torch.tensor()</code> function. The <code>.dim()</code>, <code>.shape</code>, and <code>.dtype</code> attributes will help you retrieve its properties.

```python
vector_tensor = torch.tensor([1.5, 2.3, 3.1, 4.8, 5.2])
print("Vector Tensor:", vector_tensor)
print("Dimension:", vector_tensor.dim())
print("Shape:", vector_tensor.shape)
print("Type:", vector_tensor.dtype)
```
</details>

#### **Vector Operations**

Vectors are not just static entities; we often perform various operations on them, especially in the context of neural networks. This includes addition, subtraction, scalar multiplication, dot products, etc.

> **Task**: Using the previously defined `vector_tensor`, perform the following operations:
> 1. Add 5 to all the elements of the vector.
> 2. Multiply all the elements of the vector by 2.
> 3. Compute the dot product of the vector with itself.

In [4]:
# TODO: Add 5 to all elements
vector_added = None  # Replace None with your code

# TODO: Multiply all elements by 2
vector_multiplied = None  # Replace None with your code

# TODO: Compute the dot product with itself
dot_product = None  # Replace None with your code

# Print the results
print("Vector after addition:", vector_added)
print("Vector after multiplication:", vector_multiplied)
print("Dot Product:", dot_product)

<details>
<summary>Hint (click to reveal)</summary>
PyTorch tensors support regular arithmetic operations. For the dot product, you can use the <code>torch.dot()</code> function.

```python

vector_added = vector_tensor + 5
vector_multiplied = vector_tensor * 2
dot_product = torch.dot(vector_tensor, vector_tensor)

print("Vector after addition:", vector_added)
print("Vector after multiplication:", vector_multiplied)
print("Dot Product:", dot_product)
```
</details>
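As a small sanity check on what `torch.dot()` computes, the sketch below compares it with an element-wise product followed by a sum. The vectors `a` and `b` are hypothetical values chosen only for illustration.

```python
import torch

# Hypothetical example vectors, chosen only for illustration
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

dot_builtin = torch.dot(a, b)  # built-in dot product, returns a 0-dimensional tensor
dot_manual = (a * b).sum()     # element-wise product, then sum of all elements

print(dot_builtin, dot_manual)
print(dot_builtin.item())      # .item() extracts a plain Python number
```

Note that the result is itself a scalar tensor; `.item()` is a convenient way to get a regular Python float out of it.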

#### **Matrices in PyTorch**

A matrix in PyTorch is represented as a 2D tensor. Just as vectors are generalizations of scalars, matrices are generalizations of vectors, providing an additional dimension. Matrices are crucial for a range of operations in deep learning, including representing datasets, transformations, and more.


##### **Creating Matrices**

Before diving into manual matrix creation, it's beneficial to know some utility functions PyTorch provides:

- `torch.rand()`: Generates a matrix with random values between 0 and 1.
- `torch.eye()`: Creates an identity matrix.
- `torch.zeros()`: Generates a matrix filled with zeros.
- `torch.ones()`: Generates a matrix filled with ones.

You can explore more about these functions in the [official documentation](https://pytorch.org/docs/stable/tensors.html).

> **Task**: Using the above functions, create the following matrices:
> 1. A 3x3 matrix with random values.
> 2. A 5x5 identity matrix.
> 3. A 2x4 matrix filled with zeros.
> 4. A 4x2 matrix filled with ones.


In [None]:
# Your code for creating the matrices goes here



<details>
<summary>Hint (click to reveal)</summary>

To create these matrices, make use of the following functions:

1. `torch.rand(size)`: Use this function and specify the size as `(3, 3)` to create a 3x3 matrix with random values.
2. `torch.eye(n, m)`: Use this to generate an identity matrix. For a square matrix like 5x5, n and m would both be 5.
3. `torch.zeros(m, n)`: For a 2x4 matrix filled with zeros, specify m=2 and n=4.
4. `torch.ones(m, n)`: Similar to the `zeros` function but fills the matrix with ones.

```python
# 1. 3x3 matrix with random values
random_matrix = torch.rand(3, 3)
print(random_matrix)

# 2. 5x5 identity matrix
identity_matrix = torch.eye(5, 5)
print(identity_matrix)

# 3. 2x4 matrix filled with zeros
zero_matrix = torch.zeros(2, 4)
print(zero_matrix)

# 4. 4x2 matrix filled with ones
one_matrix = torch.ones(4, 2)
print(one_matrix)
```
</details>


#### **Matrix Operations in PyTorch**

Just like vectors, matrices can undergo a variety of operations. Some of the basic ones include matrix addition, subtraction, and multiplication. More advanced operations include matrix inversion, transposition, and determinant calculation.


##### **Basic Matrix Operations**

> **Task**: Perform the following operations on matrices:
> 1. Create two 3x3 matrices with random values.
> 2. Add the two matrices.
> 3. Subtract the second matrix from the first one.
> 4. Multiply the two matrices element-wise.

Remember: for true matrix multiplication (rows times columns), use `torch.mm`, `torch.matmul`, or the `@` operator; for element-wise multiplication, use `*`.

Here's the [official documentation](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.matmul) on matrix operations for your reference.


In [None]:
# Your code for creating the matrices and performing the operations goes here

<details>
<summary>Hint (click to reveal)</summary>

Here's how you can perform the given matrix operations:

```python
# 1. Create two 3x3 matrices with random values
matrix1 = torch.rand(3, 3)
matrix2 = torch.rand(3, 3)
print("Matrix 1:\n", matrix1)
print("\nMatrix 2:\n", matrix2)

# 2. Add the two matrices
sum_matrix = matrix1 + matrix2
print("\nSum of matrices:\n", sum_matrix)

# 3. Subtract the second matrix from the first one
difference_matrix = matrix1 - matrix2
print("\nDifference of matrices:\n", difference_matrix)

# 4. Multiply the two matrices element-wise
product_matrix = matrix1 * matrix2
print("\nElement-wise product of matrices:\n", product_matrix)
```
</details>
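To make the difference between `*` and `@` concrete, here is a minimal sketch on two small matrices. The values of `m1` and `m2` are assumptions picked so the results are easy to verify by hand.

```python
import torch

# Hypothetical 2x2 matrices, small enough to check by hand
m1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
m2 = torch.tensor([[0.0, 1.0], [1.0, 0.0]])

elementwise = m1 * m2  # multiplies entries at the same position
matmul = m1 @ m2       # matrix product (same as torch.mm for 2D tensors)

print("Element-wise:\n", elementwise)
print("Matrix product:\n", matmul)
```

Here `m2` swaps columns when used in a matrix product, whereas the element-wise product simply zeroes the diagonal of `m1`.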

#### **Higher-Dimensional Tensors in PyTorch**

While scalars, vectors, and matrices cover 0D, 1D, and 2D tensors respectively, in deep learning, especially in tasks like image processing, you often encounter tensors with more than two dimensions.

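For instance, a color image can be represented as a 3D tensor: height x width x channels (e.g., RGB channels). A batch of such images would then be a 4D tensor: batch_size x height x width x channels. Note that PyTorch's own image layers, such as `nn.Conv2d`, expect the channels-first layout instead: batch_size x channels x height x width.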

Let's get our hands dirty with some higher-dimensional tensors!


##### **Creating a 4D Tensor**

> **Task**: Create a 4D tensor representing a batch of 2 images of size 4x4 with 3 channels (like RGB) filled with random values.

Use the `torch.rand` function, and remember to specify the dimensions correctly.

Here's the [official documentation](https://pytorch.org/docs/stable/tensors.html#creation-ops) for tensor creation.


In [None]:
# Your code for creating the 4D tensor goes here

<details>
<summary>Hint (click to reveal)</summary>

Creating a 4D tensor with the given specifications can be achieved using the `torch.rand` function. Here's how:

```python
# Create a 4D tensor representing a batch of 2 images of size 4x4 with 3 channels
image_tensor = torch.rand(2, 4, 4, 3)
print(image_tensor)
```
</details>
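If a batch of images is stored as batch x height x width x channels, it can be rearranged into the channels-first order that PyTorch's convolution layers expect. The following is a minimal sketch using `permute()`; the variable names are hypothetical.

```python
import torch

# Batch of 2 random "images": batch, height, width, channels
batch_hwc = torch.rand(2, 4, 4, 3)

# Reorder the dimensions to batch, channels, height, width
batch_chw = batch_hwc.permute(0, 3, 1, 2)

print(batch_hwc.shape)  # torch.Size([2, 4, 4, 3])
print(batch_chw.shape)  # torch.Size([2, 3, 4, 4])
```

`permute()` only changes how the dimensions are indexed; the underlying values are the same, so `batch_chw[n, c, h, w]` equals `batch_hwc[n, h, w, c]`.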

#### **Reshaping Tensors**

In deep learning, we often need to reshape our tensors. For instance, an image represented as a 3D tensor might need to be reshaped into a 1D tensor before passing it through a fully connected layer. PyTorch provides methods to make this easy.

The most commonly used method for reshaping tensors in PyTorch is the `view()` method, which returns a new view on the same data without copying and therefore requires a compatible (typically contiguous) memory layout. The `reshape()` method is more flexible: it returns a view when possible and silently falls back to a copy otherwise. Both accept `-1` for one dimension whose size should be inferred.

> **Task**: Using the official documentation, find out how to use the [`view()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view) and [`reshape()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape) methods. Create a 2x3 tensor using `torch.tensor()` and then reshape it into a 3x2 tensor.


In [None]:
# Create a 2x3 tensor

# Reshape it into a 3x2 tensor


<details>
<summary>Hint (click to reveal)</summary>
To reshape a tensor using <code>view()</code> method:

```python
tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
reshaped_tensor = tensor.view(3, 2)
```
<br>
Alternatively, using the <code>reshape()</code> method:

```python
reshaped_tensor = tensor.reshape(3, 2)
```
</details>
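The practical difference between `view()` and `reshape()` shows up on non-contiguous tensors, such as the result of a transpose. The sketch below illustrates this under that assumption; the names `t`, `flat`, and `transposed` are hypothetical.

```python
import torch

t = torch.arange(6).view(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# -1 asks PyTorch to infer the size of that dimension
flat = t.view(-1)
print(flat.shape)  # torch.Size([6])

# A transpose returns a non-contiguous view of the same data
transposed = t.t()
print(transposed.is_contiguous())

# view() cannot flatten a non-contiguous tensor and raises an error...
try:
    transposed.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True
print("view() failed:", view_failed)

# ...whereas reshape() falls back to copying the data
reshaped = transposed.reshape(6)
print(reshaped)
```

When you know the tensor is contiguous, `view()` makes the "no copy" intent explicit; otherwise `reshape()` is the safer default.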

#### **Broadcasting**

Broadcasting is a powerful feature in PyTorch that allows you to perform operations between tensors of different shapes. When the shapes are compatible, PyTorch automatically expands the size-1 (or missing) dimensions so that the operation becomes valid, without actually copying the data. This reduces manual reshaping and is memory-efficient.

However, it's essential to understand the rules and nuances of broadcasting to use it effectively and avoid unexpected behaviors.

> **Task**: Given a tensor `A` of shape (4, 1) and another tensor `B` of shape (1, 4), use PyTorch operations to produce a result tensor of shape (4, 4). Check the [official documentation on broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html) for guidance.


In [None]:
# Define tensor A of shape (4, 1) and tensor B of shape (1, 4)

# Perform an operation to get a result tensor of shape (4, 4)


<details>
<summary>Hint (click to reveal)</summary>
You can simply use addition, subtraction, multiplication, or any other element-wise operation. When you do this operation, PyTorch will automatically broadcast the tensors to a compatible shape. For example:

```python
A = torch.tensor([[1], [2], [3], [4]])
B = torch.tensor([[1, 2, 3, 4]])
result = A * B
print(result)
```
</details>
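To see the broadcasting rules in action, including a case where they fail, here is a minimal sketch. It uses `torch.broadcast_shapes` to predict the result shape; the tensors are hypothetical examples.

```python
import torch

A = torch.arange(4).reshape(4, 1)  # shape (4, 1)
B = torch.arange(4).reshape(1, 4)  # shape (1, 4)

# Predict the broadcast shape without performing any computation
print(torch.broadcast_shapes(A.shape, B.shape))  # torch.Size([4, 4])

# Size-1 dimensions are expanded, so result[i, j] = A[i, 0] + B[0, j]
result = A + B
print(result)

# Shapes (3,) and (4,) are incompatible: neither size is 1 and they differ
try:
    torch.ones(3) + torch.ones(4)
    incompatible_failed = False
except RuntimeError:
    incompatible_failed = True
print("Incompatible shapes raised an error:", incompatible_failed)
```

Two dimensions are compatible when they are equal or when one of them is 1; shapes are compared starting from the last dimension.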


### **GPU Support with CUDA**

PyTorch seamlessly supports operations on Graphics Processing Units (GPUs) through CUDA, an API developed by NVIDIA for their GPUs. If you have a compatible NVIDIA GPU on your machine, PyTorch can utilize it to speed up tensor operations which can be orders of magnitude faster than on a CPU.

To verify if your PyTorch installation can use CUDA, call the function `torch.cuda.is_available()`. It returns `True` if CUDA is available and PyTorch can use GPUs, otherwise it returns `False`.

> **Task**: Print whether CUDA support is available on your system. The [CUDA documentation](https://pytorch.org/docs/stable/cuda.html) might be useful for this task.

In [None]:
# Check and print if CUDA is available
cuda_available = None  # Replace None with the appropriate code
print("CUDA available:", cuda_available)

<details>
<summary>Hint (click to reveal)</summary>

To check if CUDA is available, you can use the `torch.cuda.is_available()` function.
```python
cuda_available = torch.cuda.is_available()
print("CUDA available:", cuda_available)
```
</details>

When developing deep learning models in PyTorch, it's a good habit to write device-agnostic code. This means your code can automatically use a GPU if available, or fall back to using the CPU if not. The `torch.device` object allows you to specify the device (either CPU or GPU) where you'd like your tensors to be allocated.

To dynamically determine the device, a common pattern is to check `torch.cuda.is_available()`, and set the device accordingly. This is particularly useful when you want your code to be flexible, regardless of the underlying hardware.

> **Task**: Define a `device` variable that is set to 'cuda:0' if CUDA is available and 'cpu' otherwise. Create a tensor on this device. The [documentation about torch.device](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device) might be handy.

In [None]:
# Define the device
device = None  # Replace None with the appropriate code

# Create a tensor on the specified device
tensor_on_device = torch.tensor([1, 2, 3, 4, 5], device=device)

<details>
<summary>Hint (click to reveal)</summary>

To define the device variable dynamically:

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
<br>
After setting the device, you can create tensors on it directly using the device argument.

</details>
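Device-agnostic code also means moving existing tensors to the chosen device, since operands of an operation generally need to live on the same device. A minimal sketch, which runs on CPU-only machines as well (variable names are hypothetical):

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.ones(3)               # created on the CPU by default
moved = cpu_tensor.to(device)            # copy to the target device (no-op if already there)
created = torch.zeros(3, device=device)  # created directly on the target device

# Both operands now live on the same device
total = moved + created
print(total.device)

# Bring the result back to the CPU, e.g. before converting to NumPy
print(total.cpu())
```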


### **Automatic Differentiation with Autograd**

PyTorch's `autograd` module provides the tools for automatically computing the gradients for tensors. This feature is a cornerstone for neural network training, as gradients are essential for optimization algorithms like gradient descent.

When we create a tensor, `requires_grad` is set to `False` by default, meaning it won't track operations. However, if we set `requires_grad=True`, PyTorch will start to track all operations on the tensor.

Let's start with a simple example:

>**Task:** Create a tensor that holds a single value, let's say 2.0 (only floating-point and complex tensors can require gradients), and set `requires_grad=True`. Then, define a simple operation like squaring the tensor. Finally, inspect the resulting tensor. The [documentation for requires_grad](https://pytorch.org/docs/stable/autograd.html#torch.Tensor.requires_grad) might be handy.

In [None]:
# TODO: Create a tensor, perform a simple operation, and print its data and grad_fn separately.


<details>
<summary>Hint (click to reveal)</summary>

To create a tensor with requires_grad=True and square it:

```python
# Create a tensor, square it, and print its data and grad_fn separately.
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
print("Data:", y.data)
print("grad_fn:", y.grad_fn)
```
</details>

A tensor produced by an operation on a tensor that requires gradients carries a `grad_fn` attribute, which references the function that created it (for a tensor you create directly, such as `x`, it is `None`). In our example, since we squared the tensor, `grad_fn` will be of type `PowBackward0`.

This grad_fn attribute provides a link to the computational history of the tensor, allowing PyTorch to backpropagate errors and compute gradients when training neural networks.

#### **Computing Gradients**

Now, let's compute the gradient of `y` with respect to `x`. To do this, we'll call the `backward()` method on the tensor `y`.

> **Task**: Compute the gradients of `y` by calling the `backward()` method on it. Afterwards, print the gradients of `x`. The [documentation for backward()](https://pytorch.org/docs/stable/autograd.html#torch.autograd.backward) may be useful.


In [None]:
# TODO: Compute the gradient and print it.

<details>
<summary>Hint (click to reveal)</summary>

To compute the gradient:

```python
y.backward()
print(x.grad)
```
</details>
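A quick way to build trust in autograd is to compare its result with a derivative computed by hand. The sketch below uses a hypothetical function, y = x² + 3x, whose derivative 2x + 3 equals 7 at x = 2.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Hypothetical function chosen for illustration: y = x^2 + 3x
y = x ** 2 + 3 * x
y.backward()

# Derivative computed by hand: dy/dx = 2x + 3
analytic = 2 * x.detach() + 3

print("autograd:", x.grad)
print("by hand: ", analytic)
```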

#### **Gradient Accumulation**

In PyTorch, the gradients of tensors are accumulated into the `.grad` attribute each time you call `.backward()`. This means that if you call `.backward()` multiple times, the gradients will add up.

However, by default, calling `.backward()` frees the intermediate buffers of the computational graph to save memory. If you intend to call `.backward()` multiple times on the same graph, you need to specify `retain_graph=True` during all but the last call.

> **Task**: Create a tensor, perform an operation on it, and then call `backward()` twice. Use `retain_graph=True` in the first call to retain the computational graph. Observe the `.grad` attribute after each call.


In [None]:
# Create a tensor
w = torch.tensor([1.0], requires_grad=True)

# Operation
result = w * 2

# TODO: Call backward twice (using retain_graph=True for the first call) and print the grad after each call
# ...


<details>
<summary>Hint (click to reveal)</summary>

```python
result.backward(retain_graph=True)
print(w.grad)  # tensor([2.])

result.backward()
print(w.grad)  # tensor([4.]), as gradients get accumulated
```
</details>

#### **Zeroing Gradients**



In neural network training, we typically want to update our weights with the gradients after each forward and backward pass. This means that we don't want the gradients to accumulate across multiple passes. Hence, it's common to zero out the gradients at the start of a new iteration.

> **Task**: Using the tensor from the previous cell, zero out its gradients and verify that it has been set to zero.


In [None]:
# TODO: Zero out the gradients of w and print

<details>
<summary>Hint (click to reveal)</summary>

```python

w.grad.zero_()
print(w.grad)
```
</details>
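To show why zeroing matters in practice, here is a minimal sketch of a hand-written gradient descent loop. The loss `(w - 3)^2`, the learning rate, and the number of steps are assumptions chosen for illustration; a real training loop would normally use an optimizer from `torch.optim`.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # hypothetical value

for step in range(100):
    loss = (w - 3.0) ** 2  # minimized at w = 3
    loss.backward()        # adds d(loss)/dw to w.grad

    # Update the weight without recording the update in the graph
    with torch.no_grad():
        w -= learning_rate * w.grad

    # Reset the gradient so the next step does not accumulate on top of it
    w.grad.zero_()

print(w.item())  # close to 3.0
```

Without the call to `w.grad.zero_()`, each step would use the sum of all past gradients instead of the current one.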

#### **Non-Scalar Backward**

When dealing with non-scalar tensors, `backward()` requires an additional argument: a `gradient` tensor of the same shape as the output, holding the gradient of some scalar (usually a loss) with respect to that output.

> **Task**: Create a tensor of shape (2, 2) with `requires_grad=True`. Compute a non-scalar result by multiplying the tensor with itself. Then, compute backward with a gradient argument. You can consult the [backward documentation](https://pytorch.org/docs/stable/autograd.html#torch.autograd.backward) for reference.

In [None]:
# TODO: Create a tensor, perform an operation, and compute backward with a gradient argument

<details>
<summary>Hint (click to reveal)</summary>

```python

v = torch.tensor([[2.0, 3.0], [4.0, 5.0]], requires_grad=True)
result = v * v

grads = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
result.backward(grads)
print(v.grad)  # equals 2 * v, the derivative of v * v
```
</details>
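One way to interpret the `gradient` argument: passing a tensor of ones gives the same result as summing the output into a scalar and calling `backward()` on that scalar. The sketch below checks this equivalence on a hypothetical 2x2 tensor.

```python
import torch

v = torch.tensor([[2.0, 3.0], [4.0, 5.0]], requires_grad=True)

# Option 1: non-scalar output, explicit gradient argument filled with ones
(v * v).backward(torch.ones_like(v))
grad_from_argument = v.grad.clone()

# Reset before the second computation, since gradients accumulate
v.grad.zero_()

# Option 2: reduce to a scalar first, then call backward() without arguments
(v * v).sum().backward()
grad_from_sum = v.grad.clone()

print(grad_from_argument)
print(grad_from_sum)
```

In both cases the gradient is `2 * v`.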

#### **Stopping Gradient Tracking**



There are scenarios where we don't want to track the gradients for certain operations. This can be achieved in two main ways:

1. **Using `torch.no_grad()`**: This context manager ensures that the enclosed operations are excluded from gradient tracking.
2. **Using `.detach()`**: Creates a tensor that shares the same storage but does not require gradients.

> **Task**: Create a tensor with `requires_grad=True`. Then, demonstrate both methods above to prevent gradient computation.


In [None]:
# TODO: Demonstrate operations without gradient tracking



<details>
<summary>Hint (click to reveal)</summary>

```python

# Using torch.no_grad()
with torch.no_grad():
    result_no_grad = v * v
print(result_no_grad.requires_grad)

# Using .detach()
detached_tensor = v.detach()
result_detach = detached_tensor * detached_tensor
print(result_detach.requires_grad)
```
</details>
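Because `.detach()` shares storage with the original tensor, an in-place change to the detached tensor is visible in the original. The following sketch illustrates this and shows `clone()` as one way to get an independent copy; the values are hypothetical.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Independent copy: detach from the graph, then copy the data
independent = v.detach().clone()

# Detached tensor: no gradient tracking, but same underlying memory as v
detached = v.detach()
detached[0] = 100.0

print(v)            # the first element of v changed as well
print(independent)  # unaffected by the in-place change
print(detached.requires_grad)
```

For this reason, in-place edits on detached tensors should be made with care when the original is still needed for gradient computation.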

## **Building a Simple Neural Network with PyTorch**

Neural networks are the cornerstone of deep learning. They are organized as a series of interconnected nodes or "neurons" that are structured into layers: an input layer, several hidden layers, and an output layer. Data flows through this network, undergoing transformations at each node, until it emerges at the output.

With PyTorch's `torch.nn` module, constructing these neural networks becomes straightforward. Let's dive into its main components:

### **nn.Module: The Base Class for Neural Networks**

Every neural network in PyTorch is derived from the `nn.Module` class. This class offers:
- Organization and management of the layers.
- Capabilities for GPU acceleration.
- Implementation of the forward pass.

When we inherit from `nn.Module`, our custom neural network class benefits from these functionalities.
For more details, you can refer to the official [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).



>**Task:** Familiarize yourself with the structure of a simple neural network provided below. Later, you'll be enriching it.

In [None]:
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        # Define layers here

    def forward(self, x):
        # Call the layers in the correct order here
        return x

### **Linear Layers: Making Connections**

In PyTorch, a linear layer performs an affine transformation. It has both weights and biases which get updated during training. The transformation it performs can be described as:

$y = xA^T + b$

Where:
- $x$ is the input
- $A$ represents the weights
- $b$ is the bias

The `nn.Linear` class in PyTorch creates such a layer.

[Documentation Link for nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
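The formula can be checked directly against an `nn.Linear` layer, whose `weight` and `bias` attributes play the roles of $A$ and $b$. This is a minimal sketch; the layer sizes and the batch of 2 samples are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 5 input features, 3 output features
layer = nn.Linear(in_features=5, out_features=3)
x = torch.rand(2, 5)  # batch of 2 samples

# The weight matrix is stored as (out_features, in_features)
print(layer.weight.shape)  # torch.Size([3, 5])
print(layer.bias.shape)    # torch.Size([3])

# Apply the formula y = x A^T + b by hand and compare with the layer
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)
print(torch.allclose(manual, builtin, atol=1e-6))
```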


> **Task:** Add an input layer and an output layer to the `SimpleNet` class. 
>
> - The input layer should transform from `input_size` to `hidden_size`.
> - The output layer should transform from `hidden_size` to `output_size`.
> - After defining the layers in the `__init__` method, call them in the `forward` method to perform the transformations.


In [None]:
# Modify the below code by adding input and output linear layers in the appropriate places

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        # Define layers here

    def forward(self, x):
        # Call the layers in the correct order here
        return x


<details>
<summary>Hint (click to reveal)</summary>
To define the input and output linear layers, use the <code>nn.Linear</code> class in the <code>__init__</code> method. Then, in the <code>forward</code> method, pass the input through the defined layers.

```python
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return x
```
</details>

### **Activation Functions: Introducing Non-Linearity**

Activation functions are critical components in neural networks, introducing non-linearity between layers. Without them, a stack of linear layers is equivalent to a single linear layer; the non-linearity is what allows the network to learn complex patterns.

In PyTorch, many activation functions are available as part of the `torch.nn` module, such as ReLU, Sigmoid, and Tanh.

For our `SimpleNet` model, we'll use the ReLU (Rectified Linear Unit) activation function after the input layer. The ReLU function is defined as $f(x) = \max(0, x)$.

Learn more about [ReLU and other activation functions in the official documentation](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity).

> **Task**: Update your `SimpleNet` class to include the ReLU activation function after the input layer. For this, you'll need to both define the activation function in `__init__` and apply it in the `forward` method.


In [None]:
# Copy the previous SimpleNet definition and modify the code to include the ReLU activation function.

<details>
<summary>Hint (click to reveal)</summary>
To include the ReLU activation in your neural network:

1. Define the ReLU activation function in the `__init__` method.
2. Apply the activation function in the `forward` method after passing through the `input_layer`.

```python
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()  # Defining the ReLU activation function
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.input_layer(x)
        x = self.relu(x)  # Applying the ReLU activation function
        x = self.output_layer(x)
        return x
```
</details>
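The need for an activation function can be verified numerically: two linear layers applied back to back, with nothing in between, compute exactly what a single linear layer with combined weights would. A minimal sketch, with hypothetical layer sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two linear layers with no activation in between (hypothetical sizes)
first = nn.Linear(4, 3)
second = nn.Linear(3, 2)
x = torch.rand(5, 4)

with torch.no_grad():
    stacked = second(first(x))

    # Combine both layers into one: W = W2 W1, b = W2 b1 + b2
    W = second.weight @ first.weight
    b = second.weight @ first.bias + second.bias
    collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-6))
```

Inserting `nn.ReLU()` between the two layers breaks this equivalence, which is what gives the extra layer additional expressive power.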

#### **Adjusting the Network: Adding Dropout**

[Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html) is a regularization technique that can improve generalization in neural networks. During training, it randomly sets each input element to 0 with probability `p` on every forward pass and scales the remaining elements by `1/(1-p)`; in evaluation mode it leaves its input unchanged.

> **Task**: Modify the `SimpleNet` class to include a dropout layer with a dropout probability of 0.5 between the input layer and the output layer. Don't forget to call this layer in the forward method. 
>
> Remember, after modifying the class structure, you'll need to re-instantiate your model object.

In [None]:
# Add a dropout layer to your previous code

<details>
<summary>Hint (click to reveal)</summary>

Here's how you can modify the SimpleNet class to include dropout:

```python
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()  # Kept from the previous step
        self.dropout = nn.Dropout(0.5)
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.input_layer(x)
        x = self.relu(x)
        x = self.dropout(x)
        return self.output_layer(x)
```
Don't forget to create a new instance of your model after redefining the class, for example <code>model = SimpleNet(input_size=5, hidden_size=3, output_size=1)</code>.
</details>

### **Utilizing the Neural Network**

Once our neural network is defined, it's time to put it to use. This section will cover:

1. Instantiating the network
2. Transferring the network to GPU (if available)
3. Making predictions using the network (forward pass)
4. Understanding training and evaluation modes
5. Performing a backward pass to compute gradients

#### **1. Instantiating the Network**


To use our `SimpleNet`, we first need to create an instance of it. While creating an instance, the network's weights are also initialized.

> **Task**: Instantiate the `SimpleNet` class. Use `input_size=5`, `hidden_size=3`, and `output_size=1` as parameters.

In [None]:
# Your code here: Instantiate the model

<details>
<summary>Hint (click to reveal)</summary>

To instantiate the SimpleNet class:

```python

model = SimpleNet(input_size=5, hidden_size=3, output_size=1)
print(model)
```
</details>
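
Printing the model shows its layers; it can also be useful to inspect the parameters that were created at instantiation. The sketch below assumes a `SimpleNet` made of two `nn.Linear` layers named `input_layer` and `output_layer` (a stand-in defined here so that the example is self-contained); adapt it if your class differs.

```python
import torch
import torch.nn as nn

# Stand-in for the SimpleNet defined earlier (assumed structure)
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        return self.output_layer(self.input_layer(x))

model = SimpleNet(input_size=5, hidden_size=3, output_size=1)

# Each nn.Linear holds a weight matrix and a bias vector
for name, param in model.named_parameters():
    print(f"{name}: shape={tuple(param.shape)}")

# Total number of trainable values in the network
num_params = sum(p.numel() for p in model.parameters())
print(f"Total trainable parameters: {num_params}")
```

With these sizes, the count is 5×3 + 3 for the first layer and 3×1 + 1 for the second, i.e. 22 parameters.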

#### **2. Transferring the Network to GPU**



PyTorch makes it very straightforward to transfer our model to a GPU if one is available. This is done using the `.to()` method.

> **Task**: Check if GPU (CUDA) is available. If it is, transfer the model to the GPU.

In [None]:
# Check for GPU availability and transfer the model to GPU if available.

<details>
<summary>Hint (click to reveal)</summary>

To transfer the model to the GPU if it's available:

```python

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```
</details>

#### **3. Making Predictions using the Network (Forward Pass)**

With our model instantiated and potentially on a GPU, we can use it to make predictions. This involves passing some input data through the model, which is commonly referred to as a forward pass.

> **Task**: Create a tensor of size [1, 5] (representing one sample with five features) with random values. Transfer this tensor to the same device as your model (GPU or CPU). Then, pass this tensor through your model to get the prediction.

In [None]:
# Create a tensor, transfer it to the right device, and perform a forward pass.


<details>
<summary>Hint (click to reveal)</summary>

To make predictions using your model:

```python

# Create a tensor with random values
input_tensor = torch.randn(1, 5).to(device)

# Pass the tensor through the model
output = model(input_tensor)
print(output)
```
</details>
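
The same forward pass works on a whole batch: the first dimension is the batch size, and the model returns one prediction per sample. The sketch below uses a single `nn.Linear` layer as a stand-in for the model (an assumption made to keep the example self-contained) and wraps the call in `torch.no_grad()`, which skips gradient tracking when only predictions are needed.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model: 5 input features, 1 output
model = nn.Linear(5, 1).to(device)

# A batch of 4 samples with 5 features each, on the same device as the model
batch = torch.randn(4, 5).to(device)

# No computation graph is built inside this context
with torch.no_grad():
    predictions = model(batch)

print(predictions.shape)          # one prediction per sample
print(predictions.requires_grad)  # gradients are not tracked
```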

#### **4. Understanding Training and Evaluation Modes**

Every PyTorch model has two modes:
- `train` mode: In this mode, certain layers like dropout or batch normalization behave differently than during evaluation. For instance, dropout will randomly set a fraction of input units to 0 at each update during training.
- `eval` mode: Here, layers such as dropout and batch normalization behave deterministically. Dropout layers don't drop activations, and batch normalization uses the running statistics accumulated during training instead of the current mini-batch's statistics.

Setting the model to the correct mode is crucial. Let's demonstrate this.

> **Task**: Set your model to `train` mode, then perform a forward pass using the same input tensor multiple times and observe the outputs. Then, set your model to `eval` mode and repeat. Notice any differences?

In [None]:
# Perform the forward passes multiple times with the same input in both modes and observe the outputs.

<details>
<summary>Hint (click to reveal)</summary>

Here's how you can demonstrate the difference:

```python
# Set to train mode
model.train()

# Forward pass multiple times
print("Train mode:")
for i in range(5):
    print(model(input_tensor))

# Set to eval mode
model.eval()
print("Eval mode:")
# Forward pass multiple times
for i in range(5):
    print(model(input_tensor))
```
    
If there were layers like dropout in your model, you'd notice that the outputs in training mode might differ on each pass, while in evaluation mode, they remain consistent.
</details>
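
Batch normalization is the other common layer whose behavior depends on the mode. The following sketch, which uses an arbitrary random batch, suggests how to check that an `nn.BatchNorm1d` layer updates its running statistics in `train` mode and leaves them untouched in `eval` mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

bn = nn.BatchNorm1d(3)
x = torch.randn(16, 3) * 2.0 + 5.0  # batch whose mean is far from 0

initial_mean = bn.running_mean.clone()

# Train mode: normalizes with the batch statistics and updates the running ones
bn.train()
_ = bn(x)
mean_after_train = bn.running_mean.clone()

# Eval mode: normalizes with the stored running statistics, without updating them
bn.eval()
_ = bn(x)
mean_after_eval = bn.running_mean.clone()

print("initial:     ", initial_mean)
print("after train: ", mean_after_train)
print("after eval:  ", mean_after_eval)
```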

## **The Training Procedure in PyTorch**

Training a neural network involves several key components: defining a loss function to measure errors, selecting an optimization method to adjust the model's weights, and iterating over the dataset multiple times. In this section, we will break down these components step by step, starting with the basics and moving towards more complex tasks.

### **Datasets and DataLoaders: Handling and Batching Data**

In PyTorch, the `torch.utils.data.Dataset` class is used to represent a dataset. This abstract class requires the implementation of two primary methods: `__len__` (to return the number of items) and `__getitem__` (to return the item at a given index). However, PyTorch provides a utility class, `TensorDataset`, that wraps tensors in the dataset format, making it easier to use with the `DataLoader`.

The `torch.utils.data.DataLoader` class is a more powerful tool, responsible for:

- Batching the data
- Shuffling the data
- Loading the data in parallel using multiprocessing workers
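
To make the two required `Dataset` methods concrete, here is a minimal sketch of a custom dataset. The class name `LineDataset` and the toy data are hypothetical, chosen only for illustration; in the exercise that follows, `TensorDataset` does the same job for you.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class LineDataset(Dataset):  # hypothetical name, for illustration only
    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        # Number of items in the dataset
        return len(self.inputs)

    def __getitem__(self, index):
        # One (input, target) pair
        return self.inputs[index], self.targets[index]

xs = torch.arange(10, dtype=torch.float32).view(-1, 1)
ys = 2.0 * xs + 1.0
toy_dataset = LineDataset(xs, ys)

# The DataLoader relies on __len__ and __getitem__ to build batches
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_shapes = [tuple(batch_x.shape) for batch_x, _ in toy_loader]

print(len(toy_dataset))
print(batch_shapes)
```

With 10 items and a batch size of 4, the loader yields two batches of 4 items and a final batch of 2.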

Let's wrap some data in a Dataset and use a DataLoader to handle batching and shuffling.

> **Task**: Convert the input and target tensors into a dataset and dataloader. For this exercise, set the batch size to 32.

Below we define synthetic data that is learnable.
This way, we're essentially modeling the relationship $y=mx+c+noise$  where:
- $y$ is the target or output.
- $m$ is the slope of the line.
- $c$ is the y-intercept.
- $x$ is the input.
- $noise$ is a small random value added to each point to make the data more realistic.

In [None]:
num_samples = 1000

# Define the relationship
m = 2.0
c = 1.0
noise_factor = 0.05



# Generate input tensor
input_tensor = torch.linspace(-10, 10, num_samples).view(-1, 1)

# Generate target tensor based on the relationship
target_tensor = m * input_tensor + c + noise_factor * torch.randn(num_samples, 1)
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.scatter(input_tensor.numpy(), target_tensor.numpy(), color='blue', marker='o')
plt.title("Synthetic Data Visualization")
plt.xlabel("Input")
plt.ylabel("Target")
plt.grid(True)
plt.show()


In [None]:
# Convert our data into a dataset
# ...

# Create a data loader for mini-batch training
# ...

<details>
<summary>Hint (click to reveal)</summary>

Use the TensorDataset class from torch.utils.data to wrap your tensors in a dataset format. After defining your dataset, you can use the DataLoader class to create an iterator that will return batches of data.
    
```python
from torch.utils.data import DataLoader, TensorDataset

# Convert our data into a dataset
dataset = TensorDataset(input_tensor, target_tensor)

# Create a data loader for mini-batch training
batch_size = 32
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
```
</details>

> **Task**: Explore the `dataset` and `data_loader`:
> 1. Print the total number of samples in the dataset and the number of batches in the DataLoader.
> 2. Iterate one time over both and print the shape of items you retrieve.

In [None]:
# Total number of samples
# ...

# Dataset elements
# ...

# DataLoader elements
# ...

<details>
<summary>Hint (click to reveal)</summary>

When you iterate over the dataset, each item you get from the iteration should be a tuple of (input, target), so you should retrieve two elements each of len 1.

On the other hand, when you iterate over the data_loader, each item you get from the iteration is a mini-batch of data. Thus, the length you get from each iteration should correspond to the batch size you've set (i.e., 32 in our case), except possibly the last batch if the dataset size isn't a perfect multiple of the batch size.

```python
# Total number of samples
print(f"Total samples in dataset: {len(dataset)}")
print(f"Total batches in DataLoader: {len(data_loader)}")

# Dataset elements
(index, (data, target)) = next(enumerate(dataset))
print(f"Sample {index}: Data shape {data.shape}, Target shape {target.shape}")

# DataLoader elements
(index, (batch_data, batch_target)) = next(enumerate(data_loader))
print(f"Batch {index}: Data shape {batch_data.shape}, Target shape {batch_target.shape}")
```
</details>
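
The number of batches follows directly from the dataset size and the batch size. As a small worked example on a placeholder dataset of zeros (an assumption made to keep the sketch self-contained), 1000 samples with a batch size of 32 give 32 batches, the last one containing only 8 samples. The `drop_last` argument of `DataLoader` discards that incomplete batch.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

num_samples, batch_size = 1000, 32

# Placeholder data: only the sizes matter here
toy_dataset = TensorDataset(torch.zeros(num_samples, 1), torch.zeros(num_samples, 1))

loader = DataLoader(toy_dataset, batch_size=batch_size)
batch_sizes = [len(batch_x) for batch_x, _ in loader]

# len(loader) is the number of batches, i.e. ceil(num_samples / batch_size)
print(len(loader), math.ceil(num_samples / batch_size))
print(batch_sizes[0], batch_sizes[-1])  # first and last batch sizes

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(toy_dataset, batch_size=batch_size, drop_last=True)
print(len(loader_drop))
```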

### **Splitting the Dataset: Training, Validation, and Testing Sets**


When training neural networks, it's common to split the dataset into at least two sets:

1. **Training Set**: This set is used to train the model, i.e., adjust the weights using gradient descent.
2. **Validation Set** (optional, but often used): This set is used to evaluate the model during training, allowing for hyperparameter tuning and early detection of overfitting without using the test set.
3. **Test Set**: This set is used to evaluate the model's performance after training, providing an unbiased assessment of its performance on new, unseen data.

In PyTorch, we can use the `random_split` function from `torch.utils.data` to easily split datasets.

First, let's define the lengths for each split:

In [None]:
total_samples = len(dataset)
train_size = int(0.8 * total_samples)
val_size = total_samples - train_size

> **Task**: Using the `random_split` function, split the dataset into a training set and a validation set using the sizes provided above.
> [Here's the documentation for random_split](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split).

> **Task**: Create the `train_loader` and `val_loader`.

In [None]:
# Splitting the dataset


<details>
<summary>Hint (click to reveal)</summary>

```python

# Splitting the dataset
from torch.utils.data import random_split
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
```
</details>
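
By default, `random_split` produces a different split on every run. If you need a reproducible split, one option is to pass a seeded `torch.Generator`. The sketch below uses a toy dataset of 100 items and an arbitrary seed (both are assumptions for illustration) and checks that the two subsets do not overlap.

```python
import torch
from torch.utils.data import TensorDataset, random_split

toy_dataset = TensorDataset(torch.arange(100).view(-1, 1))

def split(seed):
    # A dedicated generator makes the split independent of the global RNG state
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy_dataset, [80, 20], generator=generator)

train_a, val_a = split(seed=42)
train_b, val_b = split(seed=42)

print(len(train_a), len(val_a))
# The two subsets never share an index
print(set(train_a.indices).isdisjoint(val_a.indices))
# The same seed gives the same split
print(list(train_a.indices) == list(train_b.indices))
```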

> **Task**: Now, using the provided training and validation datasets, print out the number of samples in each set. Also, fetch one sample from each set and print its shape.


In [None]:
# Your code here

<details>
<summary>Hint (click to reveal)</summary>

```python

# Print number of samples in each set
print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of validation samples: {len(val_dataset)}")

# Fetching one sample from each set and printing its shape
train_sample, train_target = train_dataset[0]
print(f"Training sample shape: {train_sample.shape}, Target shape: {train_target.shape}")

val_sample, val_target = val_dataset[0]
print(f"Validation sample shape: {val_sample.shape}, Target shape: {val_target.shape}")
```
</details>

### **Loss Functions: Measuring Model Errors**

Every training process needs a metric to determine how well the model's predictions align with the actual data. This metric is called the loss function or cost function. PyTorch provides many [loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions) suitable for different types of tasks.

Different problems require different loss functions. For instance:
- **Mean Squared Error (MSE)**: Commonly used for regression tasks.
- **Cross-Entropy Loss**: Suited for classification tasks.


For a simple regression task, a common choice is the Mean Squared Error (MSE) loss. 

> **Task**: Familiarize yourself with the [MSE loss documentation](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html). You will soon use it in the training loop.

> **Task**:  Instantiate the Mean Squared Error (MSE) loss provided by PyTorch for our current neural network.

In [None]:
# Define the loss function.


<details>
<summary>Hint (click to reveal)</summary>

To define the MSE loss in PyTorch, you can use:

```python

criterion = nn.MSELoss()
```
</details>
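
As a small worked example, the MSE is the mean of the squared differences between predictions and targets. The values below are arbitrary and chosen so that the result is easy to check by hand: the squared errors are 0, 0 and 4, so the loss is 4/3. Note that predictions and targets should have the same shape; otherwise broadcasting can silently produce a wrong value.

```python
import torch
import torch.nn as nn

predictions = torch.tensor([[1.0], [2.0], [3.0]])
targets = torch.tensor([[1.0], [2.0], [5.0]])

criterion = nn.MSELoss()

# Loss computed by PyTorch
loss = criterion(predictions, targets)

# Same quantity computed by hand: mean of squared differences
manual = ((predictions - targets) ** 2).mean()

print(loss.item(), manual.item())
```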

### **Optimizers: Adjusting Weights**

Optimizers adjust the weights of the network based on the gradients computed during backpropagation. Different optimizers might update weights in varying ways. For example, the popular **Stochastic Gradient Descent (SGD)** optimizer simply updates weights in the direction of negative gradients, while **Adam** and **RMSprop** are more advanced optimizers that adapt the step size of each parameter using running statistics of past gradients (Adam also includes a momentum-like term).

PyTorch offers a wide range of [optimizers](https://pytorch.org/docs/stable/optim.html). 


> **Task**: Review the [SGD optimizer documentation](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD). It will be pivotal in the training loop you'll construct.

> **Task**: For this exercise, let's use the SGD optimizer. Instantiate it, setting our neural network parameters as the ones to be optimized and choosing a learning rate of 0.01.



In [None]:
# Define the optimizer.


<details>
<summary>Hint (click to reveal)</summary>

To define the SGD optimizer in PyTorch, you can use:

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
```
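Note that this hint uses a learning rate of 0.0001 rather than the 0.01 suggested in the task: the inputs are not normalized (they range from -10 to 10), so you will probably need a really small learning rate to reach good results.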
</details>
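
To illustrate what a plain SGD update does, here is a minimal sketch with a single parameter `w` and a hand-picked loss $(3w - 6)^2$ (both are assumptions made for illustration). At $w = 1$ the gradient is $2 \cdot (3w - 6) \cdot 3 = -18$, so with a learning rate of 0.01 the update $w \leftarrow w - lr \cdot grad$ moves `w` to 1.18.

```python
import torch

# A single trainable parameter, initialized at 1.0
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01)

# Toy loss, minimized at w = 2
loss = ((3.0 * w - 6.0) ** 2).sum()

optimizer.zero_grad()
loss.backward()
grad = w.grad.clone()  # gradient of the loss with respect to w

optimizer.step()       # w <- w - lr * grad

print(grad.item(), w.item())
```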



### **Setting Up the Basic Training Loop Function**

Having a training loop within a function allows us to reuse the same code structure for different models, datasets, or other training parameters without redundancy. This modular approach also promotes code clarity and maintainability.

Let's define the training loop function, which takes the model, a DataLoader (providing batches of inputs and targets), the loss function, the optimizer, and the number of epochs as parameters. The function should return the history of the loss values recorded during training; the hint below records one value per mini-batch.

A typical training loop consists of:
1. Sending the input through the model (forward pass).
2. Calculating the loss.
3. Propagating the loss backward through the model to compute gradients (backward pass).
4. Updating the weights using the optimizer.
5. Repeating the steps for several epochs.


Training with the entire dataset as one batch can be memory-intensive and sometimes not as effective. Hence, in practice, we usually divide our dataset into smaller chunks or mini-batches and update our weights after each mini-batch.

> **Task**: Create a function named `train_model` that encapsulates the training loop for the `SimpleNet` model. The function should follow the signature in the next code cell:

In [None]:
def train_model(model, dataloader, loss_function, optimizer, epochs):
    # Your code here
    pass

<details>
<summary>Hint (click to reveal)</summary>

Here's how the train_model function might look:
```python

def train_model(model, dataloader, loss_function, optimizer, epochs):
    # Store the loss value of every mini-batch
    loss_history = []
    
    for epoch in range(epochs):
        for inputs, targets in dataloader:
            # Ensure that data is on the right device
            inputs, targets = inputs.to(device), targets.to(device)
            
            # Reset the gradients to zero
            optimizer.zero_grad()
            
            # Execute a forward pass
            outputs = model(inputs)
            
            # Calculate the loss
            loss = loss_function(outputs, targets)
            
            # Conduct a backward pass
            loss.backward()
            
            # Update the weights
            optimizer.step()
            
            # Append the loss to the history
            loss_history.append(loss.item())
            
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss_history[-1]:.4f}")
    
    return loss_history
```
</details>

### **Training the Neural Network**

With all the components defined in the previous sections, it's now time to integrate everything and set the training process in motion.

> **Task**: Combine all the previously defined elements to initiate the training procedure for your neural network model.
> 1. Don't forget to move your model and your data to the same device (GPU or CPU).
> 2. Train the model using the `train_loader` (the `val_loader` can be used afterwards to evaluate it).



In [None]:
# Your code here to initiate the training process


<details>
<summary>Hint (click to reveal)</summary>

To train the model, you need to integrate all the previously defined components:

```python
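# Instantiate the model and move it to the device
model = SimpleNet(input_size=1, hidden_size=10, output_size=1).to(device)

# Re-create the optimizer so that it tracks the parameters of this new model
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)

# Training the model using the train_loader
loss_history = train_model(model, train_loader, criterion, optimizer, epochs=50)
```
Make sure you have defined `criterion`, `train_loader`, and `train_model` in the previous sections. The optimizer must be created after the model, since it only updates the parameters it was given.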
</details>

In [None]:
import matplotlib.pyplot as plt

# Plotting the loss curve
plt.figure(figsize=(10,6))
plt.plot(loss_history, label='Training Loss')
plt.title("Loss Curve")
plt.xlabel("Training steps (mini-batches)")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.show()
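
The validation loader has not been used so far. A possible way to use it is sketched below: a helper, here called `evaluate_model` (a hypothetical name, not part of PyTorch), switches the model to `eval` mode, disables gradient tracking, and returns the average loss per sample. The tiny linear model and the noise-free data are assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_model(model, dataloader, loss_function, device):
    """Return the average loss per sample over a dataloader (hypothetical helper)."""
    model.eval()  # disable dropout, use running batch-norm statistics
    total_loss, total_samples = 0.0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(device), targets.to(device)
            loss = loss_function(model(inputs), targets)
            # Weight each batch by its size, since the last batch may be smaller
            total_loss += loss.item() * inputs.size(0)
            total_samples += inputs.size(0)
    return total_loss / total_samples

# Toy check: a linear model set to y = 2x + 1, evaluated on data from the same line
toy_device = torch.device("cpu")
toy_model = nn.Linear(1, 1)
with torch.no_grad():
    toy_model.weight.fill_(2.0)
    toy_model.bias.fill_(1.0)

toy_inputs = torch.linspace(-1, 1, 10).view(-1, 1)
toy_targets = 2.0 * toy_inputs + 1.0
toy_val_loader = DataLoader(TensorDataset(toy_inputs, toy_targets), batch_size=4)

val_loss = evaluate_model(toy_model, toy_val_loader, nn.MSELoss(), toy_device)
print(f"Validation loss: {val_loss:.6f}")
```

The helper leaves the model in `eval` mode, so call `model.train()` again before resuming training.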


## **Conclusion: Moving Beyond the Basics**


You've now built and trained a simple neural network using PyTorch, and you might be wondering: why aren't my results as good as I expected?

While you've certainly made strides, the journey of mastering deep learning and neural networks is filled with nuance, challenges, and constant learning. Here are some reasons why your results might not be optimal and what you'll discover in your next steps:

1. **Hyperparameters Tuning**: So far, we've set values like learning rate and batch size somewhat arbitrarily. These values are critical and often require careful tuning specific to each problem. 

2. **Learning Rate Scheduling**: A fixed learning rate might not always be the best strategy. Reducing the learning rate during training, known as learning rate annealing or scheduling, often leads to better convergence.

3. **Model Architecture**: The neural network we built is basic. There's an entire world of architectures out there, designed for specific types of data and tasks. The right architecture can make a significant difference.

4. **Regularization**: To prevent overfitting, techniques like dropout, weight decay, and early stopping can be applied. Apart from a brief look at dropout, we haven't touched upon these, but they're crucial for ensuring your model generalizes well to unseen data.

5. **Data Quality and Quantity**: While we used synthetic data for simplicity, real-world data is messy. Cleaning and preprocessing data, augmenting it, and ensuring it's representative can have a significant impact on performance.

6. **Optimization Techniques**: There are advanced optimization algorithms and techniques that can speed up training and lead to better convergence. Techniques like momentum, adaptive learning rates (e.g., Adam, RMSprop) can play a crucial role.

7. **Evaluation Metrics**: We've looked at loss values, but in real-world scenarios, understanding and selecting the right evaluation metrics for the task (accuracy, F1-score, AUC-ROC, etc.) is vital. 

8. **Training Dynamics**: Understanding how models train, visualizing the activations, weights, and gradients, and knowing when and why a model is struggling can offer insights into how to improve performance.

Remember, while the mechanics of building and training a neural network are essential, the art of deep learning lies in understanding the nuances and iterating based on insights and knowledge. The next steps in your learning, focusing on methodology, will provide the tools and knowledge to navigate these complexities and achieve better results.

Keep learning, experimenting, and iterating! The world of deep learning is vast, and there's always something new to discover.

## **Extra for the Fast Movers: Diving Deeper**

To further enhance your understanding and capability with PyTorch, this section introduces additional topics that cater to more advanced use-cases. These tools and techniques can be essential when dealing with larger and more complex projects, providing valuable insights into optimization and performance.

### **Profiling with PyTorch Profiler in TensorBoard**

PyTorch, starting from version 1.9.0, incorporates the PyTorch Profiler as a TensorBoard plugin. This integration allows users to profile their PyTorch code and visualize the results directly within TensorBoard.
Below, we instrument PyTorch code for TensorBoard profiling.

Use this [documentation](http://www.idris.fr/jean-zay/pre-post/profiler_pt.html) to complete the following tasks.

> **Task:** Before instrumenting your PyTorch code, you'll need to import the necessary modules for profiling.

> **Task:** Modify the training loop to invoke the profiler. 

In [None]:
# Your imports here

# Your code here
def train_model_with_profiling(model, train_loader, criterion, optimizer, epochs, profiler_dir='./profiler'):
    # Your code here
    pass

<details>
<summary>Hint (click to reveal)</summary>

```python
from torch.profiler import profile, tensorboard_trace_handler, ProfilerActivity, schedule

def train_model_with_profiling(model, dataloader, loss_function, optimizer, epochs, profiler_dir='./profiler'):
    # Store the loss value of every mini-batch
    loss_history = []
    
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 schedule=schedule(wait=1, warmup=1, active=12, repeat=1),
                 on_trace_ready=tensorboard_trace_handler(profiler_dir)) as prof:
        for epoch in range(epochs):
            for inputs, targets in dataloader:
                # Ensure that data is on the right device
                inputs, targets = inputs.to(device), targets.to(device)
                
                # Reset the gradients to zero
                optimizer.zero_grad()
                
                # Execute a forward pass
                outputs = model(inputs)
                
                # Calculate the loss
                loss = loss_function(outputs, targets)
                
                # Conduct a backward pass
                loss.backward()
                
                # Update the weights
                optimizer.step()
                
                # Append the loss to the history
                loss_history.append(loss.item())
                
                # Notify profiler of step boundary
                prof.step()
                
            print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss_history[-1]:.4f}")
    
    return loss_history
```
Make sure you have defined the loss_function, optimizer, and epochs in the previous sections.
</details>

In [None]:
# Training the model using the train_loader
loss_history = train_model_with_profiling(model, train_loader, criterion, optimizer, 10, profiler_dir='./profiler')

> **Task:** To visualize the profiling results, open a TensorBoard interface using the blue button in the top left corner.
>
> **Make sure to specify the logdir with "--logdir=/path/to/profiler_folder".**
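
For a quick look at the results without TensorBoard, the profiler can also print a summary table directly. The following sketch profiles a few forward passes of a small stand-in layer on the CPU only (the layer and the input sizes are arbitrary assumptions) and lists the most expensive operators.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Small stand-in workload
model = torch.nn.Linear(8, 4)
x = torch.randn(32, 8)

# Profile CPU activity only, so that the sketch also runs without a GPU
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(3):
        model(x)

# Aggregate the recorded events per operator and format them as a text table
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(summary)
```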

### **Learning Rate Scheduling**

One of the key hyperparameters to tune during neural network training is the learning rate. While it's possible to set a static learning rate for the entire training process, in practice, dynamically adjusting the learning rate often leads to better convergence and overall performance. This dynamic adjustment is often referred to as learning rate scheduling or annealing.

**Concept of Learning Rate Scheduling**

The learning rate determines the step size at each iteration while moving towards a minimum of the loss function. If it's too large, the optimization might overshoot the minimum. Conversely, if it's too small, the training might get stuck, or convergence could be very slow.

A learning rate scheduler changes the learning rate during training based on the provided scheduling policy. By adjusting the learning rate during training, you can achieve faster convergence and better final results.

**Using Learning Rate Schedulers in PyTorch**

PyTorch provides a variety of learning rate schedulers through the torch.optim.lr_scheduler module. Some of the popular ones are:
- StepLR: Decays the learning rate of each parameter group by gamma every step_size epochs.
- ExponentialLR: Decays the learning rate of each parameter group by gamma every epoch.
- ReduceLROnPlateau: Reduces the learning rate when a metric has stopped improving.

> **Task:** Take a look at the `torch.optim.lr_scheduler` documentation or click on the hint in the following cell, then integrate an LR scheduler into the training code you wrote earlier.


<details>
<summary>Hint (click to reveal)</summary>
Below, you have a typical training loop with a learning rate scheduler.
    
```python
from torch.optim.lr_scheduler import StepLR

# `model`, `data`, `loss_fn` and `epochs` are assumed to be defined beforehand
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by gamma every step_size epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(epochs):
    for inputs, targets in data:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
        loss.backward()
        optimizer.step()

    # Step the learning rate scheduler once per epoch
    scheduler.step()
```
</details>
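
To see what a schedule looks like in practice, the sketch below records the learning rate produced by `StepLR` over 25 epochs, using a dummy parameter instead of a real model (an assumption made to keep the example short). With an initial rate of 0.1, `step_size=10` and `gamma=0.1`, the rate stays at 0.1 for epochs 0 to 9, drops to 0.01 for epochs 10 to 19, and to 0.001 from epoch 20.

```python
import torch
from torch.optim.lr_scheduler import StepLR

# Dummy parameter standing in for a model's parameters
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(25):
    # Learning rate used during this epoch
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()   # stands in for a full epoch of training
    scheduler.step()   # update the learning rate for the next epoch

print(lrs[0], lrs[9], lrs[10], lrs[20])
```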


### **Automatic Mixed Precision**

Training deep neural networks can be both time-consuming and resource-intensive. One way to address this problem is by leveraging mixed precision training. In essence, mixed precision training uses both 16-bit and 32-bit floating-point types to represent numbers in the model, which can speed up training without sacrificing the accuracy of the final model.

**Overview of AMP (Automatic Mixed Precision)**

AMP (Automatic Mixed Precision) is a set of utilities provided by PyTorch to enable mixed precision training more effortlessly. The main advantages of AMP are:
- Faster Training: By using reduced precision, the model requires less memory bandwidth, resulting in faster data transfers and faster matrix multiplication.
- Reduced GPU Memory Usage: This enables training of larger models or utilization of larger batch sizes.

PyTorch has integrated the AMP utilities starting from version 1.6.

> **Task**: Set up AMP in the training function by checking the [documentation](http://www.idris.fr/eng/ia/mixed-precision-eng.html). You will need to add the necessary imports, initialize the `GradScaler`, and modify the training loop by wrapping the forward pass and the loss computation in `with autocast():`.

In [None]:
# Your code here

<details>
<summary>Hint (click to reveal)</summary>
Below, you have a typical training loop with autocast.
    
```python
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for epoch in range(epochs):
    for input, target in data:
        optimizer.zero_grad()
        
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```
</details>
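
To observe what autocasting changes, the sketch below runs a stand-in linear layer inside the device-agnostic `torch.autocast` context and compares dtypes. The layer, the input sizes, and the choice of `bfloat16` on CPU (used here so that the example also runs without a GPU) are assumptions made for illustration. The weights stay in 32-bit precision; only the computation inside the context uses the reduced-precision type.

```python
import torch
import torch.nn as nn

# float16 on GPU; bfloat16 on CPU so that the sketch runs without a GPU
device_type = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16

model = nn.Linear(8, 4).to(device_type)
x = torch.randn(2, 8, device=device_type)

# Inside the context, eligible operations run in reduced precision
with torch.autocast(device_type=device_type, dtype=amp_dtype):
    output = model(x)

print(model.weight.dtype)  # parameters are still stored in float32
print(output.dtype)        # the result of the linear layer is in reduced precision
```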


### **PyTorch Compiler**

**For this section, you will need PyTorch version 2.0 or later.**

PyTorch has steadily evolved to offer better performance and ease of use. One such advancement is the PyTorch compiler, which accelerates PyTorch code execution by JIT-compiling it into optimized kernels while requiring only minimal modifications to the original codebase.

Historically, PyTorch has introduced compiler solutions like TorchScript and FX Tracing. However, the introduction of torch.compile with PyTorch 2.0 has taken performance optimization to a new level. It provides a seamless experience, enabling you to transform typical PyTorch functions and even torch.nn.Module instances into their faster, compiled counterparts.

For those eager to dive deep into its workings and benefits, detailed documentation and tutorials have been made available:
- [torch.compile Tutorial](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html)
- [PyTorch 2.0 Release Notes](https://pytorch.org/get-started/pytorch-2.0/)

> **Task:** Make your existing PyTorch model take advantage of the performance benefits offered by `torch.compile`. This can make your model run faster and gives you hands-on experience with one of the latest features in PyTorch.

<details>
<summary>Hint (click to reveal)</summary>

1. **Ensure Dependencies**:
   - Ensure that you have the required dependencies, especially PyTorch version 2.0 or higher.

2. **Check for GPU Compatibility**:
   - For optimal performance, it's recommended to use a modern NVIDIA GPU (H100, A100, or V100).

3. **Compile Functions**:
   - You can optimize arbitrary Python functions as shown in the example:
     ```python
     def your_function(x, y):
         ...  # Your PyTorch code here (placeholder body)
     opt_function = torch.compile(your_function)
     ```

   - Alternatively, use the decorator approach:
     ```python
     @torch.compile
     def opt_function(x, y):
         ...  # Your PyTorch code here (placeholder body)
     ```

4. **Compile Modules**:
   - If you have a PyTorch module (a class derived from `torch.nn.Module`), you can compile it similarly:
     ```python
     class YourModule(torch.nn.Module):
         ...  # Your module definition here (placeholder body)

     model = YourModule()
     opt_model = torch.compile(model)
     ```

</details>

Remember, while `torch.compile` optimizes performance, the underlying logic remains the same. Be sure to test and validate your compiled model's outputs against the original to confirm consistent behavior.
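
One possible way to carry out such a check is sketched below: the same input is passed through the original and the compiled model, and the outputs are compared with `torch.allclose`. The small `nn.Sequential` model and the tolerance are assumptions made for illustration. The sketch passes `backend="eager"` so that it runs without a C++ toolchain; remove that argument to use the default optimizing backend.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Small stand-in model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()
x = torch.randn(16, 4)

# backend="eager" captures the model without generating optimized kernels
opt_model = torch.compile(model, backend="eager")

with torch.no_grad():
    reference = model(x)
    compiled = opt_model(x)  # compilation happens on this first call

# The compiled model should reproduce the original outputs
outputs_match = torch.allclose(reference, compiled, atol=1e-6)
print(outputs_match)
```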

---
<img width="80px" src="../fidle/img/logo-paysage.svg"></img>