"Recall the sklearn documention to assess the [performance of a model (`model_evaluation` module)](https://scikit-learn.org/stable/modules/model_evaluation.html)."
]
]
},
},
{
{
...
...
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fml-sicom3a/master?urlpath=lab/tree/notebooks/8_Trees_Boosting/N2_a_Regression_tree.ipynb)
%% Cell type:markdown id: tags:

# REGRESSION TREE
In this notebook, the methods illustrated are basically the same as those presented for classification (notebook N1_Classif_tree.ipynb), **except** the criterion used to define the best split: in the regression case, the search for the best split at each node minimizes a **Mean Square Error (MSE) criterion**. Note that this applies to numerical data only.
Let $N$ be the number of samples in a set $S$. Splitting $S$ in two subsets defines the **partition** of $S$ into $\{ S_l, S_r \}$. The estimated variance, or MSE, of $S$ is

$$\widehat{\mathrm{MSE}}(S) = \frac{1}{N} \sum_{i \in S} \left( y_i - \bar{y} \right)^2, \qquad \bar{y} = \frac{1}{N} \sum_{i \in S} y_i .$$

The best split is the one that minimizes the weighted MSE of the resulting partition, $\frac{N_l}{N}\,\mathrm{MSE}(S_l) + \frac{N_r}{N}\,\mathrm{MSE}(S_r)$, where $N_l$ and $N_r$ denote the numbers of samples in $S_l$ and $S_r$.
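To make the criterion concrete, here is a minimal sketch (not scikit-learn's actual implementation) of an exhaustive search for the best split threshold on a single feature; the toy arrays `x` and `y` are purely illustrative:

``` python
import numpy as np

def best_split(x, y):
    """Return the threshold on feature x minimizing the weighted MSE criterion."""
    best_t, best_cost = None, np.inf
    for t in np.unique(x)[:-1]:  # candidate thresholds (all but the largest value)
        left, right = y[x <= t], y[x > t]
        # weighted MSE of the partition {S_l, S_r}: (N_l*MSE_l + N_r*MSE_r) / N
        cost = (left.size * left.var() + right.size * right.var()) / y.size
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 0.2, 0.15, 1.1, 1.2])  # jump after x = 3
print(best_split(x, y))  # approximately (3.0, 0.002)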
Recall the sklearn documentation on assessing the [performance of a model (`model_evaluation` module)](https://scikit-learn.org/stable/modules/model_evaluation.html).
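For instance, the `mean_squared_error` function from `sklearn.metrics` computes the same per-fold error as the manual NumPy expression in the next cell; a self-contained toy example:

``` python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # toy ground-truth values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # toy predictions
print(mean_squared_error(y_true, y_pred))  # 0.375
```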
%% Cell type:code id: tags:

``` python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Cross-validated MSE as a function of the maximum tree depth.
# X and y are assumed to be defined in an earlier cell (noisy training data);
# the candidate depth grid below is an assumption for the elided part of the cell.
depth = range(1, 15)
reg_MSE = []
for d in depth:
    mserr = []
    for train_index, test_index in KFold(n_splits=5).split(X):
        reg = DecisionTreeRegressor(max_depth=d)
        reg.fit(X[train_index], y[train_index])
        y_pred = reg.predict(X[test_index])
        y_t = y[test_index].ravel()  # to force same dimensions as those of y_pred
        mserr.append(np.square(y_t - y_pred).mean())  # per-fold mean squared error
    # print(mserr)
    reg_MSE.append(np.asarray(mserr).mean())
plt.plot(depth, reg_MSE)
plt.xlabel("max depth of the regression tree")
plt.ylabel("MSE")
plt.grid()
```
%%%% Output: display_data

[Hidden Image Output]
%% Cell type:markdown id: tags:

## Exercise 5
- Determine the "optimal" depth to use to perform the best (MSE sense) tree based prediction
- Determine the "optimal" depth to use to perform the best (MSE sense) tree based prediction
- Change the noise power (set e.g. noise_std to take different values in the range $[.01;1]$ and study (plot) the obtained cross-validated "optimal depth" as a function of the noise power.
- Change the noise power (set e.g. noise_std to take different values in the range $[.01;1]$ and study (plot) the obtained cross-validated "optimal depth" as a function of the noise power.
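A minimal sketch of one way to run the second experiment; `make_data` below is a hypothetical stand-in for the notebook's own data-generation cell (a noisy sine is assumed), and the depth grid is likewise an assumption:

``` python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def make_data(noise_std, n=200):
    # hypothetical generator standing in for the notebook's data cell
    X = rng.uniform(0, 5, size=(n, 1))
    y = np.sin(X).ravel() + noise_std * rng.standard_normal(n)
    return X, y

depths = list(range(1, 15))              # assumed candidate depths
noise_stds = np.logspace(-2, 0, 10)      # noise_std in [0.01, 1]
best_depths = []
for s in noise_stds:
    X, y = make_data(s)
    # cross-validated MSE for each depth; keep the minimizing depth
    cv_mse = [
        -cross_val_score(DecisionTreeRegressor(max_depth=d), X, y,
                         scoring="neg_mean_squared_error", cv=5).mean()
        for d in depths
    ]
    best_depths.append(depths[int(np.argmin(cv_mse))])

plt.semilogx(noise_stds, best_depths, "o-")
plt.xlabel("noise_std")
plt.ylabel('cross-validated "optimal" max_depth')
plt.grid()
```

One would expect the optimal depth to decrease as the noise power grows, since deeper trees start fitting the noise rather than the underlying function.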