Commit 2ebd7449 authored by Florent Chatelain
up lab+hw

parent e35b50d5
......@@ -81,11 +81,11 @@ videoconference to the class every monday from 15:45 to 17:45.
- upload your lab 1 *short report* in the [chamilo assignment task](https://chamilo.grenoble-inp.fr/main/work/work_list.php?cidReq=PHELMA5PMSAST6&id_session=0&gidReq=0&gradebook=0&origin=&id=117582) (pdf file from your editor, or scanned pdf file of a handwritten paper;
code, figures or graphics are not required)
##### Homework before the first lab on **Monday, September 20**
- read and run the [introduction notebooks](https://gricad-gitlab.univ-grenoble-alpes.fr/chatelaf/ml-sicom3a/-/tree/master/notebooks/1_introduction/) `N1_Linear_Classification.ipynb` and `N2_Polynomial_Classification_Model_Complexity.ipynb`
- answer the questions of the notebook exercises and upload it (pdf file from your editor, or scanned pdf file of a handwritten sheet) under chamilo in the [assignment tool](https://chamilo.grenoble-inp.fr/main/work/work_list.php?cidReq=PHELMA5PMSAST6&id_session=0&gidReq=0&gradebook=0&origin=&id=117272) (those and only those who do not yet have an agalan account can send it to me by email):
- only text explanations are required, no need to copy/paste figures or graphics!
- must not exceed half an A4 page
##### ~~Homework before the first lab on **Monday, September 20**~~
- ~~read and run the [introduction notebooks](https://gricad-gitlab.univ-grenoble-alpes.fr/chatelaf/ml-sicom3a/-/tree/master/notebooks/1_introduction/) `N1_Linear_Classification.ipynb` and `N2_Polynomial_Classification_Model_Complexity.ipynb`~~
- ~~answer the questions of the notebook exercises and upload it (pdf file from your editor, or scanned pdf file of a handwritten sheet) under chamilo in the [assignment tool](https://chamilo.grenoble-inp.fr/main/work/work_list.php?cidReq=PHELMA5PMSAST6&id_session=0&gidReq=0&gradebook=0&origin=&id=117272) (those and only those who do not yet have an agalan account can send it to me by email):~~
- ~~only text explanations are required, no need to copy/paste figures or graphics!~~
- ~~must not exceed half an A4 page~~
##### ~~First course session will take place Monday afternoon, September 13 at Minatec M256 (face-to-face).~~
......
......@@ -277,7 +277,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
"version": "3.7.9"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
......
......@@ -162,10 +162,11 @@
%% Cell type:markdown id: tags:
### Exercise
- Recall the benefits of cross-validation compared with the simple validation approach
- Is the CV accuracy estimate consistent with the accuracy estimate obtained on the test set?
- Apply the same CV procedure to the iris data set (see and adapt notebook
[`2_knn/N2_iris_knn.ipynb`](https://gricad-gitlab.univ-grenoble-alpes.fr/chatelaf/ml-sicom3a/-/blob/master/notebooks/2_knn/N2_iris_knn.ipynb)). Compare your results with the simple validation approach (fixed training and test set) used in [`2_knn/N2_iris_knn.ipynb`](https://gricad-gitlab.univ-grenoble-alpes.fr/chatelaf/ml-sicom3a/-/blob/master/notebooks/2_knn/N2_iris_knn.ipynb)
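As a starting point for the last item, here is a minimal sketch of both estimates on the iris data set. It does not reproduce the referenced notebook: the value `n_neighbors=5`, the 70/30 split, the 5 folds and the `random_state` values are assumptions chosen for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumed hyperparameter, not taken from the notebook.
clf = KNeighborsClassifier(n_neighbors=5)

# Simple validation: one fixed training/test split (assumed 70/30).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
holdout_acc = clf.fit(X_train, y_train).score(X_test, y_test)

# Cross-validation: every sample is used once for testing.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X, y, cv=cv)

print(f"hold-out accuracy: {holdout_acc:.3f}")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

The spread of `cv_scores` across folds gives an idea of how much a single hold-out estimate may vary with the choice of split.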
%% Cell type:code id: tags:
......
......@@ -139,19 +139,22 @@
![]()
%% Cell type:markdown id: tags:
## Exercise:
- Explain why the prediction performance estimated by non-nested CV
1. Explain why the prediction performance estimated by non-nested CV
is optimistic with respect to the nested CV one.
- Which estimator do you think is the most reliable for the test error?
2. Which estimator do you think is the most reliable for the test error?
A common alternative in machine learning is to split the original dataset into **three**: a *training* set to fit the algorithm, a *validation* set to optimize the hyperparameters (note that training and validation sets can be split repeatedly in a cross-validation framework), and a *test* set used only to evaluate the test error.
- Why use a test set in addition to the validation set?
- What are the benefits or disadvantages of nested cross-validation w.r.t. the training/validation/test split approach?
- Implement the training/validation/test approach on this example and compare with the nested cross-validation (*hint:* you may use `train_test_split()`, `GridSearchCV()`, `cross_val_score()` sklearn functions). Compare and interpret your results.
A common alternative in machine learning is to split the original dataset to get a **separate test set**.
A standard (i.e. non-nested) cross-validation procedure can then be applied to the remaining data. The samples in the test set are now only used to evaluate the test error.
<!--In practice, this often requires to split the original dataset in three: a training set to fit the algorithm, a validation set to optimize the hyperparameters (note that training and validation sets can be split repeatedly in a cross-validation framework), and a test set used only to evaluate the test error.-->
This differs from nested cross-validation, where the test samples used in the outer loop are also used in the inner loop to train or validate the model.
3. Why use a separate test set in addition to the validation set?
4. What are the benefits (or disadvantages) of nested cross-validation w.r.t. the separate test set approach?
5. **Optional:** Implement the *separate test set* approach on this example and compare with the nested cross-validation (*hint:* you may use `train_test_split()`, `GridSearchCV()`, `cross_val_score()` sklearn functions). Compare and interpret your results.
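The two procedures compared in this exercise can be sketched as follows. This is only an illustration under stated assumptions: the notebook's actual model and data are not visible in this diff, so the iris data set, the RBF `SVC`, the parameter grid, the 4 folds and the 75/25 split are all placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed model and hyperparameter grid (placeholders, not from the notebook).
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Nested CV: the inner loop (GridSearchCV) tunes the hyperparameters,
# the outer loop (cross_val_score) estimates the generalization accuracy.
inner_search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)
nested_scores = cross_val_score(inner_search, X, y, cv=4)

# Separate test set: hold out samples that are never seen during tuning.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)
search.fit(X_rest, y_rest)             # standard (non-nested) CV on the remaining data
non_nested_acc = search.best_score_    # optimistic: also used to pick the hyperparameters
test_acc = search.score(X_test, y_test)  # refit best model evaluated on the held-out set

print(f"nested CV accuracy: {nested_scores.mean():.3f}")
print(f"non-nested CV accuracy: {non_nested_acc:.3f}")
print(f"separate test set accuracy: {test_acc:.3f}")
```

Note that the separate test set estimate relies on a single split, whereas nested CV averages over several outer folds at a higher computational cost.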
%% Cell type:code id: tags:
``` python
......