{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img width=\"800px\" src=\"../fidle/img/00-Fidle-header-01.svg\"></img>\n",
"# <!-- TITLE --> [GTS1] - CNN with GTSRB dataset - Data analysis and preparation\n",
"<!-- DESC --> Episode 1 : Data analysis and creation of a usable dataset\n",
"<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
"## Objectives:\n",
" - Understand the **complexity associated with data**, even when it consists only of images\n",
" - Learn how to build a simple and **usable image dataset**\n",
"\n",
"The German Traffic Sign Recognition Benchmark (GTSRB) is a dataset of more than 50,000 photos of road signs in 43 classes.  \n",
"The final aim is to recognise them!  \n",
"A description is available here: http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset\n",
"\n",
"\n",
"## What we're going to do:\n",
" - Prepare and format enhanced data\n",
" - Save enhanced datasets in h5 file format\n"
]
},
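The objectives mention preparing "enhanced" data. The NumPy sketch below illustrates one common enhancement. It converts an RGB image to grayscale using the ITU-R BT.601 luma weights, then applies a global histogram equalization. This is an assumption about what "enhanced" can mean, not necessarily the transformations used later in this notebook. `rgb_to_gray` and `equalize_histogram` are hypothetical helper names.

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an (H, W, 3) RGB array to an (H, W) float grayscale array (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    # Matrix product over the last axis: weighted sum of R, G and B for every pixel
    return img[..., :3] @ weights

def equalize_histogram(gray, levels=256):
    """Global histogram equalization of a grayscale image in the 0..levels-1 range.

    Returns a float array in [0, 1] where the darkest pixel maps to 0 and the brightest to 1.
    """
    # Round to integer intensity levels so they can index the lookup table
    g = np.clip(np.rint(gray), 0, levels - 1).astype(np.int64)
    hist = np.bincount(g.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the lowest level actually present
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return np.zeros(g.shape, dtype=np.float64)
    lut = (cdf - cdf_min) / denom      # lookup table: level -> equalized value
    return np.clip(lut[g], 0.0, 1.0)
```

For an (H, W, 3) uint8 image `img`, `equalize_histogram(rgb_to_gray(img))` returns an (H, W) float array in [0, 1]. Local variants, such as `rank.equalize` with a `disk` footprint (both imported below), equalize each neighbourhood separately instead of the whole image.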
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1 - Import and init\n",
"### 1.1 - Python"
"outputs": [
{
"data": {
"text/html": [
"<style>\n",
"\n",
"div.warn { \n",
" background-color: #fcf2f2;\n",
" border-color: #dFb5b4;\n",
" border-left: 5px solid #dfb5b4;\n",
" padding: 0.5em;\n",
" font-weight: bold;\n",
" font-size: 1.1em;\n",
" }\n",
"\n",
"\n",
"\n",
"div.nota { \n",
" background-color: #DAFFDE;\n",
" border-left: 5px solid #92CC99;\n",
" padding: 0.5em;\n",
" }\n",
"\n",
"div.todo:before { content:url();\n",
" float:left;\n",
" margin-right:20px;\n",
" margin-top:-20px;\n",
" margin-bottom:20px;\n",
"}\n",
"div.todo{\n",
" font-weight: bold;\n",
" font-size: 1.1em;\n",
" margin-top:40px;\n",
"}\n",
"div.todo ul{\n",
" margin: 0.2em;\n",
"}\n",
"div.todo li{\n",
" margin-left:60px;\n",
" margin-top:0;\n",
" margin-bottom:0;\n",
"}\n",
"\n",
"\n",
"</style>\n",
"\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"FIDLE 2020 - Practical Work Module\n",
"Run time : Friday 28 February 2020, 09:23:59\n",
"TensorFlow version : 2.0.0\n",
"Keras version : 2.2.4-tf\n"
]
}
],
"source": [
"import os, time, sys\n",
"import csv\n",
"import math, random\n",
"\n",
"import numpy as np\n",
"\n",
"from skimage.morphology import disk\n",
"from skimage.filters import rank\n",
"sys.path.append('..')\n",
"import fidle.pwk as ooo\n",
"\n",
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 - Where are we?"
]
},
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Well, we should be at GRICAD !\n",
"We are going to use: /bettik/PROJECTS/pr-fidle/datasets/GTSRB\n"
]
}
],
"source": [
"place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv(\"SCRATCH_DIR\",\"\")}/PROJECTS/pr-fidle/datasets/GTSRB',\n",
" 'IDRIS' : f'{os.getenv(\"WORK\",\"\")}/datasets/GTSRB',\n",
" 'HOME' : f'{os.getenv(\"HOME\",\"\")}/datasets/GTSRB'} )"
]
},
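`ooo.good_place()` comes from the Fidle helper module, and its internals are not shown here. The sketch below illustrates the general idea of selecting a dataset directory from several candidate locations. `pick_dataset_dir` is a hypothetical stand-in that returns the first candidate existing on disk. It is not fidle's implementation, which may use a different rule to detect where it is running.

```python
import os

def pick_dataset_dir(candidates):
    """Hypothetical helper: return (place, path) for the first candidate directory that exists.

    `candidates` maps a place name (e.g. 'GRICAD', 'IDRIS', 'HOME') to a dataset path,
    in the same shape as the dictionary passed to good_place() above.
    Raises FileNotFoundError if none of the paths is an existing directory.
    """
    for place, path in candidates.items():   # dicts keep insertion order, so order = priority
        if path and os.path.isdir(path):
            return place, path
    raise FileNotFoundError(f"No dataset directory found among: {list(candidates.values())}")
```

The dictionary order acts as a priority list. Put the most specific site first and a generic fallback such as `HOME` last.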
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A description is available here: http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset\n",
" - Each directory contains one CSV file with annotations (\"GT-<ClassID>.csv\") and the training images\n",
" - The first line contains the field names: `Filename;Width;Height;Roi.X1;Roi.Y1;Roi.X2;Roi.Y2;ClassId`\n",
" \n",
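The annotation files described above are semicolon-separated, with the field names on the first line. A minimal sketch using the standard `csv` module could read them and crop each image to its region of interest. It assumes the images are already loaded as NumPy arrays of shape (H, W, channels). It also assumes that `Roi.X2`/`Roi.Y2` are inclusive pixel coordinates, so check this against the dataset description before relying on it. `read_annotations` and `crop_roi` are hypothetical names.

```python
import csv
import numpy as np

INT_FIELDS = ('Width', 'Height', 'Roi.X1', 'Roi.Y1', 'Roi.X2', 'Roi.Y2', 'ClassId')

def read_annotations(f):
    """Parse a GTSRB-style annotation file object: ';' delimiter, field names on the first line."""
    rows = []
    for rec in csv.DictReader(f, delimiter=';'):
        row = {'Filename': rec['Filename']}
        row.update({k: int(rec[k]) for k in INT_FIELDS})   # numeric fields as int
        rows.append(row)
    return rows

def crop_roi(img, ann):
    """Crop an (H, W, ...) array to the annotated ROI, ASSUMING Roi.X2/Roi.Y2 are inclusive."""
    return img[ann['Roi.Y1']:ann['Roi.Y2'] + 1, ann['Roi.X1']:ann['Roi.X2'] + 1]
```

Typical use would be `with open(csv_path, newline='') as f: anns = read_annotations(f)`, followed by `crop_roi(image, ann)` for each loaded image.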
"### 2.1 - Understanding the dataset\n",
"The original dataset is in: **\\<dataset_dir\\>/origine**.  \n",
"There are 3 subsets: **Train**, **Test** and **Meta**.  \n",
"Each subset has a **csv file** and a **subdir**.\n",
" "
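A quick check of this layout can save time before any processing. The sketch below assumes each subset is stored as `<subset>.csv` alongside a `<subset>/` directory inside `<dataset_dir>/origine`. These exact file names are an assumption, so adjust them to what you actually find. `check_layout` is a hypothetical helper.

```python
import os

SUBSETS = ('Train', 'Test', 'Meta')

def check_layout(origine_dir, subsets=SUBSETS):
    """Return {subset: (csv_found, subdir_found)} for each subset.

    ASSUMES the csv file is named '<subset>.csv' and sits next to a '<subset>/' directory.
    """
    status = {}
    for name in subsets:
        csv_path = os.path.join(origine_dir, f'{name}.csv')
        dir_path = os.path.join(origine_dir, name)
        status[name] = (os.path.isfile(csv_path), os.path.isdir(dir_path))
    return status
```

`check_layout(f'{dataset_dir}/origine')` reports, for each subset, whether its csv file and its subdirectory were found.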
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Width</th>\n",