Commit e968b335
Authored 4 years ago by Jean-Luc Parouty
Parent: e0c5d593

VAE / IDRIS Validation
Changes: 3 changed files, with 100 additions and 223 deletions

  VAE/06-Prepare-CelebA-datasets.ipynb   +33  −33
  VAE/08.1-VAE-with-CelebA.ipynb         +59  −182
  fidle/log/finished.json                +8   −8
VAE/06-Prepare-CelebA-datasets.ipynb  (+33 −33)  View file @ e968b335

...
@@ -114,10 +114,10 @@
      "text": [
       "Version : 0.6.1 DEV\n",
       "Notebook id : VAE6\n",
-      "Run time : Saturday 2 January 2021, 17:04:58\n",
-      "TensorFlow version : 2.2.0\n",
-      "Keras version : 2.3.0-tf\n",
-      "Datasets dir : /home/pjluc/datasets/fidle\n",
+      "Run time : Monday 4 January 2021, 21:17:51\n",
+      "TensorFlow version : 2.4.0\n",
+      "Keras version : 2.4.0\n",
+      "Datasets dir : /gpfswork/rech/mlh/uja62cb/datasets\n",
       "Run dir : ./run\n",
       "CI running mode : none\n",
       "Update keras cache : False\n",
...

@@ -337,12 +337,12 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Scale is : 0.1\n",
+      "Scale is : 1.0\n",
       "Image size is : (128, 128)\n",
-      "dataset length is : 20259\n",
-      "cluster size is : 1000\n",
+      "dataset length is : 202599\n",
+      "cluster size is : 10000\n",
       "clusters nb is : 21\n",
-      "cluster dir is : ./data/clusters-128x128\n"
+      "cluster dir is : /gpfswork/rech/mlh/uja62cb/datasets/celeba/enhanced/clusters-128x128\n"
      ]
     },
     {
...

@@ -361,27 +361,27 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Cluster 000 : [########################################] 100.0% of 1000\n",
-      "Cluster 001 : [########################################] 100.0% of 1000\n",
-      "Cluster 002 : [########################################] 100.0% of 1000\n",
-      "Cluster 003 : [########################################] 100.0% of 1000\n",
-      "Cluster 004 : [########################################] 100.0% of 1000\n",
-      "Cluster 005 : [########################################] 100.0% of 1000\n",
-      "Cluster 006 : [########################################] 100.0% of 1000\n",
-      "Cluster 007 : [########################################] 100.0% of 1000\n",
-      "Cluster 008 : [########################################] 100.0% of 1000\n",
-      "Cluster 009 : [########################################] 100.0% of 1000\n",
-      "Cluster 010 : [########################################] 100.0% of 1000\n",
-      "Cluster 011 : [########################################] 100.0% of 1000\n",
-      "Cluster 012 : [########################################] 100.0% of 1000\n",
-      "Cluster 013 : [########################################] 100.0% of 1000\n",
-      "Cluster 014 : [########################################] 100.0% of 1000\n",
-      "Cluster 015 : [########################################] 100.0% of 1000\n",
-      "Cluster 016 : [########################################] 100.0% of 1000\n",
-      "Cluster 017 : [########################################] 100.0% of 1000\n",
-      "Cluster 018 : [########################################] 100.0% of 1000\n",
-      "Cluster 019 : [########################################] 100.0% of 1000\n",
-      "Cluster 020 : [##########------------------------------] 25.0% of 1000\r"
+      "Cluster 000 : [########################################] 100.0% of 10000\n",
+      "Cluster 001 : [########################################] 100.0% of 10000\n",
+      "Cluster 002 : [########################################] 100.0% of 10000\n",
+      "Cluster 003 : [########################################] 100.0% of 10000\n",
+      "Cluster 004 : [########################################] 100.0% of 10000\n",
+      "Cluster 005 : [########################################] 100.0% of 10000\n",
+      "Cluster 006 : [########################################] 100.0% of 10000\n",
+      "Cluster 007 : [########################################] 100.0% of 10000\n",
+      "Cluster 008 : [########################################] 100.0% of 10000\n",
+      "Cluster 009 : [########################################] 100.0% of 10000\n",
+      "Cluster 010 : [########################################] 100.0% of 10000\n",
+      "Cluster 011 : [########################################] 100.0% of 10000\n",
+      "Cluster 012 : [########################################] 100.0% of 10000\n",
+      "Cluster 013 : [########################################] 100.0% of 10000\n",
+      "Cluster 014 : [########################################] 100.0% of 10000\n",
+      "Cluster 015 : [########################################] 100.0% of 10000\n",
+      "Cluster 016 : [########################################] 100.0% of 10000\n",
+      "Cluster 017 : [########################################] 100.0% of 10000\n",
+      "Cluster 018 : [########################################] 100.0% of 10000\n",
+      "Cluster 019 : [########################################] 100.0% of 10000\n",
+      "Cluster 020 : [##########------------------------------] 25.0% of 10000\r"
      ]
     },
     {
...

@@ -400,8 +400,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Duration : 0:01:59\n",
-      "Size : 7.4 Go\n"
+      "Duration : 0:51:04\n",
+      "Size : 74.2 Go\n"
      ]
     }
    ],
...

@@ -444,8 +444,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "End time is : Saturday 2 January 2021, 17:06:59\n",
-      "Duration is : 00:02:00 412ms\n",
+      "End time is : Monday 4 January 2021, 22:08:57\n",
+      "Duration is : 00:51:06 647ms\n",
       "This notebook ends here\n"
      ]
     }
...
%% Cell type:markdown id: tags:
<img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>
# <!-- TITLE --> [VAE6] - Preparation of the CelebA dataset
<!-- DESC --> Preparation of a clustered dataset, batchable
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->

## Objectives :
- Format our dataset as **cluster files**, usable in batch mode
- Adapt a notebook for batch use

The [CelebFaces Attributes Dataset (CelebA)](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) contains about 200,000 images of shape (202599, 218, 178, 3).
## What we're going to do :
- Read the original images
- Resize and normalize them (a minimal sketch of this step follows below)
- Assemble the images into clusters saved in npy format
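As a side illustration (not part of the committed notebook), the read/resize/normalize step boils down to the two `skimage` calls used in the cooking function further down. The file name here is hypothetical, and the normalization is a side effect of `transform.resize`, which returns floats in [0, 1] by default:

```python
# Illustrative sketch only -- the image path below is hypothetical.
from skimage import io, transform

img = io.imread('img_align_celeba/000001.jpg')   # uint8 array, shape (218, 178, 3) for CelebA aligned images
img = transform.resize(img, (128, 128))          # float64 in [0, 1], shape (128, 128, 3)

print(img.shape, img.dtype, img.min(), img.max())
```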
%% Cell type:markdown id: tags:
## Step 1 - Import and init
### 1.1 - Import
%% Cell type:code id: tags:
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from skimage import io, transform

import os, pathlib, time, sys, json, glob
import csv
import math, random

from importlib import reload

sys.path.append('..')
import fidle.pwk as pwk

datasets_dir = pwk.init('VAE6')
```
%% Output

**FIDLE 2020 - Practical Work Module**

  Version : 0.6.1 DEV
  Notebook id : VAE6
- Run time : Saturday 2 January 2021, 17:04:58
- TensorFlow version : 2.2.0
- Keras version : 2.3.0-tf
- Datasets dir : /home/pjluc/datasets/fidle
+ Run time : Monday 4 January 2021, 21:17:51
+ TensorFlow version : 2.4.0
+ Keras version : 2.4.0
+ Datasets dir : /gpfswork/rech/mlh/uja62cb/datasets
  Run dir : ./run
  CI running mode : none
  Update keras cache : False
  Save figs : True
  Path figs : ./run/figs
%% Cell type:markdown id: tags:
### 1.2 - Directories and files :
%% Cell type:code id: tags:
```python
dataset_csv = f'{datasets_dir}/celeba/origine/list_attr_celeba.csv'
dataset_img = f'{datasets_dir}/celeba/origine/img_align_celeba'
```
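For orientation only (my addition, not a cell from the commit): the attribute catalog read below is expected to be a CSV with an `image_id` column holding each file name plus the CelebA attribute columns, which is what makes `row.image_id` work in the cooking function later on. A quick, hedged sanity check, assuming the path above resolves:

```python
# Quick peek at the attribute catalog -- illustrative only.
import pandas as pd

peek = pd.read_csv(dataset_csv, header=0, nrows=3)
print(peek.columns[:5].tolist())   # expected to start with 'image_id', then attribute columns
print(peek['image_id'].tolist())   # e.g. ['000001.jpg', '000002.jpg', '000003.jpg']
```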
%% Cell type:markdown id: tags:
## Step 2 - Read and shuffle filenames catalog
%% Cell type:code id: tags:
```python
dataset_desc = pd.read_csv(dataset_csv, header=0)
dataset_desc = dataset_desc.reindex(np.random.permutation(dataset_desc.index))
```
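A hedged side note, not from the notebook: the permutation above is not seeded, so each run produces a different cluster layout. If the shuffle needs to be reproducible across batch runs, one possible variant is:

```python
# Reproducible shuffle -- illustrative variant, not the committed code.
import numpy as np

np.random.seed(42)   # hypothetical seed value
dataset_desc = dataset_desc.reindex(np.random.permutation(dataset_desc.index))

# A pandas-only alternative with a similar effect:
# dataset_desc = dataset_desc.sample(frac=1, random_state=42)
```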
%% Cell type:markdown id: tags:
## Step 3 - Save as clusters of n images
%% Cell type:markdown id: tags:
### 3.1 - Cooking function
%% Cell type:code id: tags:
```python
def read_and_save( dataset_img, dataset_desc,
                   scale=1,
                   cluster_size=1000, cluster_dir='./dataset_cluster', cluster_name='images',
                   image_size=(128,128)):

    global pwk

    def save_cluster(imgs, desc, cols, id):
        file_img  = f'{cluster_dir}/{cluster_name}-{id:03d}.npy'
        file_desc = f'{cluster_dir}/{cluster_name}-{id:03d}.csv'
        np.save(file_img, np.array(imgs))
        df = pd.DataFrame(data=desc, columns=cols)
        df.to_csv(file_desc, index=False)
        return [], [], id+1

    pwk.chrono_start()

    cols = list(dataset_desc.columns)

    # ---- Check if cluster files exist
    #
    if os.path.isfile(f'{cluster_dir}/images-000.npy'):
        print('\n*** Oups. There are already clusters in the target folder!\n')
        return 0, 0
    pwk.mkdir(cluster_dir)

    # ---- Scale
    #
    n = int(len(dataset_desc) * scale)
    dataset = dataset_desc[:n]
    cluster_size = int(cluster_size * scale)
    pwk.subtitle('Parameters :')
    print(f'Scale is : {scale}')
    print(f'Image size is : {image_size}')
    print(f'dataset length is : {n}')
    print(f'cluster size is : {cluster_size}')
    print(f'clusters nb is :', int(n / cluster_size + 1))
    print(f'cluster dir is : {cluster_dir}')

    # ---- Read and save clusters
    #
    pwk.subtitle('Running...')
    imgs, desc, cluster_id = [], [], 0
    #
    for i, row in dataset.iterrows():
        #
        filename = f'{dataset_img}/{row.image_id}'
        #
        # ---- Read image, resize (and normalize)
        #
        img = io.imread(filename)
        img = transform.resize(img, image_size)
        #
        # ---- Add image and description
        #
        imgs.append(img)
        desc.append(row.values)
        #
        # ---- Progress bar
        #
        pwk.update_progress(f'Cluster {cluster_id:03d} :', len(imgs), cluster_size)
        #
        # ---- Save cluster if full
        #
        if len(imgs) == cluster_size:
            imgs, desc, cluster_id = save_cluster(imgs, desc, cols, cluster_id)

    # ---- Save uncomplete cluster
    if len(imgs) > 0:
        imgs, desc, cluster_id = save_cluster(imgs, desc, cols, cluster_id)

    duration = pwk.chrono_stop()
    return cluster_id, duration
```
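To make the file format concrete (this cell is not in the commit): given the naming scheme used by `save_cluster` above, a cluster pair could plausibly be reloaded as follows. The directory is hypothetical, and the real loading code lives in the training notebooks:

```python
# Illustrative reload of one cluster produced above -- the path is hypothetical.
import numpy as np
import pandas as pd

cluster_dir = './data/clusters-128x128'

imgs = np.load(f'{cluster_dir}/images-000.npy')      # float images, shape (cluster_size, 128, 128, 3)
desc = pd.read_csv(f'{cluster_dir}/images-000.csv')  # matching attribute rows, same order as imgs

print(imgs.shape, imgs.dtype, len(desc))
```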
%% Cell type:markdown id: tags:
### 3.2 - Cluster building
The whole dataset will be used for training.

Reading the 200,000 images can take a long time **(>20 minutes)** and a lot of space **(>170 GB)**.

Example (a quick way to check these figures is sketched below) :
Image size 128x128 : 74 GB
Image size 192x160 : 138 GB

You can set these parameters :

`scale` : 1 means 100% of the dataset - set 0.05 for tests
`image_size` : image size in the clusters, should be 128x128 or 192x160 (original is 218x178)
`output_dir` : where to write the clusters, can be :
- `./data`, for test purposes
- `<datasets_dir>/celeba/enhanced` to add the clusters to your datasets dir

`cluster_size` : number of images per cluster, 10000 is fine (it will be adjusted by the scale)

**Note :** If the target folder is not empty, the construction is blocked.
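The sizes quoted above are consistent with `transform.resize` returning float64 pixels (8 bytes each), which is its default behaviour. A rough back-of-the-envelope check (mine, not the notebook's):

```python
# Rough storage estimate, assuming float64 pixels (8 bytes) as produced by transform.resize.
n_images = 202599

for (lx, ly) in [(128, 128), (192, 160)]:
    size_bytes = n_images * lx * ly * 3 * 8
    print(f'{lx}x{ly} : {size_bytes / 1024**3:.1f} GiB')

# Expected output (approximately):
#   128x128 : 74.2 GiB
#   192x160 : 139.1 GiB
```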
%% Cell type:code id: tags:
```python
# ---- Parameters you can change -----------------------------------

# ---- Tests
scale      = 0.1
image_size = (128,128)
output_dir = './data'

# ---- Full clusters generation, medium size
# scale      = 1.
# image_size = (128,128)
# output_dir = f'{datasets_dir}/celeba/enhanced'

# ---- Full clusters generation, large size
# scale      = 1.
# image_size = (192,160)
# output_dir = f'{datasets_dir}/celeba/enhanced'
```
%% Cell type:code id: tags:
```python
# ---- Used for continuous integration - just ignore these 3 lines
#
scale      = pwk.override('scale',      scale)
image_size = pwk.override('image_size', image_size)
output_dir = pwk.override('output_dir', output_dir)

# ---- Build clusters
#
cluster_size = 10000

lx, ly      = image_size
cluster_dir = f'{output_dir}/clusters-{lx}x{ly}'

cluster_nb, duration = read_and_save( dataset_img, dataset_desc,
                                      scale        = scale,
                                      cluster_size = cluster_size,
                                      cluster_dir  = cluster_dir,
                                      image_size   = image_size )

# ---- Conclusion...

directory = pathlib.Path(cluster_dir)
s = sum(f.stat().st_size for f in directory.glob('**/*') if f.is_file())

pwk.subtitle('Conclusion :')
print('Duration : ', pwk.hdelay(duration))
print('Size : ', pwk.hsize(s))
```
%% Output

<br>**Parameters :**

- Scale is : 0.1
+ Scale is : 1.0
  Image size is : (128, 128)
- dataset length is : 20259
- cluster size is : 1000
+ dataset length is : 202599
+ cluster size is : 10000
  clusters nb is : 21
- cluster dir is : ./data/clusters-128x128
+ cluster dir is : /gpfswork/rech/mlh/uja62cb/datasets/celeba/enhanced/clusters-128x128

<br>**Running...**

- Cluster 000 : [########################################] 100.0% of 1000
- Cluster 001 : [########################################] 100.0% of 1000
- Cluster 002 : [########################################] 100.0% of 1000
- Cluster 003 : [########################################] 100.0% of 1000
- Cluster 004 : [########################################] 100.0% of 1000
- Cluster 005 : [########################################] 100.0% of 1000
- Cluster 006 : [########################################] 100.0% of 1000
- Cluster 007 : [########################################] 100.0% of 1000
- Cluster 008 : [########################################] 100.0% of 1000
- Cluster 009 : [########################################] 100.0% of 1000
- Cluster 010 : [########################################] 100.0% of 1000
- Cluster 011 : [########################################] 100.0% of 1000
- Cluster 012 : [########################################] 100.0% of 1000
- Cluster 013 : [########################################] 100.0% of 1000
- Cluster 014 : [########################################] 100.0% of 1000
- Cluster 015 : [########################################] 100.0% of 1000
- Cluster 016 : [########################################] 100.0% of 1000
- Cluster 017 : [########################################] 100.0% of 1000
- Cluster 018 : [########################################] 100.0% of 1000
- Cluster 019 : [########################################] 100.0% of 1000
- Cluster 020 : [##########------------------------------] 25.0% of 1000
+ Cluster 000 : [########################################] 100.0% of 10000
+ Cluster 001 : [########################################] 100.0% of 10000
+ Cluster 002 : [########################################] 100.0% of 10000
+ Cluster 003 : [########################################] 100.0% of 10000
+ Cluster 004 : [########################################] 100.0% of 10000
+ Cluster 005 : [########################################] 100.0% of 10000
+ Cluster 006 : [########################################] 100.0% of 10000
+ Cluster 007 : [########################################] 100.0% of 10000
+ Cluster 008 : [########################################] 100.0% of 10000
+ Cluster 009 : [########################################] 100.0% of 10000
+ Cluster 010 : [########################################] 100.0% of 10000
+ Cluster 011 : [########################################] 100.0% of 10000
+ Cluster 012 : [########################################] 100.0% of 10000
+ Cluster 013 : [########################################] 100.0% of 10000
+ Cluster 014 : [########################################] 100.0% of 10000
+ Cluster 015 : [########################################] 100.0% of 10000
+ Cluster 016 : [########################################] 100.0% of 10000
+ Cluster 017 : [########################################] 100.0% of 10000
+ Cluster 018 : [########################################] 100.0% of 10000
+ Cluster 019 : [########################################] 100.0% of 10000
+ Cluster 020 : [##########------------------------------] 25.0% of 10000

<br>**Conclusion :**

- Duration : 0:01:59
- Size : 7.4 Go
+ Duration : 0:51:04
+ Size : 74.2 Go
%% Cell type:code id: tags:
```python
pwk.end()
```
%% Output

- End time is : Saturday 2 January 2021, 17:06:59
- Duration is : 00:02:00 412ms
+ End time is : Monday 4 January 2021, 22:08:57
+ Duration is : 00:51:06 647ms
  This notebook ends here
%% Cell type:markdown id: tags:
---
<img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>
VAE/08.1-VAE-with-CelebA.ipynb  (+59 −182)  View file @ e968b335  (diff collapsed, not shown)
fidle/log/finished.json  (+8 −8)  View file @ e968b335

...
@@ -126,10 +126,10 @@
         "duration": "00:00:10 061ms"
     },
     "VAE6": {
-        "path": "/home/pjluc/dev/fidle/VAE",
-        "start": "Saturday 2 January 2021, 17:04:58",
-        "end": "Saturday 2 January 2021, 17:06:59",
-        "duration": "00:02:00 412ms"
+        "path": "/gpfsdswork/projects/rech/mlh/uja62cb/fidle/VAE",
+        "start": "Monday 4 January 2021, 21:17:51",
+        "end": "Monday 4 January 2021, 22:08:57",
+        "duration": "00:51:06 647ms"
     },
     "GTS1": {
         "path": "/home/pjluc/dev/fidle/GTSRB",

...
@@ -150,9 +150,9 @@
         "duration": "00:00:08 736ms"
     },
     "VAE8": {
-        "path": "/home/pjluc/dev/fidle/VAE",
-        "start": "Monday 4 January 2021, 18:43:15",
-        "end": "Monday 4 January 2021, 18:56:01",
-        "duration": "00:12:46 153ms"
+        "path": "/gpfsdswork/projects/rech/mlh/uja62cb/fidle/VAE",
+        "start": "Monday 4 January 2021, 22:27:20",
+        "end": "",
+        "duration": "Unfinished..."
     }
 }
\ No newline at end of file