Guillaume Gautier / Fidle / Commits / c322a913

Commit c322a913, authored 4 years ago by Jean-Luc Parouty

    VAE Test

Parent: e968b335
No related branches, tags, or merge requests found.

Showing 2 changed files, with 40 additions and 40 deletions:
- VAE/06-Prepare-CelebA-datasets.ipynb (+34, −34)
- fidle/log/finished.json (+6, −6)
VAE/06-Prepare-CelebA-datasets.ipynb  (+34 −34)
@@ -114,10 +114,10 @@
     "text": [
      "Version            : 0.6.1 DEV\n",
      "Notebook id        : VAE6\n",
-     "Run time           : Monday 4 January 2021, 21:17:51\n",
-     "TensorFlow version : 2.4.0\n",
-     "Keras version      : 2.4.0\n",
-     "Datasets dir       : /gpfswork/rech/mlh/uja62cb/datasets\n",
+     "Run time           : Monday 4 January 2021, 23:45:46\n",
+     "TensorFlow version : 2.2.0\n",
+     "Keras version      : 2.3.0-tf\n",
+     "Datasets dir       : /home/pjluc/datasets/fidle\n",
      "Run dir            : ./run\n",
      "CI running mode    : none\n",
      "Update keras cache : False\n",
@@ -301,7 +301,7 @@
     "# ---- Parameters you can change -----------------------------------\n",
     "\n",
     "# ---- Tests\n",
-    "scale        = 0.1\n",
+    "scale        = 0.2\n",
     "image_size   = (128,128)\n",
     "output_dir   = './data'\n",
     "\n",
@@ -337,12 +337,12 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "Scale is          : 1.0\n",
+     "Scale is          : 0.2\n",
      "Image size is     : (128, 128)\n",
-     "dataset length is : 202599\n",
-     "cluster size is   : 10000\n",
+     "dataset length is : 40519\n",
+     "cluster size is   : 2000\n",
      "clusters nb is    : 21\n",
-     "cluster dir is    : /gpfswork/rech/mlh/uja62cb/datasets/celeba/enhanced/clusters-128x128\n"
+     "cluster dir is    : ./data/clusters-128x128\n"
     ]
    },
    {
@@ -361,27 +361,27 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "Cluster 000 : [########################################] 100.0% of 10000\n",
-     "Cluster 001 : [########################################] 100.0% of 10000\n",
-     "Cluster 002 : [########################################] 100.0% of 10000\n",
-     "Cluster 003 : [########################################] 100.0% of 10000\n",
-     "Cluster 004 : [########################################] 100.0% of 10000\n",
-     "Cluster 005 : [########################################] 100.0% of 10000\n",
-     "Cluster 006 : [########################################] 100.0% of 10000\n",
-     "Cluster 007 : [########################################] 100.0% of 10000\n",
-     "Cluster 008 : [########################################] 100.0% of 10000\n",
-     "Cluster 009 : [########################################] 100.0% of 10000\n",
-     "Cluster 010 : [########################################] 100.0% of 10000\n",
-     "Cluster 011 : [########################################] 100.0% of 10000\n",
-     "Cluster 012 : [########################################] 100.0% of 10000\n",
-     "Cluster 013 : [########################################] 100.0% of 10000\n",
-     "Cluster 014 : [########################################] 100.0% of 10000\n",
-     "Cluster 015 : [########################################] 100.0% of 10000\n",
-     "Cluster 016 : [########################################] 100.0% of 10000\n",
-     "Cluster 017 : [########################################] 100.0% of 10000\n",
-     "Cluster 018 : [########################################] 100.0% of 10000\n",
-     "Cluster 019 : [########################################] 100.0% of 10000\n",
-     "Cluster 020 : [##########------------------------------] 25.0% of 10000\r"
+     "Cluster 000 : [########################################] 100.0% of 2000\n",
+     "Cluster 001 : [########################################] 100.0% of 2000\n",
+     "Cluster 002 : [########################################] 100.0% of 2000\n",
+     "Cluster 003 : [########################################] 100.0% of 2000\n",
+     "Cluster 004 : [########################################] 100.0% of 2000\n",
+     "Cluster 005 : [########################################] 100.0% of 2000\n",
+     "Cluster 006 : [########################################] 100.0% of 2000\n",
+     "Cluster 007 : [########################################] 100.0% of 2000\n",
+     "Cluster 008 : [########################################] 100.0% of 2000\n",
+     "Cluster 009 : [########################################] 100.0% of 2000\n",
+     "Cluster 010 : [########################################] 100.0% of 2000\n",
+     "Cluster 011 : [########################################] 100.0% of 2000\n",
+     "Cluster 012 : [########################################] 100.0% of 2000\n",
+     "Cluster 013 : [########################################] 100.0% of 2000\n",
+     "Cluster 014 : [########################################] 100.0% of 2000\n",
+     "Cluster 015 : [########################################] 100.0% of 2000\n",
+     "Cluster 016 : [########################################] 100.0% of 2000\n",
+     "Cluster 017 : [########################################] 100.0% of 2000\n",
+     "Cluster 018 : [########################################] 100.0% of 2000\n",
+     "Cluster 019 : [########################################] 100.0% of 2000\n",
+     "Cluster 020 : [##########------------------------------] 25.0% of 2000\r"
     ]
    },
    {
@@ -400,8 +400,8 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "Duration : 0:51:04\n",
-     "Size     : 74.2 Go\n"
+     "Duration : 0:05:06\n",
+     "Size     : 14.8 Go\n"
     ]
    }
   ],
@@ -444,8 +444,8 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "End time is : Monday 4 January 2021, 22:08:57\n",
-     "Duration is : 00:51:06 647ms\n",
+     "End time is : Monday 4 January 2021, 23:50:54\n",
+     "Duration is : 00:05:08 166ms\n",
      "This notebook ends here\n"
     ]
    }
%% Cell type:markdown id: tags:

<img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>

# <!-- TITLE --> [VAE6] - Preparation of the CelebA dataset

<!-- DESC --> Preparation of a clustered dataset, batchable
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->

## Objectives :
- Formatting our dataset as **cluster files**, using batch mode
- Adapting a notebook for batch use

The [CelebFaces Attributes Dataset (CelebA)](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) contains about 200,000 images (202599,218,178,3).

## What we're going to do :
- Read the images
- Resize and normalize them
- Build clusters of images in npy format
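%% Cell type:markdown id: tags:

The core read / resize / normalize step can be previewed in isolation. This is a minimal sketch, using a random synthetic array in place of a real CelebA file (real images are 218x178x3 uint8); it relies on `skimage.transform.resize` both rescaling and converting to floats in [0,1]:

%% Cell type:code id: tags:

```python
import numpy as np
from skimage import transform

# Synthetic stand-in for one CelebA image (218x178, 3 channels, uint8)
img = np.random.randint(0, 256, size=(218, 178, 3), dtype=np.uint8)

# resize() rescales to the target size AND converts to float in [0,1],
# which is the normalization the cooking function below relies on
img = transform.resize(img, (128, 128))

print(img.shape, img.min() >= 0, img.max() <= 1)
```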
%% Cell type:markdown id: tags:

## Step 1 - Import and init
### 1.1 - Import
%% Cell type:code id: tags:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from skimage import io, transform

import os, pathlib, time, sys, json, glob
import csv
import math, random

from importlib import reload

sys.path.append('..')
import fidle.pwk as pwk

datasets_dir = pwk.init('VAE6')
```
%% Output

**FIDLE 2020 - Practical Work Module**

Version            : 0.6.1 DEV
Notebook id        : VAE6
Run time           : Monday 4 January 2021, 23:45:46
TensorFlow version : 2.2.0
Keras version      : 2.3.0-tf
Datasets dir       : /home/pjluc/datasets/fidle
Run dir            : ./run
CI running mode    : none
Update keras cache : False
Save figs          : True
Path figs          : ./run/figs
%% Cell type:markdown id: tags:

### 1.2 - Directories and files :

%% Cell type:code id: tags:

```python
dataset_csv = f'{datasets_dir}/celeba/origine/list_attr_celeba.csv'
dataset_img = f'{datasets_dir}/celeba/origine/img_align_celeba'
```
%% Cell type:markdown id: tags:

## Step 2 - Read and shuffle filenames catalog

%% Cell type:code id: tags:

```python
dataset_desc = pd.read_csv(dataset_csv, header=0)
dataset_desc = dataset_desc.reindex(np.random.permutation(dataset_desc.index))
```
%% Cell type:markdown id: tags:

## Step 3 - Save as clusters of n images

%% Cell type:markdown id: tags:

### 3.1 - Cooking function
%% Cell type:code id: tags:

```python
def read_and_save(dataset_img, dataset_desc,
                  scale=1,
                  cluster_size=1000, cluster_dir='./dataset_cluster', cluster_name='images',
                  image_size=(128,128)):
    global pwk

    def save_cluster(imgs, desc, cols, id):
        file_img  = f'{cluster_dir}/{cluster_name}-{id:03d}.npy'
        file_desc = f'{cluster_dir}/{cluster_name}-{id:03d}.csv'
        np.save(file_img, np.array(imgs))
        df = pd.DataFrame(data=desc, columns=cols)
        df.to_csv(file_desc, index=False)
        return [], [], id + 1

    pwk.chrono_start()

    cols = list(dataset_desc.columns)

    # ---- Check if cluster files exist
    #
    if os.path.isfile(f'{cluster_dir}/images-000.npy'):
        print('\n*** Oups. There are already clusters in the target folder!\n')
        return 0, 0
    pwk.mkdir(cluster_dir)

    # ---- Scale
    #
    n = int(len(dataset_desc) * scale)
    dataset = dataset_desc[:n]
    cluster_size = int(cluster_size * scale)
    pwk.subtitle('Parameters :')
    print(f'Scale is          : {scale}')
    print(f'Image size is     : {image_size}')
    print(f'dataset length is : {n}')
    print(f'cluster size is   : {cluster_size}')
    print(f'clusters nb is    :', int(n / cluster_size + 1))
    print(f'cluster dir is    : {cluster_dir}')

    # ---- Read and save clusters
    #
    pwk.subtitle('Running...')
    imgs, desc, cluster_id = [], [], 0
    #
    for i, row in dataset.iterrows():
        #
        filename = f'{dataset_img}/{row.image_id}'
        #
        # ---- Read image, resize (and normalize)
        #
        img = io.imread(filename)
        img = transform.resize(img, image_size)
        #
        # ---- Add image and description
        #
        imgs.append(img)
        desc.append(row.values)
        #
        # ---- Progress bar
        #
        pwk.update_progress(f'Cluster {cluster_id:03d} :', len(imgs), cluster_size)
        #
        # ---- Save cluster if full
        #
        if len(imgs) == cluster_size:
            imgs, desc, cluster_id = save_cluster(imgs, desc, cols, cluster_id)

    # ---- Save incomplete cluster
    if len(imgs) > 0:
        imgs, desc, cluster_id = save_cluster(imgs, desc, cols, cluster_id)

    duration = pwk.chrono_stop()
    return cluster_id, duration
```
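%% Cell type:markdown id: tags:

The function above only writes clusters; reading one back is symmetrical. Here is a minimal sketch (`load_cluster` is a hypothetical helper, not part of `fidle.pwk`), demonstrated on a tiny synthetic cluster:

%% Cell type:code id: tags:

```python
import tempfile
import numpy as np
import pandas as pd

# Hypothetical reader for the files produced by save_cluster():
# an images-nnn.npy array plus its images-nnn.csv description.
def load_cluster(cluster_dir, cluster_name='images', id=0):
    imgs = np.load(f'{cluster_dir}/{cluster_name}-{id:03d}.npy')
    desc = pd.read_csv(f'{cluster_dir}/{cluster_name}-{id:03d}.csv')
    return imgs, desc

# Demo on a tiny synthetic cluster (4 images of 8x8x3)
tmp = tempfile.mkdtemp()
np.save(f'{tmp}/images-000.npy', np.zeros((4, 8, 8, 3)))
pd.DataFrame({'image_id': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']}).to_csv(
    f'{tmp}/images-000.csv', index=False)

imgs, desc = load_cluster(tmp)
print(imgs.shape, list(desc.columns))   # → (4, 8, 8, 3) ['image_id']
```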
%% Cell type:markdown id: tags:

### 3.2 - Cluster building

The whole dataset will be used for training.
Reading the 200,000 images can take a long time **(>20 minutes)** and a lot of disk space **(>170 GB)**.

Examples :
Image size 128x128 : 74 GB
Image size 192x160 : 138 GB

You can define these parameters :
`scale` : 1 means 100% of the dataset - set 0.05 for tests
`image_size` : image size in the clusters, should be 128x128 or 192,160 (original is 218,178)
`output_dir` : where to write the clusters, could be :
- `./data`, for test purposes
- `<datasets_dir>/celeba/enhanced` to add the clusters to your datasets dir.

`cluster_size` : number of images per cluster, 10000 is fine (will be adjusted by scale).

**Note :** If the target folder is not empty, the construction is blocked.
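%% Cell type:markdown id: tags:

The arithmetic behind these warnings can be checked directly. A quick sketch, assuming `resize()` stores float64 pixels (8 bytes per channel) in the .npy clusters, which reproduces the dataset length, cluster count and approximate size for the test parameters:

%% Cell type:code id: tags:

```python
# Back-of-envelope check of the scale / cluster_size arithmetic.
n_images     = 202599        # CelebA catalog size
scale        = 0.2           # test value
image_size   = (128, 128)
cluster_size = int(10000 * scale)

n          = int(n_images * scale)
n_clusters = int(n / cluster_size + 1)
size_gb    = n * image_size[0] * image_size[1] * 3 * 8 / 2**30

print(f'{n} images, {n_clusters} clusters, ~{size_gb:.1f} GB')
```

At scale=1 the same arithmetic gives ~74 GB for 128x128 images, consistent with the sizes quoted above.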
%% Cell type:code id: tags:

```python
# ---- Parameters you can change -----------------------------------

# ---- Tests
scale        = 0.2
image_size   = (128,128)
output_dir   = './data'

# ---- Full clusters generation, medium size
# scale        = 1.
# image_size   = (128,128)
# output_dir   = f'{datasets_dir}/celeba/enhanced'

# ---- Full clusters generation, large size
# scale        = 1.
# image_size   = (192,160)
# output_dir   = f'{datasets_dir}/celeba/enhanced'
```
%% Cell type:code id: tags:

```python
# ---- Used for continuous integration - Just forget these 3 lines
#
scale      = pwk.override('scale',      scale)
image_size = pwk.override('image_size', image_size)
output_dir = pwk.override('output_dir', output_dir)

# ---- Build clusters
#
cluster_size = 10000

lx, ly      = image_size
cluster_dir = f'{output_dir}/clusters-{lx}x{ly}'

cluster_nb, duration = read_and_save(dataset_img, dataset_desc,
                                     scale=scale,
                                     cluster_size=cluster_size,
                                     cluster_dir=cluster_dir,
                                     image_size=image_size)

# ---- Conclusion...

directory = pathlib.Path(cluster_dir)
s = sum(f.stat().st_size for f in directory.glob('**/*') if f.is_file())

pwk.subtitle('Conclusion :')
print('Duration : ', pwk.hdelay(duration))
print('Size     : ', pwk.hsize(s))
```
%% Output

<br>**Parameters :**

Scale is          : 0.2
Image size is     : (128, 128)
dataset length is : 40519
cluster size is   : 2000
clusters nb is    : 21
cluster dir is    : ./data/clusters-128x128

<br>**Running...**

Cluster 000 : [########################################] 100.0% of 2000
Cluster 001 : [########################################] 100.0% of 2000
Cluster 002 : [########################################] 100.0% of 2000
Cluster 003 : [########################################] 100.0% of 2000
Cluster 004 : [########################################] 100.0% of 2000
Cluster 005 : [########################################] 100.0% of 2000
Cluster 006 : [########################################] 100.0% of 2000
Cluster 007 : [########################################] 100.0% of 2000
Cluster 008 : [########################################] 100.0% of 2000
Cluster 009 : [########################################] 100.0% of 2000
Cluster 010 : [########################################] 100.0% of 2000
Cluster 011 : [########################################] 100.0% of 2000
Cluster 012 : [########################################] 100.0% of 2000
Cluster 013 : [########################################] 100.0% of 2000
Cluster 014 : [########################################] 100.0% of 2000
Cluster 015 : [########################################] 100.0% of 2000
Cluster 016 : [########################################] 100.0% of 2000
Cluster 017 : [########################################] 100.0% of 2000
Cluster 018 : [########################################] 100.0% of 2000
Cluster 019 : [########################################] 100.0% of 2000
Cluster 020 : [##########------------------------------] 25.0% of 2000

<br>**Conclusion :**

Duration : 0:05:06
Size     : 14.8 Go
%% Cell type:code id: tags:

```python
pwk.end()
```

%% Output

End time is : Monday 4 January 2021, 23:50:54
Duration is : 00:05:08 166ms
This notebook ends here
%% Cell type:markdown id: tags:

---
<img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>
fidle/log/finished.json  (+6 −6)
@@ -126,10 +126,10 @@
     "duration": "00:00:10 061ms"
   },
   "VAE6": {
-    "path": "/gpfsdswork/projects/rech/mlh/uja62cb/fidle/VAE",
-    "start": "Monday 4 January 2021, 21:17:51",
-    "end": "Monday 4 January 2021, 22:08:57",
-    "duration": "00:51:06 647ms"
+    "path": "/home/pjluc/dev/fidle/VAE",
+    "start": "Monday 4 January 2021, 23:45:46",
+    "end": "Monday 4 January 2021, 23:50:54",
+    "duration": "00:05:08 166ms"
   },
   "GTS1": {
     "path": "/home/pjluc/dev/fidle/GTSRB",
@@ -150,8 +150,8 @@
     "duration": "00:00:08 736ms"
   },
   "VAE8": {
-    "path": "/gpfsdswork/projects/rech/mlh/uja62cb/fidle/VAE",
-    "start": "Monday 4 January 2021, 22:27:20",
+    "path": "/home/pjluc/dev/fidle/VAE",
+    "start": "Monday 4 January 2021, 23:53:16",
     "end": "",
     "duration": "Unfinished..."
   }