Commit 70a06642 authored by Didier Voisin

Merge branch 'micka' into 'master'

TP3 windrose

See merge request !3
parents b3950175 3b6c17fc
%% Cell type:markdown id: tags:
# Introduction to data analysis - part 3: weather and pollutants
## 1. Today's objectives
- and what about mixing wind and pollutants?
- windroses - pollutant roses...
most probably, time will run out before we get to the bottom of such a list...
%% Cell type:code id: tags:
``` python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# import seaborn as sns
```
%% Cell type:markdown id: tags:
### 2. To work!
Don't forget the [advanced Cheat-Sheet](../aide/Enthought-Python-Pandas-Cheat-Sheets-1-8-v1.0.2.pdf)...
First, we load the data we just pre-processed, either from the file you created when you did the pre-treatment, or from `../data/data_airrhonalpes_part2.csv`, which was pulled from the repository when you initiated the sequence.
- **complete the following cell to load the file properly into `dfc`... options are the key!**
- **if you did things right, columns should be numeric data, and the index should be a `DatetimeIndex`, not an index of `dtype='object'`**
- **you will then select a subset of the data into `mon_df`, on which you'll work afterwards (ex: one pollutant in all the stations, or all the pollutants in one station, or whichever choice suits you...)**
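As a reminder of which options matter, here is a minimal sketch on a small in-memory CSV. The separator, the date format and the column names `StationA` / `StationB` are assumptions made for the illustration; the real file may need different options (for instance a multi-row header).
%% Cell type:code id: tags:
```python
import io
import pandas as pd

# Hypothetical extract standing in for the real CSV file
csv_text = """date;StationA;StationB
01/02/2015 00:00;12.5;8.1
01/02/2015 01:00;11.0;7.9
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",           # assumed separator
    index_col=0,       # the first column becomes the index
    parse_dates=True,  # parse the index as dates
    dayfirst=True,     # 01/02/2015 is 1 February, not 2 January
)
print(df.index.dtype)  # should be a datetime dtype, not object
print(df.dtypes)       # columns should be numeric
```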
......@@ -45,31 +43,31 @@
mon_df = ???
```
%% Cell type:markdown id: tags:
## 3. Combination of weather and air quality
We have data from weather stations located next to some of the air quality stations. These data come from quite diverse sources, so the import step is rather painful, and I provide it in full.
![](../aide/carte-grenoble.png)
For those with no prior experience of importing data in pandas, it provides a variety of examples.
### 3.1. Get weather data
#### 3.1.1. The roof of IGE
We have a quite complete weather station on our roof in Saint-Martin-d'Hères.
%% Cell type:code id: tags:
``` python
# load the CERMO data (~ SMH station)
mto_CERMO_name = "../data/MTO-CERMO_10min_2008-2016.csv"
# dayfirst=True: dates in the file are written day/month/year
mto_CERMO = pd.read_csv(mto_CERMO_name, sep=';', index_col=0, low_memory=False, parse_dates=True, dayfirst=True)
print('data loaded')
def convert(x):
    try:
        y = float(x)
    except ValueError:
......@@ -87,43 +85,55 @@
%% Cell type:markdown id: tags:
This needs to be resampled on an hourly basis...
**do it!**
%% Cell type:code id: tags:
``` python
## your code here (use .resample(...) )
```
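%% Cell type:markdown id: tags:
If `.resample(...)` is new to you, the sketch below shows the idea on synthetic 10-minute data. The column names `temp` and `rain` are made up for the example, and the choice of aggregation (mean for a temperature, sum for a rain amount) is an assumption that you should adapt to each variable of the real dataset.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Two hours of synthetic 10-minute data (hypothetical column names)
idx = pd.date_range("2015-01-01", periods=12, freq="10min")
df10 = pd.DataFrame({"temp": np.arange(12.0), "rain": np.ones(12)}, index=idx)

# Hourly resampling: average the temperature, accumulate the rain
hourly = df10.resample("1h").agg({"temp": "mean", "rain": "sum"})
print(hourly)
```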
%% Cell type:code id: tags:
``` python
# Check the air temperature of one year to make sure everything is ok!
```
%% Cell type:markdown id: tags:
#### 3.1.2. Some data from AirRhoneAlpes
These are for a big roadside site, south of Grenoble. This is the site called Rocade in the air quality dataset, and it is called Rondeau here; the file also has data from another station further south (Pont-de-Claix), not far from the one called Vif in the air quality dataset.
These data are in an Excel file!
Sure, you could open it in Excel, strip it of any useless decorations, save it as CSV...
Or you can open it directly from pandas!
If you get an error when reading the file, try adding the option `engine='openpyxl'` (https://stackoverflow.com/questions/65254535/xlrd-biffh-xlrderror-excel-xlsx-file-not-supported)
%% Cell type:code id: tags:
``` python
# load the Air Rhône-Alpes data: Rocade and Pont-de-Claix (~ Vif station?)
mto_airAURA_name = "../data/3592_Donnees_meteo_Grenoble.xlsx"
mto_airAURA = pd.read_excel(mto_airAURA_name,
                            sheet_name=0,
                            header=[23,24],
                            index_col=0,
                            engine='openpyxl')
# parse the index explicitly, with the day written first
mto_airAURA.index = pd.to_datetime(mto_airAURA.index, dayfirst=True)
mto_airAURA = mto_airAURA.iloc[:, :6]  # keep the first 6 columns, in case there are too many
mto_airAURA.info()  # info() prints its report directly
mto_airAURA.head()
```
%% Cell type:markdown id: tags:
......@@ -139,13 +149,20 @@
#notice that we have to specify level, as we have a multilevel index here
mto_airAURA.head()
```
%% Cell type:code id: tags:
``` python
# Check the air temperature of one year to make sure everything is ok!
```
%% Cell type:markdown id: tags:
#### 3.1.3. Météo France
Last... data from Météo France, for one site north of Grenoble (Le Versoud for weather, Crolles for air quality) and another west of Grenoble (called StGeoirs for weather, and Bonnevaux for air quality).
The [description of the dataset is here](../data/data-dda-description.pdf).
......@@ -178,27 +195,28 @@
mto_StGeoirs.head()
```
%% Cell type:markdown id: tags:
### 3.2. Windrose
A windrose is nothing but a histogram of wind directions and speeds, in polar coordinates.
There is a small Python library that handles it in a reasonable manner: [windrose](https://github.com/python-windrose/windrose). I believe that matplotlib has now incorporated much of that library (I have students who plot windroses directly from matplotlib), but it is above my skill level...
As our data are a bit messy, with inconsistent column names, we will define variables to handle all this simply.
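To make the "histogram in polar coordinates" idea concrete, here is a minimal sketch of the table that sits behind a windrose, computed with plain NumPy on synthetic data. The 16 sectors and the speed classes are arbitrary choices for the illustration, and the sectors start at 0° rather than being centred on north, which a real windrose library may handle differently.
%% Cell type:code id: tags:
```python
import numpy as np

rng = np.random.default_rng(0)
wd = rng.uniform(0, 360, 1000)   # synthetic wind directions (degrees)
ws = rng.gamma(2.0, 1.5, 1000)   # synthetic wind speeds (m/s)

dir_edges = np.arange(0, 361, 22.5)            # 16 direction sectors
speed_edges = np.array([0, 1, 2, 4, 8, 50])    # 5 speed classes (m/s)

# 2D histogram: one row per direction sector, one column per speed class
counts, _, _ = np.histogram2d(wd, ws, bins=[dir_edges, speed_edges])
freq = 100 * counts / counts.sum()             # frequencies in percent

# total frequency per sector = length of each "petal" of the rose
print(freq.sum(axis=1).round(1))
```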
%% Cell type:code id: tags:
``` python
# data selection
#mto = mto_CERMO; wdir = "Dir Vent (Deg)" ; wsp = "V Vent (m/s)"; place = "CERMO"
#mto = mto_airAURA['Le Rondeau']; wdir = "Direction du Vent (Degrés)" ; wsp = "Vitesse du Vent (m/s)"; place = "Rondeau"
#mto = mto_airAURA['Pont de Claix']; wdir = "Direction du Vent (Degrés)" ; wsp = "Vitesse du Vent (m/s)"; place = "P. Claix"
mto = mto_airAURA['Rocade']; place = 'Rocade'
wdir = 'Direction du Vent (Degrés)'; wsp = 'Vitesse du Vent (m/s)'
```
%% Cell type:markdown id: tags:
here is a function that plots "nice" windroses
......@@ -208,18 +226,20 @@
**then plot a few windroses from different locations to play with it**
%% Cell type:code id: tags:
``` python
def jolierosedesvents(wdir, wsp, lieu, legend_xy='best', save=None):
    # build the wind rose
    from windrose import WindroseAxes
    mini, maxi = wsp.min(), wsp.max()
    ax = WindroseAxes.from_ax()
    ax.bar(wdir, wsp, normed=True, opening=0.8, edgecolor='white',
           # logarithmic bins, to better see extreme winds
           bins=np.logspace(np.floor(np.log10(0.1)), np.floor(np.log10(maxi)), 10)
           )
    # set the radial ticks of the wind rose
    import matplotlib.ticker as tkr
    ax.set_yticks(np.arange(0,25,5))
    for t in ax.get_yticklabels():
......@@ -227,38 +247,44 @@
    ax.yaxis.set_major_formatter(tkr.FormatStrFormatter('%2.0f'))
    # decorate the wind rose
    ax.set_legend()
    ax.legend(title="vitesse (m/s)", loc=legend_xy)
    plt.title('rose des vents '+lieu, fontsize=30)
    if save is not None: plt.savefig(save+'.jpg', dpi=300)

jolierosedesvents(mto[wdir], mto[wsp], place,
                  # save='rosewind_Rocade'
                  )
```
%% Cell type:markdown id: tags:
### 3.3. Pollutant rose
Basically, a pollutant rose is nothing but a wind rose where the wind speed is replaced by a pollutant concentration.
One difference is that in a wind rose, by convention, the direction is the wind direction, i.e. the direction the wind **comes from** (weather convention).
For pollutant roses, however, both conventions are legitimate: one can use the direction the wind comes from, and hence the pollutants (in this case, the site is considered a receptor). But we can also consider the site as a source, in which case we want to know where the pollution we emit goes: the direction is then the one the wind **blows to**!
**It's up to you: draw a few pollutant roses, and see whether they fit with the previous wind rose and with your idea of the sources.**
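Switching from one convention to the other is just a half-turn of the direction. The sketch below shows the arithmetic on a few made-up directions; as far as I understand, this is the flip that the `blowto` argument of the windrose library applies for you, but treat that as an assumption and check on a simple case.
%% Cell type:code id: tags:
```python
import numpy as np

# Directions the wind comes FROM (weather convention), in degrees
wind_from = np.array([0.0, 90.0, 225.0, 350.0])

# Directions the wind blows TO: add a half-turn and wrap back into [0, 360)
wind_to = (wind_from + 180.0) % 360.0
print(wind_to)
```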
%% Cell type:code id: tags:
``` python
def jolierosedespolluants(wdir, polval, lieu, source=True, legend_xy='best', save=None):
    # build the pollutant rose
    from windrose import WindroseAxes
    mini, maxi = polval.min(), polval.max()
    ax = WindroseAxes.from_ax()
    # source is passed as blowto: when True, the site is treated as a source
    ax.bar(wdir, polval, blowto=source, normed=True, opening=0.8, edgecolor='white',
           bins=np.logspace(np.floor(np.log10(0.1)), np.floor(np.log10(maxi)), 10)
           )
    # set the radial ticks of the rose
    import matplotlib.ticker as tkr
    ax.set_yticks(np.arange(0,25,5))
    for t in ax.get_yticklabels():
......@@ -266,38 +292,111 @@
    ax.yaxis.set_major_formatter(tkr.FormatStrFormatter('%2.0f'))
    # decorate the rose
    ax.set_legend()
    ax.legend(title="polluant (µg/m3)", loc=legend_xy)
    plt.title('rose des polluants '+lieu, fontsize=30)
    if save is not None: plt.savefig(save+'.jpg', dpi=300)
```
%% Cell type:markdown id: tags:
Be careful: the wind direction and pollutant data must match!
**mask the data to get vectors of the same size**
Choose the corresponding pollutant station and one pollutant.
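One possible way to get two vectors of the same size, sketched on synthetic series: keep only the timestamps present in both series, then drop the rows where either value is missing. This is an alternative to masking with `isin`; the names `wind_all` and `pol_all` are made up for the example.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Two synthetic hourly series that only partly overlap in time
idx_wind = pd.date_range("2015-01-01 00:00", periods=6, freq="h")
idx_pol = pd.date_range("2015-01-01 02:00", periods=6, freq="h")
wind_all = pd.Series([10.0, 20.0, np.nan, 40.0, 50.0, 60.0], index=idx_wind)
pol_all = pd.Series(np.arange(6.0), index=idx_pol)

# join="inner" keeps the common timestamps, dropna() removes missing values
both = pd.concat({"wind": wind_all, "pollutant": pol_all}, axis=1, join="inner").dropna()
wind_ok, pol_ok = both["wind"], both["pollutant"]
print(len(wind_ok), len(pol_ok))
```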
%% Cell type:code id: tags:
``` python
mon_df = dfc[???][???]
pollutant = mon_df[mon_df.index.isin(???)]
wind = mto[wdir][mto[wdir].index.isin(???)]
# both lengths must be identical before plotting
print(len(pollutant)); print(len(wind))
```
%% Cell type:code id: tags:
``` python
# Replace the title with your pollutant and station
jolierosedespolluants(wind, pollutant, 'Traffic NO2',
# save='traffic_NO2'
)
```
%% Cell type:markdown id: tags:
### 3.4. Share it on the whiteboard!
**As last time, choose a station and a pollutant that have not been chosen yet and add them to this map** (add a single pollutant rose or wind rose so that everyone can participate):
https://app.mural.co/t/variabiliteclimatique4363/m/variabiliteclimatique4363/1633075668320/080f4d79efeac3b94d0c9cf606bd6b7aa79dcfd6?sender=ufcbfba826e94d93c633c7410
%% Cell type:markdown id: tags:
### 3.5. Put it on a map with Python
It is also possible to add these wind roses directly on a Python map thanks to [Cartopy](https://scitools.org.uk/cartopy/docs/latest/) (but it seems not to work on UGA's JupyterHub) or [Contextily](https://contextily.readthedocs.io/en/latest/). Try to find the coordinates of a station and add a wind rose or a pollutant rose at the corresponding location (https://www.atmo-auvergnerhonealpes.fr/donnees/acces-par-station/15050)!
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
# axes_grid1 replaces the old mpl_toolkits.axes_grid module
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import contextily as cx
import windrose
minlon, maxlon, minlat, maxlat = (5.1, 6, 44.9, 45.55)
_, main_ax = plt.subplots(figsize=(12, 6))
main_ax.set(xlim=(minlon, maxlon), ylim=(minlat, maxlat))
cx.add_basemap(main_ax, zoom=12, crs=4326)
# Coordinates of the station where the wind was measured
lon, lat = (???, ???)
# Inset axes with a fixed size
wrax = inset_axes(main_ax,
                  width=1,                            # size in inches
                  height=1,                           # size in inches
                  loc='center',                       # center bbox at given position
                  bbox_to_anchor=(lon, lat),          # position of the axes
                  bbox_transform=main_ax.transData,   # use data coordinates (not axes coordinates)
                  axes_class=windrose.WindroseAxes,   # specify the class of the axes
                  )
wrax.bar(mto[wdir], mto[wsp])
wrax.tick_params(labelleft=False, labelbottom=False)
wrax.set_title('windrose Traffic')
# plt.savefig('map.jpg', dpi=300, bbox_inches='tight')
```
%% Cell type:markdown id: tags:
There are plenty of other questions one could ask with such a dataset:
- seasonality of wind roses?
- seasonality of pollutant roses?
- finer links between weather and pollution:
    - influence of rain on pollution levels? (not so obvious, as the following figure shows). You have to be a bit more cunning to see it (it is not the hourly rain that matters...)
    - influence of temperature on winter pollution? (the famous inversion layer)
    - influence of sunshine and temperature on summer ozone peaks?
    - links between $NO_X$ and $O_3$ (see Steph Houdier's course in S2)
...
Many of these questions can be tackled with the strategies discussed here: split-apply-combine... to produce the aggregated data we are interested in (e.g. the cumulative rainfall over an event), to be compared afterwards with another quantity (e.g. the PM10 level after a rain event, or better, the difference in PM10 level before and after a rain event).
E.g.: rain and particle levels.
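As an illustration of split-apply-combine on rain events, here is a minimal sketch on a synthetic hourly rain series. Defining an event as a run of consecutive wet hours is an assumption made for the example; a real analysis would probably tolerate short dry gaps within an event.
%% Cell type:code id: tags:
```python
import pandas as pd

# Synthetic hourly rain amounts (mm): two separate rain events
idx = pd.date_range("2015-01-01", periods=10, freq="h")
rain = pd.Series([0, 0, 1.0, 2.0, 0, 0, 0, 0.5, 0.5, 0], index=idx)

wet = rain > 0
# split: a new event starts whenever a wet hour follows a dry one
event_id = (wet & ~wet.shift(fill_value=False)).cumsum()
# apply + combine: cumulative rainfall of each event
totals = rain[wet].groupby(event_id[wet]).sum()
print(totals)
```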
%% Cell type:code id: tags:
``` python
dftout.plot(kind='scatter', y='NO', x='Pluie (mm)')
```
%% Cell type:markdown id: tags:
## 6. Conclusions
Normally, you should by now be somewhat convinced that a judicious use of large datasets makes it possible to:
......
......@@ -51,4 +51,10 @@ open it, and start working (ERCA_intro-data_p2_groupby-resample.ipynb) !
# session 3
Run `git pull` again, as for session 2.
You should now have all that is needed for session 3!
Open it, and start working (ERCA_intro-data_p3_windroses.ipynb)!
`aasqa_regression_Foteini.ipynb`