Commit ba48f403 authored by Loic Huder's avatar Loic Huder

Merge branch 'doc_improvement'


Former-commit-id: f4e97c5d
parents 00efad9e cbc59e07
......@@ -10,6 +10,7 @@
# GIT_DEBUG_LOOKUP: "1"
# GIT_TRANSLOOP_DEBUG: "1"
# GIT_TRANSPORT_HELPER_DEBUG: "1"
stages:
- build
- test
......@@ -29,6 +30,8 @@ stages:
test_with_coverage:
stage: test
variables:
GIT_SUBMODULE_STRATEGY: normal
script:
- python3 setup.py install --user
......@@ -49,12 +52,11 @@ build_docs:
stage: test
script:
- sphinx-apidoc -e -o doc -f pygeodyn/
- export PYTHONPATH=$PYTHONPATH:$(pwd) && cd doc && sphinx-build -b html . html && ls html && cd ..
- ./make_doc.sh
artifacts:
paths:
- doc/html
- docs/html
only:
- master
- web
......@@ -65,8 +67,8 @@ pages:
GIT_STRATEGY: none # Skip all git operations as this stage only works with artifacts
script:
- mkdir .public
- ls doc/html/
- cp -r doc/html/* .public
- ls docs/html/
- cp -r docs/html/* .public
- cp -r htmlcov .public
- mv .public public
artifacts:
......@@ -79,9 +81,11 @@ pages:
- master
- web
# CONTINUOUS RELEASE : hidden jobs for now (start with .), will not be launched
.test_validity_merge:
# CONTINUOUS RELEASE
test_validity_merge:
stage: test
variables:
GIT_SUBMODULE_STRATEGY: normal # Clone pygeodyn_data (default is none)
script:
- echo $CI_MERGE_REQUEST_TITLE # WILL ONLY WORK AT Gitlab 11.9
......@@ -96,7 +100,7 @@ pages:
variables:
- $CI_MERGE_REQUEST_TITLE =~ /^WIP:/ # WILL ONLY WORK AT Gitlab 11.9
.on_merge:
on_merge:
stage: deploy
variables:
GIT_STRATEGY: none
......@@ -112,4 +116,4 @@ pages:
refs:
- master
variables:
- $CI_COMMIT_MESSAGE =~ /^Merge/
- $CI_COMMIT_MESSAGE =~ /^Merge and release/i
[submodule "pygeodyn/data"]
path = pygeodyn/data
url = https://gricad-gitlab.univ-grenoble-alpes.fr/Geodynamo/pygeodyn_data.git
recursive-include pygeodyn *
recursive-exclude pygeodyn *.pyc
prune pygeodyn/data/prior
recursive-include pygeodyn/data/prior/CE_augkf *
include pygeodyn/data/prior/midpath/Real.hdf5
prune pygeodyn/tests/dummy_files
prune pygeodyn/tests/.hypothesis
include AUTHORS.txt
\ No newline at end of file
prune pygeodyn/data
prune pygeodyn/release
prune pygeodyn/tests/
\ No newline at end of file
......@@ -4,120 +4,28 @@
[![coverage report](https://gricad-gitlab.univ-grenoble-alpes.fr/Geodynamo/pygeodyn/badges/master/coverage.svg)](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/htmlcov/)
[![documentation](https://img.shields.io/website/https/geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/index.html.svg?label=documentation&up_color=%2300c5c5)](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/index.html)
pygeodyn is a Python package offering data assimilation algorithms applied to geomagnetic data in the spectral domain.
Data assimilation algorithms applied to geomagnetic data in the spectral domain.
Please see the [online documentation](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/index.html) for further information:
* [Installation](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/install.html)
* [Usage](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/usage_run_algo.html)
* [API reference](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.html)
From Olivier Barrois, Nicolas Gillet, Loïc Huder, Yannick Martin and Franck Thollard.
## Installation
### Requirements
The installation of pygeodyn requires Python 3 (tested with 3.5) with the following packages installed:
* _setuptools_ (tested against 40.4.3): to be able to run `setup.py`.
* _numpy_ (at least 1.7): to be able to wrap Fortran files with _f2py_.
The other dependencies will be automatically installed by the next step but are listed here for the sake of completeness:
* _numpy_ (at least 1.10)
* _scipy_ (at least 0.17)
* _h5py_
* _mpi4py_
* _sklearn_
* _matplotlib_
* _hypothesis_ (only for the tests)
### Install the pygeodyn package
Note: Two installation strategies are proposed: a classical installation and an installation in a virtual environment. The latter is preferred as it gives more control over the Python environment used to run the program. The rest of the README is independent of the chosen strategy.
#### Classical installation
If the requirements are satisfied, pygeodyn can be installed by running
```bash
python3 setup.py install --user
# or
python3 setup.py develop --user
```
> The `develop` option links the current folder into the Python library. Any modification is then immediately taken into account without needing to install the package again.
If you are installing in a virtual environment (preferred, see the [venv](https://docs.python.org/3/library/venv.html) page), you can remove the `--user` flag but remember to install _numpy_ first:
```bash
pip3 install numpy
python3 setup.py install
```
Test if the installation succeeded by importing pygeodyn in python3
```bash
python3 -c "import pygeodyn; print(pygeodyn.__version__)"
```
This should print the current version of pygeodyn if the installation succeeded.
### Testing
Tests on the package can be run using
```bash
python3 setup.py test
```
Please report any failed tests using the contact info given at the end of the README file.
Detailed information on the coverage of the tests can be found on [GRICAD](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/htmlcov/).
### Reinstallation
```bash
python3 setup.py clean --all
python3 setup.py develop --user
```
## Usage
pygeodyn uses configuration files (extension: `.conf`) to set the computation parameters. See [example.conf](./example.conf) for an example of such a configuration file.
### Launching the algorithm
The computation is launched by running `run_algo.py`. Several parameters can be given as arguments to tune the computation. The only required one is `-conf`; default behaviours are used if the other arguments are not given.
##### Scientific parameters
* `-conf`: the path to the configuration file to use (no default value: must be given)
* `-m`: the number of realisations to consider (default: 2)
* `-algo`: the name of the algorithm to use. For now, the only supported algorithm is **augkf** (Augmented state Kalman Filter implemented in Python), which is the default value.
* `-seed`: a number used for the generation of random quantities. Set this to get reproducible results (default: a new seed is generated at each execution).
##### Output files
* `-path`: path where the results folder will be created (default: `~/pygeodyn_results/`).
* `-cname`: name of the computation. Used to create a folder in `-path` where the results will be saved (default: `Current_computation`).
##### Logging
* `-v`: level of the logging: 1 DEBUG, 2 INFO (default), 3 WARNING, 4 ERROR, 5 CRITICAL.
* `-l`: name of the log file that will be created in the result folder (default: display logs in the console).
#### Examples:
The following command launches the computation with default values:
```bash
python3 run_algo.py -conf augkf.conf
```
An example with all arguments given:
```bash
python3 run_algo.py -conf augkf.conf -m 5 -algo 'augkf' -seed 15 -path './results/' -cname 'AugKF' -v 4 -l 'log.txt'
```
### Launch with MPI
Forecasts can be performed in parallel using [mpi4py](http://mpi4py.scipy.org/). For this, use the `mpiexec` command and set the number of processes; the other arguments can be set as before:
```bash
mpiexec -np 4 python3 run_algo.py -conf augkf.conf
mpiexec -np 2 python3 run_algo.py -v 3 -conf augkf.conf -path './results/' -m 3 -cname 'AugKF' -algo 'augkf'
```
### Offline documentation
It is also possible to generate the documentation locally if the source files were downloaded. For this, first install [Sphinx](http://www.sphinx-doc.org/) and the Sphinx theme [Read the Docs](https://sphinx-rtd-theme.readthedocs.io/en/stable/):
```bash
pip3 install sphinx [--user]
pip3 install sphinx_rtd_theme [--user]
```
Put `--user` if you are installing outside a virtual environment.
### Saving results
#### Output folder
When a new computation is launched, pygeodyn creates a folder named after `-cname` (default: `Current_computation`) at the location given by `-path` (default: `~/pygeodyn_results/`). All result files will be created in this output folder.
#### Hdf5 file
The default behaviour of pygeodyn is to automatically save the results at each time iteration in an HDF5 file named after `-cname`. This way, even in case of a crash, all previous iterations should be saved.
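A quick way to check what has been saved so far is to open this file with _h5py_. This is only a sketch: the exact file name and internal layout depend on your computation, so the path below is merely an assumption based on the default `-path` and `-cname`.
```python
import os
import h5py

# Hypothetical location built from the default '-path' and '-cname' values
results_file = os.path.expanduser('~/pygeodyn_results/Current_computation/Current_computation.hdf5')
with h5py.File(results_file, 'r') as hdf_file:
    hdf_file.visit(print)  # print the names of all groups/datasets saved so far
```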
## For more information
### In-depth guide
We only presented here the high-level use of pygeodyn that is encapsulated in `run_algo.py` with the supplied data. If you wish to use your own data, get more information on the implementation or use low-level features, please refer to the [In-depth guide](doc/guide.md) in the `doc` folder.
### Developer documentation
Documentation of the submodules and methods of the package is available on [GRICAD](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/index.html).
If [Sphinx](http://www.sphinx-doc.org/) is installed, it is possible to generate the documentation locally using:
Then run the script building the docs:
```bash
sphinx-apidoc -e -o doc -f pygeodyn/
cd doc && sphinx-build -b html . html && cd ..
./make_doc.sh
```
The doc will then be available in HTML format at doc/html/index.html.
The doc will then be available for navigation in HTML format at `docs/html/index.html`.
### Scientific documentation
### Scientific references
Information on the Augmented state Kalman filter can be found in:
- [Barrois, Gillet and Aubert, GJI (2017)](http://doi.org/10.1093/gji/ggx280)
- [Barrois et al. GJI (2018)](http://doi.org/10.1093/gji/ggy297)
......@@ -125,13 +33,13 @@ and subsequent erratum:
- [Barrois et al. GJI (2019)](http://doi.org/10.1093/gji/ggy471)
For information on the further developments (dense AR-1), see:
- [Gillet, Huder and Aubert, submitted (2019)]()
- [Gillet, Huder and Aubert, GJI (accepted) (2019)]()
### Conditions of use
The work is licensed under the [GNU GPLv3](./LICENSE.txt).
If the use of this code led to a scientific output, please cite:
- [Huder, Gillet and Thollard, submitted (2019)]()
- [Huder, Gillet and Thollard, submitted to GMD (2019)]()
### Git repository
The source code is stored on a Git repository (https://gricad-gitlab.univ-grenoble-alpes.fr/Geodynamo/pygeodyn), which can also be used to give feedback through [Issues](https://gricad-gitlab.univ-grenoble-alpes.fr/Geodynamo/pygeodyn/issues).
......
# param_name type value
Lu int 18
TauU int 30
TauU years 30
Lb int 13
Lsv int 13
TauE int 10
tmax int 64
compute_e int 1
TauE years 10
Nth_legendre int 64
t_start date 1980
t_end date 2015
dt_f months 6
dt_a months 12
prior_dir str data/prior/midpath
prior_dir str data/priors/midpath
prior_type str midpath
obs_dir str data/observations/COVOBS_hd
obs_type str COVOBS
\ No newline at end of file
# param_name type value
Lu int 18
TauU int 30
TauU years 30
Lb int 13
Lsv int 13
TauE int 10
tmax int 64
compute_e int 1
TauE years 10
t_start date 1955
t_end date 2020
dt_f months 6
dt_a months 12
prior_dir str data/prior/midpath
prior_dir str data/priors/midpath
prior_type str midpath
obs_dir str data/observations/COVOBS_hd
obs_type str COVOBS
......
......@@ -4,13 +4,11 @@ TauU int 30
Lb int 14
Lsv int 14
TauE int 10
tmax int 64
compute_e int 1
t_start date 1997-11
t_end date 2017-11
dt_f months 2
dt_a months 4
prior_dir str data/prior/CE_augkf
prior_dir str data/priors/CE_augkf
prior_type str coupled_earth
obs_dir str data/observations/GO-VO_2017
obs_type str GO_VO
# In-depth guide
## Presentation
The README presented a way to use pygeodyn through `run_algo.py`. This program takes care of chaining and looping the data assimilation steps (and of numerous handy things such as the use of multiple processes with mpi4py, and the saving of output files and logs). For more details, we present here the structure of this program:
<img src="Structure.svg" alt="run_algo_structure" width="2000"/>
The initialisation of the algorithm is done by the [init_algorithm](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.run.html) method. This is a two-step method:
* First, `read_conf` reads the configuration file (see [example.conf](../example.conf)) to build a [ComputationConfig](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.inout.config.html) object named `config`.
* Secondly, the `config` object is passed to the `launch_setup` method that generates a _Forecaster_ object and an _Analyser_ object. The _Forecaster_ and _Analyser_ respectively implement the methods _forecast_step_ and _analysis_step_, which are the main components of the data assimilation process. `run_algo.py` then runs forecast and analysis steps on `CoreStates` (see below) at the frequency given by the user in the configuration file.
The two objects _Forecaster_ and _Analyser_ integrate all the input data that result from the algorithm initialisation:
* **Configuration**: containing all the parameters of the computation. Read from a `.conf` file.
* **Priors**: a dictionary containing the prior information for the magnetic field, the core flow and the subgrid errors. These are read from supplied data of supported `prior_type`.
* **Covariance matrices**: Spatial cross-covariance matrices computed from the priors. These depend on the input priors and are used in the forecast and analysis steps (best linear unbiased estimations).
And for the _Analyser_ object:
* **Observations**: object containing all the information (data, errors and operator) about measurements of the magnetic field and secular variation. This is generated from supplied files of supported `obs_type`.
It may be interesting to run the supplied algorithms on new priors (from which covariance matrices derive) and observations to test new models. To change the input data, it is necessary to define new types (for priors and/or observations) that have reading methods adapted to your files. To be integrated into the program, these reading methods need to follow a certain structure that is presented below.
## Defining new reading methods
### Priors
For examples, you can refer to the priors supplied with the program (outputs of geodynamo simulations):
* [Coupled-Earth](http://dx.doi.org/10.1038/nature12574) (prior_type: `coupled_earth`)
* [Midpath](https://doi.org/10.1017/jfm.2016.789) (prior_type: `midpath`)
The reading methods are found in [`pygeodyn/inout/priors.py`](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.inout.priors.html). To use your own data, you must implement your reading method in the same file and design it with the same format. Namely:
```python
# The method name must have this name format and signature
def read_<new_prior_type>(data_directory, dt_sampling):
    # Read your prior data...
    # ...and return the 1D array of times (nb_realisations) and the 2D arrays of realisations (nb_realisations x nb_coeffs) for B, U and ER
    return times, B, U, ER
```
If your prior does not supply time data, return `None` instead. It will only prevent the use of dense matrices in the Auto-Regressive processes.
Now, all that is left is to set `prior_type` in the config file to the name `<new_prior_type>` that you chose for it (and `prior_data_dir` to the directory where the prior data is stored). Then, try to launch a computation and check if any errors are raised (`NotImplementedError` if your prior is not associated with a reading function, `IOError` if the files were not found...). If not, your prior data will be used as average data in the AR-1 process and to compute the covariance matrices.
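For instance, assuming a new prior type called `my_prior` whose files live in `data/priors/my_prior` (both names are hypothetical), the corresponding lines would mirror those of the supplied `.conf` files, which use the `prior_dir` and `prior_type` keys:
```
prior_dir  str  data/priors/my_prior
prior_type str  my_prior
```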
### Observations
Examples of valid reading methods are given for the supplied observations:
* [COVOBS](https://doi.org/10.1002/2014JB011786) (obs_type: `covobs`)
* [Ground and Virtual observatories](https://doi.org/10.1093/gji/ggy297) (obs_type: `go_vo`)
The strategy to use another type of observations follows the same paradigm as for priors: you must define a new observation type by implementing its reading method in [`pygeodyn/inout/observations.py`](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.inout.observations.html). In more detail, the format must be:
```python
# The method name must have this name format and signature
def build_<new_obs_type>_observations(cfg, nb_realisations, measure_type):
    # Read your observation data for measure_type='MF' or 'SV'...
    # ...and return a dict of Observation objects with numpy.datetime64 objects as keys
    return observations
```
The implementation is trickier in this case and several points need to be highlighted:
* Your new reading method must take `measure_type` as an argument, which will be equal to either 'MF' (for magnetic field observations) or 'SV' (for secular variation observations).
* The returned object must be a dictionary with numpy.datetime64 objects as keys that correspond to the date of the observations
* The said dictionary must contain [_Observation_ objects](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.inout.observations.html) that are a handy way to store all the observation parameters:
* The data `Y` as 2D numpy.ndarrays (nb_realisations x nb_observed_coefs)
* The observation operator `H` (nb_observed_coefs x nb_coefs) that verifies `Y = HX` where X is a vector containing the spectral coefficients of the magnetic field or secular variation (size: nb_coefs).
* The observation error matrices `R` (nb_observed_coefs x nb_observed_coefs)
For reference, here is the basic way to create an _Observation_ object and the _observations_ dictionary:
```python
observations = {}
# ...
# Assuming you have extracted all the aforementioned parameters
observation_at_t = Observation(obs_data, H, R)
date = np.datetime64('1997-03') # Ex: create a datetime64 for Mar 1997
observations[date] = observation_at_t
# ... Do this for all observation dates and return the observations dict
```
After having implemented your new method, set `obs_type` in the config file to the name `<new_obs_type>` that you chose for it (and `obs_dir` to the directory where the data is stored). Then, try to launch a computation and check if any errors are raised or logged (if no reading function is found, you will get a warning in the logs saying that all analyses will be skipped). If not, your observation data will be used as observations in the analysis step of the algorithm.
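Likewise, for a hypothetical observation type `my_obs` stored under `data/observations/my_obs`, the configuration lines would mirror the supplied `.conf` files:
```
obs_dir  str data/observations/my_obs
obs_type str my_obs
```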
## Using low-level forecast/analysis steps
### The CoreState object
A [_CoreState_](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.corestates.html) is the object on which the computation steps (forecasts and analysis) are performed. It provides an interface to store and act on the data of the involved physical quantities (called *measures*) such as the magnetic field ('MF'), the secular variation ('SV'), the core flow ('U') and the subgrid errors ('ER').
The usual way to use it is to store measure data as 3D NumPy arrays in the format (nb_realisations, nb_times, nb_coeffs) (as used for _computed_states_, _forecast_states_ and _analysed_states_ in `run_algo.py`), but it can handle 1D and 2D NumPy arrays as well.
#### CoreState operations
In the following code snippets, the preamble
```python
import numpy as np
```
is assumed.
##### Adding a measure
* 1D :
```python
from pygeodyn.corestates import CoreState
cs_1D = CoreState()
data_mf_1D = np.zeros(224)
cs_1D.addMeasure('MF', data_mf_1D)
```
* 3D :
```python
cs_3D = CoreState()
data_mf_3D = np.ones((10, 50, 224))
cs_3D.addMeasure('MF', data_mf_3D)
data_u_3D = np.ones((10, 50, 720)) # Can also add the core_flow
cs_3D.addMeasure('U', data_u_3D)
```
##### Get/Set measure data
* 1D :
```python3
cs_1D.getMeasure('MF') # Returns data_mf_1D
cs_1D.B # Convenience getter: returns data_mf_1D as well
cs_1D.Lb # Returns the max_degree of 'MF' (here: 14)
```
The max degree is inferred from the size of the last dimension (here: 224 = 14*(14+2) in both 1D and 3D cases). If this behaviour does not suit you, it is possible to force the value of the max degree when adding a measure:
```python3
data_u_1D = np.zeros(800)
cs_1D.addMeasure('U', [data_u_1D, 18])
cs_1D.U # Returns data_u_1D
cs_1D.Lu # Returns the max_degree of 'U' (here: 18 the forced value)
```
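For reference, the number of coefficients of the magnetic field and secular variation is `L*(L+2)` (see the induction equation documentation), so the inferred degree can be checked by hand (a small sketch):
```python
import numpy as np

nb_coeffs = 224
inferred_degree = int(np.sqrt(nb_coeffs + 1) - 1)  # L*(L+2) = nb_coeffs  =>  L = sqrt(nb_coeffs + 1) - 1
print(inferred_degree)  # 14
```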
Other convenience getters can be found in the complete [documentation](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.corestates.html).
* 3D :
It works exactly the same but CoreState also implements NumPy-like indexing:
```python3
cs_2D = cs_3D[0] # Stores the indexed corestate in cs_2D
cs_2D.B # returns data_mf_3D[0]
cs_2D = cs_3D[:, 10] # Works also with slices
cs_2D.B # returns data_mf_3D[:, 10]
cs_2D.U # returns data_u_3D[:, 10]
cs_2D.Lb # returns 14 as all members are copied
```
*WARNING*: mind that indexing a _CoreState_ creates a copy of itself! This means that if you want to update a part of the measure data, you should index the measure data and not the _CoreState_. Say you want to set the first coefficient of 'MF' to -40 for all times and realisations:
```python3
cs_3D.B[:,:,0] = -40 # OK: first it gets the B data, then sets it.
cs_3D.B[0, 0, 0] # Returns the new value -40
cs_3D[:,:,0].B = 30 # WON'T WORK: it first creates an indexed copy of cs_3D and then sets the B data of that copy!
cs_3D.B[0, 0, 0] # Still returns -40: we only updated a transient copy of cs_3D
```
If the measures and their shapes match, it is possible to set the measures of a _CoreState_ (or a part of it) using another _CoreState_:
```python3
cs_1D = CoreState()
cs_1D.addMeasure('MF', np.ones(224))
cs_1D.addMeasure('SV', np.ones(224)*2)
cs_3D = CoreState()
cs_3D.addMeasure('MF', np.zeros((10, 50, 224)))
cs_3D.addMeasure('SV', np.zeros((10, 50, 224)))
cs_3D[0, 0] = cs_1D
cs_3D.B[0, 0] # Returns np.ones(224)
cs_3D.SV[0, 0] # Returns np.ones(224)*2
```
### Forecast/Analyse a CoreState
We saw at the beginning of this section that the _forecast_step_ (resp. _analysis_step_) is implemented by the _Forecaster_ (resp. _Analyser_) object. The easiest way to set up these objects is to use the _init_algorithm_ method with the path to the desired configuration file:
```python
from pygeodyn.run import init_algorithm
algo_dict = {'name': 'augkf', 'nb_models': 10}
config_path = 'augkf.conf'
config, forecaster, analyser = init_algorithm(config_path, algo_dict)
```
Then, we can create CoreState objects to perform the forecast/analysis on them. Mind that _forecast_step_ takes a 1D CoreState as input (dim: nb_coeffs), as it is performed for a single realisation. However, all realisations are needed for _analysis_step_, which therefore takes a 2D CoreState as input (dim: nb_realisations x nb_coeffs). This is shown in the following example of a forecast step followed by an analysis step:
```python
cs = CoreState()
# Build a complete CoreState with 10 realisations and 3 times
cs.addMeasure('MF', np.zeros((10, 3, 224)))
cs.addMeasure('SV', np.zeros((10, 3, 224)))
cs.addMeasure('ER', np.zeros((10, 3, 224)))
cs.addMeasure('U', np.zeros((10, 3, 720)))
# Do whatever operation to initialise the CoreState at t=0 (ex: set all realisations of B according to a magnetic model)
cs.B[:, 0] = my_awesome_magnetic_model
# Do a forecast step on all realisations
for i_real in range(0, 10):
    # The input is the core state of a realisation at t=0
    forecast_result = forecaster.forecast_step(cs[i_real, 0])
    # The result is a realisation CoreState that we can use to update our CoreState at t=1
    cs[i_real, 1] = forecast_result
# Now do an analysis step on the forecast CoreState at t=1
obs_date = np.datetime64('1980-01') # Will use the observation data in Jan 1980
# The input is the core state at t=1 (with all realisations) and the date of observations
analysis_result = analyser.analysis_step(cs[:, 1], obs_date)
# The result is a CoreState containing all analysed realisations that we can again use to update our CoreState (stored here at t=2)
cs[:, 2] = analysis_result
# Then we can access the data as previously:
cs.B[:, 0, 0] # Initial value of g10 for all realisations
cs.B[:, 1, 0] # Forecast value of g10 for all realisations
cs.U[0, 2] # Analysed values of all core flow coefs for the realisation 0
# and so on...
```
More information can be found in the documentation of [forecast_step](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.augkf.forecaster.html) and [analysis_step](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.augkf.analyser.html) in the [augkf module](https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.augkf.html), for example.
pygeodyn
========
.. toctree::
    :maxdepth: 4

    pygeodyn
sphinx-build -a -b html . html
#################################
Augmented Kalman filter algorithm
#################################
Introduction
============
The subpackage `augkf`_ implements algorithms based on an augmented state Kalman filter (AugKF) initiated by [BGA17]_.
The following figure shows the Kalman filter in its simplest form:
.. image:: ./img/basic_augkf.svg
    :width: 500px
    :align: center
    :alt: basic principle of Kalman filter
In a few words, this algorithm uses a reduced numerical model to **forecast** the model state in time and regularly uses **analyses** to correct the forecast model state with the observation data. The forecast uses the covariances derived from priors (geodynamo series) and the analysis performs an estimate according to observation data, integrating the said covariances and the observation errors.
In the picture above, the model state is univariate but the AugKF presented here works on the multivariate surface core state in the form of Gauss coefficients for the magnetic field, core flow, subgrid errors and secular variation.
The induction equation
======================
The basic equation coupling the components of the core state at a given time is the radial induction equation at the core surface in the spherical harmonic spectral domain:
.. math::
    :label: induction

    \dot{\mathbf{b}} = \mathrm{A}(\mathbf{b})\mathbf{u} + \mathbf{e_r}
* :math:`\mathbf{b}` (resp. :math:`\mathbf{u}` and :math:`\dot{\mathbf{b}}`) stores the Schmidt semi-normalized spherical harmonic coefficients of the magnetic field (resp. the core flow and the secular variation) up to a truncation degree :math:`L_b` (resp. :math:`L_u` and :math:`L_{sv}`). The numbers of stored coefficients in these vectors are respectively :math:`N_b = L_b (L_b + 2)`, :math:`N_u = 2L_u(L_u+2)` and :math:`N_{sv} = L_{sv}(L_{sv}+2)`.
* :math:`\mathrm{A}(\mathbf{b})` is the matrix of Gaunt-Elsasser integrals [Moon79]_ of dimensions :math:`N_{sv} \times N_u` , depending on :math:`\mathbf{b}`.
* :math:`\mathbf{e_r}` stands for errors of representativeness (of dimension :math:`N_{sv}` ). This term accounts for both subgrid induction (arising due to the truncation of the fields) and magnetic diffusion.
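For instance, with the truncation degrees used in the examples of the in-depth guide (:math:`L_b = L_{sv} = 14` and :math:`L_u = 18`), these sizes are :math:`N_b = N_{sv} = 14 \times 16 = 224` coefficients and :math:`N_u = 2 \times 18 \times 20 = 720` coefficients.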
Algorithm steps
===============
The sequential DA algorithm is composed of two kinds of operations:
* **Forecasts** are performed every :math:`\Delta t_f`. An ensemble of :math:`N_e` core states is time stepped between :math:`t` and :math:`t+\Delta t_f`.
* **Analyses** are performed every :math:`\Delta t_a` with :math:`\Delta t_a=n\Delta t_f` (analyses are performed every :math:`n` forecasts). The ensemble of core states at :math:`t_a` is adjusted by performing a Best Linear Unbiased Estimate (BLUE) using observations at :math:`t=t_a`.
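For instance, with the forecast and analysis intervals of the supplied example configurations (:math:`\Delta t_f` = 6 months and :math:`\Delta t_a` = 12 months), an analysis is performed every :math:`n = 2` forecasts.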
From a technical point of view, algorithm steps take CoreState_ objects as inputs and return the CoreState_ resulting from the operations. Forecasts and analyses are handled by the Forecaster_ and Analyser_ modules that are implemented in the augkf_ subpackage according to the AugKF algorithm.
More information can be found in the relevant tutorials:
- `CoreState usage`_
- `Forecast and analysis steps usage`_
.. _Analyser: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.augkf.analyser.html
.. _augkf: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.augkf.html
.. _CoreState: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.corestates.html
.. _CoreState usage: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/usage_corestate.html
.. _Forecaster: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/pygeodyn.augkf.forecaster.html
.. _Forecast and analysis steps usage: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/usage_steps.html
Priors
------
The data assimilation steps require spatial cross-covariances that are derived from priors. The priors consist of sampled series of spectral coefficients for the magnetic field (noted :math:`\mathbf{b}^*`), the core flow (:math:`\mathbf{u}^*`) and errors of representativeness (:math:`\mathbf{e_r}^*`). The background quantities (averages of the series) are respectively noted :math:`\mathbf{b}_0`, :math:`\mathbf{u}_0` and :math:`\mathbf{e_r}_0`.
The supplied priors in *pygeodyn* are geodynamo free runs from the *coupled-earth* and *midpath* codes but other types of priors can be used (more information in the section about `prior types`_).
The corresponding cross-covariance matrices are given by
.. math::

    \left\{
    \begin{array}{rl}
    \mathrm{P}_{bb} &\displaystyle= \mathbb{E}\left((\mathbf{b}^*-\mathbf{b}_0)(\mathbf{b}^*- \mathbf{b}_0)^T\right) \\
    \mathrm{P}_{uu} &\displaystyle= \mathbb{E}\left((\mathbf{u}^*-\mathbf{u}_0)(\mathbf{u}^*- \mathbf{u}_0)^T\right) \\
    \mathrm{P}_{ee} &\displaystyle= \mathbb{E}\left((\mathbf{e_r}^*-\mathbf{e_r}_0)(\mathbf{e_r}^*- \mathbf{e_r}_0)^T\right)
    \end{array}\right.\,,
.. _prior types: https://geodynamo.gricad-pages.univ-grenoble-alpes.fr/pygeodyn/usage_new_types.html
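As an illustration, :math:`\mathrm{P}_{bb}` can be estimated from the prior samples with a few lines of NumPy. This is only a sketch with a synthetic ensemble; the array name ``b_samples`` and its size are assumptions, not names used in pygeodyn.

.. code-block:: python

    import numpy as np

    # Hypothetical prior ensemble: 100 samples of N_b = 224 magnetic field coefficients
    rng = np.random.default_rng(0)
    b_samples = rng.standard_normal((100, 224))

    b_0 = b_samples.mean(axis=0)              # background field (average of the series)
    deviations = b_samples - b_0
    P_bb = deviations.T @ deviations / (b_samples.shape[0] - 1)
    # equivalently: P_bb = np.cov(b_samples, rowvar=False)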
Forecast and AR(1) processes
----------------------------
The forecast step consists of time-stepping :math:`\mathbf{X}(t)` between two epochs. AR-1 processes built on geodynamo cross-covariances are used to forecast :math:`\mathbf{u}(t)` and :math:`\mathbf{e}_r(t)`.
We write :math:`\mathbf{u}(t) = \mathbf{u}_0+\mathbf{u}'(t)`, with :math:`\mathbf{u}_0` the background flow (temporal average from the geodynamo run) -- and similar notations for :math:`\mathbf{e}_r(t)`.
Their numerical integration is based on an Euler-Maruyama scheme, which takes the form
.. math::
    :label: aru

    \left\{
    \begin{array}{rl}
    \mathbf{u}'(t+\Delta t_f) &= \mathbf{u}'(t) - \Delta t_f \mathrm{D}_u \mathbf{u}'(t) + \sqrt{\Delta t_f}\mathbf{\zeta_u}(t)\\
    \mathbf{e}_r'(t+\Delta t_f) &= \mathbf{e}_r'(t) - \Delta t_f \mathrm{D}_e \mathbf{e}_r'(t) + \sqrt{\Delta t_f}\mathbf{\zeta_e}(t)
    \end{array}\right.\,.
:math:`\mathrm{D}_u` is the drift matrix for :math:`\mathbf{u}`.
:math:`\mathbf{\zeta_u}` is a Gaussian noise, uncorrelated in time and constructed such that spatial cross-covariances :math:`\mathrm{P}_{uu}= \mathbb{E}\left(\mathbf{u}'\mathbf{u}'^T\right)` of :math:`\mathbf{u}` match those of the prior geodynamo samples :math:`\mathbf{u}^*`.
:math:`\mathbb{E}(\dots)` stands for statistical expectation.
Similar expressions and notations hold for :math:`\mathbf{e}_r`.
Note that :math:`\mathbf{u}` and :math:`\mathbf{e}_r` are assumed independent, which is verified in numerical simulations.
Diagonal AR(1)
^^^^^^^^^^^^^^
In the case where the geodynamo series used as priors do not allow meaningful temporal statistics to be derived, the two drift matrices are simply diagonal and each controlled by a single free parameter tunable in the configuration file (:math:`\tau_u` for :math:`\mathbf{u}` and :math:`\tau_e` for :math:`\mathbf{e}_r`):
.. math::
    :label: drift_diag

    \mathrm{D}_{u} = \frac{1}{\tau_{u}}\mathrm{I}_{u} \text{ and } \mathrm{D}_{e} = \frac{1}{\tau_{e}}\mathrm{I}_{e}\,,
with :math:`\mathrm{I}_{u}` (resp. :math:`\mathrm{I}_{e}`) the identity matrix of rank :math:`N_{u}` (resp. :math:`N_e`).
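As an illustration, one Euler-Maruyama step of the diagonal AR-1 process :eq:`aru` for :math:`\mathbf{u}'` could be sketched as follows. This is only a sketch: the choice of the noise covariance :math:`2\mathrm{P}_{uu}/\tau_u` (which makes :math:`\mathrm{P}_{uu}` the stationary covariance of the process) is an assumption of this example, not necessarily what pygeodyn implements.

.. code-block:: python

    import numpy as np

    def ar1_step_diagonal(u_prime, P_uu, tau_u, dt_f, rng=None):
        """One Euler-Maruyama step of the diagonal AR-1 process for u' (sketch)."""
        rng = np.random.default_rng() if rng is None else rng
        # Drift term: -dt_f * D_u u'(t) with D_u = I / tau_u
        drift = -dt_f / tau_u * u_prime
        # Gaussian noise; its covariance 2 P_uu / tau_u is an assumed choice
        zeta = rng.multivariate_normal(np.zeros(u_prime.size), 2.0 * P_uu / tau_u)
        return u_prime + drift + np.sqrt(dt_f) * zeta

    # Example: a 720-coefficient flow perturbation, identity covariance, tau_u = 30, dt_f = 0.5 year
    u_next = ar1_step_diagonal(np.zeros(720), np.eye(720), tau_u=30.0, dt_f=0.5)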
Dense AR(1)
^^^^^^^^^^^
In the case where geophysically meaningful temporal statistics can be extracted from geodynamo samples, time cross-covariance matrices
.. math::

    \left\{
    \begin{array}{rl}
    \mathrm{P}_{uu^+} &\displaystyle= \mathbb{E}\left(\mathbf{u}'(t)\mathbf{u}'(t+\Delta t^*)^T\right) \\
    \mathrm{P}_{ee^+} &\displaystyle= \mathbb{E}\left(\mathbf{e}_r'(t)\mathbf{e}_r'(t+\Delta t^*)^T\right)
    \end{array}\right.\,,
are derived according to a sampling time :math:`\Delta t^*`. :math:`\mathrm{D}_{u}` and :math:`\mathrm{D}_{e}` are then defined as:
.. math::
    :label: drift_dense_e

    \left\{
    \begin{array}{rl}
    \mathrm{D}_{u} &= \displaystyle\frac{\mathrm{I}_u - (\mathrm{P}_{uu}^{-1}\mathrm{P}_{uu^+})^T}{\Delta t^*}\\
    \mathrm{D}_{e} &= \displaystyle\frac{\mathrm{I}_e - (\mathrm{P}_{ee}^{-1}\mathrm{P}_{ee^+})^T}{\Delta t^*}
    \end{array}\right.\,.
The drift matrices :math:`\mathrm{D}_{u}` and :math:`\mathrm{D}_{e}` are now dense, hence processes using this expression are referred to as *dense* AR-1 processes.
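In NumPy terms, these drift matrices are a direct transcription of :eq:`drift_dense_e`. A sketch, where ``P_uu``, ``P_uu_plus`` and ``dt_star`` are assumed to hold :math:`\mathrm{P}_{uu}`, :math:`\mathrm{P}_{uu^+}` and :math:`\Delta t^*` (these variable names are not taken from pygeodyn):

.. code-block:: python

    import numpy as np

    def dense_drift(P, P_plus, dt_star):
        """Drift matrix D = (I - (P^{-1} P_plus)^T) / dt_star (sketch)."""
        identity = np.eye(P.shape[0])
        return (identity - np.linalg.solve(P, P_plus).T) / dt_star

    # D_u = dense_drift(P_uu, P_uu_plus, dt_star)
    # D_e = dense_drift(P_ee, P_ee_plus, dt_star)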