Details good practices for container image storage, traceability, and referencing
In workflow systems, reproducibility mostly rests on the use of a container image at every step. To pinpoint the strengths and limitations of this approach, I have some questions regarding container images:
For some of them I have answers that I think are good; I have added them below the questions.
How does image referencing work, and what is the difference between a tag and a digest?
- A tag is just a label given by the maintainer; it can be changed or moved to a different image at any time.
- A digest seems more "reproducible" in the sense that you are sure to retrieve exactly the same image (bit for bit).
- On the other hand, a tag can be linked to the release cycle through proper CI/CD, which guarantees traceability. In some projects tag immutability is enforced by policy (the rocker images and the tensorflow images are good examples). The commands below illustrate the difference.
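To make the distinction concrete, here is a minimal sketch using the docker CLI; `rocker/r-ver:4.3.2` is just an example reference and `<digest>` is a placeholder to fill in:

```sh
# Pull by tag: the content behind the tag can change if the maintainer
# re-pushes it, so two pulls at different times may differ.
docker pull rocker/r-ver:4.3.2

# Resolve the tag to its content-addressed digest (RepoDigests is
# populated after pulling from or pushing to a registry).
docker inspect --format '{{index .RepoDigests 0}}' rocker/r-ver:4.3.2

# Pull by digest: guaranteed to retrieve exactly the same bytes every time.
docker pull rocker/r-ver@sha256:<digest>
```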
How can we ensure the long-term storage of Docker images?
- Are there academic image registries in France/Europe (e.g. based on Harbor)?
- Should we use GitLab/forge internal registries?
- Docker Hub images that have not been pulled for 6 months are deleted.
- There is no long-term guarantee on quay.io.
- Aside from archiving the image and depositing it on Zenodo or another data repository (see the sketch below)?
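For the archiving route, here is a minimal sketch of what a deposit could look like; the image reference and file names are placeholders:

```sh
# Pin the exact image by digest, then serialize it to a tarball that can
# be deposited on Zenodo or any other data repository.
docker pull rocker/r-ver@sha256:<digest>
docker save rocker/r-ver@sha256:<digest> -o r-ver-4.3.2.tar
gzip r-ver-4.3.2.tar

# Record a checksum alongside the deposit so integrity can be checked later.
sha256sum r-ver-4.3.2.tar.gz > r-ver-4.3.2.tar.gz.sha256

# Anyone can later restore the image from the archive without a registry.
docker load -i r-ver-4.3.2.tar.gz
```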
Do we need to store the image itself if the process of building the image, identified by a tag (tag of the image = tag of the repository), is reproducible by itself? (See the Dockerfile sketch after the next question.)
Are there rules such as https://github.com/hadolint/hadolint#rules to ensure that Dockerfiles are written reproducibly?
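Rebuilding from a Dockerfile is rarely bit-for-bit reproducible on its own: a `FROM` line pointing at a mutable tag and unpinned package installs both let the result drift between builds. Below is a minimal sketch of the kind of pinning hadolint encourages (it has rules against untagged base images and unpinned apt packages); the digest and version strings are placeholders:

```Dockerfile
# Base image pinned by digest rather than by a mutable tag.
FROM ubuntu@sha256:<digest>

# Package versions pinned so that a later rebuild installs the same
# versions. Caveat: old versions eventually leave the distribution
# mirrors, which is one argument for archiving the built image as well.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3=3.10.* \
        curl=7.81.* \
    && rm -rf /var/lib/apt/lists/*
```

Even with such pinning, the rebuilt image will generally not be digest-identical to the original, so pinning answers traceability more than bitwise reproducibility.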
What is the proper way to provide container images for reproducible research?
Is the term GitOps useful here?
- One image repository = one git repository?
- One git tag = one image tag?
- Images pushed via CI/CD (see the sketch after this list)?
- Dockerfiles following a set of rules? (From my perspective, I have found nothing good enough regarding rules for writing reproducible Dockerfiles.)
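As one concrete reading of "one git tag = one image tag", here is a minimal sketch of the commands a CI job could run when a git tag is pushed; the registry path, project name, and variables are placeholders, not any specific CI product's syntax:

```sh
# In a real pipeline these would come from the CI system
# (e.g. CI_COMMIT_TAG on GitLab CI); hard-coded here for illustration.
GIT_TAG=v1.2.0
REGISTRY=registry.example.org/myproject

# Build from the tagged commit and give the image the same identifier,
# so image tag == git tag by construction.
docker build -t "$REGISTRY/analysis:$GIT_TAG" .
docker push "$REGISTRY/analysis:$GIT_TAG"

# Record the digest produced by the push so the workflow or paper can
# cite the immutable reference, not only the tag.
docker inspect --format '{{index .RepoDigests 0}}' \
    "$REGISTRY/analysis:$GIT_TAG" > image-digest.txt
```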