How to Provide Data?

The ENES infrastructure project (IS-ENES) provides data producers and managers with tools and information on the quality of the datasets hosted at the ESGF portals.

  • Data Ingestion: the ESGF data ingestion steps below provide guidance on how to publish data to ESGF.
  • Data Quality: the quality assurance tool QA-DKRZ checks the conformance of the metadata of climate simulations in NetCDF format with the CF Conventions and with the CORDEX, CMIP5 and CMIP6 conventions and rules.
  • Data Citation: the World Data Center for Climate (WDCC) offers the ENES community the assignment of DataCite DOIs to long-term archived model data to support data citation. DOI assignment is combined with further quality checks. Specific information on the CORDEX Data Citation process is given below.

Data Ingestion

Except for replications, which are treated differently, ESGF data ingestion consists of the steps shown below:

[Figure: Steps of ESGF data ingestion]

At the end of the publishing step, the data are visible in the ESGF and can be downloaded from there. For long-term archiving and DataCite DOI assignment, additional ingestion steps have to be appended.

Aggregation, format and unit conversion, generation of metadata, and additional data

These steps have to be performed by the data provider.

Data with the time frequencies day, month, season and year are usually aggregated. Depending on the nature of the variable and the rules of the project, an integral (e.g. for precipitation), a mean, a minimum or a maximum has to be calculated. Sometimes more than one aggregation step is necessary; for example, monthly CORDEX sfcWindmax is the mean of the daily maxima.
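
As an illustration, the following is a minimal aggregation sketch using the Python library xarray; the file names, the variable and the chosen aggregation rule are assumptions and have to be replaced by the project's own specifications.

    # Minimal sketch: aggregate daily CORDEX output to monthly values with xarray.
    # File names, variable name and aggregation rule are illustrative assumptions.
    import xarray as xr

    ds = xr.open_dataset("sfcWindmax_day_EUR-44_example.nc")  # hypothetical input file

    # Monthly mean of the daily maxima (the CORDEX rule mentioned above);
    # for precipitation one would sum instead of averaging.
    monthly = ds["sfcWindmax"].resample(time="1MS").mean(keep_attrs=True)

    monthly.to_netcdf("sfcWindmax_mon_EUR-44_example.nc")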

Metadata are usually requested via a form and are project-dependent. If the grid used is not a regular latitude-longitude grid, additional data describing the grid are needed: rotation angles in the case of rotated poles, or a grid map file in the case of curvilinear coordinates.
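
For example, a rotated-pole grid can be described with a CF grid_mapping variable. The sketch below uses the Python netCDF4 library; the file name, the data variable and the pole coordinates (here the values commonly used for the CORDEX EUR-44 domain) are illustrative assumptions.

    # Sketch: describe a rotated-pole grid with a CF grid_mapping variable.
    # File name, data variable and pole coordinates are illustrative assumptions.
    from netCDF4 import Dataset

    nc = Dataset("example_rotated_grid.nc", "a")  # hypothetical file, opened for appending

    crs = nc.createVariable("rotated_pole", "S1")  # scalar container variable
    crs.grid_mapping_name = "rotated_latitude_longitude"
    crs.grid_north_pole_latitude = 39.25    # example value (CORDEX EUR-44 domain)
    crs.grid_north_pole_longitude = -162.0  # example value (CORDEX EUR-44 domain)

    # Link the data variable to the grid description
    nc.variables["sfcWindmax"].grid_mapping = "rotated_pole"
    nc.close()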

Rewriting of NetCDF file

NetCDF allows many data structures, variable definitions and attributes. To guarantee uniformity within a project, each project defines rules and translates them into machine-readable tables, the CMOR tables. The program CMOR (Climate Model Output Rewriter) reads these tables and adapts the attributes in the NetCDF header according to the project's rules, i.e. it overwrites them. It can perform unit conversions (if not already done) and can calculate auxiliary variables such as time bounds or grid points from the grid map. It also performs some quality checks, e.g. it can detect gaps in time series.

In its original form, CMOR is a subroutine library. Climate data projects usually develop a main program and routines for reading the data, or reuse existing software. Usually, the data provider has to perform the rewriting of the data files; IS-ENES offers guidance.
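
A minimal sketch of how the CMOR Python interface is typically driven is shown below; the table names, the metadata file and the dummy variable are assumptions and have to be replaced by the project's own tables and input.

    # Sketch of the CMOR Python interface; table names, the input JSON file and
    # the dummy variable/data are illustrative assumptions.
    import cmor
    import numpy as np

    cmor.setup(inpath="Tables", netcdf_file_action=cmor.CMOR_REPLACE)
    cmor.dataset_json("experiment_input.json")   # hypothetical experiment metadata file
    cmor.load_table("CMIP6_Amon.json")           # project-provided CMOR table

    time = cmor.axis(table_entry="time", units="days since 2000-01-01",
                     coord_vals=np.array([15.5]), cell_bounds=np.array([0.0, 31.0]))
    lat = cmor.axis(table_entry="latitude", units="degrees_north",
                    coord_vals=np.linspace(-89.5, 89.5, 180),
                    cell_bounds=np.linspace(-90.0, 90.0, 181))
    lon = cmor.axis(table_entry="longitude", units="degrees_east",
                    coord_vals=np.linspace(0.5, 359.5, 360),
                    cell_bounds=np.linspace(0.0, 360.0, 361))

    var_id = cmor.variable(table_entry="tas", units="K", axis_ids=[time, lat, lon])
    cmor.write(var_id, np.full((1, 180, 360), 288.0))  # one time step of dummy data
    cmor.close()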

Quality assurance

This step is usually done in the data centre, but the quality assurance tool can also be used by the data provider. The tool has been developed in the IS-ENES project and is freely available. It can check consistency with the CF standard and with the CMIP5 and CORDEX rules. The checks are not limited to the data themselves; the directory structure can also be examined. Which tests are performed is project-dependent. The quality assurance tool issues a warning if a rule is violated; it does not perform corrections. If quality assurance detects inconsistencies with the project's rules, the data are returned to the data provider for adaptation.
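
As a toy illustration of the kind of metadata check such a tool performs (this is not the QA-DKRZ tool itself, and the list of required attributes is an assumption), consider:

    # Toy illustration of a metadata consistency check; NOT the QA-DKRZ tool.
    # The required-attribute list and file name are assumptions.
    from netCDF4 import Dataset

    REQUIRED_GLOBALS = ["project_id", "institute_id", "model_id", "frequency"]  # example set

    def check_global_attributes(path):
        """Warn about missing global attributes; the file is not modified."""
        with Dataset(path) as nc:
            present = set(nc.ncattrs())
        missing = [att for att in REQUIRED_GLOBALS if att not in present]
        for att in missing:
            print(f"WARNING: {path}: required global attribute '{att}' is missing")
        return not missing

    check_global_attributes("sfcWindmax_mon_EUR-44_example.nc")  # hypothetical file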

ESGF publisher

This last step can only be launched by an ESGF data node administrator. The ESGF publisher script fills in data and metadata and checks readability again.


CMIP Data Quality

CMIP6

CMIP5

For CMIP5, a three-stage quality assurance process was agreed upon. This process has been an important ENES contribution to the international CMIP5 data management effort. The following information relates to the CMIP5 quality check (QC) process:

Quality-checked data are the basis of the ENES long-term archival of CMIP5 data in the Data Distribution Centre (DDC) hosted at DKRZ.


CORDEX Data Citation

WDC Climate data citation service

As for CMIP5, the World Data Center for Climate (WDCC) offers the assignment of DataCite DOIs to CORDEX data to support its citation. This service is provided for stable data intended for long-term usage.

Process:

  • Contact WDCC at data@dkrz.de including “DOI CORDEX” in your email subject and information on your affiliation in the email body
  • Data replication (if necessary) by WDCC in cooperation with you
  • Additional quality control checks on data by WDCC
  • Long-term archiving of data at WDCC
  • Technical Quality Assurance (TQA) checks by WDCC: consistency of data and metadata
  • Scientific Quality Assurance (SQA) by you: check of WDCC’s metadata and additional documentation of citation and scientific quality information
  • DataCite DOI assignment and notification of the data creator/author by WDCC

DOI granularity for CORDEX:

  • The targeted granularity for a DataCite DOI / citation entity for CORDEX is: one DOI for data of a single domain created with the same RCM by the same institute
  • Example: EUR-44 results of SMHI for RCA4-v1 for all driving models

Documentation and Overview:

  • WDCC’s DataCite DOI process is documented here.

Contact and Information: