Advanced: Interactive time series plot of a global yearly mean anomaly with xarray, pandas and hvplot using CMIP6 data#

We will show how to combine, analyse and quickly plot data of the Coupled Model Intercomparison Project Phase 6 (CMIP6). We will choose one variable from multiple experiments and compare the results of different models. In particular, we analyse the historical experiment in combination with one of the Shared Socioeconomic Pathway (SSP) experiments.

This Jupyter notebook is meant to run on the Jupyterhub server of the German Climate Computing Center DKRZ. The DKRZ hosts the CMIP data pool, including 4 petabytes of CMIP6 data. Please choose the Python 3 unstable kernel on the Kernel tab above; it contains all the common geoscience packages. See more information on how to run Jupyter notebooks at DKRZ here.

Running this Jupyter notebook on your own premises, which is also known as client-side computing, requires that you install the necessary packages and download the data.

Learning Objectives#

  • How to access a dataset from the DKRZ CMIP data pool with intake-esm

  • How to calculate global field means and yearly means with xarray and numpy

  • How to visualize the results with hvplot

import intake
import pandas as pd
import hvplot.pandas
import numpy as np

First, we need to set the variable_id which we would like to plot. This is a selection of the most often analysed variables:

  • tas is Near-surface Air Temperature

  • pr is Precipitation

  • psl is Sea Level Pressure

  • tasmax is Near-surface Maximum Air Temperature

  • tasmin is Near-surface Minimum Air Temperature

  • clt is Total Cloud Cover Percentage

Choose the variable:

# Choose one of
# pr, psl, tas, tasmax, tasmin, clt
variable_id = "tas"
%store -r
# get formatting done automatically according to style `black`
#%load_ext lab_black

The intake-esm software reads catalogs which we use to find, access and load the data we are interested in. Daily updated CMIP6 catalogs are provided in DKRZ’s Swift cloud storage.

Similar to the shopping catalog at your favorite online bookstore, the intake catalog contains information (e.g. model, variables, and time range) about each dataset that you can access before loading the data. It means that thanks to the catalog, you can find out where the “book” is just by using some keywords and you do not need to hold it in your hand to know the number of pages.
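Conceptually, an intake-esm search is just a row filter on the catalog table. A minimal sketch with plain pandas and a hypothetical mini-catalog (the column names mirror the real CMIP6 catalog, the rows and values are made up for illustration):

```python
import pandas as pd

# Hypothetical mini-catalog: each row describes one dataset "book"
mini_catalog = pd.DataFrame({
    "source_id": ["MPI-ESM1-2-HR", "AWI-CM-1-1-MR", "MPI-ESM1-2-HR"],
    "experiment_id": ["historical", "historical", "ssp585"],
    "variable_id": ["tas", "pr", "tas"],
})

# A query maps each search facet to its allowed values;
# only rows matching all facets survive
query = {"variable_id": ["tas"], "experiment_id": ["historical", "ssp585"]}
mask = pd.Series(True, index=mini_catalog.index)
for facet, values in query.items():
    mask &= mini_catalog[facet].isin(values)
hits = mini_catalog[mask]
print(hits)
```

The real `col.search(**query)` call below does essentially this filtering on the full catalog dataframe, and wraps the result in a new catalog object.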

We specify the catalog descriptor for the intake package. The catalog descriptor is created by the DKRZ developers who manage the catalog; you do not need to care much about its internals, knowing where it is and loading it is enough:

# Path to master catalog on the DKRZ server
#dkrz_catalog=intake.open_catalog(["https://dkrz.de/s/intake"])
#
#only for the web page we need to take the original link:
parent_col=intake.open_catalog(["https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"])
list(parent_col)
['dkrz_cmip5_archive',
 'dkrz_cmip5_disk',
 'dkrz_cmip6_cloud',
 'dkrz_cmip6_disk',
 'dkrz_cordex_disk',
 'dkrz_dyamond-winter_disk',
 'dkrz_era5_disk',
 'dkrz_monsoon_disk',
 'dkrz_mpige_disk',
 'dkrz_nextgems_disk',
 'dkrz_palmod2_disk']
col=parent_col["dkrz_cmip6_disk"]

Browsing through the catalog#

We define a query and specify key values for search facets in order to search the catalog. Possible search facets are all columns of the table, identified by their names.

In this example, we compare the MPI-ESM1-2-HR model of the Max-Planck-Institute and the AWI-CM-1-1-MR from the Alfred Wegener Institute for 3 different experiments. CMIP6 comprises many experiments with lots of simulation members and we will use some of them. You can find more information in the CMIP6 Model and Experiment Documentation.

We will concatenate the historical experiment with two different Shared Socioeconomic Pathway (SSP) scenarios. SSPs are scenarios of projected socioeconomic global changes.

  • historical

    This experiment uses best estimates for anthropogenic and natural forcing to simulate the historical period 1850-2014.

  • ssp245

    The 45 stands for the radiative forcing level reached by 2100, in this case 4.5 W/m2, or ~650 ppm CO2 equivalent.

  • ssp585

    The 85 stands for the radiative forcing level reached by 2100, in this case 8.5 W/m2.

query = dict(
    variable_id=variable_id,
    table_id="Amon",
    experiment_id=["historical", "ssp585"], # we have excluded "ssp245" from the list because it would take 15min to finish the nb
    source_id=["MPI-ESM1-2-HR", "AWI-CM-1-1-MR"],
)
# point the data paths to the correct file system prefix
col.df["uri"] = col.df["uri"].str.replace("lustre/", "lustre02/")
cat = col.search(**query)

Let’s have a look into the new catalog subset cat. We use the underlying pandas dataframe object df to display the catalog as a table. Each row refers to one file.

cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label dcpp_init_year version time_range project format uri
0 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185001-185012 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
1 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185101-185112 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
2 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185201-185212 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
3 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185301-185312 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
4 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185401-185412 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1272 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 208001-208412 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1273 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 208501-208912 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1274 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 209001-209412 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1275 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 209501-209912 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1276 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 210001-210012 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...

1277 rows × 14 columns

Loading the data#

We can load the data into memory with only one line of code. The catalog’s to_dataset_dict command aggregates and combines the data from the files into coherent xarray datasets, using the specifications from the intake descriptor file. The result is a dict-like object whose keys are the highest granularity which cannot be combined or aggregated any further, and whose values are the datasets.

xr_dset_dict = cat.to_dataset_dict(xarray_open_kwargs={"chunks": {"time": 1}})
print(xr_dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.source_id.experiment_id.table_id.grid_label'
dict_keys(['ScenarioMIP.AWI-CM-1-1-MR.ssp585.Amon.gn', 'ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn', 'CMIP.MPI-ESM1-2-HR.historical.Amon.gn', 'CMIP.AWI-CM-1-1-MR.historical.Amon.gn'])
xr_dset_dict["ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn"]
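The keys follow the pattern 'activity_id.source_id.experiment_id.table_id.grid_label' printed above; note that they do not include the institution_id. A minimal sketch of decomposing one such key string (plain Python, using one of the keys from the output above):

```python
# Example key from the dictionary returned by to_dataset_dict
key = "ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn"
facets = ["activity_id", "source_id", "experiment_id", "table_id", "grid_label"]
parsed = dict(zip(facets, key.split(".")))

# A shorter label (source.experiment.table), as used later for the plot columns
short_label = ".".join(key.split(".")[1:4])
print(parsed["source_id"], short_label)
```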

Global yearly mean calculation#

We define a function for calculating the global mean by weighting grid boxes according to their surface area. Afterwards, we group by years and calculate the yearly mean. All of this is done with xarray.

def global_yearly_mean(hist_dsets):
    # Area weights: grid box area is proportional to the cosine of the latitude
    weights = np.cos(np.deg2rad(hist_dsets.lat))
    # Weight the chosen variable
    variable_array = hist_dsets.get(variable_id)
    variable_weights = variable_array.weighted(weights)
    # Global mean
    variable_globalmean = variable_weights.mean(("lon", "lat"))
    # Yearly mean
    variable_gmym = variable_globalmean.groupby("time.year").mean("time")
    return variable_gmym.values
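As a sanity check of the cosine-latitude weighting, here is a minimal NumPy sketch on a hypothetical 3-point latitude grid with synthetic values (not CMIP data): the equator carries the largest weight, so it pulls the weighted mean towards its value.

```python
import numpy as np

lat = np.array([-60.0, 0.0, 60.0])        # hypothetical latitudes in degrees
values = np.array([250.0, 300.0, 250.0])  # synthetic field values per band

weights = np.cos(np.deg2rad(lat))         # grid-box area ~ cos(latitude)
weighted_mean = np.average(values, weights=weights)
unweighted_mean = values.mean()

# The equator (weight 1.0) dominates the poles-side bands (weight 0.5 each),
# so the weighted mean lies above the plain arithmetic mean here
print(weighted_mean, unweighted_mean)
```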

Historical reference period#

We define the period 1851-1880 as our reference period. In the following, we calculate the anomalies of the simulations relative to that period. But first things first: we need the global yearly mean for that period:

historical = [key for key in xr_dset_dict.keys() if "historical" in key][0]
dshist = xr_dset_dict[historical]
dshist_ref = dshist.sel(time=dshist.time.dt.year.isin(range(1851, 1881)))
var_ref = global_yearly_mean(dshist_ref)
var_refmean = var_ref.mean()

Get Meta Data#

In order to label the plot correctly, we retrieve the attributes long_name and units from the chosen variable.

lname = dshist.get(variable_id).attrs["long_name"]
units = dshist.get(variable_id).attrs["units"]
label = "Delta " + lname + " [" + units + "]"

Calculate Anomaly#

  1. We save the result, the anomaly values, in a pandas dataframe var_global_yearly_mean_anomaly. We use this dataframe object because it features the plot function hvplot which we would like to use. We start by creating this dataframe based on the datasets we got from intake.

lxr = list(xr_dset_dict.keys())
columns = [".".join(elem.split(".")[1:4]) for elem in lxr]
print(columns)
var_global_yearly_mean_anomaly = pd.DataFrame(index=range(1850, 2101), columns=columns)
['AWI-CM-1-1-MR.ssp585.Amon', 'MPI-ESM1-2-HR.ssp585.Amon', 'MPI-ESM1-2-HR.historical.Amon', 'AWI-CM-1-1-MR.historical.Amon']
  2. For all datasets in our dictionary, we calculate the anomaly by subtracting the global mean of the reference period from the global yearly mean.

  3. We add the results to the dataframe. Only years that are in the dataset can be filled into the dataframe.
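Step 3 works because .loc aligns the assigned values on the dataframe's year index, leaving years outside the dataset as NaN. A minimal pandas sketch with hypothetical years, a toy column name and made-up anomaly values:

```python
import pandas as pd
import numpy as np

# Dataframe indexed over the full period, one column per experiment (toy name)
df = pd.DataFrame(index=range(1850, 1855), columns=["model.historical"])

# Suppose a dataset only covers 1850-1852: assign just those rows
years = [1850, 1851, 1852]
df.loc[years, "model.historical"] = np.array([0.1, 0.2, 0.3])

# Years 1853-1854 remain NaN and simply produce gaps in a line plot
print(df)
```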

for key in xr_dset_dict.keys():
    print([".".join(key.split(".")[1:4])])
    datatoappend = global_yearly_mean(xr_dset_dict[key])[0, :] - var_refmean
    years = list(xr_dset_dict[key].get(variable_id).groupby("time.year").groups.keys())
    var_global_yearly_mean_anomaly.loc[
        years, ".".join(key.split(".")[1:4])
    ] = datatoappend
['AWI-CM-1-1-MR.ssp585.Amon']
['MPI-ESM1-2-HR.ssp585.Amon']
['MPI-ESM1-2-HR.historical.Amon']
['AWI-CM-1-1-MR.historical.Amon']

Plotting the multimodel comparison of the global annual mean anomaly#

plot = var_global_yearly_mean_anomaly.hvplot.line(
    xlabel="Year",
    ylabel=label,
    value_label=label,
    legend="top_left",
    title="Global and yearly mean anomaly in comparison to 1851-1880",
    grid=True,
    height=600,
    width=820,
)
#hvplot.save(plot, "globalmean-yearlymean-tas.html")
plot

Used data#

We acknowledge the CMIP community for providing the climate model data, retained and globally distributed in the framework of the ESGF. The CMIP data of this study were replicated and made available for this study by the DKRZ.