Advanced: Interactive time series plot of a global yearly mean anomaly with xarray, pandas and hvplot using CMIP6 data#

We will show how to combine, analyse and quickly plot data of the Coupled Model Intercomparison Project Phase 6 (CMIP6). We will choose one variable from multiple experiments and compare the results of different models. In particular, we analyse the historical experiment in combination with one of the Shared Socioeconomic Pathway (SSP) experiments.

This Jupyter notebook is meant to run on the Jupyterhub server of the German Climate Computing Center DKRZ. The DKRZ hosts the CMIP data pool, including 4 petabytes of CMIP6 data. Please choose the Python 3 unstable kernel on the Kernel tab above; it contains all the common geoscience packages. See more information on how to run Jupyter notebooks at DKRZ here.

Running this Jupyter notebook on your own premises, which is also known as client-side computing, requires that you install the necessary packages and download the data.

Learning Objectives#

  • How to access a dataset from the DKRZ CMIP data pool with intake-esm

  • How to calculate global field means and yearly means with xarray and numpy

  • How to visualize the results with hvplot

import intake
import pandas as pd
import hvplot.pandas
import numpy as np

First, we need to set the variable_id which we would like to plot. This is a selection of the most often analysed variables:

  • tas is Near-surface Air Temperature

  • pr is Precipitation

  • psl is Sea Level Pressure

  • tasmax is Near-surface Maximum Air Temperature

  • tasmin is Near-surface Minimum Air Temperature

  • clt is Total Cloud Cover Percentage

Choose the variable:

# Choose one of
# pr, psl, tas, tasmax, tasmin, clt
variable_id = "tas"
%store -r
# get formatting done automatically according to style `black`
#%load_ext lab_black

The intake-esm software reads catalogs which we use to find, access and load the data we are interested in. Daily updated CMIP6 catalogs are provided in DKRZ’s Swift cloud storage.

Similar to the shopping catalog at your favorite online bookstore, the intake catalog contains information (e.g. model, variables, and time range) about each dataset that you can access before loading the data. It means that thanks to the catalog, you can find out where the “book” is just by using some keywords and you do not need to hold it in your hand to know the number of pages.
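Conceptually, an intake-esm search is just a row filter on the catalog table. A minimal sketch with plain pandas and a hypothetical mini-catalog (the column names mirror the real CMIP6 catalog, the rows and values are made up for illustration):

```python
import pandas as pd

# Hypothetical mini-catalog: each row describes one dataset "book"
mini_catalog = pd.DataFrame({
    "source_id": ["MPI-ESM1-2-HR", "AWI-CM-1-1-MR", "MPI-ESM1-2-HR"],
    "experiment_id": ["historical", "historical", "ssp585"],
    "variable_id": ["tas", "pr", "tas"],
})

# A query maps each search facet to its allowed values;
# only rows matching all facets survive
query = {"variable_id": ["tas"], "experiment_id": ["historical", "ssp585"]}
mask = pd.Series(True, index=mini_catalog.index)
for facet, values in query.items():
    mask &= mini_catalog[facet].isin(values)
hits = mini_catalog[mask]
print(hits)
```

The real `col.search(**query)` call below does essentially this filtering on the full catalog dataframe, and wraps the result in a new catalog object.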

We specify the catalog descriptor for the intake package. The catalog descriptor is created by the DKRZ developers who manage the catalog; you do not need to care much about its internals, knowing where it is and loading it is enough:

# Path to master catalog on the DKRZ server
#dkrz_catalog=intake.open_catalog(["https://dkrz.de/s/intake"])
#
#only for the web page we need to take the original link:
parent_col=intake.open_catalog(["https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"])
list(parent_col)
['dkrz_cmip5_archive',
 'dkrz_cmip5_disk',
 'dkrz_cmip6_cloud',
 'dkrz_cmip6_disk',
 'dkrz_cordex_disk',
 'dkrz_dyamond-winter_disk',
 'dkrz_era5_disk',
 'dkrz_monsoon_disk',
 'dkrz_mpige_disk',
 'dkrz_nextgems_disk',
 'dkrz_palmod2_disk']
col=parent_col["dkrz_cmip6_disk"]

Browsing through the catalog#

We define a query and specify key values for search facets in order to search the catalog. Possible search facets are all columns of the table, identified by their names.

In this example, we compare the MPI-ESM1-2-HR model of the Max-Planck-Institute and the AWI-CM-1-1-MR from the Alfred Wegener Institute for 3 different experiments. CMIP6 comprises many experiments with lots of simulation members and we will use some of them. You can find more information in the CMIP6 Model and Experiment Documentation.

We will concatenate the historical experiment with two different Shared Socioeconomic Pathway (SSP) scenarios. SSPs are scenarios of projected socioeconomic global changes.

  • historical

    This experiment uses best estimates for anthropogenic and natural forcing to simulate the historical period 1850-2014.

  • ssp245

    The 45 stands for the radiative forcing level reached by 2100, in this case 4.5 W/m2, or ~650 ppm CO2 equivalent.

  • ssp585

    The 85 stands for the radiative forcing level reached by 2100, in this case 8.5 W/m2.

query = dict(
    variable_id=variable_id,
    table_id="Amon",
    experiment_id=["historical", "ssp585"], # we have excluded "ssp245" from the list because it would take 15min to finish the nb
    source_id=["MPI-ESM1-2-HR", "AWI-CM-1-1-MR"],
)
# point the data paths to the correct file system prefix
col.df["uri"] = col.df["uri"].str.replace("lustre/", "lustre02/")
cat = col.search(**query)

Let’s have a look into the new catalog subset cat. We use the underlying pandas dataframe object df to display the catalog as a table. Each row refers to one file.

cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label dcpp_init_year version time_range project format uri
0 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185001-185012 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
1 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185101-185112 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
2 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185201-185212 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
3 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185301-185312 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
4 CMIP AWI AWI-CM-1-1-MR historical r1i1p1f1 Amon tas gn NaN v20200720 185401-185412 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/CMIP/AWI/AWI-CM-...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1272 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 208001-208412 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1273 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 208501-208912 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1274 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 209001-209412 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1275 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 209501-209912 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...
1276 ScenarioMIP DWD MPI-ESM1-2-HR ssp585 r2i1p1f1 Amon tas gn NaN v20190710 210001-210012 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DWD/...

1277 rows × 14 columns

Loading the data#

We can load the data into memory with only one line of code. The catalog’s to_dataset_dict command aggregates and combines the data from the files into coherent xarray datasets, using the specifications from the intake descriptor file. The result is a dict-like object whose keys are the highest granularity which cannot be combined or aggregated any further, and whose values are the datasets.

xr_dset_dict = cat.to_dataset_dict(xarray_open_kwargs={"chunks": {"time": 1}})
print(xr_dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.source_id.experiment_id.table_id.grid_label'
dict_keys(['ScenarioMIP.AWI-CM-1-1-MR.ssp585.Amon.gn', 'ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn', 'CMIP.MPI-ESM1-2-HR.historical.Amon.gn', 'CMIP.AWI-CM-1-1-MR.historical.Amon.gn'])
xr_dset_dict["ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn"]
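The keys follow the pattern 'activity_id.source_id.experiment_id.table_id.grid_label' printed above; note that they do not include the institution_id. A minimal sketch of decomposing one such key string (plain Python, using one of the keys from the output above):

```python
# Example key from the dictionary returned by to_dataset_dict
key = "ScenarioMIP.MPI-ESM1-2-HR.ssp585.Amon.gn"
facets = ["activity_id", "source_id", "experiment_id", "table_id", "grid_label"]
parsed = dict(zip(facets, key.split(".")))

# A shorter label (source.experiment.table), as used later for the plot columns
short_label = ".".join(key.split(".")[1:4])
print(parsed["source_id"], short_label)
```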

Global yearly mean calculation#

We define a function for calculating the global mean by weighting grid boxes according to their surface area. Afterwards, we group by years and calculate the yearly mean. All of this is done with xarray.

def global_yearly_mean(hist_dsets):
    # Area weights: grid box area is proportional to the cosine of the latitude
    weights = np.cos(np.deg2rad(hist_dsets.lat))
    # Weight the chosen variable
    variable_array = hist_dsets.get(variable_id)
    variable_weights = variable_array.weighted(weights)
    # Global mean
    variable_globalmean = variable_weights.mean(("lon", "lat"))
    # Yearly mean
    variable_gmym = variable_globalmean.groupby("time.year").mean("time")
    return variable_gmym.values
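As a sanity check of the cosine-latitude weighting, here is a minimal NumPy sketch on a hypothetical 3-point latitude grid with synthetic values (not CMIP data): the equator carries the largest weight, so it pulls the weighted mean towards its value.

```python
import numpy as np

lat = np.array([-60.0, 0.0, 60.0])        # hypothetical latitudes in degrees
values = np.array([250.0, 300.0, 250.0])  # synthetic field values per band

weights = np.cos(np.deg2rad(lat))         # grid-box area ~ cos(latitude)
weighted_mean = np.average(values, weights=weights)
unweighted_mean = values.mean()

# The equator (weight 1.0) dominates the poles-side bands (weight 0.5 each),
# so the weighted mean lies above the plain arithmetic mean here
print(weighted_mean, unweighted_mean)
```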

Historical reference period#

We define the period 1851-1880 as our reference period. In the following, we calculate the anomalies of the simulations relative to that period. But first things first: we need the global yearly mean for that period:

historical = [key for key in xr_dset_dict.keys() if "historical" in key][0]
dshist = xr_dset_dict[historical]
dshist_ref = dshist.sel(time=dshist.time.dt.year.isin(range(1851, 1881)))
var_ref = global_yearly_mean(dshist_ref)
var_refmean = var_ref.mean()

Get Meta Data#

In order to label the plot correctly, we retrieve the attributes long_name and units from the chosen variable.

lname = dshist.get(variable_id).attrs["long_name"]
units = dshist.get(variable_id).attrs["units"]
label = "Delta " + lname + " [" + units + "]"

Calculate Anomaly#

  1. We save the result, the anomaly values, in a pandas dataframe var_global_yearly_mean_anomaly. We use this dataframe object because it features the plot function hvplot which we would like to use. We start by creating this dataframe based on the datasets we got from intake.

lxr = list(xr_dset_dict.keys())
columns = [".".join(elem.split(".")[1:4]) for elem in lxr]
print(columns)
var_global_yearly_mean_anomaly = pd.DataFrame(index=range(1850, 2101), columns=columns)
['AWI-CM-1-1-MR.ssp585.Amon', 'MPI-ESM1-2-HR.ssp585.Amon', 'MPI-ESM1-2-HR.historical.Amon', 'AWI-CM-1-1-MR.historical.Amon']
  2. For all datasets in our dictionary, we calculate the anomaly by subtracting the global mean of the reference period from the global yearly mean.

  3. We add the results to the dataframe. Only years that are in the dataset can be filled into the dataframe.
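Step 3 works because .loc aligns the assigned values on the dataframe's year index, leaving years outside the dataset as NaN. A minimal pandas sketch with hypothetical years, a toy column name and made-up anomaly values:

```python
import pandas as pd
import numpy as np

# Dataframe indexed over the full period, one column per experiment (toy name)
df = pd.DataFrame(index=range(1850, 1855), columns=["model.historical"])

# Suppose a dataset only covers 1850-1852: assign just those rows
years = [1850, 1851, 1852]
df.loc[years, "model.historical"] = np.array([0.1, 0.2, 0.3])

# Years 1853-1854 remain NaN and simply produce gaps in a line plot
print(df)
```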

for key in xr_dset_dict.keys():
    print([".".join(key.split(".")[1:4])])
    datatoappend = global_yearly_mean(xr_dset_dict[key])[0, :] - var_refmean
    years = list(xr_dset_dict[key].get(variable_id).groupby("time.year").groups.keys())
    var_global_yearly_mean_anomaly.loc[
        years, ".".join(key.split(".")[1:4])
    ] = datatoappend
['AWI-CM-1-1-MR.ssp585.Amon']
['MPI-ESM1-2-HR.ssp585.Amon']
['MPI-ESM1-2-HR.historical.Amon']
['AWI-CM-1-1-MR.historical.Amon']

Plotting the multimodel comparison of the global annual mean anomaly#

plot = var_global_yearly_mean_anomaly.hvplot.line(
    xlabel="Year",
    ylabel=label,
    value_label=label,
    legend="top_left",
    title="Global and yearly mean anomaly in comparison to 1851-1880",
    grid=True,
    height=600,
    width=820,
)
#hvplot.save(plot, "globalmean-yearlymean-tas.html")
plot

Used data#

We acknowledge the CMIP community for providing the climate model data, retained and globally distributed in the framework of the ESGF. The CMIP data of this study were replicated and made available for this study by the DKRZ.