Advanced Summer Days calculation with xarray using CMIP6 data#

We will show here how to count the annual summer days for a particular geolocation of your choice using the results of a climate model, in particular, we can chose one of the historical or one of the shared socioeconomic pathway (ssp) experiments of the Coupled Model Intercomparison Project CMIP6.

Thanks to the data and computer scientists Marco Kulüke, Fabian Wachsmann, Regina Kwee-Hinzmann, Caroline Arnold, Felix Stiehler, Maria Moreno, and Stephan Kindermann at DKRZ for their contribution to this notebook.

In this use case you will learn the following:

  • How to access a dataset from the DKRZ CMIP6 model data archive

  • How to count the annual number of summer days for a particular geolocation using this model dataset

  • How to visualize the results

You will use:

  • Intake for finding the data in the catalog of the DKRZ archive

  • Xarray for loading and processing the data

  • hvPlot for visualizing the data in the Jupyter notebook and save the plots in your local computer

0. Load Packages#

import numpy as np                    # fundamental package for scientific computing
import pandas as pd                   # data analysis and manipulation tool
import xarray as xr                   # handling labelled multi-dimensional arrays
import intake                         # to find data in a catalog, this notebook explains how it works
from ipywidgets import widgets        # to use widgets in the Jupyer Notebook
from geopy.geocoders import Nominatim # Python client for several popular geocoding web services
import folium                         # visualization tool for maps
import hvplot.pandas                  # visualization tool for interactive plots

1. Which dataset do we need? -> Choose Shared Socioeconomic Pathway, Place, and Year#

# Produce the widget where we can select what experiment we are interested on 

#experiments = {'historical':range(1850, 2015), 'ssp126':range(2015, 2101), 
#               'ssp245':range(2015, 2101), 'ssp370':range(2015, 2101), 'ssp585':range(2015, 2101)}
#experiment_box = widgets.Dropdown(options=experiments, description="Select  experiment: ", disabled=False,)
#display(experiment_box)
# Produce the widget where we can select what geolocation and year are interested on 

#print("Feel free to change the default values.")

#place_box = widgets.Text(description="Enter place:", value="Hamburg")
#display(place_box)

#x = experiment_box.value
#year_box = widgets.Dropdown(options=x, description="Select year: ", disabled=False,)
#display(year_box)
pb="Hamburg"
yb=2021
eb="ssp370"
%store -r

1.1 Find Coordinates of chosen Place#

If ambiguous, the most likely coordinates will be chosen, e.g. “Hamburg” results in “Hamburg, 20095, Deutschland”, (53.55 North, 10.00 East)

# We use the module Nominatim gives us the geographical coordinates of the place we selected above

geolocator = Nominatim(user_agent="any_agent")
location = geolocator.geocode(pb)

print(location.address)
print((location.latitude, location.longitude))
Hamburg, Deutschland
(53.550341, 10.000654)

1.2 Show Place on a Map#

# We use the folium package to plot our selected geolocation in a map

m = folium.Map(location=[location.latitude, location.longitude])
tooltip = location.latitude, location.longitude
folium.Marker([location.latitude, location.longitude], tooltip=tooltip).add_to(m)
display(m)
Make this Notebook Trusted to load map: File -> Trust Notebook

We have defined the place and time. Now, we can search for the climate model dataset.

2. Intake Catalog#

2.1 Load the Intake Catalog#

# Path to master catalog on the DKRZ server
#dkrz_catalog=intake.open_catalog(["https://dkrz.de/s/intake"])
#
#only for the web page we need to take the original link:
parent_col=intake.open_catalog(["https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"])
list(parent_col)

# Open the catalog with the intake package and name it "col" as short for "collection"
col=parent_col["dkrz_cmip6_disk"]

2.2 Browse the Intake Catalog#

In this example we chose the Max-Planck Earth System Model in High Resolution Mode (“MPI-ESM1-2-HR”) and the maximum temperature near surface (“tasmax”) as variable. We also choose an experiment. CMIP6 comprises several kind of experiments. Each experiment has various simulation members. you can find more information in the CMIP6 Model and Experiment Documentation.

# Store the name of the model we chose in a variable named "climate_model"

climate_model = "MPI-ESM1-2-HR" # here we choose Max-Plack Institute's Earth Sytem Model in high resolution

# This is how we tell intake what data we want

query = dict(
    source_id      = climate_model, # the model 
    variable_id    = "tasmax", # temperature at surface, maximum
    table_id       = "day", # daily maximum
    experiment_id  = eb, # what we selected in the drop down menu,e.g. SSP2.4-5 2015-2100
    member_id      = "r10i1p1f1", # "r" realization, "i" initialization, "p" physics, "f" forcing
)

# Intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ
col.df["uri"]=col.df["uri"].str.replace("lustre/","lustre02/") 
cat = col.search(**query)

del col

# Show query results
cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label dcpp_init_year version time_range project format uri
0 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20150101-20191231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
1 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20200101-20241231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
2 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20250101-20291231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
3 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20300101-20341231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
4 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20350101-20391231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
5 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20400101-20441231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
6 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20450101-20491231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
7 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20500101-20541231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
8 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20550101-20591231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
9 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20600101-20641231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
10 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20650101-20691231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
11 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20700101-20741231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
12 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20750101-20791231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
13 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20800101-20841231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
14 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20850101-20891231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
15 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20900101-20941231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
16 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 20950101-20991231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...
17 ScenarioMIP DKRZ MPI-ESM1-2-HR ssp370 r10i1p1f1 day tasmax gn NaN v20190710 21000101-21001231 CMIP6 netcdf /work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ...

2.3 Create Dictionary and get Data into one Object#

# cdf_kwargs are given to xarray.open_dataset
# cftime is like datetime but extends to all four digit years and many calendar types

dset_dict = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}, "use_cftime": True})
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.source_id.experiment_id.table_id.grid_label'
/tmp/ipykernel_883/1061271266.py:4: DeprecationWarning: cdf_kwargs and zarr_kwargs are deprecated and will be removed in a future version. Please use xarray_open_kwargs instead.
  dset_dict = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}, "use_cftime": True})
100.00% [1/1 00:07<00:00]
# get all data into one object

for key, value in dset_dict.items():
    model = key.split(".")[2]  # extract model name from key
    tasmax_xr = value["tasmax"].squeeze()  # extract variable from dataset
tasmax_xr
<xarray.DataArray 'tasmax' (time: 31411, lat: 192, lon: 384)>
dask.array<getitem, shape=(31411, 192, 384), dtype=float32, chunksize=(1827, 192, 384), chunktype=numpy.ndarray>
Coordinates:
  * time            (time) object 2015-01-01 12:00:00 ... 2100-12-31 12:00:00
  * lat             (lat) float64 -89.28 -88.36 -87.42 ... 87.42 88.36 89.28
  * lon             (lon) float64 0.0 0.9375 1.875 2.812 ... 357.2 358.1 359.1
    height          float64 2.0
    member_id       <U9 'r10i1p1f1'
    dcpp_init_year  float64 nan
Attributes:
    standard_name:  air_temperature
    long_name:      Daily Maximum Near-Surface Air Temperature
    comment:        maximum near-surface (usually, 2 meter) air temperature (...
    units:          K
    cell_methods:   area: mean time: maximum
    cell_measures:  area: areacella

3. Select Year and Look at (Meta) Data#

tasmax_year_xr = tasmax_xr.sel(time=str(yb))

# Let's have a look at the xarray data array

tasmax_year_xr
<xarray.DataArray 'tasmax' (time: 365, lat: 192, lon: 384)>
dask.array<getitem, shape=(365, 192, 384), dtype=float32, chunksize=(365, 192, 384), chunktype=numpy.ndarray>
Coordinates:
  * time            (time) object 2021-01-01 12:00:00 ... 2021-12-31 12:00:00
  * lat             (lat) float64 -89.28 -88.36 -87.42 ... 87.42 88.36 89.28
  * lon             (lon) float64 0.0 0.9375 1.875 2.812 ... 357.2 358.1 359.1
    height          float64 2.0
    member_id       <U9 'r10i1p1f1'
    dcpp_init_year  float64 nan
Attributes:
    standard_name:  air_temperature
    long_name:      Daily Maximum Near-Surface Air Temperature
    comment:        maximum near-surface (usually, 2 meter) air temperature (...
    units:          K
    cell_methods:   area: mean time: maximum
    cell_measures:  area: areacella

We see not only the numbers, but also information about it, such as long name, units, and the data history. This information is called metadata.

4. Compare Model Grid Cell with chosen Location#

# Find nearest model coordinate by finding the index of the nearest grid point

abslat = np.abs(tasmax_year_xr["lat"] - location.latitude)
abslon = np.abs(tasmax_year_xr["lon"] - location.longitude)
c = np.maximum(abslon, abslat)

([xloc], [yloc]) = np.where(c == np.min(c)) # xloc and yloc are the indices of the neares model grid point
# Draw map again

m = folium.Map(location=[location.latitude, location.longitude], zoom_start=8)


tooltip = location.latitude, location.longitude
folium.Marker(
    [location.latitude, location.longitude],
    tooltip=tooltip,
    popup="Location selected by You",
).add_to(m)

#
tooltip = float(tasmax_year_xr["lat"][yloc].values), float(tasmax_year_xr["lon"][xloc].values)
folium.Marker(
    [tasmax_year_xr["lat"][yloc], tasmax_year_xr["lon"][xloc]],
    tooltip=tooltip,
    popup="Model Grid Cell Center",
).add_to(m)

# Define coordinates of model grid cell (just for visualization)
rect_lat1_model = (tasmax_year_xr["lat"][yloc - 1] + tasmax_year_xr["lat"][yloc]) / 2
rect_lon1_model = (tasmax_year_xr["lon"][xloc - 1] + tasmax_year_xr["lon"][xloc]) / 2
rect_lat2_model = (tasmax_year_xr["lat"][yloc + 1] + tasmax_year_xr["lat"][yloc]) / 2
rect_lon2_model = (tasmax_year_xr["lon"][xloc + 1] + tasmax_year_xr["lon"][xloc]) / 2

# Draw model grid cell
folium.Rectangle(
    bounds=[[rect_lat1_model, rect_lon1_model], [rect_lat2_model, rect_lon2_model]],
    color="#ff7800",
    fill=True,
    fill_color="#ffff00",
    fill_opacity=0.2,
).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

Climate models have a finite resolution. Hence, models do not provide the data of a particular point, but the mean over a model grid cell. Take this in mind when comparing model data with observed data (e.g. weather stations).

Now, we will visualize the daily maximum temperature time series of the model grid cell.

5. Draw Temperature Time Series and Count Summer days#

The definition of a summer day varies from region to region. According to the German Weather Service, “a summer day is a day on which the maximum air temperature is at least 25.0 °C”. Depending on the place you selected, you might want to apply a different threshold to calculate the summer days index.

tasmax_year_place_xr = tasmax_year_xr[:, yloc, xloc] - 273.15 # Convert Kelvin to °C
tasmax_year_place_df = pd.DataFrame(index = tasmax_year_place_xr['time'].values, 
                                    columns = ['Temperature', 'Summer Day Threshold']) # create the dataframe

tasmax_year_place_df.loc[:, 'Model Temperature'] = tasmax_year_place_xr.values # insert model data into the dataframe
tasmax_year_place_df.loc[:, 'Summer Day Threshold'] = 25 # insert the threshold into the dataframe

# Plot data and define title and legend
tasmax_year_place_df.hvplot.line(y=['Model Temperature', 'Summer Day Threshold'], 
                                 value_label='Temperature in °C', legend='bottom', 
                                 title='Daily maximum Temperature near Surface for '+pb, 
                                 height=500, width=620)

As we can see, the maximum daily temperature is highly variable over the year. As we are using the mean temperature in a model grid cell, the amount of summer days might we different that what you would expect at a single location.

# Summer days index calculation
no_summer_days_model = tasmax_year_place_xr[tasmax_year_place_xr.load() > 25].size # count the number of summer days

# Print results in a sentence
print("According to the German Weather Service definition, in the " +eb +" experiment the " 
      +climate_model +" model shows " +str(no_summer_days_model) +" summer days for " + pb 
      + " in " + str(yb) +".")
According to the German Weather Service definition, in the ssp370 experiment the MPI-ESM1-2-HR model shows 2 summer days for Hamburg in 2021.

Try another location and year