Lesson 3 - Working with WRF-Hydro inputs and output files

Overview

We will briefly discuss working with some of the WRF-Hydro input and output (IO) files. The IO files for WRF-Hydro are generally standard netCDF4 files, and there are many ways to work with these data. In this lesson we cover only a few Python libraries and commands that later lessons in this tutorial will need. This is by no means a comprehensive guide to working with netCDF files.

More information on working with netCDF files can be found on the Unidata website at https://www.unidata.ucar.edu/software/netcdf/.

Introduction to our Python environment and libraries

We are using Python 3 for all exercises in this tutorial. There are also a number of tools developed in R that have similar capabilities, but we have chosen Python here for its ease of use and strong netCDF4 and geospatial processing support.

Libraries

We are using the Miniconda distribution of Python 3 with the Python libraries listed below and their dependencies. Miniconda is a stripped-down version of Anaconda, a Python distribution for scientific computing. You can obtain Miniconda from Anaconda, Inc. (formerly Continuum Analytics) at https://conda.io/miniconda.html.

There are many resources for learning more about Miniconda, conda, and Python. Answers to just about any question can be found with a little searching on Google or Stack Overflow.

Below are the libraries we will be using. These libraries are already installed if you are running this tutorial in the wrfhydro/training Docker container. Otherwise, if you are running on your own system, you will need to install Miniconda and the required Python libraries.

NOTE: The libraries listed below are only the required Python libraries. These Python libraries also depend on a number of system libraries that you may or may not need to install on your own system. Notably, you WILL need the netCDF4 system library.

Required Python libraries:

xarray: xarray is an open-source project for working with self-describing Common Data Model scientific datasets, primarily in netCDF4 format. It eases many of the pain-points in loading, manipulating, and plotting multidimensional arrays. xarray is well documented and you can learn more by reading their documentation at https://xarray.pydata.org/en/stable/ or https://github.com/pydata/xarray.

netCDF4: Library for reading and writing netCDF files. This is a required dependency for xarray if you will be using xarray with netCDF4 datasets.

xarray datasets

Below is a brief list of the Python commands we will be running. Virtually all of them come from the xarray package, as indicated by the xr. prefix.

xr.open_dataset('path-to-netcdf-file'): Open a single netCDF file in xarray.

Note: This command only opens the netCDF file and reads the header information; it does not load any of the data payload into memory. This handy feature of netCDF4 and xarray lets you view basic information about very large netCDF files without loading them into memory.

xr.open_mfdataset(list-of-netcdf-files or 'glob-pattern-matching-netcdf-files', combine='by_coords'): Similar to xr.open_dataset, xr.open_mfdataset opens multiple netCDF files as a single dataset, concatenating them along one or more common dimensions.
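Here is a minimal, self-contained sketch of both commands. It does not use the WRF-Hydro files. Instead it writes two small, made-up files (the names example_0.nc and example_1.nc are hypothetical), each holding half of an hourly streamflow series. It then opens one file with xr.open_dataset and both files with xr.open_mfdataset. The sketch assumes the netCDF4 library is installed for writing the files. xr.open_mfdataset also requires dask.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Write two small, hypothetical netCDF files that split one hourly series in half.
tmpdir = tempfile.mkdtemp()
times = pd.date_range("2011-08-26", periods=6, freq="h")
for i in range(2):
    part = xr.Dataset(
        {"streamflow": ("time", np.arange(3, dtype="float32") + 3 * i)},
        coords={"time": times[3 * i : 3 * (i + 1)]},
    )
    part.to_netcdf(os.path.join(tmpdir, f"example_{i}.nc"))

# Open a single file: only the header (metadata) is read at this point.
single = xr.open_dataset(os.path.join(tmpdir, "example_0.nc"))

# Open both files as one dataset, combined along the shared 'time' coordinate.
combined = xr.open_mfdataset(os.path.join(tmpdir, "example_*.nc"), combine="by_coords")

print(single.sizes["time"], combined.sizes["time"])
```

The glob string ("example_*.nc") is expanded and sorted by xarray. You can also pass an explicit list of file paths.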

After we have opened the datasets there are a few more methods we will use on these datasets.

my_dataset = xr.open_dataset('path-to-netcdf-file')

my_dataset.info(): Print information about the netCDF file, similar to the output of the ncdump -h command-line utility.

my_dataset.load(): Load the netCDF4 data payload into memory.

my_dataset.myvariable: Access a variable named myvariable from the dataset.

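my_dataset.myvariable.plot(): Plot the variable myvariable. Xarray chooses the plot type from the number of dimensions: a line plot for 1D data such as a timeseries, a 2D color plot (pcolormesh) for 2D data such as a spatial field, and a histogram otherwise. It usually picks sensible axes, but you may need to specify them manually or reduce the data to one or two dimensions first.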
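The commands above can be tried together on a tiny stand-in file. In this sketch, my_file.nc and its 3 x 4 variable myvariable are made up so that the example runs without any WRF-Hydro data. The sketch assumes netCDF4 and matplotlib are installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Build and save a tiny, made-up 2D field (values 0..11 on a 3 x 4 grid).
path = os.path.join(tempfile.mkdtemp(), "my_file.nc")
xr.Dataset(
    {"myvariable": (("y", "x"), np.arange(12.0).reshape(3, 4))},
    coords={"y": [0, 1, 2], "x": [0, 1, 2, 3]},
).to_netcdf(path)

my_dataset = xr.open_dataset(path)  # lazy open: header only
my_dataset.info()                   # ncdump -h style summary
my_dataset = my_dataset.load()      # read the data payload into memory
field = my_dataset.myvariable       # attribute-style variable access
mesh = field.plot()                 # 2D data -> pcolormesh with a colorbar
```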

There is MUCH more you can do with xarray, but that covers the basic commands we will use in this training.

In the next section we will go over a couple of basic examples of plotting some of the outputs from our ~/wrf-hydro-training/output/lesson2/run_gridded_default simulation.

Examples

2D spatial with no temporal component

GEOGRID

We will start with plotting a couple of variables from our geogrid file.

Load the libraries

Open the geogrid dataset

Print some info about the dataset
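A sketch of these three steps follows. It assumes the geogrid file is DOMAIN/geo_em.d01.nc inside the Lesson 2 run directory; adjust the path if your domain files live elsewhere. open_and_describe is a hypothetical helper name. The path is passed through os.path.expanduser so that the tilde is expanded before xarray sees it.

```python
import os

import xarray as xr

# Assumed location of the geogrid file in the Lesson 2 run directory.
GEOGRID_PATH = os.path.expanduser(
    "~/wrf-hydro-training/output/lesson2/run_gridded_default/DOMAIN/geo_em.d01.nc"
)


def open_and_describe(path):
    """Open a netCDF file lazily and print an ncdump -h style summary."""
    ds = xr.open_dataset(path)
    ds.info()
    return ds


# Only runs when the file is actually present.
if os.path.exists(GEOGRID_PATH):
    geogrid = open_and_describe(GEOGRID_PATH)
```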

Plot the HGT_M variable, the topographic height in meters for each grid cell

Plot the LU_INDEX variable, the dominant land-use class index for each grid cell

Plot the SCT_DOM variable, the dominant soil type for each grid cell
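The three plotting steps can be sketched as follows. Geogrid variables normally carry a length-1 Time dimension. If that dimension is left in place, xarray sees 3D data and draws a histogram, so the sketch squeezes it away first. plot_geogrid_field is a hypothetical helper, and the file path is the same assumed location as above.

```python
import os

import matplotlib.pyplot as plt
import xarray as xr


def plot_geogrid_field(geogrid, name):
    """Plot one geogrid variable as a 2D map and return the plotted field."""
    # Drop length-1 dimensions such as Time so the field is (south_north, west_east).
    field = geogrid[name].squeeze()
    fig, ax = plt.subplots()
    field.plot(ax=ax)  # 2D data -> pcolormesh with a colorbar
    ax.set_title(name)
    return field


# Assumed path; change it to wherever your geogrid file lives.
GEOGRID_PATH = os.path.expanduser(
    "~/wrf-hydro-training/output/lesson2/run_gridded_default/DOMAIN/geo_em.d01.nc"
)
if os.path.exists(GEOGRID_PATH):
    geogrid = xr.open_dataset(GEOGRID_PATH)
    for name in ["HGT_M", "LU_INDEX", "SCT_DOM"]:
        plot_geogrid_field(geogrid, name)
```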

So how do you know what these values mean? Look them up in the parameter tables that ship with the code. For example, the MPTABLE.TBL file lists the land cover categories.

The SOILPARM.TBL file lists the soil types.
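Before opening the tables, it helps to know which class indices actually occur in your domain. The sketch below counts the grid cells in each class of a categorical field. class_counts is a hypothetical helper. The commented usage assumes a dataset named geogrid has already been opened, as in the steps above.

```python
import numpy as np


def class_counts(field):
    """Return {class index: number of grid cells} for a categorical field."""
    values, counts = np.unique(np.asarray(field), return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


# Usage, assuming `geogrid` was opened earlier:
# print(class_counts(geogrid.LU_INDEX))  # look these up in MPTABLE.TBL
# print(class_counts(geogrid.SCT_DOM))   # look these up in SOILPARM.TBL
```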

FULLDOM

Next we will look at the high-resolution routing domain file, Fulldom_hires.nc.

Open the Fulldom dataset

Print some info about the dataset

Plot the TOPOGRAPHY variable, the high-resolution elevation layer

This is the layer that controls much of the terrain routing. You'll notice the higher resolution of this layer compared to the HGT_M field in the geogrid.
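One way to quantify "higher resolution" is to compare grid sizes. The sketch opens both files, plots TOPOGRAPHY, and divides the routing-grid dimensions by the LSM-grid dimensions. The result should match the aggregation factor set in hydro.namelist (AGGFACTRT). It assumes that Fulldom_hires.nc uses dimensions named y and x, that the geogrid uses south_north and west_east, and that both files are in the DOMAIN folder of the run directory. refinement_factor is a hypothetical helper.

```python
import os

import xarray as xr


def refinement_factor(geogrid, fulldom):
    """Routing-grid cells per LSM-grid cell along (y, x)."""
    ny_ratio = fulldom.sizes["y"] / geogrid.sizes["south_north"]
    nx_ratio = fulldom.sizes["x"] / geogrid.sizes["west_east"]
    return ny_ratio, nx_ratio


# Assumed locations of the domain files.
DOMAIN = os.path.expanduser("~/wrf-hydro-training/output/lesson2/run_gridded_default/DOMAIN")
geo_path = os.path.join(DOMAIN, "geo_em.d01.nc")
full_path = os.path.join(DOMAIN, "Fulldom_hires.nc")
if os.path.exists(geo_path) and os.path.exists(full_path):
    geogrid = xr.open_dataset(geo_path)
    fulldom = xr.open_dataset(full_path)
    fulldom.TOPOGRAPHY.plot()
    print(refinement_factor(geogrid, fulldom))
```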

Plot the CHANNELGRID variable, the location of channel cells on the high-resolution routing grid

You should notice an odd gap in the gridded channel network. This is where the lake sits in this particular configuration (gridded routing with a lake).
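To see the gap more clearly, you can plot a simple channel/no-channel mask. This sketch assumes that non-channel cells in CHANNELGRID carry a fill value of -9999. Check the values in your own file and change fill_value if they differ. channel_mask is a hypothetical helper, and the file path is an assumption.

```python
import os

import xarray as xr


def channel_mask(fulldom, fill_value=-9999):
    """True on cells that CHANNELGRID marks as channel (assumes non-channel == fill_value)."""
    return fulldom.CHANNELGRID != fill_value


FULLDOM_PATH = os.path.expanduser(
    "~/wrf-hydro-training/output/lesson2/run_gridded_default/DOMAIN/Fulldom_hires.nc"
)
if os.path.exists(FULLDOM_PATH):
    fulldom = xr.open_dataset(FULLDOM_PATH)
    mask = channel_mask(fulldom)
    print("channel cells:", int(mask.sum()))
    mask.astype(int).plot()  # 1 = channel, 0 = not channel
```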

SOIL_PROPERTIES

Let's also take a look in the NoahMP 2D/3D parameter file, soil_properties.nc. This is actually a bit of a misnomer, as this file contains parameters related to vegetation, surface, and soil properties. Vegetation and surface properties are in 2D, while soil properties can also (theoretically) vary with depth and are therefore in 3D. All are on the LSM grid.

Open the soil_properties dataset

Print some info about the file

Plot the soil porosity (smcmax)

Default parameters by soil texture class are mapped from the SOILPARM.TBL lookup table to the soil type layer in the geogrid (SCT_DOM) to create an initial distribution of porosity values.
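Because smcmax is a 3D (per soil layer) field, it has to be reduced to 2D before xarray will draw a map. This sketch takes index 0 along every dimension other than the horizontal ones, which yields the first time step and the first soil layer (assumed to be the top layer). It assumes the horizontal dimensions are named south_north and west_east and that soil_properties.nc sits in the run's DOMAIN folder. horizontal_slice is a hypothetical helper.

```python
import os

import xarray as xr


def horizontal_slice(da, keep=("south_north", "west_east")):
    """Select index 0 along every dimension not listed in `keep`."""
    return da.isel({dim: 0 for dim in da.dims if dim not in keep})


SOIL_PATH = os.path.expanduser(
    "~/wrf-hydro-training/output/lesson2/run_gridded_default/DOMAIN/soil_properties.nc"
)
if os.path.exists(SOIL_PATH):
    soil = xr.open_dataset(SOIL_PATH)
    soil.info()
    horizontal_slice(soil.smcmax).plot()  # porosity of the first soil layer
```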

Plot the vegetation height (hvt)

Similarly, for default configurations, vegetation height values are pulled from MPTABLE.TBL and mapped via the LU_INDEX field in the geogrid.
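You can check this mapping directly. If hvt was filled from a lookup table keyed on LU_INDEX, every cell in a given land-use class should hold the same value, so the per-class minimum and maximum should agree. values_per_class is a hypothetical helper. The usage line assumes that soil and geogrid are open and share the same LSM grid.

```python
import numpy as np


def values_per_class(param, classes):
    """Map each class index to the (min, max) of `param` over cells of that class."""
    param = np.asarray(param)
    classes = np.asarray(classes)
    result = {}
    for cls in np.unique(classes):
        vals = param[classes == cls]
        result[int(cls)] = (float(vals.min()), float(vals.max()))
    return result


# Usage, assuming `soil` and `geogrid` were opened as above:
# print(values_per_class(soil.hvt.squeeze(), geogrid.LU_INDEX.squeeze()))
```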

Timeseries from multiple files

Now we will plot a timeseries from multiple netCDF files using xr.open_mfdataset. We will plot a hydrograph at a gage point.

Open the chanobs multi-file dataset. We are going to use the *CHANOBS* files because they contain output only for the grid cells that we have specified as observation points. We will discuss more about this and other output files in Lesson 4.

NOTE: open_mfdataset supports wildcards for pattern matching, but it requires that the path be absolute with no tilde.

We will use the * wildcard to open all files that contain 'CHANOBS' in their name.

NOTE: Because we are opening multiple files, we need to tell xarray how to concatenate them. Because this is a timeseries with a time dimension named 'time', we will specify 'time' as the concatenation dimension.
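A sketch of the open step follows. It builds an absolute glob pattern with the tilde already expanded, then concatenates along 'time' with combine="nested". xarray sorts the file names that match a glob string. WRF-Hydro CHANOBS names begin with a timestamp, so this should give chronological order. The run directory is an assumed path. chanobs_pattern and open_chanobs are hypothetical helpers, and dask must be installed for open_mfdataset.

```python
import os

import xarray as xr


def chanobs_pattern(run_dir):
    """Absolute glob pattern matching every *CHANOBS* file in run_dir."""
    # Expand '~' and make the path absolute before handing it to open_mfdataset.
    return os.path.join(os.path.abspath(os.path.expanduser(run_dir)), "*CHANOBS*")


def open_chanobs(run_dir):
    """Open all CHANOBS files as one dataset, concatenated along 'time'."""
    return xr.open_mfdataset(chanobs_pattern(run_dir), combine="nested", concat_dim="time")


RUN_DIR = "~/wrf-hydro-training/output/lesson2/run_gridded_default"  # assumed location
if os.path.isdir(os.path.expanduser(RUN_DIR)):
    chanobs = open_chanobs(RUN_DIR)
    chanobs.info()
```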

Print some info about the dataset

Here we can see that we have a time dimension of length 168 corresponding to the 168 hourly output files from our simulation run_gridded_default.
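To confirm the length and spacing of the time axis yourself, you can count the time steps and collect the distinct step sizes in hours. For 168 hourly files, you would expect 168 steps and a single step size of 1 hour. time_summary is a hypothetical helper that takes the datetime64 values, for example chanobs.time.values.

```python
import numpy as np


def time_summary(times):
    """Return (number of time steps, sorted distinct step sizes in hours)."""
    times = np.asarray(times)
    steps_h = np.diff(times) / np.timedelta64(1, "h")
    return len(times), sorted(set(steps_h.tolist()))


# Usage, assuming `chanobs` was opened as above:
# print(time_summary(chanobs.time.values))
```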

Plot a hydrograph for 1 gage point

Now we will select one gage from the dataset and plot our streamflow variable. For more information on indexing and selecting data with xarray, see the xarray documentation.
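A minimal sketch of the selection and plot is shown below. It assumes that streamflow in the CHANOBS files has the dimensions (time, feature_id). It selects the gage by position with isel. If you know the gage's feature ID, .sel(feature_id=...) selects it by label instead. plot_hydrograph is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import xarray as xr


def plot_hydrograph(chanobs, index=0):
    """Plot streamflow for the gage at position `index` along feature_id."""
    q = chanobs["streamflow"].isel(feature_id=index)
    fig, ax = plt.subplots()
    q.plot(ax=ax)  # 1D data -> line plot against time
    return q


# Usage, assuming `chanobs` was opened with open_mfdataset as above:
# plot_hydrograph(chanobs, index=0)
```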

Next up - Run-time options

This concludes Lesson 3. In the next lesson we will discuss run-time options, experiment with different options, and use xarray to view their effect on model behavior.

IT IS BEST TO EITHER SHUT DOWN OR CLOSE THIS LESSON BEFORE PROCEEDING TO THE NEXT LESSON TO AVOID POSSIBLY EXCEEDING ALLOCATED MEMORY. Shut down the lesson by either closing the browser tab for the lesson or selecting Kernel -> Shut Down Kernel in JupyterLab.

© UCAR 2020