Background

We are using WRF-Hydro to predict streamflow for Fourmile Creek at the Orodell USGS gage for the 2013 snowmelt period. We ran WRF-Hydro with NoahMP as the LSM for a 3-year spinup period and then did a daily output run for 5 months starting March 1, 2013. We want to evaluate model performance in estimating streamflow with and without overland, subsurface, and groundwater routing active.

Load the rwrfhydro package.

library("rwrfhydro")

## To check rwrfhydro updates run: CheckForUpdates()

Set a data path to the Fourmile Creek test case.

dataPath <- '~/wrfHydroTestCases/Fourmile_Creek/'

Import modelled and observed datasets

Model 1: Only channel routing turned on (hourly model run).

modStrd.chrt.fc <- ReadFrxstPts(paste0(dataPath, '/RUN.RTTESTS/OUTPUT_CHRT_DAILY/frxst_pts_out.txt'))

Model 2: All WRF-Hydro routing options turned on (hourly model run).

modStrd.allrt.fc <- ReadFrxstPts(paste0(dataPath, '/RUN.RTTESTS/OUTPUT_ALLRT_DAILY/frxst_pts_out.txt'))

USGS gage observed data at 5-minute intervals. Find the nearest gage to the forecast point in the above files. (The following approach can be used when multiple forecast points are output by the model. The ddply function essentially performs a loop over unique st_id passing the corresponding subset of the data frame modStrd.chrt.fc to the function which just returns the first row’s lon and lat.)

library(plyr)
fcLocation <- ddply(modStrd.chrt.fc, .(st_id), function(df) df[1,c('st_lon','st_lat')])
near <- FindUsgsStns(stnLon=c(fcLocation$st_lon), stnLat=c(fcLocation$st_lat), within=.005)
near

stnLon	stnLat	within	siteType	hasDataTypeCd	agency_cd	site_no	station_nm	site_tp_cd	dec_lat_va	dec_long_va	queryTime
-105.3278	40.01932	0.005	ST	iv	USGS	06727500	FOURMILE CREEK AT ORODELL, CO	ST	40.01867	-105.3263	2015-04-29 12:10:00

The following checks to see if the data are local and gets them if not. The ‘00060’ product (streamflow) is returned and then made pretty, including conversion to metric.

dbPath <- '~/wrfHydroTestCases/usgsDb/'
obsStr5min.fc <- 
  PrettyUsgs(QueryHaveSite(near$site_no, path=dbPath, ret='00060', get=TRUE))
names(obsStr5min.fc)[4] <- 'q_cms'

Plot hydrographs

Compare hydrographs for the full model run.

PlotFluxCompare(obsStr5min.fc, "q_cms", modStrd.chrt.fc, "q_cms", strDf.mod2=modStrd.allrt.fc, 
     strCol.mod2="q_cms", labelObs="Observed Fourmile Creek at Orodell", 
     labelMod1="Channel Routing Only", labelMod2="All Routing", 
     title="Streamflow: Fourmile Creek")

Now limit the plot to the peak May flow period only. The reported stats are updated to the new time period. (Note that the R warning is innocuous because the subset adjusts for timezone, so it’s ok that the timezones don’t match.)

PlotFluxCompare(obsStr5min.fc, "q_cms", modStrd.chrt.fc, "q_cms", strDf.mod2=modStrd.allrt.fc, 
     strCol.mod2="q_cms", labelObs="Observed Fourmile Creek at Orodell", 
     labelMod1="Channel Routing Only", labelMod2="All Routing", 
     title="Streamflow: Fourmile Creek", 
     stdate=as.POSIXct("2013-05-01 00:00:00", format="%Y-%m-%d %H:%M:%S", tz="UTC"), 
     enddate=as.POSIXct("2013-05-31 00:00:00", format="%Y-%m-%d %H:%M:%S", tz="UTC"))

Review flow duration curves

NOTE: You generally evaluate flow duration curves and staistics over much longer time periods (e.g., multiple years) than what we demo here. To make the test case more portable, we are only evaluating once-a-day model output over 5 months.

Calculate percent exceedances for flow duration curves. Note that we need to subset the observations to match our model run output times, and vice versa.

obsStr5min.comp.fc <- CalcFdc(subset(obsStr5min.fc, POSIXct %in% c(modStrd.chrt.fc$POSIXct)))
modStrd.chrt.comp.fc <- CalcFdc(subset(modStrd.chrt.fc, POSIXct %in% c(obsStr5min.comp.fc$POSIXct)))
modStrd.allrt.comp.fc <- CalcFdc(subset(modStrd.allrt.fc, POSIXct %in% c(obsStr5min.comp.fc$POSIXct)))

Compare how the models are doing predicting flow values that will be exceeded 20% of the time. First, calculate the fitted spline functions.

fdc.obsStr5min.comp.fc <- CalcFdcSpline(obsStr5min.comp.fc)
fdc.modStrd.chrt.comp.fc <- CalcFdcSpline(modStrd.chrt.comp.fc)
fdc.modStrd.allrt.comp.fc <- CalcFdcSpline(modStrd.allrt.comp.fc)

Then, evaluate at the 20% exceedance percentage (high flows).

fdc.obsStr5min.comp.fc(0.2)
fdc.modStrd.chrt.comp.fc(0.2)
fdc.modStrd.allrt.comp.fc(0.2)

## [1] 0.7706495
## [1] 0.9271272
## [1] 0.8216668

Now try the 80% exceedance percentage (low flows).

fdc.obsStr5min.comp.fc(0.8)
fdc.modStrd.chrt.comp.fc(0.8)
fdc.modStrd.allrt.comp.fc(0.8)

## [1] 0.08530777
## [1] 0.1153106
## [1] 0.1228697

Plot flow duration curves for a more complete picture. This tool will do the date matching for us, so no need to subset the datasets.

PlotFdcCompare(obsStr5min.fc, "q_cms", modStrd.chrt.fc, "q_cms", strDf.mod2=modStrd.allrt.fc, 
     strCol.mod2="q_cms", labelObs="Observed Fourmile Creek", 
     labelMod1="Channel Routing Only", labelMod2="All Routing")

## [1] "Combined min flow for y-axis (capped at 0.001): 0.03114853126"
## [1] "Combined max flow for y-axis: 2.5201993474"

Review model performance statistics

Calculate model performance stats (special formatting comands hidden). Again, this tool does the date matching for us.

CalcModPerf(modStrd.chrt.fc, obsStr5min.fc)

	n	nse	nselog	cor	rmse	rmsenorm	bias	mae	errcom	errmaxt	errfdc
units	count	unitless	unitless	unitless	flux units	unitless	percent	flux units	hours\|days	hours\|days	flux units
t	116	0.76	0.68	0.88	0.24	9.66	12.2	0.14	NA	NA	0.07
daily	116	0.76	0.68	0.88	0.24	9.66	12.2	0.14	0	0	0.07
monthly	5	0.9	0.79	0.96	0.12	11.41	14.5	0.09	-1.4	-2	NA
yearly	1	NA	NA	NA	NA	NA	12.2	0.05	-1	0	NA
max10	12	0.32	0.35	0.74	0.39	26.41	-12.1	0.32	NA	NA	NA
min10	12	-602.36	-20.18	0.35	0.21	672.75	159.8	0.08	NA	NA	NA

CalcModPerf(modStrd.allrt.fc, obsStr5min.fc)

	n	nse	nselog	cor	rmse	rmsenorm	bias	mae	errcom	errmaxt	errfdc
units	count	unitless	unitless	unitless	flux units	unitless	percent	flux units	hours\|days	hours\|days	flux units
t	116	0.87	0.87	0.94	0.17	6.97	5.7	0.09	NA	NA	0.04
daily	116	0.87	0.87	0.94	0.17	6.97	5.7	0.09	0	0	0.04
monthly	5	0.99	0.95	1	0.03	3.08	7.3	0.03	-0.6	0.4	NA
yearly	1	NA	NA	NA	NA	NA	5.7	0.03	1	3	NA
max10	12	0.07	0.23	0.38	0.46	30.96	-8	0.28	NA	NA	NA
min10	12	-34.62	-13.81	-0.13	0.05	163.45	92.5	0.05	NA	NA	NA

Help on CalcModPerf gives details on the individual statistics returned.

help(CalcModPerf)

CalcModPerf

R Documentation

Computes model performance statistics for WRF-Hydro flux output

Description

CalcModPerf calculates model performance statistics for flux output.

Usage

CalcModPerf(flxDf.mod, flxDf.obs, flxCol.mod = "q_cms",
  flxCol.obs = "q_cms", stdate = NULL, enddate = NULL,
  subdivisions = 1000)

Arguments

`flxDf.mod`	The flux output dataframe (required). Assumes only one forecast point per file, so if you have multiple forecast points in your output dataframe, use subset to isolate a single forecast point’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).
`flxDf.obs`	The observed flux dataframe. Assumes only one observation point per file, so if you have multiple observation points in your dataframe, use subset to isolate a single point’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).
`flxCol.mod`	The column name for the flux time series for the MODEL data (default=“q_cms”)
`flxCol.obs`	The column name for the flux time series for the OBSERVED data (default=“q_cms”)
`stdate`	Start date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))
`enddate`	End date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))
`subdivisions`	Number of subdivisions used in flow duration curve integration (DEFAULT=1000); increase value if integrate function throws an error.

Details

CalcModPerf reads a model flux time series (i.e., created using ReadFrxstPts) and an observation time series (i.e., created using ReadUsgsGage) and calculates model performance statistics (Nash-Sutcliffe Efficiency, Rmse, etc.) at various time scales and for low and high fluxes. The tool will subset data to matching time periods (e.g., if the observed data is at 5-min increments and modelled data is at 1-hr increments, the tool will subset the observed data to select only observations on the matching hour break).

Performance Statistics:
(mod = model output, obs = observations, n = sample size)

n: sample size
nse: Nash-Sutcliffe Efficiency

nse = 1 - ( sum((obs - mod)^2) / sum((obs - mean(obs))^2) )
nselog: log-transformed Nash-Sutcliffe Efficiency

nselog = 1 - ( sum((log(obs) - log(mod))^2) / sum((log(obs) - mean(log(obs)))^2) )
cor: correlation coefficient

cor = cor(mod, obs)
rmse: root mean squared error

rmse = sqrt( sum((mod - obs)^2) / n )
rmsenorm: normalized root mean squared error

rmsenorm = rmse / (max(obs) - min(obs))
bias: percent bias

bias = sum(mod - obs) / sum(obs) * 100
mae: mean absolute error

mae = mean(abs(mod - obs))
errcom: error in the center-of-mass of the flux, where center-of-mass is the hour/day when 50% of daily/monthly/water-year flux has occurred. Reported as number of hours for daily time scale and number of days for monthly and yearly time scales.
errmaxt: Error in the time of maximum flux. Reported as number of hours for daily time scale and number of days for monthly and yearly time scales).
errfdc: Error in the integrated flow duration curve between 0.05 and 0.95 exceedance thresholds (in native flow units).

Time scales/Flux types:

ts = native model/observation time step (e.g., hourly)
daily = daily time step
monthly = monthly time step
yearly = water-year time step
max10 = high flows; restricted to the portion of the time series where the observed flux is in the highest 10% (native time step)
min10 = low flows; restricted to the portion of the time series where the observed flux is in the lowest 10% (native time step)

Value

A new dataframe containing the model performance statistics.

Examples

## Take forecast point model output for Fourmile Creek (modStrh.mod1.fc) and a corresponding
## USGS gage observation file (obsStrh.fc), both at an hourly time step, and calculate
## model performance statistics. The model forecast point data was imported using ReadFrxstPts
## and the gage observation data was imported using ReadUsgsGage.


## Not run: 
CalcModPerf(modStr1h.allrt.fc, obsStr5min.fc)

> Output:
          nse nselog  cor rmse rmsenorm  bias  mae errcom errmaxt errfdc
ts       0.57   0.61 0.79 1.43     9.48 -28.0 0.70     NA      NA  -0.42
daily    0.71   0.64 0.87 1.17     9.86 -28.1 0.61   0.19   -2.25  -0.37
monthly  0.80   0.70 0.93 0.89    12.73 -26.6 0.53  -0.18   -0.96     NA
yearly   0.05   0.37 0.36 0.55    41.50  -6.5 0.45  -1.50   -3.38     NA
max10   -7.50 -15.94 0.19 3.82    38.89 -24.5 0.04     NA      NA     NA
min10   -2.84  -1.83 0.10 0.05    33.36 -23.7   NA     NA      NA     NA

## End(Not run)

Calculate flow duration curve performance statistics.

CalcFdcPerf(modStrd.allrt.fc, obsStr5min.fc)

p.exceed	q.mod	q.obs	q.err	q.perr
0.1	1.23	1.04	0.19	18.3
0.2	0.82	0.77	0.05	6.5
0.3	0.48	0.56	-0.08	-14.3
0.4	0.36	0.36	0	0
0.5	0.29	0.24	0.05	20.8
0.6	0.22	0.14	0.08	57.1
0.7	0.17	0.12	0.05	41.7
0.8	0.12	0.09	0.03	33.3
0.9	0.09	0.06	0.03	50

Again, help on CalcFdcPerf gives details on the individual statistics returned.

help(CalcFdcPerf)

CalcFdcPerf

R Documentation

Computes flow duration curve statistics for WRF-Hydro streamflow output

Description

CalcFdcPerf calculates flow duration curve statistics for streamflow output.

Usage

CalcFdcPerf(strDf.mod, strDf.obs, strCol.mod = "q_cms",
  strCol.obs = "q_cms", stdate = NULL, enddate = NULL)

Arguments

`strDf.mod`	The forecast point output dataframe (required). Assumes only one forecast point per file, so if you have multiple forecast points in your output dataframe, use subset to isolate a single forecast point’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).
`strDf.obs`	The observed streamflow dataframe. Assumes only one gage per file, so if you have multiple gages in your dataframe, use subset to isolate a single gage’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).
`strCol.mod`	The column name for the streamflow time series for the MODEL data (default=“q_cms”)
`strCol.obs`	The column name for the streamflow time series for the OBSERVED data (default=“q_cms”)
`stdate`	Start date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))
`enddate`	End date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))

Details

CalcFdcPerf reads a model forecast point streamflow timeseries (i.e., created using ReadFrxstPts) and a streamflow observation timeseries (i.e., created using ReadUsgsGage) and calculates flow duration curve statistics at various exceedance thresholds (e.g., 10%, 20%, etc.). The tool will subset data to matching time periods (e.g., if the observed data is at 5-min increments and modelled data is at 1-hr increments, the tool will subset the observed data to select only observations on the matching hour break).

Flow Duration Curve Statistics:
(mod = model output, obs = observations)

p.exceed: exceedance threshold (e.g., 0.2 means a flow value that is exceeded 20% of the time)
q.mod: MODEL flow value at specified exceedance threshold (in native flow units)
q.obs: OBSERVED flow value at specified exceedance threshold (in native flow units)
q.err: difference between model and observed flow values [mod-obs] (in native flow units)
q.perr: percent error in model flow [(mod-obs)/obs]

Value

A new dataframe containing the flow duration curve statistics.

Examples

## Take forecast point model output for Fourmile Creek (modStrh.mod1.fc)
## and a corresponding USGS gage observation file (obsStrh.fc), both at an
## hourly time step, and calculate flow duration curve statistics. The
## model forecast point data was imported using ReadFrxstPts and the gage
## observation data was imported using ReadUsgsGage.

## Not run: 
CalcFdcPerf(modStr1h.allrt.fc, obsStr5min.fc)

Output:
 p.exceed    q.mod   q.obs
 0.1         3.07    5.25
 0.2         1.35    2.31
 0.3         0.82    1.06
 0.4         0.48    0.65
 0.5         0.29    0.45
 0.6         0.18    0.34
 0.7         0.14    0.25
 0.8         0.11    0.19
 0.9         0.08    0.16

## End(Not run)

Evaluate streamflow simulation with rwrfhydro

Aubrey Dugger and James McCreight

2015-04-29

Background

Import modelled and observed datasets

Plot hydrographs

Review flow duration curves

Review model performance statistics

Computes model performance statistics for WRF-Hydro flux output

Description

Usage

Arguments

Details

Value

See Also

Examples

Computes flow duration curve statistics for WRF-Hydro streamflow output

Description

Usage

Arguments

Details

Value

Examples