Background

We are using WRF-Hydro to predict streamflow for Fourmile Creek at the Orodell USGS gage for the 2013 snowmelt period. We ran WRF-Hydro with NoahMP as the LSM for a 3-year spinup period and then did a daily output run for 5 months starting March 1, 2013. We want to evaluate model performance in estimating streamflow with and without overland, subsurface, and groundwater routing active.

Load the rwrfhydro package.

library("rwrfhydro")
## To check rwrfhydro updates run: CheckForUpdates()

Set a data path to the Fourmile Creek test case.

dataPath <- '~/wrfHydroTestCases/Fourmile_Creek/'

Import modelled and observed datasets

Model 1: Only channel routing turned on (hourly model run).

modStrd.chrt.fc <- ReadFrxstPts(paste0(dataPath, '/RUN.RTTESTS/OUTPUT_CHRT_DAILY/frxst_pts_out.txt'))

Model 2: All WRF-Hydro routing options turned on (hourly model run).

modStrd.allrt.fc <- ReadFrxstPts(paste0(dataPath, '/RUN.RTTESTS/OUTPUT_ALLRT_DAILY/frxst_pts_out.txt'))

USGS gage observed data at 5-minute intervals. Find the nearest gage to the forecast point in the above files. (The following approach can be used when multiple forecast points are output by the model. The ddply function essentially performs a loop over unique st_id passing the corresponding subset of the data frame modStrd.chrt.fc to the function which just returns the first row’s lon and lat.)

library(plyr)
fcLocation <- ddply(modStrd.chrt.fc, .(st_id), function(df) df[1,c('st_lon','st_lat')])
near <- FindUsgsStns(stnLon=c(fcLocation$st_lon), stnLat=c(fcLocation$st_lat), within=.005)
near
stnLon stnLat within siteType hasDataTypeCd agency_cd site_no station_nm site_tp_cd dec_lat_va dec_long_va queryTime
-105.3278 40.01932 0.005 ST iv USGS 06727500 FOURMILE CREEK AT ORODELL, CO ST 40.01867 -105.3263 2015-04-29 12:10:00

The following checks to see if the data are local and gets them if not. The ‘00060’ product (streamflow) is returned and then made pretty, including conversion to metric.

dbPath <- '~/wrfHydroTestCases/usgsDb/'
obsStr5min.fc <- 
  PrettyUsgs(QueryHaveSite(near$site_no, path=dbPath, ret='00060', get=TRUE))
names(obsStr5min.fc)[4] <- 'q_cms'

Plot hydrographs

Compare hydrographs for the full model run.

PlotFluxCompare(obsStr5min.fc, "q_cms", modStrd.chrt.fc, "q_cms", strDf.mod2=modStrd.allrt.fc, 
     strCol.mod2="q_cms", labelObs="Observed Fourmile Creek at Orodell", 
     labelMod1="Channel Routing Only", labelMod2="All Routing", 
     title="Streamflow: Fourmile Creek")

Now limit the plot to the peak May flow period only. The reported stats are updated to the new time period. (Note that the R warning is innocuous because the subset adjusts for timezone, so it’s ok that the timezones don’t match.)

PlotFluxCompare(obsStr5min.fc, "q_cms", modStrd.chrt.fc, "q_cms", strDf.mod2=modStrd.allrt.fc, 
     strCol.mod2="q_cms", labelObs="Observed Fourmile Creek at Orodell", 
     labelMod1="Channel Routing Only", labelMod2="All Routing", 
     title="Streamflow: Fourmile Creek", 
     stdate=as.POSIXct("2013-05-01 00:00:00", format="%Y-%m-%d %H:%M:%S", tz="UTC"), 
     enddate=as.POSIXct("2013-05-31 00:00:00", format="%Y-%m-%d %H:%M:%S", tz="UTC"))

Review flow duration curves

NOTE: You generally evaluate flow duration curves and staistics over much longer time periods (e.g., multiple years) than what we demo here. To make the test case more portable, we are only evaluating once-a-day model output over 5 months.

Calculate percent exceedances for flow duration curves. Note that we need to subset the observations to match our model run output times, and vice versa.

obsStr5min.comp.fc <- CalcFdc(subset(obsStr5min.fc, POSIXct %in% c(modStrd.chrt.fc$POSIXct)))
modStrd.chrt.comp.fc <- CalcFdc(subset(modStrd.chrt.fc, POSIXct %in% c(obsStr5min.comp.fc$POSIXct)))
modStrd.allrt.comp.fc <- CalcFdc(subset(modStrd.allrt.fc, POSIXct %in% c(obsStr5min.comp.fc$POSIXct)))

Compare how the models are doing predicting flow values that will be exceeded 20% of the time. First, calculate the fitted spline functions.

fdc.obsStr5min.comp.fc <- CalcFdcSpline(obsStr5min.comp.fc)
fdc.modStrd.chrt.comp.fc <- CalcFdcSpline(modStrd.chrt.comp.fc)
fdc.modStrd.allrt.comp.fc <- CalcFdcSpline(modStrd.allrt.comp.fc)

Then, evaluate at the 20% exceedance percentage (high flows).

fdc.obsStr5min.comp.fc(0.2)
fdc.modStrd.chrt.comp.fc(0.2)
fdc.modStrd.allrt.comp.fc(0.2)
## [1] 0.7706495
## [1] 0.9271272
## [1] 0.8216668

Now try the 80% exceedance percentage (low flows).

fdc.obsStr5min.comp.fc(0.8)
fdc.modStrd.chrt.comp.fc(0.8)
fdc.modStrd.allrt.comp.fc(0.8)
## [1] 0.08530777
## [1] 0.1153106
## [1] 0.1228697

Plot flow duration curves for a more complete picture. This tool will do the date matching for us, so no need to subset the datasets.

PlotFdcCompare(obsStr5min.fc, "q_cms", modStrd.chrt.fc, "q_cms", strDf.mod2=modStrd.allrt.fc, 
     strCol.mod2="q_cms", labelObs="Observed Fourmile Creek", 
     labelMod1="Channel Routing Only", labelMod2="All Routing")
## [1] "Combined min flow for y-axis (capped at 0.001): 0.03114853126"
## [1] "Combined max flow for y-axis: 2.5201993474"

Review model performance statistics

Calculate model performance stats (special formatting comands hidden). Again, this tool does the date matching for us.

CalcModPerf(modStrd.chrt.fc, obsStr5min.fc)
  n nse nselog cor rmse rmsenorm bias mae errcom errmaxt errfdc
units count unitless unitless unitless flux units unitless percent flux units hours|days hours|days flux units
t 116 0.76 0.68 0.88 0.24 9.66 12.2 0.14 NA NA 0.07
daily 116 0.76 0.68 0.88 0.24 9.66 12.2 0.14 0 0 0.07
monthly 5 0.9 0.79 0.96 0.12 11.41 14.5 0.09 -1.4 -2 NA
yearly 1 NA NA NA NA NA 12.2 0.05 -1 0 NA
max10 12 0.32 0.35 0.74 0.39 26.41 -12.1 0.32 NA NA NA
min10 12 -602.36 -20.18 0.35 0.21 672.75 159.8 0.08 NA NA NA
CalcModPerf(modStrd.allrt.fc, obsStr5min.fc)
  n nse nselog cor rmse rmsenorm bias mae errcom errmaxt errfdc
units count unitless unitless unitless flux units unitless percent flux units hours|days hours|days flux units
t 116 0.87 0.87 0.94 0.17 6.97 5.7 0.09 NA NA 0.04
daily 116 0.87 0.87 0.94 0.17 6.97 5.7 0.09 0 0 0.04
monthly 5 0.99 0.95 1 0.03 3.08 7.3 0.03 -0.6 0.4 NA
yearly 1 NA NA NA NA NA 5.7 0.03 1 3 NA
max10 12 0.07 0.23 0.38 0.46 30.96 -8 0.28 NA NA NA
min10 12 -34.62 -13.81 -0.13 0.05 163.45 92.5 0.05 NA NA NA

Help on CalcModPerf gives details on the individual statistics returned.

help(CalcModPerf)
CalcModPerf R Documentation

Computes model performance statistics for WRF-Hydro flux output

Description

CalcModPerf calculates model performance statistics for flux output.

Usage

CalcModPerf(flxDf.mod, flxDf.obs, flxCol.mod = "q_cms",
  flxCol.obs = "q_cms", stdate = NULL, enddate = NULL,
  subdivisions = 1000)

Arguments

flxDf.mod

The flux output dataframe (required). Assumes only one forecast point per file, so if you have multiple forecast points in your output dataframe, use subset to isolate a single forecast point’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).

flxDf.obs

The observed flux dataframe. Assumes only one observation point per file, so if you have multiple observation points in your dataframe, use subset to isolate a single point’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).

flxCol.mod

The column name for the flux time series for the MODEL data (default=“q_cms”)

flxCol.obs

The column name for the flux time series for the OBSERVED data (default=“q_cms”)

stdate

Start date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))

enddate

End date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))

subdivisions

Number of subdivisions used in flow duration curve integration (DEFAULT=1000); increase value if integrate function throws an error.

Details

CalcModPerf reads a model flux time series (i.e., created using ReadFrxstPts) and an observation time series (i.e., created using ReadUsgsGage) and calculates model performance statistics (Nash-Sutcliffe Efficiency, Rmse, etc.) at various time scales and for low and high fluxes. The tool will subset data to matching time periods (e.g., if the observed data is at 5-min increments and modelled data is at 1-hr increments, the tool will subset the observed data to select only observations on the matching hour break).

Performance Statistics:
(mod = model output, obs = observations, n = sample size)

  • n: sample size

  • nse: Nash-Sutcliffe Efficiency

    nse = 1 - ( sum((obs - mod)^2) / sum((obs - mean(obs))^2) )

  • nselog: log-transformed Nash-Sutcliffe Efficiency

    nselog = 1 - ( sum((log(obs) - log(mod))^2) / sum((log(obs) - mean(log(obs)))^2) )

  • cor: correlation coefficient

    cor = cor(mod, obs)

  • rmse: root mean squared error

    rmse = sqrt( sum((mod - obs)^2) / n )

  • rmsenorm: normalized root mean squared error

    rmsenorm = rmse / (max(obs) - min(obs))

  • bias: percent bias

    bias = sum(mod - obs) / sum(obs) * 100

  • mae: mean absolute error

    mae = mean(abs(mod - obs))

  • errcom: error in the center-of-mass of the flux, where center-of-mass is the hour/day when 50% of daily/monthly/water-year flux has occurred. Reported as number of hours for daily time scale and number of days for monthly and yearly time scales.

  • errmaxt: Error in the time of maximum flux. Reported as number of hours for daily time scale and number of days for monthly and yearly time scales).

  • errfdc: Error in the integrated flow duration curve between 0.05 and 0.95 exceedance thresholds (in native flow units).

Time scales/Flux types:

  • ts = native model/observation time step (e.g., hourly)

  • daily = daily time step

  • monthly = monthly time step

  • yearly = water-year time step

  • max10 = high flows; restricted to the portion of the time series where the observed flux is in the highest 10% (native time step)

  • min10 = low flows; restricted to the portion of the time series where the observed flux is in the lowest 10% (native time step)

Value

A new dataframe containing the model performance statistics.

See Also

Other modelEvaluation: CalcModPerfMulti; CalcNoahmpFluxes; CalcNoahmpWatBudg

Examples

## Take forecast point model output for Fourmile Creek (modStrh.mod1.fc) and a corresponding
## USGS gage observation file (obsStrh.fc), both at an hourly time step, and calculate
## model performance statistics. The model forecast point data was imported using ReadFrxstPts
## and the gage observation data was imported using ReadUsgsGage.


## Not run: 
CalcModPerf(modStr1h.allrt.fc, obsStr5min.fc)

> Output:
          nse nselog  cor rmse rmsenorm  bias  mae errcom errmaxt errfdc
ts       0.57   0.61 0.79 1.43     9.48 -28.0 0.70     NA      NA  -0.42
daily    0.71   0.64 0.87 1.17     9.86 -28.1 0.61   0.19   -2.25  -0.37
monthly  0.80   0.70 0.93 0.89    12.73 -26.6 0.53  -0.18   -0.96     NA
yearly   0.05   0.37 0.36 0.55    41.50  -6.5 0.45  -1.50   -3.38     NA
max10   -7.50 -15.94 0.19 3.82    38.89 -24.5 0.04     NA      NA     NA
min10   -2.84  -1.83 0.10 0.05    33.36 -23.7   NA     NA      NA     NA

## End(Not run)



Calculate flow duration curve performance statistics.

CalcFdcPerf(modStrd.allrt.fc, obsStr5min.fc)
p.exceed q.mod q.obs q.err q.perr
0.1 1.23 1.04 0.19 18.3
0.2 0.82 0.77 0.05 6.5
0.3 0.48 0.56 -0.08 -14.3
0.4 0.36 0.36 0 0
0.5 0.29 0.24 0.05 20.8
0.6 0.22 0.14 0.08 57.1
0.7 0.17 0.12 0.05 41.7
0.8 0.12 0.09 0.03 33.3
0.9 0.09 0.06 0.03 50

Again, help on CalcFdcPerf gives details on the individual statistics returned.

help(CalcFdcPerf)
CalcFdcPerf R Documentation

Computes flow duration curve statistics for WRF-Hydro streamflow output

Description

CalcFdcPerf calculates flow duration curve statistics for streamflow output.

Usage

CalcFdcPerf(strDf.mod, strDf.obs, strCol.mod = "q_cms",
  strCol.obs = "q_cms", stdate = NULL, enddate = NULL)

Arguments

strDf.mod

The forecast point output dataframe (required). Assumes only one forecast point per file, so if you have multiple forecast points in your output dataframe, use subset to isolate a single forecast point’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).

strDf.obs

The observed streamflow dataframe. Assumes only one gage per file, so if you have multiple gages in your dataframe, use subset to isolate a single gage’s data. Also assumes model output and observation both contain POSIXct fields (called “POSIXct”).

strCol.mod

The column name for the streamflow time series for the MODEL data (default=“q_cms”)

strCol.obs

The column name for the streamflow time series for the OBSERVED data (default=“q_cms”)

stdate

Start date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))

enddate

End date for statistics (DEFAULT=NULL, all records will be used). Date MUST be specified in POSIXct format with appropriate timezone (e.g., as.POSIXct(“2013-05-01 00:00:00”, format=“%Y-%m-%d %H:%M:%S”, tz=“UTC”))

Details

CalcFdcPerf reads a model forecast point streamflow timeseries (i.e., created using ReadFrxstPts) and a streamflow observation timeseries (i.e., created using ReadUsgsGage) and calculates flow duration curve statistics at various exceedance thresholds (e.g., 10%, 20%, etc.). The tool will subset data to matching time periods (e.g., if the observed data is at 5-min increments and modelled data is at 1-hr increments, the tool will subset the observed data to select only observations on the matching hour break).

Flow Duration Curve Statistics:
(mod = model output, obs = observations)

  • p.exceed: exceedance threshold (e.g., 0.2 means a flow value that is exceeded 20% of the time)

  • q.mod: MODEL flow value at specified exceedance threshold (in native flow units)

  • q.obs: OBSERVED flow value at specified exceedance threshold (in native flow units)

  • q.err: difference between model and observed flow values [mod-obs] (in native flow units)

  • q.perr: percent error in model flow [(mod-obs)/obs]

Value

A new dataframe containing the flow duration curve statistics.

Examples

## Take forecast point model output for Fourmile Creek (modStrh.mod1.fc)
## and a corresponding USGS gage observation file (obsStrh.fc), both at an
## hourly time step, and calculate flow duration curve statistics. The
## model forecast point data was imported using ReadFrxstPts and the gage
## observation data was imported using ReadUsgsGage.

## Not run: 
CalcFdcPerf(modStr1h.allrt.fc, obsStr5min.fc)

Output:
 p.exceed    q.mod   q.obs
 0.1         3.07    5.25
 0.2         1.35    2.31
 0.3         0.82    1.06
 0.4         0.48    0.65
 0.5         0.29    0.45
 0.6         0.18    0.34
 0.7         0.14    0.25
 0.8         0.11    0.19
 0.9         0.08    0.16

## End(Not run)