I have been trying to use R for about a month now, and there are days when I feel good about it and other days when I hate it. What I am experiencing now is comparable to learning a new language, I guess… So, a week ago, after realizing how unstructured my knowledge is, I signed up for the online course R Programming. The course hasn’t started yet, but I have high expectations.
I am interested in low-flow hydrology these days (see for example this report on hydrological low-flow indices and their uses), so I have been researching what R packages (and functions) will be relevant to my research. While doing that, I stumbled upon these generally useful posts/resources:
- R resources for Hydrologists and USGS packages for Hydrology: blog posts from AboutHydrology with quite an extensive list of R packages for hydrologists
- “Why R is hard to learn”: a very entertaining blog post by Bob Muenchen (r4stats.com)
- the CRAN Task View: Analysis of Ecological and Environmental Data: includes a list of R packages covering many environmental and statistical data analyses. There is also a subheading on Hydrology and Oceanography.
- a course announcement by the National Park Service with some R-relevant links
- Data Visualization with ggplot2 Cheat Sheet by RStudio
- R reference card by CRAN
- The Ultimate R Cheat Sheet – Data Management (Version 4)
- Google’s R Style Guide: guidelines for how to code in R, so that it is easier for others (and ourselves) to read, share, and verify the code
My very own “cheat sheet” of functions (and packages) for low-flow hydrology analyses is probably not as complete as some of the sources given in the links above. Nevertheless, I decided to post it here as it may help someone else. The “cheat sheet” is not meant to be a complete review, but if I have omitted something relevant, please leave a comment.
The package name is given in the tab, and for each package I have listed some functions with a brief description of what they do and some of the arguments that can be used. The links to the package descriptions are given below the functions.
My search started from packages that implement at least one method of automated baseflow separation (BFS). But eventually, while reading the package documentation, I found other functions that can make my life easier. Even though most of the packages below use the Lyne-Hollick filter (see [1] for more details) for BFS, there are some differences: some of the packages let you choose the number of filter passes, or ignore NA values, or have different format preferences… One of the packages implements multiple BFS methods with the same function.
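To make the filter and the idea of "passes" a bit more concrete, here is a minimal base R sketch of a Lyne-Hollick style recursive filter. It is not the code of any of the packages listed here: the function names `lh_pass` and `lyne_hollick` are hypothetical, the series is assumed to have no NA values, quickflow is assumed to be zero at the first time step, and the defaults (a = 0.925, three passes) are just commonly used choices.

```r
# One pass of the Lyne-Hollick recursive digital filter over a series x.
# Assumptions: no NA values, and quickflow is zero at the first time step.
lh_pass <- function(x, a = 0.925) {
  n <- length(x)
  quick <- numeric(n)                      # filtered quickflow, starts at 0
  for (t in seq_len(n)[-1]) {
    quick[t] <- a * quick[t - 1] + (1 + a) / 2 * (x[t] - x[t - 1])
    if (quick[t] < 0) quick[t] <- 0        # quickflow cannot be negative
  }
  x - quick                                # baseflow = input flow - quickflow
}

# Repeated passes: forward, backward, forward, ... each one applied to the
# baseflow returned by the previous pass.
lyne_hollick <- function(q, a = 0.925, passes = 3) {
  stopifnot(!anyNA(q), passes >= 1)
  bflow <- q
  for (p in seq_len(passes)) {
    bflow <- if (p %% 2 == 1) lh_pass(bflow, a) else rev(lh_pass(rev(bflow), a))
  }
  bflow
}

q   <- c(1, 1, 5, 10, 6, 3, 2, 1.5, 1.2, 1)  # made-up daily streamflow
bf  <- lyne_hollick(q, a = 0.925, passes = 3)
bfi <- sum(bf) / sum(q)                      # baseflow index of this series
```

More passes give a smoother and lower baseflow, which is one reason why packages that fix the number of passes can return different results for the same record.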
BaseflowSeparation(streamflow, filter_parameter, passes) # BFS method [1]; “streamflow” is a vector
get_usgs_gage(flowgage_id, begin_date, end_date) # imports streamflow data (in m3/day) from the USGS database, plus the decimal latitude and longitude and the catchment area above the gage
ConvertFlowUnits(cfs, cmd, cms, WA, mmd, AREAunits) # converts flow units, e.g. from cfs to mm/day
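The arithmetic behind such a unit conversion is simple enough to write out. The following is a rough base R sketch, not the package's implementation; the function names and the `area_km2` argument are hypothetical.

```r
# Convert discharge to runoff depth in mm/day.
# area_km2 is the catchment area in square kilometres (hypothetical name).
cfs_to_cms <- function(cfs) cfs * 0.028316846592   # 1 ft^3 = 0.028316846592 m^3
cms_to_mmd <- function(cms, area_km2) {
  cms * 86400 / (area_km2 * 1e6) * 1000            # m^3/s -> m/day -> mm/day
}
cfs_to_mmd <- function(cfs, area_km2) cms_to_mmd(cfs_to_cms(cfs), area_km2)

one_mm <- cms_to_mmd(1, area_km2 = 86.4)  # 1 m^3/s over 86.4 km^2 is 1 mm/day
```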
vector2zoo(x, dates, date.fmt) # converts character, factor, POSIXct into zoo object (the package uses zoo)
smry(x, na.rm, digits, …) # computes summary statistics: min, 1st quartile, mean, median, 3rd quartile, interquartile range, standard deviation, coefficient of variation, skewness, kurtosis, total number of elements, and number of missing values
monthlyfunction(x, FUN, na.rm, output,…) # Generic function for obtaining 12 monthly values by applying any R function to ALL the values in the object belonging to each one of the 12 calendar months (Jan…Dec)
dwi(x, out.unit, from, to, date.fmt, tstep, …) # generates a table with the number of days with information different from NA within a zoo object
matrixplot(x, ColorRamp, ncolors, main, …) # plots a color matrix representing the values stored in x
extract(x, trgt, …) # extracts from zoo object all the values belonging to a given month, year, or weather season
daily2monthly(x, FUN, na.rm = TRUE, …) # transforms a daily (sub-daily or weekly) regular time series into a monthly one
daily2annual(x, FUN, na.rm = TRUE, out.fmt = "%Y-%m-%d", …) # transforms a daily time series into an annual one
hydroplot(x, FUN, na.rm, ptype, pfreq, var.type, var.unit, main, xlab, ylab, win.len1, win.len2, tick.tstep, lab.tstep, lab.fmt, cex, cex.main, cex.lab, cex.axis=1.3, col, from, to, date.fmt, stype, season.names, h=NULL, …) # plots (a maximum of) 9 graphs (timeseries plots, boxplots and/or histograms) of the daily, monthly, annual and/or seasonal time series, can’t handle multivariate zoo
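For readers who want to see what these aggregations boil down to, here is a small base R sketch using `tapply()`. It only illustrates the idea (12 calendar-month values, monthly series, annual series) on a made-up series and is not how the package itself is implemented.

```r
# Made-up daily series covering two calendar years
dates <- seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day")
flow  <- 10 + 5 * sin(2 * pi * seq_along(dates) / 365)

# One value per calendar month (Jan..Dec), pooling all years together
month_of     <- factor(format(dates, "%m"), levels = sprintf("%02d", 1:12))
monthly_mean <- tapply(flow, month_of, mean, na.rm = TRUE)

# One value per year-month, and one value per year
by_year_month <- tapply(flow, format(dates, "%Y-%m"), mean, na.rm = TRUE)
by_year       <- tapply(flow, format(dates, "%Y"), mean, na.rm = TRUE)
```

Swapping `mean` for `median`, `min` or any other function gives the corresponding statistic.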
baseflows(flow.ts, a, ts) # BFS method [1]; can’t choose the number of filter passes, ignores NA values, dates must be in POSIX format
low.spells(flow.ts, quant, threshold, duration, volume, plot, ann.stats, ann.stats.only, hydro.year) # calculates a suite of statistics describing low-flow spell characteristics, such as the timing, frequency and duration of events below a threshold
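The core of a low-flow spell analysis is finding runs of consecutive days below a threshold. A minimal base R sketch with `rle()` could look like this; the function name `low_spells` is hypothetical, the data are made up, and gap-free daily values without NA are assumed. In practice the threshold would often be a low quantile of the record, for example `quantile(flow, 0.1)`.

```r
# Find runs of consecutive days with flow below a threshold.
# Assumptions: daily data without gaps or NA values.
low_spells <- function(flow, threshold) {
  runs   <- rle(flow < threshold)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(start    = starts[runs$values],        # index of first day below
             duration = runs$lengths[runs$values])  # spell length in days
}

flow   <- c(5, 1, 1, 5, 1, 5, 6)   # made-up daily flows
spells <- low_spells(flow, threshold = 2)
```

The number of rows in `spells` is the spell frequency, and `mean(spells$duration)` the mean spell duration.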
baseflow_sep(runoff, method, parms) # option for multiple BFS methods: single pass of [1], constant slope, re-scaled LOWESS-smoothed window minima; “runoff” is a vector
bfi(runoff, …) # baseflow index
IHA stands for Indicators of Hydrologic Alteration; it is an R package implementing the Nature Conservancy’s software of the same name.
water.year(x) # returns a string with water year, x is a zoo object
water.month(x, label, abbr) # returns month of a water year (Oct-Sept)
group1(x, year, FUN) # magnitude of monthly water conditions (mean or median for each month)
group2(x, year, mimic.tnc, …) # magnitude and duration of annual extreme water conditions (annual minima and maxima of the 1-, 3-, 7-, 30- and 90-day means; number of zero-flow days; BFI = 7-day minimum flow / mean flow for the year)
group3(x, year, mimic.tnc) # timing of annual extreme water conditions (Julian date of each annual 1-day maximum, Julian date of each annual 1-day minimum)
group4(x, year, thresholds) # frequency and duration of high and low pulses (Number of low and high pulses within each water year, mean or median duration of low and high pulses (days))
group5(x, year) # rate and frequency of water condition changes (rise and fall rates: Mean or median of all positive and negative differences between consecutive daily values; number of hydrologic reversals)
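The water year logic is easy to reproduce in base R. The sketch below assumes a water year running from October to September that is labelled by the calendar year in which it ends; the function names are hypothetical and the package may differ in details such as the return type.

```r
# Water year for each date, assuming it runs from October to September
# and is labelled by the calendar year in which it ends.
water_year <- function(dates) {
  d    <- as.POSIXlt(dates)
  year <- d$year + 1900
  ifelse(d$mon + 1 >= 10, year + 1, year)
}

# Month number within the water year (October = 1, ..., September = 12)
water_month <- function(dates) {
  m <- as.POSIXlt(dates)$mon + 1
  (m - 10) %% 12 + 1
}

d  <- as.Date(c("2020-09-30", "2020-10-01", "2021-01-15"))
wy <- water_year(d)
wm <- water_month(d)
```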
lfstat stands for Low Flow Statistics and follows the World Meteorological Organization’s “Manual on Low-flow Estimation and Prediction” (link at the bottom).
BFI(lfobj, year, breakdays, yearly) # BFS method [2], hydrological year possible; lfobj is a special object type created for this package
bfplot(lfobj, year, col, bfcol, ylog) # plots streamflow and baseflow
MAM(lfobj, n, year, breakdays, yearly) # computes Mean Annual Minimum for a given n (MAM-n); if n=1 and yearly=TRUE – computes annual minima
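Qxx(lfobj, Qxx, year, monthly, yearly, breakdays, na.rm) # flow quantile Qxx, i.e. the flow exceeded xx% of the time
Q95(lfobj, year, monthly, yearly, breakdays, na.rm) # the same for a fixed quantile; Q90 and Q70 take the same arguments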
fdc(lfobj, year, breakdays, colors, xnorm, ylog, legend, separate, …) # flow duration curve
lfnacheck(lfobj) # check for missing (NA) values
lfnainterpolate(lfobj) # replacing missing values with linear interpolation
meanflow(lfobj, year, monthly, yearly, breakdays, na.rm) # calculates the mean flow
tyears(lfobj, event, n, dist, legend, zetawei) # minimum D-day-mean streamflow (n = D) expected to occur on average once during an interval of T years (event = T); dist is the distribution, dist = c("exp", "gam", "gev", "glo", "gpa", "gno", "gum", "kap", "lognormal", "normal", "pe3", "wak", "wei"). Can be used for 7Q10 or similar indices.
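Two of these indices, MAM-n and Qxx, are simple enough to sketch in base R on a plain vector of flows. This is only an illustration under stated assumptions: the function names are hypothetical, calendar years are used instead of hydrological years, the moving average uses a trailing window, and no special low-flow object is involved, so results may differ from what the package returns.

```r
# n-day moving average (trailing window); the first n - 1 values are NA
moving_mean <- function(flow, n) {
  as.numeric(stats::filter(flow, rep(1 / n, n), sides = 1))
}

# Mean annual minimum of the n-day moving average (MAM-n), by calendar year
mam_n <- function(flow, dates, n = 7) {
  annual_min <- tapply(moving_mean(flow, n), format(dates, "%Y"),
                       min, na.rm = TRUE)
  mean(annual_min)
}

# Flow exceeded xx percent of the time, e.g. xx = 95 gives Q95
q_xx <- function(flow, xx = 95) {
  as.numeric(quantile(flow, probs = 1 - xx / 100, na.rm = TRUE))
}

dates <- seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day")
flow  <- 10 + 5 * sin(2 * pi * seq_along(dates) / 365)   # made-up series
mam7  <- mam_n(flow, dates, n = 7)
q95   <- q_xx(flow, 95)
```

With n = 1 the moving average is the series itself, so `mam_n(flow, dates, n = 1)` is just the mean of the annual minima.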
cleanUp(dataset, task, replace) # identifies and fixes common problems with hydrologic data (negative and 0 values)
fillMiss(dataset, block, pmiss, model, smooth, …) # fills missing values
importDVs(staid, code, stat, sdate, edate) # imports daily hydrologic time series data given a USGS streamgage identification number
siteInfo(staid) # retrieves information about a USGS streamgage site – name, decimal latitude and longitude
summaryStats(dataset, staid = 1) # returns matrix with summary statistics (beginning and end of timeseries, number of rows, number NA and negative, min, max, median, mean, Q1, Q3, StdDev, IQR)
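Before running any of the statistics above, it helps to know how many values are missing or suspicious. The following base R sketch counts them and fills interior gaps by linear interpolation with `approx()`. The function name `fill_linear` is hypothetical, and this is a simpler approach than a model-based gap filler such as `fillMiss` may use, so treat it as a rough illustration only.

```r
# Linear interpolation of interior gaps; leading and trailing NAs stay NA.
# Needs at least two non-missing values.
fill_linear <- function(flow) {
  idx <- seq_along(flow)
  ok  <- !is.na(flow)
  approx(x = idx[ok], y = flow[ok], xout = idx)$y
}

flow <- c(NA, 1, NA, 3, NA, NA, 6, NA)   # made-up daily flows with gaps
n_missing  <- sum(is.na(flow))           # number of missing values
n_non_pos  <- sum(flow <= 0, na.rm = TRUE)  # zero or negative values
filled     <- fill_linear(flow)
```

Linear interpolation is only reasonable for short gaps; long gaps during floods or recessions need more care.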
References for the baseflow separation (BFS) methods
- [1] R. J. Nathan and T. A. McMahon, “Evaluation of automated techniques for baseflow and recession analyses”, Water Resour. Res., vol. 26, no. 7, pp. 1465–1473, 1990.
- [2] Institute of Hydrology, “Low Flow Studies”, 1980.
Once I decide on the statistics I will use in my project, I will probably prepare a similar overview of the different packages for time series trend analysis. Unless I give up completely on R.
Thank you very much. Which one do you recommend for a long daily streamflow record with a high density of NAs?
Hi Alis, I don’t think it matters which separation method you choose. In any case, you have to deal with the NAs first. I hope this helps. Denitza