Background
The rivRmon package was developed in 2019 to fill a need in the Department’s reporting on the Swan and Canning Rivers. The initial brief was to create a product with the same look and feel as reporting that had previously been produced using proprietary software.
After the initial development, further functionality (e.g. phytoplankton reporting) was added, along with tweaks to the original surfer plots to accommodate extra sites, deeper profiles, better interpolation, etc. As with many projects like this, time was always the limiting factor. The package, whilst still robust at the time of writing, would benefit from some refactoring and the removal of older package dependencies.
The intention of these notes is to aid any future developer in navigating the package structure and hopefully give some context around decisions made when formulating the code base.
Package structure
The basic package structure should look familiar to anyone who has had experience with writing R packages. In a nutshell though:
- rivRmon is hosted on the Department’s GitHub account.
- rivRmon has a supporting website hosted through GitHub Pages, created using the pkgdown R package, which automates the creation of all files necessary for a static website. As such there are extra directories and files in the GitHub repo that do not form part of an R package and are noted in the .Rbuildignore file.
- New version releases are managed through GitHub’s Release functionality.
- New releases have been linked to a Zenodo DOI. If future versions are published please add your own ORCID account, and I would appreciate still being attached.
An excellent resource for getting up to speed on R package development, and one that covers most of the above, is R Packages. For GitHub integration with RStudio try Happy Git and GitHub for the useR.
Any work on the package should follow these steps:
- Source the latest version from GitHub.
- Make edits/changes to code.
- Commit changes locally.
- Perform extensive testing locally.
- Rebuild package (new version), documentation and website locally.
- Update package NEWS file.
- Test loading of package.
- Run package R CMD checks and clear up any errors.
- Retest package after loading new version.
- Commit all changes along the way locally.
- Push new package to GitHub repository.
- Create a new release and ensure same in Zenodo (where DOI is minted).
- Tell users that a new package is available and how to access.
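The rebuild, check and retest steps above can be scripted from the package root. The following is a minimal sketch only, assuming the devtools and pkgdown packages are installed and that the documentation is generated with roxygen2 (an assumption worth checking against the repo). The wrapper name release_steps() is hypothetical and is not part of rivRmon.

```r
# Hypothetical convenience wrapper around the local release steps.
# Nothing runs until the function is called from the package root.
release_steps <- function(pkg = ".") {
  devtools::document(pkg)   # regenerate Rd files and NAMESPACE from roxygen comments
  devtools::check(pkg)      # run R CMD check; clear any errors before continuing
  devtools::install(pkg)    # install the new version locally for retesting
  pkgdown::build_site(pkg)  # rebuild the static website
  invisible(pkg)
}

# Usage (interactive session, working directory at the package root):
# release_steps()
```

The version number and NEWS file still need updating by hand (or with a helper such as usethis::use_version()), and committing, pushing and creating the GitHub release remain manual steps.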
Additional package notes
Internal data
There are two sources of internal data that are required for the package to operate and are bundled with the software. The first can be found in data/ and covers the management response triggers. It is a .rda file and can be recreated and updated easily in R if required.
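As a rough illustration of recreating an .rda file, the sketch below saves a small data frame and reloads it to confirm the round trip. The object name response_triggers, its columns and its values are made up for the example, and the file is written to a temporary directory rather than the package's data directory.

```r
# Placeholder object standing in for the management response triggers.
# Column names and values are invented for illustration only.
response_triggers <- data.frame(
  metric  = c("metric_a", "metric_b"),
  trigger = c(4, 10)
)

# Write to a temporary directory; in the package this would be the data directory.
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)
rda_path <- file.path(out_dir, "response_triggers.rda")
save(response_triggers, file = rda_path, compress = "bzip2")

# Reload into a separate environment to confirm what the file contains.
check_env <- new.env()
loaded <- load(rda_path, envir = check_env)
```

Inside a package project, usethis::use_data(response_triggers, overwrite = TRUE) performs the same save into the data directory.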
The second internal data source resides in the R/ directory as the sysdata.rda file and contains a lot of extra data, including:
- site locations and information.
- locations of the oxy plants.
- coordinates used for the bathymetry (bottom profiles of the surfer plots).
- pre-baked interpolation grids.
- reclassification matrices used to bin continuous metric values so that discrete colour scales can be used.
- coordinates used for “black-out” rectangles for missing data.
- the phytoplankton plotting colour scale.
Recreating this data can be accomplished by editing and running the script found at data-raw/internal_data.R. Note that adding new sites is accomplished by editing a shapefile that maintains the spatial integrity of the relationship between the sampling sites and the bathymetry. The file path to the shapefile is in the script and access can be gained by contacting the DBCA RSSA program.
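To give a feel for how a reclassification matrix turns continuous metric values into discrete colour classes, here is a minimal base R sketch. The matrix layout (lower bound, upper bound, class) follows the convention used by raster-style reclassification, but the break values, the class ids and the helper name bin_values() are all hypothetical; the matrices stored in the package may be structured differently.

```r
# Hypothetical reclassification matrix: one row per class.
rcl <- matrix(
  c(0, 2,   1,
    2, 4,   2,
    4, 6,   3,
    6, Inf, 4),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("from", "to", "class"))
)

# Assign each value to the row whose lower bound is the largest one <= value.
bin_values <- function(x, rcl) {
  idx <- findInterval(x, rcl[, "from"])
  idx[which(idx == 0)] <- NA_integer_  # values below the first lower bound
  as.vector(rcl[idx, "class"])
}

metric_values <- c(0.5, 2, 3.9, 7.2)
classes <- bin_values(metric_values, rcl)
```

Converting the resulting classes to a factor allows a discrete colour scale to be applied to what was originally a continuous surface.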
It is also worth noting that not all of the internal_data.R script can be successfully run at present. Whilst this has no impact on rivRmon in its present form, the script should be updated to avoid future issues. Attention needs to be paid to updating the older package dependencies that are required to build the interpolation grids, namely rgdal, rgeos, raster and possibly sp. These are all spatial packages and would benefit from being upgraded to utilise functionality from terra and sf, their modern equivalents.
As sysdata.rda is just a data file with multiple objects, there is an easy workaround if only certain objects need updating. Starting with a clean environment, clicking on sysdata.rda in RStudio will load all objects into the environment. The developer can then update the required object and resave everything as sysdata.rda (see the bottom of the internal_data.R script for more notes and which objects are required).
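The same workaround can be done in code rather than by clicking. The sketch below is an illustration under stated assumptions: it builds a stand-in sysdata.rda in a temporary directory with two invented objects (site_info and phyto_cols are hypothetical names), loads it into its own environment, updates one object and saves every object back.

```r
# Build a stand-in sysdata.rda so the example is self-contained.
sysdata_path <- file.path(tempdir(), "sysdata.rda")
site_info  <- data.frame(site = c("A", "B"), dist_km = c(1.2, 5.4))
phyto_cols <- c(group_1 = "#1b9e77", group_2 = "#d95f02")
save(site_info, phyto_cols, file = sysdata_path)
rm(site_info, phyto_cols)

# Load everything into a dedicated environment (the "clean environment").
sys_env <- new.env()
load(sysdata_path, envir = sys_env)

# Update only the object that needs changing.
sys_env$site_info <- rbind(
  sys_env$site_info,
  data.frame(site = "C", dist_km = 9.8)
)

# Resave ALL objects, not just the edited one, or the rest will be lost.
save(list = ls(sys_env), envir = sys_env, file = sysdata_path)
```

Using a dedicated environment avoids accidentally saving stray objects from the global workspace into the file.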
Functions
Most functions live in their own .R files as they are generally quite long and can be complicated. Internal functions (i.e. helper functions that run inside the main functions but are not exported with the package) are contained in data-prep.R and include helpers for:
- finding the correct workbooks for input to surfer and phyto functions.
- ingesting the sonde data and standardizing the output for the surfer plots, and a special case version for use with the ad hoc plot_metric() function.
- colour scales for the surfer plots.
- creating a pretty date (i.e. 1st, 2nd, 3rd, 4th etc).
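As an example of the kind of small helper listed above, a pretty date can be built in base R as follows. This is a sketch only: the function names ordinal_suffix() and pretty_date() are hypothetical, and the package's own helper may be written quite differently.

```r
# Return "st", "nd", "rd" or "th" for each day of the month.
ordinal_suffix <- function(day) {
  day <- as.integer(day)
  # Index by last digit: 0 -> "th", 1 -> "st", 2 -> "nd", 3 -> "rd", 4-9 -> "th".
  suffix <- c("th", "st", "nd", "rd", rep("th", 6))[day %% 10 + 1]
  # 11, 12 and 13 are exceptions and always take "th".
  suffix[day %% 100 %in% 11:13] <- "th"
  suffix
}

# Format a date as e.g. day + suffix, then month name and year.
# The month name follows the session locale.
pretty_date <- function(date) {
  date <- as.Date(date)
  day <- as.integer(format(date, "%d"))
  paste0(day, ordinal_suffix(day), " ", format(date, "%B %Y"))
}

pretty_date("2019-03-01")
```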
Surfer concept
As previously mentioned, the surfR functions were developed to replicate the outputs of proprietary software. Accomplishing this necessitated a departure from some very capable (and easier to maintain) plotting paradigms. It may therefore be helpful to understand the internal “process” and design choices in a surfR function, which are generalised below:
- File paths to data workbooks (1 per sonde) are established, with internal checks to confirm that the correct river surfR is being called.
- Sonde data is read in and aggregated. The helper function here was necessarily complicated as many different sondes had been used with no real standardisation in naming of metrics.
- Depth profiles are sourced.
- Data for the interpolations is collated.
- Data is interpolated over sourced grids.
- Sampling locations are determined and locations restricted to those appearing in data (standard and response sites).
- Sampling site labels are generated, omitting those that overplot.
- Black-out rectangles are generated for any missing data.
- Individual ggplot2 objects are created for each of the 4 metrics with very specific themes per plot to allow correct axis labels etc.
- Plots are arranged into a panel layout and written to .png.
- Swan surfR always outputs a whole of river plot and a “Narrows and up” plot.
- Canning surfR determines if the Kent Street weir should be included based on the presence of some sampling sites. If it is, an alternate bathymetry profile is chosen and the interpolation is split over two grids that meet at the weir. This is because the weir forms a boundary for the interpolation.
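The sonde-reading step above is where most of the complexity sits, so a small illustration of name standardisation may help. The sketch below maps inconsistent column headers to a fixed set of names using regular expressions. The raw headers, the patterns and the standard names are all invented for the example and do not reflect the lookup actually used in the package.

```r
# Hypothetical lookup: standard name = pattern matched against cleaned headers.
name_patterns <- c(
  sal_ppt = "^sal",
  do_mgl  = "^(o?do|oxygen).*mg",
  temp_c  = "^temp",
  chl_ugl = "^chl"
)

# Replace any header matching a pattern with its standard name;
# headers that match nothing are returned cleaned but otherwise unchanged.
standardise_names <- function(raw_names, patterns = name_patterns) {
  cleaned <- tolower(trimws(raw_names))
  out <- cleaned
  for (std in names(patterns)) {
    out[grepl(patterns[[std]], cleaned)] <- std
  }
  out
}

# Two made-up sondes reporting the same metrics under different headers.
sonde_a <- c("Temp (C)", "Sal-ppt", "ODO mg/L", "Chlorophyll ug/L", "Depth m")
sonde_b <- c("temperature", "Salinity", "DO mg/l", "Chl", "Depth m")

names_a <- standardise_names(sonde_a)
names_b <- standardise_names(sonde_b)
```

Once every workbook shares the same column names, the data from each sonde can be row-bound and passed on to the interpolation steps.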
Last thoughts
If this is all very new to a new maintainer/developer then please practice creating packages before attempting any irreversible changes to the DBCA-WA repository. Whilst Git is great for version control, you need to know Git to use it safely. There are dozens of resources out there for practicing package creation and GitHub workflows. Lastly, you will need to contact OIM to be added as an administrator to the dbca-wa/rivRmon repository to be able to push any changes to the package.