How to make plots and tables for annual reports
Bart Huntley
2025-01-31
Source:vignettes/annual_reports.Rmd
annual_reports.Rmd
Introduction
All water quality data, historical and present, is accessed through the Water Information Reporting portal maintained by the Department of Water and Environment link. Data should be downloaded for the Canning Estuary and the Swan Estuary sites using the keywords search terms “SG-E-CANEST” and “SG-E-SWANEST” respectively. Data requested should be “Water Quality (discrete) for Site(s) cross-tabulated” and as the data size is large, requests should be split into appropriate yearly chunks. This data is the starting point for using the functions that follow to produce plots and tables for the annual reports. It is suggested that the original downloads be kept and the data simply updated yearly.
The plots created by the functions are output as panel plots and follow the same format as previous annual reports. A panel is based on a variable and contains a separate plot for each combination of ecological management zone (emz) and sampling method. Visualisation consists of:
- a “ribbon” designating the 10th and 90th percentiles of the background data
- a line showing the background data median
- points showing the current reporting period medians
The tables generated from the same functions contain counts, minimums, maximums and medians for the current reporting period per month. These are summarised by emz and sampling method and can be copied straight into the report when it is being compiled.
The process for creating the plots and tables has two stages.
Stage One - Data assembly
Data required for the annual reports consists of a current reporting period (01 June to 31 May) and background period (5 years prior to current reporting period). In this stage all downloaded data is read in and categorised into a current and background period. The downloaded data has some challenging formatting (for automatic reading and variable naming etc) so these functions are very specific to the current export format. Should this format change these functions will fail and require some editing.
When the report is read in it is tidied with the following occuring:
- variable names are standardised
- values are checked for “<” or “>”:
- a value following a “<” is halved
- a value following a “>” is left as is
- the variable emz is created based on site
Once the data is read in and tidied it is exported as a csv with the format “XX-X-XXXXX_annual_report_data_for_YYY.csv” where:
- “XX-X-XXXXX” is the estuary (i.e. SG-E-CANEST or SG-E-SWANEST)
- “YYYY” is the current reporting year.
Remember to change the file path slashes if on a Windows OS. Examples below.
#load the package
library(rivRmon)
#Canning estuary - forward slash example
canning_WIN_report_data(inpath = "C:blah/blah/annual_data_location",
reportingYear = 2019,
outpath = "C:blah/blah/somewhere")
#Swan estuary - backslash example
swan_WIN_report_data(inpath = "C:blah\\blah\\annual_data_location",
reportingYear = 2019,
outpath = "C:blah\\blah\\somewhere")
inpath is a character file path to the downloaded data xlsx files reportingYear is a 4 digit numerical representing current reporting year outpath is a character file path to desired export location for writing of the csv files
This function needs to be run each year to set up the data for the annual report. The output from this stage will be the input for the next two stages.
Stage Two - Creating plots and tables
There is a separate function for each estuary. These are essentially
wrapper functions around other functions that work on data that has the
same requirements for visualisation (mainly formatting such as plots per
panel, or data filtering parameters etc). Outputs are saved to
directories created by running the functions. There will be a
panels
and a tables
directory with the prefix
“c_” or “s_” for Canning and Swan estuaries respectively.
#Canning estuary - forward slash example
canning_reportR(inpath = "C:blah/blah/somewhere",
outpath = "C:blah/blah/somewhere_else",
surface = "blue",
bottom = "red",
chloro = "darkgreen")
#Swan estuary - backslash example
swan_reportR(inpath = "C:blah\\blah\\somewhere",
outpath = "C:blah\\blah\\somewhere_else",
surface = "blue",
bottom = "red",
chloro = "darkgreen")
inpath a character file path to pre-made annual report data (the out path from Stage 1 above)
outpath a character file path to desired export location
The parameters surface, bottom and chloro can take a named colour or a hex colour code. The defaults produce colours similar to those produced in prior reports, however access to these parameters allows users to tweak if required. surface relates to surface measurements, bottom to bottom measurements and chloro to integrated measurements specific only to chlorophyll plots.
Groupings of variables
As stated previously some variables are grouped so that code can filter data under similar circumstances. These groupings are as follows.
For the Canning estuary:
- chlorophyll a - by itself
- dissolved organic carbon - by itself
- group 1
- total nitrogen
- ammonia nitrogen
- total oxidised nitrogen
- dissolved organic nitrogen
- total phosphorus
- filterable reactive phosphorus
- group 3
- dissolved oxygen
- pH
- salinity
- temperature
- secchi depth - by itself
- silica - by itself
For the Swan estuary:
- chlorophyll a - by itself
- group 1
- total nitrogen
- ammonia nitrogen
- total oxidised nitrogen
- dissolved organic nitrogen
- total phosphorus
- filterable reactive phosphorus
- silica
- group 2
- alkalinity (not report variable for Canning estuary)
- dissolved organic carbon
- tss (not report variable for Canning estuary)
- group 3
- dissolved oxygen
- pH
- salinity
- temperature
- secchi depth
Note that swan_reportR
outputs 2 panel plots as per the
original report for the group 3 variables. The extra panel plots are for
the additional emz of “Swan”.
Plotting individual variable groupings
As previously mentioned, the reportR functions are wrappers. The
individual functions are available to the user and can be found in the
reference index. The groupings match the above and function names take
the format ann_rep_vvvv_e()
where:
- “vvvv” is the variable or grouping
- “_e” is the estuary (“_c” or “_s”)
As these functions are written to be internal to the wrapper they need to be run in the following manner in order to read in the correct data.
#load additional packages
library(tidyverse)
#setup a file path to the annual report data (i.e. csv from Stage One)
inpath <- "C:path/to/anuual_report_data"
#get the full name and pathrow to the required data - change pattern as required
file <- list.files(path = inpath, pattern = "SG-E-CANEST_annual_report",
full.names = TRUE)
#create parameter variables for the function you want to use
outpath <- "C:path/to/desired_export_location"
data <- readr::read_csv(file = file, guess_max = 10000)
#run function - here we have chosen group 1 variables for the Swan
ann_rep_grp1_s(outpath, data, surface = "blue", bottom = "red")