Data on plant elemental stoichiometry are essential for ecologists investigating ecosystem function, nutrient cycling, plant-herbivore interactions, and leaf economics. A cost-effective approach to leaf elemental analysis is near-infrared spectroscopy (NIRS), whereby components such as carbon (C), nitrogen (N), phosphorus (P), and potassium (K) can be predicted using NIR spectra from plant material. However, a major factor limiting the widespread use of NIRS by plant ecologists is the limited availability of free software and calibration data for developing plant chemistry datasets from spectra. Here, we present a pair of companion R packages (‘plantspec’ and ‘plantspecDB’) that satisfy this need by providing an entire workflow, from spectra to predicted elemental data. The main R package, called ‘plantspec’, allows users to manipulate spectral data and develop custom partial least squares (PLS) models for predicting the elemental composition of their own datasets. The second package, ‘plantspecDB’, provides NIR spectra and matched elemental data obtained with standard analytical techniques for herbaceous samples collected from 18 grassland sites around the world, primarily sourced from the Nutrient Network experiment. The ‘plantspecDB’ data package also provides calibrations for C, N, P, and K, both for bulk samples and separately by plant functional type (grasses, forbs, and legumes).
Here, we provide an example of the ‘plantspec’ workflow that produces an NIR calibration for N, similar to the one reported in Anderson et al. (2018). This workflow provides a template for producing robust calibration models and increasing the application of NIR in ecology. Although we focus on plant stoichiometry, the workflow presented here can be applied to a broad range of materials and data types, including soils, remote sensing data, solutions, and tissues. Finally, our global dataset provides unprecedented access to NIR calibration data for tissue concentrations of key plant-limiting elements. We hope that ‘plantspec’ will help to form a community of users who share data in order to increase the utility of NIR for ecologists, and we provide a submission portal to facilitate data sharing: plantspec submission portal.
Install ‘plantspec’ and ‘plantspecDB’ from GitHub using devtools.
library(devtools)
install_github(repo = "griffithdan/plantspec") # Install analysis package
library(plantspec)
install_github(repo = "griffithdan/plantspecDB") # Install data package
library(plantspecDB)
data(plantspec.spectra)
data(plantspec.data)
The dataset ‘plantspec.data’ includes all of the wet lab data for this example. We will start by subsetting those data until only the records we want to use for model fitting remain. Here, we keep only data from the original dataset (version 1) and exclude bryophyte and litter samples.
# Use dataset version 1
plantspec.data <- plantspec.data[plantspec.data$VERSION==1,]
# Data summaries (results hidden)
table(plantspec.data$FUNCTIONAL_TYPE)
table(plantspec.data$YEAR)
table(plantspec.data$SITE_NAME)
table(plantspec.data$SITE_COUNTRY)
# Subset the dataset
# Remove NA values for N
plantspec.data <- plantspec.data[!is.na(plantspec.data$N),]
# Remove Bryophyte and litter data
plantspec.data <- subset(plantspec.data, !(FUNCTIONAL_TYPE == "bryophyte"))
plantspec.data <- subset(plantspec.data, !(FUNCTIONAL_TYPE == "litter"))
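The filtering steps above can also be written as a single logical index. The following is a minimal sketch on a toy data frame; the column names mirror those used above, but the values are invented and the object names (`toy`, `keep`, `toy_filtered`) are hypothetical.

```r
# Toy stand-in for plantspec.data; values are invented for illustration.
toy <- data.frame(
  SCAN_FILE       = sprintf("SCAN_%05d.dpt", 1:6),
  VERSION         = c(1, 1, 1, 2, 1, 1),
  FUNCTIONAL_TYPE = c("grass", "forb", "bryophyte", "grass", "litter", "legume"),
  N               = c(1.2, 2.1, 0.9, 1.5, 0.7, NA),
  stringsAsFactors = FALSE
)

# Combine the three filters (version, missing N, functional type) in one index
keep <- toy$VERSION == 1 &
  !is.na(toy$N) &
  !(toy$FUNCTIONAL_TYPE %in% c("bryophyte", "litter"))

toy_filtered <- toy[keep, ]
print(toy_filtered$SCAN_FILE)
```

Using `%in%` rather than repeated `==` comparisons keeps the exclusion list in one place, which may be convenient if more functional types need to be dropped later.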
We should now plot the spectral data using an interactive plot so that we can determine if there are outliers or other issues with the data.
p <- plot(plantspec.spectra[200:250,])
p$width <- 900 # Change width for rmarkdown
p
# Remove suspect scan
plantspec.data <- subset(plantspec.data, SCAN_FILE != "SCAN_00225.dpt")
# Subset the spectral data based on the parameters we specified for wet lab data
NUTNET_SCANS <- plantspec.spectra[plantspec.data$SCAN_FILE,]
# Base R plot for filtered scans
plot(NUTNET_SCANS, base_plot = TRUE) # inspect scans
# Select the component of interest (i.e., Nitrogen)
component_N <- plantspec.data$N # new vector for clarity
# Select a training set for model fitting (i.e., FALSE values for test set)
# - use "!" because subdivideDataset() returns TRUE for test set selections.
# - use "type = 'validation'" so both training and test data are representative.
training_set_MDKS <- !(subdivideDataset(spectra = NUTNET_SCANS,
                                        component = component_N,
                                        method = "MDKS",
                                        p = 0.1, # 10% for this example
                                        type = "validation"))
# Optimize the preprocessing for spectra and spectral subsetting to emphasize
# the important spectra features related to predicting N. This can be very time
# consuming.
N_opt <- optimizePLS(component = component_N,
                     spectra = NUTNET_SCANS,
                     training_set = training_set_MDKS)
# Fit our PLS regression using the optimal setting from our optimization.
N_cal <- calibrate(component = component_N,
                   spectra = NUTNET_SCANS,
                   optimal_params = N_opt,
                   optimal_model = 1, # In N_opt, use the best model
                   validation = "testset",
                   training_set = training_set_MDKS)
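To give some intuition for how a representative test set can be chosen, here is a rough sketch of the plain Kennard-Stone idea in base R, using Euclidean distances on invented data. This is not the ‘plantspec’ implementation: the "MDKS" method name suggests a Mahalanobis-distance variant, and the function `kennard_stone()` and the toy matrix below are hypothetical.

```r
# Plain Kennard-Stone selection on Euclidean distances (illustrative only).
# Note: always returns at least the two starting samples.
kennard_stone <- function(x, k) {
  d <- as.matrix(dist(x))                    # pairwise Euclidean distances
  # Start with the two most distant samples
  start <- which(d == max(d), arr.ind = TRUE)[1, ]
  selected <- as.integer(start)
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(nrow(d)), selected)
    # For each remaining sample, distance to its nearest selected sample
    min_d <- apply(d[remaining, selected, drop = FALSE], 1, min)
    # Add the sample that is farthest from everything already selected
    selected <- c(selected, remaining[which.max(min_d)])
  }
  selected
}

set.seed(1)
toy_spectra <- matrix(rnorm(50 * 5), nrow = 50)  # 50 fake "scans", 5 variables
test_idx <- kennard_stone(toy_spectra, k = 5)    # 10% of the samples
test_set <- seq_len(nrow(toy_spectra)) %in% test_idx
training_set <- !test_set                        # same TRUE/FALSE convention as above
```

Because each new sample is chosen to be far from those already selected, the chosen subset tends to span the range of spectral variation rather than cluster around the most common samples.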
Now, evaluate the model validation and use the model to predict new data.
# Test set validation R2 from the PLS regression
N_cal$R2_Val
## [1] 0.9510736
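For readers who want to see what a test-set R2 summarizes, the sketch below computes one common definition (1 minus the ratio of residual to total sum of squares) together with the root mean squared error of prediction. The data are invented, and we have not verified that `N_cal$R2_Val` is computed in exactly this way, so treat this as an assumption.

```r
set.seed(42)
observed  <- runif(20, min = 0.5, max = 4)     # invented "wet lab" N values
predicted <- observed + rnorm(20, sd = 0.2)    # invented model predictions

ss_res <- sum((observed - predicted)^2)        # residual sum of squares
ss_tot <- sum((observed - mean(observed))^2)   # total sum of squares

r2_val <- 1 - ss_res / ss_tot                  # test-set R2 (one common definition)
rmsep  <- sqrt(mean((observed - predicted)^2)) # root mean squared error of prediction

print(c(R2 = r2_val, RMSEP = rmsep))
```

Reporting RMSEP alongside R2 can be useful because R2 depends on the spread of the reference values in the test set, whereas RMSEP is in the units of the component itself.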
# Plot PLS calibration
# - N_cal is an object defined in the pls R package (Mevik et al. 2013)
plot(N_cal)
# Use our model to predict an external dataset
# - makeCompatible() gives the new data the same properties as our training data
data(shootout)
N_pred <- predict(N_cal, newdata = makeCompatible(x = shootout_scans,
                                                  ref = NUTNET_SCANS))
# N_pred has attributes that include outlier assessment by Mahalanobis distance
hist(N_pred, col = "grey")
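As a rough illustration of outlier assessment by Mahalanobis distance, the sketch below flags new samples that lie far from a calibration set, using `stats::mahalanobis()` from base R. The data are invented, the chi-squared cutoff is one common convention rather than the rule ‘plantspec’ necessarily applies, and the sketch does not attempt to reproduce the attributes attached to `N_pred`.

```r
set.seed(7)
train <- matrix(rnorm(100 * 3), ncol = 3)        # invented calibration scores
new   <- rbind(matrix(rnorm(5 * 3), ncol = 3),   # five ordinary new samples
               c(10, 10, 10))                    # one deliberately extreme sample

center <- colMeans(train)
covmat <- cov(train)

# stats::mahalanobis() returns squared distances from the calibration center
d2 <- mahalanobis(new, center = center, cov = covmat)

# Flag samples beyond the 99th percentile of a chi-squared distribution
# (assumed cutoff; other thresholds are also used in practice)
cutoff  <- qchisq(0.99, df = ncol(train))
outlier <- d2 > cutoff

print(data.frame(d2 = round(d2, 2), outlier = outlier))
```

Predictions for flagged samples should be treated with caution, since the calibration has little or no support in that region of spectral space.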