Philipp Baumann
9d9d971caa
|
8 years ago | |
---|---|---|
R | 8 years ago | |
man | 8 years ago | |
tests | 8 years ago | |
.Rbuildignore | 8 years ago | |
.gitignore | 8 years ago | |
DESCRIPTION | 8 years ago | |
LICENSE.md | 8 years ago | |
NAMESPACE | 8 years ago | |
README.md | 8 years ago | |
simplerspec.Rproj | 8 years ago |
README.md
simplerspec
Simplerspec aims to facilitate spectra and additional data handling and model development for spectroscopy applications such as FT-IR soil spectroscopy. Different helper functions are designed to create a
data and modeling workflow. Data inputs and outputs are stored in R
objects with specific data structures. The following steps are covered in the current beta version of the package:
- Read spectral data from text files (
.csv
; an implementation for reading OPUS binary files is planned) - Average spectra for replicate scans
- Detect and remove outlier spectra based on robust PCA
- Resample spectra to new wavenumber intervals
- Perform pre-processing of spectra
- Join chemial and spectral data sets
- Perform calibration sampling and PLS regression modeling
- Predict chemical properties from a list of calibrated models and new soil spectra
Installation
The newest version of the package is available on this GitHub repository. Note that the package is still under development. If you find bugs you are highly welcome to report your issues (write an email or create an issue) You can install simplerspec
using the devtools package. Currently, there seems to be still an issue that install_github()
does not automatically install all packages that are listed under "imports" (see here). In case you obtain error messages that packages can't be found, install the following packages:
# List of packages to be installed
list_packages <- c("ggplot2", "plyr", "data.table", "reshape2",
"mvoutlier", "hexView", "Rcpp", "hyperSpec", "prospectr",
"dplyr", "caret", "tidyverse")
# Install packages from CRAN
install.packages(list_packages, dependencies = TRUE)
Then run:
# install.packages("devtools")
# Install the simplerspec package from the github repository
# (https://github.com/philipp-baumann/simplerspec)
devtools::install_github("philipp-baumann/simplerspec")
Key concepts and data analysis workflow
The functions are built to work in a pipeline and cover commonly used procedures for spectral model development. Many R packages are available to do tasks in spectral modeling such as pre-processing of spectral data. The motivation to create this package was:
- Avoid repetition of code in model developement (common source of errors)
- Provide a reproducible data analysis workflow for FT-IR spectroscopy
- R packges are an ideal way to organize and share R code
- Make soil FT-IR spectroscopy modeling accessible to people that have basic R knowledge
- Provide a package interface that keeps data with various structures for spectral modeling related in R objects
This package builds mainly upon functions from the following R packages:
prospectr
: Various utilities for pre-processing and sample selection. An introduction to the package with examples can be found here.plyr
anddplyr
: Fast data manipulation tools with a unified interface. See here for details.ggplot2
: Plotting system for R, based on the grammar of graphics. See herecaret
: Classification and regression training. A set of functions that attempt to streamline the process for creating predictive models. See here for details.
Consistent and reproducible data and metadata management is a important prerequisite for spectral model development. Therefore, different outputs should be stored as R objects in a consistent way using R data structures. Simplerspec functions use lists as R data structures because they allow to store complex, hierarchical objects in a flexible way. Lists can e.g. contain other lists, vectors, data.frames, or matrices.
Example workflow
In a fist step, the spectra (one file per spectrum and replicate scan) are read from
the text (.txt
) files. Currently, an export macro within the Bruker OPUS software
is used to convert OPUS binary files to spectra in the form of a text file.
The argument path
specifies the the folder where all spectral files to be loaded
into R are located. The files contain two columns that are comma-separated. The first
is the wavenumber and the second is the absorbance value.
# Read spectra in text format (Alpha spectrometer) -----------------------------
# Currently
soilspec_in <- read_spectra(
path = "data/spectra/alpha_txt"
)
Pipes can make R code more readable and fit to the stepwise data processing in the context of developing spectral models. The pipe operator (%>%, called "then") is a new operator in R that was introduced with the magrittr package. It facilitates readability of code and avoids to type intermediate objects. The basic behaviour of the pipe operator is that the object on the left hand side is passed as the first argument to the function on the right hand side. More details can be found here.
The model development process can be quickly coded as the example below illustrates:
################################################################################
## Part 1: Read and pre-process spectra, Read chemical data, and join
## spectral and chemical data sets
################################################################################
# Average, remove outlier, resample, then pre-process spectra ------------------
soilspec <- soilspec_in %>% average_spectra() %>%
remove_outliers(remove = FALSE) %>%
resample_spectra(wn_lower = 510, wn_upper = 3988, wn_interval = 2) %>%
do_pretreatment(select = "MIR0")
# Read chemical data from csv (comma-separated values) file --------------------
soilchem <- read.csv(
file = "out/data/soilchem_yamsys.csv"
)
# Join chemical and spectra data -----------------------------------------------
spec_chem <- join_chem_spec(dat_chem = soilchem, dat_spec = soilspec)
################################################################################
## Part 2: Run PLS regression models for different soil variables
################################################################################
# Example Partial Least Squares (PLS) Regression model for total Carbon (C)
pls_C <- pls_ken_stone(
spec_chem = spec_chem[!is.na(spec_chem$C), ],
# Use 2/3 of samples for calibration and 1/3 of samples for validation
ratio_val = 1/3,
variable = C,
validation = TRUE
)
Details on functions, arguments, and input and output data structures
read.spectra()
function
Credits
I would like to thank the following people for the inspiration by concepts, code and packages:
- Antoine Stevens Leonardo Ramirez-Lopez for their contributions to the prospectr package and the Guide to Diffuse Reflectance Spectroscopy & Multivariate Calibration
- Andrew Sila, Tomislav Hengl, and Thomas Terhoeven-Urselmans for the
read.opus()
function from the soil.spec package developed at ICRAF. - Hadley Wickham for his work and concepts on data science within R