Identify start and end of measurements

After import, the user can choose from two methods to define the start and end points of each measurement and assign a UniqueID:

Automatic identification of measurements

If the imported file already contains the column chamID, the file is ready for flux calculation.

# Retrieve file path from example file in the goFlux package
# using the function system.file
file.path <- system.file("extdata", "LI8200/LI8200.json", package = "goFlux")

# Import in the environment
imp.LI8200 <- import.LI8200(inputfile = file.path)

# Is there a column called chamID in this file?
any(grepl("chamID", names(imp.LI8200)))
[1] TRUE

Alternatively, one may want to modify the start and end time even if there are automatic recordings of chamber closure and opening. For example, there might be a delay after chamber closure that needs to be removed (e.g. LI-6400).

In such a case, you can choose from two options, and proceed to step 2 of the manual identification of measurements:

  • Use the file generated by the instrument as inputfile (e.g. LI-6400) and do not provide an auxfile in the function obs.win.
  • Use the file generated by the greenhouse gas analyzer as inputfile (e.g. LI-7820) and the file generated by the chamber as auxfile (e.g. Smart Chamber) in the function obs.win.

From that point forward, follow the same procedure as for manual identification of measurements.

Manual identification of measurements

The manual identification of measurements is done in three steps:

  1. Create an auxiliary file
  2. Define a window of observation for each measurement
  3. Click on a scatter plot to identify start and end times

1. Create an auxiliary file

The auxiliary file (auxfile) requires two elements: a UniqueID and a start.time for each measurement. The UniqueID must be unique. For example, one could combine the name of a site (733a), a plot number (C) and a subplot (C), which would give the UniqueID “733a_C_C”. If repeated measurements are done on the same experimental unit, then a date (e.g. 2022-09-28) could be added to the UniqueID (e.g. “733a_C_C_220928”) to make it truly unique and easy to understand. The start.time must be in the format “%Y-%m-%d %H:%M:%S” (e.g. 2022-09-28 12:17:00) to be converted to POSIXct.
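
As a minimal sketch of these two requirements, the following base R code builds a small auxfile by hand. The object names (site, plot.id, subplot, auxfile.sketch) and the values are illustrative assumptions; only the column names UniqueID and start.time come from the description above.

```r
# Illustrative values only: site, plot, subplot and measurement date
site    <- "733a"
plot.id <- c("C", "C", "B")
subplot <- c("C", "E", "W")
date    <- as.Date("2022-09-28")

auxfile.sketch <- data.frame(
  # Combine the elements into one identifier per measurement
  UniqueID = paste(site, plot.id, subplot, format(date, "%y%m%d"), sep = "_"),
  # start.time written as "%Y-%m-%d %H:%M:%S", then converted to POSIXct
  start.time = as.POSIXct(c("2022-09-28 12:17:00",
                            "2022-09-28 12:21:00",
                            "2022-09-28 12:26:00"),
                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# Each UniqueID must appear only once
anyDuplicated(auxfile.sketch$UniqueID) == 0
```

The check on the last line returns TRUE when no UniqueID is repeated.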

An example Excel spreadsheet is available that you can use to create your auxiliary file.

Load the auxfile

Creating an auxiliary file with date and time can be a pain, especially while using Excel. It is recommended to save your file as .txt before import into R, to make sure that the date and time formats are appropriate (“%Y-%m-%d %H:%M:%S”, e.g. 2022-09-28 12:17:00).
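
One possible way to verify the format before going further is to parse the start times with an explicit format string: entries that do not match come back as NA. This is only a sketch with made-up values; start.chr and parsed are hypothetical names.

```r
# Hypothetical start times as they might be typed in a spreadsheet
start.chr <- c("2022-09-28 12:17:00", "28/09/2022 12:17")

# Strings that do not match the expected format are returned as NA
parsed <- as.POSIXct(start.chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Positions of the entries that need to be corrected before import
which(is.na(parsed))
```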

As a text file (.txt)

In this example, the start.time of each measurement (UniqueID) was noted manually in the field and is provided in an auxiliary file constructed in Excel and saved as a text file (aux_UGGA.txt).

# The pipe operator %>% and the function mutate come from the package dplyr
library(dplyr)

aux.path <- system.file("extdata", "aux_UGGA/aux_UGGA.txt", package = "goFlux")
auxfile <- read.delim(aux.path) %>% 
  # Use the function as.POSIXct to convert start.time to a POSIXct format
  mutate(start.time = as.POSIXct(start.time, tz = "UTC"))
  UniqueID          start.time Area Vtot Tcham Pcham
1 733a_B_E 2022-09-28 12:36:00  324 6.17  11.0  99.4
2 733a_B_S 2022-09-28 12:31:00  324 5.84  11.0  99.4
3 733a_B_W 2022-09-28 12:26:00  324 6.40  10.9  99.4
4 733a_C_C 2022-09-28 12:17:00  324 5.61  11.0  99.4
5 733a_C_E 2022-09-28 12:21:00  324 6.00  11.0  99.4
6 733a_C_S 2022-09-28 12:11:00  324 6.36  11.1  99.4

As an Excel spreadsheet (.xlsx)

It is also possible to load an Excel sheet into R using the function read_excel from the package readxl.

# The function read_excel comes from the package readxl
library(readxl)

aux.path <- system.file("extdata", "aux_UGGA/aux_UGGA.xlsx", package = "goFlux")
auxfile.xlsx <- read_excel(aux.path)

You’ll note that this function detects Date format and converts it to POSIXct automatically.

class(auxfile.xlsx$start.time)
[1] "POSIXct" "POSIXt" 

Make sure that the time zone in your auxfile matches the time zone in your imported gas measurement files. By default, all import functions, as well as the function read_excel, use the time zone UTC.

auxfile.xlsx$start.time
[1] "2022-09-28 12:36:00 UTC" "2022-09-28 12:31:00 UTC"
[3] "2022-09-28 12:26:00 UTC" "2022-09-28 12:17:00 UTC"
[5] "2022-09-28 12:21:00 UTC" "2022-09-28 12:11:00 UTC"
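
A simple way to compare time zones is to look at the tzone attribute of both time columns. In the sketch below, aux.time and imp.time are made-up stand-ins for auxfile$start.time and the POSIX.time column of an imported file, and the time zone "Europe/Copenhagen" is only an example. Note that changing the tzone attribute changes how an instant is displayed, not the instant itself; times that were recorded with the wrong time zone need to be corrected at import instead.

```r
# Stand-ins for auxfile$start.time and the POSIX.time column of imported data
aux.time <- as.POSIXct("2022-09-28 12:17:00", tz = "UTC")
imp.time <- as.POSIXct("2022-09-28 14:17:00", tz = "Europe/Copenhagen")

# Compare the time zone attributes
same.tz <- attr(aux.time, "tzone") == attr(imp.time, "tzone")
same.tz

# If they differ, display the same instant in the other time zone
attr(imp.time, "tzone") <- "UTC"
format(imp.time, "%Y-%m-%d %H:%M:%S")
```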

In this example, the auxfile also contains additional auxiliary data that will be required for the flux calculation: the surface area inside the chamber (Area; cm2), the total volume inside the chamber, the tubing, and the instrument (Vtot; L), the atmospheric temperature inside the chamber (Tcham; Celsius) and the atmospheric pressure inside the chamber (Pcham; kPa).

If start time is unknown

If field notes did not include the start.time of each measurement, it is possible to quickly construct an auxfile by visually inspecting each gas measurement file.

# Retrieve file path from example file in the goFlux package
# using the function system.file
file.path <- system.file("extdata", "UGGA/UGGA.txt", package = "goFlux")

# Import in the environment
imp.UGGA <- import.UGGA(inputfile = file.path)

# Visualise data
plot(x = imp.UGGA$POSIX.time, y = imp.UGGA$CO2dry_ppm, # Data
     xlab = "Time", ylab = "CO2dry_ppm", # Labels
     ylim = c(400, 550)) # Plot limits

Tip to improve x axis time display

To help visualize the time on the x axis, remove the x axis in the function plot and use the function axis.POSIXct to display the time in the desired format:

# Visualise data
plot(x = imp.UGGA$POSIX.time, y = imp.UGGA$CO2dry_ppm, # Data
     xlab = "Time", ylab = "CO2dry_ppm", # Labels
     ylim = c(400, 550), # Plot limits
     xaxt = 'n') # remove x axis tick marks

# get the right time zone from your data
time.zone <- attr(imp.UGGA$POSIX.time, "tzone")
# force axis.POSIXct to use that time zone by changing the system timezone
Sys.setenv(TZ = time.zone)
# add the new x axis to the plot
axis.POSIXct(1, at = seq(min(imp.UGGA$POSIX.time), max(imp.UGGA$POSIX.time), 
                         by = "3 mins"), format = "%H:%M")
# change the system timezone back to default
Sys.unsetenv("TZ")

In this example there are six measurements of approximately three minutes each. Looking at the graph, you can roughly estimate the start time of each measurement:

Peak_#  UniqueID  start.time (estimated)  start.time (from field notes)
Peak_1  733a_C_S  12:12:00                12:11:00
Peak_2  733a_C_C  12:18:00                12:17:00
Peak_3  733a_C_E  12:23:00                12:21:00
Peak_4  733a_B_W  12:27:00                12:26:00
Peak_5  733a_B_S  12:33:00                12:31:00
Peak_6  733a_B_E  12:37:00                12:36:00

In the next step (2), you will define a window of observation for each measurement based on the start.time, the observation length (obs.length) and some buffer time (shoulder) before and after the measurement. Knowing this, it does not matter how exact your estimation is, because you can allow for more buffer before and after the measurement.
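
The estimated start times can then be turned into an auxfile directly in R. The following is a minimal sketch: the object names est.time and auxfile.est are hypothetical, and the date 2022-09-28 is assumed from the example data.

```r
# Start times estimated from the plot (date assumed from the example data)
est.time <- c("12:12:00", "12:18:00", "12:23:00",
              "12:27:00", "12:33:00", "12:37:00")

auxfile.est <- data.frame(
  UniqueID = c("733a_C_S", "733a_C_C", "733a_C_E",
               "733a_B_W", "733a_B_S", "733a_B_E"),
  # Paste date and time together, then convert to POSIXct in UTC
  start.time = as.POSIXct(paste("2022-09-28", est.time), tz = "UTC")
)

auxfile.est
```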

2. Define a window of observation for each measurement

In the next step (3), you must click on the start point and the end point of each measurement in a scatter plot to identify the start.time and the end.time, using the function click.peak2. Before that, you must create a list of data frames containing one data frame per UniqueID using the function obs.win. In this example, this step will separate the file imp.UGGA into a list of 6 data frames.

The purpose of this step is to zoom in on each measurement in the file to help you identify the start and end time more easily. In order to correctly zoom in on each measurement, you must know the start.time (defined in step 1) and the observation length (obs.length) of each measurement. In addition, you can define some buffer time around each measurement in case the defined start.time is not exact. If the observation length is different for each measurement, use the longest one.

Usage

Note

Code chunks under Usage sections are not part of the demonstration. They are meant to show you how to use the arguments in the function.

obs.win(
  inputfile,
  auxfile = NULL,
  gastype = "CO2dry_ppm",
  obs.length = NULL,
  shoulder = 120
)

Arguments

inputfile data.frame; output from import or align functions.
auxfile data.frame; auxiliary data frame containing the columns start.time and UniqueID. start.time must contain a date and be in POSIXct format. The time zone must be the same as the POSIX.time in inputfile. The default time zone for the import functions is “UTC”. A data frame from the Smart Chamber (LI-8200) can be used as an auxiliary file. In that case, chamID will be used instead of UniqueID, if UniqueID cannot be found.
gastype character string; specifies which gas should be displayed on the plot to manually select start time and end time of measurements. Must be one of the following: “CO2dry_ppm”, “COdry_ppb”, “CH4dry_ppb”, “N2Odry_ppb”, “NH3dry_ppb” or “H2O_ppm”. Default is “CO2dry_ppm”.
obs.length numerical; chamber closure time (seconds). Default is NULL. If obs.length is not provided, a column obs.length should be contained in auxfile or inputfile. Alternatively, obs.length will be calculated from start.time and cham.open or end.time if found in auxfile or inputfile.
shoulder numerical; time before and after measurement in observation window (seconds). Default is 120 seconds.

Details

In gastype, the gas species listed are the ones for which this package has been adapted. Please write to the maintainer of this package for adaptation of additional gases.

Example

In this example, the observation length was approximately three minutes (180 seconds) for each measurement. For a shoulder of 60 seconds, the observation window of each measurement will show 60 seconds before the start.time and 240 seconds after.

ow.UGGA <- obs.win(inputfile = imp.UGGA, auxfile = auxfile,
                   obs.length = 180, shoulder = 60)

Peak_#  UniqueID  start.time  obs.win min  obs.win max
Peak_1  733a_C_S  12:11:00    12:10:00     12:15:00
Peak_2  733a_C_C  12:17:00    12:16:00     12:21:00
Peak_3  733a_C_E  12:21:00    12:20:00     12:25:00
Peak_4  733a_B_W  12:26:00    12:25:00     12:30:00
Peak_5  733a_B_S  12:31:00    12:30:00     12:35:00
Peak_6  733a_B_E  12:36:00    12:35:00     12:40:00

Note

In this example, the observation windows of peaks 2 and 3 overlap. Therefore, 60 seconds of data will be duplicated in the data frames of these two UniqueIDs.
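
The window limits and the overlap can be reproduced with a few lines of base R. This is only an arithmetic sketch of the description above, not the internal code of obs.win; the names win.min, win.max and overlap are hypothetical.

```r
# Start times from the field notes
start.time <- as.POSIXct(paste("2022-09-28",
                               c("12:11:00", "12:17:00", "12:21:00",
                                 "12:26:00", "12:31:00", "12:36:00")),
                         tz = "UTC")
obs.length <- 180 # seconds
shoulder <- 60    # seconds

# Window limits: shoulder before start.time and after start.time + obs.length
win.min <- start.time - shoulder
win.max <- start.time + obs.length + shoulder

# Overlap (seconds) between each window and the next one;
# positive values indicate duplicated data
overlap <- as.numeric(difftime(head(win.max, -1), tail(win.min, -1),
                               units = "secs"))
overlap
```

Only the second value is positive (60 seconds), which corresponds to the overlap between peaks 2 and 3.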

Tip: Use the function on multiple files at a time

To load multiple RData files at once in your environment and store them all in one object, use the function map_df from the package purrr.

Use the argument pattern to load only the files that match a pattern.

# The pipe operator %>% comes from dplyr and map_df from purrr
library(dplyr)
library(purrr)

my.files <- list.files(path = "RData", pattern = "imp.RData", full.names = TRUE) %>%
  map_df(~ get(load(.x)))

ow.UGGA <- obs.win(inputfile = my.files, auxfile = auxfile,
                   obs.length = 180, shoulder = 60)

Warning

Pay attention to the warning message given by obs.win when there are more than 20 measurements (which is not the case in this example). If there had been more than 20 measurements, a warning like this would appear:

WARNING! Do not loop through more than 20 measurements at a time to avoid mistakes.
You have 21 measurements in your dataset.
You should split the next step into at least 2 loops.

In such a case, follow the tip "Multiple loops with large data sets" given below for the function click.peak2.

3. Click on a scatter plot to identify start and end times

When running the function click.peak2, for each measurement, a window will open, in which you must click on the start point and the end point. The observation window is based on the start.time given in the auxfile, the length of the measurement (obs.length), and a shoulder before start.time and after start.time + obs.length.

Usage

click.peak2(
  ow.list,
  gastype = "CO2dry_ppm",
  sleep = 3,
  plot.lim = c(380, 1000),
  seq = NULL,
  warn.length = 60,
  save.plots = NULL
)

Arguments

ow.list list of data.frame; output from the function obs.win. Must contain the columns gastype (see below), POSIX.time and UniqueID.
gastype character string; specifies which gas should be displayed on the plot to manually select start time and end time of measurements. Must be one of the following: “CO2dry_ppm”, “COdry_ppb”, “CH4dry_ppb”, “N2Odry_ppb”, “NH3dry_ppb” or “H2O_ppm”. Default is “CO2dry_ppm”.
sleep numerical value; delay before closing the resulting plot. Grants a delay between measurements to visually inspect the output before processing the next measurement. Sleep must be shorter than 10 seconds. If sleep = NULL, the plots will not close.
plot.lim numerical vector of length 2; sets the Y axis limits in the plots. Default values are set for a typical gas measurement of “CO2dry_ppm” from soils: plot.lim = c(380,1000).
seq a numerical sequence that indicates objects in a list. By default, seq = NULL and the function loops through all data frames in ow.list.
warn.length numerical value; limit under which a measurement is flagged for being too short (nb.obs < warn.length). Default value is warn.length = 60.
save.plots character string; a file name without the extension .pdf to save the plots produced with click.peak2. By default, save.plots = NULL and plots are not saved.

Details

The argument plot.lim is used to remove any data points below and above the plot limits for a better view of the scatter plot. If the measured concentrations stay above the minimum or below the maximum plot limit, the plot automatically zooms in and adjusts to the measured values. The default plot limits are set for a typical gas measurement of “CO2dry_ppm” from a soil respiration measurement: plot.lim = c(380,1000), where 380 ppm is the minimum plotted concentration, which should be close to atmospheric concentration, and 1000 ppm is the maximum plotted concentration, which corresponds to the maximal accumulated concentration in the chamber before considering it an outlier (e.g. caused by breath or a gas bubble). For other gases, you must specify the plot limits yourself. Here are some suggestions of plot limits for the other gases:

  • “CH4dry_ppb”: plot.lim = c(1800, 2200)
  • “N2Odry_ppb”: plot.lim = c(250, 500)
  • “NH3dry_ppb”: plot.lim = c(0, 200)
  • “COdry_ppb”: plot.lim = c(0, 200)
  • “H2O_ppm”: plot.lim = c(10000, 20000)

These values will vary depending on ecosystem type and chamber application scheme.
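
To illustrate the effect of plot.lim, the sketch below applies the default CO2 limits to a few made-up concentrations. It only mimics the behaviour described above and is not taken from the code of click.peak2; conc and in.range are hypothetical names.

```r
# Made-up CO2 concentrations (ppm), including one suspicious spike
conc <- c(410, 425, 440, 2500, 470, 485)
plot.lim <- c(380, 1000)

# Keep only the points that fall within the plot limits
in.range <- conc >= plot.lim[1] & conc <= plot.lim[2]
conc[in.range]

# Y axis range obtained after zooming in on the remaining points
range(conc[in.range])
```

Here the spike at 2500 ppm is dropped and the remaining points span 410 to 485 ppm.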

The argument seq is used to select a subset of data frame from the list of data frames in ow.list. For example, to apply the function on the first measurement in the list, set seq = 1, or seq = seq(1,10) for the first 10 measurements.

warn.length is the limit below which the chamber closure time is flagged for being too short (nb.obs < warn.length). Portable greenhouse gas analyzers typically measure at a frequency of 1 Hz. Therefore, for the default setting of warn.length = 60, the chamber closure time should be approximately one minute (60 seconds). If the number of observations is smaller than the threshold, a warning is printed: “Number of observations for UniqueID: ‘UniqueID’ is X observations”.
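
The relation between closure time, measurement frequency and warn.length can be sketched as follows. The clicked times are made up, and the calculation assumes one observation per second with both end points included; start.click, end.click and freq.hz are hypothetical names.

```r
# Hypothetical start and end points selected by clicking
start.click <- as.POSIXct("2022-09-28 12:17:05", tz = "UTC")
end.click <- as.POSIXct("2022-09-28 12:18:00", tz = "UTC")

freq.hz <- 1      # measurement frequency (observations per second)
warn.length <- 60 # threshold for the number of observations

# Number of observations between the two points, both included
nb.obs <- as.numeric(difftime(end.click, start.click, units = "secs")) *
  freq.hz + 1
nb.obs

# TRUE means that the measurement would be flagged as too short
nb.obs < warn.length
```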

In gastype, the gas species listed are the ones for which this package has been adapted. Please write to the maintainer of this package for adaptation of additional gases.

Value

The function returns a data.frame, identical to an unlisted version of the input ow.list, with the additional columns flag, Etime, start.time_corr, end.time_corr and obs.length_corr.

Example

In this example, the observation time is 3 minutes (180 seconds) and the shoulder is 60 seconds. Therefore, the observation window shows 60 seconds before the start.time and 240 seconds after.

# Manually identify measurements by clicking on the start and end points
manID.UGGA <- click.peak2(ow.UGGA, seq = 1)

Multiple loops with large data sets

Note that the output from the function obs.win is a list of data frames. The function click.peak2 loops through all data frames in the list.

For more than 20 measurements, it is highly recommended to create multiple loops. This limits the amount of work to repeat if you misclick.

If you need to create multiple loops, use the argument seq in click.peak2:

# Create two loops
manID.UGGA.1 <- click.peak2(ow.UGGA, seq = seq(1,10))
manID.UGGA.2 <- click.peak2(ow.UGGA, seq = seq(11,20))

# Combine the two objects back into one object
manID.UGGA <- rbind(manID.UGGA.1, manID.UGGA.2)
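
For larger data sets, the sequences can be generated instead of typed by hand. The sketch below uses base R only; n.meas, max.per.loop and loops are hypothetical names, and 45 is a made-up number of measurements. The call to click.peak2 is left as a comment because it requires an interactive session.

```r
# Hypothetical number of measurements in the list returned by obs.win
n.meas <- 45
max.per.loop <- 20

# Split the indices 1 to n.meas into consecutive groups of at most 20
loops <- split(seq_len(n.meas), ceiling(seq_len(n.meas) / max.per.loop))

# Number of measurements in each loop
lengths(loops)

# Each element can then be passed to the argument seq, for example:
# manID.1 <- click.peak2(ow.list, seq = loops[[1]])
```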

Sleep between loops

Between each measurement, the result of the function click.peak2 is displayed for 3 seconds. To increase this delay, change the parameter sleep in the function click.peak2. To prevent the quality check plots from closing, use sleep = NULL.

Save plots

Both plots generated in the function click.peak2 can be saved as a pdf using the argument save.plots with a character string indicating the name of the pdf file (without the extension .pdf):

manID.UGGA <- click.peak2(ow.UGGA, save.plots = "click.peak.plots")

Warning

If the number of observations is under a certain threshold (warn.length = 60), a warning will be given after clicking on the start and end points, as such:

Warning message: Number of observations for UniqueID: 733a_C_C is 59 observations

Otherwise, if the number of observations satisfies this threshold, then the following message is given instead:

Good window of observation for UniqueID: 733a_C_C

This argument can be adjusted depending on the expected number of observations in your experiment.

Tip

To re-click on a single measurement based on UniqueID, see this example shown on the Troubleshoot page.

Additional auxiliary parameters

To convert the flux estimates into µmol CO2/H2O m-2 s-1 or nmol CH4/N2O m-2 s-1, the temperature inside the chamber (Tcham; °C) and the atmospheric pressure inside the chamber (Pcham; kPa) are also required. If Pcham and Tcham are missing, normal atmospheric pressure (101.325 kPa) and an air temperature of 15 °C are used as default.

Additionally, one must provide the surface area inside the chamber (Area; cm2) and the total volume in the system, including tubing, instruments and chamber (Vtot; L). If Vtot is missing, one must provide an offset (distance between the chamber and the soil surface; cm) and the volume of the chamber (Vcham; L). In that case, the volume inside the tubing and the instruments is considered negligible, or it should be added to Vcham.
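
As a worked example of the units involved, the sketch below estimates Vtot from Vcham, Area and offset. It assumes that the extra volume is simply Area multiplied by offset (cm2 × cm = cm3, and 1000 cm3 = 1 L); this is a geometric assumption made for illustration, not necessarily the exact calculation used by the package. The values of offset and Vcham are made up.

```r
Area <- 324 # surface area inside the chamber (cm2)
offset <- 2 # distance between the chamber and the soil surface (cm), made up
Vcham <- 5  # volume of the chamber (L), made up

# Volume between the chamber and the soil surface, converted from cm3 to L
Voffset <- (Area * offset) / 1000

# Total volume (L), tubing and instruments considered negligible
Vtot <- Vcham + Voffset
Vtot
```

With these values, the offset adds 0.648 L to the chamber volume.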

The final output, before flux calculation with goFlux, requires the columns: UniqueID, Etime, flag, Vtot (or Vcham and offset), Area, Pcham, Tcham, H2O_ppm and other gases.
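
Before moving on to the flux calculation, it can be useful to check which of these columns are still missing. The sketch below uses a made-up one-row data frame (dat) in place of real data, and the gas columns other than H2O_ppm are left out of the check.

```r
# Columns listed above (Vtot could be replaced by Vcham and offset)
required <- c("UniqueID", "Etime", "flag", "Vtot", "Area",
              "Pcham", "Tcham", "H2O_ppm")

# Made-up data frame standing in for the data prepared so far
dat <- data.frame(UniqueID = "733a_C_C", Etime = 0, flag = 1,
                  Vtot = 5.61, Area = 324, Pcham = 99.4,
                  CO2dry_ppm = 420)

# Columns that still need to be added
missing.cols <- setdiff(required, names(dat))
missing.cols
```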