Joining Data from Multiple Sources
This article explains how to compile data from different sources – spectrometer, lab analysis spreadsheet, time information – for the subsequent creation of analysis models.
You will end up with a clean dataset that is ready for modeling.
Creating and training a chemometric model requires not only the raw spectra but also associated properties of the measured material:
- Timestamps
- Lab analysis values (reference data)
- Categorical labels
- Metadata
Typically, these data originate from various sources like spectrum files, spreadsheets, lab/process information management systems, etc., and need to be linked to the spectra. Only then can the dataset reasonably be used for calibration, classification, and analysis purposes. This article describes the typical workflow when creating such a dataset.
Loading Spectra
PEAXACT can load spectra from single-spectrum files or from files containing multiple spectra, like time-resolved measurements. Consider using a flat folder structure to organize your files (e.g., one folder per experiment or per date).
To load spectra into PEAXACT, drag individual files or whole folders from Windows Explorer and drop them into the Samples Panel of the main PEAXACT window. The panel is filled with a list of recognized spectra, each named after its file plus an identifier (#1, #2, ...), which is useful for multi-spectrum files.
To inspect and visualize the dataset as a whole, and to add further information to the spectra, use the PEAXACT Data Inspector.
Adding Timestamps
Most spectrum file formats carry additional information about parameters of the measurement. If you have a choice, consider saving your spectra in a format that supports such metadata. The acquisition timestamp is of particular interest because it is the typical key to link sample information from other sources, e.g., results from offline analysis. Use the Load Timestamp button in the Data Inspector to automatically pull these timestamps into PEAXACT.
If you are interested in relative times rather than absolute timestamps, e.g., when analyzing the time evolution of a batch process, use the Timestamp > Time conversion. It lets you select the spectrum that marks time = 0 and the unit (seconds, minutes, hours, ...) of the relative time.
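The conversion itself is simple arithmetic. The following minimal Python sketch illustrates the idea outside of PEAXACT; it is not PEAXACT's implementation. The timestamps, the function name to_relative_time, and the unit table are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical acquisition timestamps, one per spectrum (illustrative values).
timestamps = [
    datetime(2024, 3, 1, 9, 0, 0),
    datetime(2024, 3, 1, 9, 0, 30),
    datetime(2024, 3, 1, 9, 2, 0),
]

# Number of seconds in each supported unit (assumed set of units).
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}


def to_relative_time(stamps, zero_index=0, unit="minutes"):
    """Return the time of each stamp relative to stamps[zero_index]."""
    t0 = stamps[zero_index]
    scale = SECONDS_PER_UNIT[unit]
    return [(t - t0).total_seconds() / scale for t in stamps]


relative_minutes = to_relative_time(timestamps, zero_index=0, unit="minutes")
print(relative_minutes)  # [0.0, 0.5, 2.0]
```

Spectra recorded before the chosen zero point simply get negative relative times in this sketch.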
Adding Labels
Samples are often labeled or grouped, e.g., by a batch ID, material supplier name, sample code, or the like. It is always useful to combine these values with the spectra in order to elucidate the specific differences between groups, or to train a Classification Model at a later stage.
In cases where the folder path or the spectrum filename can already be used for grouping – great, just use a filter expression in the Data Inspector to filter the table! In cases where you want to use a separate text label, you first need to add a new categorical feature to the table and fill in its values. In PEAXACT, a feature is recognized as categorical if its name is enclosed in curly brackets (e.g., {Batch ID}).
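If the label is already encoded in the filename, it can also be extracted ahead of time, e.g., when preparing a label spreadsheet outside of PEAXACT. The sketch below assumes a hypothetical naming scheme (batch_<ID>_run<N>.spc); the paths, the pattern, and the helper batch_id_from_path are illustrative assumptions, not part of PEAXACT.

```python
import re
from pathlib import PurePosixPath

# Hypothetical spectrum files following an assumed naming scheme.
paths = [
    "experiments/2024-03-01/batch_A12_run1.spc",
    "experiments/2024-03-01/batch_A12_run2.spc",
    "experiments/2024-03-02/batch_B07_run1.spc",
]

# Captures the batch ID between "batch_" and the next underscore.
BATCH_PATTERN = re.compile(r"batch_([A-Za-z0-9]+)_")


def batch_id_from_path(path):
    """Return the batch ID found in the filename, or None if there is none."""
    match = BATCH_PATTERN.search(PurePosixPath(path).name)
    return match.group(1) if match else None


# One row per file; the curly brackets mark the column as categorical.
table = [{"File": p, "{Batch ID}": batch_id_from_path(p)} for p in paths]
for row in table:
    print(row["File"], row["{Batch ID}"])
```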
Joining Tables
A crucial step in putting together your dataset is adding reference values, which are often stored in external spreadsheets or can be exported from databases into spreadsheets. Instead of manually copying values one by one to the related spectra, you can let the PEAXACT Data Inspector join the tables automatically. If your current table in PEAXACT and your reference table have at least one feature in common, that feature can be used to match up the rows of both tables.
For example, you could join tables by matching timestamps. This even works if the acquisition times of spectra and measurement times of reference values do not agree perfectly, because PEAXACT allows you to specify a tolerance and joins the closest match.
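To make the idea of a tolerance-based join concrete, here is a minimal Python sketch of a nearest-match join on timestamps. It only illustrates the concept; how PEAXACT resolves ties, or whether one reference row may be assigned to several spectra, is not described here, so the behavior below is an assumption. The tables, the column names, and the function join_nearest are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Hypothetical spectra table (left) and reference table (right).
spectra = [
    {"Name": "run1 #1", "Timestamp": datetime(2024, 3, 1, 9, 0, 0)},
    {"Name": "run1 #2", "Timestamp": datetime(2024, 3, 1, 9, 10, 0)},
    {"Name": "run1 #3", "Timestamp": datetime(2024, 3, 1, 9, 20, 0)},
]
reference = [
    {"Timestamp": datetime(2024, 3, 1, 9, 0, 40), "Concentration": 1.25},
    {"Timestamp": datetime(2024, 3, 1, 9, 19, 30), "Concentration": 2.10},
]


def join_nearest(left, right, key, tolerance):
    """Attach to each left row the values of the closest right row.

    A right row only counts as a match if its key differs from the left
    key by at most `tolerance`; otherwise the joined values are None.
    """
    right_sorted = sorted(right, key=lambda row: row[key])
    right_keys = [row[key] for row in right_sorted]
    value_columns = [c for c in right_sorted[0] if c != key] if right_sorted else []

    joined = []
    for row in left:
        # The closest right row is one of the two neighbors of the insert position.
        pos = bisect_left(right_keys, row[key])
        candidates = right_sorted[max(pos - 1, 0):pos + 1]
        best = min(candidates, key=lambda r: abs(r[key] - row[key]), default=None)
        matched = best is not None and abs(best[key] - row[key]) <= tolerance

        merged = dict(row)
        for column in value_columns:
            merged[column] = best[column] if matched else None
        joined.append(merged)
    return joined


joined = join_nearest(spectra, reference, "Timestamp", timedelta(minutes=1))
for row in joined:
    print(row["Name"], row["Concentration"])
```

With a tolerance of one minute, the first and third spectra receive a reference value (40 s and 30 s away, respectively), while the second spectrum stays empty because its closest reference value is more than nine minutes away.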
Marking Bad Samples
To spot problematic samples in your dataset – e.g., measurement errors or deficient reference values – a visual inspection is often sufficient, possibly supported by data pretreatments that highlight systematic patterns in the data. You can use the Selection Tool in any plot to select spectra and set their Quality to Bad. Bad samples will be ignored in future modeling and analysis steps.
Preserving the Dataset
After you have put so much work into gathering all the pieces of information in one place, you certainly wouldn't want to lose it. With the PEAXACT Data Inspector, you can export the new table to a spreadsheet file and preserve it for upcoming modeling challenges. Next time, just load the table file, and PEAXACT will automatically reload all the spectra along with the associated features – without the need to repeat any of the steps above. Nice, isn't it?