Blog, Tutorials & Release Notes

Welcome to the PEAXACT blog! Here, you'll find in-depth insights into the software, along with expert tips and tricks to enhance your experience.

Getting Started with PLS

The main objective of this tutorial is to get you familiar with multivariate calibration using Projection to Latent Structures, also known as Partial Least Squares (PLS). It is intended for PEAXACT users and anyone interested in PEAXACT.

In this tutorial, you will learn how to:

  1. Modify the Pretreatment Model and create Data Filters
  2. Perform PLS calibration
  3. Improve the Calibration Model
  4. Evaluate calibration alternatives

If you have PEAXACT installed on your computer, you may try this tutorial right away. If you don't have PEAXACT yet, get a free trial now.

Preparations

You can find data for this tutorial in %ProgramFiles%\S-PACT\PEAXACT 5\Data\NIR - Gasoline. From now on, this directory will be referred to as DATA.

  • Start PEAXACT.
  • Choose File > New Session > NIR from the menu, which opens a new modeling session with default settings for near-infrared data.

Pretreatment Model and Data Filters

If you have never worked with PLS before, this tutorial is probably not the best starting point, unless you are willing to accept that PLS is harder to interpret than... well, almost every other method PEAXACT has to offer. So we won't even try to explain it here. On the other hand, PLS is very easy to use and produces good results, provided (and that's a big if) you supply good training samples, and lots of them.

  • Choose Data > Load Table... from the menu, browse to DATA\References and select DataTableCalibration.xls to load 60 near-infrared spectra of gasoline with associated octane numbers.
  • Select the first sample in the Samples Panel.
  • Choose File > New to create a new model. Expand the model tree and select the Pretreatment item. The Pretreatment Model is displayed in the Model Properties Panel.
  • For PLS, we highly recommend enabling some form of Resampling and narrowing the Global Range slightly, so that all spectra share an identical x-axis (PLS requires this). Change the Pretreatment Model as follows:
    • Resampling: Equidistant Points
    • Number of Points: 400
    • Global Range: 5880 – 11110 cm⁻¹
  • PLS correlates the variance in the spectral signal with the variance in the feature values. So let's also increase the spectral variance. Make the following modifications to the Pretreatment Model:
    • Derivative/Smoothing: 1st order derivatives
    • Filter Length: 5
    • Standardization: SNV normalization
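If you want to reproduce this pretreatment outside PEAXACT, for instance to inspect your own data, the following NumPy sketch chains the three steps: resampling onto 400 equidistant points between 5880 and 11110 cm⁻¹, a first-derivative filter over 5 points, and SNV. PEAXACT's internal implementation is not documented here, so several details are assumptions: the derivative is computed with a Savitzky-Golay filter of polynomial order 2, "Filter Length" is read as the window size in points, and the steps run in the order listed above. The functions `resample`, `savgol_first_derivative` and `snv` are hypothetical helpers, not a PEAXACT API.

```python
import numpy as np


def resample(wavenumbers, spectra, lo, hi, n_points):
    """Interpolate every spectrum onto one shared, equidistant axis in [lo, hi]."""
    # np.interp needs an increasing x-axis, so sort once if the data is descending.
    order = np.argsort(wavenumbers)
    wn, sp = wavenumbers[order], spectra[:, order]
    grid = np.linspace(lo, hi, n_points)
    return grid, np.array([np.interp(grid, wn, row) for row in sp])


def savgol_first_derivative(spectra, step, window=5, polyorder=2):
    """First derivative via a Savitzky-Golay filter (local least-squares polynomial fit).

    Assumption: PEAXACT's "Filter Length" corresponds to `window`.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Row 1 of the pseudo-inverse of the local Vandermonde matrix gives the
    # least-squares slope of the fitted polynomial at the window center.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    coeffs = np.linalg.pinv(vander)[1] / step
    # Reversing the kernel makes np.convolve apply coeffs[i] to y[j + offsets[i]].
    # Zero padding makes the first and last `half` points unreliable.
    return np.array([np.convolve(row, coeffs[::-1], mode="same") for row in spectra])


def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Synthetic stand-in for 60 raw NIR spectra on a descending wavenumber axis.
rng = np.random.default_rng(0)
wn = np.linspace(12000, 4000, 1000)
raw = rng.normal(size=(60, 1000)).cumsum(axis=1)

grid, x = resample(wn, raw, 5880, 11110, 400)
x = snv(savgol_first_derivative(x, step=grid[1] - grid[0], window=5))
print(x.shape)  # (60, 400)
```

The random-walk spectra only stand in for real NIR data. A production filter would also treat the spectrum edges more carefully than the zero padding used here.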

You will probably agree that some spectral regions show less variance than others, so we should exclude these irrelevant regions. The problem is that we cannot know which regions PLS considers irrelevant until after the calibration. Data Filters solve this problem: a Data Filter defines a spectral region that can be modified during calibration.

  • Enable the Data Filter Tool in the toolbar. Click in the Plot Panel and use the mouse to draw a filtered region.
  • Select the whole spectral region for now. We are going to adjust the Data Filter later.

Calibration

  • Select all samples in the Samples Panel and choose Edit Model > Calibration Model > New.... This displays the Calibration Setup Dialog.
  • Tick the feature Octane Number and assign the Data Filter.
  • Choose a Maximum rank of 10. PEAXACT performs separate regressions for each rank from 1 up to the maximum and calculates figures of merit, which you can subsequently use to decide on a specific rank. Speaking of figures of merit: also enable cross-validation by setting Partitioning to K-fold with k=10.
  • Click OK to run the calibration and cross-validation.
  • Calibration results are displayed in a Report Window. The RMSE vs. Rank plot displays calibration and validation errors over PLS rank. The best rank is as small as possible while still yielding a small RMSE. For example, a rank of 3 with RMSECV = 0.32 would be a reasonable choice here.
  • Mark your choice in the bottom-right table. Verify your choice by looking at other reports, e.g., the Predicted vs. True plot, which displays the deviation from an ideal calibration, or the Predicted vs. ... plot, which shows error bars representing the prediction interval for a 95% level of confidence.
  • You could finish the calibration now, and nobody would blame you, but you can do better.
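To make the RMSE vs. Rank plot less of a black box, here is a minimal NumPy sketch of what such a calibration run computes: a PLS1 regression (NIPALS algorithm) for every rank from 1 to a maximum, the calibration error RMSEC on the full training set, and RMSECV from K-fold cross-validation, i.e. the square root of the mean squared out-of-fold prediction error. This is not PEAXACT's code. The random fold assignment, the helper names, and the rule in `pick_rank` (smallest rank whose RMSECV is within 5% of the lowest one) are assumptions chosen for illustration.

```python
import numpy as np


def pls1_fit(X, y, max_rank):
    """NIPALS PLS1: return centering constants and one regression vector per rank."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(max_rank):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector (unit length)
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = yr @ t / tt                 # y loading
        Xr = Xr - np.outer(t, p)        # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression vector for rank a: b_a = W_a (P_a^T W_a)^-1 q_a
    coefs = [W[:, :a] @ np.linalg.solve(P[:, :a].T @ W[:, :a], q[:a])
             for a in range(1, max_rank + 1)]
    return x_mean, y_mean, coefs


def predict(X, x_mean, y_mean, b):
    return y_mean + (X - x_mean) @ b


def rmsec(X, y, max_rank):
    """Calibration error per rank: fit on all samples, evaluate on the same samples."""
    x_mean, y_mean, coefs = pls1_fit(X, y, max_rank)
    return np.array([np.sqrt(np.mean((y - predict(X, x_mean, y_mean, b)) ** 2))
                     for b in coefs])


def rmsecv(X, y, max_rank, k, seed=0):
    """K-fold cross-validation error per rank (random fold assignment is an assumption)."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    errors = np.zeros((max_rank, n))
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        x_mean, y_mean, coefs = pls1_fit(X[train], y[train], max_rank)
        for a, b in enumerate(coefs):
            errors[a, test] = y[test] - predict(X[test], x_mean, y_mean, b)
    return np.sqrt(np.mean(errors ** 2, axis=1))


def pick_rank(rmse_cv, tolerance=0.05):
    """Hypothetical rule: smallest rank whose RMSECV is within `tolerance` of the minimum."""
    return int(np.argmax(rmse_cv <= rmse_cv.min() * (1 + tolerance))) + 1


# Synthetic stand-in for 60 pretreated spectra driven by 3 latent factors.
rng = np.random.default_rng(1)
scores = rng.normal(size=(60, 3))
X = scores @ rng.normal(size=(3, 400)) + 0.01 * rng.normal(size=(60, 400))
y = scores @ np.array([1.0, -0.5, 0.3]) + 0.01 * rng.normal(size=60)

rmse_c = rmsec(X, y, max_rank=10)
rmse_cv = rmsecv(X, y, max_rank=10, k=10)
print(np.round(rmse_cv, 3), pick_rank(rmse_cv))
```

Because the synthetic data has three latent factors, RMSECV should drop sharply up to rank 3 and improve little, if at all, beyond it. Real spectra are rarely that clear-cut, which is why the other reports are worth a look before you settle on a rank.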

Improving the Calibration Model

  • First, let's modify the Data Filter. Select the Variable Importance in Projection (VIP) report from the top-right drop-down list and enable the Data Filter Tool in the toolbar.
  • Use the mouse to adjust the filter to include spectral regions of high importance. Ignore the region at the very right edge of the spectrum, though (trust me on this).
  • Right-click and select Apply Changes from the context menu. This triggers a recalculation of the Calibration Model using the changed Data Filter.
  • Next, we want to check for and remove outliers. Select the Mahalanobis Distance vs. RMS Spectral Residuals plot from the top-right drop-down list and enable the Selection Tool in the toolbar.
  • Step the selected Rank through 1, 2 and 3. You will notice that one sample has unusually large values for the Mahalanobis Distance and RMS Residuals. Select the outlier sample (use the mouse to draw a rectangle around it), then right-click and choose Usage > Ignore from the context menu. Again, this triggers a recalculation of the Calibration Model.
  • Select the RMSE vs. Rank plot again. A rank of 3 still looks like a good choice and now has a much lower cross-validation error, close to 0.2.
  • You could click OK to accept the calibration, and all would be well, but you can do even better.
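The two diagnostics used in this section can be sketched in NumPy as follows. VIP summarizes, for each spectral variable, how strongly its PLS weights contribute to the explained feature variance. The Mahalanobis distance measures how far a sample lies from the others in score space, and the RMS spectral residual measures how much of its spectrum the model cannot reproduce. These are the common textbook definitions, not necessarily PEAXACT's exact formulas, and the `vip > 1` rule for choosing a Data Filter region is a popular rule of thumb, not something this tutorial prescribes. All function names are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, rank):
    """Fit a PLS1 model; return scores T, unit weights W, X loadings P, y loadings q."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    T, W, P, q = [], [], [], []
    for _ in range(rank):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = yr @ t / tt
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        T.append(t)
        W.append(w)
        P.append(p)
        q.append(c)
    return np.array(T).T, np.array(W).T, np.array(P).T, np.array(q)


def vip_scores(T, W, q):
    """Variable Importance in Projection; assumes the columns of W have unit norm."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)   # y variance explained per component
    n_vars = W.shape[0]
    return np.sqrt(n_vars * (W ** 2) @ ss / ss.sum())


def outlier_diagnostics(X, T, P):
    """Mahalanobis distance in score space and RMS spectral residual per sample."""
    cov = np.atleast_2d(np.cov(T, rowvar=False))
    md = np.sqrt(np.einsum("ij,jk,ik->i", T, np.linalg.inv(cov), T))
    residuals = (X - X.mean(axis=0)) - T @ P.T   # part of each spectrum the model misses
    rms = np.sqrt(np.mean(residuals ** 2, axis=1))
    return md, rms


# Synthetic data: only variables 100..199 carry signal; sample 0 gets a distorted spectrum.
rng = np.random.default_rng(2)
scores = rng.normal(size=(60, 3))
loadings = np.zeros((3, 400))
loadings[:, 100:200] = rng.normal(size=(3, 100))
X = scores @ loadings + 0.01 * rng.normal(size=(60, 400))
y = scores @ np.array([1.0, -0.5, 0.3])
X[0] += rng.normal(size=400)

T, W, P, q = pls1_nipals(X, y, rank=3)
vip = vip_scores(T, W, q)
keep = vip > 1.0                     # rule-of-thumb Data Filter mask
md, rms = outlier_diagnostics(X, T, P)
print(keep[100:200].sum(), keep.sum(), int(np.argmax(rms)))
```

In this toy setup the VIP-based mask should stay inside variables 100 to 199, and the deliberately distorted sample 0 should show by far the largest RMS residual, much like the outlier flagged in the Mahalanobis Distance vs. RMS Spectral Residuals plot.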

Evaluating Calibration Alternatives

One problem with PLS is that you can easily be tricked into believing that all is well. Granted, the cross-validation error is small, and we didn't even need a high rank to achieve it. In the end, however, you are not interested in how PLS performs on the data it was trained with, but in how robust the Calibration Model is when predicting features of new, unknown samples. Ideally, you would validate the model on an independent set of test samples, but in this case we only have training samples. Under these conditions, the next best option for testing the model's robustness is a 2-fold cross-validation, in which each sub-model is trained on only half of the samples and has to predict the other half.
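The strictness of 2-fold cross-validation comes from the size of the training sets: the smaller k, the fewer samples each sub-model learns from. The plain-Python sketch below counts those sizes for the 59 samples left after ignoring the outlier. It assumes contiguous folds; PEAXACT may assign samples to folds differently, and `kfold_partitions` is a hypothetical helper, not part of PEAXACT.

```python
def kfold_partitions(n_samples, k):
    """Split indices 0..n-1 into k contiguous folds and yield (train, test) index lists."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


for k in (10, 2):
    sizes = [len(train) for train, _ in kfold_partitions(59, k)]
    print(k, min(sizes), max(sizes))
# 10 53 54
# 2 29 30
```

With k=10, every sub-model still sees roughly 90% of the samples; with k=2, only about half.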

  • Click the + button next to the drop-down list saying Calibration #1. This displays the Calibration Setup Dialog again.
  • Change the cross-validation settings to K-fold with k=2 and click OK. A second Calibration Model, Calibration #2, is added to the drop-down list.
  • Inspect the RMSE vs. Rank plot one last time. The RMSECV for rank 3 is still good given the strict cross-validation conditions, which gives us some confidence in the model's robustness.
  • Pick one Calibration Model and don't forget to set the correct rank in the bottom-right table before clicking OK.

This concludes the tutorial on PLS. Congratulations, you made it to the end despite the lack of details here and there. If you want to learn more about PLS, please contact us.

Have you seen our other tutorials yet? Check the overview on PEAXACT Quick Start!
