What is Chemometrics?
Chemometrics is the application of mathematical and statistical methods to process chemical data in an optimal way. Chemometrics comprises mathematical modeling (multivariate data analysis) and experimental designs.
Improving model robustness
The development of models in chemometrics and vibrational spectroscopy is closely linked. Spectrometers provide non-destructive, real-time and cost-effective measurements. However, they are “secondary methods”: they need to be calibrated, i.e. the optical signals they deliver (called spectra) must be correlated with “reference measurements”, which are either chemical measurements or more complex characteristics.
Thanks to cutting-edge multivariate modeling methods, Ondalys is highly skilled in calibrating the numerous and diverse measurements collected from different sensors. One of the main issues for industrial applications is model robustness. Robustness means that the prediction results remain stable regardless of the experimental conditions, e.g., when the measurement is influenced by external perturbations (temperature, humidity, etc.). Ondalys works in cooperation with various research groups to develop innovative chemometric methods able to overcome this crucial hurdle. The following paragraphs describe our approach.
Concept: theoretical principle
Let x be the vector containing the multivariate data to model (spectrum, curve, time series, group of measurement values on a product or a process, etc.) and y the reference value to be predicted (sugar content, degree of polymerization, homogeneity, etc.).
The multilinear prediction model can be written as: $\hat{y} = x^{T} b + b_0$,
where $b$ is the regression vector (linear model), $b_0$ is the bias, $x^{T}$ is the transposed x vector and $\hat{y}$ is the value to be predicted.
Every change in the measurement system triggers interference on the x vector (spectrum). In the industrial environment, these variations are related to:
– the sample (particle size, density, temperature, etc.),
– the sensor (new probe or bulb source, spectrometer transfer, change in the sampling system or sample presentation, etc.),
– the measurement conditions (temperature, pressure, humidity, etc.).
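This interference, $\Delta x$, is added to the measured spectrum x and disturbs the prediction, which thus becomes: $\hat{y} + \Delta\hat{y} = (x + \Delta x)^{T} b + b_0$
Thus, optimizing the model robustness means minimizing the prediction error $\Delta\hat{y}$.
It can be written as: $\Delta\hat{y} = \Delta x^{T} b$, or if developed: $\Delta\hat{y} = \|\Delta x\| \, \|b\| \, \cos(\Delta x, b)$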
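The relation between a perturbation of the spectrum and the resulting shift in the prediction can be checked numerically. The following is a minimal sketch, assuming NumPy and entirely synthetic values for the spectrum, the regression vector, the bias and the perturbation (none of them come from a real calibration). It verifies that the shift equals the dot product of the perturbation with the regression vector, i.e. the product of their norms and the cosine of the angle between them, and that a perturbation orthogonal to the regression vector has no effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths = 50

x = rng.normal(size=n_wavelengths)          # synthetic spectrum
b = rng.normal(size=n_wavelengths)          # synthetic regression vector
b0 = 0.5                                    # synthetic bias
dx = 0.1 * rng.normal(size=n_wavelengths)   # synthetic perturbation of x


def predict(spectrum):
    """Linear model: y_hat = x^T b + b0."""
    return spectrum @ b + b0


# Shift in the prediction caused by the perturbation
dy = predict(x + dx) - predict(x)

# Same quantity written as (norm of dx) * (norm of b) * cosine of their angle
norm_dx, norm_b = np.linalg.norm(dx), np.linalg.norm(b)
cosine = (dx @ b) / (norm_dx * norm_b)
dy_from_norms = norm_dx * norm_b * cosine

# A perturbation made orthogonal to b leaves the prediction unchanged
dx_orth = dx - (dx @ b) / (b @ b) * b
dy_orth = predict(x + dx_orth) - predict(x)

print(f"dy = {dy:.6f}, dx.b = {dx @ b:.6f}, product form = {dy_from_norms:.6f}")
print(f"dy for the orthogonal perturbation = {dy_orth:.2e}")
```

The bias cancels out in the difference, which is why only the perturbation and the regression vector matter for robustness.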
Thus, in order to minimize the error, the following three terms must be reduced:
1. $\|\Delta x\|$: minimize the perturbations on the x-vector:
o By changing the measurement system. For example, the main perturbations often come from variations of the scattering signal (changes in the physical structure of the sample). Research currently underway at Ondalys is aimed at developing innovative sensors capable of separating the “pure chemical” absorption information from the physical scattering signal.
o By selecting wavelengths which are insensitive to the perturbations.
o By preprocessing the x data (for instance with spectroscopic data preprocessing techniques).
2. $\|b\|$: minimize the norm of the regression vector, in accordance with the “parsimony principle”:
o By reducing the number of model variables (wavelength selection methods),
o By reducing the amplitude of the regression coefficients (limit overfitting by using fewer latent variables).
3. $\cos(\Delta x, b)$: minimize the collinearity between $\Delta x$ and b, i.e. between the perturbation and the regression vector:
o By selecting the variables for which the perturbation effect is orthogonal to the regression coefficients;
o By integrating the orthogonalization constraint during the model building process.
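The effect of the number of latent variables on the size of the regression vector can be illustrated with a minimal sketch. It uses principal component regression rather than PLS, purely because it is short to write with NumPy; the data are synthetic and the helper name `pcr_coefficients` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, strongly collinear "spectra": 40 samples x 30 wavelengths
n_samples, n_wavelengths, n_sources = 40, 30, 3
scores = rng.normal(size=(n_samples, n_sources))
profiles = rng.normal(size=(n_sources, n_wavelengths))
X = scores @ profiles + 0.01 * rng.normal(size=(n_samples, n_wavelengths))
y = scores[:, 0] + 0.05 * rng.normal(size=n_samples)

# Centre the data so that the bias is handled separately
Xc = X - X.mean(axis=0)
yc = y - y.mean()

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)


def pcr_coefficients(n_components):
    """Regression vector of a PCR model built on the first n_components."""
    k = n_components
    return Vt[:k].T @ ((U[:, :k].T @ yc) / s[:k])


component_counts = range(1, 21)
norms = [np.linalg.norm(pcr_coefficients(k)) for k in component_counts]
for k, norm in zip(component_counts, norms):
    print(f"{k:2d} components: ||b|| = {norm:.3f}")
```

In this sketch the norm of the regression vector can only grow as components are added, because each component contributes an orthogonal term to b. Keeping only the components that carry useful information therefore limits the amplification of any perturbation.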
A new method for improving robustness is based on this last strategy; it was developed in cooperation with Cemagref/Irstea in Montpellier, a French public research institute. This work produced a family of new methods based on the orthogonal projection of the spectrum onto a perturbation-free space.
In particular, External Parameter Orthogonalisation (EPO) (1-2-3) aims at removing from the spectrum the information resulting from the influence of external parameters. After orthogonalization, $\cos(\Delta x, b)$ is equal to 0, and the impact of the perturbation on the prediction results is removed.
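The projection idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectra are synthetic, the external parameter acts through a single additive band, and the number of removed directions (`n_epo`) is fixed by hand. The perturbation directions are estimated from difference spectra of the same samples recorded at several levels of the external parameter.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_wavelengths = 20, 60
wl = np.linspace(0.0, 1.0, n_wavelengths)

# Synthetic data: a chemical band whose height is the property of interest,
# plus a broad band whose height depends on an external parameter
chemical_band = np.exp(-((wl - 0.3) / 0.05) ** 2)
external_band = np.exp(-((wl - 0.6) / 0.15) ** 2)
concentration = rng.uniform(1.0, 2.0, size=n_samples)


def measure(level):
    """Spectra of the same samples at a given level of the external parameter."""
    noise = 1e-4 * rng.normal(size=(n_samples, n_wavelengths))
    return np.outer(concentration, chemical_band) + level * external_band + noise


levels = [0.0, 0.5, 1.0]
spectra = np.stack([measure(level) for level in levels])  # (levels, samples, wavelengths)

# 1. Difference spectra: each sample minus its own mean over the levels.
#    The chemical information cancels out; only the external effect remains.
D = (spectra - spectra.mean(axis=0)).reshape(-1, n_wavelengths)

# 2. Main directions of the perturbation: first right singular vectors of D
n_epo = 1  # number of directions removed (an assumption of this sketch)
_, _, Vt = np.linalg.svd(D, full_matrices=False)
P = Vt[:n_epo].T  # (wavelengths, n_epo)

# 3. Projector onto the subspace orthogonal to the perturbation
Q = np.eye(n_wavelengths) - P @ P.T

diff_before = np.abs(spectra[-1] - spectra[0]).max()
diff_after = np.abs(spectra[-1] @ Q - spectra[0] @ Q).max()
print("max difference between levels before projection:", diff_before)
print("max difference between levels after projection: ", diff_after)
```

In practice the number of removed directions is a tuning parameter; here a single direction is enough because the synthetic perturbation has only one source.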
– Guaranteed robustness: after EPO processing, the main perturbation effect disappears.
– Durable robustness: even when the perturbation disappears, model performance remains unchanged (unlike with slope and bias correction, for example).
– Global robustness: perturbations can be diverse in nature (temperature, addition of a constituent, turbidity, particle size). The model takes all the possible effects into account.
– Better understanding of the causes of perturbation: the removed perturbation effect, which is orthogonal to the useful signal, can be analyzed to identify perturbation sources and monitor them.
– Application to process control: within this set of methods, there is a dynamic version called Dynamic Orthogonal Projection (DOP) (3), which orthogonalizes the calibration during the process.
– This is a very practical method: once the model is developed using “EPO-preprocessed” spectra, the regression vector is intrinsically insensitive to the influence factors (b-vector orthogonal to the perturbation subspace). Consequently, it is no longer necessary to preprocess the new spectra. These models can therefore be applied directly to commercial spectrometers, without any modification.
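The last point can be verified on synthetic data. In the minimal sketch below, the perturbation subspace `P` is assumed to be known and minimum-norm least squares is used as a stand-in for PLS; both are assumptions made for brevity. The regression vector obtained from the projected calibration spectra is orthogonal to `P`, so raw new spectra give the same predictions as preprocessed ones.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_wavelengths = 30, 40

# Hypothetical perturbation subspace: 2 orthonormal directions (columns of P)
P, _ = np.linalg.qr(rng.normal(size=(n_wavelengths, 2)))
Q = np.eye(n_wavelengths) - P @ P.T  # projector orthogonal to the perturbation

# Synthetic calibration set
X = rng.normal(size=(n_samples, n_wavelengths))
y = X @ rng.normal(size=n_wavelengths) + 0.01 * rng.normal(size=n_samples)

# Calibrate on centred, projected spectra. The explicit rcond makes lstsq
# ignore numerically null directions and return the minimum-norm solution.
x_mean, y_mean = X.mean(axis=0), y.mean()
X_epo = (X - x_mean) @ Q
b, *_ = np.linalg.lstsq(X_epo, y - y_mean, rcond=1e-10)
b0 = y_mean - x_mean @ b

# New spectra, contaminated by a perturbation lying in the subspace spanned by P
X_new = rng.normal(size=(5, n_wavelengths))
X_new_perturbed = X_new + rng.normal(size=(5, 2)) @ P.T

pred_with_epo = (X_new_perturbed - x_mean) @ Q @ b + y_mean  # explicit preprocessing
pred_raw = X_new_perturbed @ b + b0                          # no preprocessing
pred_clean = X_new @ b + b0                                  # unperturbed spectra

print("||P^T b|| =", np.linalg.norm(P.T @ b))
print("raw vs. preprocessed:", np.abs(pred_raw - pred_with_epo).max())
print("perturbed vs. clean: ", np.abs(pred_raw - pred_clean).max())
```

The property holds here because the minimum-norm solution lies in the row space of the projected calibration matrix, which contains no component along the removed directions.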
Application examples – Improving measurement robustness in an industrial environment
– Spectroscopy for process control: effective control of fermentations, through robust management of the critical parameters of the process (temperature and nitrogen content).
– Hand-held spectroscopic sensor in the field: how to manage temperature influence when predicting apple ripeness in orchards with near infrared spectroscopy.
– Measurements in complex highly-scattering samples (biological samples for example): cancerous tumor detection with fluorescence spectroscopy, while controlling the influence of blood oxygenation.
– Calibration transfer: transfer of models from one spectrometer to another through optical correction of the spectrum.
To meet the needs of industry, for improved process control in particular, our R&D department develops techniques designed to optimize model robustness. This R&D area is quite vast and covers themes as diverse as the development of preprocessing and wavelength selection methods, the combination of non-linear and linear tools, and the building of multivariate modeling methods based on pure spectra.
(1) Roger J. M., Chauchard F., and Bellon-Maurel V. (2003). EPO-PLS external parameter orthogonalisation of PLS: Application to temperature-independent measurement of sugar content of intact fruits. Chemometrics and Intelligent Laboratory Systems, 66(2), 191-204.
(2) Chauchard F., Roger J. M., and Bellon-Maurel V. (2004). Correction of the temperature effect on near infrared calibration – application to soluble solid content prediction. Journal of Near Infrared Spectroscopy, 12(3), 199-205.
(3) Zeaiter M., Roger J. M., and Bellon-Maurel V. (2005). Dynamic orthogonal projection. A new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations. Chemometrics and Intelligent Laboratory Systems, 80(2), 227-235.
If you are interested in one of these articles, do not hesitate to ask for a copy.