What is Chemometrics?
Chemometrics is the application of mathematical and statistical methods to extract as much information as possible from chemical data. It comprises mathematical modeling (multivariate data analysis) and experimental design.
Non-linear modeling
With the need for greater traceability and total process control, industrial companies are seeking powerful measurement systems. New chemometric methods must be developed to meet these needs; in particular, non-linear modeling is required for certain applications:
– To extend the boundaries of vibrational spectroscopy: measurements of physical criteria (e.g. particle size distribution) or complex parameters (e.g. wood durability), heterogeneous sample measurements (e.g. skin, organs);
– To merge large databases generated by different spectrometers while accounting for equipment variability, sample measurement conditions, and variations due to the year, the reference measurements, etc., and still obtain satisfactory robustness;
– To model data coming from different origins (traceability, process control, etc.).
Historically, the most commonly used non-linear models were Artificial Neural Networks (ANN). Support Vector Machines (SVM), which come from the same field of machine learning, appeared in chemometrics more recently.
The concept: some theory
SVM are based on:
– a local modeling method that focuses on the similarity between samples. Instead of basing the model on variables, as standard multivariate modeling methods (PCA, PLS, ANN) do, SVM replace the matrix X [n × p] with a “kernel matrix” K [n × n] made up of similarity measurements between the n calibration samples.
– a non-linear modeling method, through the computation of a non-linear Gaussian kernel, i.e. a Radial Basis Function (RBF).
The kernel width parameter determines the width of the Gaussian and thus the degree of non-linearity of the kernel (cf. figure):
o Low value: narrow kernel (abscissa). The modeling is extremely non-linear; few Ki,j values (ordinate) are close to 1, so the model is based on only a few samples.
o High value: wide kernel. The modeling tends to be linear and includes most of the samples.
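The effect of the kernel width can be illustrated with a minimal sketch. It assumes one common form of the Gaussian kernel, Ki,j = exp(-||xi - xj||² / (2·sigma²)); the name `sigma`, the helper functions, and the toy data are hypothetical and chosen only for illustration. The code is kept dependency-free; a numerical library would normally be used on real spectra.

```python
import math

def rbf_kernel_matrix(X, sigma):
    """Return the n x n kernel matrix K, with K[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)).

    X is a list of n samples, each a list of p variables.
    sigma is the (assumed) name of the Gaussian width parameter.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K

def count_similar(K, threshold=0.9):
    """Count off-diagonal entries of K above the threshold (pairs seen as similar)."""
    n = len(K)
    return sum(K[i][j] > threshold for i in range(n) for j in range(n) if i != j)

# Hypothetical toy data: 4 calibration samples described by 3 variables each
X = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]

for sigma in (0.1, 1.0, 10.0):
    K = rbf_kernel_matrix(X, sigma)
    print(f"sigma={sigma}: {count_similar(K)} off-diagonal entries above 0.9")
```

With a narrow kernel, almost no pair of samples is considered similar, so each prediction relies on very few calibration samples; with a wide kernel, nearly all pairs are similar and the model behaves almost linearly.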
For LS-SVM (Least-Squares Support Vector Machines), a variant that is easier to optimize than the standard SVM, a regularization parameter must also be tuned to limit overfitting.
A regression vector b [n × 1] is computed on K [n × n]. The value predicted for each test sample is then derived from its similarity to the samples of the calibration set.
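As an illustration, the sketch below follows the textbook LS-SVM regression formulation, in which the regression vector and an offset are obtained by solving a single linear system built on K. This is an assumption made for the example, not necessarily the implementation used in the cited works. The names `sigma` (kernel width), `gamma` (regularization parameter), `b0` (offset), and all function names are hypothetical, and the small solver only keeps the example dependency-free.

```python
import math

def rbf(x1, x2, sigma):
    """Gaussian similarity between two samples (assumed kernel form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def lssvm_fit(X, y, sigma, gamma):
    """Return (b0, b): offset and regression vector b [n x 1].

    Solves the bordered system [[0, 1^T], [1, K + I/gamma]] [b0; b] = [0; y].
    """
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal of K
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]

def lssvm_predict(x_new, X, b0, b, sigma):
    """Predict from the similarity of x_new to each calibration sample."""
    return b0 + sum(b_i * rbf(x_new, x_i, sigma) for b_i, x_i in zip(b, X))

# Hypothetical non-linear toy data: y = sin(x) on 7 calibration samples
X_cal = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y_cal = [math.sin(x[0]) for x in X_cal]

b0, b = lssvm_fit(X_cal, y_cal, sigma=0.5, gamma=10.0)
print(lssvm_predict([1.25], X_cal, b0, b, sigma=0.5))
```

In this formulation a smaller `gamma` shrinks the regression vector and smooths the fit, while a larger `gamma` follows the calibration data more closely; both `sigma` and `gamma` would typically be chosen by cross-validation.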
Advantages of the kernel methods
– Advantages of SVM over PLS (a linear method): ability to model non-linearity, increased accuracy, robustness to the variability of large databases, and multiple application fields (imaging, automation, etc.).
– Minority constituent prediction: grape acidity assessment with a hand-held spectroscopic sensor (3)
– Physical property prediction: complex property assessment in wood (4)
– Modeling signals from innovative sensors: processing spectra from Time-Resolved Spectroscopy (TRS) or Spatially-Resolved Spectroscopy (SRS) (5)
– Process control and calibration transfer: calibration transfer between lab measurements and on-line measurements (6)
(1) Croguennoc A., Lallemand J., Roussel S. (2019). Comparison of Machine Learning methods for spectroscopic data analysis: how to tune Support Vector Machine models for spectroscopic quantitative predictions, Chimiométrie 2019 Conference, Montpellier, France.
(2) Roussel S., Lallemand J., Preys S. (2018). The Machine Learning tools for the spectroscopic data analysis, Conférence GFSV – Groupe Français de Spectroscopie Vibrationnelle, Le Ventron, France.
(3) Chauchard F., Cogdill R., Roussel S., Roger J.M. and Bellon-Maurel V. (2004). Application of LS-SVM to non-linear phenomena in NIR spectroscopy: development of a robust and portable sensor for acidity prediction in grapes, Chemometrics and Intelligent Laboratory Systems, 71, 141-150.
(4) Cogdill R.P., Schimleck L.R., Jones P.D., Peter G.F., Daniels R.F. and Clark A. III (2004). Estimation of the physical wood properties of Pinus taeda L. radial strips using Least-Squares Support Vector Machines, J. NIRS, 12, 263-269.
(5) Chauchard F., Roussel S., Roger J.M., Bellon-Maurel V., Abrahamsson S., Svensson T., Andersson-Engels S. and Svanberg S. (2005). Least Squares-Support Vector Machines modelling for Time Resolved Spectroscopy, Applied Optics, 44 (30), 7091-7097.
(6) Barreiro P., Chauchard F., Roger J.M., Moya-Gonzales A., Bellon-Maurel V. (2006). Robust modelling for at-line / on-line calibration transfer in a NIR industrial application, Postharvest Biology and Technology.
(7) Fernandez-Pierna J.A., Baeten V., Michotte Renier A., Cogdill R.P., Dardenne P. (2004). Combination of SVM and NIR imaging spectroscopy for the detection of MBM in compound feeds, J. Chemom., 18 (7-8), 341-349.
If you are interested in any of these articles, do not hesitate to ask for a copy.