Instant quantification of sugars in milk tablets using near infrared spectroscopy and chemometric tools
Details of milk samples
A total of 13 different brands of milk tablets were obtained from local grocery stores in Chiang Mai, Thailand. The relevant details of the milk tablets are summarized in Table 1. The samples were divided into three groups, namely training (T1–T3), internal validation (I1–I3) and external validation (E1–E7). Each milk tablet was ground into a fine, homogeneous powder using a ceramic mortar and pestle. To generate systematic variations in the sugar contents of the milk samples, a central composite design (CCD) was used^{22}, comprising nine experiments for each sample. For samples T1, T2 and T3, amounts of sucrose (analytical grade, purity >99%, RCI Labscan, Bangkok, Thailand) and lactose (analytical grade, purity >99%, KEMAUS, NSW, Australia) were added to the milk powder according to the coded values of the CCD structure shown in Table 2. The CCD experiments for the three samples were then combined to construct the training set, resulting in a total of 27 milk samples. The CCD structure was used to ensure that the variation in the recorded NIR spectra was related to the concentrations of the sugars in the milk samples and that the number of training samples was sufficient to establish the prediction models^{23}. Samples I1, I2 and I3 were used to establish the internal validation samples, with variations in sugar contents likewise generated based on the CCD model, giving 27 additional powdered milk samples for the internal validation set. Samples E1 to E7 were used as external samples to represent the independent test set. These were used to assess the performance of the calibration models when real samples were introduced.
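To illustrate how nine experiments arise from a two-factor CCD (sucrose and lactose), the following minimal sketch enumerates the coded design points: four factorial corners, four axial points and one center point. The function name and the axial distance `alpha` (set here to the rotatable value, √2 for two factors) are assumptions made for illustration; the coded values actually used in the study are those given in Table 2.

```python
import math
from itertools import product


def central_composite_design(n_factors=2, alpha=None):
    """Return coded CCD points: factorial corners, axial points, one center point."""
    if alpha is None:
        # Assumption: rotatable design, alpha = (number of factorial runs) ** 0.25
        alpha = (2 ** n_factors) ** 0.25
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [(0.0,) * n_factors]
    return corners + axial + center


# Two factors (coded sucrose level, coded lactose level) give nine runs per sample.
design = central_composite_design()
for sucrose_code, lactose_code in design:
    print(f"{sucrose_code:+.3f}  {lactose_code:+.3f}")
```

Applying the same nine coded points to each of three base samples yields 3 × 9 = 27 samples, matching the size of the training and internal validation sets.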
It should be noted that two main types of milk tablets were used in this research. Samples E2 and E3 were tablets containing no milk (“cheap milk tablets”), to which an artificial milk flavor was added to make the product acceptable to consumers. The other milk samples were produced using cow’s milk as the raw material and were called “premium milk tablets”.
NIR spectral detection
NIR spectra of milk powder (9.00 g) were acquired using a NIR transport module (width × length × depth: 5.7 × 29.4 × 2.0 cm) equipped with the NIRSystem 6500 (MultiMode™ Analyzer, Foss, USA) in the range of 400 to 2500 nm at a sampling interval of 2 nm, producing 1050 data points per spectrum. Each spectrum was the average of 64 scans. For the tablet measurements, layers of milk tablets were placed inside the NIR transport module directly against the container glass, following the measurement conditions of the powder samples. Milk samples were held at a controlled room temperature of 25°C for at least 6 h prior to NIR detection. Prior to analysis, NIR spectra were first preprocessed by standard normal variate (SNV) to reduce errors caused by light scattering during NIR measurement. They were then mean-centered so that the analysis focused on the variance from the mean of the data rather than the absolute values.
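The two preprocessing steps can be sketched as follows. This is a minimal illustration using only the Python standard library, not the in-house scripts used in the study; the function names and the toy spectra are hypothetical. SNV is applied to each spectrum (row) individually, while mean-centering subtracts the column means, which in a calibration setting would be computed from the training spectra and reused for validation spectra.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in spectrum]


def mean_center(spectra):
    """Subtract the column (wavelength-wise) mean from every spectrum."""
    n = len(spectra)
    col_means = [sum(col) / n for col in zip(*spectra)]
    centered = [[v - m for v, m in zip(row, col_means)] for row in spectra]
    return centered, col_means


# Toy spectra (hypothetical values); the second is the first with a doubled intensity.
raw = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 2.0, 4.5, 4.0],
]
snv_spectra = [snv(s) for s in raw]
centered, col_means = mean_center(snv_spectra)
```

In this toy case the first two spectra become identical after SNV, which shows how a purely multiplicative scatter effect is removed.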
HPLC analysis of sugar determination
The sugar content of the milk tablet samples was measured by high performance liquid chromatography (HPLC). For sample preparation, 1.00 g of each ground milk tablet was dissolved in 10 mL of ultrapure water and held in a water bath (Julabo Labortechnik GmbH, Seelbach, Germany) at 55°C for 5 min. HPLC grade acetonitrile was then added for protein precipitation^{24,25}. After denaturation, the sample solution was centrifuged at 10,000 rpm for 5 min. The clear solution was then filtered through a 0.45 µm nylon syringe filter (Agilent Technologies, CA, USA).
Chromatographic analysis of the sugar content of milk tablets was performed with a high performance liquid chromatograph (Agilent 1100 HPLC system, CA, USA) with an Agilent ZORBAX NH_{2} column (5 µm, inner diameter 4.6 mm, length 150 mm) operated at 25°C. The samples were auto-injected into the HPLC system with an injection volume of 10 µL. A mixture of HPLC grade acetonitrile and ultrapure water (75:25, v/v) was used as the mobile phase at a flow rate of 1.00 mL/min. A refractive index detector (RID) was operated at 25°C. Sugar contents were determined using external standard calibration curves of sucrose and lactose standards, which gave R^{2} values of 0.9907 and 0.9896, respectively. The sugar concentration values in the studied milk tablet samples are summarized in Table 1.
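External standard calibration amounts to fitting a straight line of detector response against standard concentration and inverting it for unknown samples. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the concentrations and peak areas are made-up placeholder numbers chosen to lie exactly on a line, not data from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with its R^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to obtain a concentration from a peak area."""
    return (peak_area - intercept) / slope


# Placeholder standards (hypothetical units and values, for illustration only).
std_conc = [1.0, 2.0, 4.0, 8.0]
std_area = [10.5, 20.5, 40.5, 80.5]

slope, intercept, r2 = fit_line(std_conc, std_area)
unknown_conc = quantify(30.5, slope, intercept)
```

One calibration line would be fitted per sugar (sucrose and lactose), and any dilution applied during sample preparation would need to be accounted for separately.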
Chemometric analysis
Standardization of NIR spectra using DS and PDS calibration transfers
Although both forms of milk samples (tablet and powder) were solid, they differed, for example, in particle size and in the compaction pressure of the tablets. These physical variations resulted in significant discrepancies in the recorded NIR spectra^{26}. Calibration transfers are multivariate correction methods that can be applied to correct for variations arising from different instruments and measurement conditions. In this research, they were used to account for any signal deviation between the spectra obtained from the tablet samples and the powder samples. Piecewise direct standardization (PDS) is an extension of a conventional method called direct standardization (DS)^{27,28}. The DS method describes the correlation between two data matrices (X_{m} and X_{s}, referring to the master and slave data) by calculating a transformation matrix (F) using a linear regression model such as MLR, PCR or PLS:
$$\mathbf{X}_{m} = \mathbf{X}_{s} \times \mathbf{F}$$
The extension in the PDS algorithm is that each spectral point of the master data (X_{m,j}) is specifically related to a spectral subset of the slave data (X_{s,j}). The PDS algorithm involves the following steps:

Step 1: Select the master data spectral points (X_{m,j}) at wavelength j.

Step 2: Define the slave data subset spectra (X_{s,j}) near wavelength j, from index j − k to j + k
$$\mathbf{X}_{s,j} = [\mathbf{x}_{s,j-k}, \mathbf{x}_{s,j-k+1}, \ldots, \mathbf{x}_{s,j+k-1}, \mathbf{x}_{s,j+k}]$$
where k sets the size of the window (2k + 1 channels), controlling the amount of spectral data that will be used in the calculation.

Step 3: Establish the regression coefficient
$$\mathbf{X}_{m,j} = \mathbf{X}_{s,j} \times \mathbf{b}_{j}$$
where b_{j} is a vector containing regression coefficients.

Step 4: Generate the transformation matrix (F) by arranging the b_{j} along the main diagonal
$$\mathbf{F} = \text{diag}(\mathbf{b}_{1}^{T}, \mathbf{b}_{2}^{T}, \ldots, \mathbf{b}_{j}^{T}, \ldots, \mathbf{b}_{n}^{T})$$
where n is the number of spectral channels included.

Step 5: Standardize the spectra of unknown samples (X_{s,un}) using F to obtain the modified spectra (X_{s,PDS})
$$\mathbf{X}_{s,PDS} = \mathbf{F} \times \mathbf{X}_{s,un}$$
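The five steps can be sketched in code as follows. This is a minimal illustration under several assumptions, not the implementation used in the study: spectra are stored as rows (as in X_{m} = X_{s} × F, so the correction is applied by right-multiplying by F; with column spectra the transposed form applies), windows are truncated at the spectral edges, and each local regression is solved by lightly ridge-regularized least squares rather than the PLS regression used by the authors. All function names, the `ridge` parameter and the synthetic data are hypothetical.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def pds_transfer_matrix(X_master, X_slave, k=1, ridge=1e-8):
    """Build the banded matrix F so that X_slave @ F approximates X_master."""
    n_channels = len(X_master[0])
    F = [[0.0] * n_channels for _ in range(n_channels)]
    for j in range(n_channels):                      # Step 1: master channel j
        window = range(max(0, j - k), min(n_channels - 1, j + k) + 1)
        Xs = [[row[c] for c in window] for row in X_slave]   # Step 2: slave window
        y = [row[j] for row in X_master]
        w = len(window)
        # Step 3: local regression via normal equations (Xs'Xs + ridge*I) b = Xs'y
        A = [[sum(r[p] * r[q] for r in Xs) + (ridge if p == q else 0.0)
              for q in range(w)] for p in range(w)]
        rhs = [sum(r[p] * yi for r, yi in zip(Xs, y)) for p in range(w)]
        b_j = solve_linear(A, rhs)
        for c, coef in zip(window, b_j):             # Step 4: place b_j in column j
            F[c][j] = coef
    return F


def apply_transfer(X_unknown, F):
    """Step 5: standardize row spectra by right-multiplication with F."""
    n_out = len(F[0])
    return [[sum(x * F[i][j] for i, x in enumerate(row)) for j in range(n_out)]
            for row in X_unknown]


# Synthetic check: the "slave" spectra are the master spectra with channel-wise gains.
random.seed(0)
n_samples, n_channels = 8, 6
master = [[random.uniform(0.5, 1.5) for _ in range(n_channels)] for _ in range(n_samples)]
gains = [0.5 + 0.05 * j for j in range(n_channels)]
slave = [[g * v for g, v in zip(gains, row)] for row in master]

F = pds_transfer_matrix(master, slave, k=1)
corrected = apply_transfer(slave, F)
max_err = max(abs(c - m) for rc, rm in zip(corrected, master) for c, m in zip(rc, rm))
```

In this synthetic case the corrected slave spectra reproduce the master spectra almost exactly, because the distortion is purely channel-wise. Real tablet and powder spectra would also require the number of transfer samples and the window size k to be tuned.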
In this research, DS and PDS transformations were used to account for inconsistencies between spectra obtained from powder samples and tablet samples. These transformation methods model the correlation between the two data sets, and the resulting correlation information was then applied to adjust the NIR spectra of the milk tablet samples. The adjusted spectra could therefore be predicted using the calibration model established from the NIR spectra of the powder samples, without the need to recalibrate the model.
Model optimization was based on a previously published report^{21}. The correlation matrices in DS and PDS were determined using PLS regression, calculated using the training samples and optimized based on the internal validation samples.
PLS for quantitative analysis
Partial least squares (PLS) regression is one of the most powerful methods for building multivariate calibration models^{29}. The significant advantage of the PLS algorithm is that the variations in both the predictor and response variables are extracted simultaneously and then used to build the prediction model. In this way, the correlation between the two blocks of information is maximized. In most cases, PLS offers the optimal predictive performance for NIR spectral data^{11,30}.
In this research, NIR spectra and sugar contents were used as the predictor and response variables, respectively, for the PLS models. The PLS calculation was carried out following the procedure described in the previously published literature^{29}. The leave-one-out cross-validation method was applied to identify the optimal number of PLS latent variables^{31}. According to Table 1, the PLS models were developed using the training samples (T1–T3) as calibration data. The internal validation (I1–I3) and external validation (E1–E7) samples were then used for validation and prediction, respectively.
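Leave-one-out selection of the number of latent variables can be sketched as below. This is a generic single-response PLS (PLS1, NIPALS-style) written with the standard library only, offered as an assumption-laden illustration rather than the authors' MATLAB procedure; the function names and the noise-free synthetic data (two underlying factors) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, y, n_lv):
    """Fit PLS1 on mean-centered data; return means and per-component (w, p, q)."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_lv):
        w = [dot(col, f) for col in zip(*E)]          # weights: covariance with y
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                              # nothing left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]     # X loadings
        q = dot(f, t) / tt                            # y loading
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_by_lv(model, x, max_lv):
    """Predictions for one spectrum using 1, 2, ..., max_lv latent variables."""
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat, preds = y_mean, []
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
        preds.append(y_hat)
    # If fitting stopped early, extra components leave the prediction unchanged.
    return preds + [y_hat] * (max_lv - len(preds))


def loo_rmsecv(X, y, max_lv):
    """Root mean square error of leave-one-out cross-validation per number of LVs."""
    press = [0.0] * max_lv
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], max_lv)
        for a, y_hat in enumerate(predict_by_lv(model, X[i], max_lv)):
            press[a] += (y[i] - y_hat) ** 2
    return [math.sqrt(v / len(X)) for v in press]


# Synthetic data: spectra built from two factors, response depends on both.
random.seed(1)
p1 = [1.0, 2.0, 3.0, 4.0, 5.0]
p2 = [1.0, 0.0, -1.0, 0.0, 1.0]
X, y = [], []
for _ in range(10):
    t1, t2 = random.random(), random.random()
    X.append([t1 * a + t2 * b for a, b in zip(p1, p2)])
    y.append(t1 + 2.0 * t2)

rmsecv = loo_rmsecv(X, y, max_lv=3)
best_lv = min(range(len(rmsecv)), key=rmsecv.__getitem__) + 1
```

The number of latent variables with the lowest cross-validation error is retained. With real, noisy spectra a more parsimonious choice near the minimum is often preferred to limit overfitting.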
The predictive performance of the PLS models in terms of prediction accuracy was reported as the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP). The coefficients of determination for calibration (R^{2}) and prediction (Q^{2}) were calculated to assess the robustness of the models. In addition, the standard error of cross-validation (SECV) and the ratio of prediction to deviation (RPD) were used to compare the predictive performances of the calibration models^{32}. PLS model calculations, PDS calibration transfer, and statistical analyses were implemented using in-house MATLAB scripts (MATLAB, The MathWorks Inc., Natick).
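The figures of merit named above can be computed as in the following sketch. The function names and example values are hypothetical, and the RPD is taken here as the standard deviation of the reference values divided by the root mean square error, which is one common convention; the exact definitions used in the study follow the cited literature. Applied to calibration data the functions give RMSEC and R^{2}; applied to validation data they give RMSEP and Q^{2}.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Assumed convention: SD of reference values divided by the RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)


# Placeholder reference (e.g. HPLC) and predicted (e.g. NIR-PLS) values.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 31.0, 39.0]

rmsep = rmse(reference, predicted)
q2 = r_squared(reference, predicted)
ratio = rpd(reference, predicted)
```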