Abstract: The accuracy of energy-dispersive X-ray fluorescence spectrometry (XRF) for detecting heavy metals in agricultural soils depends strongly on complex matrix effects, which makes fast and precise monitoring of soil contamination difficult. To calibrate XRF detection, a Gaussian mixture clustering-multilevel model (GMC-MLM) is proposed to improve XRF accuracy for Cd in agricultural soils. Compared with other models such as multiple linear regression (MLR), random forest regression (RF), and support vector machine regression (SVMR), the GMC-MLM more effectively disentangled the nested distribution of XRF detection errors. For the corrected samples, the correlation coefficient between the XRF detection results and the ICP-MS test results reached 0.9085, and 74% of the corrected samples had a relative error below 30%. Building on the GMC-MLM correction method, a knowledge base for localized correction of XRF detection was constructed. With 50 sample points in the knowledge base, the RMSE (Root Mean Squared Error) and REM (Relative Error of Mean) were 0.7347 and 3.7014%, respectively. The model shows good extrapolation capability, and the correction effect based on the knowledge base gradually stabilizes as the number of knowledge base sample points increases. This knowledge-base-driven GMC-MLM calibration method can be embedded into XRF detection instruments to recalibrate XRF detection results.
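The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The sample values are invented for illustration and are not data from the study, and the function names are hypothetical. The abstract does not define REM, so the sketch assumes it is the relative difference between the mean corrected XRF value and the mean ICP-MS value; the paper's definition may differ.

```python
import math

def rmse(pred, ref):
    """Root mean squared error between corrected XRF values and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fraction_within(pred, ref, tol=0.30):
    """Share of samples whose relative error |pred - ref| / ref is below tol."""
    hits = sum(1 for p, r in zip(pred, ref) if abs(p - r) / r < tol)
    return hits / len(ref)

def relative_error_of_mean(pred, ref):
    """ASSUMED definition of REM: |mean(pred) - mean(ref)| / mean(ref), in percent."""
    mean_pred = sum(pred) / len(pred)
    mean_ref = sum(ref) / len(ref)
    return abs(mean_pred - mean_ref) / mean_ref * 100.0

# Invented illustrative values (not from the study): Cd readings in arbitrary units.
icp_ms = [1.0, 2.0, 4.0, 5.0]          # reference laboratory results
xrf_corrected = [1.2, 1.8, 4.0, 7.0]   # XRF results after a hypothetical correction

print("r    =", pearson_r(xrf_corrected, icp_ms))
print("RMSE =", rmse(xrf_corrected, icp_ms))
print("share with relative error < 30% =", fraction_within(xrf_corrected, icp_ms))
print("REM (%) =", relative_error_of_mean(xrf_corrected, icp_ms))
```

With real data, `icp_ms` would hold the laboratory ICP-MS measurements and `xrf_corrected` the XRF readings after calibration. The 30% tolerance matches the threshold quoted in the abstract.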