Introduction

The accurate identification of soil geotechnical characteristics is essential for the design and construction of infrastructure projects such as roads (Habal et al., 2024a; Polo-Mendoza et al., 2023). Assessing the quality of road subgrades, embankments, and pavement structures is therefore of major importance. The CBR and compaction characteristics are the tests most commonly used to evaluate bearing capacity under worst-case conditions and mechanical performance (Habal et al., 2024b; Mukherjee & Ghosh, 2021). Conducting these laboratory tests is both time-intensive and expensive, so methods that shorten CBR determination are needed to reduce construction costs (Ho & Tran, 2022). Additionally, laboratory-determined CBR values may lack accuracy because of sample disturbance and preparation limitations. Hence, prediction models provide a cost-effective and rapid way to estimate CBR values. Many researchers have predicted CBR from soil characteristics such as grain size distribution, consistency limits, OMC, and MDD. For instance, a correlation was developed to estimate the CBR of compacted soil (Black, 1962). Studies revealed an inverse linear correlation between CBR and PI, indicating that a reduction in PI results in an increase in CBR. Nevertheless, Yildirim and Gunaydin (2011) found that the regression equations proposed in these studies did not produce satisfactory correlations, potentially due to the complex relationships among soil parameters and inappropriate calculation methods. Advances in ML and artificial intelligence have introduced methods for finding suitable regressions between soil parameters, including ANN, gene expression programming (GEP), support vector machines (SVM), multiple linear regression (MLR), and local polynomial regression (LPR). These methods have shown that soil CBR values can be predicted accurately. Some authors have also used ML to forecast the CBR of soil treated with hydraulic binders such as lime and cement. For example, Ghorbani and Hasanzadehshooiili (2018) and Suthar and Aggarwal (2018) evaluated the CBR of sulfate silty sand treated with micro silica and lime for deep soil mixing applications using ANN and evolutionary polynomial regression (EPR), reaching a prediction accuracy of 0.99 in terms of R2. Shukla and Iyer Murthy (2024) modeled laboratory tests (OMC, MDD, PI, shrinkage limit SL, and swell pressure) of 211 soil samples using ANNs, an adaptive neuro-fuzzy inference system (ANFIS), a multi-layer perceptron (MLP), and a radial basis function (RBF) network. Their results indicated that the Levenberg–Marquardt algorithm (LMA) was the most accurate, with excellent statistical performance. Many studies have explored the application of soft computing in various civil engineering fields (Iranmanesh & Kaveh, 1999; Kaveh, 2024; Kaveh & Laknejadi, 2013; Kaveh et al., 2020), finding it accurate and effective. Moreover, conventional mathematical approaches are gradually being replaced by computational approaches based on artificial intelligence (AI) (Baghbani et al., 2022; Sharma et al., 2021). Computational models reveal hidden correlations in large volumes of data, offering improved accuracy in forecast values (Jaksa & Liu, 2021; Yin et al., 2020). Several authors have employed DNNs to predict the CBR of granular soil (Erzin & Turkoz, 2016; Taha et al., 2019; Taskiran, 2010).
Othman and Abdelwahab (2023) investigated the use of DNNs to forecast the CBR of road subgrade soil. Seventy-seven soil samples were used in their work to establish an accurate relationship between soil parameters and CBR values. The input tests included grain size distribution, Atterberg limits (LL and PL), the Proctor test (OMC and MDD), and the CBR test. The analysis indicated that the DNN outperformed shallow ANNs. Bardhan et al. (2021) employed ELM-based adaptive neuro swarm intelligence (ANSI) techniques to predict the soaked CBR of soils, using a dataset of 312 soaked CBR test results that included gravel content (G), sand content (S), silt and clay content (S&C), PI, MDD, OMC, and CBR values. The proposed ELM-based ANSI models, particularly ELM-MPSO, overcame the challenge of tuning the acceleration coefficients of SPSO by trial and error when predicting soil CBR, with potential applications in other geotechnical engineering problems.

Based on the authors’ knowledge, previous research has predominantly relied on traditional ML methods to forecast the CBR and compaction parameters. However, recent studies suggest that alternative techniques such as hybrid ML methods might produce more effective and precise results. Additionally, these studies often evaluated their predictive models using only a single data split, raising concerns about the models’ ability to generalize and avoid overfitting or under-fitting. Furthermore, numerous published works present ML models as complex mathematical equations, making them difficult to replicate in future research and offering limited value to other researchers and professionals in civil engineering. To address these issues, some authors have begun presenting their most effective models through programmed interfaces or simple scripts written in commonly used programming languages such as Python or MATLAB. This approach improves accessibility, allowing anyone interested in modeling to use these models regardless of their proficiency level (Bioud et al., 2023).

The present paper introduces “ComParaCBR2024,” a user-friendly interface for predicting CBR and compaction parameters. This study makes two key contributions: first, it utilizes new metaheuristic hybrid ML methods previously unexplored for this application to improve prediction accuracy; second, it addresses overfitting and under-fitting issues through the K-fold cross-validation approach. The optimal model is subsequently used to create a graphical user interface (GUI) that is accessible to civil engineers and researchers. “ComParaCBR2024” is practical, reliable, and cost-effective, offering an efficient method for predicting CBR and compaction parameters using readily available data, thus eliminating the need for costly laboratory investigations.

Materials and methods

Overview of the methodology

Various advanced ML techniques have been employed to predict compaction parameters and the CBR, among them ANN and RF. This study introduces two novel hybrid methods: NAS-ANN and NAS-RF. A database of 90 soil samples, gathered from previous studies, was used. The input parameters describe the grain size distribution: the diameters at 10%, 30%, 50%, and 60% passing, as well as the coefficients of uniformity (Cu) and curvature (Cc). The performance of these ML methods was evaluated using multiple statistical indicators, and the predictive effectiveness of the optimal model was assessed with a k-fold cross-validation approach using five splits. Finally, a graphical interface was designed to help civil engineers and laboratory operators easily use the optimal model for predicting compaction parameters and the CBR, saving time, materials, and money.

Dataset

Selecting ML inputs is the most critical stage in obtaining accurate predictions; the chosen inputs should cover the main facets of the problem being studied. In this research, a dataset of 90 coarse soil samples was used to establish the computational models. The inputs were collected from the experimental results reported by Duque et al. (2020). The grain size diameters included in this study are the 10% passing diameter (D10), 30% passing diameter (D30), 50% passing diameter (D50), and 60% passing diameter (D60). Additionally, the coefficient of uniformity (Cu, Eq. 1) and the coefficient of curvature (Cc, Eq. 2) were included.

$$C_{u} \, = \,\frac{D_{60}}{D_{10}}$$
(1)
$$C_{c} \, = \,\frac{D_{30}^{2}}{D_{60} \times D_{10}}$$
(2)
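
For illustration, both coefficients can be computed directly from the characteristic diameters. The following Python snippet is a minimal sketch (the function name, variable names, and example values are ours, not taken from the study):

```python
def gradation_coefficients(d10, d30, d60):
    """Coefficient of uniformity (Cu) and coefficient of curvature (Cc)
    from characteristic grain diameters (all in the same unit, e.g. mm)."""
    cu = d60 / d10                  # Eq. (1)
    cc = d30 ** 2 / (d60 * d10)     # Eq. (2)
    return cu, cc

# Example: a sand with D10 = 0.2 mm, D30 = 0.6 mm, D60 = 1.5 mm
cu, cc = gradation_coefficients(0.2, 0.6, 1.5)
print(f"Cu = {cu:.2f}, Cc = {cc:.2f}")  # Cu = 7.50, Cc = 1.20
```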

The grain size distribution, CBR, and modified compaction characteristics were evaluated according to ASTM D6913, D1883, and D1557, respectively. Notably, the grain size distributions vary across the granular soil samples, and all CBR tests were conducted at the OMC and MDD. Numerous previous studies on soil particle size distribution have indicated that the aforementioned input parameters are critical factors influencing soil compaction behavior (MDD and OMC) as well as the CBR. Table 1 shows the input and output parameters of the model proposed in this work.

Table 1 Input and output parameters of proposed model.

Machine learning methods

Numerous ML methods have been utilized in the present study to conduct a comprehensive analysis and propose an effective model. The effectiveness of ML methods has been demonstrated by many studies, revealing impressive findings across various fields. Therefore, only the methods used in this study are mentioned below, followed by additional references for readers interested in exploring these methods further. The methods employed in this research are as follows: ANN, RF, NAS-ANN, and NAS-RF. MATLAB was used to model the algorithms for each method.

Neural architecture search NAS

Neural architecture search (NAS) is a transformative technique in deep learning designed to automate the creation of neural networks. By eliminating the need for human expertise in selecting the optimal architecture, NAS employs intelligent search algorithms to identify the best-performing architectures for specific tasks. This automation can greatly improve the practicality and efficiency of deep learning, as demonstrated in seminal works (Tan et al., 2018; Zoph & Le, 2016). NAS is poised to accelerate the development of deep learning models and broaden their application spectrum, as further evidenced by Liu et al. (2017). The NAS process typically involves two main phases, as depicted in Fig. 1. The first phase is the search stage, during which various algorithms are employed to explore the extensive search space and identify promising architectures. The second is the evaluation stage, where the architectures discovered in the first stage are trained and evaluated on a target dataset. By iterating these two stages, the architecture is gradually refined and improved. Despite its potential, NAS faces significant challenges, the foremost being the immense size of the search space, with countless possible neural network architectures. This vast search space makes finding the optimal architecture computationally expensive. To address this issue, researchers have proposed using reduced search spaces with ground-truth architecture performances available to evaluate the search quality (Yu et al., 2019).

Fig. 1

Evaluation NAS algorithm

In the process of hybridizing a NAS algorithm with a neural network, the initial step is to define a search space that includes potential network architectures. This space accounts for variables such as input data, target outputs, number of layers, activation functions, and network operations. Next, a search strategy, such as reinforcement learning, evolutionary algorithms, or gradient-based methods, is used to navigate this expansive design landscape efficiently. To manage the computational cost of exploring an exhaustive search space, a reduced search space is implemented; this reduction focuses the search on the most promising regions, thereby optimizing computational resources and improving the efficiency of the NAS algorithm. In this study, the search space was reduced by limiting the number of nodes in each hidden layer and using the ‘Tansig’, ‘Logsig’, and ‘Purelin’ activation functions (Amin, 2021). Ground-truth performance metrics were established for each candidate architecture by training it multiple times with varying random seeds. The NAS algorithm then generates and evaluates a multitude of architectures; by training these architectures on a given dataset and assessing their performance, the algorithm identifies the most promising designs. This iterative process allows the search to be refined, homing in on the architectures that offer the best performance. The selected architecture undergoes further refinement and optimization before being deployed in a practical setting, and continuous monitoring of the deployed model ensures its ongoing performance and adaptability. Ultimately, NAS offers a powerful approach to discovering high-performing neural network architectures through a combination of automated exploration and rigorous evaluation. This method not only accelerates the development process but also enhances the efficiency and effectiveness of neural network models across various applications.
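
A minimal Python sketch of this hybrid search is given below, using scikit-learn's MLPRegressor, where the 'tanh', 'logistic', and 'identity' activations stand in for MATLAB's tansig, logsig, and purelin transfer functions. The grid bounds, seeds, and training settings are illustrative assumptions, not the exact settings used in this study:

```python
import numpy as np
from itertools import product
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def nas_ann_search(X, y, max_neurons=20, seeds=(0, 1, 2)):
    """Score every small single-hidden-layer ANN in a reduced search space
    and return the architecture with the lowest mean validation RMSE."""
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)
    best_arch, best_rmse = None, np.inf
    for n_hidden, act in product(range(2, max_neurons + 1),
                                 ('tanh', 'logistic', 'identity')):
        # "Ground-truth" performance: average over several random seeds
        rmses = []
        for seed in seeds:
            net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation=act,
                               max_iter=2000, random_state=seed)
            net.fit(X_tr, y_tr)
            rmses.append(mean_squared_error(y_va, net.predict(X_va)) ** 0.5)
        score = float(np.mean(rmses))
        if score < best_rmse:
            best_arch, best_rmse = (n_hidden, act), score
    return best_arch, best_rmse
```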

The random forest method, introduced by Breiman (2001), has proven highly effective for both classification and regression tasks. It combines multiple randomized decision trees and averages their predictions, yielding excellent performance, especially when the number of variables far exceeds the number of observations. The RF technique, implemented using the RF function, relies on two key parameters: nTrees (the number of trees in the forest) and mTry (the number of variables randomly sampled as candidates at each split). By hybridizing the automatic neural architecture search (NAS) algorithm with the RF method, these parameters are varied multiple times using different random seeds. The NAS algorithm then generates and evaluates a multitude of configurations, selecting the optimal one based on statistical indicators. This hybrid approach fine-tunes the model’s hyperparameters, ensuring improved accuracy and robustness in predictions. Table 2 illustrates the variability in the initial parameter settings for each method:

Table 2 Initial parameter settings for the algorithms
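
As an illustration of how the two RF hyperparameters could be searched in the same spirit, the following Python sketch grid-searches nTrees and mTry (n_estimators and max_features in scikit-learn terms) over several random seeds. The grids, seeds, and scoring choice are assumptions for demonstration, not the values of Table 2:

```python
import numpy as np
from itertools import product
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def nas_rf_search(X, y, n_trees_grid=(50, 100, 200, 500),
                  m_try_grid=(1, 2, 3, 4, 5, 6), seeds=(0, 1, 2)):
    """Return the (nTrees, mTry) pair with the lowest mean cross-validated RMSE."""
    best_pair, best_rmse = None, np.inf
    for n_trees, m_try in product(n_trees_grid, m_try_grid):
        rmses = []
        for seed in seeds:
            rf = RandomForestRegressor(n_estimators=n_trees, max_features=m_try,
                                       random_state=seed)
            # scikit-learn reports negated MSE for "higher is better" scoring
            mse = -cross_val_score(rf, X, y, cv=5,
                                   scoring='neg_mean_squared_error').mean()
            rmses.append(mse ** 0.5)
        score = float(np.mean(rmses))
        if score < best_rmse:
            best_pair, best_rmse = (n_trees, m_try), score
    return best_pair, best_rmse
```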

Traditionally, manually designing a neural network or random forest architecture tailored to a specific task can be tedious and time-consuming (Benbouras, 2022; Qi et al., 2021). With the introduction of neural architecture search (NAS), this step has been automated (Li & Talwalkar, 2019), significantly reducing the time required to find the best model architecture. NAS automates the search for optimal hyperparameters, such as the number of hidden layers, the number of neurons per layer, nTrees, and mTry. This systematic optimization allows the creation of highly performing models capable of capturing subtle relationships between inputs and target variables.

Using NAS dramatically reduces the time required to set up a model manually, freeing up resources for other aspects of the study. The automated process of NAS not only expedites model development but also enhances the model’s performance by systematically exploring a vast space of potential architectures and parameter configurations. This approach ensures that the final model is optimally tuned for the specific characteristics of the dataset, resulting in more accurate and reliable predictions.

Statistical performance indicators

The accuracy of the proposed models’ predictions was assessed through various statistical performance metrics and visual representations. The statistical performance indicators employed in this study include mean absolute error (MAE), root mean square error (RMSE), index of scattering (IOS), coefficient of determination (R2), Pearson correlation coefficient (R), and index of agreement (IOA).

1. Mean absolute error (MAE):

$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|{Y}_{tar,i}-{Y}_{out,i}\right| \left(0<\text{MAE}<\infty \right)$$
(3)

2. Root mean square error (RMSE):

$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({Y}_{tar,i}-{Y}_{out,i}\right)}^{2}} \left(0<\text{RMSE}<\infty \right)$$
(4)

3. Index of scattering (IOS):

$$IOS=\frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({Y}_{tar,i}-{Y}_{out,i}\right)}^{2}}}{\overline{{Y }_{tar}}} \left(0<\text{IOS}<\infty \right)$$
(5)

4. Coefficient of determination (R2):

$${R}^{2}=1-\frac{\sum_{i=1}^{N}{\left({Y}_{tar,i}-{Y}_{out,i}\right)}^{2}}{\sum_{i=1}^{N}{\left({Y}_{tar,i}-\overline{{Y }_{tar}}\right)}^{2}} \left(0<{R}^{2}<1\right)$$
(6)

5. Pearson correlation coefficient (R):

$$R=\frac{\sum_{i=1}^{N}\left({Y}_{tar,i}-\overline{{Y }_{tar}}\right)\left({Y}_{out,i}-\overline{{Y }_{out}}\right)}{\sqrt{\sum_{i=1}^{N}{\left({Y}_{tar,i}-\overline{{Y }_{tar}}\right)}^{2}\sum_{i=1}^{N}{\left({Y}_{out,i}-\overline{{Y }_{out}}\right)}^{2}}} \left(-1<\text{R}<1\right)$$
(7)

6. Index of agreement (IOA):

$$IOA=1-\frac{\sum_{i=1}^{N}{\left({Y}_{tar,i}-{Y}_{out,i}\right)}^{2}}{\sum_{i=1}^{N}{\left(\left|{Y}_{out,i}-\overline{{Y }_{tar}}\right|+\left|{Y}_{tar,i}-\overline{{Y }_{tar}}\right|\right)}^{2}} \left(0<\text{IOA}<1\right)$$
(8)

where \({Y}_{tar,i}\) and \({Y}_{out,i}\) denote the measured (target) and predicted (output) values of the i-th sample, and \(\overline{{Y }_{tar}}\) and \(\overline{{Y }_{out}}\) their respective means over the N data samples. An optimal model is the one that yields the minimum values of RMSE, IOS, and MAE together with the highest values of IOA, R2, and R, indicating close agreement with the experimental values. After selecting the optimal model using these statistical performance indicators, its predictive ability was evaluated using a K-fold cross-validation approach. This method provides higher accuracy and resilience in evaluating the model’s capability to address overfitting and under-fitting issues in the learning data (Benbouras et al., 2021). The approach divides the database into equal partitions, with K-1 folds used for training and the remaining fold for validation; the process is repeated until every fold has been used for validation. The main advantage of this approach is that all data are used in both the training and validation steps (Oommen & Baise, 2010). In our study, K-fold cross-validation with K = 5 was selected to evaluate the predictive ability of the optimal model.
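
The six indicators can be computed directly from vectors of measured and predicted values. The following NumPy sketch mirrors Eqs. (3)-(8); the function name and return format are ours:

```python
import numpy as np

def performance_indicators(y_tar, y_out):
    """Statistical indicators of Eqs. (3)-(8) for target and predicted values."""
    y_tar, y_out = np.asarray(y_tar, float), np.asarray(y_out, float)
    err = y_tar - y_out
    mae = np.mean(np.abs(err))                                           # Eq. (3)
    rmse = np.sqrt(np.mean(err ** 2))                                    # Eq. (4)
    ios = rmse / np.mean(y_tar)                                          # Eq. (5)
    r2 = 1 - np.sum(err ** 2) / np.sum((y_tar - y_tar.mean()) ** 2)      # Eq. (6)
    r = np.corrcoef(y_tar, y_out)[0, 1]                                  # Eq. (7)
    ioa = 1 - np.sum(err ** 2) / np.sum(                                 # Eq. (8)
        (np.abs(y_out - y_tar.mean()) + np.abs(y_tar - y_tar.mean())) ** 2)
    return {'MAE': mae, 'RMSE': rmse, 'IOS': ios, 'R2': r2, 'R': r, 'IOA': ioa}
```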

Methodology

To define the optimal model for predicting the modified compaction parameters (MDD and OMC) and the CBR, the study followed these steps:

  • Data collection: An international database of 90 samples was compiled from sieve analysis tests characterizing the grain size distributions of the soils.

  • Model development: The chosen inputs were modeled using several ML methods, including ANN, RF, NAS-ANN, and NAS-RF, to build the predictive models compared in this study.

  • Model evaluation: The optimal model for predicting modified compaction parameters and the CBR was determined by comparing the proposed ML models using key statistical indicators such as MAE, RMSE, IOS, R2, R, and IOA.

  • Cross-validation: A K-fold cross-validation approach (with K = 5) was utilized to evaluate the predictive ability of the optimal model, addressing potential under-fitting and overfitting issues.

  • Interface development: The optimal model was used to create a reliable and user-friendly graphical interface named “ComParaCBR2024.”

The research methodology for identifying the best model to predict modified compaction parameters and the CBR is systematically illustrated in Fig. 2.

Fig. 2

The research methodology for compaction parameters and CBR estimation

Results

Dataset compilation

In this research, a database of 90 soil samples was collected to study the correlation between soil size diameters of granular soil and its compaction characteristics, with the goal of predicting the CBR values based on these parameters. To ensure a reliable and precise study, 67 soil samples were used for training, and 23 soil samples were used for the validation step. Given the limitations of the database, the value of K was set to 5 for cross-validation.

The training and validation subsets were selected randomly and kept completely separate. Descriptive statistics of the dataset, calculated using SPSS software, are displayed in Table 3, including the range, minimum, maximum, mean, standard deviation, variance, skewness, and kurtosis. The skewness values indicated that all parameters were fairly evenly distributed, and the results showed that the dataset encompassed a wide variety of data. The collected database is therefore highly practical and suitable for ML applications, aiding in the development of new empirical equations and models, as well as in evaluating the predictive accuracy of existing formulas.

Table 3 Descriptive statistics of the collected samples

Relationship between compaction parameters, CBR, and input parameters

To statistically examine the relationship between the compaction characteristics, the CBR, and the input parameters, SPSS software was employed. A descriptive summary of the data distribution is shown in Fig. 3. The results revealed a positive correlation between the maximum dry density (MDD, Y1) and all input parameters, indicating that an increase in soil diameters tends to increase the MDD values, consistent with real-world observations. In contrast, the optimum moisture content (OMC, Y2) exhibited a negative correlation with all input parameters, meaning that larger soil diameters tend to decrease the OMC values, which aligns with practical findings. The California bearing ratio (CBR, Y3) showed a positive correlation with all input parameters (see Table 4). Furthermore, the Pearson correlation coefficient (R) and its significance between the compaction parameters and the inputs are presented in Table 5. The significance is less than 0.05 for all input parameters except X6 with Y1 and Y3, indicating that most correlations are statistically significant. MDD, OMC, and CBR all show strong correlations with the inputs, with the exception of X6, which correlates poorly.
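
This kind of correlation and significance screening can also be reproduced outside SPSS. The sketch below is a minimal pandas/SciPy equivalent, assuming a DataFrame whose column names X1-X6 and Y1-Y3 are merely illustrative:

```python
import pandas as pd
from scipy.stats import pearsonr

def correlation_table(df, inputs=('X1', 'X2', 'X3', 'X4', 'X5', 'X6'),
                      outputs=('Y1', 'Y2', 'Y3')):
    """Pearson R and two-tailed p-value between every input and output column."""
    rows = []
    for y in outputs:
        for x in inputs:
            r, p = pearsonr(df[x], df[y])
            rows.append({'output': y, 'input': x, 'R': r, 'p': p,
                         'significant (p < 0.05)': p < 0.05})
    return pd.DataFrame(rows)
```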

Fig. 3

The correlation matrix between the compaction parameters and soil parameters

Table 4 Matrix of the correlation between the geotechnical parameters
Table 5 Performance indicator values of the AI models for forecasting the compaction parameters and CBR

These findings suggest that the relationships between these factors and input parameters may be complex and nonlinear. To accurately model this complex phenomenon, it is necessary to develop and apply advanced ML techniques.

Compaction parameters and CBR prediction through ML models

To choose the optimal ML model, the first stage is to select the input parameters that have the greatest impact on the target values; the next step is to define the best ML methods. Six factors were used following the recommendations of the literature. Then, to identify the optimal ML model for estimating the compaction parameters and the CBR, six statistical indicators were employed. The architecture adopted for each method includes six inputs (X1, X2, X3, X4, X5, and X6) and three outputs (Y1, Y2, and Y3). The performance of each model during the training and validation phases is summarized in Table 5. Six performance measurements were used to compare the proposed models and identify the best one: mean absolute error (MAE), root mean square error (RMSE), index of scattering (IOS), coefficient of determination (R2), Pearson correlation coefficient (R), and index of agreement (IOA). The data were split into two portions, 75% for the training phase and 25% for the validation phase. The target values were modeled with the ML methods as reported in Table 5, with the method parameters fixed as shown in Table 2 and checked using the six performance measurements to define the best model.

In the training phase, the various models produced the following ranges for MDD: MAE (0.2415–0.3425), RMSE (0.3454–0.4612), IOS (0.0175–0.0236), R (0.9220–0.9649), R2 (0.8501–0.9311), and IOA (0.9569–0.9760); for OMC: MAE (0.3698–0.6490), RMSE (0.5597–0.9075), IOS (0.0515–0.0829), R (0.9149–0.9714), R2 (0.8371–0.9437), and IOA (0.9552–0.9830); and for CBR: MAE (1.0491–2.2702), RMSE (1.4961–2.9230), IOS (0.0681–0.1346), R (0.9834–0.9951), R2 (0.9670–0.9902), and IOA (0.9900–0.9975). Similarly, in the validation phase the following ranges were obtained for MDD: MAE (0.2312–0.4214), RMSE (0.3573–0.5295), IOS (0.0182–0.0269), R (0.9194–0.9713), R2 (0.8453–0.9434), and IOA (0.9574–0.9842); for OMC: MAE (0.4450–0.7715), RMSE (0.7734–1.1428), IOS (0.0698–0.1055), R (0.9141–0.9403), R2 (0.8356–0.8841), and IOA (0.9384–0.9647); and for CBR: MAE (1.0209–3.0934), RMSE (1.4597–3.5245), IOS (0.0552–0.1392), R (0.9802–0.9980), R2 (0.9721–0.9961), and IOA (0.9901–0.9989).

Based on these findings, the best model for predicting the compaction characteristics (MDD and OMC) was NAS-RF, while the best model for predicting the CBR was NAS-ANN. For the compaction parameters, NAS-RF yielded the following training/validation values for MDD: MAE (0.2415/0.2312), RMSE (0.3454/0.3573), IOS (0.0175/0.0182), R (0.9649/0.9713), R2 (0.9311/0.9434), and IOA (0.9760/0.9842); and for OMC: MAE (0.3698/0.4450), RMSE (0.5597/0.7734), IOS (0.0515/0.0698), R (0.9714/0.9403), R2 (0.9437/0.8841), and IOA (0.9830/0.9647). For the CBR, NAS-ANN yielded: MAE (1.0491/1.0209), RMSE (1.4961/1.4597), IOS (0.0681/0.0552), R (0.9951/0.9980), R2 (0.9902/0.9961), and IOA (0.9975/0.9989) during the training/validation phases. It can be concluded that NAS-RF is the most appropriate model for predicting the compaction characteristics and NAS-ANN the most suitable for predicting the CBR; both exhibit high performance in the training and validation phases.
Conversely, the ANN model demonstrated poor performance in predicting compaction parameters and CBR during the training phase. The hierarchy of performance in the training phase is as follows:

  • For compaction parameters: NAS-RF, NAS-ANN, RF, ANN

  • For CBR prediction: NAS-ANN, NAS-RF, RF, ANN

Finally, the scatter plots comparing the target and output compaction parameters and CBR for each model are presented in Appendix 1 (Figures 6 through 16).

Evaluating the best fitted model using K‑fold cross‑validation approach

The fivefold cross-validation approach was employed to effectively evaluate the predictive capacity of the optimized model. Notably, previous studies focusing on forecasting compaction characteristics and CBR often assessed their best models using only a single data split. In contrast, this study utilized five data splits to validate and select the best model, addressing a key limitation of single-split evaluations. This approach provides a more robust assessment of model performance, allowing for a better understanding of its effectiveness and accuracy.

The use of fivefold cross-validation, which is a significant contribution of this study, offers a more comprehensive evaluation compared to single-split methods, which restrict the verification of model capabilities. By employing this method, the study effectively addresses issues related to overfitting and under-fitting, which could not be thoroughly examined in previous research.
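
The fivefold procedure itself reduces to a short loop. The following scikit-learn sketch shows the generic scheme for any fitted estimator; the shuffling and random seed are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold

def five_fold_r(model, X, y, k=5, seed=42):
    """Fit the model on K-1 folds and record the Pearson R on the held-out fold,
    repeating until every fold has served once as validation data.
    X and y are expected to be NumPy arrays."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model.fit(X[train_idx], y[train_idx])
        y_hat = model.predict(X[val_idx])
        scores.append(np.corrcoef(y[val_idx], y_hat)[0, 1])
    return scores  # one correlation coefficient per split
```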

Figure 4 illustrates the performance measurements of the optimal ML models based on the fivefold cross-validation approach, showing validation data for each split. The findings clearly demonstrate the effectiveness of the optimal ML models. Moreover, the coefficient of correlation for these models underscores their performance and reliability.

Fig. 4

Performance measures of the NAS-RF, NAS-ANN models using the K-fold cross-validation with K = 5

The predictive ability of the optimal model was demonstrated through data validation in five splits, with R values ranging from 0.9902 to 0.9990 for CBR, 0.9350 to 0.9565 for MDD, and 0.9250 to 0.9460 for OMC. The NAS-RF model was used for compaction parameters, while the NAS-ANN model was employed for the CBR. These models were effective in learning the existing data and generating new validation data, addressing issues of overfitting and under-fitting.

Comparison between the proposed models and empirical formulae

To assess the capability and effectiveness of the proposed ML model, a comparative investigation was conducted using eight empirical models from the current state of research on forecasting compaction characteristics, as illustrated in Table 6. This comparison was based on the correlation coefficient of the proposed model. It is important to note that the correlation coefficient is a crucial indicator of forecasting precision, with an optimal model yielding a prediction value close to 1.

Table 6 A Comparison between our NAS-RF, NAS-ANN models and some of the proposed empirical models in the literature

The findings revealed that the proposed models, NAS-RF and NAS-ANN, outperformed the others, demonstrating the highest accuracy with R values of 0.9649 (MDD) and 0.9714 (OMC) for NAS-RF and 0.9980 (CBR) for NAS-ANN. The high accuracy and effectiveness of these models are attributed to the integration of neural architecture search (NAS) with RF and ANN, which enhances prediction accuracy. The NAS-ANN model showed a significant improvement in accuracy and ranked first in performance, highlighting the advantage of combining NAS with ANN and RF for more precise predictions.

Graphical user interface (GUI) design “ComParaCBR2024”

A common practice in much of the published research involves using ML modeling to present models in the form of mathematical equations. However, this approach often proves difficult to implement in future studies and offers limited practical value for researchers and civil engineers in the field. To enhance usability, the proposed ML architecture should be presented in a more accessible format, such as a programmed interface using MATLAB or a simple script in a programming language like Python.

In this paper, the graphical interface was developed based on the optimal values obtained from the best model, as illustrated in Fig. 5. The proposed model was employed to create a public graphical user interface (GUI) named “ComParaCBR2024,” which was developed using MATLAB. The name “ComParaCBR2024” was chosen to reflect its components: “Com” for compaction, “Para” for parameters, “CBR” for California Bearing Ratio, and “2024” for the year the work was conducted.

Fig. 5

ComParaCBR2024 interface

The “ComParaCBR2024” interface includes relevant input fields for compaction parameters and CBR predictions. Users are required to enter soil size parameters (D10, D30, D50, D60), coefficient of curvature (Cc), and coefficient of uniformity (Cu). By clicking on “Run,” the interface calculates and displays the predicted values for OMC, MDD, and CBR.
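
Conceptually, the interface is a thin wrapper around the trained models. The sketch below is purely hypothetical (the actual GUI is programmed in MATLAB); the function name, input order, and model objects are our assumptions and must match how the models were trained:

```python
def com_para_cbr(d10, d30, d50, d60, cu, cc, mdd_model, omc_model, cbr_model):
    """Hypothetical 'Run' action: six grain-size descriptors in, predictions from
    the three fitted models (e.g. NAS-RF for MDD/OMC, NAS-ANN for CBR) out."""
    features = [[d10, d30, d50, d60, cu, cc]]  # order must match the training data
    return {'MDD': float(mdd_model.predict(features)[0]),
            'OMC': float(omc_model.predict(features)[0]),
            'CBR': float(cbr_model.predict(features)[0])}
```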

This interface is highly beneficial for civil engineers and researchers, as it simplifies the prediction of compaction parameters and CBR, which are typically challenging to determine. The “ComParaCBR2024” tool provides an accessible and practical solution for modeling and prediction in this field.

Discussion

In the present paper, a significant contribution is made to geotechnical engineering, particularly in the field of transportation geotechnics, by improving models for compaction parameters and the CBR. It is crucial to note that the quality of the proposed model is influenced by the employed method. Other advanced machine learning methods, which have shown effective results in various fields, were not previously applied to this area.

This study utilized four machine learning methods, namely artificial neural network (ANN), random forest (RF), neural architecture search-artificial neural network (NAS-ANN), and neural architecture search-random forest (NAS-RF), to forecast the compaction parameters and CBR. A review of the literature reveals that the application of these methods in this area is relatively rare. The research began with the collection of a comprehensive database of 90 soil samples with grain size distribution tests. Six factors were selected based on literature recommendations: the 10% passing diameter (D10), 30% passing diameter (D30), 50% passing diameter (D50), 60% passing diameter (D60), coefficient of uniformity (Cu), and coefficient of curvature (Cc).

For the first time, these four machine learning methods were applied to model the chosen input set. The findings indicate that the NAS-RF model is optimal for predicting compaction parameters in terms of accuracy and effectiveness, while NAS-ANN is optimal for predicting the CBR. These models showed the minimum error values (MAE, RMSE, and IOS) and the highest values for R2, R, and IOA compared to other models.

Additionally, the novel models were evaluated using the K-fold cross-validation method and compared with other proposed models in the literature based on their correlation coefficients. This new approach demonstrated superior precision by avoiding overfitting and under-fitting issues compared to other empirical methods.

The optimal model was then used to develop a user-friendly graphical interface named “ComParaCBR2024,” programmed in MATLAB. This interface provides a reliable and easy-to-use tool for researchers and civil engineers, regardless of their experience level. It offers several benefits, including reliability, ease of use, and cost savings, by predicting compaction parameters and CBR from easily obtainable parameters without the need for expensive laboratory tests.

The findings indicate that the performance of the models for compaction parameters and the California Bearing Ratio has been significantly enhanced using these novel machine learning methods.

The NAS-ANN model showed an improvement of 2.59% compared with the model proposed by Taha et al. (2019).

These results are logical given the high accuracy obtained, which indicates a reduction in both bias and variance. The selected hybrid models, NAS-RF and NAS-ANN, were used effectively for prediction, helping to overcome overfitting and under-fitting, minimizing the errors, and bringing the correlation coefficients close to 1.

Such heuristic algorithms have shown high performance when combined with machine learning methods (Kaveh & Dadras, 2017; Kaveh, 2017a, 2017b).

Conclusion

This study investigated novel, sophisticated models for predicting the compaction characteristics and CBR values using a machine learning approach. Four methods (ANN, RF, NAS-ANN, and NAS-RF) were employed to ensure a reliable study. The main influential grain size distribution factors (D10, D30, D50, D60, Cc, and Cu) were introduced as the key inputs for accurately predicting these values. The dataset comprised 90 granular soil samples with experimentally obtained values of these parameters, and the efficacy of all the ML methods was evaluated using various statistical indicators.

Based on the findings, the following conclusions can be drawn:

  • This study is the first to predict the three parameters automatically and simultaneously using sophisticated algorithms that combine different methods in a single model, allowing the optimal hidden-layer architecture to be discovered and reaching an R value above 0.99 for the CBR.

  • The designed interface, called “ComParaCBR2024,” will be easy and useful for engineers and practitioners in the field of civil and geotechnical engineering.

  • The optimal model for predicting compaction parameters (OMC and MDD) was identified as NAS-RF, with correlation values of RMDD = 0.9649 and ROMC = 0.9714. For CBR prediction, the optimal model was NAS-ANN, achieving a correlation value of RCBR = 0.9951. These models demonstrated high accuracy compared to existing studies.

  • Forecasting compaction characteristics and CBR values in this manner saves time and financial resources.

  • Various validation methods were included to ensure a reliable study.

  • This study highlights the use of metaheuristic algorithms combined with machine learning methods, which have demonstrated an enhancement in the performance of the predictive models for the compaction parameters and the California bearing ratio (CBR).