Abstract
Ensuring accurate classification of rice as either cooked or dry is vital for food safety, because improperly stored or cooked rice can harbor harmful bacteria; this makes it important to maintain and monitor food safety standards. In image analysis and food classification, distinguishing dry from cooked rice samples in photographs is an interesting but difficult task. The main challenge stems from the subtle visual differences between cooked and dry rice, which do not always display distinct traits that machines can easily observe. Various machine learning algorithms have therefore been applied to this problem. However, existing works have not analysed the physicochemical characteristics of rice through non-destructive, image-processing-based experimentation. To address this gap, this work develops a Classification and Regression Tree (CART) Decision Tree (DT) learning method that classifies rice grain samples as dry or cooked from physicochemical characteristics, namely morphological, texture and color features, thereby providing comprehensive information on rice quality in different states. The images are first captured and pre-processed, and the features are extracted. The DT then classifies the samples as dry or cooked from the extracted features, and the results are compared with existing algorithms, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM). The comparative analysis shows that the DT learning model outperforms these classifiers with morphological, texture and color features in terms of accuracy, error, precision, recall and F-score.
Introduction
Rice, a staple cereal grain that feeds billions of people worldwide, is an essential part of the world's diet [1, 2]. With a wide range of nutritional advantages, this adaptable grain is a mainstay in many cultures and cuisines. The three primary parts of rice are the endosperm, germ, and bran. The outer layer, or bran, is rich in fibre, antioxidants, and vital nutrients; the germ is rich in vitamins, minerals, and healthy fats; and the starchy inner endosperm supplies energy in the form of carbohydrates. In food analysis, the nutritional makeup of rice, including its protein, carbohydrates, fats, vitamins, and minerals, is evaluated [3]. The difference between dry and cooked rice is significant in both the food industry and scientific study. In food safety, accurately identifying whether rice is dry or cooked is essential to avoid circulating contaminated or spoiled products, thereby protecting public health. In scientific research, distinguishing dry from cooked rice allows an in-depth analysis of changes in the rice's nutritional value, texture, and taste. This knowledge is instrumental in improving rice processing methods and understanding consumer preferences.
In food science and agricultural technology, distinguishing between dry and cooked rice is vital for accurately assessing nutritional content, ensuring compliance with regulatory standards, and guaranteeing consumer satisfaction. Analysis of the physicochemical characteristics of rice offers insight into the composition, structure, and behavior of food products. These analyses include texture, color, and morphological features. Texture, the physical properties of food perceived by the senses, plays a crucial role. Color significantly influences consumer perception, acceptability, and overall quality. Morphological features describe the physical structure, shape, size, and surface characteristics of food components such as grains, particles, fibers, and cells.
Identifying dry and cooked rice is vital for precisely evaluating its nutritional profile. Cooking substantially alters nutrient availability and composition, which directly affects dietary guidelines and nutrition labelling. The distinction is therefore critical for food integrity, adherence to safety standards, consumer satisfaction, and regulatory compliance in agricultural and food science.
Classifying dry and cooked rice is challenging because of their similar appearance, variable texture, differences in moisture content, processing variability, sensor limitations, and data variability. Distinguishing the two states is particularly difficult in large-scale processing facilities, where manual inspection is impractical. Rice variety, cooking method, and storage conditions all add to the variation in texture and appearance. Addressing these challenges is crucial for accurate rice grain quality analysis and for improving classification accuracy in food science applications.
Image processing is a subfield of signal processing concerned with enhancing, interpreting, or extracting information from visual input through the editing and analysis of digital images [4, 5]. Its techniques include segmentation, which divides an image into meaningful parts for study, and filtering, which modifies pixel values to accentuate specific qualities or remove undesired aspects [6]. Image processing is essential for rice classification, especially in the agricultural sector, where precise classification and evaluation of rice grains supports appropriate quality control measures and compliance with food safety regulations [7, 8]. One interesting application in food science and quality evaluation is the use of image analysis to classify samples of cooked and dry rice. By capturing the visual traits characteristic of each condition, image processing techniques allow cooked and dry rice to be distinguished. In the analysis of dry rice, features such as size, shape, color, and texture are extracted from images to establish a reference dataset for dry grains. Cooked rice, in contrast, exhibits different characteristics, such as larger size, altered texture, and altered color intensity; these are identified and compared with the reference dataset to categorise the samples correctly. To ensure uniform lighting and placement, images of cooked and dry rice samples are captured in controlled settings. Classification models can then be built by using image processing methods to extract relevant characteristics and patterns specific to each condition [9, 10].
When new samples are presented, these models, which are frequently built using machine learning or deep learning techniques, can classify rice accurately by learning to distinguish the visual cues unique to cooked and dry rice.
In the past, classification required manual inspection and evaluation of dry and cooked rice samples based on observable qualities such as texture, appearance, and physical features. Experts or trained inspectors examined the grains visually and relied on their expertise to distinguish between the two states, evaluating characteristics such as color, texture, size, and translucency to decide whether the rice was cooked or dry [11]. Another conventional approach applied simple threshold-based algorithms to basic image characteristics such as color intensity or pixel values. However, because these approaches cannot capture the subtle and intricate changes that occur as rice goes from a dry to a cooked state, they frequently lacked the sophistication and precision needed for reliable classification.
Traditional machine learning models, such as Support Vector Machines (SVM) [12], Random Forests, and k-Nearest Neighbours (k-NN) [13], classify data using features derived from images. These models require manual feature engineering, that is, identifying and extracting relevant features in advance, such as color histograms, texture descriptors, or shape properties. Although they work well when features are clearly defined, they may not adapt to more intricate or subtle variations in the appearance of rice between the dry and cooked states. Their performance depends heavily on the quality and relevance of the handcrafted features, which may limit their accuracy in more demanding classifications. Conversely, deep learning models, in particular Convolutional Neural Networks (CNNs) [14, 15], have shown great potential in image-based classification tasks without explicit feature engineering. CNNs can automatically learn hierarchical feature representations from raw pixel data, potentially capturing complex visual patterns essential for distinguishing cooked from dry rice samples. However, deep learning models typically need large volumes of labelled data and substantial computing power for training, and obtaining large, varied datasets for rice classification can be difficult. Overfitting, where a model performs well on training data but poorly on unseen samples, also remains a problem for deep learning techniques, particularly with small training sets. Hence, an interpretable technique with modest computational requirements and controlled overfitting is needed for classifying dry and cooked rice samples. The main contributions of this work are as follows:
- To classify dry and cooked rice kernels using a CART-based DT learning model built on the morphological, texture and color properties of the rice grain.
- To analyse the classification results by grade and evaluate performance using average accuracy metrics and ROC curves.
In this paper, Sect. "Literature survey" presents the literature survey, Sect. "Material and methods" explains the CART-based DT learning model, Sect. "Result and discussion" discusses the results, and Sect. "Conclusion" concludes the paper.
Literature survey
Izquierdo et al. [16] designed and validated a deep learning-based system to classify five distinct varieties of rice (Oryza sativa L.) photographed with a standard camera. The photographs were processed by convolutional neural networks (CNNs) trained and optimized to detect the rice varieties, and the model was validated on images held out from the training set. However, this work falls short in accuracy because of the small number of samples available for training the CNN.
Chun et al. [17] trained a classification model for Korean food images using images gathered from AI-Hub, a public food image collection. The proposed model was evaluated on a dataset of Korean food photos and successfully classified them using transfer learning. Further research is needed to improve the model's performance, particularly for certain food image categories where it performs poorly.
Shen et al. [18] examined fissure formation in germinated brown rice (GBR) kernels caused by microwave drying, in relation to cooking characteristics, microstructure, and textural features, with the aim of improving GBR cooking quality. The microwave-induced cracks provide ideal pathways for water to penetrate the GBR kernel. Owing to moderate water absorption and starch gelatinization, a moderate fissure amount (in the range of 3–5) in GBR kernels helped improve cooking quality and rice taste. Further work is needed to evaluate the drying quality of other cereal materials.
Mohan et al. [19] presented an automated system that uses digital image processing techniques to identify and classify rice grains. The collected images were preprocessed, segmented, and used for feature extraction in MATLAB. Rice quality was then evaluated from these features using Neural Network (NN) and Support Vector Machine (SVM) classifiers. SVMs work well for classification tasks; however, it is difficult to understand how the model makes decisions, particularly with complex, non-linear kernels.
Maldaner et al. [20] investigated the combined effects of intermittent drying, in terms of the number of cycles and the duration of the grains' exposure to elevated temperatures within the dryer, on the physical, physicochemical, and morphological attributes of rice, particularly brown rice. To reduce the impact of drying on grain quality, it is critical to regulate the air-drying temperature and the interval time. The categorization of the treatments and the correlations among qualitative variables suggest that the length of the drying period was the most important factor influencing rice quality. More studies are needed on the physicochemical quality and morphological structure of rice and brown rice.
Liu et al. [21] presented a framework that integrates experimentation, modeling, and optimization methods to understand the effects of water and fertilizer on rice productivity and grain quality. A comprehensive rice grain quality model was built using Principal Component Analysis to identify the key features. On this basis, a fuzzy goal programming method was used to solve a multi-objective quadratic model for fertilizer and water use in eight scenarios. Its main drawback was that the results matched those of previous studies.
Kobayashi et al. [22] examined the progression of endoreduplication in the endosperm of ten rice cultivars using flow cytometry and fluorescence microscopy to clarify the characteristics and variation of this process. Flow cytometric analysis found significant differences among the ten cultivars in three parameters: the mean ploidy of all nuclei, the proportion of nuclei ≥ 6C, and the mean ploidy of nuclei ≥ 6C. Further work is needed on modifying endoreduplication in endosperm cells to improve rice quality.
Cinar et al. [23] classified five distinct rice varieties belonging to the same brand based on morphological, shape, and color characteristics. MATLAB was used to preprocess the images and prepare them for feature extraction. Future work is planned to use machine learning techniques to build an automated image acquisition system that identifies different rice varieties, or to carry out calibration procedures or separate undesired elements from the varieties.
Shi et al. [24] introduced a feature reduction approach for determining the origin of rice, which reduces the dimensionality of electronic nose (e-nose) sensor data before combining it with multiple classifiers. The study demonstrates that KECA-GDA-RF is a useful technique for rice origin tracing and offers a helpful processing method for improving an e-nose's measurement accuracy. However, rice grains from different sites can differ only slightly, and such small changes are difficult for the method to detect.
Sampaio et al. [25] noted that the biggest challenge faced by rice growers is the development of fast, non-destructive techniques for classifying rice types. After several spectral processing steps, NIR spectroscopy combined with PLS-DA and SVM effectively discriminated rice samples, proving a suitable strategy for automated classification, sorting, and food control that enhances product security. Future research is needed to build a rice database and develop in-situ methods for real-time classification of rice origin and type.
In summary, [16] falls short in accuracy because of the small number of samples for training the CNN. For [17], further research is needed to improve the model's performance on specific food image categories. [18] needs further work to evaluate the drying quality of other cereal materials. [19] uses SVM, whose decisions are difficult to interpret, particularly with complex, non-linear kernels. [20] calls for more studies on the physicochemical quality and morphological structure of rice and brown rice. The results of [21] matched those of previous studies. [22] requires modification of endoreduplication in endosperm cells to improve rice quality. For [23], future work is planned to use machine learning techniques to build an automated image acquisition system for identifying rice varieties. For [24], small differences between rice grains from different sites are difficult for the method to detect. For [25], future research is needed to build a rice database and develop in-situ methods for real-time classification of rice origin and type.
Material and methods
This study used the Classification and Regression Tree (CART) model of Decision Tree (DT) learning to assess the physicochemical characteristics of rice samples in both dry and cooked states. A dataset of rice samples was collected from diverse rice varieties, grades, and quality levels. Physicochemical characteristics such as texture, color, and morphological properties were assessed using image processing techniques. The CART model was developed for the classification task, partitioning the feature space on the input features. The DT model was trained on the preprocessed dataset, and its performance was evaluated using metrics such as accuracy, precision, recall, and F1-score. The results provide insight into improving rice classification and quality assessment methods.
Experimental setup
At the start of the experiment, images of the dry rice kernels were taken with a high-quality digital dual-camera setup (13 megapixels + 2 megapixels) at a resolution of 720 × 1520 pixels. All images were taken with the camera at a constant distance of 14 cm from the samples under appropriate lighting. Next, the rice samples were placed in a container and soaked in water for 30 min at room temperature (22–25 °C). The samples were then heated in excess water until the kernels elongated to the maximum extent possible, which depends on the sample's gelatinization properties. Parboiled and non-parboiled samples were heated for 50 min and 20 min, respectively. The kernels were then removed from the hot water and left for 30 min for retrogradation. Images of the cooked grains were taken for every rice variety. To assess rice quality, 700 samples of each rice variety were imaged in both the cooked and dry states.
CART-based DT learning model
Analysing the quality of rice grains, whether cooked or dry, is crucial in food science. Rice grains have a diverse composition that includes carbohydrates, proteins, lipids, vitamins, minerals, and various bioactive compounds. With thousands of rice varieties cultivated worldwide, each with its own genetic, morphological, and chemical characteristics, assessing rice quality is complicated. Examining rice quality requires exploring physicochemical characteristics such as texture, color, and moisture content, which serve as fundamental indicators for categorizing rice samples as cooked or dry. The challenge lies in accurately capturing and interpreting these physicochemical attributes through non-destructive testing methods such as image processing. Although such methods preserve the integrity of the rice grains, leaving them unaltered and undamaged, precise and reliable results demand sophisticated analytical techniques and rigorous validation. Previous research also frequently concentrated on a single feature or used simpler categorization techniques, and therefore lacked a thorough examination of multiple characteristics in both the dry and cooked states. To overcome these issues, this work presents a Classification and Regression Tree (CART) model of Decision Tree (DT) learning to analyse and classify dry and cooked rice samples based on physicochemical characteristics, namely texture, color and morphological properties. Classifying both dry and cooked conditions with the DT learning model, through the assessment of these physicochemical factors, provides a thorough picture of rice quality in different states.
Figure 1 depicts the overall structure of the proposed work for classifying dry and cooked rice samples using the DT learning model. The input images of dry and cooked rice samples are first captured and partitioned into a training set (80%) and a testing set (20%). The images then undergo pre-processing, followed by feature extraction, in which morphological, texture and color features are extracted. Finally, the extracted features are classified by the CART DT learning model, which labels each image as dry or cooked rice along with its grade.
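The 80/20 partition described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: the feature matrix is random placeholder data sized to 700 dry plus 700 cooked samples with 19 features, and the label encoding (0 = dry, 1 = cooked) and `random_state` are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: one row per grain, 19 feature columns
# (6 morphological + 5 texture + 8 color). Random values stand in for real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1400, 19))
y = np.array([0] * 700 + [1] * 700)  # assumed encoding: 0 = dry, 1 = cooked

# 80% training / 20% testing; stratify keeps the dry:cooked ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (1120, 19) (280, 19)
```

Stratifying on the label is a design choice that keeps both classes equally represented in the 20% test split, so per-class metrics are not skewed by an unlucky partition.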
Image acquisition
Eleven Indian rice varieties were used to train the proposed model: Basmati, IG-Basmati, Baskathi, IR-36, Ratna, Chamonmoni, Swarna, and Maouri in the parboiled category, and Gobindavog, Atap, and New Atap in the non-parboiled category. Based on customer preference and market value, Basmati, IG-Basmati, Baskathi and Gobindavog are Grade I; IR-36, Ratna, Chamonmoni and New Atap are Grade II; and Swarna, Maouri and Atap are Grade III, as shown in Table 1. Ten image frames were collected for each variety, with about 70 grains fitting within each frame, so that 70 × 10 = 700 samples of each variety were captured to assess rice quality. In the current study, 700 mutually independent dry samples and 700 cooked samples of each variety were taken.
Table 1 serves as a comprehensive inventory of rice categories, grades, and associated features, providing a basis for further analysis, classification, or comparison of rice types based on their market prices and inherent characteristics. Such categorization and feature analysis are crucial in the rice industry for quality assessment, market segmentation, and product differentiation. Sample images of rice grains were taken and saved for training and testing. The total number of samples across all categories is 77,000, each characterized by 19 features covering morphological, texture, and color attributes. The captured images are then passed to pre-processing, which is described in the next section.
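As a sketch, the category and grade assignments listed above for Table 1 can be held in a simple lookup structure for later grade-wise analysis. The names `VARIETIES`, `varieties_in_grade` and the constants below are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical lookup built from the variety lists above: name -> (category, grade).
VARIETIES = {
    "Basmati": ("parboiled", "I"),
    "IG-Basmati": ("parboiled", "I"),
    "Baskathi": ("parboiled", "I"),
    "Gobindavog": ("non-parboiled", "I"),
    "IR-36": ("parboiled", "II"),
    "Ratna": ("parboiled", "II"),
    "Chamonmoni": ("parboiled", "II"),
    "New Atap": ("non-parboiled", "II"),
    "Swarna": ("parboiled", "III"),
    "Maouri": ("parboiled", "III"),
    "Atap": ("non-parboiled", "III"),
}

FRAMES_PER_VARIETY = 10  # image frames per variety, as stated in the text
GRAINS_PER_FRAME = 70    # approximate grains per frame, as stated in the text
grains_per_state = FRAMES_PER_VARIETY * GRAINS_PER_FRAME  # 700 dry (and 700 cooked)


def varieties_in_grade(grade):
    """Return the variety names assigned to a market grade ("I", "II" or "III")."""
    return sorted(name for name, (_, g) in VARIETIES.items() if g == grade)


print(Counter(grade for _, grade in VARIETIES.values()))  # Counter({'I': 4, 'II': 4, 'III': 3})
print(varieties_in_grade("III"))  # ['Atap', 'Maouri', 'Swarna']
```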
Parboiled rice class
Figure 2 shows the rice samples used in the experiment, from which morphological, texture and color features were extracted for the parboiled and non-parboiled rice classes across the different grades.
Image pre-processing
The first step in classifying dry and cooked rice samples is image pre-processing, which involves resizing, adjusting the contrast, converting the image to grayscale, and then converting it to a binary (black-and-white) image. These steps reduce undesirable distortions, noise, and blur, producing a higher-quality image. Pre-processing also reduces the complexity of the model and improves classification accuracy before the DT learning model is trained.
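The pre-processing chain (resize, contrast adjustment, grayscale conversion, binarization) can be sketched in NumPy as follows. The specific choices are assumptions, not taken from the paper: nearest-neighbour resizing, min-max contrast stretching (applied after grayscale conversion for simplicity), BT.601 luma weights for grayscale, and Otsu's method for the black-and-white threshold, with grains assumed brighter than the background.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols]


def to_grayscale(rgb):
    """Weighted sum of R, G, B with ITU-R BT.601 luma weights (assumed choice)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def stretch_contrast(gray):
    """Min-max stretch to [0, 1]; a flat image maps to all zeros."""
    lo, hi = gray.min(), gray.max()
    return np.zeros_like(gray) if hi == lo else (gray - lo) / (hi - lo)


def otsu_threshold(gray, bins=256):
    """Threshold in [0, 1] that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # probability mass of the "below threshold" class
    mu = np.cumsum(p * centers)  # cumulative mean up to each bin
    num = (mu[-1] * w0 - mu) ** 2
    denom = w0 * (1.0 - w0)
    between = np.zeros_like(num)
    valid = denom > 1e-12        # skip bins where one class is empty
    between[valid] = num[valid] / denom[valid]
    return centers[np.argmax(between)]


def preprocess(rgb, size=(256, 256)):
    """Resize -> grayscale -> contrast stretch -> binary mask (True = grain)."""
    img = resize_nearest(np.asarray(rgb, dtype=float), *size)
    gray = stretch_contrast(to_grayscale(img))
    return gray > otsu_threshold(gray)
```

Otsu's method picks the threshold from the image's own histogram, so it adapts to small lighting differences between frames without a hand-tuned cut-off.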
Feature extraction
Features are then extracted from the pre-processed image. These features characterize the image and therefore influence storage requirements, classification performance, and processing time. Feature extraction provides valuable information for understanding the quality attributes of rice grains and for optimizing processing, storage, and consumption practices in food science. The features of dry and cooked rice samples are determined from the morphology, texture and color of each sample. The morphological features include the area (number of pixels within the rice grain region), the perimeter (length of the grain boundary), and the irregularity index (a measure of how irregular the grain boundary is). Extracting and analysing these morphological features provides valuable insight into the quality and state of rice grains. Image texture is described using several descriptors, such as the Gray-Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), and Gabor filters; the GLCM features are explained as follows.
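A minimal sketch of the three named morphological features computed from a binary grain mask. The perimeter is approximated by counting boundary pixels, and because the text does not define the irregularity index, the form used here, \(P^2/(4\pi A)\) (equal to 1 for an ideal circle and larger for irregular outlines), is a hypothetical choice.

```python
import numpy as np


def morphological_features(mask):
    """Area, boundary-pixel perimeter and a hypothetical irregularity index."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())  # number of pixels inside the grain
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if it and all four of its 4-neighbours are foreground.
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())  # foreground pixels touching background
    # Hypothetical irregularity index P^2 / (4 * pi * A).
    irregularity = perimeter ** 2 / (4 * np.pi * area) if area else 0.0
    return {"area": area, "perimeter": perimeter, "irregularity": irregularity}


square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True
print(morphological_features(square)["perimeter"])  # 36
```

A real frame contains many grains, so in practice each connected component would be labelled (for example with `scipy.ndimage.label`) and measured separately.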
GLCM features
GLCM measures the relationship between pairs of neighbouring pixels (second-order texture) in a grayscale image in terms of their spatial distance and gray-level intensity. The GLCM-based features are given in Eqs. (1–6).
where \(a\) and \(b\) are the row and column indices of the GLCM, \({\mu }_{a}\) and \({\mu }_{b}\) are the mean values of \(a\) and \(b\), and \({\sigma }_{a}\) and \({\sigma }_{b}\) are their standard deviations. The color features comprise color moments and the color histogram.
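For the GLCM texture features described above, the sketch below builds a symmetric, normalized co-occurrence matrix for a single pixel offset and computes common Haralick-style statistics using the symbols \(a\), \(b\), \({\mu }_{a}\), \({\mu }_{b}\), \({\sigma }_{a}\) and \({\sigma }_{b}\). Eqs. (1–6) may use a different selection; the features chosen here (contrast, correlation, energy, homogeneity, entropy), the 8-level quantization and the horizontal offset are all assumptions.

```python
import numpy as np


def glcm(gray, levels=8, dx=1, dy=0):
    """Symmetric, normalized GLCM of a [0, 1] image for a non-negative offset (dy, dx)."""
    q = np.clip((np.asarray(gray, dtype=float) * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    src = q[:h - dy, :w - dx]  # reference pixels
    dst = q[dy:, dx:]          # neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (src.ravel(), dst.ravel()), 1)
    P = P + P.T                # count each pair in both directions
    return P / P.sum()


def glcm_features(P):
    """Haralick-style statistics; a, b index rows and columns of P."""
    a, b = np.indices(P.shape)
    mu_a, mu_b = (a * P).sum(), (b * P).sum()
    sd_a = np.sqrt((((a - mu_a) ** 2) * P).sum())
    sd_b = np.sqrt((((b - mu_b) ** 2) * P).sum())
    cov = ((a - mu_a) * (b - mu_b) * P).sum()
    nz = P[P > 0]
    return {
        "contrast": (((a - b) ** 2) * P).sum(),
        "correlation": cov / (sd_a * sd_b) if sd_a * sd_b > 0 else 0.0,
        "energy": (P ** 2).sum(),  # angular second moment; some tools report its square root
        "homogeneity": (P / (1.0 + np.abs(a - b))).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
    }


stripes = np.tile([0.0, 0.99], (4, 4))  # alternating dark/bright columns
print(glcm_features(glcm(stripes))["contrast"])  # 49.0
```

scikit-image offers `graycomatrix` and `graycoprops` for the same purpose with multiple distances and angles, which is usually preferable for the full dataset.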
Color moments
Color moments are statistical measures that distinguish images by their color distribution; they are given in Eqs. (7–9).
where \(W\) is the number of pixels in the image, \({P}_{ab}\) is the value of the \({b}^{th}\) image pixel in the \({a}^{th}\) color channel, and \({H}_{a}\) is the mean value of the \({a}^{th}\) channel.
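Assuming Eqs. (7–9) are the usual first three color moments per channel, namely the mean \({H}_{a}=\frac{1}{W}\sum_{b=1}^{W}{P}_{ab}\), the standard deviation, and the skewness (cube root of the third central moment), they can be computed as below. The function name is hypothetical, and the paper's exact normalization may differ.

```python
import numpy as np


def color_moments(image):
    """Mean, standard deviation and skewness of each color channel."""
    arr = np.asarray(image, dtype=float)
    pixels = arr.reshape(-1, arr.shape[-1])       # W rows, one column per channel a
    W = pixels.shape[0]
    H = pixels.sum(axis=0) / W                    # first moment H_a
    dev = pixels - H
    sigma = np.sqrt((dev ** 2).sum(axis=0) / W)   # second moment
    skew = np.cbrt((dev ** 3).sum(axis=0) / W)    # third moment; cube root keeps the sign
    return np.concatenate([H, sigma, skew])


two_pixels = np.array([[[0, 0, 0]], [[2, 4, 6]]])
print(color_moments(two_pixels))  # means, then standard deviations, then skewness per channel
```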
Color histogram
A color histogram depicts the color distribution of an image. It counts the number of pixels whose color values fall within each of a set of predefined bins, revealing which colors predominate in the image.
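A per-channel color histogram can be sketched as follows, assuming 8-bit RGB input and 8 equal-width bins per channel (both assumptions). Counts are normalized so that images of different sizes are comparable.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Concatenated per-channel histograms, each normalized to sum to 1."""
    arr = np.asarray(image)
    features = []
    for c in range(arr.shape[-1]):
        # Equal-width bins over the 8-bit range [0, 256).
        counts, _ = np.histogram(arr[..., c], bins=bins, range=(0, 256))
        features.append(counts / counts.sum())
    return np.concatenate(features)


img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255
img[..., 2] = 128
print(color_histogram(img).reshape(3, 8))  # one row of bin fractions per channel
```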
Once the morphological, texture and color features have been extracted, they are passed to the classification stage.
Classification using CART of DT learning model
After relevant features have been extracted from the rice grain samples, the next step is classification using the CART algorithm within the Decision Tree (DT) learning framework. CART uses binary recursive partitioning and involves three stages: (i) constructing a classification tree by recursively splitting nodes, dividing the dataset on the extracted features relevant to rice grain quality; (ii) pruning the tree, removing superfluous branches and nodes to improve interpretability and produce a sequence of smaller, more manageable candidate trees; and (iii) selecting from this sequence the classification tree that most accurately distinguishes dry from cooked rice samples.
At the root node, CART evaluates the features selected for analysis (morphology, texture and color) and chooses the one that best separates dry from cooked rice samples. In the splitting stage, the first step in developing the classification tree, CART identifies a threshold value for each feature that separates the samples into two groups. The training data are divided according to splitting rules and a goodness-of-split criterion based on the variability of the resulting subsets: each child node should be more homogeneous than its parent. The heterogeneity of a node is measured by its impurity \(m(t)\), given by the Gini index in Eq. (10).
where \(m(t)\) is the Gini index (heterogeneity function) at node \(t\), \(p\left({e}_{0}|t\right)\) is the proportion of class 0 at node \(t\), and \(p\left({e}_{1}|t\right)\) is the proportion of class 1 at node \(t\). The goodness of a split is given by Eq. (11).
where \(\phi\left(l,t\right)\) is the goodness of split \(l\) at node \(t\), \({p}_{LT}\) and \({p}_{RT}\) are the proportions of the node's observations sent to the left and right child nodes, and \(m\left({t}_{LT}\right)\) and \(m\left({t}_{RT}\right)\) are the heterogeneity functions at the left and right child nodes. The split with the highest goodness value is chosen as the best split. Terminal nodes are then identified: a node \(t\) becomes terminal when it contains only a single observation or when further splitting yields no appreciable decrease in heterogeneity. Each terminal node is labelled with the majority class among its observations, as determined by Eq. (12).
where \({e}_{x}\) is the class label assigned to terminal node \(t\), chosen to minimize the expected misclassification error at \(t\); the resulting misclassification rate at the node is \(1-{\text{max}}_{x}\,p\left({e}_{x}|t\right)\).
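For reference, the standard two-class CART forms consistent with the definitions above are sketched below; they correspond to the Gini index (Eq. 10), the goodness of split (Eq. 11) and the majority-class assignment (Eq. 12), although the paper's exact notation may differ. The goodness of split is written here as \(\phi(l,t)\), and the second form of the Gini index follows from \(p(e_0|t)+p(e_1|t)=1\).

```latex
m(t) = 1 - p(e_0 \mid t)^2 - p(e_1 \mid t)^2 = 2\, p(e_0 \mid t)\, p(e_1 \mid t)

\phi(l, t) = m(t) - p_{LT}\, m(t_{LT}) - p_{RT}\, m(t_{RT})

e_x = \arg\max_{x}\; p(e_x \mid t)
```

As a hypothetical worked example, a root node with 700 dry and 700 cooked samples has \(m(t)=2(0.5)(0.5)=0.5\). A split sending 600 dry and 100 cooked samples left and 100 dry and 600 cooked samples right gives \(p_{LT}=p_{RT}=0.5\) and \(m(t_{LT})=m(t_{RT})=2\cdot\frac{6}{7}\cdot\frac{1}{7}=\frac{12}{49}\), so \(\phi(l,t)=0.5-\frac{12}{49}=\frac{25}{98}\approx 0.255\).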
In the branching phase, samples whose value of the selected feature falls below the threshold are assigned to the left child node (e.g., dry rice) and those at or above the threshold to the right child node (e.g., cooked rice). The process repeats recursively: at each subsequent node, CART selects the next best feature to split the samples further until a stopping criterion is met.
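The splitting and branching steps can be illustrated with a small from-scratch sketch (not the authors' implementation). It uses the two-class Gini index, chooses the split with the largest goodness, sends samples below the threshold to the left child, and labels terminal nodes by majority class. Labels are assumed to be 0/1 integers (e.g., 0 = dry, 1 = cooked), and `max_depth` and `min_gain` are hypothetical stopping parameters.

```python
import numpy as np


def gini(y):
    """Two-class Gini index m(t) = 2 p(e0|t) p(e1|t) for 0/1 labels."""
    if len(y) == 0:
        return 0.0
    p1 = y.mean()
    return 2.0 * p1 * (1.0 - p1)


def best_split(X, y):
    """Return (feature, threshold, goodness) maximizing the goodness of split."""
    parent = gini(y)
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        # Every distinct value except the smallest is a candidate threshold,
        # so both children are non-empty (left child: x < threshold).
        for thr in np.unique(X[:, j])[1:]:
            left = X[:, j] < thr
            p_lt = left.mean()
            goodness = parent - p_lt * gini(y[left]) - (1 - p_lt) * gini(y[~left])
            if goodness > best[2]:
                best = (j, thr, goodness)
    return best


def grow(X, y, max_depth=5, min_gain=1e-7, depth=0):
    """Recursively grow a binary tree; terminal nodes hold the majority class."""
    feature, thr, gain = best_split(X, y)
    if feature is None or gain < min_gain or depth >= max_depth:
        return {"label": int(np.bincount(y, minlength=2).argmax())}
    left = X[:, feature] < thr
    return {
        "feature": feature,
        "threshold": thr,
        "left": grow(X[left], y[left], max_depth, min_gain, depth + 1),
        "right": grow(X[~left], y[~left], max_depth, min_gain, depth + 1),
    }


def predict(tree, x):
    """Follow the thresholds from the root to a terminal node and return its label."""
    node = tree
    while "label" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["label"]
```

The exhaustive threshold search here is quadratic in the number of samples per feature; library CART implementations sort each feature once and sweep the thresholds, which makes them preferable for the full 19-feature dataset.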
After the classification tree has been grown, it is pruned to prevent overfitting. Pruning requires finding the subtree with the minimum cost-complexity, computed using Eq. (13).
where \({D}_{p}\left(T\right)\) is the cost-complexity measure of tree \(T\) for complexity parameter \(p\), \(D\left(T\right)\) is the misclassification rate (resubstitution estimate) of \(T\), \(p\) is the complexity cost of adding a terminal node to \(T\), and \(\widetilde{T}\) is the number of terminal nodes in \(T\). The final phase is test-sample estimation: the sample set \(S\) is divided into a learning set \({S}_{1}\) with \({W}_{1}\) samples and a test set \({S}_{2}\) with \({W}_{2}\) samples. The trees \(T\) are grown using the observations in \({S}_{1}\), and \(D\left(T\right)\) is estimated using the observations in \({S}_{2}\). Figure 3 illustrates the block diagram of CART in the DT learning model.
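Cost-complexity pruning with an independent test sample can be sketched with scikit-learn, whose `DecisionTreeClassifier` implements an optimized version of CART. The data are synthetic placeholders and the 70/30 split into \({S}_{1}\) and \({S}_{2}\) is an assumption. scikit-learn's minimal cost-complexity pruning uses the standard form of Eq. (13), \({D}_{p}(T)=D(T)+p\,\widetilde{T}\), but measures the tree cost with the total weighted impurity of the leaves rather than the misclassification rate, so this is a close analogue rather than an exact reproduction; the pruning strength is then selected by the misclassification rate on \({S}_{2}\).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data with 19 features (not the rice dataset).
X, y = make_classification(n_samples=1400, n_features=19, n_informative=6, random_state=0)
# S1 (learning set) and S2 (independent test set); the 70/30 ratio is an assumption.
X_s1, X_s2, y_s1, y_s2 = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Candidate complexity parameters p for the nested sequence of pruned subtrees.
path = DecisionTreeClassifier(criterion="gini", random_state=0).cost_complexity_pruning_path(X_s1, y_s1)
alphas = np.clip(path.ccp_alphas, 0.0, None)  # guard against tiny negative float round-off

best_alpha, best_err = 0.0, np.inf
for alpha in alphas:
    tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=alpha, random_state=0).fit(X_s1, y_s1)
    err = 1.0 - tree.score(X_s2, y_s2)  # misclassification rate estimated on S2
    if err < best_err:
        best_alpha, best_err = alpha, err

pruned = DecisionTreeClassifier(criterion="gini", ccp_alpha=best_alpha, random_state=0).fit(X_s1, y_s1)
print(f"alpha={best_alpha:.5f}  test error={best_err:.3f}  leaves={pruned.get_n_leaves()}")
```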
The resulting decision tree graphically represents the sequence of decisions used to determine whether a rice sample is cooked or dry from its attributes. This decision tree learning process improves the precision with which rice samples are classified as cooked or dry.
In summary, each captured image is pre-processed, GLCM and color moment features are extracted along with morphological features, and the extracted features are then classified using the CART-based DT model.
Overall, applying the CART DT learning model to rice involves selecting relevant features, splitting the dataset on feature thresholds, identifying terminal nodes, growing the tree recursively, pruning it to prevent overfitting, and evaluating its predictive accuracy on new data. This approach supports the analysis of rice grain quality and its classification into dry and cooked categories, aiding applications such as quality control, food safety, and understanding consumer preferences. The performance of the overall process is evaluated in the next section.
Result and discussion
This section analyses the performance of the DT learning model in classifying dry and cooked rice samples. The implementation was simulated in Python, and performance and comparative analyses were carried out to assess the effectiveness of the proposed DT model.
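A minimal Python harness for comparing the three classifiers might look like the following. It uses synthetic placeholder data with 19 features, so the numbers it prints say nothing about the results in Tables 2–4; the hyperparameters (k = 5, RBF kernel, `ccp_alpha=0.001`) and the feature scaling for KNN and SVM are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 19-feature dry/cooked dataset.
X, y = make_classification(n_samples=1400, n_features=19, n_informative=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

models = {
    # Distance- and margin-based models are scaled; trees do not need scaling.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "DT (CART)": DecisionTreeClassifier(criterion="gini", ccp_alpha=0.001, random_state=1),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
        "f1": f1_score(y_te, pred),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```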
Confusion matrix for performance measurement
The confusion matrices obtained for rice classification by KNN, SVM and the proposed Decision Tree (DT) were compared. For 700 dry rice samples and 700 cooked rice samples, confusion matrices were derived separately using 6 morphological, 5 textural and 8 color features, as shown in Tables 2, 3 and 4, respectively.
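A confusion matrix of the kind reported in Tables 2, 3 and 4 can be tallied as follows. This sketch assumes that rows hold the actual grade and columns the predicted grade, and it builds a separate matrix for each phase (dry or cooked). The grade labels and helper names are illustrative assumptions.

```python
GRADES = ("Grade-I", "Grade-II", "Grade-III")


def confusion_matrix(y_true, y_pred, classes=GRADES):
    """Square confusion matrix: rows are actual classes, columns are predicted classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: k for k, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_phase_matrices(records, classes=GRADES):
    """One confusion matrix per phase from (phase, actual, predicted) triples."""
    matrices = {}
    for phase in sorted({p for p, _, _ in records}):
        rows = [(a, pr) for p, a, pr in records if p == phase]
        matrices[phase] = confusion_matrix([a for a, _ in rows], [pr for _, pr in rows], classes)
    return matrices
```

The diagonal of each matrix holds the correctly graded samples. Every off-diagonal cell counts samples of one grade that were assigned to another grade.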
Table 2 shows that the proposed method classifies rice grains into the different grades more accurately than KNN and SVM in both the dry and cooked phases. KNN assigns a sample the majority class of its nearest neighbors, and SVM separates classes with a maximum-margin boundary. DT instead partitions the feature space on the most informative attributes and produces explicit decision rules. The rich feature set, comprising morphological, textural and color features, enhances the model's discriminatory power, and the interpretability of DT helps in validating feature relevance and refining the model. DT is also computationally efficient, which makes it well suited to large datasets and gives fast training and prediction. Overall, its robust feature representation, interpretability and computational efficiency contribute to its superiority over KNN and SVM in rice classification.
Table 3 shows that the proposed method again outperforms KNN and SVM. In Grade-I classification, it achieves consistently higher counts of correct predictions and lower counts of misclassifications than KNN and SVM in both the dry and cooked phases. The decision tree algorithm captures complex relationships between features while keeping its decision-making interpretable, and fine-tuning it for rice classification can further optimize its performance on the intricacies of the dataset. These results underline the value of combining diverse features with a suitable algorithm. The proposed model classifies rice grains accurately across the various grades and phases, outperforming the traditional methods and advancing classification standards in rice analysis.
In Table 4, the proposed algorithm performs competitively with both KNN and SVM. Across all phases and grades, it tends to achieve higher counts of correctly predicted samples, indicating its effectiveness in classifying instances. Several factors explain why the DT model outperformed KNN and SVM in rice classification. First, the comprehensive feature set, combining morphological, textural and color features, enhanced the model's ability to capture the characteristics of the rice samples. Second, the DT decision-making process is better able to distinguish complex patterns, leading to more efficient classification. Third, optimizing the model for rice classification further improved its performance. Finally, although unpruned decision trees are prone to overfitting, pruning limits this risk while the resulting tree remains interpretable, which made DT the more effective method for rice classification.
Comparison of accuracy in gradation between dry and cooked rice samples
This section compares the classification accuracy for dry and cooked samples of all three grades obtained with the proposed DT, KNN and SVM methods.
Figure 4 compares the grading accuracy for dry and cooked rice grains based on morphological features. For grade-I, grade-II and grade-III, both KNN and SVM classify dry rice more accurately than cooked rice. KNN achieves accuracies of 90.45%, 89.35% and 91.78% for dry rice grades I, II and III, respectively, while SVM achieves 91.23%, 91.65% and 93.63% for the same grades. For cooked rice, the accuracies of both methods are slightly lower, ranging from 87.69% to 90.42% for KNN and from 88.32% to 92.31% for SVM. The proposed decision tree (DT) model likewise classifies dry rice more accurately than cooked rice. It achieves 97.55%, 96.68% and 98.12% for dry rice grades I, II and III, respectively, compared with 93.34% to 95.92% for cooked rice across the same grades. Thus, with morphological features, classification accuracy is higher in the dry stage than in the cooked stage for all three methods, and DT is the most accurate method in both stages.
Figure 5 shows that, with texture features, dry rice is classified more accurately than cooked rice. KNN achieves accuracies of 91.53%, 88.95% and 91.82% for grade-I, grade-II and grade-III dry rice, respectively, whereas for cooked rice the accuracies are slightly lower at 89.49%, 87.93% and 91.72%. Similarly, SVM achieves 92.23%, 89.65% and 92.43% for grade-I, grade-II and grade-III dry rice, compared with 89.72%, 88.43% and 91.81% for cooked rice. The proposed decision tree (DT) outperforms both KNN and SVM for dry and cooked rice. It achieves 97.75%, 97.48% and 97.92% for grade-I, grade-II and grade-III dry rice, respectively, compared with 95.32%, 95.67% and 93.69% for cooked rice across the same grades. Hence, with texture features, classification accuracy is higher in the dry stage than in the cooked stage for all applied techniques.
Figure 6 compares the classification accuracy for dry and cooked rice grains based on color features. Both KNN and SVM achieve higher accuracy for dry rice than for cooked rice across grade-I, grade-II and grade-III. For dry rice, KNN achieves 91.87%, 90.56% and 91.58%, respectively, and SVM achieves 91.87%, 94.58% and 92.62%. For cooked rice, the accuracy is lower: KNN achieves 86.34%, 87.74% and 89.98%, and SVM achieves 88.12%, 89.67% and 90.23%. The proposed decision tree (DT) algorithm achieves the highest accuracy for both dry and cooked rice. For dry rice, it achieves 96.96%, 95.45% and 97.36%, respectively, which is higher than KNN and SVM. For cooked rice, it achieves 94.56%, 93.86% and 97.36%, indicating consistent performance across grades. Thus, with color features, classification accuracy is higher in the dry stage than in the cooked stage for all applied techniques. The only exception is DT at grade-III, where both stages reach 97.36%.
Performance measurement in classification, averaged over dry and cooked rice samples
In practice, users judge rice quality in the cooked stage as well as in the dry stage. The classification metrics of all three models were therefore averaged over the dry and cooked stages. Agreement between the predicted and actual classes is reported in terms of accuracy, error, precision, recall and F-score. These metrics were evaluated for the morphological, textural and color attributes for every algorithm, each averaged over the dry and cooked conditions.
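The metrics can be derived from a confusion matrix as in the sketch below. The section does not state how the per-grade values are combined, so macro-averaging over grades is an assumption, as is taking the error as 1 − accuracy. `average_phases` takes the mean of the dry and cooked results, matching the averaging described above. The function names are hypothetical.

```python
def classification_metrics(cm):
    """Accuracy, error and macro-averaged precision, recall and F-score.

    cm[i][j] counts samples of actual class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls, fscores = [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])                           # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        fscores.append(f)
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,  # assumption: error taken as the complement of accuracy
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f_score": sum(fscores) / k,
    }


def average_phases(dry, cooked):
    """Mean of each metric over the dry and cooked phases."""
    return {name: (dry[name] + cooked[name]) / 2 for name in dry}
```

Macro-averaging weights every grade equally. With equal numbers of samples per grade, macro and micro averages of recall coincide, but macro and micro precision can still differ.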
Figure 7 shows the classification accuracy averaged over the dry and cooked conditions for all algorithms. CART-based DT captures the interactions among morphological, texture and color features, which improves its accuracy. Across all feature types, the proposed DT consistently outperforms KNN and SVM. It achieves an average classification accuracy of 96.58% for morphological features, 96.23% for texture features and 95.82% for color features. In contrast, KNN achieves average accuracies of 89.69%, 89.57% and 89.35%, and SVM achieves 92.15%, 91.79% and 90.76%, for morphological, texture and color features, respectively. These results highlight the superior performance of the proposed DT method across all three feature types compared with KNN and SVM.
Figure 8 shows the error rates averaged over the dry and cooked conditions for all algorithms. At each split, CART-based DT selects the feature threshold that best separates the classes, thereby reducing classification errors within each subgroup. For morphological features, the average error rates are 10.25% for KNN and 7.81% for SVM, while the proposed DT achieves the lowest error rate at 3.73%. For texture features, KNN and SVM exhibit average error rates of 10.37% and 8.65%, respectively, whereas the proposed DT achieves 3.89%. For color features, KNN and SVM have error rates of 10.74% and 9.56%, respectively, while the proposed method attains 4.09%. The proposed DT method thus consistently achieves the lowest error across all three feature categories.
Figure 9 shows the precision averaged over the dry and cooked conditions for all algorithms. CART-based DT prioritizes the features that most clearly differentiate cooked from dry rice samples, which enhances precision. For morphological features, KNN and SVM achieve average precisions of 89.79% and 92.05%, respectively, while the proposed DT attains 96.17%. For texture features, KNN and SVM achieve 89.49% and 91.49%, whereas the proposed DT achieves 96.12%. For color features, KNN and SVM yield 89.75% and 90.56%, while the proposed DT reaches 95.87%. The proposed DT method therefore achieves the highest average precision across all three feature categories.
Figure 10 shows the recall averaged over the dry and cooked conditions for all algorithms. CART-based decision tree (DT) models can adjust their decision boundaries to account for imbalances in the numbers of samples in each class. The proposed DT method outperforms KNN and SVM in recall across all three categories of morphological, texture and color features. For morphological features, it achieves a recall of 96.61%, compared with 89.73% for KNN and 92.08% for SVM. For texture features, it achieves 96.09%, compared with 89.59% for KNN and 91.62% for SVM. For color features, it attains 95.89%, compared with 90.87% for KNN and 90.36% for SVM. These results demonstrate the superior recall of the proposed DT method across all feature categories.
Figure 11 shows the F-score averaged over the dry and cooked conditions for all algorithms. Because CART-based decision trees are interpretable, they make explicit the features most relevant to separating dry and cooked rice samples, which contributes to better F-score performance. For morphological features, KNN and SVM achieve average F-scores of 89.81% and 92.11%, respectively, while the proposed decision tree (DT) method achieves the highest F-score of 96.48%. For texture features, KNN and SVM achieve 89.59% and 91.8%, respectively, while the proposed DT achieves 96.04%. For color features, KNN and SVM achieve 90.66% and 90.41%, respectively, whereas the proposed DT achieves 95.84%. Overall, the proposed DT method consistently achieves the highest average F-score across all three categories.
Sensitivity and specificity analysis in classification of dry and cooked rice samples
Sensitivity is the True Positive Rate (TPR), and specificity is the complement of the False Positive Rate (FPR), that is, specificity = 1 − FPR. TPR and FPR are calculated as per Eq. (14) and (15).
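For reference, the standard definitions of these rates are given below, and Eq. (14) and (15) are assumed to follow them. Here TP, FN, FP and TN are counted for one grade against all other grades.

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\text{specificity} = \frac{TN}{TN + FP} = 1 - \mathrm{FPR}
```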
Table 5 shows the sensitivity and specificity analysis for the morphology, texture and color features, averaged over the dry and cooked rice samples. In classification, a high TPR indicates that the model correctly identifies most of the samples that actually belong to a particular grade. A low FPR indicates that few samples from other grades are wrongly assigned to that grade. The proposed DT model achieves the highest TPR and the lowest FPR. Its sensitivity and specificity are therefore better than those of KNN and SVM for all three grades and for the morphology, texture and color features.
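Per-grade TPR and FPR values of the kind summarized in Table 5 can be computed one-vs-rest from a multi-class confusion matrix, as in this sketch. The function names are hypothetical, and the matrix is assumed to have actual grades as rows and predicted grades as columns. `average_rates` takes the mean of the dry-phase and cooked-phase rates grade by grade.

```python
def one_vs_rest_rates(cm):
    """(TPR, FPR) for each class of a square confusion matrix (rows actual, columns predicted)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # class i predicted as another class
        fp = sum(cm[r][i] for r in range(k)) - tp     # other classes predicted as class i
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        rates.append((tpr, fpr))
    return rates


def average_rates(dry_rates, cooked_rates):
    """Grade-wise mean of (TPR, FPR) over the dry and cooked phases."""
    return [((t1 + t2) / 2, (f1 + f2) / 2)
            for (t1, f1), (t2, f2) in zip(dry_rates, cooked_rates)]
```

Specificity for each grade is then 1 minus the returned FPR.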
The three kernel characteristics selected by the proposed method, namely morphology, texture and color, play an important role in classifying rice by quality. ROC analysis is a useful graphical tool for reviewing a model's classification ability. In this study, Receiver Operating Characteristic (ROC) curves were plotted to analyze sensitivity and specificity, as shown in Fig. 12, with each curve plotting TPR against FPR. The overall performance of a classifier is measured by the area under the curve (AUC). A higher AUC (closer to 1) indicates better discrimination and overall accuracy of the model across thresholds. The figure thus shows how accurately the proposed DT model classifies the rice samples compared with the KNN and SVM models.
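The ROC curve and its AUC can be computed as sketched below, treating one grade as the positive class and all other grades as negative. The scores are assumed to be per-sample confidence values for that grade; for a decision tree, this is typically the class proportion in the leaf reached. The helper names are hypothetical. Samples with tied scores are processed together so that they form a single point on the curve.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold over the scores.

    labels are 1 for the positive class and 0 otherwise; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # consume every sample tied at this score before emitting a point
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A pruned decision tree has only a few leaves and therefore produces only a few distinct scores. Its ROC curve consequently has only a few vertices, unlike the smoother curves typical of SVM decision values.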
Overall, when classifying dry and cooked rice samples in all three grades using morphology, texture and color features, respectively, the proposed DT model achieves accuracies of 96.58%, 96.23% and 95.82%; error rates of 3.73%, 3.89% and 4.09%; precisions of 96.17%, 96.12% and 95.87%; recalls of 96.61%, 96.09% and 95.89%; and F-scores of 96.48%, 96.04% and 95.84%. Its average accuracy is higher than that of KNN and SVM, so it classifies the dry and cooked rice samples and their grades more precisely.
Conclusion
This work developed a Classification and Regression Tree (CART) method for classifying rice grain samples as dry or cooked based on their physicochemical characteristics. The captured images were partitioned into training and testing data and pre-processed, after which morphology, texture and color features were extracted. The CART-based DT model then classified the extracted features as dry or cooked rice together with their grade. The robustness of the proposed DT model was evaluated across the different grades and phases of the rice samples. It consistently achieved accurate classification, even for challenging cases such as Grade-III rice grains, which show subtle differences in physicochemical characteristics. Overall, the proposed DT model outperforms the KNN and SVM models in classification accuracy, error rate, precision, recall, F-score, sensitivity, specificity and ROC analysis across morphology, texture and color features for the different rice grades. This indicates that the model classifies rice samples accurately with few misclassifications and is a promising approach for classifying dry and cooked rice samples.
Data availability
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
References
Yang X, Pan Y, Xia X, Qing D, Chen W, Nong B, Zhang Z, Zhou W, Li J, Li D, Dai G (2023) Molecular basis of genetic improvement for key rice quality traits in Southern China. Genomics 115(6):110745
Fengfeng F, Meng C, Xiong L, Manman L, Huanran Y, Mingxing C, Ahmad A, Nengwu L, Shaoqing L (2023) Novel QTLs from Wild Rice Oryza longistaminata Confer Strong Tolerance to High Temperature at Seedling Stage. Rice Sci 30(6):577–586
Peramaiyan P, Srivastava AK, Kumar V, Seelan LP, Banik NC, Khandai S, Parida N, Kumar V, Das A, Pattnaik S, Sarangi DR (2023) Crop establishment and diversification strategies for intensification of rice-based cropping systems in rice-fallow areas in Odisha. Field Crop Res 302:109078
Saiwaeo S, Arwatchananukul S, Mungmai L, Preedalikit W, Aunsri N (2023) Human skin type classification using image processing and deep learning approaches. Heliyon 9(11)
Kristensen K, Ward LM, Mogensen ML, Cichosz SL (2023) Using image processing and automated classification models to classify microscopic gram stain images. Computer Methods and Programs in Biomedicine Update 3:100091
Yu W (2024) Image processing methods based on physical models. Results Phys 56:107199
Mittal S, Dutta MK, Issac A (2019) Non-destructive image processing based system for assessment of rice quality and defects for classification according to inferred commercial value. Measurement 148:106969
de Moraes RS, Coradi PC, Nunes MT, Leal MM, Müller EI, Teodoro PE, Flores EMM (2023) Thick layer drying and storage of rice grain cultivars in silo-dryer-aerator: Quality evaluation at low drying temperature. Heliyon 9(7)
Ranathunga A, Thumanu K, Kiatponglarp W, Siriwong S, Wansuksri R, Suwannaporn P (2023) Image mapping of biological changes and structure-function relationship during rice grain development via Synchrotron FTIR spectroscopy. Food Chemistry Advances 2:100290
Shaodan L, Yue Y, Jiayi L, Xiaobin L, Jie M, Haiyong W, Zuxin C, Dapeng Y (2023) Application of UAV-based imaging and deep learning in assessment of rice blast resistance. Rice Sci 30(6):652–660
Shi SJ, Zhang GY, Cao CG, Jiang Y (2023) Untargeted UHPLC–Q-Exactive-MS-based metabolomics reveals associations between pre-and post-cooked metabolites and the taste quality of geographical indication rice and regular rice. J Integr Agric 22(7):2271–2281
Blanco V, Japón A, Puerto J (2022) A mathematical programming approach to SVM-based classification with label noise. Comput Ind Eng 172:108611
Hamidi AA, Robertson B, Ilow J (2023) A new approach for ECG artifact detection using fine-KNN classification and wavelet scattering features in vital health applications. Procedia Computer Science 224:60–67
Ahad MT, Li Y, Song B, Bhuiyan T (2023) Comparison of CNN-based deep learning architectures for rice diseases classification. Artificial Intelligence in Agriculture 9:22–35
Liao F, Feng X, Li Z, Wang D, Xu C, Chu G, Ma H, Yao Q, Chen S (2024) A hybrid CNN-LSTM model for diagnosing rice nutrient levels at the rice panicle initiation stage. J Integr Agric 23(2):711–723
Izquierdo M, Lastra-Mejías M, González-Flores E, Pradana-López S, Cancilla JC, Torrecilla JS (2020) Visible imaging to convolutionally discern and authenticate varieties of rice and their derived flours. Food Control 110:106971
Chun M, Jeong H, Lee H, Yoo T, Jung H (2022) Development of Korean Food Image Classification Model Using Public Food Image Dataset and Deep Learning Methods. IEEE Access 10:128732–128741
Shen L, Zhu Y, Wang L, Liu C, Liu C, Zheng X (2019) Improvement of cooking quality of germinated brown rice attributed to the fissures caused by microwave drying. J Food Sci Technol 56:2737–2749
Mohan D, Raj MG (2020) Quality Analysis of Rice grains using ANN and SVM. J Crit Rev 7(1):395–402
Maldaner V, Coradi PC, Nunes MT, Müller A, Carneiro LO, Teodoro PE, Teodoro LPR, Bressiani J, Anschau KF, Müller EI (2021) Effects of intermittent drying on physicochemical and morphological quality of rice and endosperm of milled brown rice. LWT 152:112334
Liu X, Li M, Guo P, Zhang Z (2019) Optimization of water and fertilizer coupling system based on rice grain quality. Agric Water Manag 221:34–46
Kobayashi H (2019) Variations of endoreduplication and its potential contribution to endosperm development in rice (Oryza sativa L.). Plant Production Science 22(2):227–241
Cinar I, Koklu (2022) Identification of rice varieties using machine learning algorithms. Journal of Agricultural Sciences 9:9
Shi Y, Jia X, Yuan H, Jia S, Liu J, Men H (2020) Origin traceability of rice based on an electronic nose coupled with a feature reduction strategy. Meas Sci Technol 32(2):025107
Sampaio PS, Castanho A, Almeida AS, Oliveira J, Brites C (2020) Identification of rice flour types with near-infrared spectroscopy associated with PLS-DA and SVM methods. Eur Food Res Technol 246:527–537
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Compliance with ethics requirements
This article does not contain any studies with human or animal subjects.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bhattacharyya, S.K., Pal, S. Design and performance analysis of decision tree learning model for classification of dry and cooked rice samples. Eur Food Res Technol 250, 2529–2544 (2024). https://doi.org/10.1007/s00217-024-04555-3
DOI: https://doi.org/10.1007/s00217-024-04555-3