1 Introduction

Due to advancements in semiconductor technology, the complexity of integrated circuit design has been steadily increasing. The semiconductor manufacturing industry is focused on achieving larger scales and smaller process dimensions. Present-day wafer production relies on highly automated processes and utilizes high-precision equipment. Nonetheless, the manufacturing of semiconductor wafers remains intricate and time-consuming, involving numerous complex chemical steps and requiring meticulous monitoring of various process parameters. Despite professional regulation, the occurrence of abnormal dies on the wafer is unavoidable [1]. Therefore, improving the yield of semiconductor production is a crucial issue. The core step in the semiconductor manufacturing process is wafer manufacturing. The wafer manufacturing process requires a series of extremely complex operations such as oxidation, photolithography, etching, ion implantation, and metallization [2], which can easily cause wafer defects [3]. Therefore, after the wafer is manufactured, it is necessary to conduct an electrical test on each die on the wafer to mark whether there are defects in the wafer, and thus a wafer map [4] is generated. The wafer map obtained through testability inspection often forms a certain spatial pattern, which can reflect some problems in the production process. Realizing accurate classification of wafer maps can provide a theoretical basis for engineers to solve production problems, improve wafer production yield, and reduce production costs [5].

Wafer map defect pattern recognition and classification is an indispensable part of the semiconductor manufacturing process, because it can locate the cause of failures in the manufacturing process. In 2023, Jianhong Ma et al. [6] published a comprehensive review on wafer defect detection, which outlines the advancements in wafer defect pattern recognition in recent years. Naigong Yu et al. [7] developed an 8-layer CNN model to inspect wafer map defects and a 13-layer model to classify defect patterns in 2019. In 2023, Chen et al. [8] proposed a deep convolutional neural network architecture incorporating improved convolutional block attention module (I-CBAM) for pattern recognition and classification of single-type wafer map defects, but the accuracy of Edge-Loc and Local still needs to be improved.

Some researchers have started using various methods for defect pattern recognition. For example, Li et al. [9] used semi-supervised learning for wafer map defect pattern recognition, leveraging unlabeled samples to improve recognition accuracy. Iljoo Jeong et al. [10] employed a geometric transformation-invariant convolutional neural network, Wang et al. [11] utilized rotation-invariant features, and Tziolas et al. [12] addressed the challenge of imbalanced datasets.

In addition to its specific spatial pattern, a wafer map contains a lot of noise. The defect patterns in these wafer maps can be identified well by manual inspection [13, 14], but machine learning methods find it difficult to identify the true defect pattern types of noisy wafer maps. Therefore, it is necessary to filter the wafer map. Traditional filtering algorithms include spatial filtering, median filtering, mean filtering, etc. [14,15,16]. In 2022, Pourkaramdel et al. [17] proposed a new texture descriptor called the complete local quartet pattern (CLQP), which extracts local texture features of fabric images to detect fabric defects during training. In 2023, Gulhan [18] proposed a new deep-learning-based hybrid optical sensor to detect micro-size defects on PCBs. These methods depend heavily on parameter settings: if the threshold is set too small, filtering is incomplete; if it is set too large, the defect pattern of the wafer map itself is destroyed. Especially when facing a large number of wafer maps, it is even more difficult to find a single parameter setting that applies to all of them. Therefore, such traditional filtering algorithms are not suitable for wafer maps.

In addition to traditional filtering algorithms, many algorithms that filter through clustering have been derived [19]. For example, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is a typical density-based clustering algorithm that can find clusters of arbitrary shapes and remove isolated noise points. The crucial step in DBSCAN is determining its two key parameters, the neighborhood radius (Eps) and the minimum number of points (MinPts). Chen et al. [20] used DBSCAN to filter wafer maps, setting Eps to \(\sqrt{2}\) and MinPts to 3. The choice of parameters directly affects clustering accuracy, and, as with traditional filtering algorithms, a fixed setting is not suitable for large-scale wafer maps with variable shapes. Facing the problem that the parameters are difficult to determine, Chen et al. [21] proposed an adaptive DBSCAN clustering algorithm for wafer map data. The algorithm selects optimal parameters using a comprehensive index of inter-cluster and intra-cluster density and realizes adaptive filtering of large-scale wafer map data. However, this algorithm targets single-type wafer maps; for mixed-type wafer maps, it cannot reliably determine the number of clusters to retain.

However, as the wafer process becomes more complex and wafer sizes become larger [22], the proportion of mixed-type wafer maps is constantly increasing. For wafer maps of the same size, a mixed-type wafer map is more susceptible to noise points during classification and recognition than a single-type wafer map. Therefore, effective filtering and denoising of mixed-type wafer maps is necessary to facilitate their identification. Kim et al. [23] proposed a connecting path filtering algorithm for 6 mixed-type wafer maps. The principle of this method is to find all possible connecting paths between two defect points; however, the path length needs to exceed a set threshold, and this threshold setting also affects the filtering effect. The same threshold is not necessarily applicable to all wafer maps. A. A. Ezzat et al. [24] proposed an adjacency clustering (AC) filtering algorithm based on graph theory for 12 mixed-type wafer maps. However, this method also requires manually setting parameters and then running a large number of experiments over different parameters to select a suitable one, which is not practical for large-scale wafer map data.

Among the above filtering algorithms for mixed-type wafer maps, the most prominent problem is still the selection of parameters: no single parameter setting is applicable to all wafer maps. Therefore, this paper proposes a parameter-free method for denoising mixed-type wafer maps, Neighborhood Path Filtering Clustering (NPFC). Based on the idea of clustering, this method combines the Silhouette coefficient (a clustering index) and the Mahalanobis distance (a similarity index) into a cluster similarity index SD to determine the number of clusters in the wafer map. Then, by calculating a compactness index f between clusters, it decides whether clusters should be merged, yielding the final clustering result. The algorithm is designed specifically for the wafer map's particular structure: a matrix consisting of only three numeric values.

The rest of the paper is arranged as follows: The second section introduces in detail some concepts of clustering and related concepts redefined in this article and gives a detailed description of the specific steps of the basic principles of NPFC. The third section compares the method proposed in this paper with other experimental methods from the perspectives of filtering and clustering. The fourth section summarizes the content of the full text and the outlook for future work.

2 Related Work

A wafer map is a two-dimensional digital matrix structure that differs significantly from a typical image. As shown on the left side of Fig. 1, it is a matrix composed of the three numbers zero, one, and two (zero means no die, one means a normal die, two means a defective die), whereas the pixel values of a typical image range over 0–255, a relatively large range. Therefore, traditional filtering methods are not suitable for this kind of image structure with only three values. If each die is represented by coordinates, each grid cell can be regarded as a point. As shown on the right side of Fig. 1, the data points in a wafer map are distributed in a relatively regular pattern, with the distance between a central point and its eight neighboring points being either 1 or \(\sqrt{2}\).
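As a concrete illustration (a minimal Python sketch, not taken from the paper), the matrix-to-coordinate view described above can be written as follows; the toy `wafer` matrix and variable names are ours:

```python
import math

# Toy wafer map: 0 = no die, 1 = normal die, 2 = defective die
wafer = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 2, 1, 1],
    [0, 1, 1, 0],
]

# Represent each defective die by its (row, col) coordinate
defects = [(i, j)
           for i, row in enumerate(wafer)
           for j, v in enumerate(row) if v == 2]

# Distance from a point to each of its 8 neighbors is 1 or sqrt(2)
p, q = defects[0], defects[1]          # (1, 1) and (1, 2): horizontal neighbors
d = math.hypot(p[0] - q[0], p[1] - q[1])
print(defects)   # [(1, 1), (1, 2), (2, 1)]
print(d)         # 1.0
```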

Fig. 1
figure 1

Structure of the wafer map

Traditional clustering methods are typically designed for scattered data, so clustering filtering methods need to be adjusted to be applicable to wafer maps. In this paper, a filtering algorithm based on clustering principles is proposed to address the specificity of wafer map structures.

2.1 Algorithm-Dependent Definitions

This article defines some concepts related to points and clusters, with examples illustrated in Fig. 2. Given a data set sample \(X=\left\{{x}_{1},{x}_{2},\dots ,{x}_{n}\right\}\), the specific concepts are as follows:

Fig. 2
figure 2

Specific definition example

(1) Initial point: Before forming a cluster, the first selected point \(O\) is the initial point (the initial point is randomly selected), \(O\in X\).

(2) Direct path connected: If there is a point \(P\) in the 3 × 3 neighborhood of the initial point \(O\), then point \(P\) and point \(O\) are direct path connected.

(3) Path connected: If a point \(Q\) is direct path connected to a connected point \(P\), then point \(Q\) is path connected to the initial point \(O\).

(4) Connected points: A point \(P\) that is direct path connected to the initial point \(O\) is referred to as a connected point. In Fig. 2, all black triangles represent connected points.

(5) Cluster: A non-empty set where any point within the set is either direct path connected or path connected to an initial point is classified as a cluster. A cluster \(D\) should satisfy the following two conditions:

\({x}_{i}\in D, {x}_{j}\in D\Rightarrow\) point \({x}_{i}\) is path-connected with point \({x}_{j}\).

\({x}_{i}\in D\), point \({x}_{j}\) is path-connected with point \({x}_{i}\Rightarrow\) \({x}_{j}\in D\).

As shown in Fig. 2, the defect point marked with a pentagon and the surrounding defect points marked with triangles form a cluster \(D.\)

(6) Core cluster: Sort all clusters by the number of data samples from high to low to obtain \(\pi =({C}_{1},{C}_{2},\dots ,{C}_{i},\dots ,{C}_{M})\), where \(1\le i\le M\), and select the Num clusters with the most data points as the core clusters \(({C}_{1},{C}_{2},\dots ,{C}_{i},\dots ,{C}_{Num})\), where \(1\le i\le Num\). Overall, \(M\ge Num\).

(7) Noise points: A noise point is a defect point with no other defect points in its 3 × 3 neighborhood. Similarly, twin defect points that only have each other in their respective 3 × 3 neighborhoods are also considered noise points. For instance, points \(G\), \(H\) and \(I\) in Fig. 2 are examples of noise points.

From the definitions above and as illustrated in Fig. 2, it can be seen that regardless of whether point \(O,P,Q,\) or any other point marked with a triangle is chosen as the initial point, the same cluster \(D\) is always formed. Therefore, compared to the DBSCAN clustering algorithm, the neighborhood path filtering algorithm does not need to identify core points or repeatedly check whether directly density-reachable and density-reachable points satisfy the criteria for cluster expansion. This reduces computational complexity while still effectively capturing clusters of arbitrary shapes.

2.2 Neighborhood Path Filtering Clustering

In response to the preprocessing challenges posed by mixed defect pattern wafer maps, this paper introduces the Neighborhood Path Filtering Clustering (NPFC) algorithm, which circumvents the parameter selection issues encountered in typical filtering methods and clustering-based filtering methods. The flowchart of the NPFC algorithm is depicted in Fig. 3.

Fig. 3
figure 3

Flowchart of NPFC algorithm

(1) Form preliminary clusters by using the initial point as the core. Identify connected points within the smallest neighborhood of 3 × 3 around the initial point. Include points directly or path-connected within this area.

(2) Mark and remove noise points and clusters. Mark isolated defect points or twin defect points [22]  not in any cluster as noise points, and remove noise points.

(3) Merge clusters. The Num clusters with the most points are selected as core clusters, and the remaining clusters are merged around them. Calculate the distance between each remaining cluster and the core clusters, the closest and farthest distances between the core clusters and the wafer center, and the closest and farthest distances between the remaining clusters and the wafer center. From these relationships, determine whether a cluster should be merged with a core cluster, and with which one.

2.2.1 Preliminary Clustering of Wafer Map

The types of defects in the wafer map are composed of small clusters. The preliminary clustering aims to form multiple small clusters while eliminating isolated noise points and twin noise points. For clarity, the preliminary clustering steps of the wafer map are illustrated in Fig. 4. The specific clustering steps are as follows:

Fig. 4
figure 4

Diagram of preliminary clustering

(1) First select the initial point. The defect point marked with a pentagon is designated as the initial point in Fig. 4 ②, and there are two defective points in the 3 × 3 neighborhood of the initial point.

(2) Divide the connected point and the initial point into the same cluster. As shown in Fig. 4③, two defect points found are marked with triangles, and the pentagram mark and the triangle mark are grouped into the same cluster.

(3) Find the point that is path connected with the initial point. Query the 3 × 3 neighborhood of the triangle marked defect point in Fig. 4 ③. Two defect points are found and marked as triangles, as shown in Fig. 4 ④.

(4) Repeat the process of step (3) until there are no unmarked defect points in the 3 × 3 neighborhood of the defect points to be queried, as shown in Fig. 4 ⑦, that is, a cluster is formed.

(5) Repeat steps 1) 2) 3) 4) until all the defect points on the wafer map are divided into clusters or noise points, as shown in Fig. 4 ⑧.

The defect types on a wafer map are all composed of small clusters. By employing the aforementioned clustering method, it is possible to capture clusters of any shape. This approach is versatile and applicable to various types of wafer maps.
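The preliminary clustering steps above amount to a breadth-first flood fill over 3 × 3 neighborhoods. The following is a minimal Python sketch of that procedure (function and variable names are ours, not the paper's implementation); clusters of one or two points are treated as noise, matching the isolated and twin defect points of definition (7):

```python
from collections import deque

def preliminary_clusters(defects):
    """Group defect points into clusters of 8-neighborhood path-connected
    points; clusters of size 1 (isolated) or 2 (twin) are treated as noise."""
    remaining = set(defects)
    clusters, noise = [], []
    while remaining:
        start = remaining.pop()            # arbitrary initial point
        cluster, queue = {start}, deque([start])
        while queue:
            x, y = queue.popleft()
            # scan the 3x3 neighborhood for unmarked defect points
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        queue.append(nb)
        if len(cluster) <= 2:              # isolated or twin defect points
            noise.extend(cluster)
        else:
            clusters.append(cluster)
    return clusters, noise

pts = [(1, 1), (1, 2), (2, 2), (5, 5), (8, 1), (8, 2)]
clusters, noise = preliminary_clusters(pts)
print(len(clusters), sorted(noise))   # 1 [(5, 5), (8, 1), (8, 2)]
```

Because membership is decided purely by neighborhood connectivity, the same clusters are produced regardless of which point is popped first.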

2.2.2 Determine the Number of Clusters in the Wafer Map

Preliminary clustering divides the wafer map into many small clusters, but the number of defect patterns on the wafer map is less than or equal to the number of preliminary clusters. Therefore, it is necessary to determine the total number of clusters representing the wafer map's overall defect pattern before proceeding with the merging of these small clusters. From the observation of many wafer maps, the number of defect types in a wafer map is generally 1 to 4. To allow for the possibility that unobserved wafer maps have more defect types, this article considers the number of defect types to be 1 to 5. Therefore, the number of clusters is also in the range of 1 to 5; for mixed-type wafer maps, the number of clusters ranges from 2 to 5. This paper proposes a new clustering similarity index (SD) to determine the number of clusters in wafer maps. It combines the Silhouette coefficient and the Mahalanobis distance.

(1) Silhouette coefficient.

The Silhouette coefficient [25] is a way to evaluate the clustering effect. Assume that the data sample \(X=\left\{{x}_{1},{x}_{2},\dots ,{x}_{n}\right\}\) has been divided into Num clusters by the algorithm. The silhouette coefficient for a sample point \({x}_{i}\) in the k-th cluster is calculated as shown in Eq. (1).

$$S(i)=\frac{b(i)-a(i)}{\mathit{max}\{a(i),b(i)\}}$$
(1)

Among them, \(a(i)\) is the average distance from the sample point \({x}_{i}\) to the other points in the k-th cluster, which indicates the cohesion within the cluster. \(b(i)\) is the minimum, over all other clusters, of the average distance from the sample point \({x}_{i}\) to the points of that cluster, which indicates the degree of separation between clusters.
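Eq. (1) can be sketched as follows (a hypothetical helper, not the paper's implementation; `a` and `b` follow the standard silhouette definitions of within-cluster cohesion and nearest-other-cluster separation):

```python
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def silhouette(point, own_cluster, other_clusters):
    """S(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean
    distance to the other points of the same cluster and b(i) is the
    smallest mean distance to the points of any other cluster."""
    others = [p for p in own_cluster if p != point]
    a = sum(euclid(point, p) for p in others) / len(others)
    b = min(sum(euclid(point, q) for q in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

c1 = [(0, 0), (0, 1), (1, 0)]
c2 = [(10, 10), (10, 11)]
s = silhouette((0, 0), c1, [c2])
print(round(s, 3))    # close to 1: the point is well matched to its cluster
```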

(2) Mahalanobis distance.

Mahalanobis distance [26] is a metric used to evaluate the similarity between data points. For a sample point \(x\) drawn from a distribution with mean \(\mu\) and covariance matrix \(\Sigma\), the specific calculation is shown in Eq. (2).

$${D}_{M}(x)=\sqrt{(x-\mu {)}^{T}{\Sigma }^{-1}(x-\mu )}$$
(2)

When evaluating the similarity between two images using the Mahalanobis distance, let the sample data of the first image be \(X=\{{x}_{1},{x}_{2},...,{x}_{n}{\}}^{T}\), and the sample data of the second image be \(Y=\{{y}_{1},{y}_{2},...,{y}_{n}{\}}^{T}\). The specific calculation is shown in Eq. (3).

$${D}_{M}(x,y)=\sqrt{(x-y{)}^{T}{\Sigma }^{-1}(x-y)}$$
(3)

In Eq. (3), \(\Sigma\) is the covariance matrix of \(x\) and \(y\).
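Eq. (3) can be sketched for two-dimensional points as follows (a hypothetical helper that inverts the 2 × 2 covariance matrix directly; not the paper's code):

```python
import math

def mahalanobis_2d(x, y, cov):
    """D_M(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y)) for 2-D vectors,
    inverting the 2x2 covariance matrix Sigma in closed form."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - y[0], x[1] - y[1])
    t0 = inv[0][0] * dx[0] + inv[0][1] * dx[1]
    t1 = inv[1][0] * dx[0] + inv[1][1] * dx[1]
    return math.sqrt(dx[0] * t0 + dx[1] * t1)

# With the identity covariance, D_M reduces to the Euclidean distance
identity = ((1.0, 0.0), (0.0, 1.0))
print(mahalanobis_2d((3, 4), (0, 0), identity))   # 5.0
```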

(3) Clustering similarity index SD.

The silhouette coefficient is a metric used to evaluate the clustering effectiveness of different algorithms based on the same raw data. However, when determining the number of clusters in this study, the comparison is made between the clustering effects before and after merging or removing clusters. Since these comparisons involve data with certain differences, relying solely on the silhouette coefficient could lead to inaccuracies. Therefore, this paper uses a combined metric, SD, which integrates both the silhouette coefficient and the Mahalanobis distance (used as a similarity measure), to select the appropriate number of clusters. The calculation method of SD is shown in the formula (4).

$$SD=\frac{1}{n}{\sum }_{i=1}^{n}S(i)+{D}_{M}(X,Y)$$
(4)

The larger the silhouette coefficient, the better the clustering effect. The larger the Mahalanobis distance, the greater the similarity between wafer maps. Therefore, the larger the SD, the better. The process for determining the number of clusters in wafer map clustering is shown in Fig. 5. The number of clusters, Num, ranges from 2 to 5, and the optimal number of clusters is the one where the SD is the largest.

Fig. 5
figure 5

Flowchart for determining the number of clusters

2.2.3 Remove or Merge Wafer Map Clusters

The initial clustering results in many small clusters. To obtain a clustering result with an appropriate number and reasonable division of data samples, further cluster merging and removal are required. Given the number of clusters, Num, the specific steps for merging or removing clusters are as follows:

Combine the clusters whose closest distance between clusters does not exceed \(\sqrt{8}\) (\(\sqrt{8}\) is the distance from the center of the entire square to the center of the diagonal square when the neighborhood is 5 × 5).

Find the largest Num clusters as core clusters and merge them with other clusters.

Merge or remove clusters. The rules are as follows:

(1) Calculate the radius relationship between the remaining clusters and the core clusters. Take the center of the wafer map as the origin. Compute the distance from the origin to each defect point within each cluster. Use the maximum and minimum of these distances as the maximum and minimum radius distances for each cluster. Then, assess the relationship between the maximum and minimum radius distances of the remaining clusters and those of the core clusters. The schematic diagram of the radius relationship is shown in Fig. 6. The red markers indicate the core clusters, and the red arrows represent the maximum and minimum radius distances of the core clusters.

Fig. 6
figure 6

Schematic diagram of radius distance relationship

Figure 6 illustrates the following four possible cases between the remaining clusters and the core clusters:

① Case 1: The radius range of the cluster is entirely within the radius range of the core cluster, namely:

$${R}_{1min}\ge {R}_{0min},{R}_{1max}\le {R}_{0max}$$

② Case 2: The maximum radius distance of the cluster exceeds the maximum radius distance of the core cluster, namely:

$${R}_{2min}\ge {R}_{0min},{R}_{2max}>{R}_{0max}$$

③ Case 3: The minimum radius distance of the cluster is smaller than the minimum radius distance of the core cluster, namely:

$${R}_{3min}<{R}_{0min},{R}_{3max}\le {R}_{0max}$$

④ Case 4: The radius range of the cluster is completely outside the radius range of the core cluster, namely:

$${R}_{4min}>{R}_{0min},{R}_{4max}>{R}_{0max}$$

If the cluster belongs to cases ①②③, set the parameter α = 1. If it belongs to case ④, set α = 0 and directly remove the cluster.

(2) The distance between the cluster and the core cluster.

Assuming that the initial set of clusters after preliminary clustering is \({P}_{1}=\left\{{C}_{1},{C}_{2},\dots ,{C}_{N}\right\}\), the core cluster set is \({P}_{2}=\left\{{C}_{1},\dots ,{C}_{Num}\right\}\), then the rest of clusters set is \({P}_{3}=\left\{{C}_{Num+1},\dots ,{C}_{N}\right\}\). Calculate respectively the nearest distance \(nDist\), the farthest distance \(fDist\), and the average distance \(mDist\) between the sample points in the \({P}_{2}\) cluster and those in cluster \({P}_{3}\), as shown in Eqs. (5, 6 and 7).

$$nDist = min\left(Dist\left(X,Y\right)\right)$$
(5)
$$fDist = max\left(Dist\left(X,Y\right)\right)$$
(6)
$$mDist=mean(Dist(X,Y))$$
(7)

where \(X=\left\{{x}_{1},{x}_{2},..,{x}_{m}\right\}\) represents the sample point of cluster \({C}_{i}\) in the core cluster, \(X\in {C}_{i}\), \({C}_{i}\in {P}_{2}\). Similarly, \(Y=\left\{{y}_{1},{y}_{2},..,{y}_{n}\right\}\) represents the sample points in cluster \({C}_{j}\) of the remaining clusters, \(Y\in {C}_{j}\), \({C}_{j}\in {P}_{3}\). \(Dist\) denotes the Euclidean distance, \(min\) is the minimum value, \(max\) denotes the maximum value, and \(mean\) denotes the average value.

(3) The compactness after merging clusters.

Assume \(X=\left\{{x}_{1},{x}_{2},..,{x}_{m}\right\}\), where \(X\in {C}_{i}\) and \({C}_{i}\in {P}_{2}\), and \(Y=\left\{{y}_{1},{y}_{2},..,{y}_{n}\right\}\), where \(Y\in {C}_{j}\) and \({C}_{j}\in {P}_{3}\). After merging the data into \(Z=\{{x}_{1},{x}_{2},...,{x}_{m},{y}_{1},{y}_{2},...,{y}_{n}\}\), the compactness \(Den\) after merging the clusters is given by Eq. (8) [27]. The larger the compactness \(Den\), the more suitable it is to merge the two clusters.

$$Den=mean(mean(Dist(Z,Z)))$$
(8)
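Eqs. (5)–(8) can be sketched as follows (hypothetical helper functions, not the paper's code; `Dist` is taken to be the Euclidean distance as stated above, and `compactness` averages over all ordered pairs of the merged set):

```python
import math
from itertools import product

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cluster_distances(X, Y):
    """Nearest, farthest, and mean distance between two point sets,
    as in Eqs. (5)-(7)."""
    d = [euclid(x, y) for x, y in product(X, Y)]
    return min(d), max(d), sum(d) / len(d)

def compactness(X, Y):
    """Den of the merged set Z = X + Y: the mean pairwise distance over
    all ordered pairs of Z, as in Eq. (8)."""
    Z = list(X) + list(Y)
    d = [euclid(p, q) for p, q in product(Z, Z)]
    return sum(d) / len(d)

X = [(0, 0), (0, 1)]
Y = [(0, 3), (0, 4)]
nDist, fDist, mDist = cluster_distances(X, Y)
print(nDist, fDist, mDist)   # 2.0 4.0 3.0
```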

(4) The closeness index.

Combining the radius and distance relationships between the remaining clusters and the core clusters, as well as the compactness after merging two clusters, a comprehensive closeness index f is established as the final criterion for whether to merge. The expression of the closeness index f is shown in Eqs. (9 and 10).

$${f}_{1}= \alpha *Count({C}_{\mathit{min}\left(nDist\right)}\cap {C}_{\mathit{max}\left(Den\right)}\cap {C}_{(\mathit{mDist}<R)})$$
(9)
$${f}_{2}= \alpha *Count({C}_{\mathit{min}\left(nDist\right)}\cap {C}_{\mathit{min}\left(fDist\right)}\cap {C}_{\mathit{max}\left(Den\right)}\cap {C}_{(mDist<R)})$$
(10)

where \({C}_{\mathit{min}\left(nDist\right)}\) represents the core cluster with the minimum nearest distance to the current cluster, \({C}_{\mathit{min}\left(fDist\right)}\) represents the core cluster with the minimum farthest distance to the current cluster, \({C}_{\mathit{max}\left(Den\right)}\) represents the core cluster with the maximum compactness after merging with the current cluster, \({C}_{(mDist<R)}\) represents a core cluster whose average distance to the current cluster is less than the wafer map radius R, \(\cap\) represents the intersection symbol, and Count() denotes a counting operation.

Regarding the decision of whether to merge or remove a specific cluster (other than the core cluster), and determining which core cluster to merge with, the following methods are used for assessment:

Judge \({f}_{1}\). If \({f}_{1}=0\), no core cluster qualifies, so remove the cluster.

If \({f}_{1}=1\), merge the cluster with the corresponding core cluster.

If \({f}_{1}>1\), \({f}_{2}=0\), remove the cluster.

If \({f}_{1}>1\), \({f}_{2}=1\), merge the cluster with the corresponding core cluster.

Apply the above judgments to each cluster until all remaining clusters in the wafer map are either merged or removed. The process of clustering and filtering on the wafer map using the above method is illustrated in Fig. 7. Initially, after preliminary clustering, a preliminary clustering image is obtained as shown in Fig. 7(b). Then, through determining the number of clusters, merging, and removing clusters, a clustered image is obtained as shown in Fig. 7(c). Finally, the representation is restored to the original image format, resulting in the filtered map as shown in Fig. 7(d).
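The four judgments above can be condensed into a small decision function. This is a sketch of our reading of the rules (function name is ours; tie-breaking among several candidate core clusters, i.e. which core cluster to merge with, is omitted):

```python
def merge_or_remove(alpha, f1, f2):
    """Decision rule per the paper's judgments: a cluster is removed when
    alpha = 0 (radius Case 4), when f1 = 0, or when f1 > 1 and f2 = 0;
    otherwise it is merged with the selected core cluster."""
    if alpha == 0 or f1 == 0:
        return "remove"
    if f1 == 1:
        return "merge"
    return "merge" if f2 == 1 else "remove"

print(merge_or_remove(1, 1, 0))   # merge
print(merge_or_remove(1, 3, 0))   # remove
```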

Fig. 7
figure 7

Process of NPFC

3 Experimental Results and Analysis

The Neighborhood Path Filtering Clustering (NPFC) algorithm is a filtering method based on clustering concepts. To validate the effectiveness of the NPFC algorithm proposed in this paper for preprocessing wafer maps with mixed defect patterns, this section compares its filtering and clustering performance with several existing methods.

3.1 Dataset

3.1.1 Data

The wafer map data used in this paper consists of 110 mixed-type wafer maps selected from the labeled wafer maps in the WM-811K database provided by Ming-Ju Wu et al. [28]. Among the 110 mixed-type wafer maps, there are 6 labeled Center, 3 labeled Donut, 30 labeled Edge-Loc, 8 labeled Edge-Ring, 60 labeled Loc, and 3 labeled Scratch. For ease of comparative study, this paper also uses the same 12 wafer maps as reference [24] (none of which duplicate the 110 wafer maps selected above). Figure 8 shows the 12 mixed-type wafer maps.

Fig. 8
figure 8

Original wafer map

3.2 Comparison Index

(1) Evaluation index of filtering effect.

Common filtering evaluation indexes are mainly as follows:

① Mean Absolute Error (MAE).

The smaller the value of MAE, the smaller the deviation from the original image and the better the image quality.

$$MAE=\frac{{\sum }_{i=1}^{L}{\sum }_{j=1}^{W}|G(i,j)-G{\prime}(i,j)|}{L\times W}$$
(11)

② Normalized Mean Square Error (NMSE).

NMSE is a measurement method based on energy normalization. The smaller the value of NMSE, the better the image quality.

$$NMSE=\frac{{\sum }_{i=1}^{L}{\sum }_{j=1}^{W}|G(i,j)-G{\prime}(i,j){|}^{2}}{{\sum }_{i=1}^{L}{\sum }_{j=1}^{W}|G(i,j){|}^{2}}$$
(12)

③ Signal-to-Noise Ratio (SNR).

The larger the value of SNR, the better the image quality.

$$SNR=10{\mathit{log}}_{10}\left[\frac{{\sum }_{i=1}^{L}{\sum }_{j=1}^{W}|G(i,j){|}^{2}}{{\sum }_{i=1}^{L}{\sum }_{j=1}^{W}|G(i,j)-G{\prime}(i,j){|}^{2}}\right]$$
(13)

④ Peak Signal-to-Noise Ratio (PSNR).

The higher the value of PSNR, the better the image quality.

$$PSNR=10{\mathit{log}}_{10}\left[\frac{{255}^{2}\times L\times W}{{\sum }_{i=1}^{L}{\sum }_{j=1}^{W}|G(i,j)-G{\prime}(i,j){|}^{2}}\right]$$
(14)

where \(G(i,j)\) represents the pixel value of the original image, \(G{\prime}(i,j)\) represents the pixel value of the filtered image, and L and W respectively represent the length and width of the wafer map.
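Eqs. (11)–(14) can be sketched as follows (a hypothetical helper, not the paper's code; it assumes the maps are numeric matrices of equal size and that the error and energy sums are nonzero):

```python
import math

def filter_metrics(G, Gf):
    """MAE, NMSE, SNR, and PSNR between an original map G and a
    filtered map Gf (Eqs. 11-14), treating entries as pixel values."""
    flat = [(g, f) for rg, rf in zip(G, Gf) for g, f in zip(rg, rf)]
    n = len(flat)
    err = sum(abs(g - f) for g, f in flat)          # sum of |G - G'|
    err2 = sum((g - f) ** 2 for g, f in flat)       # sum of |G - G'|^2
    energy = sum(g * g for g, _ in flat)            # sum of |G|^2
    mae = err / n
    nmse = err2 / energy
    snr = 10 * math.log10(energy / err2)
    psnr = 10 * math.log10(255 ** 2 * n / err2)
    return mae, nmse, snr, psnr

G  = [[2, 0], [0, 2]]
Gf = [[2, 0], [0, 0]]      # one defect point removed by filtering
mae, nmse, snr, psnr = filter_metrics(G, Gf)
print(mae, nmse)           # 0.5 0.5
```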

(2) Evaluation index of the internal clustering effect.

Assuming that the clustering result of the wafer map is divided into clusters \(C=\left({C}_{1},{C}_{2},\dots ,{C}_{K}\right)\), the definitions are as follows: the average distance between samples is \(avg\left(C\right)\), the longest distance between samples is \(diam\left(C\right)\), the nearest distance between samples from different clusters is \({d}_{min}\left({C}_{i},{C}_{j}\right)\), and the distance between cluster centers is \({d}_{cen}\left({C}_{i},{C}_{j}\right)\), as shown in formulas (15, 16, 17 and 18).

$$avg\left(C\right)=\frac{2}{|C|(|C|-1)}{\sum }_{1\le i<j\le |C|}dist({x}_{i},{x}_{j})$$
(15)
$$diam(C)={max}_{1\le i<j\le |C|}dist({x}_{i},{x}_{j})$$
(16)
$${d}_{min}({C}_{i},{C}_{j})={min}_{{x}_{i}\in {C}_{i}, {x}_{j}\in {C}_{j}}dist({x}_{i},{x}_{j})$$
(17)
$${d}_{cen}({C}_{i},{C}_{j})=dist({\mu }_{i},{\mu }_{j})$$
(18)

where \(\mu\) represents the center point of cluster \(C\), \(\mu =\frac{1}{|C|}{\sum }_{1\le i\le |C|}{x}_{i}\).

Common metrics used for evaluating internal clustering performance (Vinh et al. 2009) include the Davies-Bouldin Index (DBI), the Dunn Index (DI), the Calinski-Harabasz (CH) index, and the Generalized Dunn Index (GDI), as shown in formulas (19, 20, 21 and 22), respectively.

$$DBI=\frac{1}{K}{\sum }_{i=1}^{K}\underset{j\ne i}{max}(\frac{avg({C}_{i})+avg({C}_{j})}{{d}_{cen}({C}_{i},{C}_{j})})$$
(19)
$$DI=\underset{1\le i\le k}{min}\left\{\underset{j\ne i}{min}\frac{{d}_{min}({C}_{i},{C}_{j})}{{\mathit{max}}_{1\le l\le k}diam({C}_{l})}\right\}$$
(20)
$$CH=\frac{r-K}{K-1}\frac{{\sum }_{k=1}^{K}{r}_{k}||{G}^{k}-G|{|}^{2}}{{\sum }_{k=1}^{K}{\sum }_{i}||{C}_{i}^{k}-{G}^{k}|{|}^{2}}$$
(21)
$$GDI=\frac{\underset{k\ne k{\prime}}{min}\frac{1}{ {r}_{k}+{r}_{k{\prime}}}({\sum }_{i}||{C}_{i}^{k}-{G}^{k}||+{\sum }_{j}||{C}_{j}^{k{\prime}}-{G}^{k{\prime}}||)}{\underset{k}{max }\underset{i\ne j}{ max}||{C}_{i}^{k}-{C}_{j}^{k}||}$$
(22)

where \(G\) represents the center point of all coordinates, \({G}^{k}\) represents the center point of all defect point coordinates in the k-th cluster, r represents the number of all defect points on the wafer map, and \({r}_{k}\) represents the number of defect points in the k-th cluster. The smaller the value of DBI, the better the clustering effect. On the contrary, the larger the value of DI, CH, and GDI, the better the clustering effect.
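Eqs. (15), (18), and (19) can be sketched as follows (hypothetical helpers computing the within-cluster average distance, centroid distance, and DBI on toy clusters; not the paper's code):

```python
import math
from itertools import combinations

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def avg(C):
    """Mean pairwise distance inside cluster C, Eq. (15)."""
    d = [euclid(p, q) for p, q in combinations(C, 2)]
    return sum(d) / len(d)

def centroid(C):
    return (sum(p[0] for p in C) / len(C), sum(p[1] for p in C) / len(C))

def dbi(clusters):
    """Davies-Bouldin Index, Eq. (19): for each cluster take the worst
    (largest) ratio of within-cluster scatter to centroid separation."""
    K = len(clusters)
    total = 0.0
    for i in range(K):
        worst = max(
            (avg(clusters[i]) + avg(clusters[j]))
            / euclid(centroid(clusters[i]), centroid(clusters[j]))
            for j in range(K) if j != i)
        total += worst
    return total / K

tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(6, 0), (6, 4)]]
print(dbi(tight) < dbi(loose))   # True: tighter, better-separated clusters
```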

(3) Evaluation index of the external clustering effect.

The Infinite Warped Mixture Model (iWMM) is a non-parametric Bayesian model that can find non-linearly separable clusters with complex shapes (non-Gaussian distribution). Through the warping process, complex shapes with non-Gaussian distributions can be modeled in the latent space using simpler Gaussian distributions, so that the model can be effectively inferred [23]. For a detailed introduction to iWMM, please refer to the paper [23, 24].

In addition to the internal clustering metrics, this paper also uses three external metrics of the iWMM model to compare with adjacency clustering [24]. The definitions of the three external indicators are as follows:

Assuming that the true label of the data set D is \(L=\left\{{L}_{1},{L}_{2},\dots ,{L}_{n}\right\}\), the predicted label is \(\widehat{L}=\left\{{\widehat{L}}_{1}, {\widehat{L}}_{2},\dots ,{\widehat{L}}_{n}\right\}\), TP, FP, TN, and FN respectively represent the number of samples that meet the four conditions in Table 1, then TP + FP + TN + FN is equal to the total number of samples N of the dataset D.

Table 1 Four relationships between true labels and predicted labels

① Rand Index (RI)

$$RI=\frac{TP+TN}{TP+FN+FP+TN}$$
(23)

It can be seen from the above formula that the range of RI is [0, 1]. When the predicted labels are completely inconsistent with the true labels, RI = 0; when they are completely consistent, RI = 1. Therefore, the larger the value of RI, the more accurate the prediction.
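Eq. (23) can be sketched by counting label pairs directly (a hypothetical helper; TP, TN, FP, and FN follow the pair-counting convention of Table 1):

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """RI = (TP + TN) / (TP + FP + TN + FN), counting every pair of
    samples: TP/TN are the pairs on which the two labelings agree."""
    tp = fp = tn = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_true:
            fn += 1
        elif same_pred:
            fp += 1
        else:
            tn += 1
    return (tp + tn) / (tp + fp + tn + fn)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))   # 1.0
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # 1/3
```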

② Adjusted Rand Index (ARI)

$$ARI=\frac{N(TP+TN)-\left[(TP+FP)(TP+FN)+(TN+FP)(TN+FN)\right]}{{N}^{2}-\left[(TP+FP)(TP+FN)+(TN+FP)(TN+FN)\right]}$$
(24)

The larger the value of ARI, the more similar the predicted label and the real label, and the more accurate the prediction result.
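As a concrete illustration, RI and ARI can be computed directly from the pair counts of Table 1. The sketch below uses the standard pair-counting definitions; the function names are illustrative, not from the paper.

```python
from itertools import combinations

def pair_confusion(true_labels, pred_labels):
    """Count the four pair relationships of Table 1 over all sample pairs."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # agree: same class, same cluster
        elif same_pred:
            fp += 1  # different classes placed in the same cluster
        elif same_true:
            fn += 1  # same class split across clusters
        else:
            tn += 1  # agree: different classes, different clusters
    return tp, fp, fn, tn

def rand_index(true_labels, pred_labels):
    tp, fp, fn, tn = pair_confusion(true_labels, pred_labels)
    return (tp + tn) / (tp + fp + fn + tn)  # Eq. (23)

def adjusted_rand_index(true_labels, pred_labels):
    """Standard chance-corrected form of the Rand Index."""
    tp, fp, fn, tn = pair_confusion(true_labels, pred_labels)
    n_pairs = tp + fp + fn + tn
    expected = (tp + fp) * (tp + fn) + (tn + fp) * (tn + fn)
    return (n_pairs * (tp + tn) - expected) / (n_pairs ** 2 - expected)
```

Note that cluster labels themselves are arbitrary: a prediction that swaps the label names of a perfect clustering still yields RI = ARI = 1, since both indices only compare pair relationships.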

③ Normalized mutual information (NMI)

$$NMI(L,\widehat{L})=\frac{I(L,\widehat{L})}{H(L,\widehat{L})}$$
(25)
$$I(L,\widehat{L})=\sum_{i=1}^{|L|}\sum_{j=1}^{|\widehat{L}|}\frac{{n}_{ij}}{n}\mathit{log}\left(\frac{{n}_{ij}/n}{\left(\sum_{j=1}^{|\widehat{L}|}{n}_{ij}\right)\left(\sum_{i=1}^{|L|}{n}_{ij}\right)/{n}^{2}}\right)$$
(26)
$$H(L,\widehat{L})=-\sum_{i=1}^{|L|}\sum_{j=1}^{|\widehat{L}|}\frac{{n}_{ij}}{n}\mathit{log}\left(\frac{{n}_{ij}}{n}\right)$$
(27)

where \(|L|\) and \(|\widehat{L}|\) denote the numbers of true classes and predicted clusters, \({n}_{ij}\) denotes the number of samples whose true label belongs to class i and whose predicted label belongs to cluster j, and \(n\) denotes the total number of samples. The larger the NMI, the better the clustering effect.
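The computation of Eqs. (25)–(27) can be sketched from the contingency table \(n_{ij}\). This is a minimal NumPy version assuming the joint-entropy normalization of Eq. (25); note that scikit-learn's `normalized_mutual_info_score` normalizes by a mean of the marginal entropies instead, so its values can differ slightly.

```python
import numpy as np

def nmi(true_labels, pred_labels):
    """NMI per Eqs. (25)-(27): mutual information over joint entropy."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    n = len(true_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(pred_labels)
    # Contingency table n_ij: true class i vs. predicted cluster j.
    cont = np.array([[np.sum((true_labels == c) & (pred_labels == k))
                      for k in clusters] for c in classes], dtype=float)
    p_ij = cont / n
    p_i = p_ij.sum(axis=1, keepdims=True)  # marginal of true labels
    p_j = p_ij.sum(axis=0, keepdims=True)  # marginal of predicted labels
    nz = p_ij > 0                          # skip empty cells (0 * log 0 = 0)
    mi = np.sum(p_ij[nz] * np.log(p_ij[nz] / (p_i @ p_j)[nz]))   # Eq. (26)
    h_joint = -np.sum(p_ij[nz] * np.log(p_ij[nz]))               # Eq. (27)
    return mi / h_joint
```

For a perfect clustering the mutual information equals the joint entropy, so NMI = 1 regardless of how the cluster labels are named.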

3.3 Experimental Result and Comparison

3.3.1 Comparison of Filtering Effects

To validate the filtering effectiveness of the proposed method, it was compared with median filtering (MF) using four evaluation metrics: MAE, NMSE, SNR, and PSNR. The filtering effects of NPFC and MF on the 12 wafer maps are shown in Fig. 9, the results for the 110 wafer maps in Table 2, and the metric comparison for the 12 wafer maps in Table 3. In all tables of this paper, the superior value is highlighted in bold, with “↑” indicating that a higher value is better for that metric and “↓” indicating that a lower value is better.
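For reference, the four filtering metrics can be computed as below. This sketch assumes the common image-quality definitions (the paper's exact normalizations may differ) and treats one map as the clean reference.

```python
import numpy as np

def filtering_metrics(clean, filtered):
    """MAE, NMSE, SNR (dB), and PSNR (dB) between a reference wafer map
    and a filtered wafer map. Standard definitions are assumed here."""
    clean = np.asarray(clean, dtype=float)
    filtered = np.asarray(filtered, dtype=float)
    err = clean - filtered
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    nmse = np.sum(err ** 2) / np.sum(clean ** 2)
    snr = 10 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))
    peak = clean.max() if clean.max() > 0 else 1.0
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    return {"MAE": mae, "NMSE": nmse, "SNR": snr, "PSNR": psnr}
```

Lower MAE and NMSE, and higher SNR and PSNR, indicate a better match between the filtered map and the reference, consistent with the “↑”/“↓” convention used in the tables.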

Fig. 9 Comparison of median filtering and NPFC

Table 2 Comparison of the filtering effect of 110 wafer maps
Table 3 Comparison of the filtering effect of 12 wafer maps

From the visual comparison in Fig. 9, it can be observed that, compared with median filtering, the NPFC method removes more noise points from wafer maps (a) to (k) while keeping the main spatial patterns intact. For wafer map (l), median filtering destroys the basic spatial pattern, whereas the NPFC method preserves it completely. According to the comparison of the four evaluation metrics (MAE, NMSE, SNR, and PSNR) in Tables 2 and 3, the NPFC algorithm scores better than median filtering on most of these indicators. In the comparison over the 110 wafer maps, NPFC achieves the better score on 99 maps for all four metrics, far more than median filtering, and its average values are also better. Therefore, the NPFC algorithm proposed in this paper outperforms the median filtering method.

3.3.2 Comparison of Clustering Effects

(1) Comparison of SD index for different numbers of clusters.

The SD index is used to determine the number of clusters Num: the larger the SD index, the better the clustering effect, so the number of clusters with the maximum SD value is chosen as the final number of clusters. The SD indices for different numbers of clusters are shown in Table 4 (clustering stops once Num exceeds the number of core clusters a wafer map can actually form, so some cluster counts have no data).
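The selection procedure can be sketched as a loop over candidate cluster counts that keeps the argmax of the index. In this sketch the silhouette coefficient and k-means stand in for the paper's SD index and NPFC clustering stage, purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def select_num_clusters(points, candidates=(2, 3, 4, 5)):
    """Pick the cluster count maximizing a validity index, as in Table 4.

    The silhouette coefficient is used here as a stand-in for the SD index;
    the clustering step likewise uses k-means instead of NPFC.
    """
    scores = {}
    for num in candidates:
        if num >= len(points):  # stop when Num exceeds what the data supports
            break
        labels = KMeans(n_clusters=num, n_init=10,
                        random_state=0).fit_predict(points)
        scores[num] = silhouette_score(points, labels)
    best = max(scores, key=scores.get)  # index is "larger is better"
    return best, scores
```

On a map with two well-separated defect clusters, the index peaks at Num = 2, mirroring how Table 4 identifies the final cluster number.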

Table 4 Comparison of SD index for different numbers of clusters

From Table 4, it can be seen that only the wafer image (c) has a final cluster number of 3, while the other images have a cluster number of 2. Upon comparison, these results align with the actual number of clusters that should be present in the wafer images. Similarly, for the remaining 110 wafer images, the determined number of clusters also meets the expected requirements. Therefore, the method used in this paper to determine the number of clusters is feasible.

(2) Comparison of the NPFC clustering and K-means clustering.

The NPFC method is a filtering technique based on the clustering process. Therefore, this paper compares the clustering process of the NPFC method with the traditional K-means clustering method using four clustering indicators: DBI, DI, CH, and GDI. The clustering results for 12 wafer images using different clustering methods are shown in Table 5, while the clustering results for 110 wafer images are illustrated in Fig. 10.
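Two of the four internal indices (DBI and CH) are available directly in scikit-learn, which makes such a comparison easy to reproduce; DI and GDI are not provided and must be implemented separately. A minimal sketch on synthetic defect-point coordinates (the real experiments use the wafer-map coordinates):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score

# Synthetic (x, y) defect coordinates forming two well-separated clusters;
# these stand in for the defect points of an actual wafer map.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal((-5, 0), 0.5, (50, 2)),
                    rng.normal((5, 0), 0.5, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

dbi = davies_bouldin_score(points, labels)    # lower is better
ch = calinski_harabasz_score(points, labels)  # higher is better
```

Comparing these scores for the labels produced by two clustering methods on the same points reproduces the kind of comparison reported in Table 5 and Fig. 10.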

Table 5 Comparison of four indexes of different clustering methods for 12 wafer maps
Fig. 10 Comparison of the optimal clustering results for the 110 wafer images

From the comparison of the four clustering indicators for the 12 wafer images in Table 5, it is evident that the NPFC algorithm has an absolute advantage over the K-means algorithm in terms of the DBI, DI, and GDI indicators. For the CH indicator, both algorithms perform similarly. Additionally, from the comparison of the optimal clustering indicators for the 110 wafer images shown in Fig. 10, it can be observed that the neighborhood path clustering outperforms the K-means clustering. Therefore, the neighborhood path filtering algorithm proposed in this paper is superior.

(3) Comparison of NPFC algorithm and Adjacent Clustering algorithm.

Adjacency Clustering (AC) [18] is a graph-theoretic clustering method, also designed for mixed-type wafer maps. This paper compares the clustering performance of the NPFC method with that of the AC method using the clustering metrics and the 12 wafer maps referenced in [22]. The comparison results are shown in Tables 6 and 7.

Table 6 Comparison of CH and GDI index of different clustering methods
Table 7 Comparison of RI, ARI, and NMI of different clustering methods

From the comparison of the clustering metrics CH and GDI in Table 6, it is clear that the clustering performance of the method proposed in this paper is significantly better than that of the Adjacency Clustering (AC) method. Furthermore, as shown in Table 7, for the RI, ARI, and NMI indicators, the NPFC method presented in this paper demonstrates a decisive advantage over the AC method, with higher scores in all three metrics. Overall, the clustering effectiveness of the NPFC algorithm is superior to that of the Adjacency Clustering method.

The clustering of mixed defect-pattern wafer maps with the iWMM method is shown in Fig. 11. The visual results in Fig. 11 show that this approach also achieves good clustering performance.

Fig. 11 iWMM result of wafer map

3.3.3 Algorithm Complexity Analysis

For two-dimensional data, the basic time complexity of the traditional DBSCAN algorithm is \(O({n}^{2})\) [29]. The graph-theory-based Adjacency Clustering (AC) algorithm also has a time complexity of \(O({n}^{2})\), because it evaluates the separation function and the deviation function for every pair of distinct defect points [24]. In contrast, when determining the number of clusters, the NPFC algorithm only repeats the cluster merging and removal process four times, so the time complexity of the neighborhood path filtering algorithm is \(O(4n)\), that is, \(O(n)\).

The space complexity of the traditional DBSCAN algorithm is \(O({n}^{2})\). The Adjacency Clustering (AC) algorithm requires storing the relationships and weights between defect points, as well as a binary label for each defect point, resulting in a space complexity of \(O({n}^{2})\). In contrast, the Neighborhood Path Filtering algorithm only needs to store the relevant metric data for four different cluster numbers during the selection process. Consequently, the space complexity of the Neighborhood Path Filtering algorithm is \(O(4n)\), that is, \(O(n)\). Based on the complexity analysis, the proposed method in this paper is more efficient than both the traditional DBSCAN algorithm and the AC clustering algorithm.

4 Conclusion

This paper proposes a Neighborhood Path Filtering Clustering (NPFC) algorithm, based on clustering concepts, for wafer maps with mixed defect patterns. The algorithm grows clusters within the neighborhood range of the defect points on the wafer map. It determines the number of clusters using a comprehensive metric combining the silhouette coefficient and the Mahalanobis distance, and decides whether clusters should be merged using indicators designed from compactness, inter-cluster distances, and the radius relationships between clusters and the wafer-map center. Unlike DBSCAN, the NPFC algorithm does not need to confirm defect points as core points or to test whether points are density-reachable or density-connected to core points, yet it can still effectively identify clusters of arbitrary shapes. This paper compares the proposed method with existing methods in terms of filtering performance (where it outperforms median filtering on the 12 mixed-type wafer maps), clustering performance, and the infinite warped mixture (iWMM) clustering effect. Both the visual results and the quantitative comparison of the various metrics consistently indicate that the proposed method is superior.

The data that support the findings of this study are available from the corresponding author upon reasonable request.