Case Study: Using Data Science to Detect Defects In Semiconductors

October 1, 2015 Anirudh Kondaveeti

Data science is being used to help detect defects in many industries, including manufacturing. This post explains how data science was recently applied to mechanical and materials engineering in semiconductor manufacturing, through a real-world case study from a project we completed. Ultimately, we used defect data from large manufacturing data sets to identify a series of defect patterns and improve both yield and profitability.

As we know from lean principles, automation and efficiency are key parts of any manufacturing process, and semiconductor manufacturing is no different: it involves many steps, and each run is characterized by thousands of measurements. Advances in data storage and analytics have enabled us to track thousands of variables across the process. Below, we break down the analysis step by step, covering de-noising, preprocessing, feature extraction, dimensionality reduction, outlier detection, and clustering.

The Goal and Process

Our work had a key, overarching goal: to identify die failure patterns on a wafer. The identified defect patterns are used by experts to tie back to the process variables for root cause analysis, saving significant money and time through early prevention in the manufacturing process. For background, a die is a small block of semiconducting material on which a circuit is fabricated, and a wafer is the substrate on which multiple dies are fabricated. Dies on a wafer are tested for functional defects using a wafer prober. If the number of non-functional dies on a wafer exceeds a threshold, the wafer is discarded.

The manufacturing process typically involves hundreds of process steps, with wafers moving from step to step in groups of approximately 25 identical wafers in a fabrication facility. The wafers are tested using a circuit probe to characterize the dies on each wafer as good or bad. Once the wafers are tested, a wafer bin map (WBM) is produced, which provides information about the quality of a wafer, including the specific test for which each die has failed. We analyzed this WBM data to detect patterns of defective wafers using various data science techniques. Analyzing the WBMs manually is an extremely time-consuming process. It also leads to a high margin of error when only a subsample of WBMs is examined manually, because the sample is not a good representation of the entire lot.

The various modeling steps are shown in the block diagram below. The WBMs are first preprocessed and denoised using a median filtering technique. Features are extracted from the wafers and a matrix factorization technique is used for feature or dimensionality reduction. The resulting wafers, represented by the reduced feature space, are then clustered, and outliers are identified.

[Block diagram of the modeling steps: WBM preprocessing and de-noising, feature extraction, dimensionality reduction, clustering, and outlier detection.]

De-Noising and Preprocessing

For simplicity, each die on the WBM is considered failed if it fails at least one test; otherwise it is considered functional. A die that fails is marked 1, and a die that passes is marked 0. Since the objective of the work is to identify patterns in the die failures rather than individual failures, it is important to reduce the noise and enhance the signal. To reduce noise in the WBM, a median filtering technique is used, where the median value of die failures in a bin's neighborhood replaces the central bin value. The image of a wafer before and after denoising is shown in Fig. 1, where ucs_x denotes the x axis and ucs_y the y axis in the specified coordinate system. The blue color in the figure denotes that a die on the wafer has failed, while red denotes that the die has passed.

Fig. 1: Wafer bin map images before (left) and after (right) denoising. Blue color denotes that the die on the wafer has failed, while red denotes that the die has passed.
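As an illustration of this de-noising step, here is a minimal sketch using SciPy's median filter on a toy binary wafer map. The grid size, failure rate, and 3x3 filter window are illustrative assumptions, not the actual wafer geometry or settings used in the study.

```python
import numpy as np
from scipy.ndimage import median_filter

# Toy binary wafer bin map: 1 = failed die, 0 = passing die.
# The 10x10 grid and the 3x3 window are illustrative assumptions.
rng = np.random.default_rng(0)
wbm = (rng.random((10, 10)) < 0.1).astype(int)   # scattered "noise" failures
wbm[4:6, 4:6] = 1                                # a small contiguous failure cluster

# Median filtering replaces each die value with the median of its
# neighborhood, suppressing isolated failures while keeping clustered ones.
denoised = median_filter(wbm, size=3)
print(denoised)
```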

Feature Extraction

Once the wafers are denoised, the next step in the analysis is to extract features from each wafer. To create the feature vector, the failure code of each die on the wafer is converted to an array, scanning from the top-left position to the bottom-right position. Each wafer is composed of 1519 dies. As an example, if the die at position 1 on a wafer failed, position 2 passed, position 3 passed, and so on, then the feature vector for that wafer would look like {1, 0, 0, ….. 1}, with a total of 1519 values in the vector. The 1519-dimensional binary feature vector thus represents the positions of die failures on the wafer.
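A minimal sketch of this step, assuming each denoised wafer map is available as a 2-D NumPy array of 0/1 die outcomes; it ignores the circular wafer edge, which in practice means masking out grid positions that hold no die. The names `wafer_to_feature_vector` and `wafers` are illustrative, not from the original work.

```python
import numpy as np

def wafer_to_feature_vector(wbm):
    """Flatten a denoised wafer bin map into a binary feature vector,
    scanning dies from the top-left position to the bottom-right."""
    return np.asarray(wbm, dtype=int).ravel()

# One row per wafer; in the case study each row would have 1519 entries.
# `wafers` is assumed to be an iterable of 2-D wafer bin maps.
# feature_matrix = np.vstack([wafer_to_feature_vector(w) for w in wafers])
```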

Dimensionality Reduction

Since the feature vector representing each wafer is high-dimensional (1519 dimensions), we employed a feature reduction technique to reduce the number of dimensions. The advantages of feature reduction are threefold: it accounts for the collinearity among dies, reduces the computational complexity, and makes the lower-dimensional vectors easier to visualize. In this step, the feature vectors from all the wafers are first arranged in the form of a matrix. A feature reduction technique, Non-negative Matrix Factorization (NMF), is then used to reduce the dimensionality of the feature space. Alternative techniques for dimensionality reduction, such as Singular Value Decomposition (SVD) or Principal Component Analysis (PCA), could also have been used.


Fig. 2: Visualizing 130 wafers in two dimensions after dimensionality reduction.
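A hedged sketch of the matrix factorization step using scikit-learn's NMF; the original work did not necessarily use this library. The two-component setting mirrors the two-dimensional view in Fig. 2, and the random stand-in matrix is only there to make the snippet self-contained.

```python
import numpy as np
from sklearn.decomposition import NMF

# Illustrative stand-in for the real (n_wafers x 1519) binary failure matrix.
rng = np.random.default_rng(0)
feature_matrix = (rng.random((130, 1519)) < 0.05).astype(float)

# Factorize into two components to mirror the 2-D view in Fig. 2;
# in practice K would be chosen from the energy captured by the factors.
nmf = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
W = nmf.fit_transform(feature_matrix)   # (130 x 2) reduced wafer coordinates
H = nmf.components_                     # (2 x 1519) basis failure patterns
```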

Outlier Detection and Clustering

Outliers are wafers that differ markedly from the rest of the population. Since the outliers do not fall into any specific pattern, we first remove them from the wafer data. To detect outlying wafers, each wafer is first represented in a lower-dimensional space with K dimensions, using the NMF technique described above. In general, the number of dimensions to reduce to depends on the energy captured by the first K dimensions. However, for visualization purposes the data is reduced to a two-dimensional space. Fig. 2 gives the two-dimensional representation of a sample of 130 wafers after applying the NMF technique.

Fig. 2 also shows that some wafers are very distant from the others in the two-dimensional space; there are clear outliers in the data. For each point in Fig. 2, the sum of Euclidean distances to every other point is computed and used as an outlier score. If a point lies far from all other points, its score will be higher, denoting that it is an outlier. The outlier scores for each of the 130 wafers are shown in Fig. 3. Wafers with high scores, e.g. wafer no. 100, clearly lie at a greater distance from all other wafers, which is also visible in Fig. 2. The wafer bin maps for some of the outliers detected using this method are shown in Fig. 4. It can be clearly seen that these wafers have many more failing dies than other wafers, i.e., far more blue dies. However, this method of outlier detection doesn't scale well as the number of wafers increases. For N wafers, the computational cost is approximately O(N²), since the distance between each pair of wafers needs to be calculated. As an alternative, outlier detection measures like Local Outlier Factor (LOF) can be used, where the distance to the K nearest neighbors is used to estimate the density around a point and detect outliers.

Fig. 3: Outlier scores for 130 wafers using Euclidean distance.

Fig. 4: Wafers with high outlier scores. Blue color denotes that the die on the wafer has failed, while red denotes that the die has passed.
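The sketch below illustrates both ideas: the pairwise-distance score described above and the LOF alternative, here via scikit-learn. The reduced coordinates are simulated stand-ins for the NMF output, and the neighbor count for LOF is an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import LocalOutlierFactor

# W is the (n_wafers x 2) reduced representation from the NMF step;
# random points stand in for it here.
rng = np.random.default_rng(0)
W = rng.random((130, 2))

# Outlier score: sum of Euclidean distances to every other wafer.
# This is the O(N^2) approach described in the text.
scores = squareform(pdist(W, metric="euclidean")).sum(axis=1)
top_outliers = np.argsort(scores)[::-1][:5]   # wafers with the highest scores

# Alternative that scores each wafer by its k-nearest-neighbor density.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(W)                   # -1 marks detected outliers
```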

Once the outlying wafers are identified, they are removed from the lot, and the remaining wafers are clustered to obtain wafer groups. The 1519-dimensional feature vectors obtained from each wafer are used for clustering. A k-means clustering algorithm, available in Apache MADlib (incubating), is used to group the wafers into 20 clusters with random initial seeding. The number of clusters (20) was chosen arbitrarily; however, a simulation-based approach could be used to tune this parameter. Some of the clusters obtained by this method are shown in Fig. 5. Clearly, wafers with similar defect patterns, i.e., ones with failing dies in the center (denoted by blue color), were grouped into one cluster. As explained up front, this method helped identify the common defect patterns occurring in the manufacturing process.

Fig. 5: Wafers belonging to a single cluster with the same defect pattern (defect in the center). Blue color denotes that the die on the wafer has failed, while red denotes that the die has passed.
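The original work ran k-means inside the database via Apache MADlib. As a rough in-memory equivalent, the sketch below shows the same grouping with scikit-learn; the cluster count of 20 and random seeding are taken from the text, while the stand-in data and other settings are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative stand-in for the (n_wafers x 1519) binary feature matrix
# of the non-outlying wafers.
rng = np.random.default_rng(0)
feature_matrix = (rng.random((500, 1519)) < 0.05).astype(float)

# 20 clusters with random initial seeding, as in the case study; the
# cluster count is a tunable parameter rather than a principled choice.
kmeans = KMeans(n_clusters=20, init="random", n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(feature_matrix)

# Wafers sharing a label exhibit similar spatial defect patterns,
# e.g. failing dies concentrated at the wafer center.
```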

Summarizing the Data Science Process

To recap, we have discussed a data science-driven approach for defect identification in semiconductor manufacturing processes. The defect patterns we identified from the wafers were then correlated back to the process parameters for root cause analysis. The resulting improvements ultimately impact profitability. Beyond this specific use case, data science techniques can be applied to analyze the big data arising from process monitoring, to enhance yield, or to identify relationships among the various complex processes.


About the Author

Anirudh Kondaveeti

Anirudh Kondaveeti is a Principal Data Scientist and the lead for Security Data Science at Pivotal. Prior to joining Pivotal, he received his B.Tech from the Indian Institute of Technology (IIT) Madras and his Ph.D. from Arizona State University (ASU), specializing in machine learning and spatio-temporal data mining. He has developed statistical models and machine learning algorithms to detect insider and external threats and "needle-in-a-haystack" anomalies in machine-generated network data for leading industries.
