faris-k / fastsiam-wafers

Self-Supervised Representation Learning of Wafer Maps with FastSiam

License: MIT License

Python 1.06% Jupyter Notebook 98.94%

computer-vision metric-learning python pytorch-lightning self-supervised-learning semiconductor

fastsiam-wafers's Introduction

Hi there 👋🤖🎨

Check this out...

The first image was created using text2img with Stable Diffusion v1.4. Final result was created using a mix of img2img loopback from the initial image and inpainting to fix details and create new backgrounds. That was done on SD v1.5, specifically Inkpunk Diffusion to get the style. All images were made using Automatic1111's Stable Diffusion web UI.

fastsiam-wafers's Issues

Use Albumentations instead of Torchvision Transforms

Minor enhancement, but since speed is a concern with self-supervised pretraining, use Albumentations instead of torchvision and lightly's transforms. Throughput is reportedly much higher: https://github.com/albumentations-team/albumentations#benchmarking-results

Segmentation fault, CUDNN_STATUS_EXECUTION_FAILED, and launch time out

Consistent crashes occur with any nonzero value of num_workers on Windows systems. Multiprocessing on Linux doesn't seem to have this issue, but the added abstraction of WSL2 leads to low GPU utilization and thus lower throughput.

After painful trial and error, here are a few observations.

Even running lightly's imagenette benchmark leads to crashes
When benchmark files are mounted on the same drive as the conda / cuda installations, there seems to be no issue running lightly's imagenette benchmark. This may explain why wafer map benchmarks never crashed on the TI system...
Moving this repo's benchmarks to the same drive as the conda / cuda installation still leads to a crash, but it doesn't seem to be related to CUDA. It may be related to data loading, since the script consistently crashes before the start of the first training epoch. Could it be a segmentation fault?

Possible course of action:

Total revamp of data loading. Specifically, the custom WaferMapDataset needs to be looked at. Data is loaded from a Pandas dataframe, specifically two Pandas series of numpy arrays (more or less a ragged array structure). The wafer map Series is converted to a list of different-sized tensors. PyTorch's nested tensors may be a better data structure for this. These are experimental and may change with the upcoming PyTorch 2.0 release though.
Custom transforms may need to be overhauled. The only non-torch library used is numpy, which shouldn't cause issues? But this may need investigation.
Move everything to the same drive bro 😒🤦
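For the data-loading revamp listed above, one simple alternative to a list of different-sized tensors is to pad every wafer map into a single dense array and keep a validity mask. This is only a sketch of that idea, not the repo's `WaferMapDataset` and not PyTorch's nested tensors; `pad_batch` and its `fill` parameter are hypothetical names, and it uses numpy only.

```python
import numpy as np


def pad_batch(wafer_maps, fill=0):
    """Pad a list of 2D arrays of different shapes into one (N, H, W) array.

    Hypothetical helper: returns the padded batch and a boolean mask that is
    True wherever a value came from an original wafer map.
    """
    max_h = max(m.shape[0] for m in wafer_maps)
    max_w = max(m.shape[1] for m in wafer_maps)
    n = len(wafer_maps)

    batch = np.full((n, max_h, max_w), fill, dtype=wafer_maps[0].dtype)
    mask = np.zeros((n, max_h, max_w), dtype=bool)

    for i, m in enumerate(wafer_maps):
        h, w = m.shape
        batch[i, :h, :w] = m   # copy the map into the top-left corner
        mask[i, :h, :w] = True  # remember which cells are real data

    return batch, mask


# Two maps with the shapes used elsewhere in these notes
maps = [np.full((41, 33), 128, dtype=np.uint8),
        np.full((30, 24), 128, dtype=np.uint8)]
batch, mask = pad_batch(maps)
print(batch.shape)  # (2, 41, 33)
```

A dense array like this trades memory for simplicity; whether it helps with the crashes described here is untested.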

Fix benchmarking scripts to correctly pass `knn_k` to each model (quick fix is just to unpack any additional arguments). Re-run all benchmarking scripts 🙃
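The "unpack any additional arguments" quick fix could look roughly like the sketch below. `KNNBenchmarkModule` here is a plain-Python stand-in written for illustration (not the real base class), and `BrokenModel` / `FixedModel` are hypothetical names showing one way a `knn_k` value might be silently dropped and how forwarding `**kwargs` avoids it.

```python
class KNNBenchmarkModule:
    """Stand-in for the real benchmark base class (assumption, simplified)."""

    def __init__(self, dataloader_kNN, num_classes, knn_k=200, knn_t=0.1):
        self.dataloader_kNN = dataloader_kNN
        self.num_classes = num_classes
        self.knn_k = knn_k
        self.knn_t = knn_t


class BrokenModel(KNNBenchmarkModule):
    def __init__(self, dataloader_kNN, num_classes, knn_k=200):
        # knn_k is accepted but never forwarded, so the base default wins
        super().__init__(dataloader_kNN, num_classes)


class FixedModel(KNNBenchmarkModule):
    def __init__(self, dataloader_kNN, num_classes, **kwargs):
        # Forward every extra keyword argument (knn_k, knn_t, ...) unchanged
        super().__init__(dataloader_kNN, num_classes, **kwargs)


print(BrokenModel(None, 9, knn_k=25).knn_k)  # 200
print(FixedModel(None, 9, knn_k=25).knn_k)   # 25
```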

tensorboard embedding projector not working

Also include torchmetrics' ConfusionMatrix. Create a PyTorch Lightning-style `test_step` to handle this at the end. Logging the output of this can be a bit tricky.

Point people in the new direction

Anyone looking for FastSiam should head to Lightly, which will soon have its own FastSiam implementation: lightly-ai/lightly#1130 (comment)
For SSL of wafer maps, point to updated repository at https://github.com/faris-k/self-supervised-wafermaps

Migrate to a new repo

This repo was meant simply for a class project and FastSiam (hence the name). It has expanded far beyond its scope 🙂

Create a new repo focused on SSL applied to wafer maps
Either deprecate this repo or add something in the README to point the reader to the correct repo
PMSN implementation... repo for that?

Move all models to one location (a zoo 🦏🦍🦒🐘🐅)

Currently, the benchmarking scripts are entirely self-contained, which is fine for running the benchmarks themselves, but this makes inference on trained models messy. To avoid spaghetti code 🍝,

Move all kNN benchmarking models (i.e. KNNBenchmarkModule and child classes) to a knn.py in models/
Maybe create an SSL model zoo in addition to the "kNN zoo" for those instances where I don't need kNN evaluation during training

This will make training and feature extraction much more streamlined, i.e.

import pytorch_lightning as pl
import torch

from models.knn import MAE

# Create a model and load a checkpoint
model = MAE()
ckpt_path = "bruh.ckpt"
model.load_state_dict(torch.load(ckpt_path)["state_dict"])

# Run inference (feature extraction)
trainer = pl.Trainer()
preds = trainer.predict(model, inference_dataloader)
preds = torch.cat(preds, dim=0)

Die per wafer transform to simulate fixed root cause on different DPW products

Novel idea: learn a DPW-invariant representation of a failure shading by including a DPW transform. Consider the wafer in the middle. Could I transform it to the lower DPW variant (left) and higher DPW variant (right)?

Working method of going from right to left: take the central relative coordinate of each die and map it to a smaller matrix. Toy example with a matrix of just 1's and 2's would look something like this:

For wafer maps, simple implementation below:

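import numpy as np


# Encoding used below: 128 = passing die, 255 = failing die; everything else stays 0
def dpw_transform(original_matrix, scale):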
    # Calculate the new dimensions of the matrix after scaling down
    h, w = original_matrix.shape
    new_h = int(h * scale)
    new_w = int(w * scale)
    new_dim = (new_h, new_w)

    # Find the indices of the passing elements in the original matrix
    passing_indices = np.argwhere(original_matrix == 128)

    # Find the indices of the failing elements in the original matrix
    failing_indices = np.argwhere(original_matrix == 255)

    # Calculate the relative central coordinate of the passing and failing elements in the original matrix
    pass_coords = (passing_indices + 0.5) / original_matrix.shape
    fail_coords = (failing_indices + 0.5) / original_matrix.shape

    # Calculate the central coordinates of the passing and failing elements in the new matrix
    new_pass_coords = (pass_coords * new_dim).astype(int)
    new_fail_coords = (fail_coords * new_dim).astype(int)

    # Create the (new_h, new_w) matrix
    new_matrix = np.zeros(new_dim, dtype=int)

    # Assign the passing and failing elements in the new matrix
    new_matrix[new_pass_coords[:, 0], new_pass_coords[:, 1]] = 128
    new_matrix[new_fail_coords[:, 0], new_fail_coords[:, 1]] = 255

    return new_matrix

Example of a wafer resized by a scale of 0.75 (from shape (41, 33) to (30, 24))

As expected, more yield loss on the wafer with higher die size / lower DPW.

Enhancement Idea

The approach outlined above doesn't work from left to right in the Yield Model image (turn a wafer into a higher DPW version of itself) since taking the central die coordinates of the original wafer won't always map to every die on the higher DPW wafer map. Case in point:

How do we "sparsify" defects but still keep shadings like scratches and rings "connected" on the higher DPW part?...
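The gap problem can be checked numerically with a stripped-down version of the same centre-coordinate mapping. This sketch ignores the wafer outline and die values and only tracks which cells of the target grid receive at least one source cell; `forward_map` is a hypothetical helper, and the 1.5 scale is an arbitrary choice for illustration.

```python
import numpy as np


def forward_map(shape, scale):
    """Map every source cell centre to a target cell; return the target hit mask."""
    h, w = shape
    new_h, new_w = int(h * scale), int(w * scale)

    # Row and column index of every cell in the source grid
    rows, cols = np.indices((h, w))

    # Relative centre coordinate of each source cell, scaled to the target grid
    new_rows = ((rows + 0.5) / h * new_h).astype(int)
    new_cols = ((cols + 0.5) / w * new_w).astype(int)

    hit = np.zeros((new_h, new_w), dtype=bool)
    hit[new_rows, new_cols] = True
    return hit


down = forward_map((41, 33), 0.75)  # lower DPW: (30, 24) target
up = forward_map((41, 33), 1.5)     # higher DPW: (61, 49) target

print(down.all())          # True: every target cell is covered
print(up.sum(), up.size)   # 1353 2989: fewer than half the target cells are covered
```

Going to a higher DPW, at most one target cell is filled per source cell, so the uncovered cells need some other rule (for example, looking up the nearest source die for each target die), which fills the gaps but does not by itself sparsify defects.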

knn_k isn't being used anywhere 😢😿

Benchmarks were all run using knn_k=200, the default value in KNNBenchmarkModule. No wonder accuracy/F1 scores never went higher than 0.5-0.6 ☹️. Note that for kNN classifiers, choosing a value of $k$ this high could make it impossible to predict minority classes, which I highly suspect is what's going on here.
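A toy illustration of why a large $k$ can shut out minority classes, assuming a plain unweighted majority vote. The class names and counts are made up, and the benchmark's actual kNN predictor may weight neighbours by similarity, which would soften this effect rather than reproduce it exactly.

```python
from collections import Counter


def majority_vote(neighbor_labels, k):
    """Unweighted kNN vote over the k nearest labels (already sorted by distance)."""
    return Counter(neighbor_labels[:k]).most_common(1)[0][0]


# Best case for the minority class: all 30 of its samples are the nearest
# neighbours of the query, followed by 970 majority-class samples.
sorted_labels = ["scratch"] * 30 + ["none"] * 970

print(majority_vote(sorted_labels, k=25))   # scratch
print(majority_vote(sorted_labels, k=200))  # none (30 votes vs 170)
```

Under this voting rule, a class with 30 samples cannot win at k=200 against a single competing class, no matter how well the embeddings separate it.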

Tasks: