antixk / pytorch-model-compare

Compare neural networks by their feature similarity

License: MIT License

Python 100.00%

pytorch deep-learning neural-networks cka transformers imagenet feature-extraction pip torch-cka

pytorch-model-compare's Introduction

PyTorch Model Compare

A tiny package to compare two neural networks in PyTorch. There are many ways to compare two neural networks, but one robust and scalable way is using the Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.

Centered Kernel Alignment

Centered Kernel Alignment (CKA) is a representation similarity metric that is widely used for understanding the representations learned by neural networks. Specifically, CKA takes two feature maps (representations) X and Y as input and computes their normalized similarity in terms of the Hilbert-Schmidt Independence Criterion (HSIC):

$$\mathrm{CKA}(K, L) = \frac{\mathrm{HSIC}(K, L)}{\sqrt{\mathrm{HSIC}(K, K)\,\mathrm{HSIC}(L, L)}}$$

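Here, K and L are the similarity (Gram) matrices of X and Y, respectively. However, this formula does not scale to deep architectures and large datasets, since K and L grow quadratically with the number of examples. Therefore, a minibatch version can be constructed that averages an unbiased estimator of the HSIC, HSIC_1, over k minibatches:

$$\mathrm{CKA}_{\text{minibatch}} = \frac{\frac{1}{k}\sum_{i=1}^{k}\mathrm{HSIC}_1(K_i, L_i)}{\sqrt{\frac{1}{k}\sum_{i=1}^{k}\mathrm{HSIC}_1(K_i, K_i)}\,\sqrt{\frac{1}{k}\sum_{i=1}^{k}\mathrm{HSIC}_1(L_i, L_i)}}$$

$$\mathrm{HSIC}_1(K, L) = \frac{1}{n(n-3)}\left(\mathrm{tr}(\tilde{K}\tilde{L}) + \frac{\mathbf{1}^\top\tilde{K}\mathbf{1}\,\mathbf{1}^\top\tilde{L}\mathbf{1}}{(n-1)(n-2)} - \frac{2}{n-2}\,\mathbf{1}^\top\tilde{K}\tilde{L}\mathbf{1}\right)$$

where n is the minibatch size and \tilde{K}, \tilde{L} are K and L with their diagonal entries set to zero.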

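This minibatch form of CKA is from the ICLR 2021 paper by Nguyen T., Raghu M., and Kornblith S., "Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth".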
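The minibatch CKA described above can be sketched in a few lines of NumPy. This is a standalone illustration, not torch_cka's own code. It assumes a linear kernel (K = X X^T), plain NumPy arrays with one row per example, and the same examples in the same order for both representations. The helper names `hsic1` and `minibatch_linear_cka` are hypothetical.

```python
import numpy as np


def hsic1(K: np.ndarray, L: np.ndarray) -> float:
    """Unbiased HSIC estimator for n x n Gram matrices K and L (requires n > 3):

    HSIC_1 = [tr(K~ L~) + (1^T K~ 1)(1^T L~ 1) / ((n-1)(n-2))
              - 2/(n-2) * 1^T K~ L~ 1] / (n(n-3)),
    where K~, L~ are K, L with their diagonals set to zero.
    """
    n = K.shape[0]
    K_t, L_t = K.copy(), L.copy()
    np.fill_diagonal(K_t, 0.0)  # K~
    np.fill_diagonal(L_t, 0.0)  # L~
    ones = np.ones(n, dtype=K.dtype)
    trace_term = np.trace(K_t @ L_t)
    sum_term = (ones @ K_t @ ones) * (ones @ L_t @ ones) / ((n - 1) * (n - 2))
    cross_term = 2.0 / (n - 2) * (ones @ K_t @ L_t @ ones)
    return float((trace_term + sum_term - cross_term) / (n * (n - 3)))


def minibatch_linear_cka(x_batches, y_batches) -> float:
    """Minibatch CKA with a linear kernel, averaging HSIC_1 over minibatches.

    Each pair (X, Y) holds features for the same n examples; any trailing
    feature dimensions are flattened into one row per example.
    """
    kl = kk = ll = 0.0
    num_batches = 0
    for X, Y in zip(x_batches, y_batches):
        X = X.reshape(X.shape[0], -1)
        Y = Y.reshape(Y.shape[0], -1)
        K, L = X @ X.T, Y @ Y.T  # linear-kernel Gram matrices
        kl += hsic1(K, L)
        kk += hsic1(K, K)
        ll += hsic1(L, L)
        num_batches += 1
    mean_kl, mean_kk, mean_ll = kl / num_batches, kk / num_batches, ll / num_batches
    return float(mean_kl / (np.sqrt(mean_kk) * np.sqrt(mean_ll)))


rng = np.random.default_rng(0)
xs = [rng.normal(size=(32, 64)) for _ in range(5)]  # 5 minibatches of 32 examples
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))      # random orthogonal matrix

print(minibatch_linear_cka(xs, xs))                         # ~1.0 (identical features)
print(minibatch_linear_cka(xs, [3.0 * x @ q for x in xs]))  # ~1.0 (rotated and rescaled)
```

CKA is invariant to orthogonal transformations and isotropic scaling of a representation, so a rotated and rescaled copy of the same features also scores about 1. The 1/k averages cancel in the ratio. They are kept only to mirror the formula.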

Getting Started

Installation

pip install torch_cka

Usage

from torch.utils.data import DataLoader
from torchvision.models import resnet18, resnet34
from torch_cka import CKA

model1 = resnet18(pretrained=True)  # Or any neural network of your choice
model2 = resnet34(pretrained=True)

# your_dataset, batch_size and the layer_names_* lists are placeholders to fill in
dataloader = DataLoader(your_dataset,
                        batch_size=batch_size,  # according to your device memory
                        shuffle=False)          # don't forget to seed your dataloader

cka = CKA(model1, model2,
          model1_name="ResNet18",   # good idea to provide names to avoid confusion
          model2_name="ResNet34",
          model1_layers=layer_names_resnet18,  # list of layers to extract features from;
          model2_layers=layer_names_resnet34,  # all layers are used if not provided
          device='cuda')

cka.compare(dataloader)  # a secondary dataloader (for model2) is optional

results = cka.export()  # returns a dict that contains the model names, layer names
                        # and the CKA matrix
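The usage example leaves `layer_names_resnet18` and `layer_names_resnet34` undefined. Here is a minimal sketch of building such a list. It assumes that torch_cka matches the layer lists against the module names returned by `nn.Module.named_modules()`; check this against your installed version. The small `nn.Sequential` model is a hypothetical stand-in. For torchvision's ResNet18, the same loops yield names such as `layer1` or `layer1.0.conv1`.

```python
import torch.nn as nn

# Hypothetical stand-in for a real network such as ResNet18
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# Every named submodule ("" is the root module itself, so it is skipped)
all_layers = [name for name, _ in model.named_modules() if name]

# Only convolution and fully connected layers, to keep memory use down
selected_layers = [
    name for name, module in model.named_modules()
    if isinstance(module, (nn.Conv2d, nn.Linear))
]

# state_dict() keys name parameters and buffers (e.g. "0.weight"), not modules
param_keys = list(model.state_dict().keys())

print(all_layers)       # ['0', '1', '2', '3', '4']
print(selected_layers)  # ['0', '4']
print(param_keys[:3])   # ['0.weight', '0.bias', '1.weight']
```

Under the assumption above, keys taken from `state_dict()` are not suitable layer names, because none of them coincides with a module name.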

Examples

torch_cka can be used with any PyTorch model (any subclass of nn.Module), including pretrained models from popular sources such as torch.hub, timm and Hugging Face. Some examples of where this package can come in handy are illustrated below.

Comparing the effect of Depth

A simple experiment is to analyse the features learned by two architectures of the same family (ResNets) but of different depths. Taking two ResNets, ResNet18 and ResNet34, pre-trained on the ImageNet dataset, we can analyse the features they produce on, say, CIFAR-10 for simplicity. This comparison is shown as a heatmap below.

We see a high degree of similarity between the two models in the lower layers, as they both learn similar representations from the data. However, at higher layers the similarity decreases, as the deeper model (ResNet34) learns higher-order features that are elusive to the shallower model (ResNet18). Yet they do retain some similarity in their last fc layer, which acts as the classifier.

Comparing Two Similar Architectures

Another use of CKA is in ablation studies. Rather than focusing only on the resulting performance, as many ablation studies do, we can employ CKA to study the internal representations. Case in point: ResNet50 and WideResNet50 (k=2). WideResNet50 has the same architecture as ResNet50, except that its residual bottleneck layers are wider (by a factor of 2 in this case).

The learned features clearly differ after the first few layers. Width has a more pronounced effect on the deeper layers than on the earlier ones, where both networks seem to learn similar features.

As a bonus, here is a comparison between ViT and Swin Transformer (state of the art at the time of writing), pretrained on ImageNet-22k.

Comparing quite different architectures

CNNs have been analysed extensively over the past decade, since AlexNet. We know roughly what sort of features they learn across their layers (through visualizations), and we have put them to good use. One interesting approach is to compare these understandable features with those of newer models that don't permit easy visualization (like recent vision transformer architectures). This has indeed been a hot research topic (see Raghu et al. 2021).

Comparing Datasets

Yet another application is to compare two datasets - preferably two versions of the data. This is especially useful in production where data drift is a known issue. If you have an updated version of a dataset, you can study how your model will perform on it by comparing the representations of the datasets. This can be more telling about actual performance than simply comparing the datasets directly.

This can also be quite useful in studying the performance of a model on downstream tasks and in fine-tuning. For instance, if the CKA score of some features is high across different datasets, those features can be frozen during fine-tuning. As an example, the following figure compares the features of a pretrained ResNet50 on the ImageNet test data and on the VOC dataset. Clearly, the pretrained features have little correlation with the VOC dataset, so we have to resort to fine-tuning to get even satisfactory results.
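One way to turn the freezing idea into a rule is sketched here in plain Python, under stated assumptions. `cka_scores[i]` is the CKA between a layer's features on the original dataset and the same layer's features on the new one; one example is the diagonal of the CKA matrix when a model is compared with itself on two dataloaders. The 0.8 threshold is a hypothetical value, not a recommendation. The function name `frozen_prefix` and the scores are made up for illustration.

```python
from typing import List, Sequence


def frozen_prefix(layer_names: Sequence[str], cka_scores: Sequence[float],
                  threshold: float = 0.8) -> List[str]:
    """Return the leading run of layers whose cross-dataset CKA is >= threshold.

    cka_scores[i] compares layer i's features on the old and new datasets.
    The default threshold is a hypothetical choice.
    """
    frozen = []
    for name, score in zip(layer_names, cka_scores):
        if score < threshold:
            break  # stop at the first layer whose features shift noticeably
        frozen.append(name)
    return frozen


# Made-up scores for illustration only (not results from the figure)
names = ["conv1", "layer1", "layer2", "layer3", "fc"]
scores = [0.95, 0.90, 0.70, 0.85, 0.30]
print(frozen_prefix(names, scores))  # ['conv1', 'layer1']
```

The sketch freezes only a contiguous run of early layers. If an earlier layer were fine-tuned, the inputs to any later frozen layer would change, so a high score for that later layer would no longer be meaningful.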

Tips

If your model is large (many layers or large feature maps), extract features only from selected layers to avoid out-of-memory errors.
If you still want to compare the entire feature map, you can run the comparison several times with a few layers each time and export the results using cka.export(). The exported data can then be concatenated to produce the full CKA matrix.
Give proper model names to avoid confusion when interpreting the results. The code automatically extracts the model name for you by default, but it is good practice to label the models according to your use case.
When providing your dataloader(s) to the compare() function, it is important that they are seeded properly for reproducibility.
When comparing datasets, be sure to set drop_last=True when building the dataloaders. This avoids shape-mismatch errors caused by a smaller final batch, especially with differently sized datasets.
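The concatenation tip can be sketched with NumPy. The sketch makes these assumptions:

- Each dict returned by `cka.export()` holds the matrix under `"CKA"`, with rows for model 1 layers and columns for model 2 layers. Verify these keys against your installed version.
- Layer names are stored under `"model1_layers"` and `"model2_layers"`.
- Every run used the same model 2 layers and an identically seeded dataloader, so the chunks line up.

`merge_row_chunks` is a hypothetical helper, and the toy matrices are placeholders, not results.

```python
import numpy as np


def merge_row_chunks(exports):
    """Stack exported CKA results from runs that each compared a different
    chunk of model 1 layers against the same list of model 2 layers.

    Assumes each export is a dict with a "CKA" matrix (rows = model 1 layers,
    columns = model 2 layers) plus "model1_layers" / "model2_layers" lists.
    GPU tensors should be moved to the CPU (.cpu()) before calling this.
    """
    model2_layers = list(exports[0]["model2_layers"])
    for export in exports[1:]:
        if list(export["model2_layers"]) != model2_layers:
            raise ValueError("all runs must use the same model 2 layers")
    cka = np.concatenate([np.asarray(e["CKA"]) for e in exports], axis=0)
    model1_layers = [name for e in exports for name in e["model1_layers"]]
    return {"CKA": cka, "model1_layers": model1_layers,
            "model2_layers": model2_layers}


# Toy stand-ins for two cka.export() results (values are placeholders)
run_a = {"CKA": np.zeros((2, 3)), "model1_layers": ["layer1", "layer2"],
         "model2_layers": ["a", "b", "c"]}
run_b = {"CKA": np.ones((1, 3)), "model1_layers": ["layer3"],
         "model2_layers": ["a", "b", "c"]}
full = merge_row_chunks([run_a, run_b])
print(full["CKA"].shape)  # (3, 3)
```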

Citation

If you use this repo in your project or research, please cite as -

@software{subramanian2021torch_cka,
    author={Anand Subramanian},
    title={torch_cka},
    url={https://github.com/AntixK/PyTorch-Model-Compare},
    year={2021}
}


pytorch-model-compare's Issues

Comparison between ResNet50 and ViT gives error

!pip install transformers
import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.nn.functional as F
from transformers import ViTFeatureExtractor, ViTModel, ViTConfig, AutoConfig
from torch_cka import CKA

# Modify the model - ResNet
model_Res = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)

# Remove the last layer of the model Res
layers_Res = list(model_Res.children())
model_Res = nn.Sequential(*layers_Res[:-1])

# Set the top layers to be not trainable
count = 0
for child in model_Res.children():
    count += 1
    if count < 8:
        for param in child.parameters():
            param.requires_grad = False

# Modify the model - ViT model
model_trans = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k')
count = 0
for child in model_trans.children():
    count += 1
    if count >= 4:
        for param in child.parameters():
            param.requires_grad = False

layers_trans = list(model_trans.children())
model_trans_top = nn.Sequential(*layers_trans[:-2])

model1 = model_Res
model2 = model_trans_top

cka = CKA(model1, model2,
          model1_name="ResNet50", model2_name="ViT",
          device='cuda')

# dataloader is defined elsewhere in the original notebook
cka.compare(dataloader)

cka.plot_results(save_path="/content/drive/MyDrive/resnet-ViTcompare.png")

I got this error: ValueError: Input image size (32*32) doesn't match model (224*224).

Bug with num_batches?

It looks like there's a subtle bug on this line:
https://github.com/AntixK/PyTorch-Model-Compare/blob/main/torch_cka/cka.py#L156

num_batches = min(len(dataloader1), len(dataloader1))

len(dataloader1) is in there twice, I assume it's supposed to be

num_batches = min(len(dataloader1), len(dataloader2))

AssertionError: Input image size (32*32) doesn't match model (224*224).

Dear Author,
This error occurred when I tried to run resnet_vit_compare.py:
AssertionError: Input image size (32*32) doesn't match model (224*224).
Where do I need to modify it?

Better way to implement

I am trying to use this code to check the CKA similarity of ResNet50, MobileNetV3, EfficientNet, and many others. How can we do this without running into the error?

Works fine with the whole model but raises "NANs" on selected layers.

When I was trying to compare the same model trained on different datasets, I encountered a weird problem:

It works fine when I compare all layers:
cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2')

But, when I try to compare a selected subset of layers:
cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2', model1_layers=list(model1.state_dict().keys())[:5], model2_layers=list(model2.state_dict().keys())[:5])
It raises:

HSIC computation resulted in NANs

Do you have any idea how to fix this? Thank you very much.

getting spurious "HSIC computation resulted in NANs"

Thanks for this great little module! I was able to adapt the code to deal with models suitable for speech recognition (mostly transformers and conformers) and I'm learning a lot from the CKA outputs.

One problem I face is that, for some models, some layers hit this assert after a certain number of batches. Basically, if I try to pass 300 batches of 32 through the model, I end up with a NaN exception at around batch 150. It doesn't seem related to the data, because I shuffle the data and still get the same exception after the same number of batches.

I suspect this is a numerical stability problem. Are there any assumptions about the range of the layer features and outputs?
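One plausible cause is worth ruling out. This is an assumption, not a diagnosis confirmed in this thread. The HSIC terms are sums of products of Gram-matrix entries, so they grow with the fourth power of the activation magnitude. With float32 features (PyTorch's default dtype), large activations can overflow to `inf`, and `inf - inf` then yields NaN. The NumPy sketch below uses made-up activations to show the overflow. `offdiag_gram_trace` is a hypothetical helper that computes one of the HSIC_1 terms.

```python
import numpy as np


def offdiag_gram_trace(features: np.ndarray) -> float:
    """tr(K~ K~) for a linear Gram matrix K = X X^T with its diagonal zeroed,
    computed in the dtype of `features` (one of the terms of HSIC_1(K, K))."""
    X = features.reshape(features.shape[0], -1)
    K = X @ X.T
    np.fill_diagonal(K, 0)
    return float(np.trace(K @ K))


rng = np.random.default_rng(0)
# Hypothetical large, non-negative activations: 8 examples x 1000 units
acts = 1e8 * rng.uniform(0.5, 1.5, size=(8, 1000))

with np.errstate(over="ignore", invalid="ignore"):
    t32 = offdiag_gram_trace(acts.astype(np.float32))  # exceeds float32 max (~3.4e38)
    t64 = offdiag_gram_trace(acts.astype(np.float64))  # fits in float64
    t32_scaled = offdiag_gram_trace((acts / 1e8).astype(np.float32))  # rescaled first

print(t32, t32 - t32)                             # inf nan
print(np.isfinite(t64), np.isfinite(t32_scaled))  # True True
```

CKA is unchanged when a whole representation is multiplied by a constant. Casting features to float64, or dividing them by one fixed constant (the same for every batch), therefore avoids this overflow without changing the result. Rescaling each batch differently would change the minibatch average.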

Comparing Models for Different Tasks (Image Classification vs. Object Detection)?

Thanks for sharing the code. I see that the models in this repo's examples are all for classification. I was wondering whether we can compare two models that perform different tasks, such as one for classification (ResNet) and the other for object detection (YOLO)?

Thanks!

"HSIC computation resulted in NANs"

Did anyone solve the "HSIC computation resulted in NANs" issue? Any suggestions on how to address it would be helpful.

Thank you.

What does X mean in the formulation of HSIC?

I notice that in the formulation of HSIC, K is calculated from X, and X denotes a matrix of activations of p1 neurons for n examples. But in the code, X seems to be the feature map after a layer. I don't know if I am wrong; could you help me?
Take an example: use CIFAR-10 as the dataset, whose images are 32*32, and let the first layer be Conv2d(in_channels=3, out_channels=6, kernel_size=5). The feature map after this layer would be 6*28*28, and the neurons could be 6*5*5. Which should I use to calculate CKA?
I read the paper, and I think it should be the representation. Do you think what I said is right?
But some other CKA projects on GitHub use the weights of the neurons, which confuses me. If I want to use traditional CKA to calculate the CKA similarities, where can I find the needed feature map?
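Under the definition quoted in the question (X is an n × p1 matrix of activations for n examples), a convolutional layer's output feature map can be flattened into one row per example. The NumPy sketch below uses random numbers in place of real activations from the hypothetical Conv2d(3, 6, 5) layer on 32*32 inputs. Whether torch_cka flattens features exactly this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
# Random stand-in for Conv2d(3, 6, kernel_size=5) outputs on n 32*32 images
feature_map = rng.normal(size=(n, 6, 28, 28))

# X: one row per example, one column per activation (p1 = 6 * 28 * 28 = 4704)
X = feature_map.reshape(n, -1)
K = X @ X.T  # linear-kernel Gram matrix: n x n, whatever p1 is

# Permuting the activation columns (i.e. a different flattening order)
# leaves K, and therefore CKA, unchanged
perm = rng.permutation(X.shape[1])
K_perm = X[:, perm] @ X[:, perm].T

print(X.shape, K.shape)        # (16, 4704) (16, 16)
print(np.allclose(K, K_perm))  # True
```

For a linear kernel, the Gram matrix is n × n regardless of p1, so the flattening order does not affect CKA.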

AssertionError: HSIC computation resulted in NANs

I tried comparing many EfficientNets with other models (and with their variants), but all I got was this error: AssertionError: HSIC computation resulted in NANs.
One example:

python3 eff_b0b2_compare.py

eff_b0b2_compare.py:

import torch
from torchvision.models import efficientnet_b0, efficientnet_b2 # edit
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
import numpy as np
import random
from torch_cka import CKA

def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)
np.random.seed(0)
random.seed(0)

model1_name, model2_name = 'efficientnet_b0', 'efficientnet_b2' # edit
model1 = efficientnet_b0(pretrained=True) # edit
model2 = efficientnet_b2(pretrained=True) # edit

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])

batch_size = 16 # 256

dataset = CIFAR10(root='../data/',
                  train=False,
                  download=True,
                  transform=transform)

dataloader = DataLoader(dataset,
                        batch_size=batch_size,
                        shuffle=False,
                        worker_init_fn=seed_worker,
                        generator=g,)

cka = CKA(model1, model2,
        model1_name=model1_name, model2_name=model2_name,
        device='cuda')

cka.compare(dataloader)

cka.plot_results(save_path="../exps/{}_{}.jpg".format(model1_name, model2_name))

/home/brcao/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/brcao/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=EfficientNet_B0_Weights.IMAGENET1K_V1`. You can also use `weights=EfficientNet_B0_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
/home/brcao/.local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=EfficientNet_B2_Weights.IMAGENET1K_V1`. You can also use `weights=EfficientNet_B2_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Files already downloaded and verified
/home/brcao/.local/lib/python3.8/site-packages/torch_cka/cka.py:62: UserWarning: Model 1 seems to have a lot of layers. Consider giving a list of layers whose features you are concerned with through the 'model1_layers' parameter. Your CPU/GPU will thank you :)
  warn("Model 1 seems to have a lot of layers. " \
/home/brcao/.local/lib/python3.8/site-packages/torch_cka/cka.py:69: UserWarning: Model 2 seems to have a lot of layers. Consider giving a list of layers whose features you are concerned with through the 'model2_layers' parameter. Your CPU/GPU will thank you :)
  warn("Model 2 seems to have a lot of layers. " \
/home/brcao/.local/lib/python3.8/site-packages/torch_cka/cka.py:145: UserWarning: Dataloader for Model 2 is not given. Using the same dataloader for both models.
  warn("Dataloader for Model 2 is not given. Using the same dataloader for both models.")
| Comparing features |: 100%|██| 40/40 [3:43:19<00:00, 335.00s/it]
Traceback (most recent call last):
  File "eff_b0b2_compare.py", line 45, in <module>
    cka.compare(dataloader)
  File "/home/brcao/.local/lib/python3.8/site-packages/torch_cka/cka.py", line 183, in compare
    assert not torch.isnan(self.hsic_matrix).any(), "HSIC computation resulted in NANs"
AssertionError: HSIC computation resulted in NANs

Any help would be great. Thanks!

Example to compare datasets

Could you add an example of comparing two datasets with a single model?

ValueError: Input image size (32*32) doesn't match model (224*224).

I tried to compare ResNet50 vs. ViT but got this error: ValueError: Input image size (32*32) doesn't match model (224*224).
The code I used is:

import torch.nn as nn
from torchvision.models import resnet50
from transformers import ViTConfig, ViTModel
from torch_cka import CKA

model1 = resnet50(pretrained=False)
layers_Res1 = list(model1.children())
model1 = nn.Sequential(*layers_Res1[:-1])

#model2 = resnet34(pretrained=True)
config = ViTConfig()
model2 = ViTModel(config)

# Create a new model with the last two layers removed
layers_trans = list(model2.children())
model2 = nn.Sequential(*layers_trans[:-2])

cka = CKA(model1, model2,
          model1_name="ResNet50", model2_name="ViT-B")
          #,device='cuda')