fastssl's People

Contributors

arnab39, arnaghosh, kumarkrishna

Forkers

magamba

fastssl's Issues

List of experiments for alpha-ReG

  • Spectral loss implementation: look at target eigenspectrum implementation
  • Compare eigenspectrum and accuracy to BarlowTwins/VIC-REG
  • Can we avoid a higher-dimensional projector? -- robustness to the proj_dim hyperparameter (benchmark: SimCLR's proj_dim)
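
For reference on the first item, a minimal sketch of a spectral loss against a target power-law eigenspectrum. The function names, the scale matching, and the use of eigvalsh are assumptions for illustration, not the repo's implementation:

```python
import torch

def eigenspectrum(feats: torch.Tensor) -> torch.Tensor:
    """Eigenvalues of the feature covariance, sorted in descending order."""
    feats = feats - feats.mean(dim=0, keepdim=True)
    cov = feats.T @ feats / feats.shape[0]
    evals = torch.linalg.eigvalsh(cov)  # symmetric input -> real eigenvalues, ascending
    return evals.flip(0).clamp_min(1e-12)

def spectral_loss(feats: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Penalize deviation of the log-spectrum from a target power law n^(-alpha)."""
    evals = eigenspectrum(feats)
    n = torch.arange(1, evals.numel() + 1, dtype=evals.dtype, device=evals.device)
    target = n.pow(-alpha)
    target = target / target.sum() * evals.sum()  # match overall scale (a design assumption)
    return (evals.log() - target.log()).pow(2).mean()
```

Gradients flow through `eigvalsh`, so this can be used directly as a regularizer during training.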

Merge precache and linear classification into one script

Merge the precaching and linear classification steps so they run from the same Python file. Steps would potentially include:

  • Call linear classification steps after precaching in train_model.py file
  • Update train_model_widthVary.py accordingly
  • Merge configs cc_classifier and cc_precache
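
A rough sketch of what the merged flow could look like; `precache_features` and `train_linear` are illustrative stand-ins, not the actual functions in train_model.py:

```python
import torch
from torch import nn

@torch.no_grad()
def precache_features(backbone, loader):
    """Pass 1: run the frozen backbone once and cache (features, labels)."""
    backbone.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x))
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def train_linear(feats, labels, num_classes, epochs=10, lr=1e-2):
    """Pass 2: fit a linear classifier on the cached features."""
    clf = nn.Linear(feats.shape[1], num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(clf(feats), labels)
        loss.backward()
        opt.step()
    return clf
```

Calling both from one entry point removes the need to reload checkpoints between the cc_precache and cc_classifier configs.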

Installing ffcv environment

Installing a working environment with ffcv (FFCV-SSL) has been painful due to changing dependency versions, so we are keeping track of the instructions that worked.

echo "Creating new conda environment ======"
conda create --prefix $SCRATCH/conda_envs/ffcv_ssl_fastssl -y python=3.9 conda conda-libmamba-solver -c conda-forge
conda activate $SCRATCH/conda_envs/ffcv_ssl_fastssl

echo "Setting up conda env variables and config options =========="
export CONDA_EXE="$(hash -r; which conda)"
conda config --set solver libmamba
conda config --set channel_priority flexible

echo "Installing torch and other important packages+dependencies ============"
conda install -y cupy pkg-config compilers libjpeg-turbo opencv=4.7.0 pytorch=2.1.2 torchvision=0.16.2 pytorch-cuda=12.1 torchmetrics numba=0.56.2 terminaltables matplotlib scikit-learn pandas assertpy pytz -c pytorch -c nvidia -c conda-forge

cd <FFCV-SSL folder>
pip install -e .
pip install wandb tqdm

Note: numba=0.56.2 may not be strictly necessary here, but it was required to install the original ffcv library.

Add pre-caching for linear eval

  • Save features to SLURM_TMPDIR using ffcv dataloader
  • Use simple dataloader to train linear classifier from pre-cached features
  • Compare accuracy numbers against the no-precaching baseline
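
A minimal sketch of the first two items, assuming the extracted features fit in memory; the `/tmp` fallback for non-SLURM runs is an assumption:

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

def save_cached(feats, labels, name="train_feats.pt"):
    """Write cached features to node-local storage (SLURM_TMPDIR under SLURM)."""
    root = os.environ.get("SLURM_TMPDIR", "/tmp")  # /tmp fallback is an assumption
    path = os.path.join(root, name)
    torch.save({"feats": feats, "labels": labels}, path)
    return path

def cached_loader(path, batch_size=256):
    """Simple (non-ffcv) loader over pre-cached features for linear training."""
    blob = torch.load(path)
    return DataLoader(TensorDataset(blob["feats"], blob["labels"]),
                      batch_size=batch_size, shuffle=True)
```

The linear classifier then trains from `cached_loader` without touching the ffcv pipeline again.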

Add support for SimCLR training

  • Incorporate SimCLR loss
  • Incorporate the SimCLR branch in the algorithm if cases and SimCLR arguments in fastargs
  • Fix checkpoint saving name for particular algorithms
  • Test SimCLR training with small convnet
    • cifar
    • stl
  • Test SimCLR training with ResNet
    • cifar
    • stl
  • Run a hyperparameter sweep -- produce a plot similar to the BarlowTwins one
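
For the loss item, a standard NT-Xent sketch; the masking style and temperature default are illustrative and may differ from what the repo ends up using:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (SimCLR) loss for two batches of projections, each N x D."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # 2N x D, unit norm
    sim = z @ z.T / temperature                   # cosine-similarity logits
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))    # exclude self-pairs
    # positive for sample i is its other augmentation at index i +/- n
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```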

Incorporate multiple patches in ffcv ssl dataloader

Reinstall ffcv using FFCV-SSL codebase and incorporate functionality to generate multiple patches

  • Reinstall ffcv using FFCV-SSL code base
  • Test generating 2 augmentations using train.beton files instead of current double_train.beton files
  • Add functionality to generate arbitrary number of patches/augmentations from each image
  • Update forward pass to adhere to new dataloader yield specs (order of images and labels might be different)
    • BarlowTwins
    • SimCLR
    • Precaching
    • Linear evaluation/Finetuning
  • Update filename while saving to include number of patches
  • Test running 10 epochs of SSL and 200 epochs of linear evaluation
    • BarlowTwins
    • SimCLR
  • Run hyperparameter search to compare performance for 2, 4, 8 and 16 patches
    • BarlowTwins
    • SimCLR
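
For the forward-pass updates, one simple way to generalize a two-view loss to n patches is to average it over all patch pairs. A sketch, where `multi_patch_loss` is a hypothetical helper; averaging over all pairs is one design choice, and pairing every view against a single anchor view is a cheaper alternative:

```python
import itertools
import torch

def multi_patch_loss(patch_embeddings, pair_loss):
    """Average a two-view SSL loss over all pairs of patch embeddings.

    patch_embeddings: list of n tensors (one embedding batch per augmentation).
    pair_loss: any two-view criterion (e.g. a BarlowTwins or SimCLR loss).
    """
    pairs = itertools.combinations(range(len(patch_embeddings)), 2)
    losses = [pair_loss(patch_embeddings[i], patch_embeddings[j])
              for i, j in pairs]
    return torch.stack(losses).mean()
```

With n=2 this reduces to the current two-view loss, so existing configs stay valid.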

Add support for segmentation task

  • Linear head from final layer representations to segmentation map
  • Highway network to incorporate hierarchical features to predict segmentation map
  • Check BarlowTwins, VICReg and VICRegL for both configs -- use Imagenet-pretrained models from repo and train linear readout for segmentation
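
For the linear-head item, a sketch of a frozen-backbone segmentation readout: a 1x1 conv on the final feature map, bilinearly upsampled to the input resolution. Names are illustrative:

```python
import torch
from torch import nn
import torch.nn.functional as F

class LinearSegHead(nn.Module):
    """Linear readout for segmentation on top of a frozen backbone."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        # 1x1 conv is a per-pixel linear map from features to class logits
        self.proj = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feat_map, out_size):
        logits = self.proj(feat_map)  # N x num_classes x h x w
        return F.interpolate(logits, size=out_size, mode="bilinear",
                             align_corners=False)
```

The highway-network variant would concatenate or sum feature maps from several stages before the projection.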

Add support for Imagenet training

  • Add FFCV dataloader for Imagenet (with support for multiple augmentations)
  • Test imagenet dataloaders for data shape/size
  • Test runtime for 1 epoch with ResNet18 model and Adam optimizer
  • Run hparam sweep

Include shrinkage for eigenspectrum computation

  • Incorporate code for the Rao-Blackwellized Ledoit-Wolf shrinkage estimator of the covariance
  • Add a separate corrected_eigenspectrum function that uses shrinkage on the covariance matrix
  • Check how the eigenspectra differ
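
A sketch of what the corrected_eigenspectrum function could look like, using scikit-learn's LedoitWolf estimator as a stand-in; the exact Rao-Blackwellized Ledoit-Wolf variant would need custom code for the shrinkage coefficient, but the overall shape is the same:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def corrected_eigenspectrum(features: np.ndarray) -> np.ndarray:
    """Eigenspectrum of a shrinkage-corrected covariance estimate.

    features: n_samples x n_dims array of (centered or raw) representations.
    """
    cov = LedoitWolf().fit(features).covariance_  # shrinks toward scaled identity
    evals = np.linalg.eigvalsh(cov)               # real, ascending
    return evals[::-1]                            # descending
```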

Debug alpha tracking experiments/results

  • Inspect track_alpha branch to understand if code has any bugs
  • Ensure that model save/load operation doesn't affect computed eigenspectrum/alpha
  • alpha at end of training should be equal to alpha during eval

Add support for projector finetuning

Currently the projector is discarded after BarlowTwins pretraining -- check that the proper arguments are passed to the function when reloading the network (under the if conditions for the 'linear' algorithm) and during the forward pass!

Enable varying base width of Resnet18 models for running sweeps

  • Add ResNet18 model definition in backbone file
  • Add base model width parameter to control the base width
  • Infer base width from model name
  • [Optional] Add argument for model base width
  • Incorporate the new parameter in the barlowtwins and linear models (only to test the new base-width argument option)
  • Change log directory name to add width in folder structure
  • Run coarse BarlowTwins sweep to check good hyperparams
  • Run sweep across different widths

New todos:

  • Remove the base model width argument and simply infer width of ResNet18 from model name
  • Add train accuracy along with test accuracy at each epoch
  • Save precached features to logdir
  • Re-run sweep over widths
  • Incorporate wandb to log results to server
  • Add width variation for ResNet50
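
For the first todo, a sketch of inferring the base width from the model name; the "widthK" suffix convention here is an assumption, so the regex should be adjusted to the repo's actual naming scheme:

```python
import re

def infer_base_width(model_name: str, default: int = 64) -> int:
    """Parse the base width out of names like 'resnet18_width32'.

    Falls back to the standard ResNet base width when no suffix is present.
    """
    match = re.search(r"width(\d+)", model_name)
    return int(match.group(1)) if match else default
```

This removes the separate base-width argument and keeps the width encoded in one place (the model name, which also lands in the log directory).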

FFCV debug

FFCV is integrated and trains reasonably well, reaching ~74% accuracy, while the same config with PyTorch dataloaders reaches ~84%.

  • Review the documentation to ensure there are no implicit transforms
  • Implement an FFCV-compatible custom transform

Ranges used in function fit_powerlaw() and stringer_get_powerlaw()

Dear authors,

I have recently been following your work "Assessing Representation Quality in Self-Supervised Learning by measuring eigenspectrum decay" and was trying to apply it to compute alpha in order to assess the representation quality of a self-supervised model.

I computed the covariance matrix of the model features, then applied torch.linalg.eigvals() to calculate the eigenspectrum using the following code:

cov = torch.zeros(768, 768)
N = len(val_data_loader)
for i, x in enumerate(val_data_loader):
    inputs, batchLabels = x
    features = backbone.features(inputs.to('cuda'))
    features = features.detach().cpu()
    cov += torch.mm(features.T, features) / N
eigenspectrum = torch.linalg.eigvals(cov).detach().cpu().numpy()

Now, to calculate alpha with the functions fit_powerlaw() and stringer_get_powerlaw() from your code, I can see that both take the eigenspectrum plus another argument that specifies a range.

My question is: what does this range mean, and what range should I use to calculate alpha properly to assess the quality of the model's features?

Thanks!
Nader
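
For context, a Stringer-style fit typically uses the range to select which eigenvalue indices the log-log line is fit over, skipping the largest eigenvalues and the noisy tail; a minimal sketch, where the function name and the default range are illustrative, not the repo's:

```python
import numpy as np

def fit_powerlaw_slope(eigenspectrum, fit_range=(10, 100)):
    """Estimate the power-law exponent alpha over a range of eigenvalue indices.

    eigenspectrum: 1-D array of eigenvalues (complex values from eigvals are
    handled via abs). fit_range = (lo, hi) picks sorted-spectrum indices
    lo..hi-1 for the log-log linear fit; (10, 100) is only an illustration.
    """
    evals = np.sort(np.abs(np.asarray(eigenspectrum)))[::-1]
    lo, hi = fit_range
    idx = np.arange(lo, hi) + 1  # 1-based eigenvalue rank
    slope, _ = np.polyfit(np.log(idx), np.log(evals[lo:hi]), deg=1)
    return -slope                # alpha > 0 for a decaying spectrum
```

On a spectrum that follows an exact power law n^(-alpha), any range recovers alpha; on real features the choice matters, so it is worth checking that the estimate is stable across ranges.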
