transfg's People

Contributors

beckschen, tacju

transfg's Issues

About Stanford dogs accuracy

Hi, could you release your training settings for the Stanford Dogs dataset? I set the learning rate to 3e-3 and left the other settings unchanged; however, the model underfits. I only get 1.7% accuracy after 200k steps.

visualization code

Thanks for your wonderful work! I ran into some problems when trying to visualize the part-attention patches as shown in your paper. Could you provide the visualization code? Thanks so much!

Failed to run on multi GPUs

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py xxx

The command above does not run on multiple GPUs.

Strangely, all the training runs on the first GPU, and if I increase the batch size, an OOM error is reported.

Does anyone know what's wrong?

About visualization

Hello, I appreciate your visualization work very much. Could you open-source this part of the code?

dataset help

Can you provide the car dataset? How are the training and test sets divided? The official link is no longer working.

About train.py

I found that you call scheduler.step() before optimizer.step() in lines 267-268 of train.py. When I run it, I get this warning: "UserWarning: Seems like optimizer.step() has been overridden after learning rate scheduler initialization. Please, make sure to call optimizer.step() before lr_scheduler.step()." Is your code correct at that point?
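PyTorch's documentation notes that calling lr_scheduler.step() before optimizer.step() skips the first value of the learning-rate schedule. The stdlib-only sketch below illustrates that effect, assuming a linear-warmup cosine multiplier of the kind selected by --decay_type cosine and --warmup_steps. warmup_cosine and lr_per_update are hypothetical helpers, the base LR is just an example value, and the exact formula in the repo's scheduler may differ.

```python
import math


def warmup_cosine(step: int, warmup_steps: int, total_steps: int) -> float:
    """Hypothetical LR multiplier: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def lr_per_update(base_lr: float, warmup_steps: int, total_steps: int,
                  scheduler_first: bool) -> list:
    """LR seen by each optimizer.step() for the two call orders.

    The scheduler counter starts at 0. With optimizer.step() first, update t
    uses multiplier(t). With scheduler.step() first, the counter has already
    advanced, so update t uses multiplier(t + 1) and value 0 is never used.
    """
    offset = 1 if scheduler_first else 0
    return [base_lr * warmup_cosine(t + offset, warmup_steps, total_steps)
            for t in range(total_steps)]


# Example values only (assumed): base LR 3e-2, 10 warmup steps, 100 total steps.
recommended = lr_per_update(3e-2, warmup_steps=10, total_steps=100, scheduler_first=False)
reversed_order = lr_per_update(3e-2, warmup_steps=10, total_steps=100, scheduler_first=True)
print(recommended[:3])     # starts at 0.0 during warmup
print(reversed_order[:3])  # same schedule, shifted by one step
```

Under this assumption the two orders differ only by a one-step shift of the schedule, which is usually small over thousands of steps; the order the warning asks for is optimizer.step() followed by scheduler.step().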

memory error

Dear author, why do I always get a memory error when I start training? My machine has 16 GB of memory. Is there a problem in the dataset pipeline?
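One common cause of RAM exhaustion is a dataset class that decodes every image at construction time; whether this repo's loaders do that is not verified here. A lazy alternative keeps only (path, label) pairs in memory and reads each file inside __getitem__. This is a stdlib-only sketch: LazyImageDataset and read_bytes are hypothetical names, and read_bytes stands in for real image decoding.

```python
from typing import Callable, List, Tuple


def read_bytes(path: str) -> bytes:
    """Placeholder loader; a real pipeline would decode an image here,
    e.g. with PIL.Image.open(path).convert("RGB")."""
    with open(path, "rb") as f:
        return f.read()


class LazyImageDataset:
    """Map-style dataset (the same __len__/__getitem__ protocol that
    torch.utils.data.Dataset uses) that stores only (path, label) pairs
    and reads each file when it is indexed."""

    def __init__(self, samples: List[Tuple[str, int]],
                 loader: Callable[[str], object] = read_bytes):
        self.samples = samples
        self.loader = loader

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int):
        path, label = self.samples[idx]
        # Loaded on demand, so memory holds one sample at a time, not the whole set.
        return self.loader(path), label
```

Because nothing is read in __init__, peak memory depends on the batch size and the number of loader workers rather than on the dataset size.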

NAbirds dataset

Hello, could you provide the NABirds dataset? I can't download it from the official website. Thank you very much!

About running on one GPU

I have only one GPU. I set local_rank=-1 and os.environ['CUDA_VISIBLE_DEVICES']='0', but the code still fails to run. What do I need to change to run it on one GPU?

About apex

Hello, thanks for your nice work!

When reproducing your work, I encountered the following problem:

ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

Could you give me some ideas?

How can I train my dataset?

Hello, thanks for your nice work!
My dataset has 10 classes, and each category is in a different folder. How can I train on my dataset?
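A class-per-folder layout can be turned into (path, label) pairs the same way torchvision's ImageFolder does it: sort the folder names and number them from 0. This is a stdlib-only sketch; scan_class_folders and the extension list are hypothetical, and you would still need to split the pairs into train/test sets, wrap them in a dataset class, and set the number of classes to 10.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical set of accepted extensions; adjust to your files.
IMG_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def scan_class_folders(root: str) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Map each sub-folder of root to a class index and list its images."""
    # Sorting makes the label assignment deterministic across runs.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith(IMG_EXTS):  # skip non-image files
                samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return class_to_idx, samples
```

If torchvision is installed, torchvision.datasets.ImageFolder(root) produces an equivalent class_to_idx mapping and sample list directly.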

[minor error] The linear layer self.out of Attention in the modeling file

There may be something wrong with the first argument of the linear layer self.out (line 78: self.out = Linear(config.hidden_size, config.hidden_size)) in the Attention class of the modeling file. It should be changed to self.out = Linear(self.all_head_size, config.hidden_size), because in some cases config.hidden_size and self.all_head_size may not be equal.
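The mismatch follows from how ViT-style attention sizes its heads, assuming the usual definitions: the head size is hidden_size divided (integer) by num_heads, and all_head_size is num_heads times that, so the two differ whenever hidden_size is not a multiple of num_heads. The context tensor fed to self.out has all_head_size features, so that is the input dimension the layer needs. A sketch of the arithmetic (attention_sizes is a hypothetical helper):

```python
from typing import Tuple


def attention_sizes(hidden_size: int, num_heads: int) -> Tuple[int, int]:
    """Return (head_size, all_head_size) as ViT-style attention computes them."""
    head_size = hidden_size // num_heads      # same as int(hidden_size / num_heads) here
    all_head_size = num_heads * head_size     # feature size of the concatenated heads
    return head_size, all_head_size


def out_projection_in_features(hidden_size: int, num_heads: int) -> int:
    """Input size self.out must accept: the concatenated head outputs."""
    return attention_sizes(hidden_size, num_heads)[1]
```

With the standard ViT-B settings (hidden size 768, 12 heads) both values are 768, which is why the original line works; a hypothetical hidden size of 100 with 12 heads gives 96, and Linear(config.hidden_size, ...) would then receive inputs of the wrong width.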

Normalization parameters

Thank you for your excellent work! I am just wondering why all the datasets use the same normalization parameters (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), which are the ImageNet values.
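Normalization here is a per-channel affine map, (x / 255 - mean) / std, which is what torchvision's ToTensor followed by Normalize computes for 8-bit images. If you want statistics for your own dataset instead of the ImageNet values, a stdlib-only sketch could look like this; channel_stats and normalize are hypothetical helpers, and pixels are assumed to be 0-255 RGB tuples.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def channel_stats(pixels: Iterable[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and population std of RGB pixels scaled to [0, 1]."""
    n = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for px in pixels:
        n += 1
        for c in range(3):
            v = px[c] / 255.0
            sums[c] += v
            sq_sums[c] += v * v
    mean = [sums[c] / n for c in range(3)]
    # Var = E[x^2] - E[x]^2; clamp at 0 to absorb rounding error.
    std = [math.sqrt(max(0.0, sq_sums[c] / n - mean[c] ** 2)) for c in range(3)]
    return mean, std


def normalize(px: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Apply (x / 255 - mean) / std channel by channel."""
    return [(px[c] / 255.0 - mean[c]) / std[c] for c in range(3)]
```

For a real dataset you would stream pixels from the training images only, then pass the resulting mean and std to the transform in place of the ImageNet values.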

Pip won't find requirements

I'm trying to setup the environment but pip won't find the requirements.
In a virtual environment with Python 3.7:
$ pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement torch==1.5.1 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0)
ERROR: No matching distribution found for torch==1.5.1

About CUB ACC

I cannot reproduce the results with this code: the accuracy on the CUB dataset is only 91.1% with the overlap split. Did I miss something important?

Trained Weights

Could you upload the weights for trained models? I'm personally looking for the weights of the trained nabirds model.
Thanks!

How to pretrain TransFG on my own dataset

I'm very interested in your outstanding work, and I have a question:

I want to pretrain TransFG on my own dataset. Could you please provide the pre-training code? I'm looking forward to your reply.

Accuracy on the CAR dataset

To the best of my ability, I can only reach 90% accuracy on the car dataset. Am I doing something wrong? Are the training parameters for the car dataset set the same as for the CUB dataset?

patch embeddings always 0?

I was reading the paper and checking the code, and I can't see where values are assigned to the patch embeddings. While debugging, I only see that you create a zero tensor and then, in forward, simply add this tensor.
At what point do the patch embeddings get their values?

Line 157: https://github.com/TACJu/TransFG/blob/master/models/modeling.py#L157
self.position_embeddings = nn.Parameter(torch.zeros(1, n_patches+1, config.hidden_size))

Line 173: embeddings = x + self.position_embeddings
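As far as the two quoted lines show, the zeros are only an initial value: position_embeddings is wrapped in nn.Parameter, so it is trainable and updated by the optimizer, and loading pretrained ViT weights would be expected to overwrite it. The x it is added to at line 173 is presumably the patch (plus class token) embedding produced earlier in forward. The sketch below only computes the parameter's shape from n_patches; the overlapping-split formula (stride slide_step instead of patch_size) and the helper names are assumptions for illustration.

```python
from typing import Optional, Tuple


def num_patches(img_size: int, patch_size: int, slide_step: Optional[int] = None) -> int:
    """Patches on a square image; slide_step < patch_size means overlapping patches."""
    if slide_step is None:  # non-overlapping split: stride equals patch size
        slide_step = patch_size
    per_side = (img_size - patch_size) // slide_step + 1
    return per_side * per_side


def position_embedding_shape(img_size: int, patch_size: int, hidden_size: int,
                             slide_step: Optional[int] = None) -> Tuple[int, int, int]:
    # +1 for the class token, matching torch.zeros(1, n_patches+1, hidden_size)
    return (1, num_patches(img_size, patch_size, slide_step) + 1, hidden_size)
```

Under these assumptions a 448 × 448 input with 16 × 16 patches gives 28 × 28 = 784 patches without overlap and 37 × 37 = 1369 with a slide step of 12; the 224 / 8 / 6 setting from the "train from scratch" issue also gives 1369 patches.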

About PSM module

I tried to verify the effect of the PSM module and noticed that adding only the PSM module to ViT didn't seem to improve performance. Could you let me know whether TransFG model weights trained on CUB are available?

Batch_size is 16 or 64?

Hi @TACJu, I notice you use DDP with 4 GPUs in train.py. Therefore, if train_batch_size in args is set to 16, the overall batch size will be 16x4=64.
However, your paper says the batch size is 16. I also tried a batch size of 16x4 on a Tesla V100, but an OOM error was raised. So does "batch size 16" mean 16 or 64? Thanks!
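With DDP, each of the nproc_per_node processes typically builds its own batch of train_batch_size samples (assuming a per-process sampler such as DistributedSampler), so per-GPU memory follows the per-process value while each optimizer step averages gradients over world_size times that value; gradient accumulation multiplies it again. A small sketch of the arithmetic with hypothetical helpers; whether train.py additionally divides train_batch_size by gradient_accumulation_steps has not been checked here.

```python
from typing import Tuple


def global_batch_size(per_process_batch: int, world_size: int,
                      grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step across all processes."""
    return per_process_batch * world_size * grad_accum_steps


def split_global_batch(global_batch: int, world_size: int,
                       max_per_gpu: int) -> Tuple[int, int]:
    """Pick (per-GPU batch, accumulation steps) that reach global_batch exactly,
    using the fewest accumulation steps whose per-GPU batch fits in memory."""
    if global_batch % world_size:
        raise ValueError("global batch must be divisible by the number of GPUs")
    per_gpu_total = global_batch // world_size
    for accum in range(1, per_gpu_total + 1):
        if per_gpu_total % accum == 0 and per_gpu_total // accum <= max_per_gpu:
            return per_gpu_total // accum, accum
    raise ValueError("max_per_gpu must be at least 1")
```

For example, a global batch of 64 on one GPU that fits only 16 samples would need 4 accumulation steps, while 4 GPUs at 16 samples each reach 64 with no accumulation.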

About valid accuracy

I used different datasets, but the accuracy was always a little over 0.2. Does anyone know how to fix that?

About the training details

First of all, thank you for your work; it has helped me a lot.

After several attempts, I can only obtain 91% accuracy on CUB. Could you provide the model parameters and training details that give 91.7% accuracy? Thank you very much if you reply.

ImportError: cannot import name 'Di1stributedDataParallel' from 'apex.parallel'

The platform is Windows 10, with only one GPU (GeForce RTX 3080).
PyTorch version == 1.7.1,
tensorboard version == 1.15.0,
apex version == 0.1
Q: Every time I try to train with the command "train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run", it gives me an error like "ImportError: cannot import name 'Di1stributedDataParallel' from 'apex.parallel'".
Does anyone know what causes this error? If you know of any likely reasons, please tell me. Thank you so much. God bless you!

About train.py

Every time I start training, I get an error like this:

usage: train.py [-h] --name NAME [--dataset {CUB_200_2011,car,dog,nabirds}] [--data_root DATA_ROOT] [--model_type {ViT-B_16,ViT-B_32,ViT-L_16,ViT-L_32,ViT-H_14}] [--pretrained_dir PRETRAINED_DIR] [--pretrained_model PRETRAINED_MODEL] [--output_dir OUTPUT_DIR] [--img_size IMG_SIZE] [--train_batch_size TRAIN_BATCH_SIZE] [--eval_batch_size EVAL_BATCH_SIZE] [--eval_every EVAL_EVERY] [--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--num_steps NUM_STEPS] [--decay_type {cosine,linear}] [--warmup_steps WARMUP_STEPS] [--max_grad_norm MAX_GRAD_NORM] [--local_rank LOCAL_RANK] [--seed SEED] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--fp16] [--fp16_opt_level FP16_OPT_LEVEL] [--loss_scale LOSS_SCALE] [--smoothing_value SMOOTHING_VALUE] [--split SPLIT] [--slide_step SLIDE_STEP]
train.py: error: the following arguments are required: --name

I don't know how to solve it. If you know, please tell me. Thank you!
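The message comes from argparse: --name is declared as a required option (it is the only option outside square brackets in the usage line), so the script exits before training when it is omitted. A minimal stdlib reproduction with only three of the listed options; the help text, the --dataset default, and the -1 default for --local_rank are assumptions for this sketch, not copied from train.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="train.py")
    # required=True is what produces "the following arguments are required: --name"
    parser.add_argument("--name", required=True, help="Name of this run.")
    # Hypothetical default; the real script may use a different one.
    parser.add_argument("--dataset", choices=["CUB_200_2011", "car", "dog", "nabirds"],
                        default="CUB_200_2011")
    # Assumed non-distributed default.
    parser.add_argument("--local_rank", type=int, default=-1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--name", "sample_run"])
    print(args.name, args.dataset, args.local_rank)
```

Passing a run name avoids the error, for example: python3 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run (the command quoted in an earlier issue).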

About training details

Hi, thanks for your great work. I'd like to know how many epochs/steps you trained on each of those benchmarks. Thanks again!

about Part Selection Module

Thanks for your great work!
I have a question about selecting the tokens with maximum activation in the Part Selection Module.
In Eq. 6, is a_l^i the attention score computed separately between the class token and the other N tokens? So the dimension of a_l^i is N, right?
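One reading of the part selection step: multiply the per-layer attention matrices of each head, then, for each head, pick the patch token with the largest value in the class-token row. Under that reading each per-layer, per-head matrix is (N+1) × (N+1), and only its class-token row restricted to the N patch tokens (length N) is used for selection. A pure-Python sketch; the function names, the a_{L-1}···a_0 multiplication order, and the nested-list layout are assumptions, not the repo's code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def select_part_tokens(attn_per_layer: List[List[Matrix]]) -> List[int]:
    """attn_per_layer[l][h] is the (N+1)x(N+1) attention matrix of head h in
    layer l, with row/column 0 being the class token. Returns, for each head,
    the token index (1..N) with the largest class-token score after
    multiplying all layers together."""
    num_heads = len(attn_per_layer[0])
    selected = []
    for h in range(num_heads):
        joint = attn_per_layer[0][h]
        for layer in attn_per_layer[1:]:
            joint = matmul(layer[h], joint)  # accumulate a_l · ... · a_0
        cls_row = joint[0][1:]               # class token -> N patch tokens
        best = max(range(len(cls_row)), key=cls_row.__getitem__)
        selected.append(best + 1)            # +1 skips the class-token slot
    return selected
```

The selected indices would then pick K tokens (one per head) to feed the last layer together with the class token.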

About CUB-200-2011's accuracy

Thanks for your work and for sharing your code! However, when I reproduce your results on 4 Tesla V100 GPUs, following the instructions exactly with the non-overlap split, I only get 90.8% accuracy. Could you help analyze the problem?

How to fix the RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx

Thanks for your work and for sharing your code!

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port 89898 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run

When I train on two GPUs (2 × 1080 Ti), the error below occurs.
The configuration is CUDA 11.1, PyTorch 1.8.1, torchvision 0.9.1, Python 3.8.3.

Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X):   0%|| 0/749 [00:00<?, ?it/s]Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X):   0%|| 0/749 [00:42<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 400, in <module>
    main()
  File "train.py", line 397, in main
    train(args, model)
  File "train.py", line 226, in train
    loss, logits = model(x, y)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/amp/_initialize.py", line 196, in new_fwd
    output = old_fwd(*applier(args, input_caster),
  File "/home/lirunze/xh/project/git/trans-fg_-i2-t/models/modeling.py", line 305, in forward
    part_logits = self.part_head(part_tokens[:, 0])
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

Could you help analyze this problem? Thank you!

train from scratch

Has anyone tried to train from scratch, without any pretrained weights?
I want to adapt the model to my project, with 224 × 224 input, 8 × 8 patches, and a sliding step of 6, which means there are no pretrained weights for me. I found it very hard to converge: after 10000 steps, the training accuracy is still around 0.6.
Then I tried the original training configuration, just without loading the pretrained weights, and had the same issue.
Did I miss anything?
