
ContrastiveSeg's Issues

Function _dequeue_and_enqueue() in train_contrastive.py

Hi Zhou,

I have read your paper and found it was absolutely amazing. Great work!

However, I felt a bit lost when trying to read your code. Could you please briefly explain how the _dequeue_and_enqueue function works in train_contrastive.py?

In general,

  1. What does the function do, and how does it work at a high level?
  2. What is the difference between segment/pixel_queue and segment/pixel_queue_ptr? (See the sketch below.)

Thank you
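
For others with the same question, here is a minimal sketch (my own reading, not the author's exact code) of how a class-wise pixel memory queue with a write pointer typically works; the sizes below are made-up placeholders:

import torch
import torch.nn.functional as F

# Placeholder sizes; in the repo these come from the configer.
num_classes, memory_size, feat_dim, K = 19, 5000, 256, 10

# One fixed-size queue of embeddings per class, plus one write pointer per class.
pixel_queue = torch.zeros(num_classes, memory_size, feat_dim)
pixel_queue_ptr = torch.zeros(num_classes, dtype=torch.long)

def enqueue(feat, lb):
    # feat: (K, feat_dim) pixel embeddings of class lb sampled from the current batch.
    feat = F.normalize(feat, p=2, dim=1)
    ptr = int(pixel_queue_ptr[lb])
    if ptr + K >= memory_size:
        # Not enough room left: write into the tail and wrap the pointer around.
        pixel_queue[lb, -K:, :] = feat
        pixel_queue_ptr[lb] = 0
    else:
        # Overwrite the K slots after the pointer, then advance it.
        pixel_queue[lb, ptr:ptr + K, :] = feat
        pixel_queue_ptr[lb] = (ptr + K) % memory_size

enqueue(torch.randn(K, feat_dim), lb=3)

In this reading, the queue itself is the memory (the stored embeddings), while the pointer only records where the next write for that class should go; "dequeuing" is implicit because old entries are simply overwritten.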

Can't find the code where loss weights are saved during training

Hi @tfzhou

I've been studying your repo on and off for a couple of months while implementing something different. I need to change the loss function, and I'm not sure whether it will be saved during training. I couldn't find the lines where the loss is saved: in the init functions the weights are loaded from the configer, but I couldn't find where they are saved. The losses should be saved at each epoch, right? Am I thinking about this correctly?

about the memory bank

Hello. Thanks for your great work.
I ran the script cityscapes/hrnet/run_h_48_d_4_contrast_mem.sh, but it turns out that the result is worse than the baseline.
There may be some bugs in the memory implementation, such as pixel_queue_ptr[lb] = (pixel_queue_ptr[lb] + 1) % self.memory_size at line 138 of trainer_contrastive.py. I also find that semi-hard example sampling is not implemented in your code.
I think these may be the reasons I can't reproduce your results. Would you provide an updated version of the memory bank implementation?
Looking forward to your reply!

network_stride & _dequeue_and_enqueue

Thank you very much for your excellent work. I have a few questions to consult you on.

  1. What is the function of the network_stride parameter, and why do you perform this operation on the labels? (See the stride sketch after this list.)

  2. I want to know the dimensions of the two parameters (keys, labels) passed to the function _dequeue_and_enqueue; are they the same?

  3. Can your code handle segmentation tasks with labels starting from 0? I didn't understand the statement this_label_ids = [x for x in this_label_ids if x > 0] in the function _dequeue_and_enqueue.
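
Regarding question 1, here is a hedged sketch of what a network_stride operation on labels usually looks like (an assumption about aligning labels with a downsampled feature map, not a quote from the repo; the stride and tensor sizes are invented):

import torch

network_stride = 8                               # assumed stride between input and feature map
labels = torch.randint(0, 19, (4, 512, 1024))    # (B, H, W) ground-truth maps, made-up sizes

# Subsample the labels so each remaining label aligns with one feature-map location.
labels_ds = labels[:, ::network_stride, ::network_stride]   # (B, 64, 128)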

About loss_contrast.py

Thank you very much for sharing. There is a place in the code that I don't understand.
In loss_contrast.py, what is the meaning of _hard_anchor_sampling? Would you please explain its use to me?

About simple reproduction

Hello, there are many files in the project. If I just want to reproduce simple results on the CamVid dataset combined with the pixel contrastive learning method, which parts should I look at for model creation, the loss function, and data preprocessing, and which .py and .sh files should I use for training and testing?
I hope you can take the time to give me some simple guidance, which would benefit me a lot. Thank you!

Semi-Hard Example Sampling

I can't find the implementation of your 10% hardest sampling strategy. Your memory bank only randomly stores K positive and negative pairs.

About memory bank

Thank you for your code, but I cannot find the implementation of the memory bank. How do you realize the contrastive loss?

About args n_view, max_samples

Hello! Your work is outstanding!
I did detailed research on both the paper and the code.
There are a few things I don't understand:

  1. First, about loss_contrast.py: can you explain the meaning of max_samples and n_views in the hard_anchor_sampling function?
  2. If you apply this method to small samples, can the hard_anchor_sampling step be omitted?

I hope you can explain in more detail; many thanks!

tSNE visualisation

Hi,

Thanks for your great work! I am wondering how you select feature embeddings for the t-SNE visualisation. For a dense pixel-level segmentation task, using the feature embeddings of all the pixels would be too much.

Many thanks!
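
One hedged sketch of the kind of subsampling this question is about (my own approach, not necessarily the authors'): keep only a few hundred pixel embeddings per class before running t-SNE. The per_class budget and the sampling scheme are arbitrary choices here.

import numpy as np
from sklearn.manifold import TSNE

def tsne_subsample(embeddings, labels, per_class=200, seed=0):
    # embeddings: (N_pixels, feat_dim) array, labels: (N_pixels,) class ids from a few val images.
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=min(per_class, len(idx)), replace=False))
    keep = np.concatenate(keep)
    # t-SNE on the reduced set is tractable; all pixels of a val set would not be.
    points_2d = TSNE(n_components=2, init='pca', random_state=seed).fit_transform(embeddings[keep])
    return points_2d, labels[keep]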

An AttributeError occurred suddenly during training with the Cityscapes dataset

I downloaded the Cityscapes dataset to train the HRNet-OCR model. Since the data in the 'train_extra' directory is so massive, I skipped that part and only used the 2975 images and labels from 'train' and 'val'.
Training is fine at the beginning of the procedure, but an error occurred at Train Epoch 2, Train Iteration 1930, as follows:

.....
2022-01-13 06:47:40,820 INFO [trainer_contrastive.py, 318] 60 images processed

2022-01-13 06:47:40,894 INFO [trainer_contrastive.py, 318] 60 images processed

2022-01-13 06:47:41,245 INFO [trainer_contrastive.py, 318] 60 images processed

2022-01-13 06:47:41,654 INFO [trainer_contrastive.py, 318] 60 images processed

2022-01-13 06:47:45,347 INFO [base.py, 84] Performance 0.0 -> 0.38793964487753546
2022-01-13 06:47:49,701 INFO [trainer_contrastive.py, 394] Test Time 50.082s, (0.795) Loss 0.64488318

2022-01-13 06:47:49,702 INFO [base.py, 33] Result for seg
2022-01-13 06:47:49,702 INFO [base.py, 49] Mean IOU: 0.38793964487753546

2022-01-13 06:47:49,702 INFO [base.py, 50] Pixel ACC: 0.8894303367559715
.......
2022-01-13 06:58:05,453 INFO [trainer_contrastive.py, 283] Train Epoch: 2 Train Iteration: 1930 Time 6.409s / 10iters, (0.641) Forward Time 2.593s / 10iters, (0.259) Backward Time 3.720s / 10iters, (0.372) Loss Time 0.081s / 10iters, (0.008) Data load 0.014s / 10iters, (0.001401)
Learning rate = [0.00956490946871452, 0.00956490946871452] Loss = 0.31870443 (ave = 0.43296322)

Traceback (most recent call last):
File "/home/xxx/paper_code/ContrastiveSeg-main/main_contrastive.py", line 217, in
model.train()
File "/home/xxx/paper_code/ContrastiveSeg-main/segmentor/trainer_contrastive.py", line 420, in train
self.__train()
File "/home/xxx/paper_code/ContrastiveSeg-main/segmentor/trainer_contrastive.py", line 241, in __train
loss = self.pixel_loss(outputs, targets, with_embed=with_embed)
File "/home/xxx/anaconda3/envs/contrastSeg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in call_impl
result = self.forward(*input, **kwargs)
File "/home/xxx/paper_code/ContrastiveSeg-main/lib/loss/loss_contrast.py", line 229, in forward
loss_contrast = self.contrast_criterion(embedding, target, predict)
File "/home/xxx/anaconda3/envs/contrastSeg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in call_impl
result = self.forward(*input, **kwargs)
File "/home/xxx/paper_code/ContrastiveSeg-main/lib/loss/loss_contrast.py", line 146, in forward
loss = self.contrastive(feats, labels
)
File "/home/xxx/paper_code/ContrastiveSeg-main/lib/loss/loss_contrast.py", line 92, in contrastive
anchor_num, n_view = feats
.shape[0], feats
.shape[1]
AttributeError: 'NoneType' object has no attribute 'shape'

The config details of my experiment are:
(contrastSeg) xxx@ubuntu:~/paper_code/ContrastiveSeg-main$ bash scripts/cityscapes/hrnet/run_h_48_d_4_ocr_contrast.sh train 4
Logging to /home/xxx/paper_code/ContrastiveSeg-main//logs/Cityscapes/hrnet_w48_ocr_contrast_lr1x_ocr_contrast_40k.log
World size: 4
2022-01-13 06:35:32,252 INFO [distributed.py, 49] ['main_contrastive.py', '--configs', 'configs/cityscapes/H_48_D_4.json', '--drop_last', 'y', '--phase', 'train', '--gathered', 'n', '--loss_balance', 'y', '--log_to_file', 'n', '--backbone', 'hrnet48', '--model_name', 'hrnet_w48_ocr_contrast', '--data_dir', '/home/xxx/paper_code/ContrastiveSeg-main/dataset//Cityscapes', '--loss_type', 'contrast_auxce_loss', '--gpu', '0', '1', '2', '3', '--max_iters', '40000', '--max_epoch', '500', '--checkpoints_root', '/home/xxx/paper_code/ContrastiveSeg-main/Checkpoints/hrnet_w48_ocr_contrast/Cityscapes/', '--checkpoints_name', 'hrnet_w48_ocr_contrast_lr1x_ocr_contrast_40k', '--pretrained', '/home/xxx/paper_code/ContrastiveSeg-main/pretrained_model/hrnetv2_w48_imagenet_pretrained.pth', '--train_batch_size', '4', '--distributed', '--base_lr', '0.01']
['--configs', 'configs/cityscapes/H_48_D_4.json', '--drop_last', 'y', '--phase', 'train', '--gathered', 'n', '--loss_balance', 'y', '--log_to_file', 'n', '--backbone', 'hrnet48', '--model_name', 'hrnet_w48_ocr_contrast', '--data_dir', '/home/xxx/paper_code/ContrastiveSeg-main/dataset//Cityscapes', '--loss_type', 'contrast_auxce_loss', '--gpu', '0', '1', '2', '3', '--max_iters', '40000', '--max_epoch', '500', '--checkpoints_root', '/home/xxx/paper_code/ContrastiveSeg-main/Checkpoints/hrnet_w48_ocr_contrast/Cityscapes/', '--checkpoints_name', 'hrnet_w48_ocr_contrast_lr1x_ocr_contrast_40k', '--pretrained', '/home/xxx/paper_code/ContrastiveSeg-main/pretrained_model/hrnetv2_w48_imagenet_pretrained.pth', '--train_batch_size', '4', '--distributed', '--base_lr', '0.01']

I suspect the problem is with the loading of the dataset, but I failed to find the concrete cause. What's more, this error occurs unpredictably at the same iteration step, which is puzzling. So I would like to ask the author for details about the data format and for a simple demo of running the .sh file. Thank you so much; waiting for your reply.

Code/main_contrastive.py

Hi, Dr. Zhou.
Sorry to bother you, but could you please give an example of how to run the command files?
Given the various versions of the bash files, it would be nice to have a description of your naming convention.
For example: (1) "run_ideal_spatial_ocrnet.sh", "xxx_ideal_gather_ocrnet.sh", and "xxx_ideal_distribute_octnet.sh";
(2) "run_ideal_spatial_ocrnet.sh", "run_ideal_spatial_ocrnet_b.sh", and "run_ideal_spatial_ocrnet_c.sh".

Thank you, and looking forward to your response.

Why does the ResNet-101 + DeepLab-V3 baseline perform so badly in your repo?

Hi,

First, congrats on the acceptance at ICCV'21.

I have a question about the ResNet-101 + DeepLab-V3 benchmark result.

Your table and log show an mIoU of 72.75 with the pretrained model. However, the standard benchmark for ResNet-101 is 77.12 and for ResNet-50 is 79.09 (both also using pretrained models).

Could you explain why the ResNet-101 + DeepLab-V3 baseline performs so badly in your setting?

Best,

Questions about T-SNE visualization

Hello, nice work.
I am curious how to visualize embeddings on a 2D plot with t-SNE, and where the embeddings come from.

The first question is about embedding collection. Specifically, there are two ways of collecting embeddings from images (or feature maps):

  • collect embeddings from only one val image, or from the val images in a single mini-batch;
  • construct a memory bank to collect pixel features across all val images, then randomly select N samples as t-SNE inputs.

The next question: do the embeddings collected above come from the output of the last conv layer before the classifier, or from the MLP projection head?

CUDA out of memory

I use a 3090 to train, but it runs out of memory; the dataset is Cityscapes. How can I solve this problem?

About the hard anchor sampling

Hi, thank you for the great work. I am currently using your pipeline on my own dataset, and I have a question about the hard anchor sampling function. The label (128 x 128) is resampled to the feature size (16 x 16), but the segmentation prediction from the model is not resampled, so an error about mismatched dimensions between labels and predictions is raised at the following line:

hard_indices = ((this_y_hat == cls_id) & (this_y != cls_id)).nonzero()

Is there anything that I can do to fix the error?

Thank you so much!
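
One possible workaround, sketched under the assumption that the prediction is a (B, H, W) hard label map (this is not a fix endorsed by the authors), is to resample the prediction to the feature resolution with nearest-neighbour interpolation before the comparison:

import torch
import torch.nn.functional as F

predict = torch.randint(0, 2, (2, 128, 128))   # (B, H, W) hard predictions, made-up sizes
feat_h, feat_w = 16, 16                        # spatial size the labels were resampled to

predict_small = F.interpolate(predict.unsqueeze(1).float(),
                              size=(feat_h, feat_w),
                              mode='nearest').squeeze(1).long()
# predict_small now matches the (B, 16, 16) shape of the resampled labels, so
# (this_y_hat == cls_id) & (this_y != cls_id) compares tensors of equal size.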

Problem in contrastive loss

Hi, Dr. Zhou,

Thanks for releasing the code. While reading the contrastive loss code in the function _contrastive(), I see that a mask is computed by the following two lines:

mask = torch.eq(y_anchor, y_contrast.T).float().cuda()

and
mask = mask.repeat(anchor_count, contrast_count)

Now I think the shape of the mask is [anchor_num * anchor_count, class_num * cache_size]. If I did not misunderstand the code, the mask is a 'positive' mask, and each line represents the positive samples of an anchor view.

Then in L134-L138, the function of logits_mask is confusing:

logits_mask = torch.ones_like(mask).scatter_(1,
torch.arange(anchor_num * anchor_count).view(-1, 1).cuda(),
0)
mask = mask * logits_mask

Could you please explain these lines?

Suppose I have anchor_num=6 (2 images, 3 valid classes per image), anchor_count=2 (two sampled pixels per class), class_num=5 (the number of classes), and cache_size=2 (the memory size). Then the following code raises a RuntimeError:

mask = torch.ones((6 * 2, 5 * 2)).scatter_(1, torch.arange(6 * 2).view(-1, 1), 0)

Output:

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
RuntimeError: index 10 is out of bounds for dimension 1 with size 10
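
For comparison, in the SupCon reference implementation that this loss appears to adapt, the scatter trick only zeroes out each anchor's comparison with itself, which implicitly assumes the contrast set equals the anchor set, so the mask is square (a sketch, not the memory-bank variant discussed here):

import torch

anchor_count, batch_size = 2, 6
mask = torch.ones(batch_size * anchor_count, batch_size * anchor_count)
logits_mask = torch.ones_like(mask).scatter_(
    1,
    torch.arange(batch_size * anchor_count).view(-1, 1),
    0)                      # zeros on the diagonal: an anchor is never its own positive
mask = mask * logits_mask   # valid because both dimensions equal batch_size * anchor_count

With a rectangular mask of shape [anchor_num * anchor_count, class_num * cache_size], as in the example above, the same arange-based scatter can index past the second dimension, which matches the RuntimeError shown.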

How to migrate to my model? Looking forward to your answer, thanks.

Thank you for your excellent work, Dr. Zhou.
I am a novice postgraduate and my coding ability is limited. I want to use your method in my own two-class segmentation model. Which .py files should I read, and what should I pay attention to?
Looking forward to your answer; thank you very much.

Memory bank usage?

Hello, Dr. Zhou. I have not been able to find where the memory bank is used in the code base.

Question about hard sample mining

Hi Dr. Zhou, thank you for your excellent work. Did you provide the loss implementations for Hardest Example Sampling and Semi-Hard Example Sampling? I only found the implementation of Segmentation-Aware Hard Anchor Sampling.


Explanation of n_view

Hi, thanks for your great work.

Could you help explain what n_view stands for?

n_view = self.max_samples // total_classes
n_view = min(n_view, self.max_views)
X_ = torch.zeros((total_classes, n_view, feat_dim), dtype=torch.float).cuda()
y_ = torch.zeros(total_classes, dtype=torch.float).cuda()

I understand the meaning of total_classes and feat_dim, but it's difficult for me to understand what n_view means here. Thanks.
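
A hedged reading of the snippet (my interpretation, not the authors' definition): n_view is the number of pixel embeddings ("views") sampled per class, budgeted so the total across classes stays under max_samples and capped at max_views per class. With made-up numbers:

max_samples, max_views = 1024, 100     # assumed config values
total_classes, feat_dim = 12, 256      # classes present in the batch, embedding size

n_view = max_samples // total_classes  # 85: spread max_samples evenly across the classes
n_view = min(n_view, max_views)        # still 85 here, but never more than max_views per class

# X_ then holds n_view pixel embeddings per class: shape (total_classes, n_view, feat_dim)
# y_ holds one class id per row of X_: shape (total_classes,)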

T-SNE visualization of features

Hi, thanks for your nice work.
I have some questions about the t-SNE visualization.

  1. What is the meaning of each point in the t-SNE visualization map of your paper? (Is each point a pixel feature?) As you mentioned in a former issue, features (tensor size [8, 256, 256, 512]) after the projection layer are used. I tried to draw the t-SNE map: I reshaped the features into 8 × 256 × 512 = 1,048,576 pixel vectors of dimension 256, giving a tensor A of shape (1048576, 256). I then randomly sampled 5000 rows from the first dimension of A, but the resulting t-SNE map was poor. So I want to know how you handle the features after the projection layer and how many images you use for the t-SNE visualization.
  2. About the t-SNE maps: are the features for the Pixel-wise Cross-Entropy Loss map taken from the segmentation head? (I think that is the baseline method without the contrastive loss, right?) The features for the Pixel-wise Contrastive Learning Objective, on the other hand, are from the projection layer (the method with the contrastive loss?). I am confused about where the features for the two losses come from.

Different contrastive loss from the paper

Hi, thanks for the great work!

However, I found that the contrastive loss here seems to differ from Eqn. (3) in the paper. According to the paper, the denominator should be one specific positive pair plus all the negative pairs, but in the current implementation the denominator is all the positive pairs (excluding identical matches) plus all the negative pairs.
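
For readers comparing the two forms, here is a hedged restatement in LaTeX (this is how I recall Eq. (3); please verify against the paper). The paper's per-anchor loss keeps a single positive in each denominator:

L_i = \frac{1}{|P_i|} \sum_{i^{+} \in P_i} -\log \frac{\exp(i \cdot i^{+} / \tau)}{\exp(i \cdot i^{+} / \tau) + \sum_{i^{-} \in N_i} \exp(i \cdot i^{-} / \tau)}

whereas the implementation, as described in this issue, normalizes over all positives and negatives:

L_i = \frac{1}{|P_i|} \sum_{i^{+} \in P_i} -\log \frac{\exp(i \cdot i^{+} / \tau)}{\sum_{j \in P_i \cup N_i} \exp(i \cdot j / \tau)}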

About using your model on another dataset

Hi, sir. Thank you for releasing this great code. I am trying to use your model on my own dataset, which is about optic cup and disc segmentation and has 3 classes. I rewrote the .sh and .json files with "num_classes": 3, "label_list": [1,2,3].
Here are the two errors; can you help? Thank you!
1. RuntimeError: weight tensor should be defined either for all or no classes

Traceback (most recent call last):
File "/data/fyw/ContrastiveOpticSegmentation/main_contrastive.py", line 236, in <module>
model.train()
File "/data/fyw/ContrastiveOpticSegmentation/segmentor/trainer_contrastive.py", line 420, in train
self.__train()
File "/data/fyw/ContrastiveOpticSegmentation/segmentor/trainer_contrastive.py", line 241, in __train
loss = self.pixel_loss(outputs, targets, with_embed=with_embed)
File "/data/anaconda3/envs/ContrastiveOpticSegmentation/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data/fyw/ContrastiveOpticSegmentation/lib/loss/loss_contrast_mem.py", line 218, in forward
loss = self.seg_criterion(pred, target)
File "/data/anaconda3/envs/ContrastiveOpticSegmentation/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data/fyw/ContrastiveOpticSegmentation/lib/loss/loss_helper.py", line 204, in forward
loss = self.ce_loss(inputs, target)
File "/data/anaconda3/envs/ContrastiveOpticSegmentation/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/data/anaconda3/envs/ContrastiveOpticSegmentation/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1150, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/data/anaconda3/envs/ContrastiveOpticSegmentation/lib/python3.8/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
2. Traceback (most recent call last):
File "main_contrastive.py", line 185, in <module>
handle_distributed(args_parser, os.path.expanduser(os.path.abspath(__file__)))
File "/data/fyw/ContrastiveOpticSegmentation/lib/utils/distributed.py", line 70, in handle_distributed
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/data/anaconda3/envs/ContrastiveOpticSegmentation/bin/python', '-u', '-m', 'torch.distributed.launch', '--nproc_per_node', '4', '--master_port', '29962', '/data/fyw/ContrastiveOpticSegmentation/main_contrastive.py', '--configs', 'configs/REFUGE/H_48_D_4_MEM.json', '--drop_last', 'y', '--phase', 'train', '--gathered', 'n', '--loss_balance', 'y', '--log_to_file', 'n', '--backbone', 'hrnet48', '--model_name', 'hrnet_w48_mem', '--gpu', '4', '5', '6', '7', '--data_dir', '/data/dataset/REFUGE', '--loss_type', 'mem_contrast_ce_loss', '--max_iters', '40000', '--train_batch_size', '8', '--checkpoints_root', '/data/fyw/ContrastiveOpticSegmentation/Model/REFUGE/', '--checkpoints_name', 'hrnet_w48_mem_paddle_lr2x_1', '--pretrained', '/data/dataset/hrnetv2_w48_imagenet_pretrained.pth', '--distributed', '--base_lr', '0.01']' returned non-zero exit status 1.

About contrastive loss function

Hi, thank you for the amazing work!
Your paper and concept are really interesting.

I have trouble understanding your contrastive loss function here.
What should the inputs of the _contrastive function be?

In other words, would you let me know

  1. which variables should I pass as feats_ and labels_, and
  2. what should their shapes be? (e.g., number of anchors × number of views; see the shape sketch below)
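
My best guess at the expected shapes, pieced together from snippets quoted elsewhere in these issues (e.g. anchor_num, n_view = feats_.shape[0], feats_.shape[1] and the X_, y_ buffers built in _hard_anchor_sampling), so treat this as an assumption rather than the authors' specification:

import torch

anchor_num, n_view, feat_dim = 6, 2, 256            # made-up sizes
feats_ = torch.randn(anchor_num, n_view, feat_dim)  # the X_ returned by _hard_anchor_sampling
labels_ = torch.randint(0, 19, (anchor_num,))       # the y_ buffer: one class id per anchor row
# loss = self._contrastive(feats_, labels_)         # as called inside the loss module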

Problem in the function "_dequeue_and_enqueue"

Hello,
I have a problem with the code pixel_queue_ptr[lb] = (pixel_queue_ptr[lb] + 1) % self.memory_size at line 138, in the _dequeue_and_enqueue function of trainer_contrastive.py. Should pixel_queue_ptr[lb] + 1 be changed to pixel_queue_ptr[lb] + K? Otherwise, pixel_queue[lb, ptr + 1:ptr + 1 + K, :] will be written at the next iteration, which overlaps with pixel_queue[lb, ptr:ptr + K, :].

Repo based on pytorch==0.4.x?

Thanks for your nice work!
Is this repo based on pytorch==0.4.x?

Also, what are the FLOPs (computational cost) of your method?

tsne

Hi, how do you use t-SNE in a segmentation network to obtain such a visualization?
Thanks.

Similar paper

It seems that a paper, "Region-aware Contrastive Learning for Semantic Segmentation", may have plagiarized your work.

How to incorporate loss function into my model?

Thank you very much for your work. I have a question: my model is BiSeNetV2, and I want to replace my loss function with your pixel contrast loss function. However, my model's output is a single tensor, while the preds argument of the pixel contrast loss is a dictionary whose keys are 'seg' and 'embed'. I don't know what 'seg' and 'embed' mean, or how to obtain these two values in BiSeNetV2.

class ContrastCELoss(nn.Module, ABC):
    def forward(self, preds, target, with_embed=False):
        h, w = target.size(1), target.size(2)
        assert "seg" in preds
        assert "embed" in preds
        seg = preds['seg']
        embedding = preds['embed']
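
If it helps, here is a hypothetical wrapper (the class name, backbone_out_dim, and embed_dim are my own inventions) showing one way a single-output model such as BiSeNetV2 could be made to return both keys: 'seg' would be the usual segmentation logits, and 'embed' an L2-normalized pixel embedding map from a small projection head. Whether this matches the authors' intended integration is for them to confirm.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SegModelWithProjection(nn.Module):
    # Hypothetical wrapper around a backbone that returns one feature map.
    def __init__(self, backbone, backbone_out_dim=128, num_classes=19, embed_dim=256):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Conv2d(backbone_out_dim, num_classes, kernel_size=1)
        self.proj_head = nn.Sequential(            # projection head for the contrastive branch
            nn.Conv2d(backbone_out_dim, backbone_out_dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(backbone_out_dim, embed_dim, kernel_size=1),
        )

    def forward(self, x):
        feat = self.backbone(x)                    # (B, backbone_out_dim, h, w)
        seg = self.classifier(feat)                # (B, num_classes, h, w) segmentation logits
        embed = F.normalize(self.proj_head(feat), p=2, dim=1)
        return {'seg': seg, 'embed': embed}        # the keys ContrastCELoss asserts on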

Memory and semi-hard example mining code

Hi, well done on the nice work.
I have the following questions regarding the codebase:

  1. I believe there is an issue in the pixel memory code (or perhaps it is intended but not mentioned in the paper?).
    In the case where the K available samples would overflow the memory for a given class, the code replaces the newest rather than the oldest features in the queue (replacing the oldest is, for example, how MoCo handles its memory bank).
    Specifically, in _dequeue_and_enqueue in trainer_contrastive.py:

if ptr + K >= self.memory_size:
    pixel_queue[lb, -K:, :] = nn.functional.normalize(feat, p=2, dim=1)
    pixel_queue_ptr[lb] = 0

  2. Section 3.3 of the paper mentions that the version of the method that gives improvements on multiple datasets entails semi-hard negative example mining as the sampling procedure from the memory bank. I could not find the exact implementation of this version in the repository.

The only thing I found related to getting negatives from the memory is the method _sample_negative(self, Q) in loss_contrast_mem.py, which, to the best of my understanding, simply takes all memory samples per class.

Please correct me if I am missing something here.
If not, are you planning to either add these methods or make trained model weights available?

  3. Finally, it would be helpful to know whether all the mIoU results in the paper (including the appendix and the ablation results on the Cityscapes validation set) use multi-scale + flip test-time augmentation.

It would be very helpful if you could address these questions.
Thanks

about the decrease of the loss

Hello. Thanks for your excellent work!
I transplanted the loss function and memory bank from your code into my own code and ran it on the Cityscapes dataset and on my own dataset. However, it turns out that the decrease in the contrastive loss is very small: for example, it drops from 1.27 to 1.11 after 80k epochs on Cityscapes, while the CE loss drops from 1.26 to 0.15. The same thing happens on my own dataset. It seems the contrastive loss is not contributing much during training.
I wonder: is it normal for the contrastive loss to decrease so little? And what can I do to make full use of the contrastive loss, such as tuning the hyper-parameters?

When will you release the code?

Thanks for your wonderful work on contrastive learning for semantic segmentation; I want to follow your work.

Dimension of feature embedding + full_res_stem

Dear @tfzhou ,

I find your work really interesting - thanks a lot (also to your team).
In your paper you state that you use an
[image excerpt from the paper]

Using the HRNet backbone, I need to set the full_res_stem env variable to get an embedding with the same dimensions as the input image. Otherwise, there is interpolation logic that upsamples the feature embeddings, which seems to happen for any use of the ResNet backbone.

I was wondering whether you used the full_res_stem env variable for your reported results. Also, did you observe a performance difference when the feature embedding dimensions are smaller than the input dimensions?

Thanks
Oliver
