charlescxk / TorchSemiSeg
[CVPR 2021] CPS: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
License: MIT License
Hi,
Thanks for the excellent work. There is a problem: I run the code several times, and each time the loss is different. Are there any random seeds that haven't been fixed?
```python
seed = config.seed
if engine.distributed:
    seed = engine.local_rank
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```
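For reference, run-to-run loss drift can persist even with all of these seeds set, because several cuDNN kernels are non-deterministic by default. A fuller (but still not exhaustive) seeding sketch, written independently of the repo:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the common RNG sources and pin cuDNN to deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Non-deterministic cuDNN algorithm selection is a frequent cause of
    # run-to-run differences; pinning it trades some speed for repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Data-loader worker processes and distributed runs still need their own per-worker seeding on top of this.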
Thanks for sharing! Great work!
I'm trying to reproduce your ResNet-101 result on the PASCAL VOC dataset.
I can't find your config files for ResNet-101, and I'm wondering whether there are any changes compared with your ResNet-50 settings (here)?
Cheers
Hello, I am using one Tesla P100 16 GB with batch size 4 and your default config.
I only reach about 66% mIoU, different from the 73.28 reported in your paper and in your log.
Hello, is the final performance of the experiment the last-epoch result, or the best performance among epochs 20-34? (The labeled ratio is 1/8.)
May I ask how much memory and time the VOC dataset requires? I use the script with the 1/8 data partition, and it needs nearly 64 GB of GPU memory. Is that normal? Thank you for your help.
Hi, first, thanks for sharing this great work!
I was reading the paper and the code, and I noticed that in the Mean Teacher experiment the paper says you use x1 and x2 with two different augmentations, but in the code you simply feed the same x to both the teacher and the student. Is that intended, or did I miss something?
Moreover, could you release the Mean Teacher experiment for Cityscapes? I could not reproduce the results from the paper.
Thanks
Hello, I'm training with two 3080 cards. The batch size can only be set to 4, the learning rate is 0.0025, and the labeled ratio is 1/8. I've run it three times and the mIoU is only 0.66. What could be the reason? Does batch size have a large impact?
Hi, I tried to run the code here and here; the extra dict will always be None, as TrainPre() here always returns an empty extra_dict, and hence batches won't have the key 'img'.

Hi, I have a question about Figure 7 of the paper. What is the formula for obtaining the overlap ratio? Is it an mIoU between the predictions of network 1 and network 2 for each kind of sample?
Thanks
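One plausible reading of the overlap ratio (an assumption on my part, not the authors' stated definition) is the mean IoU computed between the two networks' hard predictions, treating one label map as reference and the other as prediction:

```python
import numpy as np


def overlap_miou(pred_a: np.ndarray, pred_b: np.ndarray, num_classes: int) -> float:
    """Mean IoU between the argmax predictions of two networks.

    pred_a and pred_b are integer label maps of the same shape; classes
    absent from both maps are skipped so they do not drag down the mean.
    """
    ious = []
    for c in range(num_classes):
        a, b = pred_a == c, pred_b == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class missing from both maps
        ious.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(ious)) if ious else 1.0
```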
The Cityscapes archive from OneDrive seems to be corrupted. I've tested it on two devices and still can't unpack it.
```
$ unzip city.zip
Archive:  city.zip
warning [city.zip]:  8575179524 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [city.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
```
Hi,
I couldn't run the command `cd ./model/voc8.res50v3+.CPS` because there is no folder called model.
Where is the model folder?
I ran the script.sh file and it reported an error.
```
Traceback (most recent call last):
  File "train.py", line 28, in <module>
    from apex.parallel import DistributedDataParallel, SyncBatchNorm
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/__init__.py", line 12, in <module>
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/optimizers/__init__.py", line 2, in <module>
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/optimizers/fp16_optimizer.py", line 8, in <module>
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/dj/anaconda3/envs/py3.6torch1.6/bin/python: undefined symbol: THCudaHalfTensor_normall
Traceback (most recent call last):
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/dj/anaconda3/envs/py3.6torch1.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/dj/anaconda3/envs/py3.6torch1.6/bin/python', '-u', 'train.py', '--local_rank=0']' returned non-zero exit status 1.
```
My operating environment is PyTorch 1.6, CUDA 10.0, gcc/g++ 7.5.0, and two 2080 Ti GPUs.
I did not follow the method you described when installing apex; I kept failing to install it with that command, but I installed it successfully via `python3 setup.py install`, so I'm turning to you for help.
Hi again, I have another question, about the BN layers.
I want to disable the batch-normalization layers, so I tried changing the model status from model.train() to model.eval() in train.py. (I also tried replacing the BatchNorm layers with Identity layers, but the same error occurred.)
The training script raised this error:
```
ValueError: Expected input batch_size (21) to match target batch_size (2).
```
How can I disable the BN layers? Thank you!
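Not the repo's code, but one way to sidestep this class of problem is to freeze only the BN modules instead of switching the whole model to eval() or swapping in Identity: keep model.train(), but put each BN layer in eval mode so it uses its fixed running statistics.

```python
import torch.nn as nn


def freeze_batchnorm(model: nn.Module) -> None:
    """Put only the BN layers in eval mode and stop their affine weights
    from updating, while the rest of the model keeps training normally."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()  # use running stats; do not update them
            for p in m.parameters():
                p.requires_grad = False
```

Call it after every model.train() (e.g. at the start of each epoch), since train() flips the BN layers back into training mode.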
Hi,
I notice that you wrote a comment here about the 10x head learning rate, but it seems you do not actually apply it. Since training the entire network is really time-consuming, may I kindly ask whether the comment is simply wrong?
Cheers.
Thank you for releasing the code. We ran the default config you published in voc8.res50v3+.CPS; the best performance is 72.76, a little lower than the 73.20 in the paper. Are there any other tricks? Also, the default epoch count is 34 while the paper says 60, yet your released log (73.28) also uses 34 epochs.
This is very nice work.
Thank you so much for your contribution.
Is it possible to release scripts for few-supervision on PASCAL VOC 2012, too?
Does the network generate labels for all pixels of each image to supervise the other network during training? (In GCT, only pixels whose confidence is higher than a certain threshold are selected.)
In the early stage of training, the labels predicted by the still poorly performing networks are inaccurate; is it reasonable to use these wrong labels as supervisory information for the other network?
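For comparison, a GCT-style confidence filter (explicitly not what CPS does; CPS supervises every pixel) can be sketched as follows: pixels below the threshold get the ignore index so cross-entropy skips them.

```python
import torch
import torch.nn.functional as F


def thresholded_pseudo_labels(logits: torch.Tensor, threshold: float = 0.9,
                              ignore_index: int = 255) -> torch.Tensor:
    """Keep only pixels whose max softmax probability exceeds `threshold`."""
    probs = F.softmax(logits, dim=1)     # (N, C, H, W)
    conf, labels = probs.max(dim=1)      # per-pixel confidence and argmax label
    labels[conf < threshold] = ignore_index
    return labels
```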
When I use PyTorch 1.0, apex DDP automatically exits without an error message, so I tried running this code with PyTorch 1.8.1. However, I got results similar to those reported in #14 (72.0±0.2 mIoU for CPS, 1/8 VOC R50 setting). Can anyone figure out a possible reason for the performance decay?
@yh-pengtu, @frank-xwang
Hello, I trained with labeled ratio 1/8, 137 epochs, learning rate 0.02, and batch size 8 on 8 cards, as in your default configuration in city8.res50v3+.CPS, but the mean IoU is 70.682%, which is lower than the 74.39% in the paper. Later, I changed the epochs to 240 and the learning rate to 0.04, following the experiments in the paper, but it still doesn't work. Do you have any advice?
Thank you a lot.
Hi,
Thank you for sharing this excellent work. I have run the code for ResNet-50, but I can't find the configs for ResNet-101; would you be willing to share them?
Besides, I notice there are some tiny changes in ResNet-50 and the ASPP. Is there a reason for them? And how was the ImageNet pre-trained model obtained (standard supervised learning or self-supervised learning)?
Also, I tried to download city.zip, but I failed to unzip it. Would you mind having a check?
By the way, thank you for your code, it is very helpful.
Best regards,
Yuhang
Hi, I tried to set up the base virtual environment, but I faced a problem.
```
$ conda env create -f semiseg.yaml
```
Then I see this message:
```
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
```
My environment:
Win10
Anaconda version: 1.7.2
GPU: RTX3090
Please help me.
Thank you.
Hi, thank you for releasing this awesome work!
I have a question about the weight decay value in the config for ResNet-50, PASCAL VOC, 1/8 labels.
The paper says "The momentum is fixed as 0.9 and the weight decay is set to 0.0005 (5e-4)".
But in the config file, the weight decay is defined as 1e-4:
```
# TorchSemiSeg/exp.voc/voc8.res50v3+.CPS+CutMix/config.py, line 106
C.weight_decay = 1e-4
```
Which value should I follow? Thank you!
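The paper's 5e-4 and the config's 1e-4 genuinely differ; my inference (not a confirmed answer) is that the released log was produced with the config value, so reproducing the log favors 1e-4. Whichever value you pick, it's worth verifying that the optimizer actually receives it:

```python
import torch

# Toy parameter; in the repo this would be the model's parameter list.
params = [torch.nn.Parameter(torch.zeros(3))]
optimizer = torch.optim.SGD(params, lr=0.0025, momentum=0.9, weight_decay=1e-4)

# The effective value lives in the optimizer's param groups:
print(optimizer.param_groups[0]['weight_decay'])
```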
I followed the default settings of the code and obtained 71.687% mIoU when training with 1/8 labels on PASCAL VOC, but the paper reports 73.67% mIoU. Why is the performance gap so large, and can anyone tell me how to close it? Thanks very much!
Hi, is the learning rate for the 1/16 ratio also 0.0025?
I can't achieve the accuracy in the paper by using 0.0025.
Hi,
Thanks so much for sharing! I also really enjoyed reading your paper!
I just can't find the code for the self-training in your Table 7. Could you please provide some details about it?
Cheers!
First of all, thank you for your great work.
Currently, I'm using your code to train on another dataset and am facing the problem described in the attached image.
After installing apex successfully, I ran the training code with 2 A6000 GPUs, and it stops after a few epochs.
Another problem I want to ask about: my server kept restarting during the several times I ran the training code before it finally worked. Do you know the reason why?
Thank you very much.
Hi @charlesCXK,
Great work, and thanks for sharing the code.
I have a question: the CPS loss in the paper is also computed on the labeled data, but I can't find that in the code. Was it forgotten, or is the code incomplete?
Best regards.
Due to the server's CUDA version, I was unable to use the PyTorch 1.0.0 requested by the author, so I upgraded PyTorch to 1.9.1. But then it didn't work. How can I deal with this problem?
```
/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
  FutureWarning,
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[the same traceback is printed by each of the four ranks; duplicates deduplicated]
Traceback (most recent call last):
  File "train.py", line 28, in <module>
    from apex.parallel import DistributedDataParallel, SyncBatchNorm
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/__init__.py", line 12, in <module>
    from . import optimizers
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/__init__.py", line 2, in <module>
    from .fp16_optimizer import FP16_Optimizer
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fp16_optimizer.py", line 8, in <module>
    lib.THCudaHalfTensor_normall.argtypes=[ctypes.c_void_p, ctypes.c_void_p]
  File "/opt/conda/envs/semiseg2/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/opt/conda/envs/semiseg2/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /opt/conda/envs/semiseg2/bin/python: undefined symbol: THCudaHalfTensor_normall
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 34178) of binary: /opt/conda/envs/semiseg2/bin/python
Traceback (most recent call last):
  File "/opt/conda/envs/semiseg2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/semiseg2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/run.py", line 692, in run
    )(*cmd_args)
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/semiseg2/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Other Failures:
  [1]:
    time: 2021-09-22_11:17:30
    rank: 1 (local_rank: 1)
    exitcode: 1 (pid: 34179)
    error_file: <N/A>
    msg: "Process failed with exitcode 1"
  [2]:
    time: 2021-09-22_11:17:30
    rank: 2 (local_rank: 2)
    exitcode: 1 (pid: 34180)
    error_file: <N/A>
    msg: "Process failed with exitcode 1"
  [3]:
    time: 2021-09-22_11:17:30
    rank: 3 (local_rank: 3)
    exitcode: 1 (pid: 34181)
    error_file: <N/A>
    msg: "Process failed with exitcode 1"
22 11:17:31 using devices 0, 1, 2, 3
```
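The undefined symbol means the apex extension was compiled against a different PyTorch than the one now installed. One untested workaround (an assumption, not the authors' recommendation) is to rebuild apex against PyTorch 1.9.1, or to drop apex entirely in favor of the native equivalents that have existed since PyTorch 1.6, and to read the rank from the environment as the deprecation warning suggests:

```python
import os

# Native replacements for the apex imports in train.py:
from torch.nn import SyncBatchNorm
from torch.nn.parallel import DistributedDataParallel

# torch.distributed.run no longer passes --local_rank by default;
# read it from the environment instead:
local_rank = int(os.environ.get('LOCAL_RANK', 0))
```

Mixed precision would then come from torch.cuda.amp rather than apex.amp.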
In Figure 3 of the paper, there's a comparison with fully supervised baselines. How can these results be reproduced (especially with HRNet)?
Hi, Xiaokang,
Thanks for sharing such solid work! I noticed that the supervised loss is the OHEM loss. Have you run experiments with the plain CE loss, and how were the results?
Hi. According to Getting Started.md, there are different epoch counts for the different label ratios, but I'm confused about the meaning of this setting.
If I understand correctly, you only need to specify the total iterations here; it shouldn't be necessary to compute the epoch count corresponding to each label ratio.
If I want to train a supervised baseline, does it mean that if my total iterations are 40k and the label ratio is 0.125, then I need to do 40k iterations (say with batch size 8) on only the 0.125 × 10582 labeled images? And if I use the semi-supervised method, does one iteration mean fetching the labeled data once and the unlabeled data once?
Hope to receive your reply. Many thanks.
Best,
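My guess at how the schedule fits together (names and numbers are illustrative; VOC-aug has 10582 training images): one "epoch" walks the labeled subset once, so for a fixed iteration budget the epoch count necessarily changes with the label ratio.

```python
# All values are hypothetical, for illustrating the arithmetic only.
total_images = 10582            # PASCAL VOC augmented train set
ratio = 1 / 8                   # labeled ratio
batch_size = 8

labeled = int(total_images * ratio)         # labeled images available
niters_per_epoch = labeled // batch_size    # one epoch = one pass over them
total_iters = 40000                         # fixed iteration budget
nepochs = total_iters // niters_per_epoch   # epochs needed to spend the budget
print(labeled, niters_per_epoch, nepochs)
```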
Thanks for your marvelous work!
When trying to reproduce your work on the VOC 2012 dataset, I was puzzled by the size of the input images after the cropping operation: should it be batch × channel × 321 × 321, the same as other methods? In your work the image size is batch × channel × 512 × 512, if I'm not mistaken.
We notice that in the manuscript, Y should not backpropagate gradients.
But in the code, there seems to be no operation that stops the gradients?
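One observation (mine, not the authors' answer): with hard pseudo-labels, the gradient stop falls out of the argmax itself, since the resulting integer tensor carries no gradient and cross-entropy never backpropagates through its target. An explicit detach() just makes the intent unambiguous:

```python
import torch

logits_1 = torch.randn(2, 21, 8, 8, requires_grad=True)  # toy output of network 1
pseudo_y = logits_1.detach().argmax(dim=1)  # hard pseudo-label Y for network 2
# pseudo_y is an integer tensor detached from the graph, so no gradient
# can flow back into network 1 through it.
```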
Thanks for sharing the code. I have reproduced the result of voc.CPS, but I still have some confusion.
Hello, I found some differences in the implementations of the other methods.
For Mean Teacher, the paper proposes updating the teacher model's weights with an EMA of the student's weights, but in the code there is only an MSE loss between the teacher and student models.
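For reference, the canonical Mean Teacher update from the original Mean Teacher paper (which may or may not match this repo's implementation) moves the teacher toward the student by an exponential moving average after each optimizer step:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, alpha: float = 0.99) -> None:
    """teacher <- alpha * teacher + (1 - alpha) * student, parameter-wise."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(alpha).add_(p_s, alpha=1.0 - alpha)
```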
Hi, could you give a date for when you can release the other SOTA semi-supervised segmentation methods?
```python
pooling_size = (min(try_index(self.pooling_size, 0), x.shape[2]),
                min(try_index(self.pooling_size, 1), x.shape[3]))
```
Hi @charlesCXK, I was reading your paper and code, and I have a question concerning the number of epochs (i.e., the max number of iterations): did you choose it arbitrarily? Or did you take the fully supervised iteration count from DeepLabv3+ as a starting point and adapt it to the various splits?
I hope my question is clear :)
Hi,
Thank you for the sharing and excellent work. To reproduce the Table 2 results (w/o CutMix augmentation) for ResNet-101 on the Cityscapes dataset, we just changed the pretrain_model path in config.py to the path of the ResNet-101 model. However, the ResNet-101 results (1/16: 63.632) are far from the paper's (1/16: 72.18), even worse than ResNet-50 (1/16: 68.21). Why is the performance gap so large?
```python
C.dataset_path = osp.join(C.volna, 'DATA/pascal_voc')
C.img_root_folder = C.dataset_path
C.gt_root_folder = C.dataset_path
C.pretrained_model = C.volna + 'DATA/pytorch-weight/resnet50_v1c.pth'
```
Here is my experimental environment:
8 Tesla V100 GPUs, PyTorch 1.0.0, Python 3.6.7
Thank you for your work! I want to know how to train CPS with my custom dataset; where should I change the code?
Hi, first, thanks for sharing this great work!
I noticed that you have released other SOTA methods on the VOC dataset. I wonder if you will release them on the Cityscapes dataset as well?
Thanks
Hi, Xiaokang,
Congratulations on such solid work. I have a question about the different initializations of the two networks: I checked the code of the segmentation model, but I can't find the parameter-initialization operation. I'd like to know the details; could you give me some advice?
Best,
Xiangde.
Thanks for sharing this excellent work!
I have a question about the parameter initialization in this teacher-student paradigm:
```python
param_t.data.copy_(param_s.data)
```
Looking forward to your clarification! Thank you.
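My reading of that line (an assumption about intent, not the authors' statement) is that the teacher starts as an exact copy of the student and is then kept out of autograd, so only the EMA update ever changes it. A minimal sketch of the same pattern:

```python
import copy

import torch
import torch.nn as nn

student = nn.Linear(4, 2)          # stand-in for the student network
teacher = copy.deepcopy(student)   # teacher starts identical to the student
for p in teacher.parameters():
    p.requires_grad_(False)        # teacher is moved only by the EMA update
```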