clim,wusize

CANNOT find resnet50_msra-5891d200.pth

Hi, thanks a lot for your excellent work! However, when I try to pre-train the detector on OV-COCO using Detic, I cannot find the checkpoint file resnet50_msra-5891d200.pth, which is used to initialize the backbone of OVDTwoStageDetector.
Could you provide this file?

mmdetection reproducibility

[First run]

[Second run]

Hello, I have a question about how to reproduce the model on mmdetection 3.x. The model returns slightly different outputs even after fixing all seeds through mmdetection:

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
CUDA_VISIBLE_DEVICES=$NODES python -m torch.distributed.launch \
    --nnodes=$NNODES \
    --node_rank=$NODE_RANK \
    --master_addr=$MASTER_ADDR \
    --nproc_per_node=$GPUS \
    --master_port=$PORT \
    $(dirname "$0")/train.py \
    $CONFIG \
    --cfg-options randomness.seed=$SEED \
    randomness.diff_rank_seed=True \
    randomness.deterministic=True \
    --launcher pytorch ${@:5}
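
For reference, the same three options can presumably be set in the config file instead of on the command line. This is a minimal sketch, assuming the standard MMEngine `randomness` field; the seed value 0 is only an illustration and stands in for $SEED:

```python
# Config-file equivalent of the --cfg-options flags above (sketch).
# The seed value is illustrative; the original post passes $SEED.
randomness = dict(
    seed=0,                # base seed shared by all ranks
    diff_rank_seed=True,   # each rank uses seed + rank
    deterministic=True,    # request deterministic cuDNN / PyTorch algorithms
)
```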

Setting "randomness.seed" and "randomness.deterministic" will invoke the function defined in the mmengine library:

def set_random_seed(seed: Optional[int] = None,
                    deterministic: bool = False,
                    diff_rank_seed: bool = False) -> int:
    """Set random seed.

    Args:
        seed (int, optional): Seed to be used.
        deterministic (bool): Whether to set the deterministic option for
            CUDNN backend, i.e., set `torch.backends.cudnn.deterministic`
            to True and `torch.backends.cudnn.benchmark` to False.
            Defaults to False.
        diff_rank_seed (bool): Whether to add rank number to the random seed to
            have different random seed in different threads. Defaults to False.
    """
    if seed is None:
        seed = sync_random_seed()

    if diff_rank_seed:
        rank = get_rank()
        seed += rank

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # torch.cuda.manual_seed(seed)
    if is_cuda_available():
        torch.cuda.manual_seed_all(seed)
    elif is_musa_available():
        torch.musa.manual_seed_all(seed)
    # os.environ['PYTHONHASHSEED'] = str(seed)
    if deterministic:
        if torch.backends.cudnn.benchmark:
            print_log(
                'torch.backends.cudnn.benchmark is going to be set as '
                '`False` to cause cuDNN to deterministically select an '
                'algorithm',
                logger='current',
                level=logging.WARNING)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

        if digit_version(TORCH_VERSION) >= digit_version('1.10.0'):
            torch.use_deterministic_algorithms(True)
    return seed
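
As a side note on the `diff_rank_seed` branch above: adding the rank to the seed makes the ranks draw different random streams, but each rank's stream should still be identical from run to run. The sketch below illustrates this with the standard-library `random` module only; `rank_seed` and `draws` are hypothetical helper names, not part of mmengine:

```python
import random
from typing import List


def rank_seed(seed: int, rank: int, diff_rank_seed: bool) -> int:
    """Mirror the `seed += rank` branch of set_random_seed (illustrative)."""
    return seed + rank if diff_rank_seed else seed


def draws(seed: int, n: int = 3) -> List[float]:
    """Draw n numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Simulate two launches with two ranks each, base seed 42.
run_a = [draws(rank_seed(42, rank, True)) for rank in range(2)]
run_b = [draws(rank_seed(42, rank, True)) for rank in range(2)]

print(run_a == run_b)        # same rank, same seed: identical across runs
print(run_a[0] != run_a[1])  # different ranks draw different streams
```

Under this reading, `diff_rank_seed=True` by itself is not a source of run-to-run differences.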

However, with both this model and BARON, the Faster R-CNN model returns different RPN results on each run. TwoStageDetector in MMDetection contains this function:

    def extract_feat(self, batch_inputs: Tensor) -> Tuple[Tensor]:
        """Extract features.

        Args:
            batch_inputs (Tensor): Image tensor with shape (N, C, H ,W).

        Returns:
            tuple[Tensor]: Multi-level features that may have
            different resolutions.
        """
        x = self.backbone(batch_inputs)
        if self.with_neck:
            x = self.neck(x)
        return x

Here "self.backbone" is a ResNet model defined in mmdet/models/backbones/resnet.py. The ResNet model produces different results during forward():


    def forward(self, x):
        """Forward function."""
        if self.deep_stem:
            x = self.stem(x)
        else:
            x = self.conv1(x)
            x = self.norm1(x)
            x = self.relu(x)
        x = self.maxpool(x)

        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        return tuple(outs)

I checked that the input "x" to the following line was the same on every run:

        x = self.conv1(x)

According to the log, however, conv1 produced different results for the same input. The model computes the same total loss every time, but the backward pass updates the parameters differently even from the same loss.
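
One way to make this kind of check bit-exact is to log a hash of each tensor's raw bytes (input, conv1 weight, conv1 output) and compare the logs of two runs in forward order. The sketch below uses only the standard library and toy byte buffers; `fingerprint`, `first_divergence`, and `pack` are hypothetical helper names. In a PyTorch model the bytes could be obtained with `tensor.detach().cpu().numpy().tobytes()`.

```python
import hashlib
import struct
from typing import Dict, List, Optional


def fingerprint(raw: bytes) -> str:
    """Short SHA-256 digest of a tensor's raw bytes (bit-exact comparison)."""
    return hashlib.sha256(raw).hexdigest()[:16]


def first_divergence(run_a: Dict[str, bytes],
                     run_b: Dict[str, bytes],
                     order: List[str]) -> Optional[str]:
    """Return the first name in `order` whose bytes differ between two runs."""
    for name in order:
        if fingerprint(run_a[name]) != fingerprint(run_b[name]):
            return name
    return None


def pack(values: List[float]) -> bytes:
    """Toy stand-in for a float32 tensor's raw bytes."""
    return struct.pack(f"{len(values)}f", *values)


order = ["input", "conv1.weight", "conv1.out"]
run_1 = {"input": pack([1.0, 2.0]),
         "conv1.weight": pack([0.5, 0.25]),
         "conv1.out": pack([0.5, 0.5])}
# Second run: same input, but the weight differs slightly, so the output does too.
run_2 = {"input": pack([1.0, 2.0]),
         "conv1.weight": pack([0.5, 0.2500001]),
         "conv1.out": pack([0.5, 0.5000002])}

print(first_divergence(run_1, run_2, order))  # conv1.weight
```

If the input matches but the weights already differ, the divergence started in an earlier parameter update rather than in the convolution's forward pass; if the weights match and only the output differs, the forward kernel itself is the suspect.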

How can I guarantee that the RPN results are consistent across runs?

Thanks for your response in advance!
