sekunde / Pri3D
[ICCV'21] Pri3D: Can 3D Priors Help 2D Representation Learning?
Hi,
Thanks for the codebase. It seems the provided model zoo only contains models fine-tuned on the various downstream tasks. I would like to look at the original features from pre-training alone, without any fine-tuning, so that I can perhaps use them for other tasks. Does such a request make sense, or do you think I could use one of the existing model-zoo checkpoints and it would work similarly?
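To make the request concrete, this is a minimal sketch of what I would like to do with such a checkpoint (the file name "pri3d_pretrained.pth" and the "backbone." key prefix are my guesses, not the repo's actual checkpoint format):

import torch
import torchvision

# Hypothetical checkpoint path and key layout; adjust to the real format.
ckpt = torch.load("pri3d_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# Keep only backbone weights, dropping any task-specific heads.
backbone_state = {k[len("backbone."):]: v for k, v in state_dict.items()
                  if k.startswith("backbone.")}

resnet = torchvision.models.resnet50(pretrained=False)
resnet.load_state_dict(backbone_state, strict=False)

# Everything up to global pooling becomes a frozen feature extractor.
extractor = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()
with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 240, 320))  # -> (1, 2048, 8, 10)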
Thanks in advance.
In the Prepare ScanNet Pre-training Data section, TARGET and SCANNET_DIR need to be modified, but what data, in what structure, should be stored in the folders these two parameters point to?
I downloaded ScanNet's scannet_frames_25k data, whose structure is:
<scannet_frames_25k>
|---- <scene0000_00>
|---- <color>
|---- 000000.jpg
|---- 000100.jpg
|---- 000200.jpg
|---- ... # RGB images
|---- <depth>
|---- 000000.png
|---- 000100.png
|---- 000200.png
|---- ... # depth images
|---- <instance>
|---- 000000.png
|---- 000100.png
|---- 000200.png
|---- ... # instance images
|---- <label>
|---- 000000.png
|---- 000100.png
|---- 000200.png
|---- ... # label images
|---- <pose>
|---- 000000.txt
|---- 000100.txt
|---- 000200.txt
|---- ... # camera poses
|---- intrinsics_color.txt
|---- intrinsics_depth.txt
|---- <scene0000_01>
|---- ... # same structure as before
|---- ... # like <scene*>
Should the above data be put into the TARGET or the SCANNET_DIR mentioned before? Or do I have to download the raw ScanNet scenes separately, e.g. scene0000_00, with the following structure:
<scanId>
|-- <scanId>.sens
      RGB-D sensor stream containing color frames, depth frames, camera poses and other data
|-- <scanId>_vh_clean.ply
      High quality reconstructed mesh
|-- <scanId>_vh_clean_2.ply
      Cleaned and decimated mesh for semantic annotations
|-- <scanId>_vh_clean_2.0.010000.segs.json
      Over-segmentation of annotation mesh
|-- <scanId>.aggregation.json, <scanId>_vh_clean.aggregation.json
      Aggregated instance-level semantic annotations on lo-res, hi-res meshes, respectively
|-- <scanId>_vh_clean_2.0.010000.segs.json, <scanId>_vh_clean.segs.json
      Over-segmentation of lo-res, hi-res meshes, respectively (referenced by aggregated semantic annotations)
|-- <scanId>_vh_clean_2.labels.ply
      Visualization of aggregated semantic segmentation; colored by nyu40 labels (see img/legend; ply property 'label' denotes the nyu40 label id)
|-- <scanId>_2d-label.zip
      Raw 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance.zip
      Raw 2d projections of aggregated annotation instances as 8-bit pngs
|-- <scanId>_2d-label-filt.zip
      Filtered 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance-filt.zip
      Filtered 2d projections of aggregated annotation instances as 8-bit pngs
So, what I am asking is: what files should I put into TARGET and SCANNET_DIR?
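In case it helps to clarify what I have locally, this is the quick structure check I run on my scannet_frames_25k copy (purely illustrative; the root path is a placeholder):

import os

root = "/path/to/scannet_frames_25k"
for scene in sorted(os.listdir(root)):
    scene_dir = os.path.join(root, scene)
    if not os.path.isdir(scene_dir):
        continue
    # Each scene should have matching color/depth/pose frame counts.
    n_color = len(os.listdir(os.path.join(scene_dir, "color")))
    n_depth = len(os.listdir(os.path.join(scene_dir, "depth")))
    n_pose = len(os.listdir(os.path.join(scene_dir, "pose")))
    assert n_color == n_depth == n_pose, scene + ": frame counts differ"
    assert os.path.isfile(os.path.join(scene_dir, "intrinsics_color.txt"))
    print(scene, n_color, "frames")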
Hi!
First of all, thanks for sharing the code! Great job!
When preparing the ScanNet data for pre-training, I cannot find the file scannet.py. Do you have any idea what went wrong?
This is where the issue occurs:
Line 74 in 63c728b
cd pretrain/data_preprocess/scannet
python scannet.py --input SCANNET_DATA --output SCANNET_OUT_PATH
Hi, thanks for sharing your great and comprehensive work.
As I understand it, the 3D backbone is also trained during the pre-training process. Did you run any experiment to validate the usefulness of the pre-trained 3D network for downstream 3D tasks?
In the scannet.sh script, the setting "pretrain.depth" is set to "True":
Pri3D/pretrain/pri3d/scripts/scannet.sh
Line 26 in a607a05
Was this also the setting used for all the models in the model zoo? If I understand correctly, this enables the depth loss in addition to the VIEW and GEO losses, is that correct?
Pri3D/pretrain/pri3d/model/model.py
Line 94 in a607a05
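For reference, this is how I currently read the loss composition; the function and variable names below are my own shorthand for the purpose of this question, not the repo's actual code:

import torch

def total_loss(view_loss: torch.Tensor, geo_loss: torch.Tensor,
               depth_loss: torch.Tensor, use_depth: bool) -> torch.Tensor:
    # My reading: VIEW and GEO are always on, and pretrain.depth=True
    # additionally enables the depth reconstruction term.
    loss = view_loss + geo_loss
    if use_depth:
        loss = loss + depth_loss
    return loss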
Instance Segmentation/Detection on ScanNet
I followed the guidelines here and did the following.
cd downstream/insseg/dataset
python scannet2coco.py --scannet_path <path of scannet_frames_25k> --phase train
python scannet2coco.py --scannet_path <path of scannet_frames_25k> --phase val
Then I replaced load_coco() in MaskRCNN/sample/coco/coco.py with the following code:

def load_custom(self, dataset_dir, subset, class_ids=None):
    """Load a subset of the custom dataset.
    dataset_dir: Root directory of the dataset.
    subset: Subset to load: train or val
    """
    f = open("labels_nyu40_18.txt")
    lines = f.readlines()
    for line in lines:
        line = line.split(", ")
        id_num = int(line[0].split(":")[1])
        category = line[1].split(":")[1]
        self.add_class("custom", id_num, category)
    f.close()

    annotations_dir = os.path.join(dataset_dir, "annotations")
    coco = COCO("{}/scannet_{}.coco.json".format(annotations_dir, subset))

    # Train or validation dataset?
    # assert subset in ["train", "val"]
    # if subset == "train" or subset == "val":
    #     subset = "train"
    dataset_dir = os.path.join(dataset_dir, subset)

    if not class_ids:
        # All classes
        class_ids = sorted(coco.getCatIds())

    # All images or a subset?
    if class_ids:
        image_ids = []
        for id in class_ids:
            image_ids.extend(list(coco.getImgIds(catIds=[id])))
        # Remove duplicates
        image_ids = list(set(image_ids))
    else:
        # All images
        image_ids = list(coco.imgs.keys())

    # Add images
    for i in image_ids:
        self.add_image(
            "custom", image_id=i,
            path=os.path.join(dataset_dir, coco.imgs[i]['file_name']),
            width=coco.imgs[i]["width"],
            height=coco.imgs[i]["height"],
            annotations=coco.loadAnns(coco.getAnnIds(
                imgIds=[i], catIds=class_ids, iscrowd=None)))
The contents of labels_nyu40_18.txt are as follows:
id:1, category:cabinet
id:2, category:bed
id:3, category:chair
id:4, category:sofa
id:5, category:table
id:6, category:door
id:7, category:window
id:8, category:bookshelf
id:9, category:picture
id:10, category:counter
id:11, category:desk
id:12, category:curtain
id:13, category:refridgerator
id:14, category:shower curtain
id:15, category:toilet
id:16, category:sink
id:17, category:bathtub
id:18, category:otherfurniture
class CustomConfig(Config):
    NAME = "custom"
    BACKBONE = "resnet50"
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 18
    STEPS_PER_EPOCH = 100
    DETECTION_MIN_CONFIDENCE = 0.9
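For completeness, this is roughly how I wire the config and dataset together, following the usual Matterport Mask R-CNN training pattern (CustomDataset stands for the Dataset subclass that holds my load_custom above, and the paths are placeholders):

from mrcnn import model as modellib

config = CustomConfig()

dataset_train = CustomDataset()
dataset_train.load_custom("/path/to/dataset", "train")
dataset_train.prepare()

dataset_val = CustomDataset()
dataset_val.load_custom("/path/to/dataset", "val")
dataset_val.prepare()

model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30, layers="heads")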
After running MaskRCNN/sample/demo.ipynb, the config shows:
Configurations:
BACKBONE resnet50
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.9
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 31
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME custom
NUM_CLASSES 19
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 100
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
I also looked at MaskRCNN/sample/coco/inspect_weight.ipynb and MaskRCNN/sample/coco/inspect_data.ipynb: https://github.com/ydzat/Workspace/tree/master/AI/MaskRCNN/sample/cocofor_maskrcnn2.yaml
My guess: could it be that the annotations generated by the scannet2coco.py you provide cannot be used in the Mask R-CNN project?
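For what it's worth, the first thing I will check is whether the generated JSON at least loads as valid COCO with the standard pycocotools API (the file name follows the naming in my load_custom above):

from pycocotools.coco import COCO

coco = COCO("annotations/scannet_train.coco.json")
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])
print("images:", len(coco.imgs), "annotations:", len(coco.anns))

# Inspect one annotation to see which fields Mask R-CNN will find.
ann = next(iter(coco.anns.values()))
print({k: type(v).__name__ for k, v in ann.items()})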
I am really interested in how the features change under the different losses (VIEW only, GEO only, or combined). From what I've seen, all provided models were trained with the same loss, or else on a different dataset. (Or did I overlook something?) Would it be possible to make more models available, e.g. VIEW only and GEO only on ScanNet?
I am aware this is pushing your kindness, since it's already great that you provide such an extensive codebase. Still, I at least wanted to ask. :)
Thanks for your great work and the released code! I notice that in your paper the pixel correspondences between two frames are determined as those whose 3D world locations lie within 2cm of each other. But in the preprocessing script pretrain/data_preprocess/scannet/compute_full_overlapping.py, the match_indices radius is set to voxel_size * 1.5, which by default is 7.5cm, I think. I am confused by the 2cm vs. 7.5cm. Could you provide some explanation? Thanks.
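To make my reading concrete, here is a small sketch of the radius-based matching as I understand it, using scipy's KD-tree; the variable names and pairing logic are my assumptions, not the script's actual implementation:

import numpy as np
from scipy.spatial import cKDTree

voxel_size = 0.05                    # 5cm default -> 7.5cm search radius
points_a = np.random.rand(1000, 3)   # stand-ins for two unprojected frames
points_b = np.random.rand(1000, 3)   # in world coordinates

tree = cKDTree(points_b)
dist, idx = tree.query(points_a, k=1, distance_upper_bound=1.5 * voxel_size)
matches = np.flatnonzero(np.isfinite(dist))  # points in A matched within radius
print(len(matches), "correspondences within", 1.5 * voxel_size, "m")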