pri3d's Issues

Pre-trained Model without downstream task fine-tuning

Hi,

Thanks for the codebase. It seems that the provided model zoo only contains models fine-tuned on the various downstream tasks. I would like to look at the original features from pre-training alone, without any fine-tuning, so that I can perhaps use them for other tasks. Do you think such a request makes sense, or could I take one of the existing model zoo checkpoints and expect it to work similarly?
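To make the request concrete, this is roughly how I would hope to use such a checkpoint. This is only a sketch: the file name and the state-dict keys below are placeholders I made up, since I don't know your actual checkpoint format.

import torch
import torchvision

# Load a pre-training-only checkpoint into a plain ResNet-50 trunk.
# "pri3d_pretrain_only.pth" and the key names are hypothetical.
backbone = torchvision.models.resnet50(pretrained=False)
ckpt = torch.load("pri3d_pretrain_only.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)
# Strip possible "module." / "backbone." prefixes before loading.
state = {k.split(".", 1)[1] if k.startswith(("module.", "backbone.")) else k: v
         for k, v in state.items()}
missing, unexpected = backbone.load_state_dict(state, strict=False)
print("missing keys:", len(missing), "unexpected keys:", len(unexpected))

# Drop the classification head and use the trunk as a frozen feature extractor.
backbone.fc = torch.nn.Identity()
backbone.eval()
with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 240, 320))  # -> (1, 2048) global feature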

Thanks in advance.

Prepare ScanNet Pre-training Data

In the Prepare ScanNet Pre-training Data section, TARGET and SCANNET_DIR need to be modified, but what data should be stored in the folders these two parameters point to, and in what structure?

I downloaded ScanNet's scannet_frames_25k data and its structure is:

 <scannet_frames_25k>
  |---- <scene0000_00>                  
        |---- <color>
                |---- 000000.jpg
                |---- 000100.jpg
                |---- 000200.jpg
                |---- ... # RGB images
        |---- <depth>
                |---- 000000.png
                |---- 000100.png
                |---- 000200.png
                |---- ... # depth images
        |---- <instance>
                |---- 000000.png
                |---- 000100.png
                |---- 000200.png
                |---- ... # instance images
        |---- <label>
                |---- 000000.png
                |---- 000100.png
                |---- 000200.png
                |---- ... # label images
        |---- <pose>
                |---- 000000.txt
                |---- 000100.txt
                |---- 000200.txt
                |---- ... # camera poses
        |---- intrinsics_color.txt
        |---- intrinsics_depth.txt
  |---- <scene0000_01>   
        |---- ... # same structure as before 
  |---- ... # like <scene*>  

Is the above data what should be put into the TARGET or SCANNET_DIR mentioned above? Or do I have to separately download the full ScanNet scans, e.g. scene0000_00, which have the following structure:

<scanId>
|-- <scanId>.sens
    RGB-D sensor stream containing color frames, depth frames, camera poses and other data
|-- <scanId>_vh_clean.ply
    High quality reconstructed mesh
|-- <scanId>_vh_clean_2.ply
    Cleaned and decimated mesh for semantic annotations
|-- <scanId>_vh_clean_2.0.010000.segs.json
    Over-segmentation of annotation mesh
|-- <scanId>.aggregation.json, <scanId>_vh_clean.aggregation.json
    Aggregated instance-level semantic annotations on lo-res, hi-res meshes, respectively
|-- <scanId>_vh_clean_2.0.010000.segs.json, <scanId>_vh_clean.segs.json
    Over-segmentation of lo-res, hi-res meshes, respectively (referenced by aggregated semantic annotations)
|-- <scanId>_vh_clean_2.labels.ply
    Visualization of aggregated semantic segmentation; colored by nyu40 labels (see img/legend; ply property 'label' denotes the nyu40 label id)
|-- <scanId>_2d-label.zip
    Raw 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance.zip
    Raw 2d projections of aggregated annotation instances as 8-bit pngs
|-- <scanId>_2d-label-filt.zip
    Filtered 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance-filt.zip
    Filtered 2d projections of aggregated annotation instances as 8-bit pngs

So, what I am asking is: what files should I put into TARGET and SCANNET_DIR?
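For reference, this is the small check I ran to confirm that every scene in my scannet_frames_25k copy has the layout listed above (the root path is a placeholder):

import os

root = "/path/to/scannet_frames_25k"  # placeholder
expected_dirs = ["color", "depth", "instance", "label", "pose"]
expected_files = ["intrinsics_color.txt", "intrinsics_depth.txt"]

for scene in sorted(os.listdir(root)):
    scene_dir = os.path.join(root, scene)
    if not os.path.isdir(scene_dir):
        continue
    missing = [d for d in expected_dirs
               if not os.path.isdir(os.path.join(scene_dir, d))]
    missing += [f for f in expected_files
                if not os.path.isfile(os.path.join(scene_dir, f))]
    if missing:
        print(scene, "is missing:", missing)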

Cannot find scannet.py when preparing scannet data.

Hi!

First of all, thanks for sharing the code! Great job!

When preparing the scannet data for pretraining, I cannot find the file scannet.py. Do you have any idea what went wrong?

This is where the issue occurs:

cd pretrain/data_preprocess/scannet
python scannet.py --input SCANNET_DATA --output SCANNET_OUT_PATH

Downstream task: Instance Segmentation, no result

Instance Segmentation/Detection on ScanNet

I followed the guidelines here and did the following.

  1. download scannet_frames_25k and unzip it.
  2. cd downstream/insseg/dataset;
  3. python scannet2coco.py --scannet_path <path of scannet_frames_25k> --phase train
    python scannet2coco.py --scannet_path <path of scannet_frames_25k> --phase val
  4. My setup differs slightly here: I'm using https://github.com/soumyaiitkgp/Custom_Mask_RCNN for detection.
    I modified load_coco() in MaskRCNN/sample/coco/coco.py with the following code:
def load_custom(self, dataset_dir, subset, class_ids=None):
    """Load a subset of the custom dataset.
    dataset_dir: Root directory of the dataset.
    subset: Subset to load: train or val
    """
    # Register the 18 ScanNet benchmark classes from labels_nyu40_18.txt;
    # each line looks like "id:1, category:cabinet".
    with open("labels_nyu40_18.txt") as f:
        for line in f:
            parts = line.strip().split(", ")
            id_num = int(parts[0].split(":")[1])
            category = parts[1].split(":")[1]
            self.add_class("custom", id_num, category)

    annotations_dir = os.path.join(dataset_dir, "annotations")
    coco = COCO("{}/scannet_{}.coco.json".format(annotations_dir, subset))

    # Train or validation dataset?
    # assert subset in ["train", "val"]
    # if subset == "train" or subset == "val":
    #     subset = "train"
    dataset_dir = os.path.join(dataset_dir, subset)

    if not class_ids:
        # All classes
        class_ids = sorted(coco.getCatIds())

    # All images or a subset?
    if class_ids:
        image_ids = []
        for class_id in class_ids:
            image_ids.extend(list(coco.getImgIds(catIds=[class_id])))
        # Remove duplicates
        image_ids = list(set(image_ids))
    else:
        # All images
        image_ids = list(coco.imgs.keys())

    # Add images
    for i in image_ids:
        self.add_image(
            "custom", image_id=i,
            path=os.path.join(dataset_dir, coco.imgs[i]['file_name']),
            width=coco.imgs[i]["width"],
            height=coco.imgs[i]["height"],
            annotations=coco.loadAnns(coco.getAnnIds(
                imgIds=[i], catIds=class_ids, iscrowd=None)))

The contents of labels_nyu40_18.txt are as follows:

id:1, category:cabinet
id:2, category:bed
id:3, category:chair
id:4, category:sofa
id:5, category:table
id:6, category:door
id:7, category:window
id:8, category:bookshelf
id:9, category:picture
id:10, category:counter
id:11, category:desk
id:12, category:curtain
id:13, category:refridgerator
id:14, category:shower curtain
id:15, category:toilet
id:16, category:sink
id:17, category:bathtub
id:18, category:otherfurniture
  5. Config is:
class CustomConfig(Config):
    NAME = "custom"
    BACKBONE = "resnet50"
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 18  # background + 18 ScanNet benchmark classes
    STEPS_PER_EPOCH = 100
    DETECTION_MIN_CONFIDENCE = 0.9

After running MaskRCNN/sample/demo.ipynb, the config shows:

Configurations:
BACKBONE                       resnet50
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.9
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                31
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           custom
NUM_CLASSES                    19
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                100
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
  6. My versions of MaskRCNN/sample/coco/inspect_weight.ipynb and MaskRCNN/sample/coco/inspect_data.ipynb are here: https://github.com/ydzat/Workspace/tree/master/AI/MaskRCNN/sample/coco
    My conda env is for_maskrcnn2.yaml.

My guess: could it be that the annotations generated by the scannet2coco.py you provide cannot be used in the MaskRCNN project?
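To help narrow this down, here is a small check I can run that loads the generated JSON directly with pycocotools, independently of the Mask R-CNN code (the annotations path is just how it looks on my side):

from pycocotools.coco import COCO

# Load the JSON written by scannet2coco.py and print some basic statistics.
coco = COCO("annotations/scannet_train.coco.json")
print("images:", len(coco.imgs))
print("annotations:", len(coco.anns))
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])

# Inspect one image and its annotations to confirm boxes/masks are present.
img_id = next(iter(coco.imgs))
anns = coco.loadAnns(coco.getAnnIds(imgIds=[img_id]))
print(coco.imgs[img_id]["file_name"], "->", len(anns), "annotations")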

Model zoo request: VIEW only and GEO only on ScanNet

I am really interested in how the features change when using the different losses (VIEW only, GEO only, or both combined). From what I've seen, all provided models were either trained with the same loss or trained on a different dataset. (Or did I overlook something?) Is it possible to make more models available, e.g. VIEW only and GEO only on ScanNet?

I am aware this is stretching your kindness, since it's already great that you provide such an extensive codebase. Still, I at least wanted to ask. :)

About computing the overlap ratio of two frames

Thanks for your great work and the released code! I notice that in your paper the pixel correspondences between two frames are determined as those whose 3D world locations lie within 2 cm of each other. But in the preprocessing script pretrain/data_preprocess/scannet/compute_full_overlapping.py, the match_indices radius is set to voxel_size * 1.5, which by default is 7.5 cm, I think. I am confused by the 2 cm vs. the 7.5 cm. Could you provide some explanation? Thanks.
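For context, this is the kind of nearest-neighbour-within-radius matching I mean (just a sketch with scipy, not your actual compute_full_overlapping.py code; the voxel_size default of 0.05 m is my assumption):

import numpy as np
from scipy.spatial import cKDTree

voxel_size = 0.05            # assumed default
radius = voxel_size * 1.5    # 7.5 cm, as in compute_full_overlapping.py
# radius = 0.02              # 2 cm, as stated in the paper

def match_points(points_a, points_b, radius):
    """Index pairs (i, j) where points_a[i] has a neighbour points_b[j] within radius."""
    tree = cKDTree(points_b)
    dists, idx = tree.query(points_a, distance_upper_bound=radius)
    valid = np.isfinite(dists)  # query returns inf when nothing is within the bound
    return np.stack([np.nonzero(valid)[0], idx[valid]], axis=1)

# Toy example: two slightly perturbed copies of the same 3D points.
pts_a = np.random.rand(1000, 3)
pts_b = pts_a + np.random.normal(scale=0.01, size=pts_a.shape)
print(len(match_points(pts_a, pts_b, radius)), "matches at radius", radius, "m")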
