sekunde / Pri3D
[ICCV'21] Pri3D: Can 3D Priors Help 2D Representation Learning?
Hi,
Thanks for the codebase. It seems the provided model zoo only contains models fine-tuned on the various downstream tasks. I would like to look at the original features from pre-training alone, without any fine-tuning, so that I can perhaps use them for other tasks. Does such a request make sense, or do you think I could use one of the existing model-zoo checkpoints and it would work similarly?
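To make the request concrete, this is a minimal sketch of what I would like to do with such a checkpoint (the file name "pri3d_pretrained.pth" and the "backbone." key prefix are my guesses, not the repo's actual checkpoint format):

import torch
import torchvision

# Hypothetical checkpoint path and key layout; adjust to the real format.
ckpt = torch.load("pri3d_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# Keep only backbone weights, dropping any task-specific heads.
backbone_state = {k[len("backbone."):]: v for k, v in state_dict.items()
                  if k.startswith("backbone.")}

resnet = torchvision.models.resnet50(pretrained=False)
resnet.load_state_dict(backbone_state, strict=False)

# Everything up to global pooling becomes a frozen feature extractor.
extractor = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()
with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 240, 320))  # -> (1, 2048, 8, 10)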
Thanks in advance.
In the Prepare ScanNet Pre-training Data section, TARGET and SCANNET_DIR need to be modified, but what data, in what structure, should be stored in the folders these two parameters point to?
I downloaded ScanNet's scannet_frames_25k data, whose structure is:
<scannet_frames_25k>
|---- <scene0000_00>
|---- <color>
|---- 000000.jpg
|---- 000100.jpg
|---- 000200.jpg
|---- ... # RGB images
|---- <depth>
|---- 000000.png
|---- 000100.png
|---- 000200.png
|---- ... # depth images
|---- <instance>
|---- 000000.png
|---- 000100.png
|---- 000200.png
|---- ... # instance images
|---- <label>
|---- 000000.png
|---- 000100.png
|---- 000200.png
|---- ... # label images
|---- <pose>
|---- 000000.txt
|---- 000100.txt
|---- 000200.txt
|---- ... # camera poses
|---- intrinsics_color.txt
|---- intrinsics_depth.txt
|---- <scene0000_01>
|---- ... # same structure as before
|---- ... # like <scene*>
Should the above data be put into the TARGET or the SCANNET_DIR mentioned before? Or do I have to download the raw ScanNet scenes separately, e.g. scene0000_00, with the following structure:
<scanId>
|-- <scanId>.sens
      RGB-D sensor stream containing color frames, depth frames, camera poses and other data
|-- <scanId>_vh_clean.ply
      High quality reconstructed mesh
|-- <scanId>_vh_clean_2.ply
      Cleaned and decimated mesh for semantic annotations
|-- <scanId>_vh_clean_2.0.010000.segs.json
      Over-segmentation of annotation mesh
|-- <scanId>.aggregation.json, <scanId>_vh_clean.aggregation.json
      Aggregated instance-level semantic annotations on lo-res, hi-res meshes, respectively
|-- <scanId>_vh_clean_2.0.010000.segs.json, <scanId>_vh_clean.segs.json
      Over-segmentation of lo-res, hi-res meshes, respectively (referenced by aggregated semantic annotations)
|-- <scanId>_vh_clean_2.labels.ply
      Visualization of aggregated semantic segmentation; colored by nyu40 labels (see img/legend; ply property 'label' denotes the nyu40 label id)
|-- <scanId>_2d-label.zip
      Raw 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance.zip
      Raw 2d projections of aggregated annotation instances as 8-bit pngs
|-- <scanId>_2d-label-filt.zip
      Filtered 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance-filt.zip
      Filtered 2d projections of aggregated annotation instances as 8-bit pngs
So, what I am asking is: what files should I put into TARGET and SCANNET_DIR?
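In case it helps to clarify what I have locally, this is the quick structure check I run on my scannet_frames_25k copy (purely illustrative; the root path is a placeholder):

import os

root = "/path/to/scannet_frames_25k"
for scene in sorted(os.listdir(root)):
    scene_dir = os.path.join(root, scene)
    if not os.path.isdir(scene_dir):
        continue
    # Each scene should have matching color/depth/pose frame counts.
    n_color = len(os.listdir(os.path.join(scene_dir, "color")))
    n_depth = len(os.listdir(os.path.join(scene_dir, "depth")))
    n_pose = len(os.listdir(os.path.join(scene_dir, "pose")))
    assert n_color == n_depth == n_pose, scene + ": frame counts differ"
    assert os.path.isfile(os.path.join(scene_dir, "intrinsics_color.txt"))
    print(scene, n_color, "frames")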
Hi!
First of all, thanks for sharing the code! Great job!
When preparing the ScanNet data for pre-training, I cannot find the file scannet.py. Do you have any idea what went wrong?
This is where the issue occurs:
Line 74 in 63c728b
cd pretrain/data_preprocess/scannet
python scannet.py --input SCANNET_DATA --output SCANNET_OUT_PATH
Hi, thanks for sharing your great and comprehensive work.
As I understand it, the 3D backbone is also trained during the pre-training process. Did you run any experiment to validate the usefulness of the pre-trained 3D network for downstream 3D tasks?
In the scannet.sh script, the setting "pretrain.depth" is set to "True":
Pri3D/pretrain/pri3d/scripts/scannet.sh
Line 26 in a607a05
Was this also the setting used for all the models in the model zoo? If I understand correctly, this enables the depth loss in addition to the VIEW and GEO losses, is that correct?
Pri3D/pretrain/pri3d/model/model.py
Line 94 in a607a05
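For reference, this is how I currently read the loss composition; the function and variable names below are my own shorthand for the purpose of this question, not the repo's actual code:

import torch

def total_loss(view_loss: torch.Tensor, geo_loss: torch.Tensor,
               depth_loss: torch.Tensor, use_depth: bool) -> torch.Tensor:
    # My reading: VIEW and GEO are always on, and pretrain.depth=True
    # additionally enables the depth reconstruction term.
    loss = view_loss + geo_loss
    if use_depth:
        loss = loss + depth_loss
    return loss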
Instance Segmentation/Detection on ScanNet
I followed the guidelines here and did the following.
cd downstream/insseg/dataset
python scannet2coco.py --scannet_path <path of scannet_frames_25k> --phase train
python scannet2coco.py --scannet_path <path of scannet_frames_25k> --phase val
Then I replaced load_coco() in MaskRCNN/sample/coco/coco.py with the following code:

def load_custom(self, dataset_dir, subset, class_ids=None):
    """Load a subset of the custom dataset.
    dataset_dir: Root directory of the dataset.
    subset: Subset to load: train or val
    """
    f = open("labels_nyu40_18.txt")
    lines = f.readlines()
    for line in lines:
        line = line.split(", ")
        id_num = int(line[0].split(":")[1])
        category = line[1].split(":")[1]
        self.add_class("custom", id_num, category)
    f.close()

    annotations_dir = os.path.join(dataset_dir, "annotations")
    coco = COCO("{}/scannet_{}.coco.json".format(annotations_dir, subset))

    # Train or validation dataset?
    # assert subset in ["train", "val"]
    # if subset == "train" or subset == "val":
    #     subset = "train"
    dataset_dir = os.path.join(dataset_dir, subset)

    if not class_ids:
        # All classes
        class_ids = sorted(coco.getCatIds())

    # All images or a subset?
    if class_ids:
        image_ids = []
        for id in class_ids:
            image_ids.extend(list(coco.getImgIds(catIds=[id])))
        # Remove duplicates
        image_ids = list(set(image_ids))
    else:
        # All images
        image_ids = list(coco.imgs.keys())

    # Add images
    for i in image_ids:
        self.add_image(
            "custom", image_id=i,
            path=os.path.join(dataset_dir, coco.imgs[i]['file_name']),
            width=coco.imgs[i]["width"],
            height=coco.imgs[i]["height"],
            annotations=coco.loadAnns(coco.getAnnIds(
                imgIds=[i], catIds=class_ids, iscrowd=None)))
The contents of labels_nyu40_18.txt are as follows:
id:1, category:cabinet
id:2, category:bed
id:3, category:chair
id:4, category:sofa
id:5, category:table
id:6, category:door
id:7, category:window
id:8, category:bookshelf
id:9, category:picture
id:10, category:counter
id:11, category:desk
id:12, category:curtain
id:13, category:refridgerator
id:14, category:shower curtain
id:15, category:toilet
id:16, category:sink
id:17, category:bathtub
id:18, category:otherfurniture
class CustomConfig(Config):
    NAME = "custom"
    BACKBONE = "resnet50"
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 18
    STEPS_PER_EPOCH = 100
    DETECTION_MIN_CONFIDENCE = 0.9
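For completeness, this is roughly how I wire the config and dataset together, following the usual Matterport Mask R-CNN training pattern (CustomDataset stands for the Dataset subclass that holds my load_custom above, and the paths are placeholders):

from mrcnn import model as modellib

config = CustomConfig()

dataset_train = CustomDataset()
dataset_train.load_custom("/path/to/dataset", "train")
dataset_train.prepare()

dataset_val = CustomDataset()
dataset_val.load_custom("/path/to/dataset", "val")
dataset_val.prepare()

model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30, layers="heads")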
After running MaskRCNN/sample/demo.ipynb, the config shows:
Configurations:
BACKBONE resnet50
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.9
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 31
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME custom
NUM_CLASSES 19
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 100
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
I also looked at MaskRCNN/sample/coco/inspect_weight.ipynb and MaskRCNN/sample/coco/inspect_data.ipynb: https://github.com/ydzat/Workspace/tree/master/AI/MaskRCNN/sample/cocofor_maskrcnn2.yaml
My guess: could it be that the annotations generated by the scannet2coco.py you provide cannot be used in the Mask R-CNN project?
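For what it's worth, the first thing I will check is whether the generated JSON at least loads as valid COCO with the standard pycocotools API (the file name follows the naming in my load_custom above):

from pycocotools.coco import COCO

coco = COCO("annotations/scannet_train.coco.json")
print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])
print("images:", len(coco.imgs), "annotations:", len(coco.anns))

# Inspect one annotation to see which fields Mask R-CNN will find.
ann = next(iter(coco.anns.values()))
print({k: type(v).__name__ for k, v in ann.items()})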
I am really interested in how the features change under the different losses (VIEW only, GEO only, or combined). From what I've seen, all provided models were trained with the same loss, or else on a different dataset. (Or did I overlook something?) Would it be possible to make more models available, e.g. VIEW only and GEO only on ScanNet?
I am aware this is pushing your kindness, since it's already great that you provide such an extensive codebase. Still, I at least wanted to ask. :)
Thanks for your great work and the released code! I notice that in your paper the pixel correspondences between two frames are determined as those whose 3D world locations lie within 2cm of each other. But in the preprocessing script pretrain/data_preprocess/scannet/compute_full_overlapping.py, the match_indices radius is set to voxel_size * 1.5, which by default is 7.5cm, I think. I am confused by the 2cm vs. 7.5cm. Could you provide some explanation? Thanks.
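To make my reading concrete, here is a small sketch of the radius-based matching as I understand it, using scipy's KD-tree; the variable names and pairing logic are my assumptions, not the script's actual implementation:

import numpy as np
from scipy.spatial import cKDTree

voxel_size = 0.05                    # 5cm default -> 7.5cm search radius
points_a = np.random.rand(1000, 3)   # stand-ins for two unprojected frames
points_b = np.random.rand(1000, 3)   # in world coordinates

tree = cKDTree(points_b)
dist, idx = tree.query(points_a, k=1, distance_upper_bound=1.5 * voxel_size)
matches = np.flatnonzero(np.isfinite(dist))  # points in A matched within radius
print(len(matches), "correspondences within", 1.5 * voxel_size, "m")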