Learning Articulated Shape with Keypoint Pseudo-labels from Web Images

Code repository for the paper:
Learning Articulated Shape with Keypoint Pseudo-labels from Web Images
Anastasis Stathopoulos, Georgios Pavlakos, Ligong Han, Dimitris Metaxas
CVPR 2023
[paper] [project page]

Installation

We recommend installing the dependencies in a clean conda environment by running:

conda env create -f environment.yml

After the installation is complete, activate the conda environment by running:

conda activate animals3d

Next, install the Neural Mesh Renderer. Before installing, run:

cd external
export CUDA_HOME=/usr/local/cuda

Check the CUDA version that PyTorch was built with by running python -c 'import torch; print(torch.version.cuda)'. Make sure CUDA_HOME points to the matching CUDA installation before building the extension. Then run:

python setup.py install
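
If the build fails, a mismatch between the PyTorch CUDA version and the toolkit under CUDA_HOME is a common cause. The following is a minimal sketch for comparing the two; it is not part of this repository. The helper names are hypothetical, and requiring an exact major.minor match is a conservative assumption. It expects the text printed by nvcc --version, which contains a line such as "Cuda compilation tools, release 11.3, V11.3.109".

```python
import re


def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract the 'major.minor' toolkit release from `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in the nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def cuda_versions_match(torch_cuda: str, nvcc_output: str) -> bool:
    """Compare the major.minor part of torch.version.cuda with the toolkit release.

    Assumption: an exact major.minor match is required. This is stricter
    than what the extension build may actually need.
    """
    torch_major_minor = ".".join(torch_cuda.split(".")[:2])
    return torch_major_minor == parse_nvcc_release(nvcc_output)


# Example with a hard-coded nvcc banner. In practice, pass torch.version.cuda
# and the output of "$CUDA_HOME/bin/nvcc --version".
sample_banner = "Cuda compilation tools, release 11.3, V11.3.109"
print(cuda_versions_match("11.3", sample_banner))
```

If the check returns False, point CUDA_HOME at a different CUDA installation and rerun it before building.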

The next step is to install acsm as a Python package. Since you might want to modify the code, we recommend installing it in development mode:

cd ../acsm
python setup.py develop

Data Preparation

Now that the installation is complete, please follow the data preparation instructions to prepare the data.

Keypoint PLs on Web Images

We describe our pipeline for training with web images. In case you do not want to use unlabeled images from the web, you can skip this section.

1) Download web images

We download web images from Flickr using metadata from the YFCC100m dataset. Instead of downloading all metadata from YFCC100m, we used the search bar in this website, which enables downloading the metadata for a desired category in a single json file. We provide the downloaded metadata for the 5 animal categories used in our paper in categ_ids.zip. Download the file and extract it in prepare_data.

You can download web images for the desired category with the following command:

python prepare_data/download_images.py --category <category>

The downloaded images are stored in data/yfcc100m/images/<category>.

2) Bounding box detections

Create bounding box detections by running:

python prepare_data/save_boxes.py --category <category> \
--opts MODEL.WEIGHTS detectron2://COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x/139173657/model_final_68b088.pkl

The above command stores the detections in the data/yfcc100m/labels_0/<category>_bbox.json file.

3) PLs with 2D pose estimator

Download the weights for models trained with 150 labeled images from checkpoints.zip and extract the archive in pose/results/checkpoints. You can generate keypoint pseudo-labels by running the following command from the pose directory:

python generate_pl.py --category <category>

Running the above command will create keypoint pseudo-labels with the provided weights. We also provide training and evaluation code for the pose estimation network. Details can be seen here.

4) Data Selection

We provide the code for the four data selection criteria used in the paper. To create a subset of web images and generated PLs for training, run the following command.

python prepare_data/data_selection.py --category <category> --filter <selection_criterion> --selection_num <number of images>

Adding the argument --visualize to the above command will additionally create some visualizations. Next, we offer more details on the four data selection criteria:

KP-Conf: This criterion uses the detection confidence from the PLs generated previously.
CF-MT: This criterion selects samples based on multi-transform consistency. Before running the data selection script, run the following command from the pose directory to generate PLs for multiple input image transformations.

python generate_pl_mt.py --category <category>

CF-CM: This criterion requires predictions from an auxiliary pose estimator. You can train another pose estimator as described here. Then, generate keypoint PLs with the following command (run from the pose directory):

python generate_pl.py --category <category> --name <name> --is_aux

CF-CM²: This criterion requires keypoint predictions from a 3D shape predictor. You can generate keypoint PLs using ACSM with the following command (run from the acsm directory):

python scripts/generate_pl.py --scale_mesh --category <category> --name <name> --iter_num <iter_num>
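
The four criteria above share a common pattern: score each image, then keep the best-scoring ones. The following is a minimal sketch of that pattern, not the code in prepare_data/data_selection.py. The function names, the dictionary layout, and the choice of mean confidence and mean Euclidean distance as scores are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Keypoints = List[Tuple[float, float]]


def mean_confidence(confidences: List[float]) -> float:
    """KP-Conf style score (assumed): average per-keypoint confidence."""
    return sum(confidences) / len(confidences)


def mean_keypoint_distance(kps_a: Keypoints, kps_b: Keypoints) -> float:
    """Consistency style score (assumed): average distance between two
    keypoint predictions for the same image, e.g. from the main and an
    auxiliary predictor. Lower means the predictions agree more."""
    return sum(math.dist(a, b) for a, b in zip(kps_a, kps_b)) / len(kps_a)


def select_top(scores: Dict[str, float], num: int, highest: bool = True) -> List[str]:
    """Return the `num` image ids with the highest (or lowest) score."""
    ranked = sorted(scores, key=scores.get, reverse=highest)
    return ranked[:num]


# Toy data (hypothetical image ids and values).
confidences = {"a.jpg": [0.9, 0.8], "b.jpg": [0.4, 0.5], "c.jpg": [0.7, 0.95]}
conf_scores = {img: mean_confidence(c) for img, c in confidences.items()}
print(select_top(conf_scores, num=2))  # keep the most confident images

main_pl = {"a.jpg": [(10.0, 10.0), (20.0, 20.0)], "b.jpg": [(5.0, 5.0), (8.0, 8.0)]}
aux_pl = {"a.jpg": [(10.0, 12.0), (20.0, 20.0)], "b.jpg": [(15.0, 5.0), (8.0, 18.0)]}
disagreement = {img: mean_keypoint_distance(main_pl[img], aux_pl[img]) for img in main_pl}
print(select_top(disagreement, num=1, highest=False))  # keep the most consistent image
```

Confidence-based selection keeps the highest scores, while consistency-based selection keeps the lowest disagreement, which is why select_top takes a direction flag.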

Training and Evaluation with ACSM

Run Evaluation Code

You can evaluate the ACSM models by running the following command from the acsm directory.

python scripts/evaluate.py --category sheep --dataset pascal --scale_mesh --kp_anno --sfm_anno

Running the above command will compute the AUC and the camera error for 3D reconstructions of sheep in Pascal using the provided model weights. The provided models are trained with 150 labeled images and keypoint pseudo-labels from web images. You can change the arguments in --category and --dataset for evaluation with different categories and datasets respectively.
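
For readers unfamiliar with keypoint metrics, here is a minimal sketch of one common way to compute an AUC for keypoint predictions: the area under the PCK (percentage of correct keypoints) curve over a range of thresholds. This assumes the AUC reported by scripts/evaluate.py is of this kind; the text above does not define it. The function names, the bounding-box normalization, and the threshold values are illustrative assumptions, not the repository's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
        bbox_size: float, alpha: float) -> float:
    """Fraction of visible keypoints predicted within alpha * bbox_size of
    the ground truth."""
    hits = 0
    total = 0
    for p, g, v in zip(pred, gt, visible):
        if not v:
            continue  # skip keypoints without annotation
        total += 1
        if math.dist(p, g) <= alpha * bbox_size:
            hits += 1
    return hits / total if total else 0.0


def auc(pck_values: Sequence[float], thresholds: Sequence[float]) -> float:
    """Trapezoidal area under the PCK curve, normalized by the threshold range."""
    area = 0.0
    for i in range(1, len(thresholds)):
        width = thresholds[i] - thresholds[i - 1]
        area += 0.5 * width * (pck_values[i] + pck_values[i - 1])
    return area / (thresholds[-1] - thresholds[0])


# Toy example (hypothetical values): three keypoints, the last one unannotated.
pred = [(0.0, 0.0), (8.0, 0.0), (0.0, 50.0)]
gt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
visible = [True, True, False]
thresholds = [0.05, 0.1]
curve: List[float] = [pck(pred, gt, visible, bbox_size=100.0, alpha=a) for a in thresholds]
print(curve, auc(curve, thresholds))
```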

Run Visualization Code

You can visualize the predicted shapes from ACSM by running the following command from the acsm directory.

python scripts/visualize.py --category horse --dataset pascal --scale_mesh --vis_num 20

Running the above command will generate visualizations for 20 random images of horses from the Pascal dataset. You can change the arguments in --category and --dataset for generating visual results with different categories and datasets respectively.

Run Training Code

You can train ACSM yourself with the acsm/scripts/train.py script. The training code is adapted from the official repo and uses visdom for visualizations during training. Before running the training script, start the visdom server from the acsm directory with the following command:

python -m visdom.server -port <port to forward results>

Train your model with a command as shown in the following example:

python scripts/train.py --name horse_150 --category horse \
--kp_anno --scale_mesh --flip_train True --plot_scalars --display_visuals \
--use_pascal --use_coco

Running the above command will train a model for 3D shape prediction of horses with 150 labeled images and the default settings.

To utilize keypoint PLs from web images, also include the following arguments in the previous command.

--use_web_images --web_images_num <number of images with PL> --filter <selection criterion name>

Note that you need to have downloaded the web images and created keypoint PLs before including the above arguments for training your model.
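
When launching several runs, it can help to assemble the training command programmatically. The sketch below is a convenience wrapper, not part of the repository. It only uses the flags shown in the commands above. The function name build_train_command, the run name horse_150_web, and the image count 1000 are hypothetical, and the selection criterion string is left as a placeholder because the accepted values are defined by the training script.

```python
import shlex
from typing import List, Optional


def build_train_command(name: str, category: str,
                        web_images_num: Optional[int] = None,
                        selection_filter: Optional[str] = None) -> List[str]:
    """Build the argument list for scripts/train.py (run from the acsm directory)."""
    cmd = [
        "python", "scripts/train.py", "--name", name, "--category", category,
        "--kp_anno", "--scale_mesh", "--flip_train", "True",
        "--plot_scalars", "--display_visuals", "--use_pascal", "--use_coco",
    ]
    if web_images_num is not None or selection_filter is not None:
        # The web-image arguments only make sense together.
        if web_images_num is None or selection_filter is None:
            raise ValueError("web_images_num and selection_filter must both be set")
        cmd += ["--use_web_images", "--web_images_num", str(web_images_num),
                "--filter", selection_filter]
    return cmd


# Baseline run with labeled images only.
baseline = build_train_command("horse_150", "horse")
print(shlex.join(baseline))

# Run that also uses web images with keypoint PLs (values are hypothetical).
with_web = build_train_command("horse_150_web", "horse",
                               web_images_num=1000,
                               selection_filter="<selection criterion name>")
print(shlex.join(with_web))
```

The returned list can be passed to subprocess.run(cmd, check=True) with the working directory set to acsm.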

Acknowledgements

Parts of the code are borrowed or adapted from the following repos:

Citation

If you find this code useful for your research or use data generated by our method, please consider citing the following paper:

@Inproceedings{stathopoulos2023learning,
  Title  = {Learning Articulated Shape with Keypoint Pseudo-labels from Web Images},
  Author = {Stathopoulos, Anastasis and
  Pavlakos, Georgios and
  Han, Ligong and
  Metaxas, Dimitris},
  Booktitle = {CVPR},
  Year = {2023}
}
