Code repository for the paper:
Learning Articulated Shape with Keypoint Pseudo-labels from Web Images
Anastasis Stathopoulos, Georgios Pavlakos, Ligong Han, Dimitris Metaxas
CVPR 2023
[paper] [project page]
We recommend installing the dependencies in a clean conda environment by running:
conda env create -f environment.yml
After the installation is complete, activate the conda environment by running:
conda activate animals3d
Then, you should install the Neural Mesh Renderer. Before installing run:
cd external
export CUDA_HOME=/usr/local/cuda
Check the CUDA version of PyTorch by running python -c 'import torch; print(torch.version.cuda)'. Make sure CUDA_HOME points to the matching CUDA installation, then build the extension by running:
python setup.py install
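The CUDA check described above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names `nvcc_release`, `cuda_versions_match`, and `check_environment` are hypothetical, and it assumes that `nvcc` lives at `$CUDA_HOME/bin/nvcc` and prints a line of the form `release X.Y`. An exact version match is the conservative test; builds may still succeed with nearby minor versions.

```python
import os
import re
import subprocess
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release string from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: Optional[str], nvcc_output: str) -> bool:
    """Return True when PyTorch's CUDA version equals the toolkit release."""
    release = nvcc_release(nvcc_output)
    return torch_cuda is not None and release is not None and torch_cuda == release


def check_environment() -> bool:
    """Compare torch.version.cuda with the toolkit found under CUDA_HOME."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # Assumption: fall back to the path used in the export command above.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    output = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=True
    ).stdout
    print("PyTorch CUDA:", torch.version.cuda)
    print("Toolkit release:", nvcc_release(output))
    return cuda_versions_match(torch.version.cuda, output)
```

Calling `check_environment()` inside the activated conda environment prints both versions and returns whether they agree. If it returns False, adjust CUDA_HOME before running the install command.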
The next step is to install acsm as a Python package. Since you might want to modify the code, we recommend running the following:
cd ../acsm
python setup.py develop
Now that the installation is complete, please follow the data preparation instructions to prepare the data.
We describe our pipeline for training with web images. In case you do not want to use unlabeled images from the web, you can skip this section.
We download web images from Flickr using metadata from the YFCC100m dataset. Instead of downloading all metadata from YFCC100m, we used the search bar on this website, which enables downloading the metadata for a desired category in a single JSON file. We provide the downloaded metadata for the 5 animal categories used in our paper in categ_ids.zip. Download the file and extract it in prepare_data.
You can download web images for the desired category with the following command:
python prepare_data/download_images.py --category <category>
The downloaded images are stored in data/yfcc100m/images/<category>.
Next, create bounding box detections by running:
python prepare_data/save_boxes.py --category <category> \
--opts MODEL.WEIGHTS detectron2://COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x/139173657/model_final_68b088.pkl
The above command stores the detections in the data/yfcc100m/labels_0/<category>_bbox.json file.
Download the weights for models trained with 150 labeled images from checkpoints.zip and extract the archive in pose/results/checkpoints. You can generate keypoint pseudo-labels by running the following command from the pose directory:
python generate_pl.py --category <category>
Running the above command will create keypoint pseudo-labels with the provided weights. We also provide training and evaluation code for the pose estimation network. Details can be seen here.
We provide the code for the four data selection criteria used in the paper. To create a subset of web images and generated PLs for training, run the following command.
python prepare_data/data_selection.py --category <category> --filter <selection_criterion> --selection_num <number of images>
Adding the argument --visualize to the above command will additionally create some visualizations. Next, we offer more details on the four data selection criteria:
- KP-Conf: This criterion uses the detection confidence from the PLs generated previously.
- CF-MT: This criterion selects samples based on multi-transform consistency. Before running the data selection script, run the following command from the pose directory to generate PLs for multiple input image transformations.
python generate_pl_mt.py --category <category>
- CF-CM: This criterion requires predictions from an auxiliary pose estimator. You can train another pose estimator as described here. Then, generate keypoint PLs with the following command (run from the pose directory):
python generate_pl.py --category <category> --name <name> --is_aux
- CF-CM2: This criterion requires keypoint predictions from a 3D shape predictor. You can generate keypoint PLs using ACSM with the following command (run from the acsm directory):
python scripts/generate_pl.py --scale_mesh --category <category> --name <name> --iter_num <iter_num>
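The selection criteria above fall into two families: ranking by detection confidence (KP-Conf) and ranking by agreement between two sets of predictions (the CF-* criteria). The sketch below illustrates both ideas in plain Python. It is not the code in prepare_data/data_selection.py: the function names, the dictionary layout (image name mapped to per-keypoint values), and the use of mean Euclidean distance as the disagreement measure are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float]


def select_by_confidence(
    confidences: Dict[str, Sequence[float]], selection_num: int
) -> List[str]:
    """Confidence-based selection: keep images with the highest mean keypoint confidence."""
    mean_conf = {img: sum(c) / len(c) for img, c in confidences.items()}
    ranked = sorted(mean_conf, key=mean_conf.get, reverse=True)
    return ranked[:selection_num]


def mean_keypoint_distance(a: Sequence[Keypoint], b: Sequence[Keypoint]) -> float:
    """Mean Euclidean distance between two keypoint sets of equal length."""
    total = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b))
    return total / len(a)


def select_by_consistency(
    preds_a: Dict[str, Sequence[Keypoint]],
    preds_b: Dict[str, Sequence[Keypoint]],
    selection_num: int,
) -> List[str]:
    """Consistency-based selection: keep images where two predictions agree most.

    preds_a and preds_b could come from two image transformations, two pose
    estimators, or a pose estimator and a 3D shape predictor.
    """
    disagreement = {
        img: mean_keypoint_distance(preds_a[img], preds_b[img])
        for img in preds_a
        if img in preds_b  # only images predicted by both sources are comparable
    }
    ranked = sorted(disagreement, key=disagreement.get)
    return ranked[:selection_num]
```

In both functions `selection_num` plays the role of the --selection_num argument: it caps how many images survive the filter.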
You can evaluate the ACSM models by running the following command from the acsm directory.
python scripts/evaluate.py --category sheep --dataset pascal --scale_mesh --kp_anno --sfm_anno
Running the above command will compute the AUC and the camera error for 3D reconstructions of sheep in Pascal using the provided model weights. The provided models are trained with 150 labeled images and keypoint pseudo-labels from web images. You can change the arguments in --category and --dataset for evaluation with different categories and datasets respectively.
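For readers unfamiliar with the AUC metric, here is a rough sketch of how such a number can be computed. It assumes that AUC refers to the area under a PCK (percentage of correct keypoints) curve over a range of distance thresholds, with distances normalized by a bounding-box size. The function names, the normalization, and the threshold range are illustrative assumptions; the exact protocol lives in scripts/evaluate.py.

```python
import math
from typing import Sequence, Tuple

Keypoint = Tuple[float, float]


def pck(
    pred: Sequence[Keypoint], gt: Sequence[Keypoint], bbox_size: float, alpha: float
) -> float:
    """Fraction of keypoints within alpha * bbox_size of the ground truth."""
    hits = sum(
        math.hypot(p[0] - g[0], p[1] - g[1]) <= alpha * bbox_size
        for p, g in zip(pred, gt)
    )
    return hits / len(gt)


def pck_auc(
    pred: Sequence[Keypoint],
    gt: Sequence[Keypoint],
    bbox_size: float,
    thresholds: Sequence[float],
) -> float:
    """Trapezoidal area under the PCK curve, normalized by the threshold range."""
    values = [pck(pred, gt, bbox_size, a) for a in thresholds]
    area = sum(
        0.5 * (values[i] + values[i + 1]) * (thresholds[i + 1] - thresholds[i])
        for i in range(len(thresholds) - 1)
    )
    return area / (thresholds[-1] - thresholds[0])
```

Normalizing by the threshold range keeps the result in [0, 1], so a model whose keypoints are all correct at every threshold scores 1.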
You can visualize the predicted shapes from ACSM by running the following command from the acsm directory.
python scripts/visualize.py --category horse --dataset pascal --scale_mesh --vis_num 20
Running the above command will generate visualizations for 20 random images with horses from the Pascal dataset. You can change the arguments in --category and --dataset for generating visual results with different categories and datasets respectively.
You can train ACSM yourself with the acsm/scripts/train.py script. The training code is adapted from the official repo and uses visdom for visualizations during training. Before running the training script, start the visdom server from the acsm directory with the following command:
python -m visdom.server -port <port to forward results>
Train your model with a command as shown in the following example:
python scripts/train.py --name horse_150 --category horse \
--kp_anno --scale_mesh --flip_train True --plot_scalars --display_visuals \
--use_pascal --use_coco
Running the above command will train a model for 3D shape prediction of horses with 150 labeled images and the default settings.
To utilize keypoint PLs from web images, also include the following arguments in the previous command.
--use_web_images --web_images_num <number of images with PL> --filter <selection criterion name>
Note that you need to have downloaded the web images and created keypoint PLs before including the above arguments for training your model.
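When launching several training runs, it can be convenient to assemble the command programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository. It only uses the flags shown in this README and adds the web-image arguments when both a number of images and a selection criterion are given.

```python
from typing import List, Optional


def build_train_command(
    name: str,
    category: str,
    web_images_num: Optional[int] = None,
    selection_filter: Optional[str] = None,
) -> List[str]:
    """Assemble the scripts/train.py argument list shown in this README."""
    cmd = [
        "python", "scripts/train.py",
        "--name", name, "--category", category,
        "--kp_anno", "--scale_mesh", "--flip_train", "True",
        "--plot_scalars", "--display_visuals",
        "--use_pascal", "--use_coco",
    ]
    if web_images_num is not None or selection_filter is not None:
        # The web-image flags only make sense together.
        if web_images_num is None or selection_filter is None:
            raise ValueError(
                "web_images_num and selection_filter must be given together"
            )
        cmd += [
            "--use_web_images",
            "--web_images_num", str(web_images_num),
            "--filter", selection_filter,
        ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, cwd="acsm", check=True)`, since the training script is run from the acsm directory. The value for `selection_filter` must be one of the criterion names accepted by the training script.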
Parts of the code are borrowed or adapted from the following repos:
If you find this code useful for your research or use the data generated by our method, please consider citing the following paper:
@Inproceedings{stathopoulos2023learning,
Title = {Learning Articulated Shape with Keypoint Pseudo-labels from Web Images},
Author = {Stathopoulos, Anastasis and
Pavlakos, Georgios and
Han, Ligong and
Metaxas, Dimitris},
Booktitle = {CVPR},
Year = {2023}
}