DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video (ECCV 2024)

Usage

  1. Setup
  2. Preprocessing
  3. Training
  4. Inference

Setup

Clone the repository:

git clone https://github.com/AssafSinger94/dino-tracker.git

Switch to the project directory:

cd dino-tracker

To set up the environment, run:

conda create -n dino-tracker python=3.9
conda activate dino-tracker
pip install -r requirements.txt

Add current path to PYTHONPATH:

export PYTHONPATH=`pwd`:$PYTHONPATH

Preprocessing

Given an input video, we start by extracting optical flow and DINO best-buddy correspondences. The input video directory should have the following structure:

├──<VIDEO_DIR>
    ├──video/
        ├──00000.png
        ├──00001.png
        ├──...
    ├──masks/ # optional
        ├──00000.png
        ├──00001.png
        ├──...

Here, masks/ contains the per-frame foreground masks. If masks/ is not provided, foreground masks are computed automatically from DINO feature saliency maps.
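Before running preprocessing, it can help to confirm that the input directory matches this layout. The following is a minimal sketch, not part of the repository: check_video_dir is a hypothetical helper, and it assumes each mask shares its frame's filename, as the listing suggests.

```python
from pathlib import Path


def check_video_dir(video_dir):
    """Return a list of problems found in a <VIDEO_DIR> layout (empty if none)."""
    root = Path(video_dir)
    problems = []

    frames_dir = root / "video"
    frames = sorted(p.name for p in frames_dir.glob("*.png")) if frames_dir.is_dir() else []
    if not frames:
        problems.append("video/ is missing or contains no .png frames")

    masks_dir = root / "masks"  # optional directory
    if masks_dir.is_dir():
        masks = {p.name for p in masks_dir.glob("*.png")}
        # Assumption: a mask has the same filename as the frame it belongs to.
        missing = sorted(set(frames) - masks)
        if missing:
            problems.append(
                f"masks/ has no mask for {len(missing)} frame(s), e.g. {missing[0]}"
            )

    return problems
```

An empty return value means the frames (and masks, if present) look consistent; otherwise each entry describes one problem to fix before preprocessing.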

If the video is in mp4 format, convert it to frames by running:

python ./preprocessing/mp4_to_frames.py \
    --video-path <PATH_TO_MP4> \
    --output-folder <VIDEO_DIR_PATH>/video

Then run the preprocessing pipeline:

python ./preprocessing/main_preprocessing.py \
    --config ./config/preprocessing.yaml \
    --data-path <VIDEO_DIR_PATH>

The script outputs chained optical flow trajectories, DINO embeddings and DINO best-buddies in the following structure:

├──<VIDEO_DIR>
    ├──video/
    ├──masks/
    ├──dino_best_buddies/
    ├──dino_embeddings/
    ├──of_trajectories/
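Since training depends on these outputs, a quick check that preprocessing completed may save a failed run. This is a hedged sketch: missing_preprocessing_outputs is a hypothetical helper, the directory names come from the listing above, and treating an empty directory as incomplete is an assumption.

```python
from pathlib import Path

# Directory names taken from the output listing above.
EXPECTED_OUTPUTS = ("dino_best_buddies", "dino_embeddings", "of_trajectories")


def missing_preprocessing_outputs(video_dir):
    """Return the expected output directories that are absent or empty."""
    root = Path(video_dir)
    missing = []
    for name in EXPECTED_OUTPUTS:
        out_dir = root / name
        # Assumption: an empty directory means the step did not finish.
        if not out_dir.is_dir() or not any(out_dir.iterdir()):
            missing.append(name)
    return missing
```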

Training

Once preprocessing is finished, run the following command to train DINO-Tracker:

python ./train.py \
    --config ./config/train.yaml \
    --data-path <VIDEO_DIR_PATH>

The checkpoints are saved under:

├──<VIDEO_DIR>
    ├──models
        ├──dino_tracker
            ├──delta_dino_<ITER>.pt
            ├──tracker_head_<ITER>.pt
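To find the most recent checkpoint pair in this layout, one option is to parse the iteration number from the filenames. The sketch below is illustrative only: latest_checkpoint_iter is a hypothetical helper, and it assumes <ITER> is a non-negative integer.

```python
import re
from pathlib import Path

# Assumption: <ITER> in the filenames is a non-negative integer.
CKPT_PATTERN = re.compile(r"^(delta_dino|tracker_head)_(\d+)\.pt$")


def latest_checkpoint_iter(video_dir):
    """Return the highest iteration that has both checkpoint files, or None."""
    ckpt_dir = Path(video_dir) / "models" / "dino_tracker"
    iters = {"delta_dino": set(), "tracker_head": set()}
    if ckpt_dir.is_dir():
        for path in ckpt_dir.iterdir():
            match = CKPT_PATTERN.match(path.name)
            if match:
                iters[match.group(1)].add(int(match.group(2)))
    # Keep only iterations for which both files were saved.
    complete = iters["delta_dino"] & iters["tracker_head"]
    return max(complete) if complete else None
```

Requiring both files guards against picking an iteration whose save was interrupted partway.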

Inference

Trajectory creation and visualization

To predict and visualize trajectories with a trained DINO-Tracker, run the following scripts sequentially:

python ./inference_grid.py \
    --config ./config/train.yaml \
    --data-path <VIDEO_DIR_PATH> \
    --use-segm-mask # optional, used for sampling only foreground points
python visualization/visualize_rainbow.py \
    --data-path <VIDEO_DIR_PATH> \
    --plot-trails # optional, used for visualizing motion trails.

The first script computes trajectories for a grid of query points in the first frame, and the second script visualizes them. The --plot-trails option draws motion trails and requires a segmentation mask for the first frame. Without --plot-trails, the script draws only the tracked positions as circles. Visualizations are written to the <VIDEO_DIR_PATH>/visualizations directory.

TAP-Vid evaluation

To evaluate on TAP-Vid-DAVIS, follow the steps below. The same steps apply to the TAP-Vid Kinetics and BADJA datasets.

  1. Download the benchmark data file tapvid_davis_data_strided.pkl from this link and place it at ./tapvid/tapvid_davis_data_strided.pkl.

  2. Download the pre-trained weights and videos (davis_480.zip) from this link and unzip the folder to ./dataset/tapvid-davis/.

  3. Extract DINO embeddings for all videos by running the following:

python ./preprocessing/save_dino_embed_video.py \
    --config ./config/preprocessing.yaml \
    --data-path ./dataset/tapvid-davis/<VIDEO_ID>

Run this command for every video in the benchmark, e.g. <VIDEO_ID> in {0, 1, ..., 29} for DAVIS.

  4. Predict trajectories on benchmark query points by running the following for all benchmark videos:

python inference_benchmark.py \
    --config ./config/train.yaml \
    --data-path ./dataset/tapvid-davis/<VIDEO_ID> \
    --benchmark-pickle-path ./tapvid/tapvid_davis_data_strided.pkl \
    --video-id <VIDEO_ID>

  5. Evaluate the model accuracy by running the following:

python ./eval/eval_benchmark.py \
    --dataset-root-dir ./dataset/tapvid-davis \
    --benchmark-pickle-path ./tapvid/tapvid_davis_data_strided.pkl \
    --out-file ./tapvid/comp_metrics_davis.csv \
    --dataset-type tapvid # tapvid | BADJA

The evaluation should output: average_pts_within_thresh: 0.8066 | occlusion_acc: 0.8854 | average_jaccard: 0.6528.
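To compare a run against these reference numbers, the summary line can be parsed into a dictionary. This is a rough sketch under stated assumptions: parse_summary and matches_reference are hypothetical helpers, the printed line is assumed to have exactly the "name: value | name: value" form quoted above, and the tolerance is an arbitrary illustrative choice.

```python
# Reference line quoted from the README text above.
REFERENCE_LINE = (
    "average_pts_within_thresh: 0.8066 | occlusion_acc: 0.8854 | average_jaccard: 0.6528"
)


def parse_summary(line):
    """Parse 'name: value | name: value' into a dict of floats."""
    metrics = {}
    for field in line.split("|"):
        name, _, value = field.partition(":")
        metrics[name.strip()] = float(value)
    return metrics


def matches_reference(line, tol=1e-3):
    """Check each reference metric against the parsed line, within tol (assumed)."""
    reference = parse_summary(REFERENCE_LINE)
    observed = parse_summary(line)
    return all(
        name in observed and abs(observed[name] - value) <= tol
        for name, value in reference.items()
    )
```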

The output CSV file contains all TAP-Vid metrics (position accuracy, occlusion accuracy, Average Jaccard) for all videos.
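Steps 3 and 4 above have to be repeated for every benchmark video, which is easy to script. The following is a minimal sketch rather than a script shipped with the repository: benchmark_commands and run_benchmark are hypothetical names, the flags and paths are copied from the commands above, and the default of 30 videos follows the DAVIS example.

```python
import subprocess


def benchmark_commands(video_id,
                       dataset_root="./dataset/tapvid-davis",
                       pickle_path="./tapvid/tapvid_davis_data_strided.pkl"):
    """Return the embedding-extraction and inference commands for one video."""
    data_path = f"{dataset_root}/{video_id}"
    embed = ["python", "./preprocessing/save_dino_embed_video.py",
             "--config", "./config/preprocessing.yaml",
             "--data-path", data_path]
    infer = ["python", "inference_benchmark.py",
             "--config", "./config/train.yaml",
             "--data-path", data_path,
             "--benchmark-pickle-path", pickle_path,
             "--video-id", str(video_id)]
    return [embed, infer]


def run_benchmark(video_ids=range(30), dry_run=True):
    """Print (dry_run=True) or execute the per-video commands in order."""
    for video_id in video_ids:
        for cmd in benchmark_commands(video_id):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling run_benchmark() prints the commands for review; run_benchmark(dry_run=False) would execute them and should be started from the repository root with the dino-tracker environment active.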

Citation

@misc{dino_tracker_2024,
    author        = {Tumanyan, Narek and Singer, Assaf and Bagon, Shai and Dekel, Tali},
    title         = {DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video},
    month         = {March},
    year          = {2024},
    eprint        = {2403.14548},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}
