
aic2022-ver's Introduction

AIC2022-Video-Event-Retrieval

This repo contains the code and data for our project, which was accepted at CVPRW 2022. The project is a new approach to the natural language-based vehicle retrieval task (paper).

For reproducibility, we also provide a Colab notebook that contains the code for reproducing the results.

Development environment

Before using this repo, please set up the environment as described below.

Pre-installation

Install conda according to the instructions on its homepage. Before installing the repo, you also need a CUDA driver with version >= 10.2.

$ conda env create -f environment.yml
$ conda activate hcmus
$ pip install -r requirements.txt
$ pip install -e .
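
Once the environment is active, a quick sanity check (a minimal sketch, assuming PyTorch is installed by the environment files above, which the models in this repo require) confirms that the GPU is visible:

# Minimal sanity check (assumes the environment installs PyTorch).
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # should print True with a CUDA >= 10.2 driver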

Prepare data

Create a symbolic link in the project's data directory that points to where the data is actually stored.

$ cd /Users/your_short_username/path/to/where/you/want/to/put/the/symlink
$ ln -s /Volumes/HDD_name/path/to/where/you/are/storing/the/moved/files    symbolic_link_name_you_want_to_use

Ensure your data folder structure is the same as our data_sample before running the code.

$ ./tools/extract_vdo2frms_AIC.sh ./data/AIC22_Track2_NL_Retrieval/ ./data/meta/extracted_frames/
$ cp ./data/AIC22_Track2_NL_Retrieval/*.json ./data/meta/
$ ./tools/preproc_motion.sh ./data/meta
$ ./tools/preproc_srl.sh ./data/meta

For details, please take a look at the extract data notebook.

For testing purposes, you can run the commands above with ./data_sample/meta as the data directory.

A detailed document on the preprocessing step can be found in the srl part and the basic part (adapted from the hcmus team's and Alibaba team's source code).

Inference

We provide a simple inference script, where artifacts/ is the directory in which you store the trained classification model.

$ ./tools/infer.sh ./data/meta/

For details, please take a look at the Predictor class in src/predictor.py or the inference notebook.
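
The snippet below is only an illustrative sketch of how the predictor might be used; the constructor argument and method name are assumptions, not the actual API, so check src/predictor.py for the real interface.

# Hypothetical usage; argument and method names are placeholders, not the real API.
from src.predictor import Predictor

predictor = Predictor(checkpoint_dir='artifacts/')    # assumed constructor argument
results = predictor.predict(data_dir='./data/meta/')  # assumed method name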

Training

Updating

Deployment (not working yet)

For deployment/training purposes, Docker is a ready-to-use solution.

To build docker image:

$ cd <this-repo>
$ DOCKER_BUILDKIT=1 docker build -t aic22:latest .

To start docker container:

$ docker run --rm --name aic-t2 --gpus device=0 --shm-size 16G -it -v $(pwd)/:/home/workspace/src/ aic22:latest /bin/bash

Here, device is the GPU device number and shm-size is the shared memory size (it should be larger than the size of the model).

To attach to the container:

$ docker attach aic-t2

Contribution guide

If you want to contribute to this repo, please follow the steps below:

  1. Fork your own version from this repository
  2. Check out a new branch, e.g. fix-loss, add-feat.
  3. Make changes/Add features/Fix bugs
  4. Add test cases in the tests folder and run them to make sure they all pass (see below)
  5. Describe the feature/bugfix in the PR description (or create a new document)
  6. Push the commit(s) to your own repository
  7. Create a pull request on this repository

To run the test suite:

$ pip install pytest
$ python -m pytest tests/

Expected result:

============================== test session starts ===============================
platform darwin -- Python 3.7.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /Users/nhtlong/workspace/aic/aic2022
collected 10 items

tests/test_args.py ...                                                     [ 30%]
tests/test_utils.py .                                                      [ 40%]
tests/uts/test_dataset.py .                                                [ 50%]
tests/uts/test_eval.py .                                                   [ 60%]
tests/uts/test_extractor.py ...                                            [ 90%]
tests/uts/test_model.py .                                                  [100%]

aic2022-ver's Issues

Add AIC metrics

  • Add the competition metric (MRR); a minimal sketch is given below
  • Create an abstract metric class for retrieval purposes (as reusable as possible)
  • Add new automated tests
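
A minimal sketch of the competition metric, assuming ranked gallery ids per query and a single relevant track per query (the function name and input format are assumptions):

import numpy as np

def mean_reciprocal_rank(ranked_ids, target_ids):
    """ranked_ids: {query_id: [track_id, ...]} sorted by similarity (best first).
    target_ids: {query_id: track_id} mapping each query to its correct track."""
    reciprocal_ranks = []
    for query_id, ranking in ranked_ids.items():
        target = target_ids[query_id]
        # rank is 1-based; queries whose target is missing contribute 0
        rr = 1.0 / (ranking.index(target) + 1) if target in ranking else 0.0
        reciprocal_ranks.append(rr)
    return float(np.mean(reciprocal_ranks))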

Fix metric

The implemented metric only computes the result on a single batch and then averages across batches.

We need to call metric.calculate in the on_eval_end step to perform the search over the entire dataset. Currently, the similarity search only works within one batch, so the top-5 score is quite high (compared to the random baseline of 5/8). Please fix this by appending the embeddings to a list in the on_val_step model hook; a sketch follows below.
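
A sketch of the intended flow, written as a fragment of a LightningModule with plain PyTorch Lightning hook names as an assumption (the project's own on_val_step / on_eval_end hooks may be named differently, and the model is assumed to return query and gallery embeddings from forward):

# Fragment of a LightningModule; hook names and forward() output are assumptions.
def on_validation_epoch_start(self):
    self._query_embeds, self._gallery_embeds = [], []

def validation_step(self, batch, batch_idx):
    query_embeds, gallery_embeds = self.forward(batch)          # assumed output
    self._query_embeds.append(query_embeds.detach().cpu())      # append per batch
    self._gallery_embeds.append(gallery_embeds.detach().cpu())

def on_validation_epoch_end(self):
    queries = torch.cat(self._query_embeds, dim=0)
    gallery = torch.cat(self._gallery_embeds, dim=0)
    # similarity search over the whole validation set, not a single batch
    self.metric.calculate(queries, gallery)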

Sort uuids for label consistency when resuming training

  • The uuids might change when resuming training, which can hurt the performance of the model (it affects the instance loss)

  • The part of the code that produces this error (a possible fix is sketched after the snippet):

class CityFlowNLDataset(Dataset):
    def __init__(self, ...):
        ...
        self.list_of_uuids = list(tracks.keys())  # the order can be different when resuming
        self.list_of_tracks = list(tracks.values())
        ...
        self.all_indexs = list(range(len(self.list_of_uuids)))

    def __getitem__(self, index):
        tmp_index = self.all_indexs[index]  # then this is used as the target for the instance loss
        ...
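
One possible fix (a sketch, not the repository's actual patch): sort the uuids so the index-to-uuid mapping, and therefore the instance-loss targets, stay identical across runs:

# Deterministic ordering, keeping tracks aligned with their uuids.
self.list_of_uuids = sorted(tracks.keys())
self.list_of_tracks = [tracks[uuid] for uuid in self.list_of_uuids]
self.all_indexs = list(range(len(self.list_of_uuids)))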

Add relation graph extractor

  • This module extracts a relation graph between tracks; the output determines, for each track, which tracks it follows and which tracks it is_followed by (a sketch of a possible output format is shown below).
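
A sketch of what the extractor's output could look like; the track ids and the exact structure are hypothetical, only the follows / is_followed relations come from the description above:

# Hypothetical output format for the relation graph extractor.
relation_graph = {
    "track_0001": {"follows": ["track_0042"], "is_followed": []},
    "track_0042": {"follows": [], "is_followed": ["track_0001"]},
}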

Add stop turn detector

  • This module is for post-processing; it helps filter the retrieval predictions (see the sketch after this list)
  • It should be run on the test tracks to determine which tracks contain turn or stop actions
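
A minimal post-processing sketch, assuming the detector produces a set of action flags per track and that the required actions have been parsed from the query (both inputs, and the function name, are hypothetical):

def filter_by_action(ranked_track_ids, track_actions, required_actions):
    """Keep tracks whose detected actions cover everything the query asks for.

    ranked_track_ids: list of track ids sorted by similarity (best first).
    track_actions: {track_id: {"stop", "turn", ...}} produced by the detector.
    required_actions: actions mentioned in the query, e.g. {"turn"}.
    """
    kept = [t for t in ranked_track_ids
            if required_actions <= track_actions.get(t, set())]
    # fall back to the original ranking if filtering removes every candidate
    return kept or ranked_track_ids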

Add inference script

Suggestions:

  • Add two datasets/dataloaders: one for query text features and one for visual features (motion + crop).

  • Add a new class for inference (nn.Module or pl.LightningModule) that can load a checkpoint and has two functions: one for encoding texts and one for encoding visual features.

  • Finally, use Faiss to compute the similarity and save the results to a file for submission and visualization.

  • A small example snippet:

import json
import os.path as osp

import numpy as np
import torch

# `move_to` and `FaissRetrieval` are assumed to be project utilities
# (a device-transfer helper and a Faiss-based retrieval wrapper, respectively).

@torch.no_grad()
def inference(self):
    self.model.eval()

    # Extract lang feats
    lang_results = {}
    for idx, batch in enumerate(self.lang_dataloader):
        batch = move_to(batch, self.device)
        lang_feats = self.model.encode_nlang_feats(batch)
        lang_feats = move_to(lang_feats, torch.device('cpu')).detach().numpy()
        ids = batch['ids']
        for lang_id, lang_feat in zip(ids, lang_feats):
            lang_results[lang_id] = lang_feat.tolist()
        
    with open(osp.join(self.savedir, 'text_embeds.json'), 'w') as f:
        json.dump(lang_results, f)

    # Extract visual feats
    visual_results = {}
    for idx, batch in enumerate(self.visual_dataloader):
        batch = move_to(batch, self.device)
        visual_feats = self.model.encode_visual_feats(batch, inference=True)
        visual_feats = move_to(visual_feats, torch.device('cpu'))
        visual_feats = visual_feats.detach().numpy()
        ids = batch['ids']
        for visual_id, visual_feat in zip(ids, visual_feats):
            visual_results[visual_id] = visual_feat.tolist()
        
    with open(osp.join(self.savedir, 'visual_embeds.json'), 'w') as f:
        json.dump(visual_results, f)

    # Faiss retrieval
    retriever = FaissRetrieval(dimension=self.dimension)
    query_embeddings = np.stack(lang_results.values(), axis=0).astype(np.float32)
    gallery_embeddings = np.stack(visual_results.values(), axis=0).astype(np.float32)
    query_ids = list(lang_results.keys())
    gallery_ids = list(visual_results.keys())

    retriever.similarity_search(
        query_embeddings,
        gallery_embeddings,
        query_ids,
        gallery_ids,
        top_k=self.top_k,
        save_results=osp.join(self.savedir, 'retrieval_results.json'))

Update logging (vis)

  • Add a logging callback (supporting visualization of query - tracklet/image pairs); a sketch is given after this list
  • Add usage instructions for the logging callback
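
A sketch of such a callback in PyTorch Lightning style; the callback name, the retrieval_results attribute read from the module, and the use of a WandbLogger-style log_image call are all assumptions:

import pytorch_lightning as pl

class RetrievalVisualizationCallback(pl.Callback):
    """Log the top-k retrieved tracklet/crop images for a few validation queries."""

    def __init__(self, num_queries=4, top_k=5):
        self.num_queries = num_queries
        self.top_k = top_k

    def on_validation_epoch_end(self, trainer, pl_module):
        # `retrieval_results` is assumed to be filled during evaluation:
        # {query_text: [path_to_retrieved_image, ...]}
        results = getattr(pl_module, "retrieval_results", {})
        for query, image_paths in list(results.items())[: self.num_queries]:
            # assumes trainer.logger is a WandbLogger, whose log_image accepts paths
            trainer.logger.log_image(key=query, images=image_paths[: self.top_k])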
