weijiawu / transvtspotter


A new video text spotting framework with Transformer

Python 90.50% Jupyter Notebook 4.45% Shell 0.01% C++ 0.47% Cuda 4.57%

transvtspotter's Introduction

TransVTSpotter: End-to-end Video Text Spotter with Transformer

Introduction

A Multilingual, Open World Video Text Dataset and End-to-end Video Text Spotter with Transformer

Link to our MOVText: A Large-Scale, Multilingual Open World Dataset for Video Text Spotting

Updates

(11/05/2022) TransDETR, a better transformer-based video text spotting method, has been launched.
(08/04/2021) Refactored the code.
(10/20/2021) The complete code has been released.

ICDAR2015 (video) Tracking Challenge

Methods	MOTA	MOTP	IDF1	Mostly Matched	Partially Matched	Mostly Lost
TransVTSpotter	45.75	73.58	57.56	658	611	647
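For reference, MOTA and IDF1 in the table follow the usual CLEAR-MOT and identity-metric definitions: MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames, and IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN). The sketch below only restates those formulas. The official ICDAR2015 video evaluation may differ in matching details (such as the overlap threshold), and the counts in the usage lines are made up for illustration.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT. Can be negative."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    print(f"MOTA = {mota(fn=300, fp=150, idsw=50, num_gt=1000):.2%}")  # MOTA = 50.00%
    print(f"IDF1 = {idf1(idtp=600, idfp=200, idfn=200):.2%}")  # IDF1 = 75.00%
```

Because every false positive and identity switch is subtracted from MOTA, a tracker can detect most text instances (high "Mostly Matched") and still score a low or even negative MOTA.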

Notes

Training is performed on 8 NVIDIA V100 GPUs with a total batch size of 16.
We use the models pre-trained on COCOTextV2.
We do not release the recognition code due to the company's regulations.
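The total batch size of 16 comes from the distributed launch used in the training commands of this README: 8 processes (--nproc_per_node=8), each loading --batch_size 2. This small sketch assumes, as in Deformable DETR, that --batch_size is per process and that no gradient accumulation is used.

```python
def effective_batch_size(nproc_per_node: int, per_gpu_batch: int, nnodes: int = 1) -> int:
    """Images per optimizer step when each process loads its own per-GPU batch."""
    return nnodes * nproc_per_node * per_gpu_batch


if __name__ == "__main__":
    # Values taken from the launch commands in this README.
    print(effective_batch_size(nproc_per_node=8, per_gpu_batch=2))  # 16
```

If fewer GPUs are available, keeping the total at 16 would require a larger per-process --batch_size (memory permitting); otherwise the learning rate may need retuning.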

Demo

Installation

The codebase is built on top of Deformable DETR and TransTrack.

Requirements

Linux, CUDA>=9.2, GCC>=5.4
Python>=3.7
PyTorch ≥ 1.5 and a torchvision version that matches the PyTorch installation. Install them together following the instructions at pytorch.org to ensure this.
OpenCV is optional; it is only needed for the demo and visualization.
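A quick way to confirm the requirements above before building the CUDA ops is a small check script. This is a sketch, not part of the repository: it reports rather than enforces, treats OpenCV as optional, and does not check the GCC or CUDA toolkit versions (run gcc --version and nvcc --version separately for those).

```python
import importlib.util
import platform
import sys


def version_tuple(version: str, parts: int = 2) -> tuple:
    """Turn a version string such as '1.13.1+cu117' into (1, 13); suffixes are dropped."""
    nums = []
    for piece in version.split("+")[0].split(".")[:parts]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        nums.append(int(digits) if digits else 0)
    return tuple(nums)


def check_environment() -> dict:
    """Report the listed requirements; never raises if a package is missing."""
    report = {
        "linux": platform.system() == "Linux",
        "python>=3.7": sys.version_info[:2] >= (3, 7),
        "torch>=1.5": False,
        "cuda_available": False,
        "opencv (optional)": importlib.util.find_spec("cv2") is not None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch>=1.5"] = version_tuple(torch.__version__) >= (1, 5)
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```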

Steps

Install and build libs

git clone git@github.com:weijiawu/TransVTSpotter.git
cd TransVTSpotter
cd models/ops
python setup.py build install
cd ../..
pip install -r requirements.txt
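After python setup.py build install, the compiled operator should be importable. Assuming models/ops here is the unchanged Deformable DETR operator package, its setup.py builds an extension module named MultiScaleDeformableAttention; the check below only tests that this module can be found on the Python path. Deformable DETR also ships a gradient check in models/ops/test.py, which may be worth running if that file is present in this repository.

```python
import importlib.util

# Assumption: the extension name used by Deformable DETR's models/ops/setup.py.
EXTENSION_NAME = "MultiScaleDeformableAttention"


def extension_built(name: str = EXTENSION_NAME) -> bool:
    """True if a module with this name can be found on the current Python path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if extension_built():
        print(f"{EXTENSION_NAME} found; the CUDA ops appear to be installed.")
    else:
        print(f"{EXTENSION_NAME} not found; re-run 'python setup.py build install' in models/ops.")
```

Finding the module does not prove the kernels work on your GPU; it only catches the common case where the build step failed or ran in a different Python environment.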

Prepare datasets and annotations

The COCOTextV2 dataset is available at COCOTextV2.

python3 track_tools/convert_COCOText_to_coco.py

The ICDAR2015 video dataset is available at icdar2015.

python3 track_tools/convert_ICDAR15video_to_coco.py
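The conversion step turns the ICDAR2015 video ground truth (per-video .xml files) into COCO-style JSON. The exact fields written by convert_ICDAR15video_to_coco.py are not shown here, so the following is only a sketch under stated assumptions: the XML layout in SAMPLE_XML is assumed (verify it against your downloaded files), and the output follows the TransTrack convention in which each image record carries video_id, frame_id, prev_image_id and next_image_id and each annotation carries a track_id. The frame naming and the single "text" category are hypothetical. The folder name annotations_coco_rotate, mentioned in an issue on this page, suggests the real script keeps rotated boxes, whereas this sketch uses axis-aligned ones.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed ICDAR2015 video ground-truth layout (verify against your downloaded .xml files).
SAMPLE_XML = """<Frames>
  <frame ID="1">
    <object Transcription="EXIT" ID="1001" Language="Latin" Quality="HIGH">
      <Point x="10" y="20"/><Point x="60" y="20"/>
      <Point x="60" y="32"/><Point x="10" y="32"/>
    </object>
  </frame>
  <frame ID="2"/>
</Frames>"""


def parse_gt_xml(xml_text: str) -> Dict[int, List[dict]]:
    """Map frame ID -> text instances with track ID, transcription and axis-aligned box."""
    frames: Dict[int, List[dict]] = {}
    for frame in ET.fromstring(xml_text).iter("frame"):
        objects = []
        for obj in frame.iter("object"):
            pts = [(float(p.get("x")), float(p.get("y"))) for p in obj.iter("Point")]
            if not pts:
                continue  # skip instances without a polygon
            xs, ys = [x for x, _ in pts], [y for _, y in pts]
            objects.append({
                "track_id": int(obj.get("ID")),
                "text": obj.get("Transcription", ""),
                # COCO bbox [x, y, w, h] enclosing the quadrilateral.
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
            })
        frames[int(frame.get("ID"))] = objects
    return frames


def to_coco(video_name: str, frames: Dict[int, List[dict]], video_id: int = 1) -> dict:
    """Build a TransTrack-style COCO dict for one video (ids are only unique within it)."""
    coco = {"videos": [{"id": video_id, "file_name": video_name}],
            "images": [], "annotations": [],
            "categories": [{"id": 1, "name": "text"}]}
    frame_ids = sorted(frames)
    for pos, fid in enumerate(frame_ids):
        coco["images"].append({
            "id": fid,
            "file_name": f"{video_name}/{fid}.jpg",  # hypothetical frame naming
            "video_id": video_id,
            "frame_id": fid,
            "prev_image_id": frame_ids[pos - 1] if pos > 0 else -1,
            "next_image_id": frame_ids[pos + 1] if pos + 1 < len(frame_ids) else -1,
        })
        for inst in frames[fid]:
            _, _, w, h = inst["bbox"]
            coco["annotations"].append({
                "id": len(coco["annotations"]) + 1,
                "image_id": fid,
                "category_id": 1,
                "track_id": inst["track_id"],
                "bbox": inst["bbox"],
                "area": w * h,
                "iscrowd": 0,
            })
    return coco


if __name__ == "__main__":
    coco = to_coco("Video_1", parse_gt_xml(SAMPLE_XML))
    print(json.dumps(coco["annotations"][0]["bbox"]))  # [10.0, 20.0, 50.0, 12.0]
```

When several videos are merged into one train.json, image and annotation ids would need an offset per video so that they stay globally unique.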

Pre-train on COCOTextV2

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py  --output_dir ./output/Pretrain_COCOTextV2 --dataset_file pretrain --coco_path ./Data/COCOTextV2 --batch_size 2  --with_box_refine --num_queries 500 --epochs 300 --lr_drop 100 --resume ./output/Pretrain_COCOTextV2/checkpoint.pth

python3 track_tools/Pretrain_model_to_mot.py

The pre-trained model is available on Baidu Netdisk (password: 59w8) and Google Netdisk.

The trained model that reaches 44% MOTA is available here (password: xnlw) and on Google Netdisk.

Train TransVTSpotter

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py  --output_dir ./output/ICDAR15 --dataset_file text --coco_path ./Data/ICDAR2015_video --batch_size 2  --with_box_refine  --num_queries 300 --epochs 80 --lr_drop 40 --resume ./output/Pretrain_COCOTextV2/pretrain_coco.pth
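The --epochs and --lr_drop flags define a step schedule. Deformable DETR creates its scheduler as torch.optim.lr_scheduler.StepLR(optimizer, args.lr_drop), whose default gamma is 0.1; assuming main_track.py keeps that, fine-tuning runs at the base learning rate for epochs 0 to 39 and at one tenth of it from epoch 40 on, while pre-training (--epochs 300 --lr_drop 100) drops twice. The base rate of 2e-4 below is an assumption (Deformable DETR's default --lr), not a value stated in this README.

```python
def step_lr(base_lr: float, epoch: int, lr_drop: int, gamma: float = 0.1) -> float:
    """LR in effect during `epoch` (0-based) under StepLR(step_size=lr_drop, gamma)."""
    return base_lr * gamma ** (epoch // lr_drop)


BASE_LR = 2e-4  # assumption: Deformable DETR's default --lr

if __name__ == "__main__":
    # Fine-tuning command: --epochs 80 --lr_drop 40
    for epoch in (0, 39, 40, 79):
        print(f"epoch {epoch:3d}: lr = {step_lr(BASE_LR, epoch, lr_drop=40):.1e}")
```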

Inference and Visualization

# Inference
python3 main_track.py  --output_dir ./output/ICDAR15 --dataset_file text --coco_path ./Data/ICDAR2015_video --batch_size 1 --resume ./output/ICDAR15/checkpoint.pth --eval --with_box_refine --num_queries 300 --track_thresh 0.3

# Visualize
python3 track_tools/Evaluation_ICDAR15_video/vis_tracking.py
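The inference command sets --track_thresh 0.3. In TransTrack-style trackers, a score threshold of this kind decides which detections are kept when tracks are formed; the exact rule inside main_track.py is not shown here, so the following is only an illustrative sketch of the trade-off, with hypothetical detection records. Lowering the threshold keeps more low-confidence boxes (higher recall, but more false positives, which MOTA penalizes); raising it does the opposite.

```python
from typing import Dict, List


def keep_confident(detections: List[Dict], track_thresh: float = 0.3) -> List[Dict]:
    """Keep detections whose score exceeds track_thresh (illustrative only)."""
    return [det for det in detections if det["score"] > track_thresh]


if __name__ == "__main__":
    # Hypothetical per-frame detections: [x, y, w, h] boxes with confidence scores.
    frame_dets = [{"box": [10, 20, 50, 12], "score": 0.91},
                  {"box": [80, 40, 30, 10], "score": 0.25}]
    print(len(keep_confident(frame_dets)))  # 1
    print(len(keep_confident(frame_dets, track_thresh=0.2)))  # 2
```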

License

TransVTSpotter is released under the MIT License.

Citing

If you use TransVTSpotter in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wu2021opentext,
  title={A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer},
  author={Weijia Wu and Debing Zhang and Yuanqiang Cai and Sibo Wang and Jiahong Li and Zhuang Li and Yejun Tang and Hong Zhou},
  journal={35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year={2021}
}


transvtspotter's Issues

Upload pretrain weights to google drive

Could you please upload the pre-trained weights to Google Drive? They are not available for download in other countries. Thanks!

RuntimeError: median cannot be called with empty tensor

Traceback (most recent call last):
  File "main_track.py", line 363, in <module>
    main(args)
  File "main_track.py", line 326, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "TransVTSpotter/engine_track.py", line 41, in train_one_epoch
    for _ in metric_logger.log_every(range(len(data_loader)), print_freq, header):
  File "TransVTSpotter/util/misc.py", line 260, in log_every
    meters=str(self),
  File "TransVTSpotter/util/misc.py", line 210, in __str__
    "{}: {}".format(name, str(meter))
  File "TransVTSpotter/util/misc.py", line 109, in __str__
    median=self.median,
  File "TransVTSpotter/util/misc.py", line 88, in median
    return d.median().item()
RuntimeError: median cannot be called with empty tensor

I think there might be something wrong with the datasets. My dataset paths are as below:

Is that right? Could you give me an example of the dataset structure, or a solution to this error? Thanks!
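The traceback ends in SmoothedValue.median, which fails when a logged meter holds no values. One plausible cause, not confirmed in this thread, is that no valid samples reach the training loop, for example because the annotation JSON under --coco_path is empty or its image paths do not resolve. The sketch below is a hypothetical helper (not part of the repository) that checks one COCO-style split; which annotation file and image root to pass depends on your layout.

```python
import json
import os
from typing import List


def check_coco_split(ann_file: str, image_root: str, max_shown: int = 5) -> List[str]:
    """Return problems found in a COCO-style annotation file (an empty list means none found)."""
    if not os.path.isfile(ann_file):
        return [f"annotation file not found: {ann_file}"]
    with open(ann_file, encoding="utf-8") as f:
        coco = json.load(f)
    problems: List[str] = []
    images, anns = coco.get("images", []), coco.get("annotations", [])
    if not images:
        problems.append("no images listed")
    if not anns:
        problems.append("no annotations listed")
    # Image file_name entries are resolved relative to image_root.
    missing = [im["file_name"] for im in images
               if not os.path.isfile(os.path.join(image_root, im["file_name"]))]
    if missing:
        problems.append(f"{len(missing)} image files missing, e.g. {missing[:max_shown]}")
    return problems
```

For instance, calling it on the train.json that the conversion script produced, with the folder that holds the extracted frames as image_root, should return an empty list if both the annotations and the frames are in place.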

No res_video_1.json after running "python track_tools/convert_ICDAR15video_to_coco.py"

Hi,

Thanks for your great work!

I am a bit confused. After I run
python track_tools/Evaluation_ICDAR15_video/vis_tracking.py
I get
"No such file or directory: './output/ICDAR15/test/best_json_tracks/res_video_1.json'"

I have seen issue #2 and confirmed that I have run
python track_tools/convert_ICDAR15video_to_coco.py
But it seems that "res_video_1.json" was not generated.
I only find "train.json" and "test.json" under "annotations_coco_rotate/". Should I rename one of them to "res_video_1.json" and copy it to "./output/ICDAR15/test/best_json_tracks/res_video_1.json"?

Please help me! Thanks a lot!
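The missing file lives under ./output/ICDAR15/test/best_json_tracks/ and holds tracking results rather than ground-truth annotations, so it is presumably written by the inference step (main_track.py with --eval) and not by convert_ICDAR15video_to_coco.py; this is an inference from the path, not something the README states. A hypothetical pre-check before running vis_tracking.py could list which per-video result files are still absent. The res_<video>.json pattern is taken from the error messages on this page, which also show that names may include the .mp4 suffix (res_video_1.mp4.json).

```python
import os
from typing import Iterable, List

# Directory taken from the error message; adjust if --output_dir differs.
RESULT_DIR = "./output/ICDAR15/test/best_json_tracks"


def missing_results(video_names: Iterable[str], result_dir: str = RESULT_DIR) -> List[str]:
    """Return the expected res_<video>.json paths that do not exist yet."""
    expected = [os.path.join(result_dir, f"res_{name}.json") for name in video_names]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    for path in missing_results(["video_1"]):
        print("missing:", path)
```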

Cannot reproduce results.

Thank you for the nice work! I'm having problems reproducing the results in your paper and was hoping you could help.

I have done the following steps.

Download ICDAR15 video training and official test video dataset.
Prepare training and test dataset folder using: video2frames & convert_ICDAR15video_to_coco.
Download pretrain_coco.pth from your Baidu drive.
Train on ICDAR15 video using python -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py --output_dir ./output/icdar_tiv --dataset_file text --coco_path "${MY_DATA_DIR}/icdar_tiv" --batch_size 2 --with_box_refine --num_queries 300 --epochs 80 --lr_drop 40 --resume ./pths/pretrain_coco.pth.
Generate inferences using trained model on official test set: python main_track.py --eval --output_dir ./output/icdar_tiv_submit --resume ./output/icdar_tiv/checkpoint0079.pth --dataset_file text --coco_path "${MY_DATA_DIR}/icdar_tiv_test" --batch_size 1 --with_box_refine --num_queries 300
Zip up the results in output/icdar_tiv_submit/text/xml_dir.
Submit results to official ICDAR2015.

The resulting MOTA is 2.08%, very far from the expected ~45%. Note that "Mostly Matched" is 842, comparable to the reported results, so object detection seems to be working but tracking is failing. Am I missing something in the code? Thanks for any help.

Couldn't get the json file

There was an error: "FileNotFoundError: [Errno 2] No such file or directory: './output/ICDAR15/test/best_json_tracks/res_video_1.mp4.json"

I downloaded the IC15 video dataset and ran "python track_tools/convert_ICDAR15video_to_coco.py".

However, I couldn't find any JSON or JPG files in what I downloaded from the ICDAR website https://rrc.cvc.uab.es/?ch=3&com=downloads.
The unzipped files are only '.mp4', '.xml' and '***.txt' files.

How can I get the JSON annotation files such as 'res_video_1.mp4.json'?
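The ICDAR download ships videos, not frames, and the reproduction report above mentions a video2frames step before conversion. The naming that step uses is not shown on this page, so the sketch below is a hypothetical frame extractor: it writes <out_dir>/<video stem>/<1-based index>.jpg, assuming the ground-truth frame IDs start at 1. Check what convert_ICDAR15video_to_coco.py expects before relying on this layout. It uses OpenCV's VideoCapture, which the README lists as optional.

```python
import os


def frame_path(out_dir: str, video_file: str, frame_idx: int) -> str:
    """Hypothetical naming: <out_dir>/<video stem>/<1-based frame index>.jpg"""
    stem = os.path.splitext(os.path.basename(video_file))[0]
    return os.path.join(out_dir, stem, f"{frame_idx}.jpg")


def extract_frames(video_file: str, out_dir: str) -> int:
    """Decode every frame of a video with OpenCV, save each as JPEG, return the count."""
    import cv2  # optional dependency, imported only when frames are actually extracted

    cap = cv2.VideoCapture(video_file)
    if not cap.isOpened():
        raise IOError(f"cannot open {video_file}")
    count = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode error
                break
            count += 1
            path = frame_path(out_dir, video_file, count)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            cv2.imwrite(path, frame)
    finally:
        cap.release()
    return count
```

The res_video_*.json files themselves are tracking outputs rather than downloadable annotations, so extracting frames and converting the XML ground truth only prepares the inputs; the result files would still come from running inference.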