
transvtspotter's Introduction

TransVTSpotter: End-to-end Video Text Spotter with Transformer

License: MIT

Introduction

A Multilingual, Open World Video Text Dataset and End-to-end Video Text Spotter with Transformer

Link to our MOVText: A Large-Scale, Multilingual Open World Dataset for Video Text Spotting

Updates

  • (11/05/2022) TransDETR, a better transformer-based video text spotting method, has been launched.

  • (08/04/2021) Refactored the code.

  • (10/20/2021) The complete code has been released.

| Methods | MOTA | MOTP | IDF1 | Mostly Matched | Partially Matched | Mostly Lost |
|---|---|---|---|---|---|---|
| TransVTSpotter | 45.75 | 73.58 | 57.56 | 658 | 611 | 647 |
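
For readers unfamiliar with the tracking metrics reported above, the following is a minimal sketch of MOTA as defined in the CLEAR MOT metrics: one minus the ratio of all errors (misses, false positives, identity switches) to the number of ground-truth objects. The function name and the counts in the example are hypothetical and only illustrate the arithmetic; the official ICDAR evaluation may differ in its matching details.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


if __name__ == "__main__":
    # Hypothetical counts, not taken from any real evaluation run.
    few_switches = mota(150, 80, 20, 1000)
    many_switches = mota(150, 80, 750, 1000)
    print("MOTA with few identity switches: {:.2%}".format(few_switches))
    print("MOTA with many identity switches: {:.2%}".format(many_switches))
```

Because identity switches count as errors, a tracker with good detections can still score a low MOTA when identities are not kept consistent across frames.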

Notes

  • Training was run on 8 NVIDIA V100 GPUs with a total batch size of 16.
  • We use the models pre-trained on COCOTextV2.
  • We do not release the recognition code due to the company's regulations.

Demo

Installation

The codebase is built on top of Deformable DETR and TransTrack.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4
  • Python>=3.7
  • PyTorch>=1.5 and a torchvision version that matches the PyTorch installation. Installing both together from pytorch.org ensures that they match.
  • OpenCV is optional; it is needed only for the demo and visualization.
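
The requirements above can be checked before building with a short script. This is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_requirements` are hypothetical, and the script checks only the Python-level requirements (it does not check CUDA or GCC).

```python
import importlib
import sys


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into (1, 13)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_requirements():
    """Report whether the interpreter and packages meet the listed minimums."""
    report = {"python>=3.7": sys.version_info >= (3, 7)}
    # Package name -> minimum (major, minor); None means any version is accepted.
    wanted = {"torch": (1, 5), "torchvision": None, "cv2": None}
    for name, minimum in wanted.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        version = getattr(module, "__version__", "0.0")
        if minimum is None or parse_version(version) >= minimum:
            report[name] = "ok ({})".format(version)
        else:
            report[name] = "too old ({})".format(version)
    return report


if __name__ == "__main__":
    for item, status in check_requirements().items():
        print(item, "->", status)
```

A "not installed" entry for `cv2` is harmless unless the demo or visualization scripts are needed.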

Steps

  1. Install and build libs
git clone git@github.com:weijiawu/TransVTSpotter.git
cd TransVTSpotter
cd models/ops
python setup.py build install
cd ../..
pip install -r requirements.txt
  2. Prepare datasets and annotations

The COCOTextV2 dataset is available from COCOTextV2. Convert it to COCO format with:

python3 track_tools/convert_COCOText_to_coco.py

The ICDAR2015 dataset is available from icdar2015. Convert it to COCO format with:

python3 track_tools/convert_ICDAR15video_to_coco.py
  3. Pre-train on COCOTextV2
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py \
    --output_dir ./output/Pretrain_COCOTextV2 \
    --dataset_file pretrain \
    --coco_path ./Data/COCOTextV2 \
    --batch_size 2 \
    --with_box_refine \
    --num_queries 500 \
    --epochs 300 \
    --lr_drop 100 \
    --resume ./output/Pretrain_COCOTextV2/checkpoint.pth

python3 track_tools/Pretrain_model_to_mot.py

The pre-trained model is available on Baidu Netdisk (password: 59w8) and on Google Netdisk.

The model that reaches 44% MOTA can be found here (password: xnlw) and on Google Netdisk.

  4. Train TransVTSpotter
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py \
    --output_dir ./output/ICDAR15 \
    --dataset_file text \
    --coco_path ./Data/ICDAR2015_video \
    --batch_size 2 \
    --with_box_refine \
    --num_queries 300 \
    --epochs 80 \
    --lr_drop 40 \
    --resume ./output/Pretrain_COCOTextV2/pretrain_coco.pth
  5. Run inference and visualize TransVTSpotter
# Inference
python3 main_track.py \
    --output_dir ./output/ICDAR15 \
    --dataset_file text \
    --coco_path ./Data/ICDAR2015_video \
    --batch_size 1 \
    --resume ./output/ICDAR15/checkpoint.pth \
    --eval \
    --with_box_refine \
    --num_queries 300 \
    --track_thresh 0.3

# Visualize
python3 track_tools/Evaluation_ICDAR15_video/vis_tracking.py
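
Before launching training, it can help to confirm that the annotation file produced in the dataset preparation step is not empty. The sketch below assumes the converted file follows the standard COCO layout, with top-level `images` and `annotations` lists. The function name `summarize_coco` and the example path are hypothetical; adjust the path to wherever the conversion script writes its output.

```python
import json
from collections import Counter


def summarize_coco(annotation_path):
    """Count images and annotations in a COCO-style JSON file."""
    with open(annotation_path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    images = data.get("images", [])
    annotations = data.get("annotations", [])
    # Number of annotations attached to each image id.
    per_image = Counter(ann["image_id"] for ann in annotations)
    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        "images_without_annotations": sum(
            1 for img in images if per_image[img["id"]] == 0
        ),
    }


if __name__ == "__main__":
    # Hypothetical location; not confirmed by the repository documentation.
    path = "./Data/ICDAR2015_video/annotations_coco_rotate/train.json"
    try:
        print(summarize_coco(path))
    except FileNotFoundError:
        print("Annotation file not found:", path)
```

A result with zero images or zero annotations suggests that the conversion did not find the expected input files.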

License

TransVTSpotter is released under MIT License.

Citing

If you use TransVTSpotter in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wu2021opentext,
  title={A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer},
  author={Weijia Wu and Debing Zhang and Yuanqiang Cai and Sibo Wang and Jiahong Li and Zhuang Li and Yejun Tang and Hong Zhou},
  journal={35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year={2021}
}

transvtspotter's People

Contributors

weijiawu


transvtspotter's Issues

RuntimeError: median cannot be called with empty tensor

Traceback (most recent call last):
  File "main_track.py", line 363, in <module>
    main(args)
  File "main_track.py", line 326, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "TransVTSpotter/engine_track.py", line 41, in train_one_epoch
    for _ in metric_logger.log_every(range(len(data_loader)), print_freq, header):
  File "TransVTSpotter/util/misc.py", line 260, in log_every
    meters=str(self),
  File "TransVTSpotter/util/misc.py", line 210, in __str__
    "{}: {}".format(name, str(meter))
  File "TransVTSpotter/util/misc.py", line 109, in __str__
    median=self.median,
  File "TransVTSpotter/util/misc.py", line 88, in median
    return d.median().item()
RuntimeError: median cannot be called with empty tensor

I think there might be something wrong with my datasets. My dataset path is shown below:
[image]

Is that right? Can you give me an example of the dataset structure, or a solution to this error? Thanks!
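
The traceback above ends in the `median` property of a logging meter, which raises when the meter is printed before any value has been recorded. The following is a minimal sketch of a meter that tolerates an empty window. It uses only the standard library; the class `GuardedSmoothedValue` is a hypothetical stand-in, not the repository's code. It does not diagnose why the window was empty in the first place. One possible cause, stated here as an assumption, is a dataset that yields no usable samples.

```python
import math
import statistics
from collections import deque


class GuardedSmoothedValue:
    """Hypothetical windowed meter that returns NaN when it holds no values."""

    def __init__(self, window_size=20):
        # Only the most recent window_size values are kept.
        self.values = deque(maxlen=window_size)

    def update(self, value):
        self.values.append(float(value))

    @property
    def median(self):
        # An empty window has no median; return NaN instead of raising.
        if not self.values:
            return math.nan
        # median_low mirrors torch.median, which returns the lower of the
        # two middle values when the number of elements is even.
        return statistics.median_low(self.values)


if __name__ == "__main__":
    meter = GuardedSmoothedValue(window_size=3)
    print("empty window:", meter.median)
    for loss in (0.9, 0.7, 0.8):
        meter.update(loss)
    print("after three updates:", meter.median)
```

Returning NaN keeps the log line printable, so an empty meter shows up in the output instead of stopping training.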

No res_video_1.json after running "python track_tools/convert_ICDAR15video_to_coco.py"

Hi,

Thanks for your great work!

I am a bit confused. After I run
python track_tools/Evaluation_ICDAR15_video/vis_tracking.py
I get
"No such file or directory: './output/ICDAR15/test/best_json_tracks/res_video_1.json'"

I have seen issue #2 and can confirm that I have run
python track_tools/convert_ICDAR15video_to_coco.py
but it seems that "res_video_1.json" was not generated.
I only find "train.json" and "test.json" under "annotations_coco_rotate/". Should I rename one of them to "res_video_1.json" and copy it to "./output/ICDAR15/test/best_json_tracks/res_video_1.json"?

Please help me! Thanks a lot!

Cannot reproduce results.

Thank you for the nice work! I'm having problems reproducing the results in your paper. I was hoping you can help.

I have done the following steps.

  1. Download ICDAR15 video training and official test video dataset.
  2. Prepare training and test dataset folder using: video2frames & convert_ICDAR15video_to_coco.
  3. Download pretrain_coco.pth from your Baidu drive.
  4. Train on ICDAR15 video using python -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py --output_dir ./output/icdar_tiv --dataset_file text --coco_path "${MY_DATA_DIR}/icdar_tiv" --batch_size 2 --with_box_refine --num_queries 300 --epochs 80 --lr_drop 40 --resume ./pths/pretrain_coco.pth.
  5. Generate inferences using trained model on official test set: python main_track.py --eval --output_dir ./output/icdar_tiv_submit --resume ./output/icdar_tiv/checkpoint0079.pth --dataset_file text --coco_path "${MY_DATA_DIR}/icdar_tiv_test" --batch_size 1 --with_box_refine --num_queries 300
  6. Zip up the results in output/icdar_tiv_submit/text/xml_dir.
  7. Submit results to official ICDAR2015.

The resulting MOTA is 2.08% and very far from the expected ~45%. Note that the "Mostly Matched" is 842 matching reported results, so it seems that the object detection is working, but tracking is failing. Am I missing something from the code? Thanks for any help.

Couldn't get the json file

I got the error "FileNotFoundError: [Errno 2] No such file or directory: './output/ICDAR15/test/best_json_tracks/res_video_1.mp4.json'".

I downloaded the IC15 video dataset and ran "python track_tools/convert_ICDAR15video_to_coco.py".

I couldn't find any JSON or JPG files in the download from the ICDAR website https://rrc.cvc.uab.es/?ch=3&com=downloads.
The unzipped files are only '.mp4', '.xml' and '***.txt'.

How can I get the JSON annotation files such as 'res_video_1.mp4.json'?

incompatible function arguments

image
Hello, we have prepared the dataset. During training, the error shown above occurred at model inference. What could be the cause?

About Recognition model

Hi, the recognition model in your paper is MASTER. I know that for some reasons you can't release the recognition code. Could you please tell me whether you use the vanilla MASTER or a modified one? Thanks!

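A question about icdar2015_video

Hello, I have recently been following TransVTSpotter. The icdar2015_video test set downloaded from the official website does not include annotations, but the convert_ICDAR2015video_to_coco.py code you provide contains code that processes the test-set XML files. If you have the test-set annotations, could you share them?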

The appendix

Great Work! Can you provide the url to the appendix?
