
bert_feature_extract's Introduction

An Example of Using BERT to Extract Features from ANY Text

This repo aims to provide easy-to-use, efficient code for extracting text features with pre-trained BERT models.

It was originally designed to efficiently extract features of the text instructions in the R2R dataset for the Vision-and-Language Navigation task.

This repo provides a simple Python script for BERT feature extraction. If necessary, imitate instr_loader.py to write another PyTorch dataset class for your own text data (mainly the part that reads the data), and import that class in extract.py. The script then takes care of the BERT preprocessing (e.g., tokenization, adding special tokens to each sentence, and padding) and of feature extraction with the selected pre-trained BERT model.
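As a rough illustration of the two pieces described above, the following sketch shows a custom dataset class and a toy version of the preprocessing. Everything in it is an assumption made for illustration: the class name MyTextDataset, the JSON layout with "id" and "text" fields, and the helper encode_batch are hypothetical and are not taken from instr_loader.py or extract.py. The toy preprocessing splits on whitespace instead of using BERT's WordPiece tokenizer, so it only shows where the special tokens and padding go. Check instr_loader.py for what extract.py actually expects from a dataset.

```python
import json

try:
    from torch.utils.data import Dataset
except ImportError:  # lets the sketch run without PyTorch installed
    Dataset = object


class MyTextDataset(Dataset):
    """Hypothetical dataset reading a JSON list of {"id": ..., "text": ...} records."""

    def __init__(self, json_path):
        with open(json_path, encoding="utf-8") as f:
            self.records = json.load(f)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        # Return the raw sentence; tokenization and padding happen later.
        return record["id"], record["text"]


def encode_batch(sentences):
    """Toy stand-in for BERT preprocessing (whitespace split, not WordPiece).

    Wraps each sentence in [CLS] ... [SEP], pads every sentence to the
    length of the longest one, and builds the matching attention masks.
    """
    token_lists = [["[CLS]"] + s.lower().split() + ["[SEP]"] for s in sentences]
    width = max(len(tokens) for tokens in token_lists)
    padded, masks = [], []
    for tokens in token_lists:
        pad = width - len(tokens)
        padded.append(tokens + ["[PAD]"] * pad)
        masks.append([1] * len(tokens) + [0] * pad)  # 1 = real token, 0 = padding
    return padded, masks
```

In a real run the script performs the tokenization itself, so only the data-reading part of such a class should need to change for a new text source.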

Requirements

Quick Start

Taking the R2R_test.json annotation file as an example:

python extract.py \
    --input data/raw_data/R2R_test.json \
    --num_workers 2 \
    --bert_model bert-base-uncased

Please note that the script is intended to run on a single GPU only. If multiple GPUs are available, make sure that only one free GPU is visible to the script, for example by setting the CUDA_VISIBLE_DEVICES environment variable.
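One possible way to restrict the script to a single GPU is to launch it from a small Python wrapper that sets CUDA_VISIBLE_DEVICES for the child process. This is only a sketch: the helper names single_gpu_env and run_extract are hypothetical, and the command line simply repeats the Quick Start example above.

```python
import os
import subprocess

# Same arguments as the Quick Start command.
EXTRACT_CMD = [
    "python", "extract.py",
    "--input", "data/raw_data/R2R_test.json",
    "--num_workers", "2",
    "--bert_model", "bert-base-uncased",
]


def single_gpu_env(gpu_index=0):
    """Return a copy of the current environment exposing only one GPU."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_extract(gpu_index=0):
    """Run extract.py with only the chosen GPU visible to it."""
    return subprocess.run(EXTRACT_CMD, env=single_gpu_env(gpu_index), check=True)
```

Calling run_extract(1), for instance, would make only the GPU with index 1 visible, and the script would then see it as device 0. The variable has to be set before CUDA is initialized, which is why it is passed to the child process rather than changed after the script has started.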

Downloading pre-trained models (Optional)

Since downloading BERT pre-trained models through the transformers package can be slow in some parts of the world, we also provide a mirror on Baidu Drive (i.e., Baidu Pan). The cached pre-trained models listed below can be downloaded via the shared link https://pan.baidu.com/s/1CFVzy5we8JM2PP-TPFq_Sg with access code xl9v.

  • Model List
    • bert-base-uncased
    • bert-large-uncased

Put the pre-trained model cache files in ~/.cache/torch/transformers/, and you can then load a model directly via the transformers API without modifying any package code.
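To check that the cache files ended up in the right place before loading a model, something like the following may help. It is a sketch under assumptions: the helper names list_cache_files and load_bert are hypothetical, the cache path is the one given in the text, and newer releases of transformers may use a different default cache directory, in which case the path needs adjusting. The transformers import is done lazily so that the cache check works even when the package is not installed.

```python
from pathlib import Path

# Cache directory named in the text; newer transformers releases may differ.
CACHE_DIR = Path.home() / ".cache" / "torch" / "transformers"


def list_cache_files(cache_dir=CACHE_DIR):
    """Return the sorted file names in the cache directory, or [] if it is missing."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())


def load_bert(model_name="bert-base-uncased"):
    """Load tokenizer and model through the transformers API (imported lazily)."""
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()  # inference mode, as appropriate for feature extraction
    return tokenizer, model
```

If list_cache_files() returns an empty list, the files are probably not where the library looks for them, and from_pretrained will try to download the model instead.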

bert_feature_extract's People

Contributors

jyotidabass
