

Cross-lingual Phrase Retriever

This repository contains the code and pre-trained models for our paper XPR: Cross-lingual Phrase Retriever.


Overview

We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.
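To make the retrieval setting concrete, here is a minimal sketch of phrase retrieval as nearest-neighbour search. It assumes that each phrase is represented by averaging the vectors of its occurrences in example sentences and that candidate phrases are ranked by cosine similarity. The helper names and toy vectors are hypothetical; this is not XPR's encoder or scoring code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average several occurrence vectors of one phrase into a single representation."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, candidates: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the k candidate phrases whose representations are closest to the query."""
    ranked = sorted(candidates, key=lambda p: cosine(query, candidates[p]), reverse=True)
    return ranked[:k]


# Toy example with hypothetical 3-d vectors; in practice the vectors would come
# from the encoder applied to each example sentence containing the phrase.
query = mean_vector([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
candidates = {
    "phrase_a": mean_vector([[0.85, 0.15, 0.05]]),
    "phrase_b": mean_vector([[0.0, 0.1, 0.9]]),
}
print(retrieve(query, candidates, k=1))
```

Averaging over several example sentences is one simple way to turn context-dependent token vectors into a single phrase vector; other pooling choices would fit the same retrieval loop.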

Dataset

We also construct a large-scale cross-lingual phrase retrieval dataset containing 65K bilingual phrase pairs and 4.2M example sentences across 8 English-centric language pairs.

Getting Started

In the following sections, we describe how to use XPR.

Requirements

  • First, install PyTorch by following the instructions on the official website. To faithfully reproduce our results, please use torch==1.8.1+cu111, or the torch 1.8.1 build that matches your platform and CUDA version. PyTorch versions newer than 1.8.1 should also work.
  • Then, run the following commands to clone the repository, install the remaining dependencies, and create the working folders.
git clone git@github.com:cwszz/XPR.git
cd XPR
pip install -r requirements.txt
mkdir data
mkdir model
mkdir result
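Before running the scripts, it may help to confirm which PyTorch build is installed. The sketch below is a hypothetical helper (not part of the repository) that compares the installed version with 1.8.1, ignoring local build tags such as +cu111, and reports whether CUDA is visible.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn a version string such as '1.8.1+cu111' into (1, 8, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(version: str, minimum: str = "1.8.1") -> bool:
    """True if `version` is at least `minimum`; tuple comparison handles 1.10 > 1.8."""
    return parse_version(version) >= parse_version(minimum)


def report() -> None:
    """Print the installed torch version and CUDA status, if PyTorch is available."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; install it before the other requirements.")
        return
    print("torch", torch.__version__, "| CUDA build:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
    if not meets_minimum(str(torch.__version__)):
        print("Warning: PyTorch is older than 1.8.1; results may differ.")


if __name__ == "__main__":
    report()
```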

Dataset

Before using XPR, please prepare the dataset as follows.

  • Download our dataset here: link

  • Unzip the dataset and move it into the data folder. Make sure the dataset path in the bash scripts points to this location.

Checkpoint

Before using XPR, please prepare the checkpoint as follows.

  • Download our checkpoint here: link

  • Move the checkpoint files from the downloaded repository into the model folder.
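A small, hypothetical helper (not part of the repository) can confirm the folder layout before training or evaluation. It assumes the layout created by the setup commands, with the dataset unzipped under data/ and the checkpoint stored as model/pytorch_model.bin, which is the file name used in the evaluation example; adjust the paths if your bash scripts point elsewhere.

```python
from pathlib import Path
from typing import List


def missing_paths(repo_root: str = ".") -> List[str]:
    """List expected folders/files that are absent under repo_root (assumed layout)."""
    root = Path(repo_root)
    problems: List[str] = []

    # The dataset is expected to be unzipped somewhere inside data/.
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        problems.append("data/ is missing or empty (unzip the dataset here)")

    # Checkpoint file name taken from the predict.py example (--load_model_path).
    checkpoint = root / "model" / "pytorch_model.bin"
    if not checkpoint.is_file():
        problems.append("model/pytorch_model.bin not found (copy the checkpoint files here)")

    if not (root / "result").is_dir():
        problems.append("result/ is missing (create it with: mkdir result)")
    return problems


if __name__ == "__main__":
    issues = missing_paths()
    print("\n".join(issues) if issues else "data/, model/ and result/ look ready")
```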

Train XPR

bash train.sh

Evaluation

Test our method:

  • Download the XPR checkpoint from Hugging Face: [link]
  • Make sure the model path and dataset path in test.sh are correct
  • The output log is written to the log folder

Here is an example of how to evaluate XPR:

bash test.sh

or

export CUDA_VISIBLE_DEVICES='0'
mkdir -p log
python3 predict.py \
--lg $lg \
--test_lg $test_lg \
--dataset_path ./data/ \
--load_model_path ./model/pytorch_model.bin \
--queue_length 0 \
--unsupervised 0 \
--wo_projection 0 \
--layer_id 12 \
> log/test-${lg}-${test_lg}-32.log 2>&1
  • $lg: The language on which the model was trained
  • $test_lg: The language on which the model is tested
  • --dataset_path: Path to the dataset folder
  • --load_model_path: Path to the checkpoint (pytorch_model.bin in the model folder in this example)
  • --queue_length: Length of the memory queue
  • --unsupervised: Whether to run in unsupervised mode
  • --wo_projection: Whether to run without the SimCLR projection head
  • --layer_id: Index of the layer whose outputs are used to represent phrases
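The --queue_length and --wo_projection options point to a contrastive training setup. As an illustration only, the sketch below assumes that --queue_length refers to a MoCo-style first-in-first-out queue of past phrase embeddings that are reused as extra negatives in an InfoNCE loss. Neither the MemoryQueue class nor the info_nce function is taken from the XPR code, and the temperature value is arbitrary.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class MemoryQueue:
    """Assumed MoCo-style FIFO of past embeddings; the oldest entries are dropped first.

    With queue_length=0 the deque keeps nothing, so no extra negatives are used.
    """

    def __init__(self, queue_length: int) -> None:
        self.items: Deque[Vector] = deque(maxlen=queue_length)

    def enqueue(self, embeddings: Sequence[Vector]) -> None:
        for e in embeddings:
            self.items.append(list(e))

    def negatives(self) -> List[Vector]:
        return list(self.items)


def info_nce(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
             temperature: float = 0.05) -> float:
    """-log(exp(a.p/t) / (exp(a.p/t) + sum_n exp(a.n/t))), via a stable log-sum-exp."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


queue = MemoryQueue(queue_length=2)
queue.enqueue([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0]])  # only the last two are kept
# One stored negative is as similar to the anchor as the positive, so the loss is close to ln 2.
print(info_nce([1.0, 0.0], [1.0, 0.0], queue.negatives()))
```

A longer queue supplies more negatives per step without enlarging the batch, which is the usual motivation for this design in MoCo-style training.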

References

Please cite our paper if you find the resources in this repository useful.


Contributors

cwszz
