Making Translators Privacy-aware on the User’s Side (TMLR 2024)

We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative.

Paper: https://arxiv.org/abs/2312.04068

✨ Summary

Overview of PRISM: PRISM converts the input sentence into a sentence that contains no private information and sends it to the machine translation system. PRISM then converts the returned translation back into a translation of the original sentence.
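The round trip described above can be illustrated with a toy sketch. This is not the code in this repository: the dictionary entries, the `fake_translator` stub, and the `encode`/`decode` helpers are all hypothetical names made up for illustration, and the real method chooses substitutes far more carefully (see the paper).

```python
# Toy user-side bilingual dictionary (hypothetical entries, English -> French).
DICTIONARY = {"dog": "chien", "cat": "chat"}


def fake_translator(sentence):
    """Stand-in for the remote translation system (a lookup table here)."""
    table = {"I love my cat": "J'aime mon chat"}
    return table[sentence]


def encode(sentence, swaps):
    """Replace each sensitive word with its substitute before sending."""
    return " ".join(swaps.get(word, word) for word in sentence.split())


def decode(translation, swaps):
    """Map translated substitutes back to translations of the original words."""
    back = {DICTIONARY[sub]: DICTIONARY[orig] for orig, sub in swaps.items()}
    return " ".join(back.get(word, word) for word in translation.split())


swaps = {"dog": "cat"}  # sensitive word -> substitute, chosen on the user's side
sent = encode("I love my dog", swaps)       # what the translator sees
restored = decode(fake_translator(sent), swaps)  # what the user gets back
print(sent)
print(restored)
```

In this sketch the translation system only ever sees the substituted sentence, and the swap record stays on the user's machine.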

💿 Preparation

Install Poetry and run the following commands:

$ poetry install
$ poetry run bash prepare.sh

Set an OpenAI API key in .env.

🧪 Evaluation

$ poetry run python eval.py --method prismstar --translator chatgpt
$ poetry run python eval.py --method prismr --translator chatgpt
$ poetry run python eval.py --method nodecode --translator chatgpt
$ poetry run python eval.py --method pup --translator chatgpt

Please refer to the help command for further options.

$ poetry run python eval.py -h
usage: eval.py [-h] [--lang LANG] [--basedir BASEDIR] [--rates RATES] [--method {pup,prismr,prismstar,nodecode}] [--translator {chatgpt,t5,t5-gpu}]

optional arguments:
  -h, --help            show this help message and exit
  --lang LANG
  --basedir BASEDIR
  --rates RATES
  --method {pup,prismr,prismstar,nodecode}
  --translator {chatgpt,t5,t5-gpu}
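If you want to run all four methods in one go, a small driver script along the following lines may help. It is only a sketch: the `eval_command` helper is a hypothetical name, and the script just prints the commands shown above rather than running them.

```python
import shlex

# Method names taken from the --method choices in the help output.
METHODS = ["prismstar", "prismr", "nodecode", "pup"]


def eval_command(method, translator="chatgpt"):
    """Build the argument list for one evaluation run."""
    return [
        "poetry", "run", "python", "eval.py",
        "--method", method,
        "--translator", translator,
    ]


commands = [eval_command(method) for method in METHODS]
for cmd in commands:
    # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing `translator="t5"` or `translator="t5-gpu"` to `eval_command` selects one of the other translators listed in the help output.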

Results

PRISM* strikes an excellent balance between privacy and translation quality.

Please refer to the paper for more details.

⛏️ How to Build a Dictionary by Yourself

Run the following command to extract candidate words from the corpus. The script uses load_mctest() to load the corpus; you can replace it with your own. In general, it is recommended to use a corpus that is the same as, or similar to, the one used in the evaluation.

$ poetry run python extract_all_words.py

Then, run the following commands to build a dictionary. The dictionary is built from the WMT14 dataset (i.e., a public news corpus).

$ poetry run python build_dict.py 1 -1 --target French
$ poetry run python merge_cand_words.py cand_words_French_1000

Building the entire dictionary may take a long time. You can build each part separately (on separate machines) and merge them.

$ poetry run python build_dict.py 1 100 --target French
$ poetry run python build_dict.py 100 200 --target French
$ poetry run python build_dict.py 200 300 --target French
...
$ poetry run python merge_cand_words.py cand_words_French_1000
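The per-part commands can also be generated rather than typed by hand. The sketch below mirrors the 1-100, 100-200, 200-300 pattern shown above; the `chunk_ranges` helper is a hypothetical name, and the upper bound of 300 is only an example value, not the real number of parts.

```python
def chunk_ranges(first, last, step):
    """Return (start, end) pairs that follow the pattern used in the commands above."""
    boundaries = [first]
    boundaries += [b for b in range(step, last, step) if b > first]
    boundaries.append(last)
    return list(zip(boundaries, boundaries[1:]))


# Example upper bound (assumption); adjust to the size of your candidate list.
ranges = chunk_ranges(1, 300, 100)
for start, end in ranges:
    print(f"poetry run python build_dict.py {start} {end} --target French")
```

Each printed line can be run on a different machine before merging the results.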

🖋️ Citation

@article{sato2024making,
  author    = {Ryoma Sato},
  title     = {Making Translators Privacy-aware on the User’s Side},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
}
