tsurara

  1. pip install -r requirements.txt
  2. python -m unidic download (downloads a 500 MB dictionary file)
  3. python -m tsurara review -i subtitle_file.srt -o output_file.csv

This will:

  • parse the subtitle file
  • tokenize the Japanese text into words
  • filter out words that you probably don't care about (particles, proper nouns, suffixes, etc.)
  • sort the words in descending order of frequency of appearance in an anime dataset
  • drop you into the main loop that iterates over words
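
The filter-and-sort steps above can be sketched roughly as follows. This is only an illustration, not tsurara's actual code: the `Token` type, the part-of-speech labels in `IGNORED_POS`, the `candidate_words` function, and the frequency numbers are all hypothetical, and the tokens are hard-coded rather than produced by a real tokenizer.

```python
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str
    pos: str  # coarse part-of-speech label (hypothetical tag set)


# Hypothetical labels to drop; the real filter may use different categories.
IGNORED_POS = {"particle", "proper_noun", "suffix"}


def candidate_words(tokens, frequency, known):
    """Return unique lemmas worth reviewing, most frequent first."""
    seen = set()
    words = []
    for tok in tokens:
        if tok.pos in IGNORED_POS or tok.lemma in known or tok.lemma in seen:
            continue
        seen.add(tok.lemma)
        words.append(tok.lemma)
    # Words missing from the frequency table count as 0 and sort last.
    return sorted(words, key=lambda w: frequency.get(w, 0), reverse=True)


# Made-up tokens and frequency counts, for illustration only.
tokens = [
    Token("猫", "noun"),
    Token("は", "particle"),
    Token("食べる", "verb"),
    Token("田中", "proper_noun"),
    Token("猫", "noun"),
]
frequency = {"食べる": 900, "猫": 300}

print(candidate_words(tokens, frequency, known=set()))
```

Passing a non-empty `known` set removes those lemmas before sorting, so the same function also models how previously reviewed words drop out of later files.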

While in the loop, tsurara incrementally saves data to your specified output CSV and to a file called .tsurara.json in your home directory (which you don't need to touch). If you exit at any point, your progress up to that point is saved.
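
One way to get this kind of save-as-you-go behaviour is to rewrite the state file and append to the CSV after every word. The sketch below is an assumption about how that could look, not a description of tsurara's internals: `save_state`, `append_row`, the state layout, and the file names are hypothetical, and the demo writes to a temporary directory instead of your home directory.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_state(state, path):
    """Write the state to a temp file, then swap it in, so an
    interrupted write cannot leave a half-written JSON file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)


def append_row(row, path):
    """Append one row to the output CSV without rewriting earlier rows."""
    with path.open("a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)


# Demo in a throwaway directory; the state layout here is made up.
workdir = Path(tempfile.mkdtemp())
state_file = workdir / "state.json"
output_file = workdir / "output.csv"

state = {"known": [], "ignored": []}
for word, reading, meaning in [("猫", "ねこ", "cat"), ("犬", "いぬ", "dog")]:
    append_row([word, reading, meaning], output_file)
    state["known"].append(word)
    save_state(state, state_file)  # persisted after every word
```

Because both files are updated inside the loop, stopping between iterations loses at most the word currently being reviewed.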

In the main loop, you have the following options:

  • known: mark a word as known forever. You will never see it again.
  • reveal meaning: show the meaning of the word and redisplay the menu. The meaning is hidden by default because showing it up front would make it hard to tell whether you actually knew the word.
  • add: add the word to your output file (a few submenus let you choose the reading and definition). This also marks the word as known.
  • skip: do nothing and go to the next word.
  • ignore: mark a word as ignored forever. This does more or less the same thing as known, but is kept separate because I'm going to add an option to clear ignored words later.
  • quit: exit immediately. Your progress, excluding the current word, is saved.

The first letter of each option corresponds to its keyboard shortcut.
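
As a rough illustration of the first-letter rule, a shortcut table can be derived directly from the option names. The names `OPTIONS`, `SHORTCUTS`, and `resolve` are hypothetical, and treating input as case-insensitive is an assumption rather than documented behaviour.

```python
# Option names as listed in the menu above.
OPTIONS = ["known", "reveal meaning", "add", "skip", "ignore", "quit"]

# Map each option's first letter to the option name.
SHORTCUTS = {name[0]: name for name in OPTIONS}


def resolve(key):
    """Return the option for a single key press, or None if unrecognised.
    Case-insensitive matching is an assumption made for this sketch."""
    return SHORTCUTS.get(key.strip().lower())


for key in ["k", "R", "x"]:
    print(key, "->", resolve(key))
```

The scheme only works because every option starts with a different letter; a new option that reused a first letter would need an explicit shortcut.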

The advantage of this tool is that your known words list grows as you process more files. Since words are sorted by descending frequency, loading a new file shows you the most common words you don't know first. So even if you don't go through all the words in a file, your time goes to the highest-frequency words you haven't learned yet.

Contributors

sidmani