FinnVocabCollect

This code was used in the collection of TallVocabL2Fi.

Robertson, F., Chang, L., & Söyrinki, S. (2022).
TallVocabL2Fi: An Extensive Mapping of 15 Finnish L2 Learners' Vocabulary.
In Language Resources and Evaluation Conference (LREC 2022).

This code is not actively developed and is provided as-is to give complete information about TallVocabL2Fi. You are welcome to use the code in your own research, but it comes unsupported. If you want help developing something similar, I may be able to help. Please contact me and we can discuss collaboration.

License

Apache v2

Basic info

This project has two parts: the word list pipeline, which is a Snakemake project, and the self-assessment website, which is a Quart (similar to Flask) project.

Instructions for the website, which has its own set of dependencies, pyproject.toml and Dockerfile, are in selfassess.

The Snakemake project is backed by the Python module in finnwordlist. To build the Snakemake project run:

$ poetry install
$ snakemake -j1 -C WIKIPARSE_DB=/path/to/wikiparse/defns.db

Here, WIKIPARSE_DB is the path to a parsed SQLite database produced by wikiparse.
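
Before starting a long Snakemake run, it can help to confirm that the path passed as WIKIPARSE_DB really points to a readable SQLite file. The following is a minimal sketch using only the Python standard library. The helper name `list_tables` is hypothetical and not part of this repository, and no assumption is made about which tables wikiparse creates; the function simply reports whatever it finds.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, or raise if the file is missing.

    Hypothetical helper for sanity-checking WIKIPARSE_DB; not part of finnwordlist.
    """
    path = Path(db_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No database at {path}")
    # Open read-only via a file: URI so the check cannot modify the database.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("/path/to/wikiparse/defns.db")` returns a list of table names. An empty list suggests the file is not the parsed database you intended to use.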

Other things here:

  • proc_interest.py: Process the CSV of expressions of interest.
  • plot_en_wordlist_freqs.py: Plot a frequency histogram of the words in SVL12K and all English words known to wordfreq for comparison with the plots for the Finnish wordlist. You can run this with the Snakemake rule plot_en_wordlist_freqs.
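
As a rough illustration of the kind of comparison plot_en_wordlist_freqs.py produces, the sketch below bins words by frequency so that two word lists can be compared bin by bin. It is not the script's actual implementation. The function name `zipf_histogram`, the use of the Zipf scale, and the 0.5 bin width are all assumptions made for illustration. The frequency lookup is passed in as a callable so that the example runs without wordfreq installed.

```python
import math
from collections import Counter


def zipf_histogram(words, zipf_of, bin_width=0.5):
    """Count words per frequency bin and return {bin_lower_edge: count}.

    `zipf_of` is any callable mapping a word to a Zipf-scale frequency.
    The binning scheme is an assumption, not taken from the repository.
    """
    counts = Counter()
    for word in words:
        # Snap each frequency down to the lower edge of its bin.
        lower = math.floor(zipf_of(word) / bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Toy frequencies for demonstration only; these are not real wordfreq values.
toy_freqs = {"the": 7.4, "cat": 4.9, "dog": 5.1, "zyzzyva": 1.2}
histogram = zipf_histogram(toy_freqs, toy_freqs.get)
```

With wordfreq installed, the lookup could be `lambda w: wordfreq.zipf_frequency(w, "en")`. The resulting dictionary can then be passed to any plotting library as bar positions and heights.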


finnvocabcollect's Issues

Output the final dataset

Should dump out:

  • L1
  • Other L2s
  • Self-rating of CEFR
  • CEFR according to proof
  • Type of proof? (Is this de-anonymizing?)
  • Mobile/tablet/desktop
  • Mini exam: queries/responses/grading
  • Responses to words
  • Session length/how words are assigned to sessions

SQLite => DuckDB => TSVs
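
A minimal sketch of the export idea in this issue is shown below. It goes directly from SQLite to TSV files with the Python standard library and skips the DuckDB step, so it is a simplification rather than the pipeline the issue proposes. The function name `dump_tables_to_tsv` is hypothetical, and nothing is assumed about the website's schema: every user table found in the database is written out as is.

```python
import csv
import sqlite3
from pathlib import Path


def dump_tables_to_tsv(db_path, out_dir):
    """Write every user table in a SQLite database to <out_dir>/<table>.tsv.

    Hypothetical helper; returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    written = []
    try:
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for table in names:
            # Skip SQLite's own bookkeeping tables such as sqlite_sequence.
            if table.startswith("sqlite_"):
                continue
            # Quote the identifier, since table names come from the database.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            header = [column[0] for column in cursor.description]
            target = out / f"{table}.tsv"
            with target.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
                writer.writerow(header)
                writer.writerows(cursor)
            written.append(target)
    finally:
        conn.close()
    return written
```

Any anonymization, such as dropping the type-of-proof column if it turns out to be de-anonymizing, would need to happen before or during this step and is not covered here.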
