
methodius's Introduction

Howdy 👋

My name is Frank, but the internet knows me as Paceaux.

I'm a front-end developer, CMS consultant, software architect, JavaScript engineer, language learner (🏴󠁧󠁢󠁥󠁮󠁧󠁿🇪🇸🇫🇷🇵🇹🇮🇪🏴󠁧󠁢󠁷󠁬󠁳󠁿🇮🇱🇸🇾🇩🇪🇹🇷), aspiring linguist, semi-professional woodworker and semi-amateur writer.

You might know me for:

Silly Stats

Paceaux's GitHub stats

💻 Currently Coding &

Making that 💰 at

Red Hat, as a senior front-end Drupal engineer.

🔭 working on

🌱 learning about

  • Linguistics, particularly phonotactics
  • TensorFlow and machine learning

Maybe it can help me find some common sound patterns in Germanic (🏴󠁧󠁢󠁥󠁮󠁧󠁿🇩🇪), Romance (🇪🇸🇫🇷🇵🇹🇦🇩) and Celtic (🇮🇪🏴󠁧󠁢󠁷󠁬󠁳󠁿) languages…

👯 seeking collaborators for

  • helping Methodius get better at identifying n-grams in non-Western languages, like Arabic.
  • getting Isidore to do a better job at spotting verbs in English.

💬 Answered asks are:

Where are you on the interwebz?

All over. It's probably best if you just check out LinkTree.

What are your pronouns?

/\b((h(e|i(s|m(self|\b))))|((e|é)l(e|\b))|il(s|\b)|s(on|u|eu)|l(o|e|ui(-même|\b))|o)\b/gi
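For the curious, the pattern can be exercised directly in JavaScript. This is a small sketch: the regex is copied from the answer above, and `findPronouns` is a hypothetical helper name. It matches forms like "he", "himself", "il", and "lui-même", but not "she" or "they".

```javascript
// The pronoun pattern as a regex literal (g = find all matches, i = case-insensitive).
const PRONOUNS = /\b((h(e|i(s|m(self|\b))))|((e|é)l(e|\b))|il(s|\b)|s(on|u|eu)|l(o|e|ui(-même|\b))|o)\b/gi;

// Hypothetical helper: return every substring of `text` that the pattern matches.
function findPronouns(text) {
  // String.prototype.match with a global regex returns an array of matches, or null.
  return text.match(PRONOUNS) ?? [];
}

console.log(findPronouns("He said it himself, and she agreed with him."));
console.log(findPronouns("Il a dit lui-même."));
```

One caveat: without the `u` flag, JavaScript's `\b` treats accented letters such as `é` as non-word characters, so a standalone "él" is not matched.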

📫 How can I send you a very important message?

I occasionally blog. You can definitely get my attention there.

methodius's People

Contributors

paceaux


methodius's Issues

Numbers are treated as ngrams / words

When analyzing the text of The Great Gatsby, Methodius captured some strings it interpreted as n-grams that were just numbers.

We need a test to eliminate numbers.
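One possible test, sketched in JavaScript: drop any n-gram that contains a decimal digit. The helper names `containsDigit` and `withoutNumbers`, and the `{ ngram: count }` shape of the frequency object, are assumptions for illustration, not Methodius's actual API.

```javascript
// Hypothetical helper: true when an n-gram contains any decimal digit (Unicode Nd),
// so "19", "2nd", or "1922" would be excluded.
function containsDigit(ngram) {
  return /\p{Nd}/u.test(ngram);
}

// Hypothetical filter over an assumed { ngram: count } frequency object.
function withoutNumbers(frequencies) {
  return Object.fromEntries(
    Object.entries(frequencies).filter(([ngram]) => !containsDigit(ngram))
  );
}

console.log(withoutNumbers({ the: 12, "19": 3, "2nd": 1, ing: 8 }));
```

Dropping anything with a digit (rather than only all-digit strings matched by `/^\d+$/`) also removes mixed tokens such as "2nd"; which behavior is wanted is a judgment call.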

diacritic opt-in for frequencies

Right now, n-grams are generated by stripping diacritics, so all frequencies and results lose their accent marks.

We should at least offer the ability to opt in to diacritics.
Seeing confusing results on the demo:

In Portuguese, ão is listed, but we can't highlight it (a problem with the demo, but still).

In French, we have é, è, and ê to account for, and it may be useful to see these separately.
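A minimal sketch of an opt-in, assuming diacritics are stripped through Unicode normalization (the issue doesn't say how Methodius does it today). `prepareText`, `charNgrams`, and the `keepDiacritics` option are hypothetical names.

```javascript
// Hypothetical option `keepDiacritics` (an assumption, not an existing Methodius option).
function prepareText(text, { keepDiacritics = false } = {}) {
  const lower = text.toLowerCase();
  if (keepDiacritics) {
    // NFC keeps precomposed letters such as "ã" and "é" as single code points.
    return lower.normalize("NFC");
  }
  // NFD splits "ã" into "a" + U+0303; then drop combining marks in U+0300..U+036F.
  return lower.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Overlapping character n-grams, counted by code point rather than UTF-16 unit.
function charNgrams(word, n) {
  const chars = Array.from(word);
  const grams = [];
  for (let i = 0; i + n <= chars.length; i++) {
    grams.push(chars.slice(i, i + n).join(""));
  }
  return grams;
}

console.log(charNgrams(prepareText("Nação"), 3)); // default: diacritics stripped
console.log(charNgrams(prepareText("Nação", { keepDiacritics: true }), 3)); // opt-in
```

With the option on, Portuguese "ão" and French "é", "è", "ê" survive as distinct n-grams instead of collapsing into "ao" and "e".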

n-gram combinations

Need the ability to discover n-grams that commonly occur with other n-grams.

E.g., how would I discover "tion"?

The trigrams "tio" and "ion" appear together, in that order, in each of these words:
nation => nat ati tio ion
vacation => vac aca cat ati tio ion
station => sta tat ati tio ion
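One way to approach this, sketched in JavaScript under the assumption that we only look at overlapping n-grams inside a single word: count how often each trigram is directly followed by the next one, and treat each pair as a candidate longer n-gram. This is similar in spirit to the merge step of byte-pair encoding; `ngrams` and `commonCombinations` are hypothetical names.

```javascript
// Overlapping character n-grams of a word ("nation", 3 -> nat ati tio ion).
function ngrams(word, n) {
  const out = [];
  for (let i = 0; i + n <= word.length; i++) out.push(word.slice(i, i + n));
  return out;
}

// Count each pair of adjacent n-grams as one merged (n + 1)-gram,
// e.g. ("tio", "ion") -> "tion", and rank the merges by frequency.
function commonCombinations(words, n = 3) {
  const counts = new Map();
  for (const word of words) {
    const grams = ngrams(word.toLowerCase(), n);
    for (let i = 0; i + 1 < grams.length; i++) {
      // Adjacent grams share n - 1 characters, so the merge appends only the last one.
      const merged = grams[i] + grams[i + 1].slice(-1);
      counts.set(merged, (counts.get(merged) ?? 0) + 1);
    }
  }
  // Most frequent first; ties keep first-seen order because sort is stable.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(commonCombinations(["nation", "vacation", "station"]).slice(0, 2));
```

On these three words, "tion" ties with "atio" at 3 occurrences each, so the output is a ranked list of candidates rather than a single answer; a real version would likely need a frequency threshold and a larger corpus.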

Remove Rollup. Consider ESBuild?

According to a few different folks, Rollup may be overkill. Remove it.

Possibly consider ESBuild? Leave Babel in place, though. Maybe.

Certain punctuation marks being considered bigrams

After analyzing Alice in Wonderland and Huckleberry Finn, some unusual bigrams popped up:

{
    "‘e": 1,
    "‘i": 10,
    "“‘": 3,
    "‘—": 1,
    "“_": 18,
    "_e": 3,
    "—c": 3,
    "‘l": 1,
    "‘s": 1,
    "_”": 11,
    "w-": 1,
    "n—": 12,
    "—e": 10,
    "n—": 57,
    "“_": 54,
    "_m": 53,
    "e-": 115,
    "-y": 11,
    "_”": 32,
    "‘_": 1,
    "_-": 2,
    "‘t": 4,
    "m”": 2,
    "w”": 1,
    "‘h": 1,
    "l”": 1,
    "tb": 1,
    "lh": 1,
    "‘e": 1
}
  1. Need to analyze the texts and see what the context is for these cases.
  2. Determine whether we can just do some sort of normalization (which is probably the case for the quotes).
  3. Otherwise, exclude characters like - and _.
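Steps 2 and 3 could look something like this JavaScript sketch: map curly quotes to ASCII and dashes to spaces, then keep only n-grams made entirely of letters. The normalization table and the helper names are assumptions, not something the issue settles.

```javascript
// Assumed normalization table: curly quotes become ASCII quotes, dashes become spaces.
const NORMALIZE = new Map([
  ["\u2018", "'"], ["\u2019", "'"], // left/right single quotation marks
  ["\u201C", '"'], ["\u201D", '"'], // left/right double quotation marks
  ["\u2014", " "], ["\u2013", " "], // em dash and en dash act as word breaks
]);

// Replace each character found in the table; leave everything else untouched.
function normalizePunctuation(text) {
  return Array.from(text, (ch) => NORMALIZE.get(ch) ?? ch).join("");
}

// Keep only n-grams made entirely of letters (any script), so "_e", "e-",
// or a letter followed by a dash are rejected.
function isLetterNgram(ngram) {
  return /^\p{L}+$/u.test(ngram);
}

const sample = ["\u2018e", "\u201C_", "e-", "n\u2014", "tb", "lh"];
console.log(sample.filter(isLetterNgram));
console.log(normalizePunctuation("\u201COff with her head!\u201D she said\u2014loudly."));
```

`\p{L}` matches letters in any script, so unlike an `[a-z]` check this filter keeps n-grams from non-Latin alphabets. Combining marks (`\p{M}`) would still be rejected, which matters if diacritics are kept.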

Word bigrams

Create properties/methods for getting word bigrams. Maybe?
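A minimal sketch of word bigrams in JavaScript, assuming words are runs of letters (with internal apostrophes) and that counts are returned as a `Map`; `wordBigrams` is a hypothetical name.

```javascript
// Hypothetical helper: split text into lowercase words, then pair each word with the next.
function wordBigrams(text) {
  // A "word" is a run of letters, optionally joined by apostrophes (don't, l'homme).
  const words = text.toLowerCase().match(/\p{L}+(?:['’]\p{L}+)*/gu) ?? [];
  const counts = new Map();
  for (let i = 0; i + 1 < words.length; i++) {
    const bigram = `${words[i]} ${words[i + 1]}`;
    counts.set(bigram, (counts.get(bigram) ?? 0) + 1);
  }
  return counts;
}

console.log(wordBigrams("The cat sat on the mat. The cat slept."));
```

This sketch ignores sentence boundaries, so "mat the" is counted; splitting on sentence punctuation first would avoid that.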

put a getNGrams on the member

Right now the class doesn't have many methods, and one thing it lacks is the ability to get n-grams of an arbitrary size (e.g., quadrigrams). Create a member method for this.
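A sketch of what such a member method might look like. `NgramText` is a hypothetical stand-in for the project's class, not its real name or constructor; the point is that `getNGrams` takes the n-gram size as a parameter, so 4 gives quadrigrams.

```javascript
// Hypothetical class standing in for the project's text class.
class NgramText {
  constructor(text) {
    // Lowercase words made of letters only (digits and punctuation dropped).
    this.words = text.toLowerCase().match(/\p{L}+/gu) ?? [];
  }

  // Count every character n-gram of the requested size (3 = trigram, 4 = quadrigram, ...).
  getNGrams(size) {
    if (!Number.isInteger(size) || size < 1) {
      throw new RangeError("size must be a positive integer");
    }
    // Null-prototype object so an n-gram like "valueOf" can't collide with inherited keys.
    const counts = Object.create(null);
    for (const word of this.words) {
      const chars = Array.from(word);
      for (let i = 0; i + size <= chars.length; i++) {
        const gram = chars.slice(i, i + size).join("");
        counts[gram] = (counts[gram] ?? 0) + 1;
      }
    }
    return counts;
  }
}

const text = new NgramText("Nation, station, vacation");
console.log(text.getNGrams(4));
```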
