🙂 Vincent D. Warmerdam
┣━━ 📦 Open Source Packages
┃   ┣━━ bulk              - simple bulk labelling interface
┃   ┣━━ embetter          - embeddings ready for sklearn
┃   ┣━━ doubtlab          - suite of tools to help find bad labels
┃   ┣━━ drawdata          - draw datasets in jupyter
┃   ┣━━ scikit-lego       - lego bricks for sklearn
┃   ┣━━ scikit-partial    - partial_fit() pipelines for sklearn
┃   ┣━━ scikit-bloom      - bloom transformers for sklearn
┃   ┣━━ human-learn       - rule-based components for sklearn
┃   ┣━━ sentence-models   - a different take on textcat
┃   ┣━━ mktestdocs        - turn markdown files into pytest tests
┃   ┣━━ lazylines         - lightweight utils for .jsonl wrangling
┃   ┣━━ cluestar          - inspiration for your first text labels
┃   ┣━━ durations         - pytest duration insights
┃   ┣━━ tuilwindcss       - tailwindcss for textual tui apps
┃   ┣━━ memo              - saves a whole log of time
┃   ┣━━ skedulord         - makes cron a bit more fun
┃   ┣━━ icepickle         - cool and safe storage for linear models
┃   ┗━━ evol              - grammar for genetic heuristics
┣━━ 👍 Project Contributions
┃   ┣━━ fairlearn         - contributed the CorrelationFilter
┃   ┣━━ polars            - contributed the .pipe() method
┃   ┗━━ BERTopic          - added lightweight sklearn pipeline support
┣━━ ⭐ Online Projects
┃   ┣━━ calmcode.io       - intermediate developer education
┃   ┣━━ koaning.io        - personal blog
┃   ┗━━ dearme.email      - reflection via a 30 day delay
┣━━ 🎙️ Popular Talks
┃   ┣━━ Natural Intelligence is All You Need
┃   ┣━━ Group-by statements that save the day
┃   ┣━━ Tools to Improve Training Data
┃   ┣━━ Optimal on Paper, Broken in Reality
┃   ┣━━ Playing by the Rules-Based-Systems
┃   ┣━━ How to Constrain Artificial Stupidity
┃   ┣━━ The Profession of Solving the Wrong Problem
┃   ┣━━ Winning with Simple, even Linear, Models
┃   ┗━━ Untitled12.ipynb
┣━━ 🔬 Random Experiments
┃   ┣━━ scikit-prune   - prune scikit learn pipelines
┃   ┣━━ gitlit         - tracking github action times across open source
┃   ┣━━ sentimany      - many sentiment models, one repo
┃   ┣━━ tokenwiser     - sklearn token tricks
┃   ┣━━ clumper        - functional API for lists of dicts
┃   ┗━━ whatlies       - exploration tools for word embeddings
┗━━ 👨‍💻 Employer
    ┣━━ 🎲 :probabl.   - scikit-learn and friends
    ┃   ┣━━ scikit-churn      - safety rails for churn work
    ┃   ┗━━ scikit-playtime   - rethinking pipelines
    ┣━━ 💥 Explosion   - developer tools for nlp
    ┃   ┣━━ prodigy-hf        - Prodigy integration for the HuggingFace stack
    ┃   ┣━━ prodigy-pdf       - Annotate PDFs via Prodigy
    ┃   ┣━━ prodigy-ann       - ANN techniques to find relevant subsets
    ┃   ┣━━ prodigy-segment   - Prodigy integration for Segment Anything
    ┃   ┣━━ prodigy-lunr      - Search techniques to find relevant subsets
    ┃   ┣━━ prodigy-whisper   - Transcribe audio with OpenAI's whisper models
    ┃   ┣━━ prodigy-tui       - Prodigy from the terminal
    ┃   ┗━━ cluestar          - inspiration for your first text labels
    ┗━━ 🤖 Rasa        - conversational software provider
        ┣━━ nlu examples      - custom nlu components for Rasa
        ┣━━ taipo             - data augmentation tools
        ┗━━ algo whiteboard   - nlp education

Follow me on Twitter: @fishnets88

Vincent D. Warmerdam's Projects

akin

Some text similarity utilities

apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

augmenty

Augmenty is an augmentation library based on spaCy for augmenting texts.

baseliner

baseliner offers simple models that can act as a baseline to compare against

benchy

Fun datasets for some light benchmarks.

bertopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

blog

Public repo for HF blog posts

boondoc

lightweight Python API docs for markdown

breakout-garden

Documentation, software, and examples for the Breakout Garden ecosystem.

brent

bayesian graphical modelling and a bit of do-calculus for discrete data.

buggingface

Let's see what we can learn from poking huggingface models.

bulk

A Simple Bulk Labelling Tool
