
Hi there! 👋

Links

⚡  Web   |   ✍  Blog   |   🐦  Twitter   |   🎞  YouTube   |   ☕  Coffee

Activity

🔭  Currently working on gathering texts on the Web and detecting word trends

Programming experience

🖩  First programs written on a TI-83 Plus in TI-BASIC

Adrien Barbaresi's Projects

archiveis

A simple Python wrapper for the archive.is capturing service

btw21

Visualization of the most frequent words in the 2021 German federal election

coronakorpus

Material for building a German-language web corpus dedicated to COVID-19

courlan

Clean, filter, and sample URLs to optimize data collection; includes spam, content-type, and language filters

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

dwdsmor

SFST/SMOR/DWDS-based German Morphology

flux-toolchain

Filtering and Language-identification for URL Crawling Seeds (FLUCS), a.k.a. FLUX-Toolchain

geokelone

Integrates spatial and textual data processing tools into a modular software package featuring preprocessing, geocoding, disambiguation, and visualization

german-nlp

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

htmldate

Fast and robust date extraction from web pages, with Python or on the command line

jlcl-style

Experiments to modernize the LaTeX class of the JLCL

jparser

A readability parser that extracts titles, content, and images from HTML pages

justext

Heuristic-based boilerplate removal tool

laclos

LAnguage-CLassified OpenSubtitles

microblog-explorer

Perform crawls of social networks (identi.ca, reddit, friendfeed) to gather internal and external links and identify their language

py3langid

Faster, modernized fork of the language identification tool langid.py

python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

toponyms

Old prototype for toponym extraction in historical texts written in German

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
