Bavarian Oberland

👋 Hi there

I'm currently working on the awesome Flair library and love contributing to various open source projects.

📰 Latest news

Latest news on new language models, PRs and much more!

  • 28.03.2024: New project: NER models on the recently released CO-Fun NER dataset. Repo is here with a lot of fine-tuned models on the Model Hub.

  • 23.12.2023: New project: NER Datasets for Historical German (HisGermaNER) is out and available on the Model Hub here.

  • 11.10.2023: New launch of hmBench project: it benchmarks Historical Multilingual Language Models such as hmBERT, hmTEAMS and hmByT5, see here.

  • 25.05.2023: New project: Historical Multilingual and Monolingual ELECTRA Models has been released here.

  • 25.05.2023: Several ByT5 Historical Language Models are released under hmByT5 Preliminary and hmByT5 on the Hugging Face Model Hub. More information can be found in this repository.

  • 06.03.2023: Updated Ukrainian ELECTRA repository, see here.

  • 05.02.2023: New repository on experiments for XLM-V 🤗 Transformers Integration, see here.

  • 03.02.2023: New repository for on-going evaluation of German T5 models on the GermEval 2014 NER task is up now! See here.

  • 28.01.2023: Start of new language models trained on the British Library corpus (model size ranges from 110M to 1B!), repository is here.

  • 23.01.2023: New German T5 models are released (trained on the head and middle of the GC4 corpus) and are available here.

  • 09.06.2022: Preprint of our upcoming HIPE-2022 Working Notes paper is now available here: hmBERT: Historical Multilingual Language Models for Named Entity Recognition.

  • 20.02.2022: Check out our new GermanT5 organization - expect new T5 models for German soon!

  • 14.12.2021: New badge: Member of Hugging Face Supporter org now 🎉

  • 13.12.2021: Release of Historical Language Model for Dutch (trained on Delpher corpus) - see repo here.

  • 06.12.2021: Release of smaller multilingual Historical Language Models (ranging from 2-8 layers) - see repo here.

  • 18.11.2021: Release of new multilingual and monolingual Historical Language Models - as preparation for upcoming CLEF-HIPE 2022 - see repo here.

  • 23.09.2021: Release of ConvBERTurk (cased and uncased) and ELECTRA (uncased) trained on the Turkish part of the mC4 corpus - see repo here.

  • 07.09.2021: Release of new larger German GPT-2 model - see model hub card here.

  • 17.08.2021: Release of new re-trained German GPT-2 model - see repo here.

  • 05.07.2021: Preprint of the ICDAR 2021 paper "Data Centric Domain Adaptation for Historical Text with OCR Errors" together with Luisa März, Nina Poerner, Benjamin Roth and Hinrich Schütze is out now!

  • 24.06.2021: Turkish Language Model Zoo repo got a new logo from Merve Noyan, please follow her! Additionally, a new Turkish ELECTRA model, trained on the Turkish part of the multilingual C4 dataset, was released. More details here.

  • 03.05.2021: GC4LM: A Colossal (Biased) language model for German was released. Repo with more details here.

  • 27.04.2021: Our paper "Data Centric Domain Adaptation for Historical Text with OCR Errors" was accepted at ICDAR 2021. More details soon!

  • 16.03.2021: Turkish model zoo is still growing! Public release of ConvBERTurk - see repo here.

  • 07.02.2021: Public release of German Europeana DistilBERT and ConvBERT models. Repo with more information is here.

  • 28.01.2021: Expect a new German Europeana ELECTRA Large model incl. a distilled German Europeana BERT model soon 🤗

  • 16.11.2020: Public release of French Europeana BERT and ELECTRA models - see repository here.

  • 16.11.2020: Public release of a German GPT-2 model (incl. fine-tuned model on Faust I and II). Repo with more information is available here.

  • 11.11.2020: Public release of Ukrainian ELECTRA model. Repo is now available here.

  • 11.11.2020: New workstation build (RTX 3090 and Ryzen 9 5900X) has been completed! Expect a lot of new Flair/Transformers models in the near future!

  • 02.11.2020: Public release of Italian XXL ELECTRA model. New repo for Italian BERT and ELECTRA models is now available here 🎉

  • 22.10.2020: Preprint of "German's Next Language Model" is now available here. Models are also available on the Hugging Face model hub 🎉

  • 22.10.2020: Our shared task paper Triple E - Effective Ensembling of Embeddings and Language Models for NER of Historical German together with Luisa März is released 🎉

  • 30.09.2020: "German's Next Language Model" together with Branden Chan and Timo Möller was accepted at COLING 2020! Expect new language models for German on the Hugging Face model hub soon 🤗

  • 23.09.2020: Flair in version 0.6.1 is out now!

  • 02.09.2020: Slow response time - I'm currently focussing on EACL 2021. Expect great new things 😎

  • 18.08.2020: French BERT model, trained on Historical newspapers from Europeana: find the model here and the corresponding repository here.

📃 Publications

📃 Preprints

💬 Contact

Please open an issue in the corresponding repository, tag me (@stefan-it) in issues/PRs/commits on GitHub or connect with me on LinkedIn :)

Stefan Schweter's Projects

adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning

albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

asp

PyTorch implementation and pre-trained models for ASP - Autoregressive Structured Prediction with Language Models, EMNLP 22. https://arxiv.org/pdf/2210.14698.pdf

awesome-huggingface

🤗 A list of wonderful open-source projects & applications integrated with Hugging Face libraries.

bertram

This repository contains the code for "BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Representations".

blbooks-lms

Pretrained Language Models on British Library Corpus

bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

datasets

🤗 Fast, efficient, open-access datasets and evaluation metrics in PyTorch, TensorFlow, NumPy and Pandas

delpher-lm

Language Model for Historic Dutch (Delpher Corpus)

demetsiiify

Web service for creating and hosting IIIF manifests from METS/MODS documents

electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

farm

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry.

gc4lm

GC4LM: A Colossal (Biased) language model for German
