
gender_bias_finbert's Introduction

“Pekka”s are 260 times more likely to be an engineer than “Emilia”s! And “Tiia”s are 410 times more likely to love shopping than “Matti”s!

Possible Gender Bias in Finnish BERT?
====================

The purpose of this mini-experiment is to:

  • quickly showcase how easy it is nowadays to load a pretrained language model and start using it for tasks such as feature extraction, question answering, and mask filling in just a couple of lines of code
  • thank all the researchers and practitioners who made the above possible
  • raise awareness of possible gender bias, stereotypical bias, and other types of bias in these models
  • open up a constructive discussion around this topic and consequently learn from each other as a community
  • (hopefully) help the pursuit of inventing more equitable models without compromising accuracy

and is not to:

  • blame or accuse
  • play the victim (I honestly don't know how this is even possible, but I somehow got accused of it within the first 24 hours ¯\_(ツ)_/¯)
  • drop the science hat and get political
  • get stuck in discussing specific examples or edge cases and lose the focus of the big picture

Note: this is just a quick & dirty demo to spark some discussion. Several published studies quantify bias in contextual language models far more rigorously.

Google Colab Online Demo
--------------

Open in Colab

Locally

Run Gender_Bias_FinBERT_showcase.ipynb

Remarks

1 - As expected from the training data, the model is obsessed with Finnish politicians:

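from transformers import pipeline
pipe = pipeline("fill-mask", model="TurkuNLP/bert-base-finnish-cased-v1")
sentence = f"{pipe.tokenizer.mask_token} sanoi."
pipe(sentence, top_k=10)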

[{'score': 0.05185465142130852,
  'sequence': '[CLS] Tuomioja sanoi. [SEP]',
  'token': 15697,
  'token_str': 'Tuomioja'},
 {'score': 0.04348713159561157,
  'sequence': '[CLS] Soini sanoi. [SEP]',
  'token': 8574,
  'token_str': 'Soini'},
 {'score': 0.04261799901723862,
  'sequence': '[CLS] hän sanoi. [SEP]',
  'token': 361,
  'token_str': 'hän'},
 {'score': 0.04241926968097687,
  'sequence': '[CLS] Niinistö sanoi. [SEP]',
  'token': 5975,
  'token_str': 'Niinistö'},
 {'score': 0.042333755642175674,
  'sequence': '[CLS] Hän sanoi. [SEP]',
  'token': 737,
  'token_str': 'Hän'},
 {'score': 0.038560837507247925,
  'sequence': '[CLS] Vanhanen sanoi. [SEP]',
  'token': 9499,
  'token_str': 'Vanhanen'},
 {'score': 0.03149525821208954,
  'sequence': '[CLS] Stubb sanoi. [SEP]',
  'token': 12173,
  'token_str': 'Stubb'},
 {'score': 0.03149261325597763,
  'sequence': '[CLS] Lipponen sanoi. [SEP]',
  'token': 10367,
  'token_str': 'Lipponen'},
 {'score': 0.025139369070529938,
  'sequence': '[CLS] Sipilä sanoi. [SEP]',
  'token': 9517,
  'token_str': 'Sipilä'},
 {'score': 0.01755082793533802,
  'sequence': '[CLS] Katainen sanoi. [SEP]',
  'token': 11015,
  'token_str': 'Katainen'}]
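
The same kind of fill-mask output can be turned into a ratio like the ones in the headline. The sketch below is one possible way to do it, not necessarily what the notebook does: it masks the name slot, restricts the predictions to two candidate names with the pipeline's `targets` argument, and divides the two scores. The helper names (`score_ratio`, `compare_names`, `demo`) and the template sentence are hypothetical and chosen for illustration only. It also assumes both names are single tokens in the model's vocabulary; if a name is split into subwords, the pipeline falls back to its first piece and the lookup by name will fail.

```python
def score_ratio(predictions, token_a, token_b):
    """Return score(token_a) / score(token_b) from fill-mask output.

    `predictions` is the list of dicts returned by a fill-mask pipeline,
    each with at least the keys 'token_str' and 'score'.
    """
    scores = {p["token_str"]: p["score"] for p in predictions}
    missing = [t for t in (token_a, token_b) if t not in scores]
    if missing:
        raise KeyError(f"no prediction found for: {missing}")
    return scores[token_a] / scores[token_b]


def compare_names(pipe, template, name_a, name_b):
    """Fill the {mask} slot of `template` and compare two candidate names."""
    sentence = template.format(mask=pipe.tokenizer.mask_token)
    # `targets` limits scoring to the given candidates.
    predictions = pipe(sentence, targets=[name_a, name_b])
    return score_ratio(predictions, name_a, name_b)


def demo():
    """Downloads the model on first use; not called automatically."""
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="TurkuNLP/bert-base-finnish-cased-v1")
    # Hypothetical template: "<name> is an engineer."
    ratio = compare_names(pipe, "{mask} on insinööri.", "Pekka", "Emilia")
    print(f"Pekka / Emilia score ratio: {ratio:.1f}")
```

Calling `demo()` prints the ratio for one template. A single sentence is a weak basis for any conclusion, so averaging over several templates and name pairs would be a sensible next step.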

2 - Multilingual BERT shows less imbalance in several examples, but its Finnish token vocabulary is limited.
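
One way to see the vocabulary limitation is to check how each tokenizer splits the words of interest. The sketch below assumes the multilingual model is `bert-base-multilingual-cased` (the README does not name the exact checkpoint), and the helper names and word list are hypothetical. A word that comes back as several pieces cannot be predicted as a single masked token.

```python
def split_report(tokenizer, words):
    """Map each word to the subword pieces the tokenizer splits it into."""
    return {word: tokenizer.tokenize(word) for word in words}


def single_token_words(tokenizer, words):
    """Return the words that the tokenizer keeps as exactly one token."""
    report = split_report(tokenizer, words)
    return [word for word, pieces in report.items() if len(pieces) == 1]


def demo():
    """Downloads both tokenizers on first use; not called automatically."""
    from transformers import AutoTokenizer

    words = ["insinööri", "Pekka", "Emilia"]  # illustrative word list
    model_ids = (
        "TurkuNLP/bert-base-finnish-cased-v1",
        "bert-base-multilingual-cased",  # assumed multilingual checkpoint
    )
    for model_id in model_ids:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        print(model_id, split_report(tokenizer, words))
```

Running `demo()` prints the subword split of each word for both models, which makes it easy to spot words that only one of them can fill in as a single mask.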

Finnish BERT model - FinBERT v1
--------------

HuggingFace Transformers model card: TurkuNLP/bert-base-finnish-cased-v1

Publication

TurkuNLP FinBERT page
