
clara's Introduction

CLARA: Confidence of Labels and Raters

An implementation of the Gibbs sampler for the model (together with simulators to generate synthetic data) used in the paper "CLARA: Confidence of Labels and Raters" (KDD'20).

@inproceedings{clara-kdd-20,
    author = {Viet-An Nguyen and Peibei Shi and Jagdish Ramakrishnan and Udi Weinsberg and Henry C. Lin and Steve Metz and Neil Chandra and Jane Jing and Dimitris Kalimeris},
    title = {{CLARA: Confidence of Labels and Raters}},
    booktitle = {Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’20)},
    year = {2020},
}

Simulating Data

Generate data without classifier scores

We can generate a dataset of 1000 items with true class prevalence theta = [0.8, 0.2], where all labelers share the same confusion matrix psi = [[0.9, 0.1], [0.05, 0.95]] (row k is the distribution of a rating given true label k), as follows:

import numpy as np
from simulator import generate_dataset_tiebreaking

df = generate_dataset_tiebreaking(
    dataset_id=0,
    theta=np.array([0.8, 0.2]),  # true class prevalence
    psi=np.array([[0.9, 0.1], [0.05, 0.95]]),  # confusion matrix shared by all labelers
    num_items=1000,
)

The last rows of the simulated data look like this:

dataset  id     labelers   ratings    true_rating
0        0_995  [0, 0]     [0, 0]     0
0        0_996  [0, 0]     [0, 0]     0
0        0_997  [0, 0, 0]  [0, 1, 0]  0
0        0_998  [0, 0]     [0, 0]     0
0        0_999  [0, 0]     [0, 0]     0
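The sample output suggests how the tie-breaking simulator behaves: most items receive two ratings, and item 0_997, whose first two ratings disagree, receives a third. The sketch below reproduces that pattern under this assumed rule. `simulate_tiebreaking` is a hypothetical helper, not the repository's `generate_dataset_tiebreaking`, and it returns a list of dicts rather than a DataFrame.

```python
import numpy as np


def simulate_tiebreaking(theta, psi, num_items, seed=0):
    """Hypothetical tie-breaking simulator (not the repository's code).

    Assumed rule: each item gets two ratings from labelers sharing the
    confusion matrix `psi`; a third rating is drawn only if the first two
    disagree.
    """
    rng = np.random.default_rng(seed)
    num_classes = len(theta)
    rows = []
    for i in range(num_items):
        # Draw the latent true label from the prevalence vector theta.
        true_rating = rng.choice(num_classes, p=theta)
        # Each rating is drawn from the row of psi indexed by the true label.
        ratings = list(rng.choice(num_classes, size=2, p=psi[true_rating]))
        if ratings[0] != ratings[1]:
            # Tie-break: one extra rating when the first two disagree.
            ratings.append(rng.choice(num_classes, p=psi[true_rating]))
        rows.append({
            "id": f"0_{i}",
            "labelers": [0] * len(ratings),  # single shared labeler id
            "ratings": [int(r) for r in ratings],
            "true_rating": int(true_rating),
        })
    return rows
```

Under the same assumption, the expected share of items needing a third rating is the probability that the first two ratings disagree: sum over k of theta_k · 2 · psi_k0 · psi_k1 = 0.8 · 0.18 + 0.2 · 0.095 = 0.163, i.e. roughly 16% of items.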

Generate data with classifier scores

import numpy as np
from simulator import generate_dataset_tiebreaking_with_scores

df = generate_dataset_tiebreaking_with_scores(
    dataset_id=1,
    theta=np.array([0.8, 0.2]),  # true class prevalence
    psi=np.array([[0.9, 0.1], [0.05, 0.95]]),  # confusion matrix shared by all labelers
    num_items=1000,
)

Using CLARA

Fit the model

To fit a CLARA model with a single confusion matrix shared across all labelers:

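# ClaraGibbs is provided by this repository (its import path is not shown here).
model = ClaraGibbs(burn_in=2000, num_samples=1000, sample_lag=3)
# A=1: one confusion matrix shared by all labelers; R=2: two rating classes.
model.fit(A=1, R=2, ratings=np.array(df.ratings))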
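The internals of `ClaraGibbs` are not shown here. To make the roles of `burn_in`, `num_samples` and `sample_lag` concrete, the following is a minimal sketch of a generic Gibbs sampler for a Dawid-Skene-style latent-class model with one confusion matrix shared by all labelers (the A=1 case). It is not CLARA's actual sampler: it ignores classifier scores, the function name and the symmetric Dirichlet hyperparameters `alpha` and `beta` are hypothetical, and it assumes the three parameters mean burn-in iterations, number of retained draws, and thinning interval.

```python
import numpy as np


def gibbs_shared_confusion(ratings, R, burn_in=2000, num_samples=1000,
                           sample_lag=3, alpha=1.0, beta=1.0, seed=0):
    """Illustrative Gibbs sampler with a single shared confusion matrix.

    Not the repository's ClaraGibbs. ratings: list of per-item rating lists
    with values in {0, ..., R-1}. Returns posterior means (theta_hat, psi_hat).
    """
    rng = np.random.default_rng(seed)
    # counts[i, r] = number of times item i received rating r.
    counts = np.zeros((len(ratings), R))
    for i, item_ratings in enumerate(ratings):
        for r in item_ratings:
            counts[i, r] += 1

    theta = np.full(R, 1.0 / R)
    # Diagonally dominant start (assumes better-than-chance labelers);
    # this avoids drifting into the label-swapped posterior mode.
    psi = np.ones((R, R))
    np.fill_diagonal(psi, R)
    psi /= psi.sum(axis=1, keepdims=True)

    theta_sum, psi_sum, kept = np.zeros(R), np.zeros((R, R)), 0
    for it in range(burn_in + num_samples * sample_lag):
        # 1) Sample each item's true label z_i given theta and psi:
        #    log p(z_i = k) = log theta_k + sum_r counts[i, r] log psi[k, r] + const.
        log_post = np.log(theta) + counts @ np.log(psi).T  # shape (N, R)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        u = rng.random((len(ratings), 1))  # inverse-CDF sampling
        z = np.minimum((post.cumsum(axis=1) < u).sum(axis=1), R - 1)

        # 2) Sample the prevalence theta | z from its Dirichlet posterior.
        theta = rng.dirichlet(alpha + np.bincount(z, minlength=R))

        # 3) Sample each row psi[k] | z, ratings from its Dirichlet posterior.
        for k in range(R):
            psi[k] = rng.dirichlet(beta + counts[z == k].sum(axis=0))

        # After burn-in, keep every sample_lag-th draw.
        if it >= burn_in and (it - burn_in) % sample_lag == 0:
            theta_sum += theta
            psi_sum += psi
            kept += 1

    return theta_sum / kept, psi_sum / kept
```

Thinning with `sample_lag` reduces autocorrelation between retained draws, and averaging the retained draws gives posterior-mean estimates analogous to what `get_prevalence()` and `get_confusion_matrix()` report.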

Estimate the prevalence

model.get_prevalence()

Estimate the confusion matrix

model.get_confusion_matrix(labeler_id=0)
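Because the simulated data include each item's true label, one way to sanity-check `model.get_confusion_matrix(labeler_id=0)` is to compare it with the empirical confusion matrix computed from `df.ratings` and `df.true_rating`. The helper below is a hypothetical sketch for that check; it assumes a single shared confusion matrix, so it pools the ratings of all labelers.

```python
import numpy as np


def empirical_confusion_matrix(ratings, true_ratings, R):
    """Hypothetical helper: row k estimates P(rating = r | true label = k).

    ratings: per-item rating lists; true_ratings: per-item true labels.
    Ratings from all labelers are pooled (single shared matrix assumed).
    """
    counts = np.zeros((R, R))
    for item_ratings, k in zip(ratings, true_ratings):
        for r in item_ratings:
            counts[k, r] += 1
    # Normalize each row to sum to 1; rows with no items stay all zeros.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)
```

For example, `empirical_confusion_matrix(df.ratings, df.true_rating, R=2)`; assuming the simulator draws each rating independently from the row of psi for the item's true label, the result should approach the psi passed to the simulator as `num_items` grows.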

Installation

Installation Requirements

  • Python >= 3.6
  • numpy
  • pandas
  • scipy

License

You may find out more about the license here.

clara's People

Contributors

jramak, facebook-github-bot


clara's Issues

No update to posterior of psi for A > 1

In the repository's example notebook, when A > 1 the posterior of psi is updated only for the labeler with index 0; the confusion matrices of the other labelers are never updated.

This seems to be because, when self._init(ratings, labelers, true_ratings, scores) is called in ClaraGibbs.fit(), the initial assignment is populated only for the labeler with index 0; the data field is not updated for the other labeler indices.

I was able to reproduce the same problem with my own dataset.
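For context on the behaviour the issue expects: with A > 1, a sampler of this kind needs per-labeler sufficient statistics, and initialization must fill them for every labeler index. The sketch below shows such an initialization. The name `init_labeler_counts` and the `counts[a, k, r]` layout are hypothetical illustrations, not taken from `ClaraGibbs._init`; `labelers` is assumed to be per-item lists of labeler indices, as in the simulated `labelers` column.

```python
import numpy as np


def init_labeler_counts(ratings, labelers, true_ratings, A, R):
    """Hypothetical per-labeler count initialization (not ClaraGibbs._init).

    counts[a, k, r] = number of times labeler a gave rating r to an item whose
    (initial) true label is k. Every labeler index is updated, not only 0.
    """
    counts = np.zeros((A, R, R), dtype=int)
    for item_ratings, item_labelers, k in zip(ratings, labelers, true_ratings):
        for a, r in zip(item_labelers, item_ratings):
            counts[a, k, r] += 1
    return counts
```

Each labeler's posterior for psi can then be drawn from its own slice `counts[a]`, so labelers other than index 0 also get updated.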
