spear4highfidelity's Introduction

Running ipynb within Visual Studio and a venv environment

How to do it

I assume that you are a tidy person: you have all your work sorted out in folders; one of these folders contains your very important project and it looks like this:

$ ls

project/
├── data
├── docs
├── src
└── test

Inside this folder create a new virtual environment:

python -m venv projectname

Then activate it:

source projectname/bin/activate

Now, from inside the environment install ipykernel using pip:

pip install ipykernel

And now install a new kernel:

ipython kernel install --user --name=projectname

At this point, you can start jupyter, create a new notebook and select the kernel that lives inside your environment.
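Once a notebook is open, one quick way to confirm that the selected kernel really runs inside the new environment is to inspect the interpreter from a notebook cell. The following is a minimal sketch using only the standard library; the helper name `in_virtualenv` is hypothetical and not part of any package mentioned here.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Run this in a notebook cell: the executable path should sit under projectname/
print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If the printed path points at the system Python instead, the notebook is probably still using the default kernel rather than the one installed above.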

Downloading CORD dataset

You can download the CORD dataset from https://github.com/clovaai/cord and add it to the data folder. Then run the cord.ipynb file.
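Before running the notebook, it can help to check that the dataset actually landed in the data folder. The sketch below only counts files by extension; it makes no assumption about how the CORD files are organised, and the helper name `summarize_folder` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(folder):
    """Count files under `folder` (recursively), grouped by lowercase file extension."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; download the dataset first")
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


# "data" is the folder from the project layout; adjust the path if yours differs.
if Path("data").is_dir():
    print(dict(summarize_folder("data")))
```

An empty result means the folder exists but nothing was copied into it yet.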


Semi-Supervised Data Programming for Data Efficient Machine Learning

SPEAR is a library for data programming with semi-supervision. The package implements several recent data programming approaches, including facilities to programmatically label and build training data.

Pipeline

  • Design labeling functions (LFs)
  • Generate a pickle file containing labels by passing raw data to the LFs
  • Use one of the label aggregators (LAs) to get the final labels
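The three steps above can be sketched in plain Python. This is an illustration of the idea only: it does not use SPEAR's API, the labeling functions and the `ABSTAIN` marker are hypothetical, and a simple majority vote stands in for the label aggregators the library provides.

```python
import pickle
from collections import Counter

ABSTAIN = None  # hypothetical marker for "this LF has no opinion"


# Step 1: design labeling functions (toy SMS spam heuristics).
def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_contains_prize(text):
    return "spam" if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return "ham" if len(text.split()) <= 4 else ABSTAIN


LFS = [lf_contains_free, lf_contains_prize, lf_short_message]


def apply_lfs(texts, lfs):
    """Step 2: pass raw data through every LF, one row of labels per instance."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(row, default="ham"):
    """Step 3 (stand-in aggregator): most common non-abstain label in a row."""
    votes = Counter(label for label in row if label is not ABSTAIN)
    return votes.most_common(1)[0][0] if votes else default


texts = [
    "Claim your FREE prize now, limited offer",
    "see you at lunch",
    "Meeting moved to 3pm tomorrow in room 12",
]
label_matrix = apply_lfs(texts, LFS)

# Serialize the LF outputs, mirroring the pickle hand-off between steps 2 and 3.
payload = pickle.dumps(label_matrix)
final_labels = [majority_vote(row) for row in pickle.loads(payload)]
print(final_labels)  # ['spam', 'ham', 'ham']
```

The third message triggers no labeling function, so it falls back to the default label; a learned aggregator would handle such cases differently.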



SPEAR provides functionality such as

  • developing LFs/rules/heuristics for quick labeling
  • comparing against several data programming approaches
  • comparing against semi-supervised data programming approaches
  • using subset selection to make the best use of annotation effort
  • storing and saving data in pickle files
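To give a feel for what subset selection means in this setting, here is a plain-Python sketch of greedy selection under a facility-location objective, one common submodular function. It is not submodlib's API and may not match the objective SPEAR uses; the function name and the toy similarity matrix are assumptions made for illustration.

```python
def facility_location_greedy(similarity, budget):
    """Greedily pick `budget` indices that maximize sum_i max_{j in S} similarity[i][j].

    Assumes a square matrix of non-negative similarities.
    """
    n = len(similarity)
    selected = []
    best_cover = [0.0] * n  # best similarity of each item to the current selection
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves the coverage of every item.
            gain = sum(max(similarity[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], similarity[i][best_j]) for i in range(n)]
    return selected


# Toy similarity matrix: items 0 and 1 are near-duplicates, item 2 is different.
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(facility_location_greedy(similarity, budget=2))  # [1, 2]
```

With a budget of two, the greedy choice skips the near-duplicate and picks one representative of each group, which is the behaviour that makes a small annotation budget go further.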

Labeling Functions (LFs)

  • discrete LFs - users can define LFs that return discrete labels
  • continuous LFs - users can define LFs that return continuous scores (confidence) for the labels assigned
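The difference between the two kinds of LF can be illustrated in plain Python. This sketch does not use SPEAR's decorators or classes; the label encoding, the keyword list, and both function names are hypothetical.

```python
SPAM, HAM, ABSTAIN = 1, 0, None  # hypothetical label encoding

TRIGGERS = {"free", "prize", "winner", "urgent"}  # hypothetical keyword list


def discrete_lf(text):
    """Discrete LF: returns a label, or abstains."""
    words = set(text.lower().split())
    return SPAM if words & TRIGGERS else ABSTAIN


def continuous_lf(text):
    """Continuous LF: returns (label, score) with the score in [0, 1]."""
    words = text.lower().split()
    hits = sum(word in TRIGGERS for word in words)
    if hits == 0:
        return ABSTAIN, 0.0
    # Score: fraction of words in the message that are trigger words.
    return SPAM, hits / len(words)


message = "urgent winner claim your free prize"
print(discrete_lf(message))
print(continuous_lf(message))
```

Both functions agree on the label here, but the continuous version also reports how strongly the heuristic fired, which an aggregator can use to weigh the vote.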

Approaches Implemented

See the SPEAR paper (listed under Citation) for details on the approaches below:

  • Only-L
  • Learning to Reweight
  • Posterior Regularization
  • Imply Loss
  • CAGE
  • Joint Learning

The data folder for the SMS and TREC datasets is distributed as a zip file. Place the folder in the same directory as the notebooks folder in order to run the notebooks or examples.

The zip file can be downloaded directly with the gdown library:

pip install gdown
gdown 1CJZ73nNa7Ho0BOSDgGx9CRvXoepVSpet
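After the download finishes, the archive still has to be unpacked next to the notebooks folder. A minimal sketch with the standard library follows; the filename `data.zip` is a placeholder (use whatever name gdown reports), and the helper `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract `zip_path` into `dest_dir` and return the sorted top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        top_level = {name.split("/")[0] for name in archive.namelist() if name}
    return sorted(top_level)


# "data.zip" is a placeholder; use the filename gdown reports after the download.
if Path("data.zip").is_file():
    print(extract_archive("data.zip", dest_dir="."))
```

The returned names show which folders were created, so it is easy to confirm that the data folder sits beside the notebooks folder.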

Installation

  • Install the Submodlib library:

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib

Method 1

To install the latest version of the SPEAR package from PyPI:

pip install decile-spear

Method 2

SPEAR requires Python 3.6 or later. First install submodlib. Then install SPEAR:

git clone https://github.com/decile-team/spear.git
cd spear
pip install -r requirements/requirements.txt
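After either installation method, a short check can confirm that the interpreter meets the stated Python 3.6 requirement and that the packages can be located. This is a sketch under assumptions: the import names `spear` and `submodlib` are assumed to match the installed packages, and `check_environment` is a hypothetical helper. With Method 2, run it from inside the cloned repository so the package directory is on the import path.

```python
import importlib.util
import sys


def check_environment(modules=("spear", "submodlib"), min_python=(3, 6)):
    """Report whether the Python version and importable modules look right."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

A `False` entry points at the step to repeat: the submodlib install, or the SPEAR install itself.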

Citation

@misc{abhishek2021spear,
      title={SPEAR : Semi-supervised Data Programming in Python}, 
      author={Guttu Sai Abhishek and Harshad Ingole and Parth Laturia and Vineeth Dorna and Ayush Maheshwari and Ganesh Ramakrishnan and Rishabh Iyer},
      year={2021},
      eprint={2108.00373},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgment

SPEAR takes inspiration from, builds upon, and uses pieces of code from several open-source codebases, including Snorkel, Snuba, and Imply Loss. SPEAR also uses SUBMODLIB, which is likewise provided by DECILE, for subset selection.

Team

SPEAR is created and maintained by Ayush, Abhishek, Vineeth, Harshad, Parth, Pankaj, Rishabh Iyer, and Ganesh Ramakrishnan. We look forward to making SPEAR more community driven. Please use it and contribute to it for your research, and feel free to use it for your commercial projects. We will add the major contributors here.

Publications

[1] Abhishek et al. SPEAR : Semi-supervised Data Programming in Python, Demonstration Paper.

[2] Maheshwari et al. Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming, In Findings of ACL (Long Paper) 2022.

[3] Maheshwari, Ayush, et al. Data Programming using Semi-Supervision and Subset Selection, In Findings of ACL (Long Paper) 2021.

[4] Chatterjee, Oishik, Ganesh Ramakrishnan, and Sunita Sarawagi. Data Programming using Continuous and Quality-Guided Labeling Functions, In AAAI 2020.

[5] Sahay, Atul, et al. Rule augmented unsupervised constituency parsing, In Findings of ACL (Short Paper) 2021.
