
pydvl's Introduction


A library for data valuation.


Installation

To install the latest release use:

pip install pyDVL

You can also install the latest development version from TestPyPI:

pip install pyDVL --index-url https://test.pypi.org/simple/

For more instructions and information refer to the Installing pyDVL section of the documentation.

Usage

pyDVL requires Memcached to cache certain results and speed up computation.

You need to run it either locally or using Docker:

docker container run -it --rm -p 11211:11211 memcached:latest -v

Caching is enabled by default but can be disabled if not needed or desired.
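Before running a valuation, it can help to confirm that something is actually listening on the Memcached port used in the Docker command above. The following is a minimal sketch using only the Python standard library. The helper name `memcached_reachable` is hypothetical and not part of pyDVL, and the check only verifies that a TCP connection can be opened, not that the server speaks the Memcached protocol.

```python
import socket


def memcached_reachable(host="localhost", port=11211, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, not part of pyDVL. The default port matches
    the one published by the Docker command shown earlier.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable
        return False


if __name__ == "__main__":
    if memcached_reachable():
        print("Memcached port is reachable")
    else:
        print("Nothing is listening on localhost:11211")
```

If the check fails, either start Memcached or disable caching as described above.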

Once that's done, start by creating a Dataset object with your train and test splits. Then create a model instance and a Utility object that wraps the dataset, the model and the scoring function. Finally, use one of the methods defined in the library to compute the data valuation. Here we use Truncated Monte Carlo Shapley because it is the most efficient.

Putting it all together:

import numpy as np
from pydvl.utils import Dataset, Utility
from pydvl.shapley.montecarlo import truncated_montecarlo_shapley
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy regression data: 50 samples with 2 features each
X, y = np.arange(100).reshape((50, 2)), np.arange(50)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=16
)
# Wrap the splits, then bundle the model and the data into a Utility
dataset = Dataset(X_train, X_test, y_train, y_test)
model = LinearRegression()
utility = Utility(model, dataset)
# Estimate values for the training points
values, errors = truncated_montecarlo_shapley(u=utility, max_iterations=100)
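
For intuition about what a truncated Monte Carlo Shapley estimate computes, here is a small standalone sketch in pure Python. It is not pyDVL's implementation or API: the function name `truncated_mc_shapley`, its parameters and the toy additive utility are all assumptions made for illustration. The idea is to sample random orderings of the data points, credit each point with the change in utility it causes when added, and stop scanning an ordering early once the running utility is within a tolerance of the full-data utility.

```python
import random


def truncated_mc_shapley(utility, n_points, n_permutations=100,
                         tolerance=0.0, seed=0):
    """Estimate Shapley values from sampled permutations (illustrative only).

    utility: callable mapping a list of point indices to a score.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    full_score = utility(indices)
    totals = [0.0] * n_points

    for _ in range(n_permutations):
        permutation = indices[:]
        rng.shuffle(permutation)
        subset = []
        previous = utility([])
        for idx in permutation:
            # Truncation: once the running score is within `tolerance` of
            # the full-data score, the remaining points get zero credit.
            if abs(full_score - previous) <= tolerance:
                break
            subset.append(idx)
            current = utility(subset)
            totals[idx] += current - previous  # marginal contribution
            previous = current

    return [total / n_permutations for total in totals]


# Toy additive utility (hypothetical): each point contributes a fixed weight,
# so the Shapley value of every point equals its weight.
weights = [3, 1, 2]
estimates = truncated_mc_shapley(
    lambda s: sum(weights[i] for i in s), n_points=3, n_permutations=50
)
print(estimates)  # [3.0, 1.0, 2.0]
```

In a real setting the utility would retrain the model on the subset and score it on the test split, which is why truncation and caching matter for run time.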

For more instructions and information refer to the Getting Started section of the documentation.

Refer to the Examples section of the documentation for more detailed examples.

Contributing

Please open new issues for bugs, feature requests and extensions. See more details about the structure and workflow in the developer's readme.

License

pyDVL is distributed under LGPL-3.0. A complete version can be found in two files: here and here.

All contributions will be distributed under this license.

Contributors

xuzzo, mdbenito, github-actions[bot], anesbenmerzoug, kosmitive
