confounds's Introduction

Hi there 👋

confounds's People

Contributors

dinga92, jameschapman19, jrasero, raamana

confounds's Issues

Error fitting Residualize

  • confounds version: 0.1.1
  • Python version: 3.9.7
  • Operating System: macOS 11.6

Description

I tried to run the example code with some dummy data, but I get an error when I call transform after fitting Residualize.

What I Did

# Using the diabetes dataset as an example
import numpy as np  # needed for np.arange below
from sklearn import datasets

df = datasets.load_diabetes(as_frame=True)['data']
X = df[['bmi', 'age', 's1']].values # some predictors
y = df['s6'].values # the outcome variable
c = df['sex'].values # a confound - does not matter which

# Splitting into a training and a test set
from sklearn.model_selection import train_test_split

train_ind, test_ind = train_test_split(np.arange(0, len(y)), test_size=0.2)
train_X = X[train_ind, :]
train_y = y[train_ind]
train_C = c[train_ind]

test_X = X[test_ind, :]
test_y = y[test_ind]
test_C = c[test_ind]

# Fitting Residualize to remove the confound
from confounds import Residualize

resid = Residualize()
resid.fit(train_X, train_C)
deconf_train_X = resid.transform(train_X, train_C)

Error message:

TypeError: check_is_fitted() takes from 1 to 2 positional arguments but 3 were given
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/m0/mddm8pfx1vs3q52qvgx4mpxw0000gp/T/ipykernel_27134/3595471338.py in <module>
      1 resid = Residualize()
      2 resid.fit(train_X, train_C)
----> 3 deconf_train_X = resid.transform(train_X, train_C)

/opt/anaconda3/envs/brain_shadows/lib/python3.9/site-packages/confounds/base.py in transform(self, X, y)
    186         """Placeholder to pass sklearn conventions"""
    187 
--> 188         return self._transform(X, y)
    189 
    190 

/opt/anaconda3/envs/brain_shadows/lib/python3.9/site-packages/confounds/base.py in _transform(self, test_features, test_confounds)
    192         """Actual deconfounding of the test features"""
    193 
--> 194         check_is_fitted(self, 'model_', 'n_features_')
    195         test_features = check_array(test_features, accept_sparse=True)
    196 

TypeError: check_is_fitted() takes from 1 to 2 positional arguments but 3 were given

Comment

It looks like there is some incompatibility, but I'm not sure what package is causing the error. Any help would be greatly appreciated!
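
One plausible cause, judging only from the traceback: the message "takes from 1 to 2 positional arguments" fits a check_is_fitted whose signature is (estimator, attributes=None, *, msg=None, all_or_any=all), which is how recent scikit-learn releases appear to define it, while confounds/base.py passes the attribute names as two separate positional arguments. The sketch below reproduces the mechanism with a stand-in function, so it runs without scikit-learn. The stand-in body and the ToyDeconfounder class are assumptions made for illustration, not the library's code.

```python
def check_is_fitted(estimator, attributes=None, *, msg=None, all_or_any=all):
    """Stand-in with the assumed signature of scikit-learn's check_is_fitted.

    Only `estimator` and `attributes` may be passed positionally.
    """
    if isinstance(attributes, str):
        attributes = [attributes]
    if attributes is None:
        # No names given: look for any attribute with a trailing underscore
        fitted = any(name.endswith("_") for name in vars(estimator))
    else:
        fitted = all_or_any(hasattr(estimator, name) for name in attributes)
    if not fitted:
        raise RuntimeError(msg or f"{type(estimator).__name__} is not fitted")


class ToyDeconfounder:
    """Toy class that sets the same fitted attributes named in the traceback."""

    def fit(self, X, confounds):
        self.model_ = "placeholder model"
        self.n_features_ = len(X[0])
        return self


toy = ToyDeconfounder().fit([[1.0, 2.0]], [0])

# Old calling style, as in confounds/base.py: three positional arguments
old_call_error = None
try:
    check_is_fitted(toy, 'model_', 'n_features_')
except TypeError as err:
    old_call_error = str(err)
print(old_call_error)

# Passing the names as one list stays within two positional arguments
check_is_fitted(toy, ['model_', 'n_features_'])
```

If this diagnosis is right, changing line 194 of base.py to pass a single list of attribute names should resolve the error. As a stopgap, installing an older scikit-learn release that still accepts the three-argument call may also work.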

better validation of inputs to Deconfounders

Issue #19 reminds me that users can be confused because the code leaves the second argument to .fit() and .transform() optional, with y=None. The only reason we have y=None is to try to follow sklearn conventions and pass their tests, but given that we can't pass them anyway, we should tighten the signatures and make it an error not to supply the second (necessary) input argument.
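
A minimal sketch of what such a check could look like, assuming the sklearn-style y=None default stays in the signature but None is rejected at call time. The names validate_confounds and StrictDeconfounder are hypothetical and not part of the library.

```python
def validate_confounds(X, confounds, method_name):
    """Hypothetical helper: insist that confounds are given and match X."""
    if confounds is None:
        raise ValueError(
            f"{method_name}() requires the confounds as its second argument, "
            "but got None")
    if len(confounds) != len(X):
        raise ValueError(
            f"{method_name}() got {len(X)} samples in X "
            f"but {len(confounds)} in the confounds")
    return confounds


class StrictDeconfounder:
    """Toy class showing where the check would sit; not the library's code."""

    def fit(self, X, y=None):
        # y keeps its sklearn-style default, but None is rejected right away
        validate_confounds(X, y, "fit")
        self.n_samples_ = len(X)
        return self

    def transform(self, X, y=None):
        validate_confounds(X, y, "transform")
        return X  # a real deconfounder would remove the confound here
```

With this in place, StrictDeconfounder().fit(X) fails immediately with a clear message instead of failing later inside the estimator.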

cc @jrasero @jameschapman19

drop-in replacements for cross_val_predict and cross_val_score etc

Pradeep,

could something like this be of interest for the library?

The idea would be to create a class that encapsulates both the deconfounding step and the estimator behind a single fit and predict interface.

Below is a skeleton example. It deconfounds only the input data, not the target.

cross_val_predict and cross_val_score functions could be implemented as well.

from sklearn.base import clone

class SklearnWrapper():

    def __init__(self,
                 deconfounder,
                 estimator):

        self.deconfounder = deconfounder
        self.estimator = estimator

    def fit(self,
            input_data,
            target_data,
            confounders,
            sample_weight=None):

        # clone input arguments
        deconfounder = clone(self.deconfounder)
        estimator = clone(self.estimator)

        # Deconfound input data
        deconf_input = deconfounder.fit_transform(input_data, confounders)
        self.deconfounder_ = deconfounder

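        # Fit the estimator on the deconfounded input. sample_weight is passed
        # by keyword, and only when given, since not every estimator accepts it
        fit_params = {} if sample_weight is None else {'sample_weight': sample_weight}
        estimator.fit(deconf_input, target_data, **fit_params)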
        self.estimator_ = estimator

        return self

    def predict(self,
                input_data,
                confounders):

        deconf_input = self.deconfounder_.transform(input_data, confounders)

        return self.estimator_.predict(deconf_input)
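
The cross_val_predict idea mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed API: the function name cross_val_predict_deconfounded and the two toy classes are hypothetical, and only NumPy is used so the example runs on its own. In real code, sklearn.model_selection.KFold and sklearn.base.clone would be the natural replacements for the manual folds and copy.deepcopy. The key point is that the deconfounder is fitted on the training part of each fold only.

```python
import copy

import numpy as np


def cross_val_predict_deconfounded(deconfounder, estimator, X, y, confounds,
                                   n_splits=5):
    """Out-of-fold predictions, deconfounding X separately within each fold."""
    X, y, confounds = np.asarray(X), np.asarray(y), np.asarray(confounds)
    predictions = np.empty(len(y), dtype=float)

    # Contiguous, unshuffled folds
    for test_idx in np.array_split(np.arange(len(y)), n_splits):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False

        # Fresh copies so that no fitted state carries over between folds
        deconf = copy.deepcopy(deconfounder)
        est = copy.deepcopy(estimator)

        X_train = deconf.fit_transform(X[train_mask], confounds[train_mask])
        est.fit(X_train, y[train_mask])

        X_test = deconf.transform(X[test_idx], confounds[test_idx])
        predictions[test_idx] = est.predict(X_test)

    return predictions


class LinearResidualizer:
    """Toy deconfounder: subtracts a least-squares fit of X on the confounds."""

    @staticmethod
    def _design(confounds):
        confounds = np.asarray(confounds, dtype=float)
        confounds = confounds.reshape(len(confounds), -1)
        return np.column_stack([np.ones(len(confounds)), confounds])

    def fit(self, X, confounds):
        self.coef_, *_ = np.linalg.lstsq(self._design(confounds), X, rcond=None)
        return self

    def transform(self, X, confounds):
        return X - self._design(confounds) @ self.coef_

    def fit_transform(self, X, confounds):
        return self.fit(X, confounds).transform(X, confounds)


class LeastSquares:
    """Toy estimator: ordinary least squares with an intercept."""

    def fit(self, X, y):
        design = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


# Synthetic data: the feature mixes a signal with a confound
rng = np.random.default_rng(0)
signal, confound = rng.normal(size=100), rng.normal(size=100)
X = (signal + 3 * confound).reshape(-1, 1)
y = 2 * signal

preds = cross_val_predict_deconfounded(
    LinearResidualizer(), LeastSquares(), X, y, confound, n_splits=5)
```

A cross_val_score counterpart could reuse the same loop and apply a metric to each fold's predictions instead of collecting them.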

Performance score stratified by confound

utils.score_stratified_by_confound()

Helper to summarize the performance score (accuracy, MSE, MAE, etc.) for each
level or variant of a confound. This helps to assess any bias towards a
particular value when confounds are categorical (such as site or gender). For
example, if the MSE (of the target) for females is much lower than for males,
it may indicate a potential bias of the model towards females (due to imbalance
in group size?)
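
A minimal sketch of how such a helper might work, assuming a categorical confound and a metric with the signature metric(y_true, y_pred). The argument names and the returned dict are assumptions; the issue only names the function.

```python
from collections import defaultdict


def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE, standing in for any metric(y_true, y_pred)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def score_stratified_by_confound(y_true, y_pred, confound, metric):
    """Return {confound level: metric computed on that level's samples}."""
    true_by_level = defaultdict(list)
    pred_by_level = defaultdict(list)
    for t, p, level in zip(y_true, y_pred, confound):
        true_by_level[level].append(t)
        pred_by_level[level].append(p)
    return {level: metric(true_by_level[level], pred_by_level[level])
            for level in true_by_level}


scores = score_stratified_by_confound(
    y_true=[1, 2, 3, 4],
    y_pred=[1, 2, 5, 6],
    confound=['F', 'F', 'M', 'M'],
    metric=mean_squared_error,
)
print(scores)  # → {'F': 0.0, 'M': 4.0}
```

A large gap between levels, as in this toy output, is the kind of pattern the helper is meant to surface. Group sizes are worth reporting alongside the scores, since small groups give noisy estimates.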
