Comments (1)
Code to reproduce the Error :
import pandas as pd
from sklearn import model_selection
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
# To download and read the credit scoring dataset
url = 'https://raw.githubusercontent.com/Giskard-AI/examples/main/datasets/credit_scoring_classification_model_dataset/german_credit_prepared.csv'
credit = pd.read_csv(url, sep=',',engine="python") #To download go to https://github.com/Giskard-AI/giskard-client/tree/main/sample_data/classification
# Declare the type of each column in the dataset(example: category, numeric, text)
column_types = {'account_check_status':"category",
'duration_in_month':"numeric",
'credit_history':"category",
'purpose':"category",
'credit_amount':"numeric",
'savings':"category",
'present_employment_since':"category",
'installment_as_income_perc':"numeric",
'sex':"category",
'personal_status':"category",
'other_debtors':"category",
'present_residence_since':"numeric",
'property':"category",
'age':"numeric",
'other_installment_plans':"category",
'housing':"category",
'credits_this_bank':"numeric",
'job':"category",
'people_under_maintenance':"numeric",
'telephone':"category",
'foreign_worker':"category"}
# feature_types is used to declare the features the model is trained on
feature_types = {i:column_types[i] for i in column_types if i!='default'}
# Pipeline to fill missing values, transform and scale the numeric columns
columns_to_scale = [key for key in feature_types.keys() if feature_types[key]=="numeric"]
numeric_transformer = Pipeline([('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())])
# Pipeline to fill missing values and one hot encode the categorical values
columns_to_encode = [key for key in feature_types.keys() if feature_types[key]=="category"]
categorical_transformer = Pipeline([
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore',sparse=False)) ])
# Perform preprocessing of the columns with the above pipelines
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, columns_to_scale),
('cat', categorical_transformer, columns_to_encode)
]
)
# Pipeline for the model Logistic Regression
clf_logistic_regression = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', LogisticRegression(max_iter =1000))])
# Split the data into train and test
Y=credit['default']
X= credit.drop(columns="default")
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.20,random_state = 30, stratify = Y)
# Fit and score your model
clf_logistic_regression.fit(X_train, Y_train)
clf_logistic_regression.score(X_test, Y_test)
# Prepare data to upload on Giskard
train_data = pd.concat([X_train, Y_train], axis=1)
test_data = pd.concat([X_test, Y_test ], axis=1)
"""# Upload the model in Giskard 🚀🚀🚀
### Initiate a project
"""
from giskard import GiskardClient
url = "http://localhost:19000" #if Giskard is installed locally (for installation, see: https://docs.giskard.ai/start/guides/installation)
#url = "http://app.giskard.ai" # If you want to upload on giskard URL
token = "YOUR GENERATED TOKEN" #you can generate your API token in the Admin tab of the Giskard application (for installation, see: https://docs.giskard.ai/start/guides/installation)
client = GiskardClient(url, token)
# your_project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION")
# Choose the arguments you want. But "project_key" should be unique and in lower case
credit_scoring = client.create_project("credit_scoring", "German Credit Scoring", "Project to predict if user will default")
# If you've already created a project with the key "credit-scoring" use
#credit_scoring = client.get_project("credit_scoring")
"""### Upload your model and a dataset (see [documentation](https://docs.giskard.ai/start/guides/upload-your-model))"""
credit_scoring.upload_model_and_df(
prediction_function=clf_logistic_regression.predict_proba, # Python function which takes pandas dataframe as input and returns probabilities for classification model OR returns predictions for regression model
model_type='classification', # "classification" for classification model OR "regression" for regression model
df=test_data, # the dataset you want to use to inspect your model
column_types=column_types, # A dictionary with columns names of df as key and types(category, numeric, text) of columns as values
target='default', # The column name in df corresponding to the actual target variable (ground truth).
feature_names=list(feature_types.keys()), # List of the feature names of prediction_function
classification_labels=clf_logistic_regression.classes_ , # List of the classification labels of your prediction
model_name='logistic_regression_v1', # Name of the model
dataset_name='test_data' # Name of the dataset
)
from giskard.
Related Issues (20)
- Update dependencies to Pandas 2.X HOT 1
- Feature: add OCR typos transformation to transformation functions catalog HOT 2
- Feature: add speech-to-text typos transformation to transformation functions catalog HOT 2
- Feature: add number-to-word transformation to transformation functions catalog HOT 5
- Feature: Amazon Bedrock Integration HOT 2
- Feature: add data quality tests in Giskard HOT 5
- can this giskard be used for codellama? HOT 5
- [GSK-2179] Slicing function DB insert fails because of the long name HOT 1
- error while running LLM-as-a-judge test: Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.chat_completion.ChatCompletion'> HOT 17
- Broken Documentation link in Readme HOT 3
- Update password hashing to use SHA-256 for enhanced security! HOT 2
- Need a to_junit method to standardize the report format of TestSuiteResult HOT 8
- Reduce percentage float precision in display results details HOT 7
- Add text transform to remove accents HOT 3
- Refactor the `TestSuiteResult` structure HOT 1
- Scan: Overlapping slices should be grouped in the scan generated issues HOT 2
- Scan: Add a robustness detector to the scan that perturbs numerical values
- Scan: Add a robustness detector to the scan that perturbs categorial values HOT 7
- How does it help with Model Interpretability and Explainability for Black-box model architecture HOT 2
- Giskard Hub fails to start in MacOS with colima HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from giskard.