Comments (16)
I had a similar issue when trying to load the xgboost model:
Traceback (most recent call last):
  File "", line 7, in
    do_run(input_fpath, input_schema, model, output)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/predict.py", line 360, in do_run
    model = read_model(model_fpath)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/io.py", line 19, in read_model
    return pickle.load(fobj)
  File "/home/user1/.local/lib/python3.6/site-packages/xgboost/core.py", line 1093, in __setstate__
    _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/xgboost/libxgboost.so: undefined symbol: XGBoosterUnserializeFromBuffer
I'm running this with xgboost 1.0.1.
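The AttributeError in the traceback says that the shared library that got loaded does not export XGBoosterUnserializeFromBuffer. One way to confirm this outside of cv19index is to load the library with ctypes and ask for the symbol directly. The following is only a minimal sketch: the helper name exported_symbols is made up for illustration, and the path is the one shown in the traceback, which will differ on other machines.

```python
import ctypes
from typing import Dict, Iterable, Optional


def exported_symbols(lib_path: Optional[str], names: Iterable[str]) -> Dict[str, bool]:
    """Report which of `names` the shared library at `lib_path` exports.

    Hypothetical helper, not part of cv19index or xgboost.
    Passing None inspects the symbols already visible to the current
    process (POSIX only), which is handy for trying the function out.
    """
    lib = ctypes.CDLL(lib_path)
    # ctypes raises AttributeError for a missing symbol, so hasattr() is
    # enough to tell "exported" from "undefined symbol".
    return {name: hasattr(lib, name) for name in names}


if __name__ == "__main__":
    # Path copied from the traceback; adjust it to your own install.
    path = "/usr/local/xgboost/libxgboost.so"
    try:
        print(exported_symbols(path, ["XGBoosterUnserializeFromBuffer"]))
    except OSError as err:
        print(f"could not load {path}: {err}")
```

If the symbol is reported as missing, the library on that path is probably older than the Python package that is calling into it.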
from cv19index.
Ok, good to know that the version isn't the issue. Were you trying to load the "xgboost" model?
from cv19index.
Yes, I was just trying to run the example notebook using a Python kernel.
from cv19index.
This is definitely some kind of issue with the XGBoost install. It looks like the C libraries aren't installed correctly.
from cv19index.
I think we are going to recommend using the conda xgboost install. We will add in some directions on that.
from cv19index.
I was also trying to run the XGBoost model in the Tutorial notebook. I upgraded XGBoost to version 1.0.1 and that seems to have resolved the issue. Thanks!
Now if I want to use the Logistic Regression model, do I simply need to replace the model reference in the Tutorial with the following?
model = resource_filename("cv19index", "resources/logistic_regression/lr.p")
from cv19index.
The logistic regression model isn't currently hooked up the same way. We are going to address that with some new updates coming over the weekend.
from cv19index.
Ok thank you! I ran the XGBoost model on my data and am seeing a lot of
['Diagnosis of Respiratory signs and symptoms in the previous 12 months'] in the "neg factors" column and not at all in the "pos factors" column.
Also seeing a lot of ['Age', 'Diagnosis of Neoplasm-related encounters in the previous 12 months', 'Diagnosis of Benign neoplasms in the previous 12 months'] in the "pos factors" column.
Shouldn't we expect to see Respiratory issues show up in the positive factors column, since those would increase the patients' risk for COVID19? Am I interpreting the results correctly?
from cv19index.
In the output there should be a corresponding field called "pos_patient_values". This is an array that lines up with the pos_factors and gives you the actual value of the variable.
So if you see "Diagnosis of Respiratory signs and symptoms in the previous 12 months" as a negative factor, that should be paired with a value of "False". That means that the fact that a diagnosis wasn't seen contributed to a decrease in risk.
We will try to think of a clearer way to present this. In our application we have a UI that presents this more clearly, so we aren't as used to putting this all in a CSV.
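To make the pairing concrete, here is a small sketch of how the factor arrays and the value arrays can be read side by side. It assumes a "neg_patient_values" field exists by analogy with "pos_patient_values" (that name is an assumption, as are the helper names), and the example row is made up for illustration.

```python
from typing import Any, List, Sequence, Tuple


def pair_factors(factors: Sequence[str], values: Sequence[Any]) -> List[Tuple[str, Any]]:
    """Line up factor names with the patient's values, position by position."""
    if len(factors) != len(values):
        raise ValueError("factor and value arrays must have the same length")
    return list(zip(factors, values))


def describe(name: str, value: Any, increases_risk: bool) -> str:
    """Render one factor as a readable sentence."""
    direction = "increased" if increases_risk else "decreased"
    return f"{name} = {value!r} {direction} the predicted risk"


if __name__ == "__main__":
    # Illustrative row only; "neg_patient_values" is an assumed field name.
    neg_factors = ["Diagnosis of Respiratory signs and symptoms in the previous 12 months"]
    neg_patient_values = [False]
    for name, value in pair_factors(neg_factors, neg_patient_values):
        print(describe(name, value, increases_risk=False))
```

Read this way, a respiratory diagnosis listed as a negative factor with the value False says that the absence of the diagnosis lowered the risk.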
from cv19index.
We are going to switch to having two output files.
A prediction_summary.csv file will contain one row per prediction and will have the overall number. It will contain 3 columns:
- personId - The personId from the input data
- percentile - Where this person fits into the overall population. 1 is the lowest risk and 100 is the highest risk
- probability - The probability of the predicted outcome (respiratory failures)
The prediction_factors.csv file contains information on the factors driving each prediction. There will be multiple rows per prediction, one row for each factor. Each row will have:
- personId - The personId from the input data
- sign - 1 for positive factors (increased risk), -1 for negative factors (decreased risk)
- rank - A number from 1 to 10 that ranks the multiple factors associated with a prediction. The most significant factor associated with a prediction is 1. 2 is second, etc.
- factor_name - The name of the risk factor
- factor_value - The value of the risk factor for this patient
- factor_score - The score of this factor. Scores with larger magnitudes are more significant. These scores are a normalized version of SHAP scores.
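As a rough illustration of how the two proposed files could be consumed together, the sketch below groups the factor rows by personId and orders them by rank. The column names come from the description above. The sample rows, the helper names, and the coding of negative factors as -1 are assumptions made for this example, not real output.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Made-up sample data in the proposed layout (not real predictions).
SUMMARY_CSV = """personId,percentile,probability
p1,97,0.12
"""

FACTORS_CSV = """personId,sign,rank,factor_name,factor_value,factor_score
p1,-1,2,Diagnosis of Respiratory signs and symptoms in the previous 12 months,False,-0.4
p1,1,1,Age,82,1.3
"""


def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def factors_by_person(factor_rows: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group factor rows by personId, most significant (rank 1) first."""
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in factor_rows:
        grouped[row["personId"]].append(row)
    for rows in grouped.values():
        rows.sort(key=lambda r: int(r["rank"]))
    return dict(grouped)


if __name__ == "__main__":
    factors = factors_by_person(load_rows(FACTORS_CSV))
    for person in load_rows(SUMMARY_CSV):
        print(person["personId"], "percentile", person["percentile"])
        for f in factors.get(person["personId"], []):
            print("  ", f["rank"], f["sign"], f["factor_name"], "=", f["factor_value"])
```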
from cv19index.
I think I found the root source of my issue:
An old version of xgboost (0.9.0) was installed on the server, although my local folder had version 1.0.1. The core.py file in xgboost tries to locate the libxgboost.so library file. It loops over the candidate paths and does not exit the loop after finding the correct libxgboost.so file. In my case it found the 1.0.1 library first, then replaced it with another one it found later, from 0.9.0, which caused the issue.
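The behaviour described here can be checked for on a given machine by listing every copy of libxgboost.so on the directories that might be searched. The sketch below is not xgboost's actual lookup code. It only illustrates the difference between stopping at the first match and continuing past it, and the directory names in the usage section are placeholders based on the paths in the traceback.

```python
import os
from typing import Iterable, List, Optional


def find_all_libs(search_dirs: Iterable[str], lib_name: str = "libxgboost.so") -> List[str]:
    """Return every existing copy of lib_name, in search order."""
    found = []
    for directory in search_dirs:
        candidate = os.path.join(directory, lib_name)
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


def pick_first(search_dirs: Iterable[str], lib_name: str = "libxgboost.so") -> Optional[str]:
    """Return the first match only, so later (possibly older) copies cannot win."""
    matches = find_all_libs(search_dirs, lib_name)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Placeholder directories; replace them with the ones on your system.
    dirs = [
        "/home/user1/.local/lib/python3.6/site-packages/xgboost",
        "/usr/local/xgboost",
    ]
    copies = find_all_libs(dirs)
    if len(copies) > 1:
        print("multiple copies found, versions may conflict:", copies)
    print("first match:", pick_first(dirs))
```

More than one copy in the output is a hint that an older install may be shadowing the newer one.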
from cv19index.
Thanks. I'm going to close this then.
from cv19index.
Hi Dave, please let me know when this update is expected to be in production. Looking forward to having greater interpretability in the output. Thanks!
from cv19index.
Hi, we actually pushed a change last night that simplified the files. In the end, we decided against having two separate files and instead made one file where the columns are laid out more clearly. All the columns now have simple values (rather than arrays) and the relevant values are next to each other.
See https://github.com/closedloop-ai/cv19index/blob/master/examples/xgboost/example_prediction.csv
from cv19index.