Comments (16)

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

from cv19index.

islamgalileo avatar islamgalileo commented on June 6, 2024

I had a similar issue when trying to load the xgboost model:

Traceback (most recent call last):
  File "", line 7, in <module>
    do_run(input_fpath, input_schema, model, output)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/predict.py", line 360, in do_run
    model = read_model(model_fpath)
  File "/home/user1/.local/lib/python3.6/site-packages/cv19index/io.py", line 19, in read_model
    return pickle.load(fobj)
  File "/home/user1/.local/lib/python3.6/site-packages/xgboost/core.py", line 1093, in __setstate__
    _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/local/xgboost/libxgboost.so: undefined symbol: XGBoosterUnserializeFromBuffer

I'm running this with xgboost 1.0.1.

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

Ok, good to know that the version isn't the issue. Were you trying to load the "xgboost" model?

islamgalileo avatar islamgalileo commented on June 6, 2024

Yes, I'm just trying to run the example notebook using a Python kernel.

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

This is definitely some kind of issue with the XGBoost install. It looks like the C libraries aren't installed correctly.

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

I think we are going to recommend using the conda xgboost install. We will add in some directions on that.

chesh27 avatar chesh27 commented on June 6, 2024

I was also trying to run the XGBoost model in the Tutorial notebook. I upgraded XGBoost to version 1.0.1 and that seems to have resolved the issue. Thanks!
Now, if I want to use the Logistic Regression model, do I simply need to replace the model reference in the Tutorial with the following?
model = resource_filename("cv19index", "resources/logistic_regression/lr.p")

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

The logistic regression model isn't currently hooked up the same way. We are going to address that with some new updates coming over the weekend.

chesh27 avatar chesh27 commented on June 6, 2024

Ok thank you! I ran the XGBoost model on my data and am seeing a lot of ['Diagnosis of Respiratory signs and symptoms in the previous 12 months'] in the "neg factors" column and not at all in the "pos factors" column.

Also seeing a lot of ['Age', 'Diagnosis of Neoplasm-related encounters in the previous 12 months', 'Diagnosis of Benign neoplasms in the previous 12 months'] in the "pos factors" column.

Shouldn't we expect to see Respiratory issues show up in the positive factors column, since those would increase the patients' risk for COVID19? Am I interpreting the results correctly?

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

In the output there should be a corresponding field called "pos_patient_values". This is an array that lines up with the pos_factors and gives you the actual value of the variable.

So if you see "Diagnosis of Respiratory signs and symptoms in the previous 12 months" as a negative factor, that should be paired with a value of "False". That means that the fact that a diagnosis wasn't seen contributed to a decrease in risk.

We will try to think of a clearer way to present this. In our application we have a UI that presents this more clearly, so we aren't as used to putting this all in a CSV.
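As a rough illustration of the pairing described above, here is a minimal Python sketch. It assumes each output row holds parallel arrays of factor names and patient values. Only "pos_patient_values" is named in this thread; the "neg_patient_values" key, the boolean representation of the value, and the sample row are assumptions made for illustration, not confirmed output of cv19index.

```python
def pair_factors(factors, values):
    """Pair each factor name with the patient's value for that factor."""
    if len(factors) != len(values):
        raise ValueError("factor and value arrays must have the same length")
    return list(zip(factors, values))


# Hypothetical row: the key names and values are illustrative assumptions.
row = {
    "pos_factors": ["Age"],
    "pos_patient_values": [72],
    "neg_factors": [
        "Diagnosis of Respiratory signs and symptoms in the previous 12 months"
    ],
    "neg_patient_values": [False],
}

pos_pairs = pair_factors(row["pos_factors"], row["pos_patient_values"])
neg_pairs = pair_factors(row["neg_factors"], row["neg_patient_values"])

for name, value in pos_pairs:
    print(f"increases risk: {name} = {value}")
for name, value in neg_pairs:
    # A value of False means the diagnosis was NOT seen, which lowers risk.
    print(f"decreases risk: {name} = {value}")
```

Read this way, a respiratory diagnosis in the negative factors with a value of False says that the absence of the diagnosis lowered the predicted risk.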

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

We are going to switch to having two output files.

A prediction_summary.csv file will contain one row per prediction and will have the overall number. It will contain 3 columns:

  • personId - The personId from the input data
  • percentile - Where this person fits into the overall population. 1 is the lowest risk and 100 is the highest risk
  • probability - The probability of the predicted outcome (respiratory failures)

The prediction_factors.csv file contains information on the factors driving each prediction. There will be multiple rows per prediction, one row for each factor. Each row will have:

  • personId - The personId from the input data
  • sign - 1 for positive factors (increased risk), -1 for negative factors (decreased risk)
  • rank - A number from 1 to 10 that ranks the multiple factors associated with a prediction. The most significant factor associated with a prediction is 1. 2 is second, etc.
  • factor_name - The name of the risk factor
  • factor_value - The value of the risk factor for this patient
  • factor_score - The score of this factor. Scores with larger magnitudes are more significant. These scores are a normalized version of SHAP scores.
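To make the proposed two-file layout concrete, the following Python sketch joins the two files on personId using only the standard library. The column names come from the description above; the sample rows, the use of -1 as the sign for negative factors, and the helper name load_predictions are assumptions for illustration only.

```python
import csv
import io

# Made-up sample data that follows the column layout described above.
SUMMARY_CSV = """personId,percentile,probability
p1,97,0.12
p2,15,0.004
"""

FACTORS_CSV = """personId,sign,rank,factor_name,factor_value,factor_score
p1,-1,1,Diagnosis of Respiratory signs and symptoms in the previous 12 months,False,-0.4
p1,1,1,Age,81,0.9
p2,1,1,Age,34,0.1
"""


def load_predictions(summary_file, factors_file):
    """Return {personId: summary fields plus a sorted list of factors}."""
    predictions = {}
    for row in csv.DictReader(summary_file):
        predictions[row["personId"]] = {
            "percentile": int(row["percentile"]),
            "probability": float(row["probability"]),
            "factors": [],
        }
    for row in csv.DictReader(factors_file):
        predictions[row["personId"]]["factors"].append({
            "sign": int(row["sign"]),
            "rank": int(row["rank"]),
            "factor_name": row["factor_name"],
            "factor_value": row["factor_value"],  # kept as text
            "factor_score": float(row["factor_score"]),
        })
    # Positive factors first, then negative; most significant rank first.
    for prediction in predictions.values():
        prediction["factors"].sort(key=lambda f: (-f["sign"], f["rank"]))
    return predictions


predictions = load_predictions(io.StringIO(SUMMARY_CSV), io.StringIO(FACTORS_CSV))
for person_id, prediction in predictions.items():
    print(person_id, prediction["percentile"], prediction["probability"])
    for factor in prediction["factors"]:
        direction = "increases" if factor["sign"] > 0 else "decreases"
        print(f"  {direction} risk: {factor['factor_name']} = {factor['factor_value']}")
```

With real files, the two io.StringIO objects would be replaced by open file handles for prediction_summary.csv and prediction_factors.csv.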

islamgalileo avatar islamgalileo commented on June 6, 2024

I think I found the root cause of my issue:
An old version of xgboost (0.9.0) was installed on the server, although my local folder had version 1.0.1. The core.py file in xgboost tries to locate the libxgboost.so library file. It loops over the candidate paths and doesn't exit the loop after finding the correct libxgboost.so file. In my case, it found the 1.0.1 library and then overrode it with the 0.9.0 copy it found afterwards, which caused the issue.
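One way to check for this kind of conflict is to look for more than one copy of libxgboost.so on the machine. The sketch below is a standard-library-only illustration, not part of xgboost or cv19index: the helper name find_library_copies and the temporary directory layout in the demo are hypothetical.

```python
import os
import tempfile


def find_library_copies(search_dirs, lib_name="libxgboost.so"):
    """Return the distinct real paths of every lib_name found under search_dirs."""
    found = []
    seen = set()
    for root_dir in search_dirs:
        # os.walk silently yields nothing for directories that do not exist.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if lib_name in filenames:
                path = os.path.realpath(os.path.join(dirpath, lib_name))
                if path not in seen:
                    seen.add(path)
                    found.append(path)
    return found


# Demo with a throwaway directory tree that mimics a system-wide copy
# and a per-user copy (both paths are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("system/xgboost", "user/site-packages/xgboost/lib"):
        os.makedirs(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, "libxgboost.so"), "wb").close()

    copies = find_library_copies([tmp])
    if len(copies) > 1:
        print("Multiple libxgboost.so copies found:")
        for path in copies:
            print("  " + path)
```

On a real system, search_dirs could be the entries of sys.path plus any system location such as the /usr/local/xgboost directory that appears in the traceback above. If more than one copy turns up, removing or upgrading the stale install is the likely fix.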

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

Thanks. I'm going to close this then.

chesh27 avatar chesh27 commented on June 6, 2024

Hi Dave, please let me know when this update is expected to be in production. Looking forward to having greater interpretability in the output. Thanks!

DaveDeCaprio avatar DaveDeCaprio commented on June 6, 2024

Hi, we actually pushed a change last night that simplified the files. In the end, we decided against having two separate files and instead made one file where the columns are laid out more clearly. All the columns now have simple values (rather than arrays), and the relevant values are next to each other.

See https://github.com/closedloop-ai/cv19index/blob/master/examples/xgboost/example_prediction.csv
