Running the CV19 Index Predictor about cv19index HOT 16 CLOSED

chesh27 commented on June 6, 2024

Running the CV19 Index Predictor

from cv19index.

Comments (16)

DaveDeCaprio commented on June 6, 2024

Hi, which model are you using?

Dave

From: Cheshta Dhingra
Sent: Thursday, March 19, 2020 2:41 PM
Subject: [closedloop-ai/cv19index] Running the CV19 Index Predictor (#3)

do_run(input_fpath, input_schema, model, output)
Traceback (most recent call last):

File "", line 1, in
do_run(input_fpath, input_schema, model, output)

File "..\cv19index\predict.py", line 360, in do_run
model = read_model(model_fpath)

File "..\cv19index\io.py", line 19, in read_model
return pickle.load(fobj)

File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 981, in __setstate__
_check_call(_LIB.XGBoosterLoadModelFromBuffer(handle, ptr, length))

File "C:\Users\cdhingr1\AppData\Local\Continuum\anaconda3\envs\fastai\lib\site-packages\xgboost\core.py", line 176, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [15:34:02] C:\Jenkins\workspace\xgboost-win64_release_0.90\src\gbm\gbm.cc:20: Unknown gbm type

DaveDeCaprio commented on June 6, 2024

Hi, It might be worth upgrading to xgboost version 1.0.1 or greater as well. Not sure if that's the issue but we wrote this against version 1.0.1. Thanks, Ben Tuttle
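For anyone who wants to check their installed version against the 1.0.1 minimum mentioned here, a minimal sketch follows. The helper names `parse_version` and `meets_minimum` are made up for illustration and only handle plain dotted numeric versions; the commented-out lines assume xgboost is importable in your environment.

```python
import re


def parse_version(text):
    """Turn a dotted version string such as '1.0.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed, minimum="1.0.1"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("0.90"))   # False
print(meets_minimum("1.0.1"))  # True

# With xgboost installed, the same check would look like this:
# import xgboost
# print(xgboost.__version__, meets_minimum(xgboost.__version__))
```

Note that the version Python reports is not necessarily the version of the native library that gets loaded, so a passing check here does not rule out a mixed install.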

islamgalileo commented on June 6, 2024

I had a similar issue when trying to load the xgboost model:

Traceback (most recent call last):

File "", line 7, in
do_run(input_fpath, input_schema, model, output)

File "/home/user1/.local/lib/python3.6/site-packages/cv19index/predict.py", line 360, in do_run
model = read_model(model_fpath)

File "/home/user1/.local/lib/python3.6/site-packages/cv19index/io.py", line 19, in read_model
return pickle.load(fobj)

File "/home/user1/.local/lib/python3.6/site-packages/xgboost/core.py", line 1093, in __setstate__
_LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))

File "/usr/local/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
func = self.__getitem__(name)

File "/usr/local/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))

AttributeError: /usr/local/xgboost/libxgboost.so: undefined symbol: XGBoosterUnserializeFromBuffer

and I'm running that with 1.0.1

DaveDeCaprio commented on June 6, 2024

Ok, good to know that the version isn't the issue. Were you trying to load the "xgboost" model?

islamgalileo commented on June 6, 2024

Yes, just trying to run the example notebook using python kernel for the notebook

DaveDeCaprio commented on June 6, 2024

These definitely look like some kind of issue with the XGBoost install. It looks like the C libraries aren't installed correctly.

DaveDeCaprio commented on June 6, 2024

I think we are going to recommend using the conda xgboost install. We will add in some directions on that.

chesh27 commented on June 6, 2024

I was also trying to run the XGBoost model in the Tutorial notebook. I upgraded XGBoost to version 1.0.1 and that seems to have resolved the issue. Thanks!
Now if I want to use the Logistic Regression model, do I simply need to replace the model reference in the Tutorial with the following?
model = resource_filename("cv19index", "resources/logistic_regression/lr.p")

DaveDeCaprio commented on June 6, 2024

The logistic regression model isn't currently hooked up the same way. We are going to address that with some new updates coming over the weekend.

chesh27 commented on June 6, 2024

Ok thank you! I ran the XGBoost model on my data and am seeing a lot of ['Diagnosis of Respiratory signs and symptoms in the previous 12 months'] in the "neg factors" column and not at all in the "pos factors" column.

Also seeing a lot of ['Age', 'Diagnosis of Neoplasm-related encounters in the previous 12 months', 'Diagnosis of Benign neoplasms in the previous 12 months'] in the "pos factors" column.

Shouldn't we expect to see Respiratory issues show up in the positive factors column, since those would increase the patients' risk for COVID19? Am I interpreting the results correctly?

DaveDeCaprio commented on June 6, 2024

In the output there should be a corresponding field called "pos_patient_values". This is an array that lines up with the pos_factors and gives you the actual value of the variable.

So if you see "Diagnosis of Respiratory signs and symptoms in the previous 12 months" as a negative factor, that should be paired with a value of "False". That means that the fact that a diagnosis wasn't seen contributed to a decrease in risk.

We will try to think about a more clear way to present this. In our application we have a UI that presents this more clearly, so we aren't as used to putting this all in a CSV.
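To make the pairing concrete, here is a small sketch that lines up a factor array with its value array for one row. The column names `neg_factors` and `neg_patient_values` are assumed by analogy with `pos_factors` and `pos_patient_values`, the row itself is invented, and the cells are assumed to hold Python-style list literals as text.

```python
import ast

# One invented output row; each cell holds a list literal stored as text.
row = {
    "pos_factors": "['Age']",
    "pos_patient_values": "[82]",
    "neg_factors": "['Diagnosis of Respiratory signs and symptoms in the previous 12 months']",
    "neg_patient_values": "[False]",
}


def pair_factors(row, prefix):
    """Pair each factor name with the patient's value for that factor."""
    names = ast.literal_eval(row[f"{prefix}_factors"])
    values = ast.literal_eval(row[f"{prefix}_patient_values"])
    return list(zip(names, values))


for name, value in pair_factors(row, "neg"):
    print(f"{name} = {value} (contributes to lower risk)")
```

Read this way, a respiratory diagnosis listed as a negative factor with the value False says that the absence of the diagnosis lowered the predicted risk.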

DaveDeCaprio commented on June 6, 2024

We are going to switch to having two output files.

A prediction_summary.csv file will contain one row per prediction and will have the overall number. It will contain 3 columns:

personId - The personId from the input data
percentile - Where this person fits into the overall population. 1 is the lowest risk and 100 is the highest risk
probability - The probability of the predicted outcome (respiratory failures)

The prediction_factors.csv file contains information on the factors driving each prediction. There will be multiple rows per prediction, one row for each factor. Each row will have:

personId - The personId from the input data
sign - 1 for positive factors (increased risk), -1 for negative factors (decreased risk)
rank - A number from 1 to 10 that ranks the multiple factors associated with a prediction. The most significant factor associated with a prediction is 1. 2 is second, etc.
factor_name - The name of the risk factor
factor_value - The value of the risk factor for this patient
factor_score - The score of this factor. Scores with larger magnitudes are more significant. These scores are a normalized version of SHAP scores.
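As a rough illustration of how the two proposed files would fit together, the sketch below joins them on personId using only the standard library. All rows are invented, the function name `load_predictions` is hypothetical, and it assumes negative factors are encoded with a sign of -1.

```python
import csv
import io
from collections import defaultdict

# Invented sample data following the column lists described above.
SUMMARY_CSV = """personId,percentile,probability
p1,97,0.12
p2,15,0.004
"""

FACTORS_CSV = """personId,sign,rank,factor_name,factor_value,factor_score
p1,1,1,Age,82,0.9
p1,-1,1,Diagnosis of Respiratory signs and symptoms in the previous 12 months,False,-0.3
p2,-1,1,Age,34,-0.7
"""


def load_predictions(summary_text, factors_text):
    """Attach each person's factor rows to their summary row."""
    factors = defaultdict(list)
    for row in csv.DictReader(io.StringIO(factors_text)):
        factors[row["personId"]].append(row)

    people = []
    for row in csv.DictReader(io.StringIO(summary_text)):
        # Positive factors first, then negative ones, each ordered by rank.
        ranked = sorted(
            factors[row["personId"]],
            key=lambda f: (-int(f["sign"]), int(f["rank"])),
        )
        people.append({**row, "factors": ranked})
    return people


for person in load_predictions(SUMMARY_CSV, FACTORS_CSV):
    print(person["personId"], person["percentile"], len(person["factors"]))
```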

islamgalileo commented on June 6, 2024

I think I found the root source of my issue:
It was because an old version of xgboost (0.9.0) was installed on the server, although my local folder had version 1.0.1. The core.py file in xgboost tries to locate the libxgboost.so library file. It loops over the candidate paths and doesn't exit the loop after finding the correct libxgboost.so file. In my case it found the 1.0.1 library and then overrode it with another one it found (0.9.0), which caused the issue.
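A quick way to check for this kind of duplicate is to list every copy of the shared library under the directories you care about. This is only a diagnostic sketch, not xgboost's own lookup logic; the function name `find_library_copies` is made up, and which directories to scan is up to you.

```python
from pathlib import Path


def find_library_copies(roots, filename="libxgboost.so"):
    """Return every distinct file called `filename` under the given directories."""
    found = {}
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue
        for path in sorted(root.rglob(filename)):
            # Key on the resolved path so overlapping roots are not double counted.
            found.setdefault(path.resolve(), path)
    return list(found.values())


# Example: scan the directories Python imports from (can be slow on big installs).
# import sys
# for path in find_library_copies(sys.path):
#     print(path)
```

More than one result means an older copy may be picked up instead of the one that ships with the Python package you expect.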

DaveDeCaprio commented on June 6, 2024

Thanks. I'm going to close this then.

chesh27 commented on June 6, 2024

Hi Dave, please let me know when this update is expected to be in production. Looking forward to having greater interpretability in the output, Thanks!

DaveDeCaprio commented on June 6, 2024

Hi, we actually pushed a change last night that simplified the files. In the end, we decided against having two separate files, but made one file where the columns are laid out more clearly. All the columns now have simple values (rather than arrays) and the relevant values are next to each other.

See https://github.com/closedloop-ai/cv19index/blob/master/examples/xgboost/example_prediction.csv
