eveningdong / new_york_state_inpatients_medical_treatment_and_hospital_recommender_system_design


Python 41.43% TeX 57.84% R 0.73%

new_york_state_inpatients_medical_treatment_and_hospital_recommender_system_design's Issues

Midterm Report Review: ajw238

Overall, the report clearly conveys the steps of the project so far and how the team is doing. Everything is to the point, and the visuals are very helpful for understanding the different components of your data. However, there may be more to show here, perhaps plots that tie more directly into the analysis.

Also, why Naive Bayes? I would have liked to see more justification for this type of model. I think a linear regression would also surface some interesting features, perhaps with a regularizer that promotes sparse coefficients in order to single out the important ones. Also, does your cross validation properly distribute the small number of positive labels across folds when resampling again and again? If not, this could skew your model. It does seem from your future work, however, that you are looking into ways to address it.
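
To make the cross-validation concern concrete, here is a minimal sketch of stratified fold assignment. It is not the team's code: the function name `stratified_folds` and the toy labels (10 deaths among 500 patients) are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds so each fold gets a near-equal share of every class."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal the indices round-robin into the folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical toy data: 10 positive labels (deaths) among 500 patients.
labels = [1] * 10 + [0] * 490
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

With plain random splitting, a fold could end up with very few positive labels or none at all; dealing each class round-robin avoids that by construction.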

Best of luck!

Midterm Report Review

New_York_State_Inpatients_Mortality_Prediction

Midterm Report Review

Reviewer: Zilong Wang (zw243)

Introduction:

The context is clearly introduced.
Modelling this problem as a binary classification problem makes sense
as well.

Data Preprocessing:

Authors should list some of the variables that were deleted using
"common sense" because they weren't relevant. (Not all, but just some would do.)

Authors should mention that deleting 100130 patients out of 2544731 patients
removes only around 4% of the data, to remind readers of the scale of the problem,
as ~100,000 entries sounds like a lot.
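
As a quick check of that figure, using only the two counts quoted above (the variable names are mine, not the report's):

```python
# Counts quoted from the report's preprocessing section.
total_patients = 2544731
deleted_patients = 100130

remaining_patients = total_patients - deleted_patients
deleted_share = deleted_patients / total_patients

print(f"deleted: {deleted_share:.2%}, remaining: {remaining_patients}")
```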

Grammatical nitpick: the variable "die" should be renamed to "died".

Preliminary Observations:

For ease of review, "Table 1" should be placed right in this section,
as this is where the majority of the information is.

This is a simple yet brilliant way of immediately assessing the importance
and relevancy of each variable to the final outcome, and those variables
getting higher scores makes general sense as well.

Details of Analysis:

The methodology used is described in great detail, and the justification for
using the F1 score instead made sense.

However, with regard to the correlation between the amount of smoothing
(regularization?) and the F1 score, there seem to be diminishing returns
for the naive classifiers, so I would caution against making the statement:
"... it is likely to get more accurate prediction with larger lambda"
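
A small sketch of why returns diminish, assuming that lambda in the report denotes additive (Laplace/Lidstone) smoothing of the naive Bayes conditional probabilities. The counts below are made up for illustration and `smoothed_probability` is a hypothetical helper, not something from the report.

```python
def smoothed_probability(count, class_total, num_values, lam):
    """Additive smoothing: (count + lam) / (class_total + lam * num_values)."""
    return (count + lam) / (class_total + lam * num_values)


# Hypothetical feature with 4 possible values, observed 3 times among 50 patients of a class.
for lam in (0, 1, 10, 100, 1000):
    p = smoothed_probability(count=3, class_total=50, num_values=4, lam=lam)
    print(f"lambda={lam:>4}: P(value | class) = {p:.4f}")
```

As lambda grows, every estimate is pulled toward the uniform value 1/4 regardless of the observed counts, so beyond some point a larger lambda washes out the data rather than improving the prediction.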

Overall Comments:

The report is simple and straight to the point. I like how the methodology is
simple and not overly complicated for an initial preliminary analysis.

The authors have a clear idea of what their future plan should be and I am
excited to see what new results they will report in the near future.

Final Review -eee37

I found the project’s goal to be very practical since it has the potential to save people’s lives. I wonder if there was anything in common between the patients with missing values that were deleted from this experiment (e.g., maybe they came from the same hospital). I like that your report was very detailed. For example, you included a table with every feature and a reason for including or excluding it from the project; I agree with all of your reasoning. I liked that you used one-hot encoding in your feature transformation step. I also liked your approach to selecting a representative sample (i.e., selecting a county with a large number of observations and limiting the values of CCS Procedure Code). I liked how you tweaked the test error equation to better measure the quality of your model. I liked that you used your extended knowledge in statistics to apply methods outside the scope of the class, like random forest and neural network, and that you provided a description of each of these methods. I think that your report was very well organized. Your representation of your results was very concise and easy to read. I found the analysis of your results to be very helpful. I also liked that you acknowledged your resources. Great job!
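
For readers less familiar with the one-hot encoding step praised above, here is a minimal sketch. The category values and the helper `one_hot_encode` are hypothetical and are not taken from the team's code.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one column is set per row
        rows.append(row)
    return categories, rows


# Hypothetical categorical feature with three distinct values.
categories, rows = one_hot_encode(["Emergency", "Elective", "Urgent", "Emergency"])
print(categories)
for row in rows:
    print(row)
```

Each row contains a single 1, which is also why the resulting data matrix is sparse when a feature has many categories.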

Midterm review cfw56

Firstly, I'm a little confused by this report, as it's titled New York State Inpatients Mortality Prediction but it's in a file called Movie or Earthquakes.
I would go in and update your project name to minimize confusion. Overall, your midterm report is well researched and provides a lot of background on your thought process. You also make great use of tables and graphs to provide evidence for your decision making. I think it would be even better if you provided more information on each graph/table so that I know what to look for.
I'd also like a little more information on how you initially determined which variables were irrelevant, with a deeper dive into your thought process.
The overall structure of your report is well organized and the purpose of your project is very clear. To make the report flow a little better, I would suggest using paragraph form for your next steps section, as the list isn't explained and feels unfinished.
Overall, great job!

Final Report Review -- hd324

This project addresses an interesting topic, and it is meaningful in real-world application for saving people’s lives.
I liked your project in the following ways:

A clear presentation of your work through tables (feature engineering), also clear explanation of what models you have used in the report.
A clear explanation of assumptions and justifications.
Great work on result presentation and comparison.

A few things I would suggest to move forward:

Apart from the equations for each of the models, you may want to visualize some of the analysis for each model.
To be more practical, I would recommend considering some time dependency in the model, rather than relying only on supervised learning methods. Some unsupervised learning methods could also be included for dimensionality reduction and clustering.
Great work and good luck in moving forward.

Midterm Review wx44

Your report was really clear and to the point. I like how there's a lot of future work that you listed to improve on your findings and how you guys included graphs to show visuals from your data analysis. That said, I don't think the plots show all that much information about the question you guys are trying to answer; they only show the percentages of people with that feature.

I do believe there is a lot more variety of analysis that can be done aside from Naive Bayes. Did you guys try anything else? Running regression on your data might give some interesting results. Think about the advantages/disadvantages of both and see which models give you more accurate predictions.

Comments on proposal

I was surprised to hear that you have two different ideas first of all.

I thought the movie proposal was more interesting but I could see the Earthquake proposal being more meaningful. I like that you were able to identify a computational issue in structural analysis and determine that you could solve the problem with learning!

My concern with the movie proposal is that it might be overdone and the data might not be terribly big with only 5000 movies. It sounds messy because as you mentioned it has several different types of features that might require some feature engineering.

Final Report- mgd67

I am really impressed with how comprehensive your project is and how thoroughly you explored the problem of recommending hospitals and treatments. I think the team did a great job of providing a purpose for every action they took. For example, you had great justifications for how you cleaned the data (deleting rows and using one-hot encoding), how you selected features, and why you chose a regularizer in response to the sparsity of your data matrix. You also implemented a wide variety of supervised learning models for binary classification and did a good job of validating them through the use of F1 scoring, training/validation/testing sets, cross-validation, and bootstrapping methods.

I would also like to provide a few suggestions for your project. Moving forward, the team might need to refine its assumption about the self-containment of patients within counties. While in upstate New York, counties might be far apart and thus patients would only go to hospitals in their own county, in New York City, I feel like there is less of a barrier for a patient from Queens to go to a hospital in Manhattan. Also, for your presentation, it might help to include the names of diseases and treatments, not just their codes, to provide readers with a more intuitive sense of the recommendations the system makes. Nevertheless, you guys did great work this semester!

Final Review

Thank you for submitting your final project! You clearly contributed a lot of hard work and I enjoyed reading the result. Particularly, I really enjoyed your thorough justification of the decisions you made in data preprocessing and feature engineering. No assumptions or decisions were left unexplained, which was beneficial to me as a reader. Furthermore, your analyses were extremely well thought out and computationally robust. Lastly, the organization of your report made it very easy to follow.

In terms of areas for improvement, I think the report could have benefited from some more visuals. Although your explanations were very thorough, as mentioned, I think that some more graphs could have added clarity to your analysis. Also, I think your recommendation tool is somewhat limited by geography. In the more sparsely populated regions of NYS, I'm sure hospitals are spread farther apart and a choice of hospital is not really available. And lastly, although minor, I am curious about how you handled patients that were discharged to hospice care -- is this deemed as desirable as release home? Overall, you showed analytical excellence throughout your project, and I thoroughly enjoyed reading it. Great job, and good luck moving forward!

Midterm Report Review from jx255

The goal of this project is to predict whether an inpatient is likely to survive. The team used the naïve Bayes classification method to make predictions. In general, the team shows a deep and sophisticated understanding of the algorithm. It also has a clear and detailed plan for further development.

I am confused as to what the plots are in the report. Also, it would be great if you would include any plots for the correlation of the data set, demonstrating any relationships among these variables. For the data preprocessing part, the description of the original data could be more specific, which can be helpful for understanding feature selections.

After feature selection, the team could say more about model selection. For example, explain why you chose naïve Bayes classification instead of other classification models, and discuss the advantages and disadvantages of naïve Bayes relative to other classifiers.

Final Report Review--gad87

Overall, amazing work! You implemented a lot of sophisticated models to attack this big, messy problem and it's very impressive. I liked how you didn't just use the misclassification rate to determine how good the model was. Since you mentioned that the overall number of deaths was low (around 2% for Queens area patients, at least), the model could theoretically just predict that everyone lives, giving a misclassification rate of around 2%. However, in practice that would obviously never work, since we care more about preventing deaths.
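
The point about the misclassification rate can be sketched with a toy example. The 1000-patient sample below is made up to match the roughly 2% death rate mentioned in the report, and the helper functions are my own, not the team's.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken to be zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample: 20 deaths (label 1) among 1000 patients, i.e. 2%.
y_true = [1] * 20 + [0] * 980
# Trivial model that predicts every patient survives.
y_pred = [0] * 1000

baseline_accuracy = accuracy(y_true, y_pred)
baseline_f1 = f1_score(y_true, y_pred)
print(f"accuracy = {baseline_accuracy:.2f}, F1 = {baseline_f1:.2f}")
```

The trivial model looks excellent by accuracy yet identifies none of the deaths, which is the case for scoring with F1 on such imbalanced labels.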

One aspect of the problem that I didn't expect was the use of information from the initial diagnosis (or information that could be obtained from an initial diagnosis) at a clinic or local family doctor as the data. Using that puts more weight on the "diagnosis" part, as doctors in the hospital probably don't have time to redo all that work and just want to focus on disease treatment and patient recovery. In that case, the time between the diagnosis at the clinic and the patient checking into the hospital for treatment may also be a factor. Maybe this is already accounted for, but I don't recall it being explicitly mentioned. If there is a long period between diagnosis and treatment but the disease severity or type is not updated, that may skew the mortality rate, so that time period should be taken into account.

Potential Next Steps:
I think it would have been beneficial to compute the accuracy/error rates for other combinations of diseases instead of just the one listed in the results section, especially if it is not that much work to re-evaluate the training/test data on their model. There were some other wording/grammar issues throughout the paper but it did not detract too much from the content. In addition, it would be interesting to see if other linear algebra techniques can be used for dealing with such large and sparse matrices to eliminate noise and also improve computational time.

Also, for this specific problem about hospitals, it would be interesting to detect clustering between hospitals and treatments. For example, say there is a hospital that specializes in childhood diseases. The algorithm should then send a majority of the pediatric patients to this hospital by default, as the doctors there are more specialized in those diseases. The same applies to other diseases at other hospitals, since it makes more sense to assign the patient to the correct hospital initially than to move the surgeon/specialist around, especially halfway through a lengthy treatment!

Comments on project proposal

I really like the project proposals for the following 3 reasons:

It's concise, straightforward and well-written;
It contains technical detail that looks really interesting;
You included links to the dataset so that people learn more about the project or dataset if they want to.

However, I think there is still room for improvement:

It would be nicer if you could point out the value of this project to the company, since we are assuming we are writing this proposal to a manager;
Introducing a little more background would be helpful;
The formatting could probably look nicer, especially for the movie one. It would be great if you could include a title and list the team members' names in the proposal.
