In 2018, $3.6 trillion was spent on health care in the United States, representing billions of health insurance claims. It is an undisputed reality that some of these claims are fraudulent. Although they constitute only a small fraction of the total, fraudulent claims carry a very high price tag, both financially and in how they affect our perception of the integrity and value of our health care system.
The National Health Care Anti-Fraud Association (NHCAA) estimates that the financial losses due to health care fraud are in the tens of billions of dollars each year. A conservative estimate is 3% of total health care expenditures, while some government and law enforcement agencies place the loss as high as 10% of our annual health outlay, which could mean more than $300 billion.
Given the importance of this issue, the team set out to predict potentially fraudulent claims submitted by different providers.
The project relies on the following tools:

- NumPy - Numerical Computing Tool
- Pandas - Data Analysis and Manipulation Tool
- Scikit-learn - Machine Learning library
- Seaborn - Data Visualization
- Python version 3.0 and above

      $ python --version
      Python 3.7.4
- Jupyter notebook
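
As a quick sanity check before opening the notebooks, the listed libraries can be probed from Python. The snippet below is only a minimal sketch: the helper name `installed_versions` is hypothetical, and the import names (`numpy`, `pandas`, `sklearn`, `seaborn`) are assumed to correspond to the tools listed above.

```python
import importlib
import sys

# Import names assumed for the libraries listed above
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "seaborn"]


def installed_versions(module_names):
    """Map each module name to its reported version, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in installed_versions(REQUIRED_MODULES).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Any library reported as `not installed` would need to be installed before running the project.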