Git Product home page Git Product logo

drug-finder-tool's Introduction

MedFind

A Multi-Output Text Classifier Model

Background

The "Drug Finder" project addresses the challenge of efficiently and accurately extracting drug details, including name, uses, dosage, and side effects, from textual drug descriptions. Currently, healthcare professionals and patients face a time-consuming and error-prone process of manually searching for this information. This project aims to develop a multioutput classifier model, utilizing logistic regression and trained on a diverse dataset, to predict drug details from natural language descriptions. The model will be deployed as an API, integrated into a user-friendly web application, and hosted on a server, facilitating easy access. By providing a reliable tool for obtaining drug information, this project seeks to improve healthcare decision-making, reduce medication errors, and enhance patient care in the pharmaceutical domain. The project's success will be assessed based on prediction accuracy, usability, and user feedback.

Overview

The Multi-Output Text Classifier Model is an advanced machine learning solution designed for the simultaneous prediction of multiple output variables based on input text data. This versatile model is particularly valuable when tasks involve classifying a single input into multiple categories or labels. In this context, our model excels at predicting various drug-related details, such as drug names, uses, dosages, and side effects, based on input descriptions.

How It Operates

The Multi-Output Text Classifier Model harnesses the combined power of text processing techniques and machine learning algorithms to perform its predictions. Here's an overview of its key operational steps:

  1. Data Collection: Our primary dataset was meticulously sourced from 2 esteemed websites Data Source 1 and Data Source 2. The data collection process involved two pivotal stages:

    i) Drug URL Scraping: We systematically gathered drug URLs, along with their corresponding names and the files are available in the scraper folder.

    ii) Drug Details Scraping: Leveraging the acquired URLs, we extracted comprehensive drug descriptions, uses, dosages, and side effects. These scripts are also available in the scraper folder. The resulting data was then stored in data\drug_dataset_big.csv.

    In total, we amassed over 65k drug details through scraping, which, after thorough data cleaning, yielded a dataset of 45,898 drug entries, complete with detailed descriptions, uses, dosages, and side effects (with a file size of 78 MB).

  2. Data Preprocessing: We subjected the input text data (in this case, drug descriptions) to preprocessing. This encompassed tasks such as text normalization, handling missing values, and converting the data into a suitable format for machine learning.

  3. Feature Extraction: Text data underwent a transformation into numerical features. In our implementation, we employed the CountVectorizer technique to convert text descriptions into numerical feature vectors.

  4. Multi-Output Classification: At the core of the model lies a powerful multi-output classifier (MultiOutputClassifier). This classifier excels at predicting multiple target variables simultaneously. We leveraged algorithms like Logistic Regression and K-Nearest Neighbors, streamlining the entire process using pipelines. You can find the model files, "drug_finder_big_knn.joblib" in the "models" folder.

  5. Prediction and Evaluation: Once trained, the model is ready for making predictions. Given a new drug name, the model predicts uses and dosage for that particular drug. The model's predictive accuracy is assessed using appropriate metrics. Notably, the accuracy score reached 1.0. You can review the training code in the "pynotebook/drug_finder_kaggle.ipynb" file.

Use Cases

The Multi-Output Text Classifier Model finds applications in a wide range of domains where a single input text needs to be classified into multiple categories or labels. Our model serves diverse use cases, including aiding doctors in researching specific medications, providing knowledge to the general public about particular drugs, and more. However, it is essential to note that this model's predictions should not replace professional medical advice. Users should exercise caution and manually verify predicted outputs, especially if there are discrepancies for a specific input description.

How to Use

Using the Multi-Output Text Classifier Model is straightforward:

  1. Input: Simply provide a drug description that you wish to classify. For example: "Abilify is a generic medicine used to treat mood disorders, depression, and mental health conditions."

  2. Output: Receive predictions for drug name, uses, dosage, and side effects. An example output might include: Name: Abilify, Uses: Used to treat anxiety/depression..., Dosage: Your doctor may recommend a specific dosage..., Side effects: Side effects can range from mild to serious...

Dependencies

To utilize this model, you'll need:

  • Python 3.x
  • The following libraries: pandas, scikit-learn, joblib, gradio (for user interface)

drug-finder-tool's People

Contributors

aryanmurthy avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.