This project aims to detect fake news using machine learning techniques. It uses a dataset of news articles, each labeled as either real or fake.
The dataset used for this project can be found here. Each article comes with a label indicating whether it is real (class 1) or fake (class 0).
Before running the code, download the dataset and save it as News.csv in the project directory. The script reads this CSV file automatically and preprocesses the text data for analysis.
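The loading and preprocessing step can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the project's main.py: the column names `text` and `class`, the small `STOPWORDS` set (a stand-in for NLTK's English stopword list), the cleaning rule (lowercase, keep only runs of ASCII letters, drop stopwords), and the helpers `preprocess` and `load_news` are all hypothetical. Check the real News.csv header before reusing any of it.

```python
import csv
import io
import re

# Hypothetical stand-in for NLTK's English stopword list; the project itself
# downloads the real list with `python -m nltk.downloader stopwords`.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "was", "of",
             "to", "in", "on", "for", "that", "this", "it"}


def preprocess(text):
    """Lowercase, keep only runs of ASCII letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def load_news(fileobj, text_col="text", label_col="class"):
    """Return (cleaned_texts, labels); the column names are assumptions."""
    texts, labels = [], []
    for row in csv.DictReader(fileobj):
        texts.append(preprocess(row[text_col]))
        labels.append(int(row[label_col]))  # 1 = real, 0 = fake
    return texts, labels


# Tiny in-memory sample so the sketch runs without News.csv.
sample = io.StringIO(
    "text,class\n"
    "\"The Senate passed the bill on Tuesday.\",1\n"
    "\"SHOCKING: aliens run the city council!!!\",0\n"
)
texts, labels = load_news(sample)
print(texts)   # ['senate passed bill tuesday', 'shocking aliens run city council']
print(labels)  # [1, 0]
```

With the real file you would pass an open handle instead, e.g. `with open("News.csv", newline="", encoding="utf-8") as f: texts, labels = load_news(f)`. The project lists pandas as a dependency, so its own loader may well differ; the csv module is used here only to keep the sketch dependency-free.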
To run the project, follow these steps:
- Clone the repository:
  git clone <repository-url>
- Install the required dependencies:
  pip install pandas seaborn matplotlib tqdm nltk wordcloud scikit-learn
- Download the NLTK resources:
  python -m nltk.downloader punkt stopwords
- Run the main script:
  python main.py
Upon running the script, it performs the following tasks:
- Loads the dataset and preprocesses the text data.
- Generates word clouds for real and fake news articles.
- Displays a bar chart of the most frequent words.
- Splits the data into training and testing sets.
- Trains logistic regression and decision tree classifiers.
- Evaluates the performance of the classifiers using accuracy scores and confusion matrices.
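The evaluation step can be illustrated with a small, dependency-free sketch of the two metrics named above. It assumes the labels 0 (fake) and 1 (real) described for the dataset; `accuracy` and `build_confusion_matrix` are hypothetical helpers, and the predictions are made-up toy values, not results from this project. The matrix uses the same layout as scikit-learn's `confusion_matrix`: rows are true labels, columns are predicted labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def build_confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true labels, columns are predicted labels (0 = fake, 1 = real)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy labels and predictions, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))                # 0.6666666666666666
print(build_confusion_matrix(y_true, y_pred))  # [[2, 1], [1, 2]]
```

Since scikit-learn is already a listed dependency, real code would more likely call `sklearn.metrics.accuracy_score` and `sklearn.metrics.confusion_matrix`; the hand-written version only makes the arithmetic explicit.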