Git Product home page Git Product logo

finance-training-data-generator's Introduction

Price Data Generator

The Price Data Generator is a Python script designed to fetch, process, and store historical stock price data from the S&P 500 index. This data is primarily used for training Neural Networks. The script extracts daily opening, closing, highest, and lowest prices, along with dividends, stock splits, and trading volume for stocks listed on the S&P 500 within a specified time period.

Table of Contents

Features

  • Fetches historical price data for all tickers on the S&P 500.
  • Validates data integrity by ensuring the correct number of rows and columns in each CSV file, raising appropriate errors for any discrepancies.
  • Automatically removes unobtainable data (e.g., data for delisted tickers).
  • Normalizes data using MinMaxScaler to transform features by scaling each feature to a given range, typically between 0 and 1, which is essential for Neural Network training.
  • Stores output data in CSV format for easy integration with various data analysis tools and software.

Prerequisites

Before using this script, ensure the following are installed on your system:

Installation

  1. Install the required libraries using pip:

    pip install yfinance pandas pandas_market_calendars scikit-learn
  2. Clone or download this repository to your local machine.

Usage

  1. Modify the start_date and end_date variables to your desired time period (optional).

  2. Execute the script with the following command:

    python PriceDataGenerator.py
  3. The script fetches historical stock price data, performs data validation, normalization, and saves it to CSV files in a directory named TrainingData.

  4. When prompted, enter 'Y' (yes) or 'N' (no) to remove invalid tickers.

Errors and Exceptions

  • SampleSizeError: Raised if the number of rows or columns in the dataset does not match expected values, indicating data size inconsistencies.

  • InvalidDataError: Raised when data cannot be retrieved for a stock, such as when the symbol is delisted or there is no data available from the API.

Output

The script saves the generated price data for each stock in CSV format in the TrainingData directory. Each CSV file contains the following columns:

  • Open: Opening price of the stock on a given date.
  • High: Highest price reached by the stock on that date.
  • Low: Lowest price of the stock on that date.
  • Close: Closing price of the stock on that date.
  • Volume: Trading volume of the stock on that date.
  • Dividends: Dividends paid by the stock on that date (if any).
  • Stock Splits: Stock splits occurring on that date (if any).
  • Normalized Data: Data columns normalized using MinMaxScaler.

Notes

  • An active internet connection is required to fetch data from Yahoo Finance.
  • This script is optimized for weekly and daily revenue calculations. It can be modified to suit specific requirements.
  • Adhere to the terms of service of the data provider when using this data.

Disclaimer

This script is intended for educational and research purposes only and should not be used for actual trading or financial analysis. Verify the accuracy of the data before making any decisions based on it.

USE AT YOUR OWN RISK! The authors and contributors of this script are not responsible for any financial losses or damages resulting from its

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.