sec-web-scraper-13f's Introduction

Yale University CPSC 437 Database SEC Python Web Scraper

This repository contains a Python web scraper for parsing NPORT-P filings (fund holdings) from the SEC's website, EDGAR. We (Jason Wu, Michael Lewkowicz, and Kevin Zhang, Yale undergraduates) forked this repository from CodeWritingCow.

In this fork, we modified the original code to work with the new EDGAR website (as of December 5, 2020). These modifications were significant: compared side by side with the original repo, our scraper's code is almost completely different, though we reuse some of the original helper functions written by CodeWritingCow, which are stored in helper.py. We have also made the following modifications:

  • Exclusively target and scrape NPORT-P filings
  • Collect issuers, total value at the time of filing, number of shares, and other relevant data
  • Directly insert this data into a MySQL database instead of a TSV file
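
As a rough illustration of the data collection step listed above, here is a minimal sketch of pulling issuer, share count, and value out of holdings XML. It is not the code in scraper.py. The sample document is hypothetical and heavily simplified, and the element names (`invstOrSec`, `name`, `cusip`, `balance`, `valUSD`) and the namespace URI are assumptions about the NPORT-P layout that should be checked against a real filing. The standard library's `xml.etree.ElementTree` is used here so the sketch runs without extra packages; the project itself lists lxml and Beautiful Soup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample shaped like an NPORT-P holdings section.
# Element names and the namespace URI are assumptions for illustration.
SAMPLE_XML = """<edgarSubmission xmlns="http://www.sec.gov/edgar/nport">
  <formData>
    <invstOrSecs>
      <invstOrSec>
        <name>Example Corp</name>
        <cusip>000000000</cusip>
        <balance>1500</balance>
        <valUSD>98765.43</valUSD>
      </invstOrSec>
      <invstOrSec>
        <name>Sample Holdings Inc</name>
        <cusip>111111111</cusip>
        <balance>200.5</balance>
        <valUSD>1200.00</valUSD>
      </invstOrSec>
    </invstOrSecs>
  </formData>
</edgarSubmission>"""

# Prefix-to-URI map used by the namespaced lookups below.
NS = {"n": "http://www.sec.gov/edgar/nport"}


def parse_holdings(xml_text):
    """Return one dict per holding found in the XML text."""
    root = ET.fromstring(xml_text)
    holdings = []
    for sec in root.iterfind(".//n:invstOrSec", NS):
        holdings.append({
            "issuer": sec.findtext("n:name", default="", namespaces=NS),
            "cusip": sec.findtext("n:cusip", default="", namespaces=NS),
            # Missing numeric fields fall back to 0 rather than raising.
            "shares": float(sec.findtext("n:balance", default="0", namespaces=NS)),
            "value_usd": float(sec.findtext("n:valUSD", default="0", namespaces=NS)),
        })
    return holdings


if __name__ == "__main__":
    for row in parse_holdings(SAMPLE_XML):
        print(row)
```

Returning plain dictionaries keeps the parsing step independent of where the rows end up, whether that is a TSV file or a database table.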

Note that the documentation is a mix of the original documentation by Gary Pang (CodeWritingCow) and new documentation we have written.

Requirements

Getting Started

  • Make sure you have pipenv set up on your machine.
  • Edit the contents of db.py to match the database you are trying to connect to.
  • Run pipenv install.
  • Run python scraper.py within a pipenv shell (or pipenv run python scraper.py).
  • When prompted, enter the 10-digit CIK number of a mutual fund.
  • Happy investing! ❤️ 💵 💰
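
The CIK prompt in the steps above can be made more forgiving with a small validation helper. The sketch below is an assumption, not part of scraper.py: `normalize_cik` and `filing_index_url` are hypothetical names, and it assumes that shorter CIKs should be left-padded with zeros to 10 digits. The URL uses EDGAR's `browse-edgar` company search endpoint; whether the scraper requests this exact URL is also an assumption.

```python
from urllib.parse import urlencode


def normalize_cik(raw):
    """Validate user input and left-pad it with zeros to 10 digits."""
    digits = raw.strip()
    # Reject empty input, non-digit characters, and over-long values.
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {raw!r}")
    return digits.zfill(10)


def filing_index_url(cik, form_type="NPORT-P"):
    """Build a company-search URL filtered to one form type."""
    query = urlencode({
        "action": "getcompany",
        "CIK": normalize_cik(cik),
        "type": form_type,
    })
    return "https://www.sec.gov/cgi-bin/browse-edgar?" + query


if __name__ == "__main__":
    print(filing_index_url(input("Enter a CIK: ")))
```

Validating before any request is made means a typo fails immediately with a clear message instead of producing an empty search result.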

Key Dependencies

  • Requests, Python library for making HTTP requests
  • lxml, Python library for processing XML and HTML
  • Beautiful Soup, Python library for scraping information from Web pages
  • re, Python module for using regular expressions
  • MySQL Connector/Python, Python library for connecting to a MySQL database
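
To show how scraped rows might reach MySQL through the connector listed above, here is a minimal sketch of a parameterized bulk insert. The table name `holdings`, its columns, and the function names are hypothetical; the actual schema and connection settings live in db.py and may differ. The `%s` placeholders are the parameter style MySQL Connector/Python uses, and `conn` is expected to be a connection such as the one returned by `mysql.connector.connect(...)`.

```python
# Hypothetical table and column names; adjust to the schema used in db.py.
INSERT_SQL = (
    "INSERT INTO holdings (cik, issuer, shares, value_usd) "
    "VALUES (%s, %s, %s, %s)"
)


def to_params(cik, holdings):
    """Turn holding dicts into tuples matching INSERT_SQL's placeholders."""
    return [(cik, h["issuer"], h["shares"], h["value_usd"]) for h in holdings]


def insert_holdings(conn, cik, holdings):
    """Insert all holdings for one fund, then commit."""
    cursor = conn.cursor()
    try:
        # Values are passed separately from the SQL text, so the driver
        # handles quoting and escaping.
        cursor.executemany(INSERT_SQL, to_params(cik, holdings))
        conn.commit()
    finally:
        cursor.close()
```

Passing values as parameters rather than formatting them into the SQL string avoids quoting bugs with issuer names that contain apostrophes or commas.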

Contributor

References
