sec_edgar_cusip_cik_mapping's Introduction

This is the code I use to create an NCUSIP (historical CUSIP) to CIK (historical CIK) mapping.

Why do I need this?

A referee asked us to incorporate historical headquarters location data. Note that Compustat only provides CURRENT CUSIPs, current CIKs, and current headquarters.

What are the alternatives?

  1. WRDS has a (historical) CIK-CUSIP (or even GVKEY-CIK) link table, but business schools (at least in the UK) seldom subscribe to it.
  2. This code is inspired by Leo Liu's CIK-CUSIP mapping, and I also borrow some code from him. Thanks. https://github.com/leoliu0/cik-cusip-mapping You can also find a CSV file on Liu's webpage, but the date stamps are removed, which makes that mapping insufficient for my purpose.
  3. There is a paper that studies this issue and creates a mapping. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3530613 However, I could not find their code or mapping file. It is still useful for learning more about the issue.

So what exactly is the issue?

  1. Compustat only provides CURRENT CUSIPs, current CIKs, and current headquarters.
  2. Historical headquarters data is available here, but the historical CIK is the only identifier. https://sraf.nd.edu/sec-edgar-data/10-x-header-data/
  3. Therefore, we need to build a link from Compustat GVKEY/CUSIP (current CUSIP) to historical CUSIP (e.g., CRSP NCUSIP) to historical CIK to historical HQ locations.
  4. (WRDS has a (historical) CIK-CUSIP (or even GVKEY-CIK) link table.)
  5. Manually, we can use 13D and 13G filings to collect the mapping, as this is also how WRDS creates it. WRDS: "This web query provides the historical link between a company's CIK and GVKEY. We create this link by first getting the CUSIP from a company's Schedule 13D/G. We then use the CUSIP to link the CIK from the header of the Schedule 13D/G to GVKEY in the Compustat tables."

What this code does

  1. Get the list of all filings from SEC EDGAR. => "Download_SEC_File_List_CIK.py"
  2. Select only 13D and 13G filings. => "Download_SEC_13D13G_CUSIP-CIK_Mapping.py"
  3. Use a regular expression (RE) to obtain the CUSIP and CIK from each 13D/13G. Each filing gives one pair, and all pairs together make the mapping. => "Download_SEC_13D13G_CUSIP-CIK_Mapping.py"
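
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration and not the code in "Download_SEC_13D13G_CUSIP-CIK_Mapping.py": the function names, both regular expressions, and the sample filing text are assumptions made for this example. It assumes the subject company's CIK follows "CENTRAL INDEX KEY:" under "SUBJECT COMPANY:" in the filing header, and that the CUSIP appears as 9 characters, possibly split by spaces or hyphens. As an extra filter that the original scripts may or may not apply, candidates are kept only if the standard CUSIP check digit is valid.

```python
import re

# Form types kept in step 2. Amendments (SC 13D/A, SC 13G/A) are skipped.
# The exact form-type strings are an assumption.
KEEP_FORMS = {"SC 13D", "SC 13G"}

# Hypothetical patterns, not taken from the repository's scripts.
SUBJECT_CIK_RE = re.compile(
    r"SUBJECT COMPANY:.*?CENTRAL INDEX KEY:\s*(\d{1,10})", re.DOTALL
)
# 6 issuer characters, 2 issue characters, 1 check digit, optional separators.
CUSIP_RE = re.compile(r"\b[0-9A-Z]{6}[ -]?[0-9A-Z]{2}[ -]?[0-9]\b")


def is_target_form(form_type: str) -> bool:
    """True for original 13D/13G filings, False for amendments and others."""
    return form_type.strip().upper() in KEEP_FORMS


def cusip_check_digit(base8: str) -> int:
    """Compute the CUSIP check digit from the first 8 characters."""
    total = 0
    for i, ch in enumerate(base8):
        value = int(ch) if ch.isdigit() else ord(ch) - ord("A") + 10
        if i % 2 == 1:  # every second character is doubled
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10


def extract_pair(text: str, max_lines: int = 200):
    """Return (cik, cusip) from the first `max_lines` lines, or None."""
    head = "\n".join(text.splitlines()[:max_lines])
    cik_match = SUBJECT_CIK_RE.search(head)
    if cik_match is None:
        return None
    for match in CUSIP_RE.finditer(head):
        cusip = re.sub(r"[ -]", "", match.group(0))
        if cusip_check_digit(cusip[:8]) == int(cusip[8]):
            return int(cik_match.group(1)), cusip
    return None


# Invented sample text that imitates a 13G header; the filer is made up.
SAMPLE = """\
<SEC-HEADER>
CONFORMED SUBMISSION TYPE:	SC 13G
FILED AS OF DATE:		20230214
SUBJECT COMPANY:
	COMPANY DATA:
		COMPANY CONFORMED NAME:	APPLE INC
		CENTRAL INDEX KEY:	0000320193
FILED BY:
	COMPANY DATA:
		COMPANY CONFORMED NAME:	EXAMPLE HOLDER LLC
		CENTRAL INDEX KEY:	0000000001
</SEC-HEADER>
Common Stock
037833 10 0
(CUSIP Number)
"""

if __name__ == "__main__":
    print(extract_pair(SAMPLE))
```

The check digit does not make the match fully safe, since an unrelated 9-digit number can pass it by chance, so the resulting pairs would still need some cleaning.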

Some issues with this code

  1. The code takes a long time to run.

    1.1 To reduce the run time, I only use the first 200 lines of each report and drop all 13D/A and 13G/A filings (amendments).

    1.2 It still takes more than 10 hours to run locally on two laptops (one M2 MacBook Pro and one AMD R9 Windows machine with 16 GB of memory).

  2. I see some missing values in the output, but this is not a big issue for me at the moment. I will fix it later.

  3. The code is difficult to deploy on a server.

    3.1 I tried to deploy it on AWS EC2, but every attempt failed. It seems EDGAR detects this and blocks the scraping. I will try to fix this later.

  4. When using the mapping, note that the dates are not continuous, because the only dates available are those on which a 13D or 13G was filed. Filling in the gaps is therefore necessary.
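
The filling step in point 4 could look like the sketch below. It reflects only one possible reading of "filling": for a given CUSIP and date, carry forward the CIK from the most recent 13D/13G filing on or before that date. The function names and the sample observations (CUSIPs, dates, CIKs) are invented for illustration and are not part of this repository.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def build_index(observations):
    """Group (cusip, filing_date, cik) rows by CUSIP, sorted by filing date."""
    by_cusip = defaultdict(list)
    for cusip, filed, cik in observations:
        by_cusip[cusip].append((filed, cik))
    index = {}
    for cusip, rows in by_cusip.items():
        rows.sort()
        index[cusip] = ([d for d, _ in rows], [c for _, c in rows])
    return index


def lookup_cik(index, cusip, as_of):
    """Return the CIK from the latest filing on or before `as_of`, else None."""
    if cusip not in index:
        return None
    dates, ciks = index[cusip]
    pos = bisect_right(dates, as_of)
    return ciks[pos - 1] if pos else None


# Made-up observations: one CUSIP whose CIK changes between two filings.
OBSERVATIONS = [
    ("AAAAAAAA0", date(2010, 2, 12), 1002),
    ("AAAAAAAA0", date(2005, 2, 14), 1001),
    ("BBBBBBBB0", date(2008, 6, 30), 2001),
]

if __name__ == "__main__":
    idx = build_index(OBSERVATIONS)
    print(lookup_cik(idx, "AAAAAAAA0", date(2007, 12, 31)))
```

Dates before the first observed filing return None here. Whether to backfill them with the earliest CIK is a research design choice that this sketch leaves open.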
