Hey @seanroades, thanks for reaching out!
Note
For the most up-to-date version of this answer, check out https://github.com/Elijas/sec-parser-exploration/blob/main/02_other_sec_form_types.ipynb
How do I parse other form types with sec-parser, such as 10-K, 8-K, S-1, etc.?
As of April 2023, sec-parser can parse SEC form types beyond 10-Q, such as 10-K, 8-K, and S-1. In general, it can parse any form that consists of structured text.
However, identifying top-level section types is currently supported only for 10-Q.
This gives us two methods to parse other form types with sec-parser:
- Ignore the warnings from the parsing step that identifies the 10-Q top-level section types
- Skip the parsing step that identifies the 10-Q top-level section types
Method 1: Ignore warnings from the parsing step that identifies the 10-Q top-level section types
import warnings

from sec_downloader import Downloader
import sec_parser as sp

# The email address below is a placeholder; use your own contact details.
dl = Downloader("MyCompanyName", "email@example.com")
html = dl.get_filing_html(ticker="AAPL", form="10-K")

parser = sp.Edgar10QParser()
with warnings.catch_warnings():
    # Silence only the warnings raised for non-10-Q top-level sections
    warnings.filterwarnings("ignore", message="Invalid section type for")
    elements: list = parser.parse(html)

tree: sp.SemanticTree = sp.TreeBuilder().build(elements)

demo_output_1: str = sp.render(elements)
demo_output_2: str = sp.render(tree)
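To see what the `message` filter in Method 1 does and does not hide, here is a minimal standard-library sketch. The `fake_parse` function and both warning texts are hypothetical stand-ins, not part of sec-parser. The point is only that `warnings.filterwarnings` treats `message` as a regular expression matched against the start of the warning text, so unrelated warnings still get through.

```python
import warnings


def fake_parse() -> list:
    # Hypothetical stand-in for parser.parse(html): emits one warning we want
    # to silence and one unrelated warning we still want to see.
    warnings.warn("Invalid section type for part2item5")
    warnings.warn("Something else went wrong")
    return ["element"]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default
    # Inserted in front of the "always" rule, so it is checked first.
    warnings.filterwarnings("ignore", message="Invalid section type for")
    elements = fake_parse()

remaining = [str(w.message) for w in caught]
print(remaining)
```

Only the unrelated warning should remain in `remaining`, which suggests Method 1 is a narrow filter rather than a blanket silencing of parser warnings.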
Method 2: Skip the parsing steps related to the 10-Q top-level section types
from sec_downloader import Downloader
import sec_parser as sp
from sec_parser.processing_steps import TopSectionManagerFor10Q, IndividualSemanticElementExtractor, TopSectionTitleCheck

# The email address below is a placeholder; use your own contact details.
dl = Downloader("MyCompanyName", "email@example.com")
html = dl.get_filing_html(ticker="AAPL", form="10-K")


def without_10q_related_steps():
    all_steps = sp.Edgar10QParser().get_default_steps()

    # Change 1: Remove the TopSectionManagerFor10Q
    steps_without_top_section_manager = [step for step in all_steps if not isinstance(step, TopSectionManagerFor10Q)]

    # Change 2: Replace the IndividualSemanticElementExtractor with a new one that has the top section checks removed
    def get_checks_without_top_section_title_check():
        all_checks = sp.Edgar10QParser().get_default_single_element_checks()
        return [check for check in all_checks if not isinstance(check, TopSectionTitleCheck)]

    return [
        IndividualSemanticElementExtractor(get_checks=get_checks_without_top_section_title_check)
        if isinstance(step, IndividualSemanticElementExtractor)
        else step
        for step in steps_without_top_section_manager
    ]


parser = sp.Edgar10QParser(get_steps=without_10q_related_steps)
elements: list = parser.parse(html)

tree: sp.SemanticTree = sp.TreeBuilder().build(elements)

demo_output_1: str = sp.render(elements)
demo_output_2: str = sp.render(tree)
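The filter-and-replace pattern in Method 2 can be easier to follow in isolation. The sketch below uses hypothetical stand-in classes (`TopSectionStep`, `ExtractorStep`, `TitleCheck`, and so on) instead of the real sec-parser ones, so it runs without the library. It only illustrates the list manipulation, not how sec-parser behaves internally.

```python
from typing import Callable


class TopSectionStep:  # hypothetical stand-in for TopSectionManagerFor10Q
    pass


class TitleCheck:  # hypothetical stand-in for TopSectionTitleCheck
    pass


class TableCheck:  # hypothetical stand-in for any other single-element check
    pass


class ExtractorStep:  # hypothetical stand-in for IndividualSemanticElementExtractor
    def __init__(self, get_checks: Callable[[], list]):
        self.checks = get_checks()


class TextStep:  # hypothetical stand-in for any unrelated step
    pass


def default_checks() -> list:
    return [TitleCheck(), TableCheck()]


def default_steps() -> list:
    return [TopSectionStep(), ExtractorStep(default_checks), TextStep()]


def checks_without_title() -> list:
    # Drop only the 10-Q-specific check, keep everything else
    return [c for c in default_checks() if not isinstance(c, TitleCheck)]


def steps_without_10q() -> list:
    # Change 1: drop the 10-Q-specific step
    kept = [s for s in default_steps() if not isinstance(s, TopSectionStep)]
    # Change 2: rebuild the extractor with the reduced list of checks
    return [
        ExtractorStep(get_checks=checks_without_title) if isinstance(s, ExtractorStep) else s
        for s in kept
    ]


steps = steps_without_10q()
step_names = [type(s).__name__ for s in steps]
check_names = [type(c).__name__ for c in steps[0].checks]
print(step_names)
print(check_names)
```

Passing a function (rather than a list) for the steps and checks means each parser builds fresh step instances, which is presumably why the real API takes `get_steps` and `get_checks` callables.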
Feedback
Let us know if it helped or if you have any further questions!
Thanks,
Elijas