
openedgar's Introduction



OpenEDGAR is a comprehensive framework for building databases from EDGAR, and can automate the retrieval and parsing of EDGAR forms. OpenEDGAR uses the same software that powers many of our data products, including the LexPredict Agreement Database.

As with our pioneering ContraxSuite contract analytics platform, OpenEDGAR is open source. OpenEDGAR can be used freely under the MIT License.

How to use OpenEDGAR

Related Information

Licensing, Support, and Customization

OpenEDGAR is available under a simple, permissive MIT license. If your organization would like to discuss alternative licensing, requires support, or is interested in customization, please contact us at [email protected].

Releases

  • 1.0.0: May 2018 - First public release

openedgar's People

Contributors

andreycorelli, lrparser, mjbommar, test0076


openedgar's Issues

Package requirement conflicts with lexnlp 0.1.8

Hi,

I encountered at least two Python package version conflicts when trying to set this up.

Based on the code here, the required lexnlp version is 0.1.8:

https://github.com/LexPredict/lexpredict-lexnlp/archive/0.1.8.zip

This conflicts with the pinned versions of pandas and requests. For example, openedgar wants pandas 0.22, but lexnlp needs 0.21.

Can we use the latest lexnlp 1.8.0 with this release? Or should we stick with what 0.1.8 needs?

Thanks.
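
Conflicts like the pandas one described in this issue can be spotted before installation by comparing the pinned versions in two requirement lists. The following is a minimal sketch and is not part of OpenEDGAR or LexNLP. The helper names `parse_pins` and `find_conflicts` are hypothetical, and the pins only restate the pandas versions quoted above.

```python
import re


def parse_pins(lines):
    """Map package name -> pinned version for lines of the form 'name==version'."""
    pins = {}
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*==\s*([^\s;]+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_conflicts(pins_a, pins_b):
    """Return {name: (version_a, version_b)} for packages pinned differently."""
    shared = sorted(pins_a.keys() & pins_b.keys())
    return {
        name: (pins_a[name], pins_b[name])
        for name in shared
        if pins_a[name] != pins_b[name]
    }


# Illustrative pins only: the pandas versions are the ones quoted in the issue.
openedgar_pins = parse_pins(["pandas==0.22"])
lexnlp_pins = parse_pins(["pandas==0.21"])
conflicts = find_conflicts(openedgar_pins, lexnlp_pins)
print(conflicts)
```

In practice the two lists would be read from each project's requirements file. The sketch only handles exact `==` pins and ignores ranges, extras, and environment markers.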

Migrations not version controlled?

Hi,
Just a quick question: why does the installation procedure recommend making the migrations during the install? I have usually seen these files kept in version control, and leaving them out can make things a bit harder. If there isn't a particular reason, I'm happy to submit a PR that adds them!

Many thanks
Dom

What's expected as outcome of processing?

Hi,

I installed it in local mode per the instructions, downloaded the filing index for 2018, and ran process_all_filing_index(year=2018, form_type_list=["10-Q"]).

Celery picked up the tasks, and after some time I ended up with a lot of txt content (it looks like a mixture of text and HTML) in the edgar/data folder, plus records in _companyinfo, _filing, and _filingdata. But there is no actual content broken down into sections or individual pieces. Is this the expected outcome? Do I need to do additional processing to extract the actual content?

Also, is the Django app just a skeleton that is not supposed to do anything other than user registration and login/logout?

thanks!

Failure of Installation

  1. Update .env file. For local testing (downloading files locally, instead of to S3), set CLIENT_TYPE to LOCAL and DOWNLOAD_PATH to a local path

....

b. Update DATABASE_URL

c. Update CELERY_BROKER_URL

What do b and c mean?
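
On the question about steps b and c: variables with these names usually hold connection URLs, one for the database and one for the Celery message broker. The sketch below assumes the common `scheme://user:password@host:port/name` layout. `describe_url` is a hypothetical helper, and both example URLs are made up, including the passwords and the broker scheme. Only the host `localhost`, port 5432, and user `openedgar` echo the error output quoted later in this issue.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a connection URL into the parts a database or broker client needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # For a database URL this is the database name;
        # for a broker URL it is typically a virtual host or index.
        "name": parts.path.lstrip("/"),
    }


# Hypothetical values for illustration only.
db = describe_url("postgres://openedgar:secret@localhost:5432/openedgar")
broker = describe_url("amqp://openedgar:secret@localhost:5672/openedgar")

print(db)
print(broker)
```

If the user, password, or database name in the URL does not match what the database server was set up with, the connection will be rejected even though the URL itself parses cleanly.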

I then finished setting up the S3 bucket and IAM and made the updates. When I try
d. $ python manage.py migrate

I get the following error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "manage.py", line 29, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 79, in handle
    executor = MigrationExecutor(connection, self.migration_progress_callback)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/migrations/executor.py", line 18, in __init__
    self.loader = MigrationLoader(self.connection)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/migrations/loader.py", line 49, in __init__
    self.build_graph()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/migrations/loader.py", line 206, in build_graph
    self.applied_migrations = recorder.applied_migrations()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/migrations/recorder.py", line 61, in applied_migrations
    if self.has_table():
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/migrations/recorder.py", line 44, in has_table
    return self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor())
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/backends/base/base.py", line 255, in cursor
    return self._cursor()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/backends/base/base.py", line 232, in _cursor
    self.ensure_connection()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/backends/base/base.py", line 216, in ensure_connection
    self.connect()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/backends/base/base.py", line 216, in ensure_connection
    self.connect()
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/backends/base/base.py", line 194, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/opt/openedgar/env/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 168, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/opt/openedgar/env/lib/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (::1) and accepting
    TCP/IP connections on port 5432?
FATAL: password authentication failed for user "openedgar"
FATAL: password authentication failed for user "openedgar"
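
The error output mixes two different symptoms: a refused connection on the IPv6 address (::1) and a failed password check for user "openedgar". One way to separate them is to test plain TCP reachability for every address the host name resolves to. The following is a minimal sketch using only the Python standard library; `probe_tcp` is a hypothetical helper, and it says nothing about credentials.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Try each address that `host` resolves to; return {address: reachable}."""
    results = {}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results[address] = True
        except OSError:
            # Covers refused connections, timeouts, and unsupported families.
            results[address] = False
    return results


if __name__ == "__main__":
    # Host and port are the ones named in the error message above.
    for address, reachable in probe_tcp("localhost", 5432).items():
        print(address, "reachable" if reachable else "not reachable")
```

If at least one address is reachable, the server is listening there, and the remaining problem is more likely the user name or password in the connection settings than the network.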

Parsed Documents Have XML Tags

I notice that a lot of my parsed documents (from the documents/text folder) still have XML tags in them, for instance:

<Report instance="photozou-20161130.xml">
  <IsDefault>false</IsDefault>
  <HasEmbeddedReports>false</HasEmbeddedReports>
  <HtmlFileName>R11.htm</HtmlFileName>
  <LongName>00000011 - Disclosure - NOTE 5 INCOME TAXES</LongName>
  <ReportType>Sheet</ReportType>
  <Role>http://photozou.com/role/Note5IncomeTaxes</Role>
  <ShortName>NOTE 5 INCOME TAXES</ShortName>
  <MenuCategory>Notes</MenuCategory>
  <Position>11</Position>
</Report>

This hurts my attempts to train a word embedding model. Any tips on how to avoid this? I definitely see Tika working hard, but I still see a lot of XML in my processed results. Is this expected?
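
One possible workaround, independent of how OpenEDGAR or Tika behave, is to filter the text files before training: skip files that are mostly markup and strip stray tags from the rest. The sketch below is a heuristic only. The helper names `markup_ratio` and `clean_for_embedding` and the 0.3 threshold are assumptions made for illustration, not anything OpenEDGAR provides.

```python
import re

# Matches anything that looks like a tag: '<', no nested angle brackets, '>'.
TAG_RE = re.compile(r"<[^<>]+>")


def markup_ratio(text):
    """Fraction of characters that sit inside <...> tags."""
    if not text:
        return 0.0
    tagged = sum(len(match.group(0)) for match in TAG_RE.finditer(text))
    return tagged / len(text)


def clean_for_embedding(text, max_ratio=0.3):
    """Return None for markup-dominated documents, else the text without tags."""
    if markup_ratio(text) > max_ratio:
        return None
    without_tags = TAG_RE.sub(" ", text)
    # Collapse the whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", without_tags).strip()


xml_sample = (
    '<Report instance="photozou-20161130.xml">\n'
    "  <IsDefault>false</IsDefault>\n"
    "</Report>"
)
prose_sample = "Income taxes were <b>not</b> material."

print(clean_for_embedding(xml_sample))
print(clean_for_embedding(prose_sample))
```

The regular expression can also match plain text that happens to contain both `<` and `>`, such as inequalities, so the threshold may need tuning against a sample of real filings.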
