patentpublicdata's Introduction

Patent Public Bulk Files

Toolkit to download, read, and use open patent data provided to the public.

Notice

This source code is a work in progress and has not been fully vetted for a production environment.

Two main modules

  • Bulk Downloader automates downloading of public bulk patent data
  • Patent Document iterates over and reads patents directly from the large bulk download files. It supports patent documents from 1976 to the present, covering the Greenbook, SGML, PAP, and all Redbook XML formats, and reads them into a normalized Patent Object Model.
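
Reading documents directly from a bulk file can be pictured with the following minimal sketch. It is not this project's implementation. It assumes the bulk file is a concatenation of XML documents, each starting with its own `<?xml` declaration on a separate line; the function name `iter_documents` and the sample element names are illustrative only.

```python
import io

XML_DECL = "<?xml"


def iter_documents(stream):
    """Yield each XML document from a stream of concatenated documents.

    Assumption: every document begins with an XML declaration on its own line.
    """
    buffer = []
    for line in stream:
        # A new declaration marks the start of the next document,
        # so flush whatever has been collected so far.
        if line.lstrip().startswith(XML_DECL) and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        yield "".join(buffer)


# Tiny in-memory stand-in for a bulk file holding two documents.
sample = io.StringIO(
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><doc-number>1</doc-number></us-patent-grant>\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><doc-number>2</doc-number></us-patent-grant>\n"
)
docs = list(iter_documents(sample))
print(len(docs))
```

Because the function is a generator, only one document is held in memory at a time, which matters when a single bulk file contains thousands of patents.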

Features

  • Download bulk patent grants and applications, as well as additional resources
  • View individual patent documents directly from the large bulk files
  • Read patent documents directly from the large bulk files into a normalized Patent Object Model; documents from 1976 to the present are supported (formats: Greenbook, SGML, PAP, Redbook XML)
  • Extract patent documents from bulk files
  • Normalize and transform patent data before loading into a data resource
  • Generate Company Synonyms by removing prefixes and suffixes, e.g. (Corp.|Corporation|Co.|Company|...)
  • Extract US patent IDs from NPL (non-patent literature) citations
  • Patent Claim Tree to facilitate analysis
  • Update Classifications from Master CPC File (current CPC classification for patents starting from patent number 1)
  • Include classification definitions from CPC Scheme
  • Build a corpus with Corpus Builder, which downloads and extracts patents and applications matching specified classifications, one bulk file at a time, over a date range.
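
Two of the features above, Company Synonyms and the Patent Claim Tree, can be illustrated with a short sketch. This is not the project's code. The suffix list contains only the four examples named above, only trailing suffixes are handled, and the names `company_synonym` and `claim_tree` are hypothetical. The claim tree assumes a dependent claim names its parent with the phrase "claim N" and uses only the first such reference.

```python
import re
from collections import defaultdict

# Only the suffixes listed in the feature description; a real list is longer.
_SUFFIX = re.compile(r"[\s,]+(?:Corporation|Corp\.?|Company|Co\.?)$", re.IGNORECASE)

# First reference to another claim, e.g. "The widget of claim 1, wherein ...".
_CLAIM_REF = re.compile(r"\bclaim\s+(\d+)", re.IGNORECASE)


def company_synonym(name):
    """Strip one trailing corporate suffix to get a shorter synonym."""
    return _SUFFIX.sub("", name.strip())


def claim_tree(claims):
    """Map each parent claim number to the claims that depend on it.

    `claims` maps claim number to claim text. Claims that reference no
    other claim are treated as independent and grouped under key 0.
    """
    children = defaultdict(list)
    for number in sorted(claims):
        match = _CLAIM_REF.search(claims[number])
        parent = int(match.group(1)) if match else 0
        children[parent].append(number)
    return dict(children)


claims = {
    1: "A widget comprising a frame.",
    2: "The widget of claim 1, wherein the frame is steel.",
    3: "The widget of claim 2, further comprising a handle.",
    4: "A method of making a widget.",
}
tree = claim_tree(claims)
print(company_synonym("Acme Corp."))  # Acme
print(tree)
```

In this example claims 1 and 4 are independent, claim 2 depends on claim 1, and claim 3 depends on claim 2. Multiple-dependent claims ("claims 1 or 2") would need a richer pattern than the one assumed here.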

Public Patent Data

  • Rate of release: every Tuesday a new bulk file is released, containing roughly two to five thousand patents granted that same day.
  • Releases are available on both the USPTO Bulkdata and Reedtech websites.
  • Changes after publication: bulk files are not updated once published. Updates can be obtained by indexing additional supplemental files, which are also publicly available. The following fields are updated periodically after publication:
    Field            Update available
    Assignee         Daily, within Patent Assignment XML Dump files
    Classifications  Monthly, within Master Classification File Dumps
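
Since a new bulk file appears every Tuesday, the release dates that fall inside a date range can be enumerated up front. The sketch below is only an illustration under that assumption; it ignores any schedule exceptions, and the function name `release_dates` is hypothetical rather than part of this tool kit.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbers Monday as 0, so Tuesday is 1


def release_dates(start, end):
    """Return every Tuesday from start to end, inclusive."""
    # Days to add to reach the first Tuesday on or after `start`.
    offset = (TUESDAY - start.weekday()) % 7
    current = start + timedelta(days=offset)
    dates = []
    while current <= end:
        dates.append(current)
        current += timedelta(days=7)
    return dates


dates = release_dates(date(2024, 1, 1), date(2024, 1, 31))
for d in dates:
    print(d.isoformat())
```

For January 2024 this yields the 2nd, 9th, 16th, 23rd, and 30th, so a run over that month would expect five weekly bulk files.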

Other Information

The United States Department of Commerce (DOC) and the United States Patent and Trademark Office (USPTO) GitHub project code is provided on an ‘as is’ basis without any warranty of any kind, either expressed, implied or statutory, including but not limited to any warranty that the subject software will conform to specifications, any implied warranties of merchantability, fitness for a particular purpose, or freedom from infringement, or any warranty that the documentation, if provided, will conform to the subject software. DOC and USPTO disclaim all warranties and liabilities regarding third party software, if present in the original software, and distribute it as is. The user or recipient assumes responsibility for its use. DOC and USPTO have relinquished control of the information and no longer have responsibility to protect the integrity, confidentiality, or availability of the information.

User and recipient agree to waive any and all claims against the United States Government, its contractors and subcontractors as well as any prior recipient, if any. If user or recipient’s use of the subject software results in any liabilities, demands, damages, expenses or losses arising from such use, including any damages from products based on, or resulting from recipient’s use of the subject software, user or recipient shall indemnify and hold harmless the United States government, its contractors and subcontractors as well as any prior recipient, if any, to the extent permitted by law. User or recipient’s sole remedy for any such matter shall be immediate termination of the agreement. This agreement shall be subject to United States federal law for all purposes including but not limited to the validity of the readme or license files, the meaning of the provisions and rights and the obligations and remedies of the parties. Any claims against DOC or USPTO stemming from the use of its GitHub project will be governed by all applicable Federal law. “User” or “Recipient” means anyone who acquires or utilizes the subject code, including all contributors. “Contributors” means any entity that makes a modification.

This agreement or any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not in any manner constitute or imply their endorsement, recommendation or favoring by DOC or the USPTO, nor does it constitute an endorsement by DOC or USPTO or any prior recipient of any results, resulting designs, hardware, software products or any other applications resulting from the use of the subject software. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, including USPTO, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC, USPTO or the United States Government.



CC0
To the extent possible under law, https://github.com/USPTO/PatentPublicData has waived all copyright and related or neighboring rights to Patent Public Data. This work is published from: United States.

patentpublicdata's People

Contributors

bgfeldm, figyelmesi, maduraimad, mustberuss
