WorldCat Scraper

Scrapes book records from WorldCat and stores the results in a SQLite database.

Usage

```
$ scrapy crawl worldcat
```
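The README does not show how crawled items end up in the database. The sketch below shows one way to do it, and it is not the project's actual code. It assumes a Scrapy item pipeline (the class name `SQLiteBooksPipeline` is hypothetical) and a `books(oclc_id, data)` table in which `data` holds the rest of the record as JSON, matching the columns the query below reads. The item field name `oclc_id` is also an assumption.

```python
import json
import sqlite3


class SQLiteBooksPipeline:
    """Hypothetical Scrapy item pipeline that writes one row per book.

    Assumed layout, inferred from the README query: books(oclc_id, data),
    where `data` is the rest of the scraped record serialized as JSON.
    """

    def __init__(self, db_path="worldcat.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        with self.conn:  # commits the CREATE TABLE
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS books ("
                "oclc_id INTEGER PRIMARY KEY, "
                "data TEXT NOT NULL)"
            )

    def process_item(self, item, spider):
        record = dict(item)  # copy; works for plain dicts and scrapy.Item
        oclc_id = int(record.pop("oclc_id"))  # assumed item field name
        with self.conn:  # commit per row so an interrupted crawl keeps its progress
            # INSERT OR REPLACE: re-scraping a book overwrites its previous row.
            self.conn.execute(
                "INSERT OR REPLACE INTO books (oclc_id, data) VALUES (?, ?)",
                (oclc_id, json.dumps(record)),
            )
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Scrapy pipelines are plain classes, so nothing needs to be subclassed. A pipeline like this would be enabled through the project's `ITEM_PIPELINES` setting. Keying the table on the OCLC number makes repeated crawls idempotent.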

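To read the scraped data, open the database with the `sqlite3` shell. The example query picks five random books that have a title under 50 characters, more than one ISBN, and fewer than three authors. `$.isbn[1]` is the second ISBN because JSON path indexes are zero-based. The `->>` operator requires SQLite 3.38.0 or later.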

```
$ sqlite3 worldcat.db
sqlite> .headers on
sqlite> .mode markdown
sqlite> SELECT
  oclc_id AS oclc,
  data ->> '$.isbn[1]' AS isbn,
  data ->> '$.title' AS title,
  data ->> '$.authors' AS authors
FROM books
WHERE length(title) < 50
  AND json_array_length(data ->> '$.isbn') > 1
  AND json_array_length(authors) < 3
ORDER BY RANDOM()
LIMIT 5;
```

| oclc |     isbn      |                   title                    |               authors               |
|------|---------------|--------------------------------------------|-------------------------------------|
| 1065 | 9780486620107 | Optical aberration coefficients            | ["H  A Buchdahl"]                   |
| 772  | 9780812275827 | Theodore Roosevelt : confident imperialist | ["David H Burton"]                  |
| 786  | 9780816502288 | Southeast Asia; a critical bibliography    | ["K  G Tregonning"]                 |
| 594  | 9780819143471 | The Cambridge Platonists                   | ["Gerald R Cragg"]                  |
| 999  | 9780813808000 | Modern sportswriting                       | ["Louis I Gelfand","Harry E Heath"] |
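The same query can also be run from Python with the standard-library `sqlite3` module. This is a minimal sketch that assumes the `books(oclc_id, data)` layout used above. The function name `sample_books` is hypothetical. The sketch uses `json_extract()` and the two-argument `json_array_length()` in place of `->>`, so it should also work on SQLite builds older than 3.38, provided the JSON functions are compiled in.

```python
import json
import sqlite3

# Same filter as the sqlite3 shell example, written with json_extract()
# so it does not depend on the ->> operator (SQLite 3.38+).
QUERY = """
SELECT
  oclc_id,
  json_extract(data, '$.isbn[1]'),
  json_extract(data, '$.title'),
  json_extract(data, '$.authors')
FROM books
WHERE length(json_extract(data, '$.title')) < 50
  AND json_array_length(data, '$.isbn') > 1
  AND json_array_length(data, '$.authors') < 3
ORDER BY RANDOM()
LIMIT ?
"""


def sample_books(conn: sqlite3.Connection, limit: int = 5) -> list:
    """Return up to `limit` random matching books, with `authors` decoded to a list."""
    rows = conn.execute(QUERY, (limit,)).fetchall()
    return [
        {"oclc": oclc, "isbn": isbn, "title": title, "authors": json.loads(authors)}
        for oclc, isbn, title, authors in rows
    ]
```

For example, open the database with `conn = sqlite3.connect("worldcat.db")`, then print each entry of `sample_books(conn)`. Decoding `authors` with `json.loads` returns a real Python list instead of the JSON text shown in the table.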

License

This project is licensed under the terms of the MIT license.

Contributors

abevoelker
