quotes's Introduction

CLI app to scrape quotes from goodreads.com

Using the Scrapy module on Python 3, scrape quotes from famous authors and display them on the CLI.
This project could probably have been done using Shell and Python alone, but I wanted to learn Scrapy, which is sometimes helpful.

! I know there is a CLI tool for Unix/Linux users (called fortune); however, macOS ships with no such thing (I don't use Homebrew or any package manager). Therefore I created my own.

On Goodreads.com, quotes are arranged in pages and can be filtered by author. In the page's HTML, the quote text sits in an element with the class quoteText, and the author sits in a span tag with the class authorOrTitle.
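To make that page structure concrete, here is a minimal sketch that pulls quote/author pairs out of a small HTML snippet. It uses Python's standard html.parser rather than Scrapy, so it is not the project's spider. The SAMPLE_HTML markup and the QuoteParser class are hypothetical, modeled only on the class names mentioned above; the real page may be laid out differently.

```python
from html.parser import HTMLParser

# Hypothetical markup based on the class names described in this README.
SAMPLE_HTML = """
<div class="quoteText">
  &ldquo;The trouble with computers is you play with them.&rdquo;
  <br>
  <span class="authorOrTitle">Richard P. Feynman</span>
</div>
"""


class QuoteParser(HTMLParser):
    """Collect (quote, author) pairs from quoteText / authorOrTitle elements."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self._in_quote = False
        self._in_author = False
        self._text = []
        self._author = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quoteText" in classes:
            # Start a new quote and reset the buffers.
            self._in_quote = True
            self._text, self._author = [], []
        elif tag == "span" and "authorOrTitle" in classes and self._in_quote:
            self._in_author = True

    def handle_endtag(self, tag):
        if tag == "span" and self._in_author:
            self._in_author = False
        elif tag == "div" and self._in_quote:
            # Assumes no nested <div> inside quoteText (a simplification).
            self._in_quote = False
            quote = " ".join("".join(self._text).split())
            author = " ".join("".join(self._author).split())
            self.quotes.append((quote, author))

    def handle_data(self, data):
        if self._in_author:
            self._author.append(data)
        elif self._in_quote:
            self._text.append(data)
```

Feeding SAMPLE_HTML to a QuoteParser instance and then calling close() leaves one (quote, author) tuple in its quotes list. In a Scrapy spider the same two class names would typically be targeted with CSS or XPath selectors instead.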

Scrapy requires a URI to make a request and scrape data; this is defined at the beginning of the code:

tag_list = ["friends", "life", "funny", "science", "computers"]
urls = []
for tag in tag_list:
    # One listing page per tag
    urls.append("https://www.goodreads.com/quotes/tag/" + tag)

Here, Scrapy will fetch quotes from the friends, life, funny, science, and computers tags of the source (Goodreads.com).

Installation:

! Python 3 with Scrapy should already be installed (the json module is part of Python's standard library).

  1. Run:

    $ scrapy crawl quotes -o output_json_file

The generated JSON file contains some unpleasant characters from the webpage, such as single quotes, carriage returns, and “. To clean them I used sed.
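The sed commands themselves are not shown here, so the following is only a rough Python sketch of the same kind of cleanup. The function name clean_text and the exact set of characters it removes (curly double quotes and line breaks) are assumptions, not the project's actual rules.

```python
import re


def clean_text(raw: str) -> str:
    """Remove curly double quotes and line breaks, then collapse whitespace."""
    # Assumed character set: left/right curly double quotes only.
    no_quotes = raw.replace("\u201c", "").replace("\u201d", "")
    # \s+ also matches \r and \n, so line breaks turn into single spaces.
    return re.sub(r"\s+", " ", no_quotes).strip()
```

Apostrophes are left untouched in this sketch so that contractions such as "it's" survive the cleanup.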

  2. Open $HOME/.profile with Vim and add the following line at the very end of the file:

    ruby quotes/read_json.rb

To display (read) one randomly selected quote from the cleaned JSON file I used Ruby; Python would perform equally well.

  3. Every time a new Terminal window is opened, it will display a "randomly" selected quote, e.g.:

Well, Mr. Frankel, who started this program, began to suffer from the computer disease that anybody who works with computers now knows about. It’s a very serious disease and it interferes completely with the work. The trouble with computers is you play with them. They are so wonderful. You have these switches - if it’s an even number you do this, if it’s an odd number you do that - and pretty soon you can do more and more elaborate things if you are clever enough, on one machine.
Richard P. Feynman
95/195 -- computers
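The display step lives in read_json.rb, which is not reproduced here. As a hedged illustration of the same idea in Python, the sketch below picks one record at random and prints it in the three-line layout of the example above. RECORDS, format_quote, and random_quote are hypothetical names, the second record is a placeholder, and the 1-based position counter is an assumption about how the "95/195" line is computed.

```python
import random

# Hypothetical in-memory records in (quote, author, topic) order.
RECORDS = [
    ("The trouble with computers is you play with them.",
     "Richard P. Feynman", "computers"),
    ("(placeholder quote)", "(placeholder author)", "life"),
]


def format_quote(records, index):
    """Render one record as quote, author, and 'position/total -- topic'."""
    quote, author, topic = records[index]
    # Assumption: the counter is 1-based, so index 0 is shown as position 1.
    return f"{quote}\n{author}\n{index + 1}/{len(records)} -- {topic}"


def random_quote(records, rng=random):
    """Pick a random record and format it for the terminal."""
    return format_quote(records, rng.randrange(len(records)))
```

Calling print(random_quote(RECORDS)) from a script run by the shell profile would reproduce the behavior described in step 3; passing a seeded random.Random instance as rng makes the choice repeatable for testing.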

About output_json_file:

The file format is:

<quote>,<author>,<topic>

I also added some quotes from the following book:
"Astroparticle Physics" by Claus Grupen
and some dialogues/quotes from The Simpsons (No copyright infringement intended)


Languages: Python3, Ruby, and Shell
Environment: MacBookPro/MacOS 15.5
Editors: VIM and Python 3.8's IDLE

Contributors: ndlopez
