
Tsumamigui


Tsumamigui (つまみぐい) is a simple, hassle-free web scraping library for Ruby.

Requirement

Ruby 2.1+

Installation

Add this line to your application's Gemfile:

gem 'tsumamigui'

Or install it yourself as:

$ gem install tsumamigui

Usage

Give it a URL (or an array of URLs) and a hash that maps each label to the XPath of the data you want. It returns the scraped and parsed data as an array of hashes, and each hash also records the URL it came from under scraped_from.

Tsumamigui.scrape('http://example.com', {h1: 'html/body/div/h1/text()'})

# Returns:
# [
#   {h1: 'Example Domain', scraped_from: 'http://example.com'}
# ]
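To make the label => XPath hash concrete, here is a minimal sketch that applies such a hash to an HTML string that has already been fetched and builds one row in the shape shown above. It uses Ruby's REXML purely for illustration; extract_row is a hypothetical helper, not part of Tsumamigui, it performs no HTTP, and it is not how the gem is necessarily implemented. REXML only accepts well-formed markup, so real-world pages would need a more forgiving parser.

```ruby
require 'rexml/document'

# Hypothetical helper (illustration only, NOT Tsumamigui's implementation):
# applies a label => XPath hash to an already-fetched HTML string and returns
# one row shaped like Tsumamigui's documented output. No HTTP is performed.
def extract_row(html, xpaths, url)
  doc = REXML::Document.new(html) # REXML requires well-formed markup
  row = {}
  xpaths.each do |label, xpath|
    node = REXML::XPath.first(doc, xpath)
    row[label] =
      case node
      when nil then nil                              # XPath matched nothing
      when REXML::Element then node.text.to_s.strip  # first text child of an element
      else node.value.strip                          # text() node or attribute
      end
  end
  row.merge(scraped_from: url)
end

html = '<html><body><div><h1>Example Domain</h1></div></body></html>'
row = extract_row(html, { h1: 'html/body/div/h1/text()' }, 'http://example.com')
# row == { h1: 'Example Domain', scraped_from: 'http://example.com' }
```

A label whose XPath matches nothing comes back as nil in this sketch rather than raising; how Tsumamigui itself handles that case is not documented here.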

To scrape several pages that share the same HTML structure, pass an array of URLs.

urls = ['http://example.com/page/1', 'http://example.com/page/2']
Tsumamigui.scrape(urls, {h1: 'html/body/div/h1/text()'})

# Returns:
# [
#   {h1: 'Example Domain 1', scraped_from: 'http://example.com/page/1'},
#   {h1: 'Example Domain 2', scraped_from: 'http://example.com/page/2'}
# ]
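Because the result is plain Ruby data (an array of hashes), it can be post-processed with core and standard-library methods. The sketch below copies the documented return value literally as a stand-in for a real call to Tsumamigui.scrape, then indexes the headings by source URL and serializes each row as one JSON object per line. The variable names are illustrative only.

```ruby
require 'json'

# Stand-in for the value returned by Tsumamigui.scrape(urls, ...) above;
# copied from the documented example, not fetched.
results = [
  { h1: 'Example Domain 1', scraped_from: 'http://example.com/page/1' },
  { h1: 'Example Domain 2', scraped_from: 'http://example.com/page/2' }
]

# Map each source URL to the heading scraped from it.
by_url = results.each_with_object({}) { |row, h| h[row[:scraped_from]] = row[:h1] }

# One JSON object per line; symbol keys become string keys in JSON.
jsonl = results.map { |row| JSON.generate(row) }.join("\n")
puts jsonl
```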

Important: Tsumamigui automatically waits 1.0 to 3.0 seconds between requests to successive URLs.
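The sketch below illustrates that kind of request spacing. It is a hypothetical helper, not Tsumamigui's code: fetch_politely, its keyword arguments, and the injectable sleeper are assumptions made so the behaviour can be exercised without real HTTP or real waiting. It also assumes no pause before the first request, and it draws each wait uniformly from the 1.0 to 3.0 second range; the gem's exact timing strategy is not documented here.

```ruby
# Hypothetical sketch of spacing requests 1.0-3.0 s apart (NOT Tsumamigui's code).
# fetch:   any callable taking a URL and returning its body (assumed interface)
# sleeper: injectable so tests can record waits instead of actually sleeping
def fetch_politely(urls, fetch:, min_wait: 1.0, max_wait: 3.0, sleeper: ->(s) { sleep(s) })
  urls.each_with_index.map do |url, i|
    # Pause before every request except the first (an assumption).
    sleeper.call(rand(min_wait..max_wait)) if i > 0
    fetch.call(url)
  end
end

waits = []
pages = fetch_politely(%w[http://example.com/page/1 http://example.com/page/2],
                       fetch: ->(url) { "<html>#{url}</html>" },
                       sleeper: ->(s) { waits << s })
```

Injecting the sleeper keeps the delay logic testable; in normal use the default lambda calls Kernel#sleep.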

TODO

  • Custom request headers.

etc...

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/obiyuta/tsumamigui. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Guideline

  1. Fork it (http://github.com/obiyuta/tsumamigui)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Write code and specs.
    • Run the test suite with bundle exec rspec and confirm that it passes
    • Run the linter with bundle exec rubocop and confirm that it passes
  4. Commit your changes (git commit -am 'Add some feature')
  5. Push to the branch (git push origin my-new-feature)
  6. Create new Pull Request

License

The gem is available as open source under the terms of the MIT License.

Copyright (c) 2017 Obi Yuta. See MIT-LICENSE for details.
