fetch's Introduction

#Fetch

Fetch is a domain-specific language that allows for simple HTTP fetching and filtering. Its feature set can be described as a combination of HTTPie/curl and simple filtering using libraries like BeautifulSoup and regex.

Fetch makes it possible to write a short input file that describes what data to fetch from a web page, and how it should be manipulated and formatted for output. Two sample fetch input files are provided further down in this README, along with their outputs.

It is built on top of Python, using the following libraries: PLY, BeautifulSoup and Requests (the packages named in the installation instructions below).

Program execution is performed in (up to) four steps:

  1. Fetch URL
    • Declare the target URL(s) to fetch
    • Declare headers, cookies and params for the URL(s)
  2. Filter results coarsely by doing:
    • Line-based filtering (keep lines matching specified function(s))
    • HTML tag-based filtering (keep tags matching specified function(s))
  3. Filter the coarsely filtered lines/tags more finely by doing things like:
    • Stripping away text matching a pattern from a line
    • Fetching an attribute or the text from a tag
    • or more… (currently existing filters described here)
  4. Format output for being read by an external script
    • Output is normally JSON, but can be output in other formats (set by flags to the executable)
    • For simple outputs, just assigning a variable “output” is sufficient.
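
To make the four steps above a bit more concrete, here is a rough sketch of the same pipeline in plain Python. This is not how Fetch is implemented, and none of it is Fetch syntax: the function names (`fetch`, `coarse_filter`, `fine_filter`, `format_output`), the example URL and the sample page are all made up for illustration, and the download step is replaced by a hard-coded string so that the sketch runs offline.

```python
import json
import re

# Stand-in for step 1: a real run would download this text over HTTP.
# The page content below is made up for illustration.
SAMPLE_PAGE = """\
<html>
<li class="repo">name: easyftpd</li>
<li class="repo">name: danm8ku</li>
<p>unrelated footer text</p>
</html>"""


def fetch(url):
    """Step 1: return the body for `url` (hard-coded so the sketch runs offline)."""
    return SAMPLE_PAGE


def coarse_filter(text, needle):
    """Step 2: line-based filtering, keeping only lines that contain `needle`."""
    return [line for line in text.splitlines() if needle in line]


def fine_filter(lines, pattern):
    """Step 3: strip everything matching `pattern` from each kept line."""
    return [re.sub(pattern, "", line).strip() for line in lines]


def format_output(value):
    """Step 4: serialize the result as JSON for an external script."""
    return json.dumps(value, indent=2)


page = fetch("https://example.com/repos")          # hypothetical URL
lines = coarse_filter(page, 'class="repo"')        # coarse: keep matching lines
names = fine_filter(lines, r"<[^>]+>|name:")       # fine: strip tags and label
print(format_output(names))
```

Each step only sees the result of the previous one, which is the same shape as a Fetch program: one fetch, a chain of coarse filters, a chain of fine filters, and a final assignment that becomes the output.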

A more in-depth walkthrough of the Fetch language is available in the document “Fetch in-depth description”. That document covers examples of all the different fetch methods, filters and output options. The grammar of the Fetch language is also present here.

A short guide for installing and running Fetch programs is available in the section “Downloading, installing and running Fetch” further below.

##Sample programs

###Program 1:

github <- 'https://github.com/buffis?tab=repositories'  # Fetch URL
repolist = [findall: '.repo-list-item'] github  # Filter coarsely
repolistnames = [findall: '.repo-list-name'] repolist  # Filter coarsely
repodescriptions = [children: '.repo-list-description'] repolist  # Filter coarsely
repolinks = [children: 'a'] repolistnames  # Filter coarsely
output = {text} repolinks   # Filter finely, output

Output:

[
  "swedbank-qif-export",
  "easyftpd",
  "danm8ku"
]

###Program 2:

github <- 'https://github.com/buffis?tab=repositories'  # Fetch URL
repolist = [findall: '.repo-list-item'] github  # Filter coarsely
repolistnames = [findall: '.repo-list-name'] repolist  # Filter coarsely
repodescriptions = [children: '.repo-list-description'] repolist  # Filter coarsely
repolinks = [children: 'a'] repolistnames  # Filter coarsely
names = {text: ''} repolinks  # Filter finely
hrefs = {attr: 'href'} repolinks  # Filter finely
descriptions = {text: ''} repodescriptions  # Filter finely
output = dict{'names': names, 'hrefs': hrefs, 'descriptions': descriptions}  # Output

Output:

{
  "hrefs": [
    "/buffis/swedbank-qif-export",
    "/buffis/easyftpd",
    "/buffis/danm8ku"
  ],
  "names": [
    "swedbank-qif-export",
    "easyftpd",
    "danm8ku"
  ],
  "descriptions": [
    "QIF export from Swedbank's internet bank service",
    "Automatically exported from code.google.com/p/easyftpd",
    "danm8ku"
  ]
}
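
For readers who don't know Fetch yet, the following is a rough Python approximation of what Program 2 does: select tags, pull out text and attributes, and combine the parallel lists into one JSON object. It is only a sketch under a few assumptions. It uses the Python 3 standard library `html.parser` instead of the libraries Fetch is built on, the `RepoParser` class is hypothetical, and the markup in `SAMPLE_HTML` is made up to mimic the class names used by the selectors above.

```python
import json
from html.parser import HTMLParser

# Made-up markup that mimics the structure the selectors in Program 2 expect.
SAMPLE_HTML = """
<li class="repo-list-item">
  <h3 class="repo-list-name"><a href="/buffis/easyftpd">easyftpd</a></h3>
  <p class="repo-list-description">Automatically exported from code.google.com/p/easyftpd</p>
</li>
<li class="repo-list-item">
  <h3 class="repo-list-name"><a href="/buffis/danm8ku">danm8ku</a></h3>
  <p class="repo-list-description">danm8ku</p>
</li>
"""


class RepoParser(HTMLParser):
    """Collects link text, link hrefs and description text as parallel lists."""

    def __init__(self):
        super().__init__()
        self.names, self.hrefs, self.descriptions = [], [], []
        self._in_name = False  # True between a repo-list-name tag and its link
        self._target = None    # list that the next non-empty text node goes to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "repo-list-name" in classes:
            self._in_name = True
        elif tag == "a" and self._in_name:
            # Roughly {attr: 'href'} and {text: ''} applied to the link
            self.hrefs.append(attrs.get("href"))
            self._target = self.names
            self._in_name = False
        elif "repo-list-description" in classes:
            self._target = self.descriptions

    def handle_data(self, data):
        if self._target is not None and data.strip():
            self._target.append(data.strip())
            self._target = None


parser = RepoParser()
parser.feed(SAMPLE_HTML)
output = {
    "names": parser.names,
    "hrefs": parser.hrefs,
    "descriptions": parser.descriptions,
}
print(json.dumps(output, indent=2))
```

The Fetch version gets the same result in a handful of lines because the selection, extraction and JSON formatting are all built into the language.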

##Downloading, installing and running Fetch

NOTE: Fetch has not yet been tested on Python 3. These instructions are for Python 2.7, but any Python 2.X release above 2.5 should work.

####Download

The latest version of Fetch is available as a ZIP archive here.

... or you could instead just clone this git repository by running:

git clone https://github.com/buffis/fetch.git

####Installation on Ubuntu/Debian (primary target platform)

Install dependencies

sudo apt-get install python python-ply python-beautifulsoup python-requests unzip

Extract fetch and run

unzip fetch.zip
python fetch.py sample/github_buffis_repoinfo.fetch

####Installation on Windows (works fine, but a bit unusual for scripting stuff)

  • Install Python 2.X (includes pip)
  • In cmd.exe do
    • pip install beautifulsoup ply requests
  • Unzip fetch.zip somewhere
  • In cmd.exe do
    • python fetch.py sample/github_buffis_repoinfo.fetch

####Installation on OSX (untested)

I haven’t actually tried this on OSX, but installation should be similar to Windows.

Install Python, install the dependencies through pip, and run.
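
If Fetch fails to start after installation, a quick way to check whether the dependencies are importable is a small script like the one below. It is only a sketch and not part of Fetch. The helper `missing_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are assumptions based on the package names above (in particular, the older BeautifulSoup 3 package is assumed to be imported as `BeautifulSoup`).

```python
import importlib

# Import names are assumptions: adjust the tuple if your install differs.
REQUIRED_MODULES = ("ply", "BeautifulSoup", "requests")


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All Fetch dependencies can be imported.")
```

Run it with the same Python interpreter you use for `fetch.py`, since packages installed for one interpreter are not visible to another.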

##Other information

Fetch is currently in active development, and the syntax and features of the language may change at any time.

Fetch is developed by Björn Kempén (buffi). External code contributions, bug reports and feature requests will be appreciated.

It is available under the MIT License.

fetch's Issues

out = x+y+z not working

out = x+y+z not working
--Parsed--
Syntax error in input!

Traceback (most recent call last):
  File "D:\githome\projects\fetch\fetch.py", line 41, in <module>
    print_parsed()
  File "D:\githome\projects\fetch\fetch.py", line 4, in print_parsed
    for line in fetchparser.parse_input(open("reddit.fetch").read()):
TypeError: 'NoneType' object is not iterable

Clean up parser Warnings

WARNING: Token 'RPAREN' defined, but not used
WARNING: Token 'LPAREN' defined, but not used
WARNING: There are 2 unused tokens
Generating LALR tables
WARNING: 6 shift/reduce conflicts
