jam-api's Introduction

This project is no longer in active development. Please see CoolQLCool for a similar, more active project (ref).

Jam API

Jam API is a service that lets you turn any site into a JSON-accessible API using CSS selectors. To get started, send a POST request to https://www.jamapi.xyz with the form data fields "url" and "json_data". Here's an example of what your json_data should look like:

{
  "title": "title",
  "logo": ".nav-logo img",
  "paragraphs": [{ "elem": ".home-post h1", "value": "text"}],
  "links": [{"elem": ".home-post > a:first-of-type", "location": "href"}]
}

Using the API, you can generate JSON data from any website.

Code samples

nodejs

const request = require('request');
request.post('https://www.jamapi.xyz/', {form: {url: 'http://www.gavin.codes/', json_data: '{"title": "title"}'}}, function(err, response, body) {
  console.log(body);
})  

Javascript

fetch('https://www.jamapi.xyz', {
    method: 'POST',
    headers: {
      'Accept': 'application/json',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: 'http://www.gavin.codes/',
      json_data: '{"title": "title"}'
    })
  }).then(function(response) {
    return response.json();
  }).then(function(json) {
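    // The parsed response is an object, so serialize it before displaying it
    document.body.textContent = JSON.stringify(json, null, 2);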
  });  

Ruby

require 'httparty'
response = HTTParty.post("https://www.jamapi.xyz/",
  # json_data must be valid JSON, so its keys and values use double quotes
  :body => { "url" => "http://www.gavin.codes/", "json_data" => '{"title": "title"}' })
puts response.to_json

Python

import requests
payload = {'url': 'http://www.gavin.codes/', 'json_data': '{"title": "title"}'}

r = requests.post("https://www.jamapi.xyz", data=payload)
print(r.json())

curl

curl -X POST \
  -F 'url=http://www.gavin.codes/' \
  -F 'json_data={"title":"title"}' \
  https://www.jamapi.xyz

Features

Jam API automatically pulls the src from matching img elements and the href from matching links. If you pass a JSON object instead of a plain selector, you must provide an "elem" property, followed by the element attributes you want. When you pass an array containing a JSON object, you'll get a structure that looks as follows:

[
  {
      "index": 0,
      "value": {
          "value": "Porter Robinson – Sad Machine (Cosmo’s Midnight Remix)"
      }
  },
  {
      "index": 1,
      "value": {
          "value": "Listen to Rachel Platten’s “Stand By You”"
      }
  }]

All the attributes you provide as JSON are placed inside the value property, and the index property tracks the position at which the element occurred in the DOM. I nested the JSON values in their own object so that you can still have an "index" property of your own returned without running into issues.
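As a rough client-side sketch of working with that array structure, the snippet below orders the entries by index and unwraps the inner value objects. The helper name flatten_items and the sample data layout are illustrative assumptions, not part of Jam API.

```python
def flatten_items(items):
    """Return the inner "value" objects, ordered by their DOM index.

    flatten_items is a hypothetical helper for illustration only.
    """
    ordered = sorted(items, key=lambda item: item["index"])
    return [item["value"] for item in ordered]


# Sample data shaped like the array response shown above (deliberately out of order)
response_items = [
    {"index": 1, "value": {"value": "Listen to Rachel Platten’s “Stand By You”"}},
    {"index": 0, "value": {"value": "Porter Robinson – Sad Machine (Cosmo’s Midnight Remix)"}},
]

# Each inner object holds the attributes requested in json_data; here just "value"
titles = [entry["value"] for entry in flatten_items(response_items)]
print(titles)
```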

How it works

The main power of the program is in services/html_to_json.js. Start the site with node index after running npm install.

The suggested Node version is at least 4.2.2.

jam-api's People

Contributors

adamjgrant, dinubs, gautamkrishnar, mal, tarungarg546

jam-api's Issues

Certificate error

`post_connection_check': hostname "www.jamapi.xyz" does not match the server certificate (OpenSSL::SSL::SSLError)

If I'm reading this right, there is an issue with the SSL certificate for jamapi.xyz?

This was after running just the demo code for Ruby provided on the main page:

require 'httparty'
require 'json'
response = HTTParty.post("https://www.jamapi.xyz/",
{ 
  :body => [ { "url" => "http://www.radcircle.com", "json_data" => "{'title': 'title'}"} ].to_json,
  :headers => { 'Content-Type' => 'application/json', 'Accept' => 'application/json'}
})  
puts JSON.parse(response)

Strip leading and trailing new lines

This is a very cool project, and simple enough to make scraping old HTML sites a breeze.

One thing I've noticed is that if you select an element in the target page whose content is surrounded by line breaks inside the tags, the line breaks are returned as well, e.g.:

{
"title": "\nMax Glenister - Front-end developer from Oxford, UK\n"
}
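Until the service trims whitespace itself, one possible workaround is to clean the response on the client. This is only a sketch: strip_strings is a hypothetical helper, and it assumes the response has already been parsed into Python dicts and lists.

```python
def strip_strings(data):
    """Recursively strip leading and trailing whitespace from every string.

    strip_strings is a hypothetical helper, not part of Jam API.
    """
    if isinstance(data, str):
        return data.strip()
    if isinstance(data, list):
        return [strip_strings(entry) for entry in data]
    if isinstance(data, dict):
        return {key: strip_strings(value) for key, value in data.items()}
    return data  # numbers, booleans and None pass through unchanged


raw = {"title": "\nMax Glenister - Front-end developer from Oxford, UK\n"}
cleaned = strip_strings(raw)
print(cleaned["title"])
```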

Broken Example

The example of targeting the URL "http://www.producthunt.com/jobs"

with

{
"title": "title",
"companies": [{"elem": "ul.jobs li.job a", "link": "href", "name": "text"}],
"company_descriptions": [{"elem": "ul.jobs li.job .description", "name": "text"}]
}

The response is valid JSON, but it no longer populates companies or company_descriptions.

Allow nesting

So we can return child values inside of their parent. Basically turn:

"paragraphs": [
        {
            "index": 0,
            "value": {
                "value": "Porter Robinson – Sad Machine (Cosmo’s Midnight Remix)"
            }
        }
],
"links": [
        {
            "index": 0,
            "value": {
                "location": "http://radcircle.com/2016/03/porter-robinson-sad-machine-cosmos-midnight-remix/"
            }
        }
]

into


"paragraphs": [
    {
      "index": 0,
      "value": {
        "value": "Porter Robinson – Sad Machine (Cosmo’s Midnight Remix)",
        "location": "http://radcircle.com/2016/03/porter-robinson-sad-machine-cosmos-midnight-remix/"

      }
    }
]
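In the meantime, the two arrays could be combined on the client. The sketch below merges entries that share the same index; merge_by_index is a hypothetical helper, and it assumes that matching paragraphs and links really do receive the same index, which only holds when both selectors match the same number of elements in the same order.

```python
def merge_by_index(*groups):
    """Merge the "value" objects of entries that share the same "index".

    merge_by_index is a hypothetical helper, not part of Jam API.
    """
    merged = {}
    for group in groups:
        for item in group:
            merged.setdefault(item["index"], {}).update(item["value"])
    return [{"index": index, "value": merged[index]} for index in sorted(merged)]


paragraphs = [
    {"index": 0, "value": {"value": "Porter Robinson – Sad Machine (Cosmo’s Midnight Remix)"}},
]
links = [
    {"index": 0, "value": {"location": "http://radcircle.com/2016/03/porter-robinson-sad-machine-cosmos-midnight-remix/"}},
]

combined = merge_by_index(paragraphs, links)
print(combined)
```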

Support for more attributes?

Would it be possible to select by other tag attributes? I am currently looking at parsing sites that have a lot of unique itemprop attributes like: <td itemprop="description">IC MCU 8BIT 32KB FLASH 44TQFP</td> from this site but hardly any ids or classes. I think these are ASP sites.
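Standard CSS already has attribute selectors of the form td[itemprop="description"]. Whether Jam API's selector engine accepts them is an assumption this thread does not confirm, but a request body using one could be sketched like this (attr_selector and the "descriptions" key are made up for illustration):

```python
import json


def attr_selector(tag, attr, value):
    """Build a CSS attribute selector such as td[itemprop="description"].

    attr_selector is a hypothetical helper; it escapes backslashes and
    double quotes so the value stays inside the quoted part of the selector.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{tag}[{attr}="{escaped}"]'


selector = attr_selector("td", "itemprop", "description")

# json_data follows the README's array form: an "elem" selector plus wanted attributes
json_data = json.dumps({"descriptions": [{"elem": selector, "value": "text"}]})
print(json_data)
```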

More complex example please

I am struggling to do a more complex implementation of this... for example given the following HTML (and having multiples on the page)

<div class="card">
  <div class="card-title">Title</div>
  <div class="card-desc">Description text here</div>
  <a href="#link" class="card-link">More info</a>
</div>

How would you format your JSON request to get those cards? This is what I assume it would be but lack of docs has me guessing...

{
    "title": "title",
    "news": [{
        "elem": ".card",
        "cardTitle": ".card .card-title",
        "cardDesc": ".card .card-desc"
    }]
}

Also, the example on jamapi.xyz should look something like the above, rather than relying on a live website whose markup can change, as it does now.

License

Hi,

What license are you releasing this under?

Thank you

C# options

Hi,
I was wondering how I can use this within my C# application.

Hopefully there is an option.

Is this applicable client side?

Hi there!

This idea looks pretty cool for micro scraping services.
But as far as I can tell, it currently works only server side.

It could be more useful to create a client-side version of Jam, which would make it possible to build many GitHub Pages sites that scrape the web.

For example, I am looking for a way to turn my city's transportation service website into an API without hosting costs (thanks to GitHub Pages).

Cheers,

What is the correct syntax for python?

Hello, I cannot find the correct syntax, I have tried this:

import requests
r = requests.post('http://www.jamapi.xyz', 'url = "http://www.radcircle.com"', 'json_data = {"title": "title"}')
print r.text

and get the error message:
{
"statusCode": 400,
"error": "Bad Request",
"message": "Invalid request payload JSON format"
}

If I change it to this:

r = requests.post('http://www.jamapi.xyz', 'url = "http://www.radcircle.com"', json_data = '{"title": "title"}')

I get this:
TypeError: request() got an unexpected keyword argument 'json_data'
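For comparison, the README's Python sample passes both fields together as one dict of form data instead of as separate positional strings. The sketch below builds an equivalent form-encoded POST with only the standard library and does not send it; build_jam_request is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_jam_request(target_url, selectors, endpoint="https://www.jamapi.xyz"):
    """Build (but do not send) a form-encoded POST request for Jam API.

    build_jam_request is a hypothetical helper. json_data is sent as a JSON
    string inside the form, not as a nested object.
    """
    form = {"url": target_url, "json_data": json.dumps(selectors)}
    body = urlencode(form).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


req = build_jam_request("http://www.radcircle.com", {"title": "title"})

# Decode the body again to show which form fields would go over the wire
fields = parse_qs(req.data.decode("utf-8"))
print(req.get_method(), req.full_url, fields)

# urllib.request.urlopen(req) would actually send the request; it is not called here.
```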

Certificate Issue

When I try to visit https://www.jamapi.xyz/ I get NET::ERR_CERT_DATE_INVALID - of course, this means I also cannot do api calls.

Any Example Code for Angular2 Typescript

Hi! Is there any code example available for TypeScript? I keep getting an internal server error (500).

This is what I have tried so far. Any help would be appreciated.

import { Http ,Headers, URLSearchParams,RequestOptions} from '@angular/http';
import { Observable } from 'rxjs/Rx';

getData():Observable {
let headers = new Headers({ 'Content-Type': 'application/json' , 'Accept': 'application/json'});
let options = new RequestOptions({
headers: headers
});
let body = JSON.stringify({form: {url: 'http://www.gavin.codes/', json_data: {'title': 'title'}}});
return this.http.post('https://www.jamapi.xyz/',body,options).
map(res => res.json())
.catch((error: any) => {
return Observable.throw(new Error(error.status));
});
}
