Snapper

A web microservice, written in Go, for capturing a website's Open Graph data.

Building Snapper

git clone https://github.com/owl-93/snapper
cd snapper
go build .

Optionally, give the executable a name:

go build -o <executable name> .

Running Snapper (Default Options)

By default, snapper runs on port 8888 and tries to set up caching against a Redis instance on localhost at the default Redis port, 6379, with a cache TTL of 24 hours. See the section below on configuring snapper for options that fit your use case.

./snapper

Running Snapper with Arguments & Options

Specifying a different port

By default, snapper runs on port 8888. You can tell snapper to use a different port with the --port command-line argument:

./snapper --port 8081

Specifying a Redis instance to back caching

You can pass the --cache flag followed by a Redis connection address to point snapper at the Redis instance to use. Note that an invalid connection URI does not cause the application to exit; snapper instead runs as if it had been started with the --no-cache flag.

    ./snapper --cache "some-redis-instance:6379"

Setting the cache TTL

You can set the TTL for cached page metadata with the --cache-ttl option, which takes a number of hours. The default is 24 hours. The TTL applies to each fetched page individually, not to the cache as a whole.

Running snapper with a cache life of 12 hours:

    ./snapper --cache-ttl 12

Globally disabling the cache

You can disable caching entirely by passing the --no-cache flag to snapper. For reads, this is equivalent to passing the forceRefresh option in every request: cached data is never returned. Unlike forceRefresh, however, --no-cache also prevents snapper from storing fetched data in the cache. With forceRefresh, the cache is not read, but the freshly fetched data is still stored. This means that after a request that specifies forceRefresh, subsequent requests for the same page that omit forceRefresh are served from the cache on a cache hit (that is, while the entry has not expired).

Note that because --no-cache is an option and not an argument, it must come after any named arguments you specify.

    ./snapper --no-cache
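
The difference between --no-cache and forceRefresh can be sketched in a few lines of Go. This is only an illustration of the behaviour described above, not snapper's actual code: the cache type, the snap function, and the fetch callback are hypothetical names, an in-memory map stands in for Redis, and TTL expiry is left out.

```go
package main

import "fmt"

// cache is a hypothetical in-memory stand-in for the Redis-backed cache.
type cache map[string]string

// snap models the documented rules:
//   - noCache (--no-cache): never read from or write to the cache.
//   - forceRefresh: skip the cache read, but still store the fresh result.
//
// fetch stands in for scraping the page. The second return value reports
// whether the result came from the cache.
func snap(c cache, noCache, forceRefresh bool, page string, fetch func(string) string) (string, bool) {
	if !noCache && !forceRefresh {
		if v, ok := c[page]; ok {
			return v, true
		}
	}
	v := fetch(page)
	if !noCache {
		c[page] = v
	}
	return v, false
}

func main() {
	c := cache{}
	calls := 0
	fetch := func(page string) string {
		calls++
		return fmt.Sprintf("metadata v%d for %s", calls, page)
	}
	page := "https://example.com"

	v1, hit1 := snap(c, false, false, page, fetch) // cache miss: fetched and stored
	v2, hit2 := snap(c, false, true, page, fetch)  // forceRefresh: refetched, cache updated
	v3, hit3 := snap(c, false, false, page, fetch) // served from the refreshed entry
	fmt.Println(v1, hit1)
	fmt.Println(v2, hit2)
	fmt.Println(v3, hit3)
}
```

The third call returns the value stored by the forceRefresh request, which is the "subsequent requests are read from the cache" behaviour described above.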

Using Snapper

To snap a webpage's Open Graph metadata, make an HTTP POST request to / with the target website specified in the request body under the key page. You can optionally pass the forceRefresh key to make snapper fetch the latest metadata instead of using any cached value, and the optional raw key to choose the response format.

Request Body Format

{
  page: string,          // the URL of the page you wish to fetch metadata for
  forceRefresh: boolean, // (optional) tell snapper to ignore any cached data and fetch the latest page data (the cache will be updated)
  raw: boolean           // (optional) tell snapper that you want an array of MetaTag objects containing the property names and content values
}
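
For Go clients, the request body above could be modelled as a struct. This is a sketch: the SnapRequest type and the encodeRequest helper are hypothetical names, and only the JSON keys (page, forceRefresh, raw) come from the format above. The optional keys are omitted from the output when false.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SnapRequest mirrors the documented request body.
// The type name is illustrative; only the JSON keys are from the README.
type SnapRequest struct {
	Page         string `json:"page"`
	ForceRefresh bool   `json:"forceRefresh,omitempty"`
	Raw          bool   `json:"raw,omitempty"`
}

// encodeRequest serializes a request into the JSON sent to snapper.
func encodeRequest(r SnapRequest) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeRequest(SnapRequest{
		Page: "https://github.com/owl-93/snapper",
		Raw:  true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```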

Response Body Formats

Default Format

The default format contains the six main Open Graph properties:

{
    url: string // og:url
    title: string // og:title
    description: string // og:description
    image: string // og:image
    type: string // og:type
    locale: string // og:locale
}

Raw Format

The raw format contains the full array of Open Graph property tag names and values:

[
    {
        name: string, // opengraph key
        value: string // value for that key
    },
    {
        name: string,
        value: string
    },
    //...
]
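
Both response formats map directly onto Go types. The sketch below assumes nothing beyond the field names shown above. MetaTag is the name the request documentation uses for the raw entries, while Metadata, decodeDefault, and decodeRaw are hypothetical names chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata matches the default response format (hypothetical type name).
type Metadata struct {
	URL         string `json:"url"`
	Title       string `json:"title"`
	Description string `json:"description"`
	Image       string `json:"image"`
	Type        string `json:"type"`
	Locale      string `json:"locale"`
}

// MetaTag matches one entry of the raw response format.
type MetaTag struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// decodeDefault parses a default-format response body.
func decodeDefault(data []byte) (Metadata, error) {
	var m Metadata
	err := json.Unmarshal(data, &m)
	return m, err
}

// decodeRaw parses a raw-format response body.
func decodeRaw(data []byte) ([]MetaTag, error) {
	var tags []MetaTag
	err := json.Unmarshal(data, &tags)
	return tags, err
}

func main() {
	m, err := decodeDefault([]byte(`{"url":"https://example.com","title":"Example","type":"website"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Title, m.Type)

	tags, err := decodeRaw([]byte(`[{"name":"og:title","value":"Example"}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(tags[0].Name, tags[0].Value)
}
```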

Examples

Default format request & response

curl --location --request POST 'http://localhost:8888/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "page": "https://github.com/owl-93/snapper"
}'

Response Code 200
Response Body:

{
  "url": "https://github.com/owl-93/snapper",
  "title": "GitHub - owl-93/snapper: Golang based web site opengraph data scraper with caching",
  "description": "Golang based web site opengraph data scraper with caching - GitHub - owl-93/snapper: Golang based web site opengraph data scraper with caching",
  "image": "https://opengraph.githubassets.com/b63c65ebc5492a24715bae27d7efa53e333686a06cce9ab11ecc0c9ec64615ab/owl-93/snapper",
  "type": "object",
  "locale": ""
}

Raw format request & response

curl --location --request POST 'http://localhost:8888/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "page": "https://github.com/owl-93/snapper",
    "raw": true
}'

Response Code 200
Response Body:

Note that the Raw response type contains more tags and data than the default response type.

[
    {
        "name": "fb:app_id",
        "value": "1401488693436528"
    },
    {
        "name": "og:image",
        "value": "https://opengraph.githubassets.com/b63c65ebc5492a24715bae27d7efa53e333686a06cce9ab11ecc0c9ec64615ab/owl-93/snapper"
    },
    {
        "name": "og:image:alt",
        "value": "Golang based web site opengraph data scraper with caching - GitHub - owl-93/snapper: Golang based web site opengraph data scraper with caching"
    },
    {
        "name": "og:image:width",
        "value": "1200"
    },
    {
        "name": "og:image:height",
        "value": "600"
    },
    {
        "name": "og:site_name",
        "value": "GitHub"
    },
    {
        "name": "og:type",
        "value": "object"
    },
    {
        "name": "og:title",
        "value": "GitHub - owl-93/snapper: Golang based web site opengraph data scraper with caching"
    },
    {
        "name": "og:url",
        "value": "https://github.com/owl-93/snapper"
    },
    {
        "name": "og:description",
        "value": "Golang based web site opengraph data scraper with caching - GitHub - owl-93/snapper: Golang based web site opengraph data scraper with caching"
    }
]
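
Go client sketch

The default-format curl request can also be made from Go with the standard library. The following is a minimal sketch, not an official client: snapPage and stubSnapper are hypothetical names, and stubSnapper is a stand-in server that only echoes the requested page, so the example runs without a real snapper instance. Against a running instance, pass http://localhost:8888/ as the endpoint instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// snapPage POSTs {"page": page} to a snapper endpoint and decodes the
// default-format response into a map of property name to value.
func snapPage(endpoint, page string) (map[string]string, error) {
	body, err := json.Marshal(map[string]string{"page": page})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("snapper returned status %d", resp.StatusCode)
	}
	var meta map[string]string
	if err := json.NewDecoder(resp.Body).Decode(&meta); err != nil {
		return nil, err
	}
	return meta, nil
}

// stubSnapper is a hypothetical stand-in for snapper used only in this
// example. It echoes the requested page back as the url field.
func stubSnapper() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Page string `json:"page"`
		}
		if r.Method != http.MethodPost || json.NewDecoder(r.Body).Decode(&req) != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"url": req.Page, "type": "object"})
	}))
}

func main() {
	srv := stubSnapper()
	defer srv.Close()

	// With a real snapper running, use "http://localhost:8888/" here.
	meta, err := snapPage(srv.URL, "https://github.com/owl-93/snapper")
	if err != nil {
		panic(err)
	}
	fmt.Println(meta["url"], meta["type"])
}
```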
