url-inspector

Get metadata about any URL.

Limited memory and network usage.

This is a node.js module.

It returns normalized information found in http headers or in the resource itself, using exiftool (which knows almost everything about files except html) or a sax parser to read oembed, opengraph, twitter cards, schema.org attributes, or standard html tags.

Both tools stop inspecting once they have gathered enough tags, or once a maximum number of bytes (depending on media type) has been downloaded.

A demo using this module is available with url-inspector-daemon.

  • url
    url of the inspected resource

  • title
    title of the resource, or filename, or last component of pathname with query

  • description
    optional longer description, without title in it

  • site
    the name of the site, or the domain name

  • mime
    RFC 7231 mime type of the resource (defaults to Content-Type)
    The inspected mime type could be more accurate than the http header.

  • ext
    the extension matching the mime type (not the file extension)

  • type
    what the resource represents
    image, video, audio, link, file, embed, archive

  • html
    a canonical html representation of the full resource;
    depending on the type and mime, it could be an img, a, video, audio, or iframe tag.

  • size
    optional Content-Length; discarded when type is embed

  • icon
    optional link to the favicon of the site

  • width, height
    optional dimensions

  • duration
    optional

  • thumbnail
    optional; a URL to a thumbnail, which could be a data-uri for embedded images

  • source
    optional; a URL that can go in a 'src' attribute. For example, a resource can be an html page representing an image type; the URL of the image itself would be stored here. The same applies to audio, video, and embed types.

  • error
    optional; an http error code, or a string

  • all
    an object with all additional metadata that was found
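
As a rough illustration of how the normalized fields above might be consumed, the sketch below builds a one-line summary from a result object. The `summarize` helper and the `sample` object are hypothetical: they are not part of url-inspector, and the values are made up for the example rather than taken from real inspector output.

```javascript
// Hypothetical helper, not part of url-inspector.
// Builds a short summary from the normalized fields, skipping optional ones.
function summarize(obj) {
	if (obj.error) return `error: ${obj.error}`;
	const parts = [`[${obj.type}]`, obj.title];
	if (obj.width && obj.height) parts.push(`${obj.width}x${obj.height}`);
	if (obj.size) parts.push(`${obj.size} bytes`);
	parts.push(`(${obj.site})`);
	return parts.join(' ');
}

// Illustrative values only, not real inspector output.
const sample = {
	url: 'http://example.com/cat.jpg',
	title: 'cat.jpg',
	site: 'example.com',
	mime: 'image/jpeg',
	ext: 'jpg',
	type: 'image',
	width: 640,
	height: 480,
	size: 12345
};

console.log(summarize(sample));
```

Because width, height, size, and error are optional, the helper checks for them before use instead of assuming they are present.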

Installation

npm install url-inspector

Add the -g switch to install the executable.

The exiftool executable must be available.

API

var inspector = require('url-inspector');

var opts = {
	all: false, // return all available non-normalized metadata
	ua: "Mozilla/5.0", // some oembed providers might not answer otherwise
	nofavicon: false, // disable any favicon-related additional request
	nosource: false, // disable any sub-source inspection for audio, video, image types
	providers: [{ // an array of custom OEmbed providers, or path to a module exporting such an array
		provider_name: "Custom OEmbed provider",
		endpoints: [{
			schemes: ["http:\/\/video\.com\/*"],
			builder: function(urlObj, obj) {
				// can see current obj and override arbitrary props
				obj.embed = "custom embed url";
			}
		}]
	}]
};

inspector(url, opts, function(err, obj) {
	// obj holds the normalized metadata described above
});

// or simply

inspector(url, function(err, obj) {
	// same callback, without custom options
});
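
For callers who prefer promises, the callback form above can be wrapped. This is a minimal sketch, assuming only the `(url, opts, callback)` signature shown in this section; `inspectAsync` and `fakeInspector` are hypothetical names, and the stand-in exists only so the example runs without network access or exiftool.

```javascript
// Hypothetical wrapper: this README only documents the callback form.
function inspectAsync(inspector, url, opts) {
	return new Promise(function(resolve, reject) {
		inspector(url, opts || {}, function(err, obj) {
			if (err) reject(err);
			else resolve(obj);
		});
	});
}

// Stand-in with the same (url, opts, callback) shape as the module;
// in real use, pass require('url-inspector') instead.
function fakeInspector(url, opts, cb) {
	cb(null, { url: url, type: 'link' });
}

inspectAsync(fakeInspector, 'http://example.com/').then(function(obj) {
	console.log(obj.type, obj.url);
});
```

Passing the inspector function as an argument keeps the wrapper independent of how the module is loaded, which also makes it easy to test with a stub.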

Command-line client

inspector-url <url>

Low resource usage

network:

  • at most a few hundred kilobytes (depending on resource type) are downloaded, and usually much less, depending on connection speed.
  • inspection stops as soon as enough metadata is gathered
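
The two stopping rules above (enough metadata, or a byte cap) can be sketched as a loop over incoming chunks. This is not url-inspector's internal code: `readWithBudget`, the 1024-byte cap, and the `</head>` check are assumptions chosen for illustration, and the check is deliberately naive (it would miss a marker split across two chunks).

```javascript
// Hypothetical helper, not url-inspector's implementation.
// Consumes chunks until isEnough() reports that enough was seen,
// or until maxBytes have been read, whichever comes first.
function readWithBudget(chunks, maxBytes, isEnough) {
	let total = 0;
	let reason = 'end'; // input ran out before either rule triggered
	const kept = [];
	for (const chunk of chunks) {
		const room = maxBytes - total;
		// Truncate the chunk so the cap is never exceeded.
		const part = chunk.length > room ? chunk.subarray(0, room) : chunk;
		kept.push(part);
		total += part.length;
		if (isEnough(part)) { reason = 'enough'; break; }
		if (total >= maxBytes) { reason = 'max'; break; }
	}
	return { bytes: total, reason: reason, data: Buffer.concat(kept) };
}

const chunks = [
	Buffer.from('<html><head><title>Hi</title>'),
	Buffer.from('</head><body>'),
	Buffer.from('lots of content')
];

// Stop as soon as the end of the html head has been seen.
const result = readWithBudget(chunks, 1024, function(part) {
	return part.includes('</head>');
});
console.log(result.reason, result.bytes);
```

With a real network stream, breaking out of the loop would correspond to aborting the request, so the remaining bytes are never downloaded.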

memory:

  • html is inspected using a sax parser, without building a full DOM.

exiftool:

  • runs through the streat module, which keeps exiftool always open for performance

License

See LICENSE.

See also

https://github.com/kapouer/url-inspector-daemon

https://github.com/kapouer/node-streat
