metadata_fetch's Introduction

Metadata Fetch

A Dart library for extracting metadata from web pages. Supports Open Graph, HTML meta tags, Twitter Cards, and structured data (JSON-LD).

Available on pub.dev.

Metadata Structure

Metadata:
  - title
  - description
  - image
  - url

Usage

Extract Metadata for a given URL

import 'package:metadata_fetch/metadata_fetch.dart';

void main() async {
  final myURL = 'https://flutter.dev';

  // Use the `MetadataFetch.extract()` function to fetch data from the URL.
  // The `?.` operator below guards against a null result.
  var data = await MetadataFetch.extract(myURL);

  print(data?.title); // Flutter - Beautiful native apps in record time

  print(data?.description); // Flutter is Google's UI toolkit for crafting beautiful...

  print(data?.image); // https://flutter.dev/images/flutter-logo-sharing.png

  print(data?.url); // https://flutter.dev/

  // Convert the Metadata object to a Map
  var dataAsMap = data?.toMap();
  print(dataAsMap);
}

Parsing Manually

Get aggregated Metadata from a document

This method prioritizes Open Graph data, followed by Twitter Card, JSON-LD and finally falls back to HTML metadata.

import 'package:metadata_fetch/metadata_fetch.dart';
import 'package:http/http.dart' as http;

void main() async {
  final myURL = 'https://flutter.dev';

  // Fetch the page. Passing a Uri works with current versions of the http package.
  var response = await http.get(Uri.parse(myURL));

  // Convert the Response to a Document. The utility function
  // `MetadataFetch.responseToDocument` is provided, or you can use your own decoder/parser.
  var document = MetadataFetch.responseToDocument(response);

  // Get aggregated metadata
  var data = MetadataParser.parse(document);
  print(data);
}
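
The priority order described above can be illustrated with a small, self-contained sketch. It does not use the library's own classes: `SimpleMetadata`, `firstNonNull`, and `mergeByPriority` are hypothetical names, and the field-by-field fallback is an assumption about how such an aggregation could work, not a description of the library's internals.

```dart
// Hypothetical stand-in for the library's Metadata type; the field names
// follow the "Metadata Structure" section.
class SimpleMetadata {
  final String? title;
  final String? description;
  final String? image;
  final String? url;

  const SimpleMetadata({this.title, this.description, this.image, this.url});

  @override
  String toString() => 'SimpleMetadata(title: $title, '
      'description: $description, image: $image, url: $url)';
}

/// Returns the first non-null value that [pick] yields across [sources].
String? firstNonNull(
    List<SimpleMetadata> sources, String? Function(SimpleMetadata) pick) {
  for (final source in sources) {
    final value = pick(source);
    if (value != null) return value;
  }
  return null;
}

/// Merges [sources], which must be ordered from highest to lowest priority.
SimpleMetadata mergeByPriority(List<SimpleMetadata> sources) {
  return SimpleMetadata(
    title: firstNonNull(sources, (m) => m.title),
    description: firstNonNull(sources, (m) => m.description),
    image: firstNonNull(sources, (m) => m.image),
    url: firstNonNull(sources, (m) => m.url),
  );
}

void main() {
  // Made-up parser results, listed in the priority order from the text:
  // Open Graph, Twitter Card, JSON-LD, HTML meta.
  const openGraph = SimpleMetadata(title: 'OG title');
  const twitterCard =
      SimpleMetadata(title: 'Twitter title', image: 'https://example.com/a.png');
  const jsonLd = SimpleMetadata();
  const htmlMeta = SimpleMetadata(description: 'HTML description');

  print(mergeByPriority([openGraph, twitterCard, jsonLd, htmlMeta]));
}
```

In this sketch the Open Graph title wins over the Twitter Card title, while the image and description are filled in from lower-priority sources.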

Manually specify which Metadata parser to use

import 'package:metadata_fetch/metadata_fetch.dart';
import 'package:http/http.dart' as http;

void main() async {
  final myURL = 'https://flutter.dev';

  // Fetch the page. Passing a Uri works with current versions of the http package.
  var response = await http.get(Uri.parse(myURL));

  // Convert the Response to a Document. The utility function
  // `MetadataFetch.responseToDocument` is provided, or you can use your own decoder/parser.
  var document = MetadataFetch.responseToDocument(response);

  // Get Open Graph metadata
  var ogData = MetadataParser.OpenGraph(document);
  print(ogData);

  // Get HTML metadata
  var htmlData = MetadataParser.HtmlMeta(document);
  print(htmlData);

  // Get structured data (JSON-LD)
  var structuredData = MetadataParser.JsonLdSchema(document);
  print(structuredData);

  // Get Twitter Card data
  var twitterCardData = MetadataParser.TwitterCard(document);
  print(twitterCardData);
}

Provide a fallback url when manually parsing

If the parsers cannot extract a URL from the document, you may optionally pass one to MetadataParser.parse() through the url parameter.

This URL is added to the final Metadata structure and is used to resolve relative (non-absolute) image URLs.

import 'package:metadata_fetch/metadata_fetch.dart';
import 'package:http/http.dart' as http;

void main() async {
  final myURL = 'https://flutter.dev';

  // Fetch the page. Passing a Uri works with current versions of the http package.
  var response = await http.get(Uri.parse(myURL));

  // Convert the Response to a Document. The utility function
  // `MetadataFetch.responseToDocument` is provided, or you can use your own decoder/parser.
  var document = MetadataFetch.responseToDocument(response);

  // Get aggregated metadata, supplying a fallback URL.
  // The fallback is used to resolve images with relative URLs.
  var data = MetadataParser.parse(document, url: myURL);
  print(data);
}
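
To show what resolving a relative image URL against a fallback URL means, here is a minimal sketch that uses only Dart's core `Uri` class. The helper name `resolveImageUrl` and the example URLs are illustrative assumptions; the library may resolve URLs differently internally.

```dart
/// Resolves [image] against [pageUrl]. Absolute image URLs are returned
/// unchanged, because Uri.resolve keeps a reference that already has a scheme.
String resolveImageUrl(String pageUrl, String image) {
  return Uri.parse(pageUrl).resolve(image).toString();
}

void main() {
  const page = 'https://flutter.dev/docs/';

  // Path relative to the page's directory.
  print(resolveImageUrl(page, 'images/logo.png'));

  // Path relative to the site root.
  print(resolveImageUrl(page, '/images/logo.png'));

  // Already absolute: stays as it is.
  print(resolveImageUrl(page, 'https://cdn.example.com/logo.png'));
}
```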

Credit

This library is inspired by open_graph_parser. However, this one aims to be more general.

Roadmap

  • Weighted or preferred metadata: assign custom weights to each parser to define a fallback priority system
  • Improve Documentation

Questions, Bugs, and Feature Requests

Please forward all queries about this project to the issue tracker.

metadata_fetch's People

Contributors

cornerman, hrx03, j-j-gajjar, jetisr, jg-l, jose-almir, llucax, magtuxgit, numa08

metadata_fetch's Issues

Error: XMLHttpRequest error.

var data = await MetadataFetch.extract(
    'https://flutter.dev'); // returns a Metadata object
print(data); // Metadata.toString()
print(data?.title); // Metadata.title
print(data?.toMap()); // converts Metadata to map
print(data?.toJson()); // converts Metadata to JSON

I tried to get the title tag of a website from a web browser on localhost, but it failed. It also fails in the production environment with the same error. I suspect this is a cross-origin (CORS) problem, but how can I get the meta tags in a web browser?

Error: XMLHttpRequest error.
    dart-sdk/lib/_internal/js_dev_runtime/patch/core_patch.dart 909:28                get current
packages/http/src/browser_client.dart 71:22                                       <fn>
dart-sdk/lib/async/zone.dart 1613:54                                              runUnary
dart-sdk/lib/async/future_impl.dart 155:18                                        handleValue
dart-sdk/lib/async/future_impl.dart 707:44                                        handleValueCallback
dart-sdk/lib/async/future_impl.dart 736:13                                        _propagateToListeners
dart-sdk/lib/async/future_impl.dart 533:7                                         [_complete]
dart-sdk/lib/async/stream_pipe.dart 61:11                                         _cancelAndValue
dart-sdk/lib/async/stream.dart 1219:7                                             <fn>
dart-sdk/lib/_internal/js_dev_runtime/private/ddc_runtime/operations.dart 324:14  _checkAndCall
dart-sdk/lib/_internal/js_dev_runtime/private/ddc_runtime/operations.dart 329:39  dcall
dart-sdk/lib/html/dart2js/html_dart2js.dart 37307:58                              <fn>


    at Object.createErrorWithStack (http://localhost:7357/dart_sdk.js:5054:12)
    at Object._rethrow (http://localhost:7357/dart_sdk.js:37670:16)
    at async._AsyncCallbackEntry.new.callback (http://localhost:7357/dart_sdk.js:37666:13)
    at Object._microtaskLoop (http://localhost:7357/dart_sdk.js:37526:13)
    at _startMicrotaskLoop (http://localhost:7357/dart_sdk.js:37532:13)
    at http://localhost:7357/dart_sdk.js:33303:9

responseToDocument always returns null

final response = await http
    .get(Uri.parse(f.link))
    .timeout(const Duration(seconds: 60));

var document = metadata_fetch.MetadataFetch.responseToDocument(response);

This always returns null.

I am unable to convert the response to a document.

HttpRequestData extension uses a hidden global

As shown by the test added in #17, using a global to fake storing the requestUrl on a Document can have very nasty effects. Since the user of the library is the one who really knows where the data to parse comes from, I don't think the extension is needed at all.

Update HTTP dependency to ^0.13.0

Recently there was a big update to the Firebase dependencies, which causes many libraries to conflict with other outdated dependencies.

Can we update this library to use http: ^0.13.0?

Also, I really appreciate your work on this!

Doesn't work for Flutter Web, fails with XMLHttpRequest error

Stack Trace

` Error: XMLHttpRequest error.
dart-sdk/lib/_internal/js_dev_runtime/patch/core_patch.dart 906:28 get current
packages/http/src/browser_client.dart 71:22
dart-sdk/lib/async/zone.dart 1612:54 runUnary
dart-sdk/lib/async/future_impl.dart 152:18 handleValue
dart-sdk/lib/async/future_impl.dart 704:44 handleValueCallback
dart-sdk/lib/async/future_impl.dart 733:13 _propagateToListeners
dart-sdk/lib/async/future_impl.dart 530:7 [_complete]
dart-sdk/lib/async/stream_pipe.dart 61:11 _cancelAndValue
dart-sdk/lib/async/stream.dart 1219:7
dart-sdk/lib/_internal/js_dev_runtime/private/ddc_runtime/operations.dart 324:14 _checkAndCall
dart-sdk/lib/_internal/js_dev_runtime/private/ddc_runtime/operations.dart 329:39 dcall
dart-sdk/lib/html/dart2js/html_dart2js.dart 37307:58

at Object.createErrorWithStack (http://localhost:53250/dart_sdk.js:5348:12)
at Object._rethrow (http://localhost:53250/dart_sdk.js:39350:16)
at async._AsyncCallbackEntry.new.callback (http://localhost:53250/dart_sdk.js:39344:13)
at Object._microtaskLoop (http://localhost:53250/dart_sdk.js:39176:13)
at _startMicrotaskLoop (http://localhost:53250/dart_sdk.js:39182:13)
at http://localhost:53250/dart_sdk.js:34689:9`

I tried to run the sample code and it fails. Is this a known bug?

Flutter Web tag

I read a discussion in which jg-l suggested using a proxy for Flutter web. Could you update the platform tags for metadata_fetch on pub.dev to include Flutter web?
Thank you.
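
The proxy approach mentioned above could look roughly like the sketch below: the browser requests the page through a server you control, which fetches the target and returns it with suitable CORS headers. The proxy host `proxy.example.com`, the `/fetch` path, and the `url` query parameter are all hypothetical; the library does not provide such a proxy, and the names only illustrate how the proxied URL might be built.

```dart
/// Builds a URL that routes [target] through [proxy] by passing the target
/// as a percent-encoded `url` query parameter (a hypothetical convention).
Uri viaProxy(Uri proxy, String target) {
  return proxy.replace(queryParameters: {'url': target});
}

void main() {
  // Hypothetical proxy endpoint; replace with a server you operate.
  final proxy = Uri.parse('https://proxy.example.com/fetch');

  final proxied = viaProxy(proxy, 'https://pub.dev/');
  print(proxied);

  // The original target can be recovered on the proxy side.
  print(proxied.queryParameters['url']);
}
```

The resulting string could then be passed to the extraction call in place of the original URL, assuming the proxy returns the target page's HTML unchanged.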

Not able to fetch data from some URL's

I am using this library like this:

try {
  showProgress(context);
  var data = await extract("https://www.nst.com.my/news/nation/2020/05/595467/man-who-rescued-monitor-lizard-becomes-internet-hero");
} catch (e) {
  hideProgress(context);
}

I am getting an exception.

I checked inside the library, and the exception is thrown at JsonLdSchema(document).

The issue occurs when the image is extracted in the JsonLdParser class. I have attached a screenshot of the exception.

[Image: Screenshot 2020-05-29 at 6 22 39 PM]

Parsing HTML with the html package is not really supported

When using the html package's parse() function, HtmlMetaParser() throws an exception because document.requestUrl is not set. Fixing this does not seem trivial, because it exposes another fundamental issue: document.requestUrl is implemented via an extension on html's Document class that uses a global variable (a static class variable). This also means you can't work with two different Documents at a time. An ugly hack to fix this would be to use a Map to store each Document instance's requestUrl instead of having only one, keyed by the object identity hash from identityHashCode().

The code throwing the exception (assuming Metadata.url will always be non-null) is here:

var pageUrl = Uri.parse(data.url);
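
A minimal sketch of per-document storage follows. Instead of the identity-hash Map suggested above, it swaps in Dart's built-in `Expando`, which attaches a value to an individual object without relying on hash codes. `FakeDocument` is a hypothetical stand-in for the html package's Document class, and the extension name is illustrative; this is not how the library is implemented.

```dart
// Hypothetical stand-in for the html package's Document class.
class FakeDocument {}

// An Expando associates a value with each object instance, so two documents
// never share one slot the way they would with a single static variable.
final Expando<String> _requestUrls = Expando<String>('requestUrl');

extension RequestUrl on FakeDocument {
  /// The URL this document was fetched from, or null if none was recorded.
  String? get requestUrl => _requestUrls[this];

  set requestUrl(String? value) => _requestUrls[this] = value;
}

void main() {
  final first = FakeDocument();
  final second = FakeDocument();
  final third = FakeDocument();

  first.requestUrl = 'https://flutter.dev';
  second.requestUrl = 'https://dart.dev';

  // Each document keeps its own value; an untouched document reports null.
  print(first.requestUrl);
  print(second.requestUrl);
  print(third.requestUrl);
}
```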

Fetching wrong image from Amazon URLs

I'm using this plugin to fetch (and display) images associated with some URLs in my app.
I tried it with many sites and it works perfectly, but with Amazon it returns the wrong link.

For example, passing this link returns this image URL (which is not the correct image), while metadata-fetching APIs such as LinkPreview.net return this one.

Why is this happening? And how can I solve it? I would prefer not to use a proprietary third-party API.
(By the way, congratulations on the excellent work that the plugin does with the other sites!)

FormatException: Bad UTF-8 encoding 0x20 (at offset 40453)

Hello,

The package has an issue with some UTF-8 encoded pages.
This is the code:

void main() async {
  String url = 'https://buff.ly/2CnEsKq';
  Metadata data = await extract(url);
  print(data.toMap());
  // output: {title: null, description: null, image: null, url: null}
}

After some debugging, I figured out that an error was being caught in the catch block of these lines:

try {
  document = parser.parse(utf8.decode(response.bodyBytes));
  document.requestUrl = response.request.url.toString();
} catch (err) {
  return document;
}

The error message is FormatException: Bad UTF-8 encoding 0x20 (at offset 40453)
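
One possible way to avoid this exception, sketched here as an assumption rather than as the library's actual fix, is to decode the body leniently. The `allowMalformed` flag of `utf8.decode` in `dart:convert` replaces invalid byte sequences with the Unicode replacement character (U+FFFD) instead of throwing. The helper name `decodeBodyLeniently` is hypothetical.

```dart
import 'dart:convert';

/// Decodes [bytes] as UTF-8, substituting U+FFFD for malformed sequences
/// rather than throwing a FormatException.
String decodeBodyLeniently(List<int> bytes) {
  return utf8.decode(bytes, allowMalformed: true);
}

void main() {
  // "Hi" followed by 0xFF, a byte that is never valid in UTF-8.
  final bytes = [0x48, 0x69, 0xFF];

  // Strict decoding throws on the invalid byte.
  try {
    utf8.decode(bytes);
  } on FormatException catch (e) {
    print('strict decoding failed: ${e.message}');
  }

  // Lenient decoding keeps the valid part of the text.
  print(decodeBodyLeniently(bytes));
}
```

The trade-off is that a few characters may be replaced, but the rest of the page stays parseable, which is usually enough for reading meta tags.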

CORS issue on Flutter web

Is there a way to fix the CORS issue on Flutter web?

Access to XMLHttpRequest at 'https://pub.dev/' from origin 'https://example.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

JsonLdParser is trying to parse invalid data

A URL such as https://www.instagram.com/p/CmHTCMoJSV5/ yields an empty string for data, because the result of document?.head?.querySelector("script[type='application/ld+json']") is <html script> in the following method of JsonLdParser.
This produces an exception: FormatException: Unexpected end of input (at character 1)

dynamic _parseToJson(Document? document) {
  final data = document?.head
      ?.querySelector("script[type='application/ld+json']")
      ?.innerHtml;
  if (data == null) {
    return null;
  }
  var d = jsonDecode(data);
  return d;
}
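
A defensive variant of the decoding step might look like the sketch below. It works on the raw script text instead of a Document so that it stays self-contained; `parseJsonLd` is a hypothetical helper, not part of the library, and returning null on bad input is an assumed policy.

```dart
import 'dart:convert';

/// Decodes [raw] JSON-LD text. Returns null when the text is missing, blank,
/// or not valid JSON, instead of letting a FormatException escape.
dynamic parseJsonLd(String? raw) {
  if (raw == null || raw.trim().isEmpty) {
    return null;
  }
  try {
    return jsonDecode(raw);
  } on FormatException {
    return null;
  }
}

void main() {
  print(parseJsonLd(null)); // missing script tag
  print(parseJsonLd('')); // empty script tag, as in the report above
  print(parseJsonLd('{')); // truncated JSON
  print(parseJsonLd('{"name": "Example"}')); // valid JSON-LD
}
```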

Google News link return Google News icon instead of article image

Make default metadata configurable

Currently, we return a default metadata document with the url as description and the domain name as title. But this is not always desired. As a user of this library, I want to be aware that the extraction did not work and provide my own default.

For me, it would be okay to return null or if that is not desired, we should have an optional argument in the extract method to provide our own default, like Future<Metadata> extract(String url, {Metadata Function() orElse})
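
The proposed orElse behavior can be sketched as a wrapper around any extraction function. Everything here is hypothetical: `SimpleMetadata` stands in for the library's Metadata type, `Extractor` and `extractOrElse` are illustrative names, and the assumption is that a failed extraction yields null.

```dart
// Hypothetical stand-in for the library's Metadata type.
class SimpleMetadata {
  final String? title;
  const SimpleMetadata({this.title});
}

/// Any function that fetches metadata for a URL and yields null on failure.
typedef Extractor = Future<SimpleMetadata?> Function(String url);

/// Runs [extractor] on [url]; if it yields null, returns [orElse]() instead.
Future<SimpleMetadata> extractOrElse(
  Extractor extractor,
  String url, {
  required SimpleMetadata Function() orElse,
}) async {
  final data = await extractor(url);
  return data ?? orElse();
}

Future<void> main() async {
  // A fake extractor that always fails, standing in for a real network call.
  Future<SimpleMetadata?> failing(String url) async => null;

  final data = await extractOrElse(
    failing,
    'https://example.com',
    orElse: () => const SimpleMetadata(title: 'Untitled'),
  );
  print(data.title);
}
```

With this shape, callers decide what a failed extraction should produce, and the library never has to invent a default.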

Looking for a maintainer

This package deserves more attention, but I don't have the bandwidth to maintain it.

Looking for maintainers. Please inquire in the comments.
