Comments (2)

pdehaan commented on May 26, 2024

https://trackchanges.postlight.com/were-bullish-on-amp-abfc6e1f10a1#.cab8vkict was also a good read. Curious whether we can find a few good pages with AMP support for testing/scraping.

from page-metadata-parser.

pdehaan commented on May 26, 2024

I think this is a related use case. This one is a stretch, but mildly interesting.

Looking at https://vimeo.com/180763356, we have the following meta tags (with uninteresting <meta> tags removed):

$ meta-scraper -u "https://vimeo.com/180763356"

...
<link rel="apple-touch-icon-precomposed" href="https://i.vimeocdn.com/favicon/main-touch_180">
<link rel="canonical" href="/180763356">
<link rel="logo" type="image/svg" href="https://f.vimeocdn.com/logo.svg">
...
<link rel="shortcut icon" href="https://f.vimeocdn.com/images_v6/favicon.ico" data-play="https://i.vimeocdn.com/favicon/play_32" data-pause="https://i.vimeocdn.com/favicon/pause_32">
...
<meta charset="utf-8">
<meta name="description" content="There&apos;s no telling how many guns we have in America&#x2014;and when one gets used in a crime, no way for the cops to connect it to its owner. The only place&#x2026;">
<meta name="msapplication-TileColor" content="#00adef">
<meta name="msapplication-TileImage" content="https://i.vimeocdn.com/favicon/main-touch_144">
...
<meta name="twitter:card" content="player">
<meta name="twitter:description" content="There&apos;s no telling how many guns we have in America&#x2014;and when one gets used in a crime, no way for the cops to connect it to its owner. The only place&#x2026;">
<meta name="twitter:image" content="https://i.vimeocdn.com/video/589150572_1280x720.jpg">
<meta name="twitter:player" content="https://player.vimeo.com/video/180763356">
<meta name="twitter:player:height" content="720">
<meta name="twitter:player:width" content="1280">
<meta name="twitter:site" content="@vimeo">
<meta name="twitter:site" content="@vimeo">
<meta name="twitter:title" content="The Tracers - An inside look at the Real-Life Database of America&apos;s Firearms.">
...
<meta property="og:description" content="There&apos;s no telling how many guns we have in America&#x2014;and when one gets used in a crime, no way for the cops to connect it to its owner. The only place&#x2026;">
<meta property="og:image" content="https://i.vimeocdn.com/video/589150572_1280x720.jpg">
<meta property="og:image:height" content="720">
<meta property="og:image:secure_url" content="https://i.vimeocdn.com/video/589150572_1280x720.jpg">
<meta property="og:image:type" content="image/jpg">
<meta property="og:image:width" content="1280">
<meta property="og:site_name" content="Vimeo">
<meta property="og:title" content="The Tracers - An inside look at the Real-Life Database of America&apos;s Firearms.">
<meta property="og:type" content="video">
<meta property="og:updated_time" content="2016-09-01T19:56:39-04:00">
<meta property="og:url" content="https://vimeo.com/180763356">
<meta property="og:video:height" content="720">
<meta property="og:video:height" content="720">
<meta property="og:video:secure_url" content="https://player.vimeo.com/video/180763356?autoplay=1">
<meta property="og:video:secure_url" content="https://vimeo.com/moogaloop.swf?clip_id=180763356&amp;autoplay=1">
<meta property="og:video:type" content="application/x-shockwave-flash">
<meta property="og:video:type" content="text/html">
<meta property="og:video:url" content="https://player.vimeo.com/video/180763356?autoplay=1">
<meta property="og:video:url" content="https://vimeo.com/moogaloop.swf?clip_id=180763356&amp;autoplay=1">
<meta property="og:video:width" content="1280">
<meta property="og:video:width" content="1280">
<meta property="video:tag" content="ATF">
<meta property="video:tag" content="Firearms">
<meta property="video:tag" content="GQ">
<meta property="video:tag" content="Guns">
<meta property="video:tag" content="National Tracing Center">
<meta property="video:tag" content="The Tracers">
<title>The Tracers - An inside look at the Real-Life Database of America&apos;s Firearms. on Vimeo</title>

Plus this <script type="application/ld+json"> tag:

<script type="application/ld+json">
[
  {
    "url": "https://vimeo.com/180763356",
    "thumbnailUrl": "https://i.vimeocdn.com/video/589150572_1280x720.webp",
    "embedUrl": "https://player.vimeo.com/video/180763356",
    "name": "The Tracers - An inside look at the Real-Life Database of America&#039;s Firearms.",
    "description": "There&#039;s no telling how many guns we have in America&mdash;and when one gets used in a crime, no way for the cops to connect it to its owner. The only place&hellip;",
    "height": 1080,
    "width": 1920,
    "playerType": "HTML5 Flash",
    "videoQuality": "HD",
    "duration": "PT00H07M26S",
    "uploadDate": "2016-08-30T12:52:25-04:00",
    "thumbnail": {
      "@type": "ImageObject",
      "url": "https://i.vimeocdn.com/video/589150572_1280x720.webp",
      "width": 1280,
      "height": 720
    },
    "author": {
      "@type": "Person",
      "name": "Steven Brahms",
      "url": "https://vimeo.com/stevenbrahms"
    },
    "potentialAction": {
      "@type": "ViewAction",
      "target": "vimeo://app.vimeo.com/videos/180763356"
    },
    "interactionCount": 13055,
    "keywords": "[GQ,The Tracers,National Tracing Center,Firearms,ATF,Guns]",
    "@type": "VideoObject",
    "@context": "http://schema.org"
  },
  {
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "item": {
          "@id": "https://vimeo.com/stevenbrahms",
          "name": "Steven Brahms"
        }
      },
      {
        "@type": "ListItem",
        "position": 2,
        "item": {
          "@id": "https://vimeo.com/stevenbrahms/videos",
          "name": "Videos"
        }
      }
    ],
    "@type": "BreadcrumbList",
    "@context": "http://schema.org"
  }
]
</script>

So our current parser cannot get any keywords from this page unless we either parse the <script type="application/ld+json"> block, or start parsing and merging multiple <meta property="video:tag" content=".."> tags (in addition to article:tag and book:tag tags, per http://ogp.me/).
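
A rough sketch of both options, to make the idea concrete. Everything here is hypothetical: the function names (keywordsFromMetaTags, keywordsFromJsonLd, extractKeywords) are made up for illustration and are not part of page-metadata-parser, and the regexes assume double-quoted attributes in the same order as the Vimeo output above. It also assumes "keywords" is either an array or a single string like "[GQ,The Tracers,...]", which is what Vimeo emits here; other sites may format it differently.

```javascript
// Hypothetical helpers for illustration; not page-metadata-parser's API.

// Decode only the handful of HTML entities likely to show up in tag values.
const ENTITIES = { '&amp;': '&', '&apos;': "'", '&#039;': "'", '&quot;': '"' };
function decodeEntities(text) {
  return text.replace(/&(amp|apos|#039|quot);/g, (m) => ENTITIES[m]);
}

// Option 1: collect every video:tag / article:tag / book:tag <meta> value.
// Assumes property="..." comes before content="..." and both are double-quoted.
function keywordsFromMetaTags(html) {
  const pattern =
    /<meta\s+property="(?:video|article|book):tag"\s+content="([^"]*)"\s*\/?>/g;
  const tags = [];
  for (const match of html.matchAll(pattern)) {
    tags.push(decodeEntities(match[1]).trim());
  }
  return tags;
}

// Option 2: read "keywords" from each <script type="application/ld+json"> block.
function keywordsFromJsonLd(html) {
  const pattern =
    /<script\s+type="application\/ld\+json">([\s\S]*?)<\/script>/g;
  const keywords = [];
  for (const match of html.matchAll(pattern)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch (err) {
      continue; // skip blocks that are not valid JSON
    }
    // A block may hold a single object or an array of objects.
    for (const node of [].concat(data)) {
      if (!node) continue;
      let list = [];
      if (Array.isArray(node.keywords)) {
        list = node.keywords.map(String);
      } else if (typeof node.keywords === 'string') {
        // Strip the surrounding [ ] seen in the Vimeo output, then split on commas.
        list = node.keywords.replace(/^\[|\]$/g, '').split(',');
      }
      keywords.push(...list.map((k) => decodeEntities(k).trim()));
    }
  }
  return keywords;
}

// Merge both sources, dropping empty strings and exact duplicates.
function extractKeywords(html) {
  const merged = [...keywordsFromMetaTags(html), ...keywordsFromJsonLd(html)];
  return [...new Set(merged.filter(Boolean))];
}
```

Calling extractKeywords(html) on the raw page source would return one de-duplicated array, with the <meta> tag values first. In the browser, document.querySelectorAll would be a sturdier way to find these elements than regexes; the string-based version is only used here so the sketch runs without a DOM.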

