useragentparser's Introduction

User-Agent parser for robot rule sets

Parser and group determiner optimized for robots.txt, X-Robots-Tag, and robots meta tag use cases.

Requirements:

  • PHP 5.5+, 7.0+ or 8.0+

Installation

The library can be installed via Composer. Add this to your composer.json file:

{
    "require": {
        "vipnytt/useragentparser": "^1.0"
    }
}

Then run composer update (or php composer.phar update if you use the PHAR file directly).

Features

  • Stripping of the version tag.
  • List any rule groups the User-Agent belongs to.
  • Determine the correct group of records by finding the group with the most specific User-agent that still matches.

When to use it?

  • When parsing robots.txt rule sets on behalf of a web robot (crawler).
  • When parsing the X-Robots-Tag HTTP header.
  • When parsing Robots meta tags in HTML / XHTML documents.

Note: Full User-agent strings, such as those sent by web browsers, are not supported; this is by design. The supported format is UserAgentName/version, with or without the version tag, e.g. MyWebCrawler/2.0 or just MyWebCrawler.
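To make the supported format concrete, here is a minimal standalone sketch of splitting a UserAgentName/version string into its two parts. The function name splitUserAgent is hypothetical and is not part of this library; the library's own parsing may differ.

```php
<?php
// Hypothetical helper for illustration only; NOT part of vipnytt/useragentparser.
function splitUserAgent($userAgent)
{
    // Split on the first slash only: "MyWebCrawler/2.0" -> ["MyWebCrawler", "2.0"].
    $parts = explode('/', trim($userAgent), 2);

    return [
        'product' => $parts[0],
        // The version tag is optional, so it may be missing.
        'version' => isset($parts[1]) ? $parts[1] : null,
    ];
}

var_dump(splitUserAgent('MyWebCrawler/2.0')); // product "MyWebCrawler", version "2.0"
var_dump(splitUserAgent('MyWebCrawler'));     // product "MyWebCrawler", version NULL
```

A full browser string would not split into a meaningful product and version this way, which is why only the short format is supported.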

Getting Started

Strip the version tag.

use vipnytt\UserAgentParser;

$parser = new UserAgentParser('googlebot/2.1');
$product = $parser->getProduct(); // googlebot

List different groups the User-agent belongs to

use vipnytt\UserAgentParser;

$parser = new UserAgentParser('googlebot-news/2.1');
$userAgents = $parser->getUserAgents();

array(
    'googlebot-news/2.1',
    'googlebot-news/2',
    'googlebot-news',
    'googlebotnews',
    'googlebot'
);

Determine the correct group

Determine the correct group of records by finding the group with the most specific User-agent that still matches your rule sets.

use vipnytt\UserAgentParser;

$parser = new UserAgentParser('googlebot-news');
$match = $parser->getMostSpecific(['googlebot/2.1', 'googlebot-images', 'googlebot']); // googlebot
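One way to picture the selection is as a walk through the candidate list, from most to least specific, stopping at the first candidate that appears in the rule set. The sketch below is standalone and hedged: pickMostSpecific is a hypothetical helper, the candidate list for googlebot-news is assumed by analogy with the getUserAgents() output shown earlier, and the library's actual algorithm may differ.

```php
<?php
// Hypothetical helper for illustration only; NOT part of vipnytt/useragentparser.
function pickMostSpecific(array $candidates, array $groups)
{
    // Lower-case the group names once so the comparison is case-insensitive.
    $groups = array_map('strtolower', $groups);

    // $candidates is expected to be ordered from most to least specific.
    foreach ($candidates as $candidate) {
        if (in_array(strtolower($candidate), $groups, true)) {
            return $candidate;
        }
    }

    return null; // No group in the rule set matches.
}

// Assumed candidate list for 'googlebot-news', most specific first.
$candidates = ['googlebot-news', 'googlebotnews', 'googlebot'];
$groups = ['googlebot/2.1', 'googlebot-images', 'googlebot'];

echo pickMostSpecific($candidates, $groups); // googlebot
```

Neither googlebot/2.1 nor googlebot-images is in the candidate list, so the more general googlebot group is chosen.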

Cheat sheet

$parser = new UserAgentParser('MyCustomCrawler/1.2');

// Determine the correct rule set (robots.txt / robots meta tag / x-robots-tag)
$parser->getMostSpecific($array); // string

// Parse
$parser->getUserAgent(); // string 'MyCustomCrawler/1.2'
$parser->getProduct(); // string 'MyCustomCrawler'
$parser->getVersion(); // string '1.2'

// Crunch the data into groups, from most to least specific
$parser->getUserAgents(); // array
$parser->getProducts(); // array
$parser->getVersions(); // array

Specifications

useragentparser's People

Contributors

janpettermg
