Git Product home page Git Product logo

behatcrawler's Introduction

BehatCrawler

PHP Composer

The BehatCrawler is a Behat, MinkExtension and Selenium2Driver extension that crawls a given URL and executes user-defined functions in each crawled page.

Multiple options for crawling are available, see available options.

Installation

composer require piopi/behatcrawler

Usage

Start by importing the extension, to your Feature Context (or any of your Context):

use Behat\Crawler\Crawler;

Create your Crawler object with the default configuration:

The crawler is only compatible at this time with Selenium2Driver

//$crawler=New Crawler(BehatSession);
$crawler= New Crawler($this->getSession());

For custom settings (passed as an array), see the following table for all the available options.

$crawler= New Crawler($this->getSession(),["internalLinksOnly"=>true,"HTMLOnly"=>true,'MaxCrawl'=>20]);

Available options: (More functionalities coming soon)

Option Description Default Value
Depth Maximum depth that can be crawled from URL 0 (unlimited)
MaxCrawl Maximum number of crawls 0 (unlimited)
HTMLOnly Will only crawl HTML/xHTML pages true
internalLinksOnly Will crawl internal links only (links with same Domaine name as the initial URL) true
waitForCrawl Will wait for the crawler to finish crawling before throwing any exception originating from the user defined functions. (Compile a list of all exceptions found with their respective location) false

Option can either be set in the constructor or with the appropriate getters/setters:

 $crawler= New Crawler($this->getSession(),["MaxCrawl"=>10]);
//or
$crawler->setMaximumCrawl(10);

Start Crawling

After creating and setting up the crawler, you can start crawling by passing your function as an argument:

Please refer to the PHP Callables documentation for more details.

Examples:

Closure::fromCallable is used to pass by parameter private function

//function 1 is a private function
$crawler->startCrawling(Closure::fromCallable([$this, 'function1']));
//function 2 is a public class function
$crawler->startCrawling([$this, 'function1']);

For functions with one or more arguments, they can be passed as the following:

$crawler->startCrawling(Closure::fromCallable([$this, 'function3']),[arg1]);
$crawler->startCrawling(Closure::fromCallable([$this, 'function4']),[arg1,arg2]);

Usage Example

use Behat\Crawler\Crawler;
//Crawler with different settings
$crawler= New Crawler($this->getSession(),["internalLinksOnly"=>true,"HTMLOnly"=>true,'MaxCrawl'=>20,"waitForCrawl"=>true]);
//Function without arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function1'])); //Will start crawling
//Function with one or more argument
$crawler->startCrawling(Closure::fromCallable([$this, 'function2']),[arg1,arg2]);

In a Behat step function:

   /**
     * @Given /^I crawl the website with a maximum of (\d+) level$/
     */
    public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)
    {
        $crawler= New Crawler($this->getSession(),["Depth"=>$arg1]);
        $crawler->startCrawling([$this, 'test']);
    }

Copyright

Copyright (c) 2020 Mostapha El Sabah [email protected]

Maintainers

Mostapha El Sabah Piopi

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.