HTMLDoc: PHP HTML Document Parser and Minifier

A tokeniser-based HTML document parser and minifier, written in PHP.

Description

An HTML parser designed primarily for minifying HTML documents. It also lets you query the document structure to extract attribute and text node values.

The parser is built around a tokeniser, which makes document processing more reliable than regex-based minifiers. Regex-based minifiers are blunt tools and can corrupt a document when a pattern matches in the wrong place.
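As a rough illustration of the problem, the sketch below compares a naive regex whitespace collapse with a coarse token-aware pass. It is not HTMLdoc's implementation: the function names `regexMinify()` and `tokenMinify()` are hypothetical, and the "tokeniser" here only distinguishes `<pre>` blocks from everything else.

```php
<?php
declare(strict_types=1);

// Sample input: whitespace inside <pre> is significant and should survive minification.
$html = "<p>Hello   world</p>\n<pre>line 1\n    indented</pre>";

// Naive regex minifier (hypothetical): collapses every whitespace run,
// including the runs inside <pre>, which changes how the page renders.
function regexMinify(string $html): string {
	return preg_replace('/\s+/', ' ', $html) ?? $html;
}

// Coarse token-aware minifier (hypothetical, NOT HTMLdoc's code): split the input
// into <pre> blocks and other markup, then collapse whitespace only outside <pre>.
function tokenMinify(string $html): string {
	$tokens = preg_split('/(<pre\b[^>]*>.*?<\/pre>)/is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
	$output = '';
	foreach ($tokens as $i => $token) {
		// with one capture group, odd indexes hold the captured <pre> blocks
		$output .= $i % 2 === 1 ? $token : (preg_replace('/\s+/', ' ', $token) ?? $token);
	}
	return $output;
}

echo regexMinify($html), "\n"; // the <pre> content is flattened onto one line
echo tokenMinify($html), "\n"; // the <pre> content keeps its line break and indent
```

A real tokeniser goes further by recognising tags, attributes, comments, and text as separate tokens, so that each rule is applied only to the token type it is meant for.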

The software is also capable of processing and minifying SVG documents.

Usage

To minify an HTML document:

use hexydec\html\htmldoc;

$doc = new htmldoc();

// load from a variable
if ($doc->load($html)) {

	// minify the document
	$doc->minify();

	// compile back to HTML
	echo $doc->save();
}

You can test out the minifier online at https://hexydec.com/apps/minify-html/, or run the supplied index.php file after installation.

To extract data from an HTML document:

use hexydec\html\htmldoc;

$doc = new htmldoc();

// load from a URL this time
if ($doc->open($url)) {

	// extract text
	$text = $doc->find('.article__body')->text();

	// extract attribute
	$attr = $doc->find('.article__author-image')->attr('src');

	// extract HTML
	$html = $doc->find('.article__body')->html();
}

Installation

The easiest way to get up and running is to use composer:

$ composer require hexydec/htmldoc

HTMLdoc requires \hexydec\token\tokenise to run, which you can install manually if you are not using composer. Optionally, you can also install CSSdoc and JSlite to minify inline CSS and JavaScript respectively.

All these dependencies will be installed through composer.

Test Suite

You can run the test suite like this:

Linux

$ vendor/bin/phpunit

Windows

> vendor\bin\phpunit

Documentation

Support

HTMLdoc supports PHP version 8.0+.

Contributing

If you find an issue with HTMLdoc, please create an issue in the tracker.

If you wish to fix an issue yourself, please fork the code, fix the issue, then create a pull request, and I will evaluate your submission.

Licence

The MIT License (MIT). Please see License File for more information.
