masterminds / html5-php
An HTML5 parser and serializer for PHP.
Home Page: http://masterminds.github.io/html5-php/
License: Other
Some tag attributes are case sensitive. This happens when something like svg is embedded. So, not all attribute names should be converted to lowercase.
For ref on SVG see http://www.w3.org/Graphics/SVG/WG/wiki/SVG_in_HTML5
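A minimal sketch of how case could be preserved for SVG attributes while still lowercasing ordinary HTML attributes. The function name normalizeAttributeName and the SVG_ATTRIBUTE_CASE table are hypothetical (they are not part of html5-php), and the table lists only a few of the camel-cased SVG attributes:

```php
<?php
// Hypothetical lookup table: lowercase spelling => canonical SVG spelling.
// Only a handful of entries are shown; a real table would be longer.
const SVG_ATTRIBUTE_CASE = [
    'viewbox'             => 'viewBox',
    'preserveaspectratio' => 'preserveAspectRatio',
    'gradientunits'       => 'gradientUnits',
];

// Hypothetical helper: lowercase the name, then restore the canonical
// spelling when the attribute belongs to an element inside <svg>.
function normalizeAttributeName(string $name, bool $insideSvg): string
{
    $lower = strtolower($name);
    if ($insideSvg && isset(SVG_ATTRIBUTE_CASE[$lower])) {
        return SVG_ATTRIBUTE_CASE[$lower];
    }
    return $lower;
}

echo normalizeAttributeName('viewBox', true), "\n";  // viewBox
echo normalizeAttributeName('viewBox', false), "\n"; // viewbox
```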
Hello,
I just started using your html5-parser and I'm trying to load a fragment.
Using the code shown in the wiki, it's no problem:
require "vendor/autoload.php";
use Masterminds\HTML5;
$html5 = new HTML5();
// An example HTML fragment:
$fragment = "<p>This is a test of the HTML5 parser.<p>";
$dom = $html5->loadHTMLFragment($fragment);
But when I try to parse other tags, it's not working.
For example, this code:
require "vendor/autoload.php";
use Masterminds\HTML5;
$html5 = new HTML5();
// An example HTML fragment:
$fragment = "<td>This is a test of the HTML5 parser.<td>";
$dom = $html5->loadHTMLFragment($fragment);
error shown:
Notice: Undefined property: DOMDocumentFragment::$tagName in C:\xampp\htdocs\html5\vendor\masterminds\html5\src\HTML5\Parser\TreeBuildingRules.php on line 138
Stack trace:
# | Function | Location |
---|---|---|
1 | {main}( ) | ..\index.php:0 |
2 | Masterminds\HTML5->loadHTMLFragment( ) | ..\index.php:21 |
3 | Masterminds\HTML5->parseFragment( ) | ..\HTML5.php:128 |
4 | Masterminds\HTML5\Parser\Tokenizer->parse( ) | ..\HTML5.php:181 |
5 | Masterminds\HTML5\Parser\Tokenizer->consumeData( ) | ..\Tokenizer.php:83 |
6 | Masterminds\HTML5\Parser\Tokenizer->tagOpen( ) | ..\Tokenizer.php:126 |
7 | Masterminds\HTML5\Parser\Tokenizer->tagName( ) | ..\Tokenizer.php:269 |
8 | Masterminds\HTML5\Parser\DOMTreeBuilder->startTag( ) | ..\Tokenizer.php:371 |
9 | Masterminds\HTML5\Parser\TreeBuildingRules->evaluate( ) | ..\DOMTreeBuilder.php:398 |
10 | Masterminds\HTML5\Parser\TreeBuildingRules->closeIfCurrentMatches( ) | ..\TreeBuildingRules.php:90 |
Hi, I encounter some problems when parsing HTML which contains certain elements:
HTML:
<!DOCTYPE html>
<html>
<body>
<div>
<table style="width: 520px; height: 361px;" border="1px solid">
<tbody>
<tr>
<td>a</td>
<td>b</td>
<td>c</td>
<td>d</td>
<td>d</td>
<td>f</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
PHP:
require_once(__DIR__ . "/vendor/autoload.php");
$html = file_get_contents("1.html");
$dom = HTML5::loadHTML($html); //DOMDocument
echo HTML5::saveHTML($dom);
What I get is a wrong result:
<html><body>
<div>
<table style="width: 520px; height: 361px;" border="1px solid"></table>
<tbody></tbody>
<tr></tr>
<td></td>a
<td></td>b
<td></td>c
<td></td>d
<td></td>d
<td></td>f
</div>
</body>
</html>
It works well using DOMDocument::loadHTML parsing the same test html file.
Is phpdocumentor/phpdocumentor in composer.json really needed?
It requires a lot of dependencies to be downloaded in the build phase...
PHP Strict Standards: Non-static method DOMImplementation::createDocumentType() should not be called statically, assuming $this from incompatible context in /Users/mfarina/Code/HTML5-PHP/src/HTML5/Parser/DOMTreeBuilder.php on line 60
Strict Standards: Non-static method DOMImplementation::createDocumentType() should not be called statically, assuming $this from incompatible context in /Users/mfarina/Code/HTML5-PHP/src/HTML5/Parser/DOMTreeBuilder.php on line 60
PHP Strict Standards: Non-static method DOMImplementation::createDocument() should not be called statically, assuming $this from incompatible context in /Users/mfarina/Code/HTML5-PHP/src/HTML5/Parser/DOMTreeBuilder.php on line 62
Strict Standards: Non-static method DOMImplementation::createDocument() should not be called statically, assuming $this from incompatible context in /Users/mfarina/Code/HTML5-PHP/src/HTML5/Parser/DOMTreeBuilder.php on line 62
I've been getting these errors when running example.php.
$html5 = new HTML5();
$doc = $html5->loadHTML( "<!DOCTYPE html>\n<html><head><title></title></head><body><p>0</p><p>1</p></body></html>" );
echo $html5->saveHTML( $doc );
Result:
<!DOCTYPE html>
<html><head><title></title></head><body><p></p><p>01</p></body></html>
I'm using html5-php 2.0.0 and PHP 5.3.3-7+squeeze15.
What about changing the coding standard (with the upcoming 2.0 release)?
The require-dev elements listed in the composer.json file are installed by default. That means everywhere someone installs this library, they will also be installing phpunit and the symfony yaml parser in the vendor directory.
@technosophos should we pull or leave require-dev for phpunit?
The HTML5 spec supports CDATA sections, but the parser converts CDATA (incorrectly) into a comment section:
1) HTML5\Tests\SerializerTest::testCDATA
Failed asserting that '<!DOCTYPE html>
<html><head></head><body>a<!--[CDATA[ This <is--> a test. ]]>b</body></html>
' matches PCRE pattern "|<![CDATA[ This <is> a test. ]]>|".
/Users/mattbutcher/Code/HP/HTML5-PHP/test/HTML5/SerializerTest.php:115
Hi,
I discovered some weird behavior on this page: http://rayer.g6.cz/. I also pasted the source HTML here: http://pastebin.com/FQjSEGCK.
Everything from the text in html > head > title onward is escaped (even the </TITLE> tag). I found out that if I use the strtolower function like this: \HTML5::loadHTML(strtolower($html)), the HTML is parsed correctly. Can you look at this please?
Thank you for your work - I can finally parse HTML in PHP too :)
Over at https://www.drupal.org/node/1333730, we're working on pulling in masterminds/html5 via Composer into Drupal 8 core. But we're running into a problem: this library doesn't seem to support elements with dashes.
Elements with dashes are necessary for Web Components support (http://w3c.github.io/webcomponents/spec/custom/). However, technically, the Web Components spec is non-normative (http://www.w3.org/TR/html5/references.html#references), so it's not necessary, strictly speaking.
That being said, I think most people would argue Web Components are clearly going to become an important aspect of web development in the not-too-distant future. Hence we want to make sure Drupal 8 doesn't break them, and it'd be necessary for this library not to break them if Drupal 8 wants to use it.
Would you be willing to add support for Web Components, and hence elements with dashes?
Sporadically I have seen XML namespace errors in the parser, or cases where the parser ignores an XML namespace declaration.
Hi!
I suggest changing the error handling in DOMTreeBuilder (https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Parser/DOMTreeBuilder.php#L79).
Injecting an errors property into DOMDocument is not so clean and won't always work, especially on HHVM.
This appears to be an issue only before the <html> tag. This:
<!--[if lte IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]-->
<!--[if gt IE 8]> <!--><html class="no-js" lang="en"><!--<![endif]-->
is being turned into this:
<html class="no-js" lang="en"><!--<![endif]-->
The serializer currently is not encoding data properly for output. This enables certain documents to be crafted which can expose XSS vulnerabilities. For example, the cdata serializer just outputs the text directly. When crafted with a malicious payload, this results in an attack vector:
$html = "<!DOCTYPE html>
<html>
<head>
<title>TEST</title>
</head>
<body id='foo'>
<h1>Hello World</h1>
<p>This is a test of the HTML5 parser.</p>
</body>
</html>";
// Parse the document. $dom is a DOMDocument.
$dom = \HTML5::loadHTML($html);
$els = $dom->getElementsByTagName('h1');
$els->item(0)->appendChild(new DomCDataSection('this ]]><script>alert(hi!);</script><![CDATA[ is injected'));
var_dump(\HTML5::saveHTML($dom));
This will output:
<!DOCTYPE html>
<html><head>
<title>TEST</title>
</head>
<body id="foo">
<h1>Hello World<![CDATA[this ]]><script>alert(hi!);</script><![CDATA[ is injected]]></h1>
<p>This is a test of the HTML5 parser.</p>
</body>
</html>
Which is obviously bad.
Comments and raw text fields both suffer a similar problem as well (from a quick glance).
Reading http://www.w3.org/TR/html51/syntax.html#the-before-html-insertion-mode
Especially:
A start tag whose tag name is "html"
Create an element for the token in the HTML namespace, with the Document as the intended parent. Append it to the Document object. Put this element in the stack of open elements.
HTML namespace should be: http://www.w3.org/1999/xhtml
Does this imply that line https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Parser/DOMTreeBuilder.php#L227 should become:
$ele = $this->doc->createElement($lname, $htmlNs);
?
The parser is confused if you add whitespace inside </title>, like:
<title>Note the space after "title"</title >
<title>Another example</title
>
Both examples above are valid according to the W3 Validator.
This behaviour is caused by Tokenizer.php, which assumes the end tag is always exactly </title>.
<?php
require_once __DIR__ . "/vendor/autoload.php";
$html = <<<EOF
<!doctype html>
<html>
<head>
<title>This is valid, really.</title >
</head>
<body></body>
</html>
EOF;
$parser = new Masterminds\HTML5;
$dom = $parser->loadHTML( $html );
echo $parser->saveHTML( $dom );
<!DOCTYPE html>
<html><head>
<title>This is valid, really.</title >
</head>
<body></body>
</html></title></head></html>
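A minimal sketch of end-tag matching that tolerates both trailing whitespace and unusual capitalisation. The helper findEndTag is hypothetical and is not how Tokenizer.php is written; it only illustrates the idea with a case-insensitive regular expression, and it treats any `\s` character as whitespace, which is slightly broader than the HTML definition:

```php
<?php
// Hypothetical helper: locate the next end tag for $name, allowing
// optional whitespace before ">" and any letter case in the tag name.
// Returns ['pos' => byte offset, 'len' => byte length], or null if absent.
function findEndTag(string $html, string $name, int $offset = 0): ?array
{
    $pattern = '~</' . preg_quote($name, '~') . '\s*>~i';
    if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
        return null;
    }
    return ['pos' => $m[0][1], 'len' => strlen($m[0][0])];
}

var_dump(findEndTag('<title>x</title >', 'title') !== null);  // bool(true)
var_dump(findEndTag("<Title>x</TITLE\n>", 'title') !== null); // bool(true)
var_dump(findEndTag('<title>x', 'title'));                    // NULL
```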
$html5 = new HTML5();
$htmlStr = <<<HERE
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<p>Testing</p>
</body>
HERE;
$doc = $html5->loadHTML( $htmlStr );
$xPath = new DOMXPath( $doc );
echo $xPath->query( '//p' )->length; // "0" in 2.0.0; "1" in 1.0.3
PHP 5.3.3.
Hello,
Thanks for the nice library. I installed and tried this HTML5 lib.
I found that the handling of entity references in the title element is wrong, like this:
<?php
// entityref-in-title.php
require_once 'vendor/autoload.php';
$html = <<<EOH
<!doctype html>
<title>&#x27;</title>
<p>&#x27;</p>
EOH;
echo \HTML5::loadHTML($html)->saveHTML();
$ php -v
PHP 5.6.0beta1 (cli) (built: Apr 17 2014 15:46:38)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.6.0-dev, Copyright (c) 1998-2014 Zend Technologies
$ php ./entityref-in-title.php
<!DOCTYPE html>
<html><title>&#x27;</title>
<p>'</p></html>
In the example above, &#x27; should be decoded as ' (single quotation mark), but it actually isn't.
If I set the text mode for the title element to 81, the entity ref is decoded properly:
<?php
require_once 'vendor/autoload.php';
$html = <<<EOH
<!doctype html>
<title>&#x27;</title>
<p>&#x27;</p>
EOH;
\HTML5\Elements::$html5['title'] = 81;
echo \HTML5::loadHTML($html)->saveHTML();
$ php ./entityref-in-title.php
<!DOCTYPE html>
<html><title>'</title>
<p>'</p></html>
I intended to send a pull request, but I couldn't because I didn't know why \HTML5\Elements::$html5['title'] was set to 5.
Could you look into this?
In the current Drupal 8 test coverage, which still uses PHP's DomDocument (and hence makes assertions based on an XHTML POV), we have the following two assertions:
$f = Html::normalize('<p>line1<br/><hr/>line2</p>');
$this->assertEqual($f, '<p>line1<br></p><hr>line2', 'HTML corrector -- Move non-inline elements outside of inline containers.');
$f = Html::normalize('<p>line1<div>line2</div></p>');
$this->assertEqual($f, '<p>line1</p><div>line2</div>', 'HTML corrector -- Move non-inline elements outside of inline containers.');
The second still works with HTML5. The first doesn't.
Instead of moving the <hr> outside of the <p>, it keeps it inside:
<p>line1<br><hr>line2</p>
Looking at \MasterMinds\HTML5\Elements, I see:
"hr" => 73, // NORMAL | VOID_TAG | BLOCK_TAG
So it's definitely marked as a block-level element. Which makes me suspect that HTML5 simply doesn't do this kind of clean-up, and that it's merely by accident (as a side-effect of some other parsing aspect) that the second test case is handled correctly.
Which makes me wonder if this is behavior only required for XHTML parsers and not HTML5 parsers?
The codebase is now running through Travis CI. And, it shows some tests are failing in PHP 5.4. See https://travis-ci.org/Masterminds/html5-php/jobs/7584913 for more details.
Switch all classes from the HTML5 namespace to the Masterminds\HTML5 namespace.
Is there a good reason why we don't use \SplFileObject or \php_stream_filter?
It could probably make the parser less complex, perhaps starting from the next major update.
What are your thoughts?
I noticed an issue on our site today.
We had the characters R&D in our HTML. Obviously this should be R&amp;D to be accurate; however, when html5-php parses this, it parses it to R&
I'm assuming this is because &D isn't an HTML entity, so it defaults to the &. I would have expected the output to be R&amp;D though.
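For comparison, this is how PHP's built-in htmlspecialchars() and html_entity_decode() treat a bare ampersand. It is only a reference point for the expected round trip, not a description of how html5-php's serializer works:

```php
<?php
// A bare "&" that does not start a known entity is still text, so
// escaping it for output should give "&amp;" and keep the following "D".
$raw     = 'R&D';
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";                                          // R&amp;D
echo html_entity_decode($escaped, ENT_QUOTES, 'UTF-8'), "\n"; // R&D
```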
$in = "<!DOCTYPE html>
<html>
<head>
<title>My Webpage</title>
</head>
<body>foo</body>
</html>";
$dom = \HTML5::loadHTML($in);
$out = \HTML5::saveHTML($dom);
($out == $in); // false < should be true
The value of $out is:
<!DOCTYPE html>
<html><head>
<title>My Webpage</title>
</head>
<body>foo</body>
</html>
The whitespace between html and head has been removed.
Neither MathML nor SVG have been fully tested.
Years ago, I was using simple_html_dom to read in an HTML file and convert it to raw text for indexing in Apache Solr.
I'm in the process of converting that code to your library - is there a similar mechanism? If not, how do you recommend adding this functionality? Do I add a class that implements the RulesInterface?
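Since the parser returns a DOMDocument, one possible approach is to read the standard DOM textContent property rather than writing a new rules class. The sketch below uses PHP's built-in DOMDocument::loadHTML() as a stand-in for the parsing step so that it runs without the library; the helper name htmlToText is hypothetical:

```php
<?php
// Hypothetical helper: collapse a parsed document into plain text
// suitable for indexing. textContent concatenates all descendant text
// nodes; runs of whitespace are then squeezed into single spaces.
function htmlToText(DOMDocument $doc): string
{
    if ($doc->documentElement === null) {
        return '';
    }
    $text = $doc->documentElement->textContent;
    return trim(preg_replace('/\s+/', ' ', $text));
}

$doc = new DOMDocument();
// Stand-in for the html5-php parse step, which also yields a DOMDocument.
$doc->loadHTML('<html><body><h1>Hello</h1> <p>World  of   text</p></body></html>');

echo htmlToText($doc), "\n";
```

Note that textContent also includes the contents of script and style elements, so those may need to be removed from the tree first.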
The parser fails when it encounters a tag name with strange capitalisation (e.g. <Title>, <titlE>, etc). For example, this script
<?php
require_once __DIR__ . "/vendor/autoload.php";
$html = <<< 'HERE'
<!doctype html>
<html>
<head>
<Title>Hello, world!</Title>
</head>
<body></body>
</html>
HERE;
$parser = new Masterminds\HTML5;
$dom = $parser->loadHTML( $html );
echo "== HTML5 rendering ==\n";
echo $parser->saveHTML( $dom );
echo "== XPath queries ==\n";
$xpath = new DOMXPath( $dom );
$xpath->registerNamespace( "x", "http://www.w3.org/1999/xhtml" );
echo "=== Value of <title> ===\n";
echo $xpath->query( "//x:title" )->item( 0 )->nodeValue;
outputs:
== HTML5 rendering ==
<!DOCTYPE html>
<html><head>
<title>Hello, world!</Title>
</head>
<body></body>
</html></title></head></html>
== XPath queries ==
=== Value of <title> ===
Hello, world!</Title>
</head>
<body></body>
The HTML supplied is valid.
I'm trying to use your library to validate an HTML5 string. DOMDocument::validate() is the method I would be using.
$parser = new \HTML5;
$dom = $parser->loadHTML("<html><head><title>Herro</title></head><body></body></html>");
var_dump($dom->validate());
I get the following error:
Warning: DOMDocument::validate(): No declaration for element html
I presume this is something to do with requiring a dtd schema, although I presumed that (as your library is specific to HTML5), this would be handled. Can you tell me if it's possible to use your library to achieve what I require and if so, how? Thanks.
Hello,
After install, running an extreme test with an empty input string leads to the following error:
Notice: Trying to get property of non-object in [...]\vendor\masterminds\html5\src\HTML5\Serializer\Traverser.php on line 96
Call Stack:
0.0020 127680 1. {main}() [...]\test.php:0
0.0550 873336 2. Masterminds\HTML5->saveHTML() [...]\test.php:15
0.0550 874376 3. Masterminds\HTML5->save() [...]\vendor\masterminds\html5\src\HTML5.php:238
0.0620 965688 4. Masterminds\HTML5\Serializer\Traverser->walk() [...]\vendor\masterminds\html5\src\HTML5.php:215
0.0620 965736 5. Masterminds\HTML5\Serializer\OutputRules->document() [...]\vendor\masterminds\html5\src\HTML5\Serializer\Traverser.php:68
0.0620 966000 6. Masterminds\HTML5\Serializer\Traverser->node() [...]\vendor\masterminds\html5\src\HTML5\Serializer\OutputRules.php:118
<!-- Skipped --><!DOCTYPE html>
Here's the script executed with a PHP 5.5 CLI interpreter:
<?php
// Assuming you installed from Composer:
require __DIR__ . "/vendor/autoload.php";
use Masterminds\HTML5;
// An example HTML document:
$html = '';
// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);
// Render it as HTML5:
echo $html5->saveHTML($dom);
To me, the library should throw an exception when it is not able to deal with the input data. What if I pass an object, integer, or resource to loadHTML? What will happen?
Thanks for your help!
Vincent
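Until the library decides how to handle such input, a caller-side guard is one option. This is only a sketch: requireHtmlString is a hypothetical helper, and treating an empty string as an error is an assumption that may not suit every caller:

```php
<?php
// Hypothetical guard to call before handing input to the parser.
// It rejects non-string input and (by assumption) empty strings.
function requireHtmlString($input): string
{
    if (!is_string($input)) {
        throw new InvalidArgumentException(
            'Expected an HTML string, got ' . gettype($input)
        );
    }
    if (trim($input) === '') {
        throw new InvalidArgumentException('Expected a non-empty HTML string');
    }
    return $input;
}

try {
    requireHtmlString('');
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}
```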
I have the following style tag in a page that I'm parsing w/ html5-php: http://www.diffchecker.com/k3opxnf5
As you can see, the CSS is getting broken by the parser because it's encoding characters in the CSS into HTML entities (">" for example).
Any idea how I can work around this?
Thanks.
For reference: http://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#the-doctype
The doctype declaration should be case-insensitive, but the parser currently accepts only the uppercase form.
Given:
Test
EOD;
$html5 = \HTML5::loadHTML($string);
print \HTML5::saveHTML($html5->getElementById('test'));
?>
Then:
$html5->getElementById('test') is empty.
(Do I use it wrong?)
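As general background on PHP's DOM extension (not specific to html5-php): getElementById() only finds attributes that have been declared as IDs, either through a DTD or through setIdAttribute(). The sketch below uses DOMDocument::loadXML() as a stand-in for the parser and shows an XPath query as an alternative:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><p id="test">Test</p></root>');

// No DTD and no setIdAttribute() call yet, so "id" is a plain attribute.
var_dump($doc->getElementById('test')); // NULL

// An XPath attribute match does not depend on ID declarations.
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//*[@id="test"]')->item(0);
echo $node->textContent, "\n"; // Test

// Declaring the attribute as an ID makes getElementById() work.
$node->setIdAttribute('id', true);
echo $doc->getElementById('test')->textContent, "\n"; // Test
```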
$html5 = \HTML5::loadHTML($string);
$newelem = new \DOMText('Test2');
$oldnode = $html5->getElementById('test');
$newnode = $oldnode->cloneNode()->appendChild($newelem); // <<<<
$parent = $oldnode->parentNode;
$parent->replaceChild($newnode, $oldnode);
print \HTML5::saveHTML($newnode);
Error:
Warning: Couldn't fetch DOMText. Node no longer exists in C:\...\HTML5\Serializer\Traverser.php on line 93
Notice: Undefined property: DOMText::$nodeType in C:\...\HTML5\Serializer\Traverser.php on line 93
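One thing worth checking in the snippet above is the marked line: DOMNode::appendChild() returns the node that was appended, not the node it was appended to, so $newnode ends up holding the text node rather than the clone. The following sketch keeps the clone in its own variable; it uses DOMDocument::loadXML() and createTextNode() as stand-ins so that it runs without the library:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><p id="test">Test</p></root>');
$old = $doc->getElementsByTagName('p')->item(0);

// Keep a reference to the clone itself (shallow copy: attributes only).
$clone = $old->cloneNode();

// appendChild() hands back the appended child, here the text node.
$appended = $clone->appendChild($doc->createTextNode('Test2'));
var_dump($appended instanceof DOMText); // bool(true)

// Replace the old element with the clone, not with the text node.
$old->parentNode->replaceChild($clone, $old);
echo $doc->saveXML($doc->documentElement), "\n";
// <root><p id="test">Test2</p></root>
```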
Hi,
I noticed some issues with pages that contain wrong tag names. I really don't know how to deal with the issue, so maybe you can find a solution. Below is the list of pages with the names of the invalid tags. The exception DOMException#5: Invalid Character Error is always thrown at DOMTreeBuilder.php:227 by the method DOMDocument::createElement. Any solution other than throwing the exception is fine for me :) I can make the PR if you tell me what the proper fix is for you.
a href="http:
(which is weird, because it's a valid form in HTML)
id="top_featured"
color="white"
class='neaktivni_stranka'
src=<a
bgcolor="white"
class="nom"
, here is also tag <p class="nom">
that is valid but also invalid one <p class="f-right"><class="nom">
br...<a
span<
noscript<img
br<br
p<
wordpress<
center<a
li"
p"
a�href="http:
b
static*all
h*0720
According to the spec, li can contain any "flow content", i.e. practically anything. Why is it not categorized as a BLOCK_TAG?
Hi
I have the following custom HTML5 file that I want to load and parse in PHP.
<html xmlns:tpl="http://cphwebsolutions.dk/2013/tpl">
<head>
<title tpl:replace="page.title">Welcome</title>
<link href="css/styles.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1 id="test" tpl:replace="page.headline">Welcome!</h1>
<p tpl:replace="page.content"></p>
<audio><source src="test.txt"/></audio>
</body>
</html>
I want to be able to use XPath to search for different attributes in the custom HTML5 file e.g. $xpath->query("//*[@tpl:replace]");
But it does not find anything. When I use loadHTML from PHP's DOMDocument, it finds all three places where tpl:replace is present in the custom HTML5 document.
I can search for id attribute for example $xpath->query("//*[@id]"); and it works correctly. So my only problem is when I use the custom XML namespace in my HTML5 templates.
Br.
Rune Christensen
<?php
// Assuming you installed from Composer:
require "../vendor/autoload.php";
// An example HTML document:
$html = <<< 'HERE'
<!DOCTYPE html>
<html xmlns:tpl="http://cphwebsolutions.dk/2013/tpl">
<head>
<title tpl:replace="page.title">Welcome</title>
<link href="css/styles.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1 id="test" tpl:replace="page.headline">Welcome!</h1>
<p tpl:replace="page.content"></p>
<audio><source src="test.txt"/></audio>
</body>
</html>
HERE;
// Parse the document. $dom is a DOMDocument.
$dom = HTML5::loadHTML($html);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace("tpl", "http://cphwebsolutions.dk/2013/tpl");
// example 1: for everything with an id
//$elements = $xpath->query("//*[@id]");
// example 2: for node data in a selected id
//$elements = $xpath->query("/html/body/div[@id='yourTagIdHere']");
// example 3: same as above with wildcard
$elements = $xpath->query("//*[@tpl:replace]");
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "<br/>[". $element->nodeName. "] ";
echo "<br/> ". $element->attributes->length. "\n";
for ($i=0; $i<$element->attributes->length;$i++) {
echo " ".$element->attributes->item($i)->nodeName."<br/>\n";
}
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeName. "\n";
}
}
}
echo "<br/><br/>\n";
// Render it as HTML5:
print HTML5::saveHTML($dom);
// Or save it to a file:
HTML5::save($dom, 'out.html');
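For reference, this is how the same query behaves on a namespace-aware tree built with DOMDocument::loadXML(), used here only as a stand-in so the sketch runs without the library. The second query is a possible workaround that matches on the attribute's literal name, which does not depend on the attribute carrying a namespace:

```php
<?php
$xml = '<html xmlns:tpl="http://cphwebsolutions.dk/2013/tpl">'
     . '<body>'
     . '<h1 id="test" tpl:replace="page.headline">Welcome!</h1>'
     . '<p tpl:replace="page.content"></p>'
     . '</body></html>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('tpl', 'http://cphwebsolutions.dk/2013/tpl');

// Namespace-aware match: requires the attribute to be in the tpl namespace.
echo $xpath->query('//*[@tpl:replace]')->length, "\n"; // 2

// Literal-name match: compares the attribute's qualified name as a string.
echo $xpath->query('//*[@*[name()="tpl:replace"]]')->length, "\n"; // 2
```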
_PHPPowertools/DOM-Query_ is the first component of the _PHPPowertools_ framework that has been released to the public. Its purpose is similar to that of _technosophos/querypath_, but its implementation is far more true to both jQuery's syntax and its semantics. For example, _PHPPowertools/DOM-Query_ lets you do stuff like this:
// Add a span tag with classes 'icon' and 'icon-printer' to all buttons
$H->select('body')->select('button')->add('span')->addClass('icon icon-printer');
// Use a lambda function to set the data-val attribute of all gallery images
$H->select('.gallery li img')->attr('data-val', function( $i, $val) {
return $i . " - " . $val->attr('class') . " - photo by Kelly Clark";
});
What's lacking so far is proper support for HTML5. I've been considering using _Masterminds/html5-php_ to do the DOM parsing.
The most elegant way to implement the feature would be to add a target option to the supported options for \Masterminds\HTML5\Parser\DOMTreeBuilder::__construct, with support for the following datatypes:
\DOMDocument or subclasses of \DOMDocument
\DOMImplementation or subclasses of \DOMImplementation
I would like to use this feature as follows:
namespace PowerTools;
use \Symfony\Component\CssSelector\CssSelector as CssSelector;
use \Masterminds\HTML5 as HTML5;
class DOM_Document extends \DOMDocument {
protected $_isHTML = false;
public function __construct($data = false, $version = null, $encoding = null) {
parent::__construct($version, $encoding);
$data = trim($data);
if ($data && $data != '') {
if ($this->_isHTML) {
$html5 = new HTML5();
@$html5->loadHTML($data, array('target' => $this));
} else {
@$this->loadXML($data);
}
}
}
[ ... ]
}
I've tried adding a simple if(){}else{} statement to \Masterminds\HTML5\Parser\DOMTreeBuilder::__construct to replace $this->doc with $options['target'] if a value for $options['target'] has been set, but that doesn't seem to do it.
As an alternative, I've also considered reïmplementing \PowerTools\DOM_Document as a subclass of \DOMImplementation, but this is a far less elegant approach that introduces too many new issues to go any further in that area.
Any feedback would be appreciated!
See also PHPPowertools/DOM-Query#1
Hi
I tried the example from the README file, and the result was that the line break after the <html> tag was removed:
Input:
<head>
<title>TEST</title>
</head>
<body id='foo'>
<h1>Hello World</h1>
<p>This is a test of the HTML5 parser.</p>
</body>
</html>
Output:
<html><head>
<title>TEST</title>
</head>
<body id="foo">
<h1>Hello World</h1>
<p>This is a test of the HTML5 parser.</p>
</body>
</html>
It looks like it was the only line where input and output were different.
Br.
Rune Christensen
We need documentation.
In TEXT_RCDATA fields like <title> it is not possible to use processing instructions.
Could this be the sole exception for RCDATA fields or is this against the spec?
In a few places there is a debug mode that prints to standard out. We can expose this in the Html5 and use the PSR logger interface (still printing to standard out by default).
Hi,
When I try to parse the URL http://e107.funsite.cz/, I get DOMException("Invalid Character Error", 5) because of one unclosed tag in the markup. The snippet below causes the exception. It is caused by trying to set an attribute with the name <div in DOMTreeBuilder.php. As I understand from the docs, all errors should be recorded in the property $dom->errors. Can you fix this please?
<div class="wrapper"
<div class="fleft">
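The exception itself comes from PHP's DOM extension, which refuses attribute names that are not valid XML names. A minimal sketch of catching it and recording the problem instead of aborting; the $errors array here is a hypothetical stand-in for whatever error list the parser keeps:

```php
<?php
$doc     = new DOMDocument();
$element = $doc->createElement('div');
$errors  = []; // hypothetical error list, for illustration only

try {
    // "<div" is not a valid XML name, so the DOM extension throws.
    $element->setAttribute('<div', '');
} catch (DOMException $e) {
    $errors[] = 'Skipped invalid attribute name: ' . $e->getMessage();
}

var_dump($element->hasAttributes()); // bool(false)
echo $errors[0], "\n";
```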
Hi
I was trying to use the function quotedString from Tokenizer.php but it failed and I changed line 717 from:
if ($tok == '"' || "'") {
to:
if ($tok == '"' || $tok == "'") {
And now it works correctly in my PHP script.
Br.
Rune
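To illustrate why the original condition never fails: PHP parses `$tok == '"' || "'"` as `($tok == '"') || ("'")`, and a non-empty string on the right-hand side is always truthy. The two function names below are hypothetical and exist only for the comparison:

```php
<?php
// Original form: the right-hand operand "'" is a truthy string,
// so this returns true for every token.
function buggyIsQuote(string $tok): bool
{
    return $tok == '"' || "'";
}

// Corrected form: both sides compare against $tok.
function fixedIsQuote(string $tok): bool
{
    return $tok == '"' || $tok == "'";
}

var_dump(buggyIsQuote('x')); // bool(true)
var_dump(fixedIsQuote('x')); // bool(false)
var_dump(fixedIsQuote("'")); // bool(true)
```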
Hi
I was trying to use the processing instruction functionality but got a fatal error:
Fatal error: Call to a member function process() on a non-object in /home/www/cphwebsolutions.dk/cms/vendor/HTML5/Parser/DOMTreeBuilder.php on line 364
It looks like the error is placed in line 364:
$res = $processor->process($this->current, $name, $data);
I think that it should be changed to
$res = $this->processor->process($this->current, $name, $data);
Br.
Rune
Hello,
Sometimes I get a strange error:
PHP Warning: DOMElement::setIdAttribute(): ID loading already defined in /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/DOMTreeBuilder.php on line 392
PHP Stack trace:
PHP 1. {main}() /home/xxx/domains/xxx/public_html/index.php:0
PHP 2. Core\xxx->loadPage() /home/xxx/domains/xxx/public_html/index.php:147
PHP 3. Modules\xxx\Main->load() /home/xxx/domains/xxx/public_html/xxx/core/xxx.php:619
PHP 4. Modules\xxx\Main->_startEngine() /home/xxx/domains/xxx/public_html/xxx/modules/xxx/main.php:301
PHP 5. Modules\xxx\Main->_runCrawler() /home/xxx/domains/xxx/public_html/xxx/modules/xxx/main.php:121
PHP 6. Core\Classes\Search\Crawler->visitPage() /home/xxx/domains/xxx/public_html/xxx/modules/xxx/main.php:154
PHP 7. Masterminds\HTML5->loadHTML() /home/xxx/domains/xxx/public_html/xxx/core/classes/search/crawler.php:28
PHP 8. Masterminds\HTML5->parse() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5.php:94
PHP 9. Masterminds\HTML5\Parser\Tokenizer->parse() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5.php:165
PHP 10. Masterminds\HTML5\Parser\Tokenizer->consumeData() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/Tokenizer.php:83
PHP 11. Masterminds\HTML5\Parser\Tokenizer->tagOpen() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/Tokenizer.php:126
PHP 12. Masterminds\HTML5\Parser\Tokenizer->tagName() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/Tokenizer.php:269
PHP 13. Masterminds\HTML5\Parser\DOMTreeBuilder->startTag() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/Tokenizer.php:371
PHP 14. DOMElement->setIdAttribute() /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/DOMTreeBuilder.php:392
PHP Warning: DOMElement::setIdAttribute(): ID placeholder already defined in /home/xxx/domains/xxx/public_html/xxx/core/framework/Masterminds/HTML5/Parser/DOMTreeBuilder.php on line 392
I don't know what the input is, but I hope you can help me.
Create a CREDITS file and add entry for #9 .