anglesharp.xpath's Introduction

AngleSharp

AngleSharp is a .NET library that gives you the ability to parse angle-bracket-based hypertexts like HTML, SVG, and MathML. XML without validation is also supported by the library. An important aspect of AngleSharp is that CSS can also be parsed. The included parser is built upon the official W3C specification. This produces a perfectly portable HTML5 DOM representation of the given source code and ensures compatibility with results in evergreen browsers. Standard DOM features such as querySelector or querySelectorAll also work for tree traversal.

⚡⚡ Migrating from AngleSharp 0.9 to AngleSharp 0.10 or later (incl. 1.0)? Look at our migration documentation. ⚡⚡

Key Features

  • Portable (using .NET Standard 2.0)
  • Standards conform (works exactly as evergreen browsers)
  • Great performance (outperforms similar parsers in most scenarios)
  • Extensible (extend with your own services)
  • Useful abstractions (type helpers, jQuery like construction)
  • Fully functional DOM (all the lists, iterators, and events you know)
  • Form submission (easily log in everywhere)
  • Navigation (a BrowsingContext is like a browser tab - control it from .NET!)
  • LINQ enhanced (use LINQ with DOM elements, naturally without wrappers)

The advantage over similar libraries like HtmlAgilityPack is that the exposed DOM uses the official W3C-specified API, so even things like querySelectorAll are available in AngleSharp. Also, the parser follows the HTML 5.1 specification, which defines error handling and element correction. The AngleSharp library focuses on standards compliance, interactivity, and extensibility. It therefore gives web developers working with C# all the possibilities they know from using the DOM in any modern browser.

The performance of AngleSharp is quite close to the performance of browsers. Even very large pages can be processed within milliseconds. AngleSharp tries to minimize memory allocations and reuses elements internally to avoid unnecessary object creation.

Simple Demo

This simple example uses Wikipedia for data retrieval.

using AngleSharp;
using System.Linq;

var config = Configuration.Default.WithDefaultLoader();
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(address);
var cellSelector = "tr.vevent td:nth-child(3)";
var cells = document.QuerySelectorAll(cellSelector);
var titles = cells.Select(m => m.TextContent);

Or the same with explicit types:

using AngleSharp;
using AngleSharp.Dom;
using System.Collections.Generic;
using System.Linq;

IConfiguration config = Configuration.Default.WithDefaultLoader();
string address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
IBrowsingContext context = BrowsingContext.New(config);
IDocument document = await context.OpenAsync(address);
string cellSelector = "tr.vevent td:nth-child(3)";
IHtmlCollection<IElement> cells = document.QuerySelectorAll(cellSelector);
IEnumerable<string> titles = cells.Select(m => m.TextContent);

In the example we see:

  • How to set up the configuration to support document loading
  • How to asynchronously get the document in a new context using the configuration
  • How to perform a query to get all cells with the content of interest
  • That the whole DOM supports LINQ queries

Every collection in AngleSharp supports LINQ statements. AngleSharp also provides many useful extension methods for element collections that cannot be found in the official DOM.

Supported Platforms

AngleSharp has been created as a .NET Standard 2.0 compatible library. This includes, but is not limited to:

  • .NET Core (2.0 and later)
  • .NET Framework (4.6.2 and later)
  • Xamarin.Android (7.0 and 8.0)
  • Xamarin.iOS (10.0 and 10.14)
  • Xamarin.Mac (3.0 and 3.8)
  • Mono (4.6 and 5.4)
  • UWP (10.0 and 10.0.16299)
  • Unity (2018.1)

Documentation

The documentation of AngleSharp is located in the docs folder. More examples, best-practices, and general information can be found there. The documentation also contains a list of frequently asked questions.

More information is also available by following some of the hyperlinks mentioned in the Wiki. In-depth articles will be published on CodeProject, with links placed in the Wiki at GitHub.

Use-Cases

  • Parsing HTML (incl. fragments)
  • Parsing CSS (incl. selectors, declarations, ...)
  • Constructing HTML (e.g., view-engine)
  • Minifying CSS, HTML, ...
  • Querying document elements
  • Crawling information
  • Gathering statistics
  • Web automation
  • Tools with HTML / CSS / ... support
  • Connection to page analytics
  • HTML / DOM unit tests
  • Automated JavaScript interaction
  • Testing other concepts, e.g., script engines
  • ...

Vision

The project aims to bring a solid implementation of the W3C DOM for HTML, SVG, MathML, and CSS to the CLR - all written in C#. The idea is that you can basically do everything with the DOM in C# that you can do in JavaScript (plus, of course, more).

Most parts of the DOM are included, even though some may still lack a fully specified or correct implementation. The goal for v1.0 is to have all practically relevant parts implemented according to the official W3C specification (with useful extensions by the WHATWG).

The API is close to the DOM4 specification; however, the naming has been adjusted to comply with .NET conventions. Nevertheless, to make AngleSharp really useful for, e.g., a JavaScript engine, attributes have been placed on the corresponding interfaces (and methods, properties, ...) to indicate the status of each member in the official specification. This allows automatic generation of DOM objects with the official API.

This is a long-term project which will eventually result in a state-of-the-art parser for the most important angle-bracket-based hypertexts.

Our hope is to build a community around web parsing and libraries from this project. So far we have had great contributions, but that goal has not been fully achieved. Want to help? Get in touch with us!

Participating in the Project

If you know of a feature that AngleSharp is currently missing, and you are willing to implement it, then your contribution is more than welcome! Also, if you have a really cool idea, do not be shy; we'd like to hear it.

If you have an idea how to improve the API (or what is missing), then posts / messages are also welcome. For instance, there have been ongoing discussions about some naming styles that AngleSharp has used in the past (e.g., HTMLDocument or HtmlDocument). In the end AngleSharp stopped using HTMLDocument (at least not visibly outside of the library). Now AngleSharp uses names like IDocument, IHtmlElement, and so on. This change would not have been possible without such fruitful discussions.

The project is always searching for additional contributors. Even if you do not have any code to contribute, an idea for improvement, a bug report, or a fix for a mistake in the documentation is just as welcome. These are the contributions that keep this project active.

Live discussions can take place in our Gitter chat, which supports using GitHub accounts.

More information is found in the contribution guidelines. All contributors can be found in the CONTRIBUTORS file.

This project has also adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community.

For more information see the .NET Foundation Code of Conduct.

Funding / Support

If you use AngleSharp frequently but do not have the time to support the project by active participation, you may still be interested in helping to keep the lights on for the AngleSharp projects.

Therefore we created a backing model via Bountysource. Any donation is welcome and much appreciated. We will mostly spend the money on dedicated development time to improve AngleSharp where it needs to be improved, plus invest in the web utility ecosystem in .NET (e.g., in JavaScript engines, other parsers, or a renderer for AngleSharp, to mention some outstanding projects).

Visit Bountysource for more details.

Development

AngleSharp is written in the most recent version of C# and thus requires Roslyn as a compiler. Using an IDE like Visual Studio 2019+ is recommended on Windows. Alternatively, VSCode (with OmniSharp or another suitable Language Server Protocol implementation) should be the tool of choice on other platforms.

The code tries to be as clean as possible. Notably the following rules are used:

  • Use braces for any conditional / loop body
  • Use the -Async suffixed methods when available
  • Use VIP ("Var If Possible") style (in C++ called AAA: Almost Always Auto) to place types on the right

More important, however, is the proper usage of tests. Any new feature should come with a set of tests to cover the functionality and prevent regression.

Changelog

A very detailed changelog exists. If you are just interested in major releases then have a look at the GitHub releases.

.NET Foundation

This project is supported by the .NET Foundation.

License

AngleSharp is released using the MIT license. For more information see the license file.

anglesharp.xpath's People

Contributors

denis-ivanov, florianrappl, rocik

anglesharp.xpath's Issues

MoveToParent doesn't take attribute navigation into account

It looks as if MoveToParent doesn't take attribute navigation in the navigator into account. When the navigator is positioned on an attribute, the "move" should go back to the element that owns the attribute. Since the attributes are kept in a list and not exposed as nodes, the method should check whether it is currently on an attribute (which I think can be checked with _attrIndex > -1); if so, it should only reset _attrIndex = -1 and return true, without setting _currentNode to its Parent.

Based on that I think the method should be

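public override bool MoveToParent()
{
    // On an attribute, the "parent" is the owning element:
    // leave attribute mode and stay on the current element.
    if (HasAttributes && _attrIndex > -1)
    {
        _attrIndex = -1;
        return true;
    }

    if (_currentNode.Parent == null)
    {
        return false;
    }

    _currentNode = _currentNode.Parent;
    return true;
}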

and a test case would be

[Test]
public void TestAttributeNavigation()
{
    var xml = @"<root att1='value 1' att2='value 2'><child>foo</child></root>";

    var parser = new XmlParser();
    var doc = parser.ParseDocument(xml);
    var nav = doc.CreateNavigator(false);

    // NUnit's Assert.AreEqual takes (expected, actual).
    Assert.AreEqual("root", nav.Name);
    if (nav.MoveToFirstAttribute())
    {
        do
        {
            Assert.AreEqual(XPathNodeType.Attribute, nav.NodeType);
        }
        while (nav.MoveToNextAttribute());
        nav.MoveToParent();
    }
    Assert.AreEqual("root", nav.Name);
}
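For comparison, the behaviour the issue asks for is what the standard System.Xml navigator does. This sketch uses `XPathDocument` from the .NET base class library (not AngleSharp.XPath) and mirrors the test above: after iterating the attributes, `MoveToParent` returns to the owning element rather than to that element's parent.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Reference behaviour from System.Xml, not AngleSharp.XPath.
var xml = "<root att1='value 1' att2='value 2'><child>foo</child></root>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
nav.MoveToFirstChild(); // from the document node to <root>

var attributeNames = new List<string>();
if (nav.MoveToFirstAttribute())
{
    do
    {
        attributeNames.Add(nav.Name);
    }
    while (nav.MoveToNextAttribute());

    // On an attribute, the parent is the element that owns it.
    nav.MoveToParent();
}

Console.WriteLine(string.Join(", ", attributeNames)); // att1, att2
Console.WriteLine(nav.Name);                          // root
```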

v2.0.0 regression with XPaths ending on `/@attr`

Hello.

Consider this example HTML that I've extracted from one of the websites I'm using:

<html>
<body>
	<table class= "accountTable" >
			<thead><tr>
				<th>  </th>
				<th> DEVICE NAME </th>
				<th> Sharing Status </th>
				<th> LIBRARY LAST ACCESSED </th>
				<th> BY STEAM USER </th>
			</tr>
			</thead>
			<tbody>

			
					<tr data-panel="{&quot;maintainY&quot;:true,&quot;flow-children&quot;:&quot;row&quot;}">
						<td> 1 </td>
						<td> redacted </td>
						<td>
															Authorized (
								<a href="javascript:DeauthorizeDevice( 'redacted', 'redacted' );">Revoke</a> )
													</td>
						<td> redacted </td>
						<td> <a href="https://steamcommunity.com/id/redacted/" data-miniprofile="redacted-test">redacted</a> </td>
					</tr>
							</tbody>
		</table>


		<br><br>
		You can allow up to 5 accounts to use your game library on any of your authorized computers.		<br><br>

		<table class= "accountTable" >
			<thead><tr>
				<th>  </th>
				<th> User Name </th>
				<th> Sharing Status </th>
				<th> Time added </th>
			</tr>
			</thead>

			<tbody>
			
					<tr data-panel="{&quot;maintainY&quot;:true,&quot;flow-children&quot;:&quot;row&quot;}">
						<td> 1 </td>
						<td> <a href="https://steamcommunity.com/id/redacted/" data-miniprofile="redacted-test">redacted</a> </td>
						<td>
													Authorized (
							<a href="javascript:DeauthorizeBorrower( 'redacted' );">Revoke</a> )
												</td>
						<td> redacted </td>
					</tr>
							</tbody>
		</table>
</body>
</html>

Until now with AngleSharp.XPath version 1.1.7, I've been using the following xpath expression which worked fine:

(//table[@class='accountTable'])[2]//a/@data-miniprofile

The objective of the above is to extract the data-miniprofile attribute of all a links that have one, but only from the second accountTable.

Unfortunately, the same xpath in version 2.0.0 doesn't return any rows. It worked fine as of version 1.1.7.

I consider this a regression. If this change is intentional, what is the recommended way to update the expression to match the previous use case?

Thank you in advance for your time.
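For reference, under standard XPath 1.0 semantics an expression ending in `/@data-miniprofile` selects the attribute nodes themselves, not the `a` elements. The sketch below shows this with System.Xml's `XmlDocument` as a stand-in (not AngleSharp.XPath), on a trimmed, well-formed version of the two tables above (an assumption, since the real page is HTML): each result is an attribute whose `Value` is the wanted string and whose `OwnerElement` is the link.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Trimmed, well-formed stand-in for the two account tables
// (assumption: only the parts the query touches are kept).
var doc = new XmlDocument();
doc.LoadXml(
    "<html><body>" +
    "<table class='accountTable'><tr><td><a data-miniprofile='first-table'>x</a></td></tr></table>" +
    "<table class='accountTable'><tr><td><a data-miniprofile='second-table'>y</a></td></tr></table>" +
    "</body></html>");

var values = new List<string>();
foreach (XmlNode node in doc.SelectNodes("(//table[@class='accountTable'])[2]//a/@data-miniprofile")!)
{
    // Each result is an attribute node: Value is its text, OwnerElement the <a>.
    var attr = (XmlAttribute)node;
    values.Add(attr.Value);
    Console.WriteLine($"{attr.OwnerElement!.Name}: {attr.Value}"); // a: second-table
}
```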

BUG: SelectSingleNode with search by attribute predicate only works if searched attribute is the first

Searching for an element by an attribute predicate apparently only works if that attribute is the first one on the element.
A unit test is worth a thousand words:

    [Test]
    public void SelectSingleNodeTest_AttributesOrder()
    {
        // Arrange
        const string html =
        @"<body>
			<div id='div1'>First</div>
			<div id='div2' class='mydiv'>Second</div>
			<div class='mydiv' id='div3'>Third</div>
		</body>";
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        // Act
        var div1 = document.DocumentElement.SelectSingleNode("//div[@id='div1']");
        var div2 = document.DocumentElement.SelectSingleNode("//div[@id='div2']");
        var div3 = document.DocumentElement.SelectSingleNode("//div[@id='div3']");

        // Assert
        Assert.That(div1, Is.Not.Null);
        Assert.That(div2, Is.Not.Null);
        Assert.That(div3, Is.Not.Null); // currently fails
    }

In the unit test div2 and div3 sport an additional class attribute. When the class attribute is specified before the id attribute, the search fails.
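As a baseline, a predicate such as `[@id='div3']` tests every attribute of the element, so attribute order should not matter. This sketch runs the same three queries with System.Xml's `XmlDocument` (a stand-in, not AngleSharp.XPath) and finds all three divs.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<body>" +
    "<div id='div1'>First</div>" +
    "<div id='div2' class='mydiv'>Second</div>" +
    "<div class='mydiv' id='div3'>Third</div>" +
    "</body>");

// The predicate is checked against all attributes of each <div>,
// regardless of the order in which they were written.
foreach (var id in new[] { "div1", "div2", "div3" })
{
    var div = doc.DocumentElement!.SelectSingleNode($"//div[@id='{id}']");
    Console.WriteLine($"{id}: {div?.InnerText ?? "(null)"}");
}
```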

How to locate child node based on node?

If I use a selector query, it returns IElement, and I can continue to query for child elements relative to the result. However, Body.SelectNodes("xpath") returns INode, and INode does not support that.

var recordNodes = document.Body.SelectNodes(TableRecordNodesXpath);

foreach (INode record in recordNodes)
{
    // record.SelectSingleNode("//relative to record"); // expected.
}
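Two separate points seem to be involved. First, the AngleSharp.XPath extensions such as `SelectNodes` are declared on `IElement` (the stack trace in a later issue shows `Extensions.SelectNodes(IElement element, String xpath)`), so each `INode` result likely has to be narrowed first, e.g. `if (record is IElement element) { ... }`. Second, the inner query must be relative: an XPath starting with `//` always searches from the document root, whatever node it is called on. The second point is sketched here with System.Xml's `XmlDocument`, which has the same XPath semantics; the table markup is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<table>" +
    "<tr><td>row 1</td></tr>" +
    "<tr><td>row 2</td></tr>" +
    "</table>");

var relative = new List<string>();
var absolute = new List<string>();
foreach (XmlNode row in doc.SelectNodes("//tr")!)
{
    // ".//td" is evaluated relative to the current row ...
    relative.Add(row.SelectSingleNode(".//td")!.InnerText);
    // ... whereas "//td" starts at the document root, so every row
    // gets the first <td> of the whole document.
    absolute.Add(row.SelectSingleNode("//td")!.InnerText);
}

Console.WriteLine(string.Join(", ", relative)); // row 1, row 2
Console.WriteLine(string.Join(", ", absolute)); // row 1, row 1
```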

Problem with namespaces

Hi,

I have this XML (XHTML):

<div class="hidden"><ac:structured-macro ac:name="section"> <ac:parameter ac:name="border">true</ac:parameter> <ac:rich-text-body> <ac:structured-macro ac:name="column"> <ac:parameter ac:name="width">100px</ac:parameter> <ac:rich-text-body>

This is the content of column 1.

</ac:rich-text-body> </ac:structured-macro> <ac:structured-macro ac:name="column"> <ac:rich-text-body>

This is the content of column 2.

</ac:rich-text-body> </ac:structured-macro> </ac:rich-text-body></ac:structured-macro>

I have a problem selecting all nodes "ac:structured-macro" in the document.

My code:

var contextA = BrowsingContext.New(Configuration.Default.WithXml().WithXPath());

var parser = new XmlParser(new XmlParserOptions
{
    IsSuppressingErrors = true
}, contextA);
var xmlDoc = parser.ParseDocument(xml);
XmlReaderSettings settings = new XmlReaderSettings
{
    NameTable = new NameTable()
};

XmlNamespaceManager xmlns = new XmlNamespaceManager(settings.NameTable);
xmlns.AddNamespace("ac", "http://atlassian.com/content");

I have tried the following:

xmlDoc.DocumentElement.SelectNodes("//ac|structured-macro") - returns only first element
xmlDoc.DocumentElement.SelectNodes(".//ac|structured-macro") - returns only first element
xmlDoc.DocumentElement.SelectNodes(".//ac|structured-macro", xmlns, false) - returns only first element

xmlDoc.DocumentElement.SelectNodes(".//ac:structured-macro", xmlns, false) - returns zero elements

xmlDoc.DocumentElement.SelectNodes("//structured-macro") - this only returns 3 elements

Any suggestion on how to work with namespaces and get all nodes?

Thanks.

Josef
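Note that in XPath `|` is the union operator, not a namespace separator (`ac|structured-macro` is CSS selector syntax), so `//ac|structured-macro` means "all `ac` elements plus the `structured-macro` children of the context node". A prefixed name test needs a namespace manager that maps the prefix to a URI, and the document itself must declare that namespace. The sketch below uses System.Xml (not AngleSharp.XPath) and assumes the `ac` prefix is bound on the root element to `http://atlassian.com/content`, the URI from the code above; the markup is a shortened version of the one in the question.

```csharp
using System;
using System.Xml;

// Assumption: the document declares the ac prefix with the URI from the question.
var doc = new XmlDocument();
doc.LoadXml(
    "<div class='hidden' xmlns:ac='http://atlassian.com/content'>" +
    "<ac:structured-macro ac:name='section'><ac:rich-text-body>" +
    "<ac:structured-macro ac:name='column'><ac:rich-text-body>column 1</ac:rich-text-body></ac:structured-macro>" +
    "<ac:structured-macro ac:name='column'><ac:rich-text-body>column 2</ac:rich-text-body></ac:structured-macro>" +
    "</ac:rich-text-body></ac:structured-macro>" +
    "</div>");

// The prefix in the XPath only has to map to the same namespace URI;
// it does not have to match the prefix written in the document.
var ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("ac", "http://atlassian.com/content");

var macros = doc.SelectNodes("//ac:structured-macro", ns)!;
Console.WriteLine(macros.Count); // 3 (section, column, column)
```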

SelectSingleNode() doesn't select <tr>

In the provided code snippet, the SelectSingleNode() method is expected to return an instance of INode for the selected <tr> element within the HTML content. However, the method is returning null, which indicates a potential bug in the library.

Repro

var context = BrowsingContext.New(Configuration.Default);
var parser = context.GetService<IHtmlParser>();
var document = await parser.ParseDocumentAsync("<table><tr></tr></table>");

var trNode = document.Body.SelectSingleNode("//table/tr");

Actual Result

trNode is null

Expected Result

trNode is not null

Environment

.csproj

<PropertyGroup>
  <TargetFramework>net8.0</TargetFramework>
</PropertyGroup>

<ItemGroup>
  <PackageReference Include="AngleSharp" Version="1.1.2" />
  <PackageReference Include="AngleSharp.XPath" Version="2.0.4" />
</ItemGroup>
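A likely explanation is the HTML5 tree-construction algorithm: when a `<tr>` appears directly inside `<table>`, the parser inserts an implied `<tbody>`, so the parsed DOM is `table > tbody > tr` and `//table/tr` has nothing to match. The sketch below builds that tree by hand with System.Xml (an assumption standing in for the parser output, not AngleSharp code) and shows that `//table/tbody/tr` or `//table//tr` find the row.

```csharp
using System;
using System.Xml;

// Assumption: this is the shape an HTML5 parser produces for
// "<table><tr></tr></table>", i.e. with the implied <tbody> added.
var doc = new XmlDocument();
doc.LoadXml("<html><head/><body><table><tbody><tr/></tbody></table></body></html>");

var direct = doc.SelectSingleNode("//table/tr");          // null: tr is not a child of table
var viaTbody = doc.SelectSingleNode("//table/tbody/tr");  // the row
var anyDepth = doc.SelectSingleNode("//table//tr");       // the row

Console.WriteLine(direct == null);   // True
Console.WriteLine(viaTbody != null); // True
Console.WriteLine(anyDepth != null); // True
```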

Interested in placing this repo in the AngleSharp organization?

Hi Denis - great project. Since AngleSharp v0.10 is around the corner I would love to keep it compatible. Also, this project fills a gap in AngleSharp so I was thinking about "making it official", i.e., mentioning it in the docs, on the webpage etc.

If you are interested we can discuss details. Long story short would be that you can keep publishing it under your NuGet / name, but the repo would be at the AngleSharp organization (with you having all admin rights for the repo). We could give it some more infrastructure (e.g., fill the readme, provide some CLA, refer to the core, ...).

What are your thoughts on the topic?

Is the latest release 1.1.7 https://www.nuget.org/packages/AngleSharp.XPath/1.1.7 from NuGet not tagged in this GitHub repository?

On NuGet, it seems the latest release of AngleSharp.XPath is 1.1.7, see https://www.nuget.org/packages/AngleSharp.XPath/1.1.7. However, when I look through this repository for tags https://github.com/AngleSharp/AngleSharp.XPath/tags, the latest I find is 1.1.5.

I use 1.1.7 in a .NET application and get a System.NullReferenceException on doing something like angleSharpDoc.CreateNavigator(false).SelectSingleNode("//p").Prefix or angleSharpDoc.CreateNavigator(true).SelectSingleNode("//p").Prefix.

Is there any place to see the sources for 1.1.7, to find out why that exception occurs?

XPath throws exceptions with namespaced xml tags

There seems to be a bug (or a lack of compatibility) with namespaced tags like <xhtml:link />. If you search for that tag with XPath (e.g. //xhtml:link), an exception is thrown.
The xhtml namespace is declared in the XML, so it should be detected. I see no API to add the namespace manually.

Exception:
Unhandled exception. System.Xml.XPath.XPathException: Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function.
at MS.Internal.Xml.XPath.CompiledXpathExpr.get_QueryTree()
at System.Xml.XPath.XPathNavigator.Evaluate(XPathExpression expr, XPathNodeIterator context)
at System.Xml.XPath.XPathNavigator.Evaluate(XPathExpression expr)
at System.Xml.XPath.XPathNavigator.Select(XPathExpression expr)
at AngleSharp.XPath.Extensions.SelectNodes(IElement element, String xpath)
at Program.Main()

I've made a small example app on Fiddle which you can find here:
https://dotnetfiddle.net/9hDhwH

Best regards,
Stefan
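The exception is standard .NET XPath behaviour: prefixes in an XPath expression are resolved through a namespace manager supplied by the caller, not through the `xmlns:` declarations in the document. The sketch below reproduces this with System.Xml (not AngleSharp.XPath); the sitemap-like markup and the `https://example.com/de` URL are made up for illustration. Whether AngleSharp.XPath accepts a namespace manager in the version used here is not stated in the issue, although the namespace issue above calls a `SelectNodes(xpath, xmlns, false)` overload.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Made-up markup; the xhtml prefix is declared in the document.
var doc = new XmlDocument();
doc.LoadXml(
    "<urlset xmlns:xhtml='http://www.w3.org/1999/xhtml'>" +
    "<url><xhtml:link rel='alternate' href='https://example.com/de'/></url>" +
    "</urlset>");

// Without a namespace manager the prefix cannot be resolved, even though
// the document declares it: "Namespace Manager or XsltContext needed".
bool threw = false;
try
{
    doc.SelectNodes("//xhtml:link");
}
catch (XPathException)
{
    threw = true;
}

// With a manager that binds the prefix, the query works.
var ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
var links = doc.SelectNodes("//xhtml:link", ns)!;

Console.WriteLine(threw);       // True
Console.WriteLine(links.Count); // 1
```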

Xpath query support for Nodes?

Hi, I migrated from HtmlAgilityPack to your library and want to run different XPath queries based on some data, as I did with the old library. The issue is that we need to run different queries on child nodes and therefore cannot combine them all into one big XPath.

var parser = new HtmlParser();
var doc = parser.ParseDocument(parserPreviewRequest.Body);

var htmlBody = doc?.Body;

INode headerContentBody = htmlBody?.SelectSingleNode("some XPath");
// some logic
var test1 = headerContentBody?.SelectSingleNode("another xpath");

The last line of code does not compile, since INode does not contain a definition for such a method.
How can I achieve this? I cannot find any way to run XPath queries on nodes, which is very strange.

Thanks!

get attribute value

How can I get the value of the href attribute (or of any other attribute) of an element through AngleSharp.XPath?
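In plain XPath there are two common ways, sketched here with System.Xml's `XPathNavigator` as a stand-in (not AngleSharp.XPath): evaluate `string(...)` around the attribute path to get the first value as a string, or select the attribute nodes and read each `Value`. With AngleSharp, once an element is available as `IElement`, the DOM method `GetAttribute("href")` is another option. The markup and URLs below are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Made-up markup for illustration.
var doc = new XmlDocument();
doc.LoadXml("<body><a href='https://example.com/a'>A</a><a href='https://example.com/b'>B</a></body>");
XPathNavigator nav = doc.CreateNavigator()!;

// Option 1: string() converts the first matching attribute (in document order).
var first = (string)nav.Evaluate("string(//a/@href)");

// Option 2: select the attribute nodes and read each one's Value.
var hrefs = new List<string>();
foreach (XPathNavigator attr in nav.Select("//a/@href"))
{
    hrefs.Add(attr.Value);
}

Console.WriteLine(first);                   // https://example.com/a
Console.WriteLine(string.Join(" ", hrefs)); // https://example.com/a https://example.com/b
```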

Regression: `2.0` now returns `Attr` nodes instead of their parents on xpath ending with `/@attr`

Hey.

I already reported this issue as #36 and thought it was fixed in 2.0.1, but unfortunately I had been testing against a different problem, #37. The original #36 issue is not fixed and is still reproducible with version 2.0.1.

The original bug report is the one quoted in "v2.0.0 regression with XPaths ending on `/@attr`": the same HTML and the same XPath, (//table[@class='accountTable'])[2]//a/@data-miniprofile, which worked fine as of version 1.1.7 but doesn't return any rows in version 2.0.0 or 2.0.1.

/cc @denis-ivanov

Tag <br/> is ignored

<p
                    style="
                  margin-right: 0in;
                  margin-left: 0in;
                  font-size: 15px;
                  font-family: 'Arial', sans-serif;
                  line-height: 115%;
                ">
                    <h1>Heading 1</h1><br/><p>This is test line 1</p><p>This is test line 2</p><p>Body accepts HTML</p>
                </p>

In a browser this looks like:

Heading 1


This is test line 1

This is test line 2

Body accepts HTML

When I try to get the text content from it like this:

var parser = new HtmlParser();
var doc = parser.ParseDocument(data.Body);

result = doc.Body?.TextContent;

I see this in the output: "Heading 1This is test line 1This is test line 2Body accepts HTML"

It ignores all line breaks. Is there another approach to get the text as it is rendered in a browser?
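`TextContent` is defined by the DOM as the concatenation of all descendant text nodes, so markup such as `<br>` or block boundaries leaves no trace in it; the output above is expected DOM behaviour. Browsers' rendered text depends on CSS layout, which a parser does not compute. A rough approximation is to walk the tree yourself and emit line breaks for `<br>` and a hand-picked list of block elements. The sketch below does this over System.Xml nodes on a well-formed version of the markup (an assumption); the block list is hypothetical, and the same walk could be written against AngleSharp's `ChildNodes` and `NodeType`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical, simplified list of block-level elements that start a new line.
var blocks = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    { "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr" };

var doc = new XmlDocument();
doc.LoadXml("<body><h1>Heading 1</h1><br/><p>This is test line 1</p>" +
            "<p>This is test line 2</p><p>Body accepts HTML</p></body>");

var text = ExtractText(doc.DocumentElement!);
Console.WriteLine(text);

string ExtractText(XmlNode root)
{
    var sb = new StringBuilder();
    Walk(root, sb);
    return sb.ToString().Trim();
}

void Walk(XmlNode node, StringBuilder sb)
{
    foreach (XmlNode child in node.ChildNodes)
    {
        if (child.NodeType == XmlNodeType.Text)
        {
            sb.Append(child.Value);
        }
        else if (child.NodeType == XmlNodeType.Element)
        {
            if (string.Equals(child.LocalName, "br", StringComparison.OrdinalIgnoreCase))
            {
                sb.Append('\n'); // <br> always forces a line break
                continue;
            }

            var isBlock = blocks.Contains(child.LocalName);
            if (isBlock) EnsureNewLine(sb);
            Walk(child, sb);
            if (isBlock) EnsureNewLine(sb);
        }
    }
}

void EnsureNewLine(StringBuilder sb)
{
    // Avoid stacking empty lines when blocks follow each other.
    if (sb.Length > 0 && sb[sb.Length - 1] != '\n') sb.Append('\n');
}
```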


Wrong code in Prefix getter

The line https://github.com/AngleSharp/AngleSharp.XPath/blob/master/src/AngleSharp.XPath/HtmlDocumentNavigator.cs#L80
has the code

        public override string Prefix =>
            _attrIndex != 1
                ? NameTable.GetOrAdd(CurrentElement.Attributes[_attrIndex].Prefix ?? string.Empty)
                : NameTable.GetOrAdd(CurrentElement?.Prefix ?? string.Empty);

The other getters compare with _attrIndex != -1. I think the Prefix getter above also needs to compare with -1; otherwise the attempt to look up a prefix throws a NullReferenceException:

System.NullReferenceException
  HResult=0x80004003
  Message = Object reference not set to an instance of an object.
  Source = AngleSharp.XPath
  StackTrace:
   at AngleSharp.XPath.HtmlDocumentNavigator.get_Prefix()

Change Name getter in HtmlDocumentNavigator to return lower-case variant of name for (X)HTML element nodes

Given that https://github.com/AngleSharp/AngleSharp.XPath/blob/master/src/AngleSharp.XPath/HtmlDocumentNavigator.cs#L57 simply does

        public override string Name =>
            _attrIndex != -1
                ? NameTable.GetOrAdd(CurrentElement.Attributes[_attrIndex].Name)
                : NameTable.GetOrAdd(_currentNode.NodeName);

and NodeName in the DOM uses the HTML-uppercased qualified name for elements in the XHTML namespace (https://dom.spec.whatwg.org/#element-html-uppercased-qualified-name), any XPath use on an HTML element (e.g. a p element) will have name() return an upper-case name, e.g. P, while local-name() contradicts that by returning a lower-case p.

This case difference from the DOM world is contrary to normal XPath, where local-name() and name() always have the same case for the letters of the name.

Therefore I think that, to have a consistent XPath(Navigator) API, the Name getter of HtmlDocumentNavigator needs to lower-case the names of HTML elements.
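A minimal sketch of the proposed normalization, as a hypothetical helper rather than AngleSharp.XPath code: for elements in the XHTML namespace the DOM's `NodeName` is the HTML-uppercased qualified name, so the helper lowers ASCII letters again (matching the ASCII-only uppercasing in the DOM spec) and leaves names in other namespaces, such as SVG's `foreignObject`, untouched. The namespace URIs are the standard XHTML and SVG ones.

```csharp
using System;
using System.Linq;

const string XhtmlNamespace = "http://www.w3.org/1999/xhtml";
const string SvgNamespace = "http://www.w3.org/2000/svg";

// Hypothetical helper: undo the HTML-uppercasing of NodeName so that
// name() agrees with local-name() for HTML elements.
string XPathName(string nodeName, string namespaceUri) =>
    namespaceUri == XhtmlNamespace
        ? new string(nodeName.Select(c => c >= 'A' && c <= 'Z' ? (char)(c + ('a' - 'A')) : c).ToArray())
        : nodeName;

Console.WriteLine(XPathName("P", XhtmlNamespace));           // p
Console.WriteLine(XPathName("foreignObject", SvgNamespace)); // foreignObject
```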

Will 2.0 only support .NET (Core) and not .NET framework?

@denis-ivanov , in dfd9d33#diff-615740254f47d7a152d8b39cff4d7b49d0aa3db2821f2f67ebe6f13ad0ab45beR10 it seems the target framework was changed from .NET Standard 2.0 to .NET Core 5. That way AngleSharp.XPath 2.0 would no longer be compatible and usable from any .NET framework app. Is that intended? Other AngleSharp packages seem to try to maintain .NET Standard 2.0 as the base to allow use from both .NET framework as well as .NET Core code.

HtmlDocumentNavigator is not correctly implemented.

While navigating a document, HtmlDocumentNavigator may clone itself hundreds of times (depending on the complexity of the search).
Every clone instantiates a new NameTable, which contains nothing. There should be one NameTable per document.
It should be something like this:

public override XmlNameTable NameTable => _document.NameTable;

public override XPathNavigator Clone()
{
    return new HtmlDocumentNavigator(this);
}

private HtmlDocumentNavigator(HtmlDocumentNavigator nav)
{
    // Share the document (and thus its NameTable) and copy the full position,
    // including the attribute index, so a clone made on an attribute stays there.
    _document = nav._document;
    _currentNode = nav._currentNode;
    _attrIndex = nav._attrIndex;
    _ignoreNamespaces = nav._ignoreNamespaces;
}

public static INode SelectSingleNode(this IElement element, string xpath, bool ignoreNamespaces = true)
{
    var nav = new HtmlDocumentNavigator(element.Owner, element, ignoreNamespaces);
    var manager = ignoreNamespaces ? null : new XmlNamespaceManager(nav.NameTable);
    var result = nav.SelectSingleNode(xpath, manager);
    return ((HtmlDocumentNavigator)result)?.CurrentNode;
}

Also, what's the point of using the XmlNameTable extension method GetOrAdd()? The Add() method does the same without double-checking.
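For reference, the relevant System.Xml behaviour can be checked directly: `XmlNameTable.Add` both looks up and inserts, returning the single atomized string instance; `Get` only looks up; and the navigators that System.Xml itself creates for an `XmlDocument` share the document's name table across `Clone()`, which is the design the issue proposes.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

var table = new NameTable();

// Add returns the atomized instance: the same string object for equal text,
// inserting the entry only if it is not there yet.
string a = table.Add("div");
string b = table.Add(new string(new[] { 'd', 'i', 'v' }));
Console.WriteLine(ReferenceEquals(a, b)); // True

// Get only looks up; it returns null for names never added.
Console.WriteLine(table.Get("span") == null); // True

// System.Xml's own navigators share one name table per document, including clones.
var doc = new XmlDocument();
doc.LoadXml("<root/>");
XPathNavigator nav = doc.CreateNavigator()!;
XPathNavigator clone = nav.Clone();
Console.WriteLine(ReferenceEquals(nav.NameTable, clone.NameTable)); // True
```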
