
hazz's People

Contributors

atifaziz, johnrutherford

hazz's Issues

Target .NET Standard

At the same time, the .NET Framework 3.5 target will be dropped; otherwise, Fizzler 1.2.0, which targets .NET Standard, can't be used as a dependency.

Class & "[att~=val]" selectors don't work when whitespace is not just spaces

What steps will reproduce the problem?

  1. Get HtmlDocument from http://shoryuken.com/forum/index.php?events/monthly
  2. Use document.CssSelect("td.primaryContent.weekends.nowWeek.nowToday")

What is the expected output? What do you see instead?

I expect one TD element to be returned. However, there are tabs, carriage returns, and linefeeds in the class attribute on the tag, and only the first class selector (td.primaryContent) works.

What version of the product are you using? On what operating system?

1.0.0.0 - Windows 7

Originally reported on Google Code with ID 51

Reported by [email protected] on 2012-05-20 07:48:08

QuerySelector does not find class names with line breaks

I have to parse and query badly formatted HTML that has line breaks inside class attributes. However, this library does not appear to support them:

string html = @"<html><body><div class=""class_1""><span class=""class_2
 class_3"">Text</span></body></html>";
HtmlDocument htmldom = new HtmlDocument();
htmldom.LoadHtml(html);

Console.WriteLine(JsonConvert.SerializeObject(htmldom.DocumentNode.QuerySelector(".class_1").FirstChild.GetClasses())); // Prints classes correctly
Console.WriteLine(htmldom.DocumentNode.QuerySelector(".class_1 > .class_2")); // Prints null

It works fine when the line break is removed.
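
Both whitespace reports above come down to how the class attribute value is split into tokens. The following is a minimal sketch, assuming the fix is to split on every ASCII whitespace character the HTML standard defines (tab, LF, FF, CR, space) rather than on spaces only. `ClassTokens.HasClass` is a hypothetical helper written for illustration; it is not part of Fizzler or HtmlAgilityPack.

```csharp
using System;
using System.Linq;

// Usage: the tokens are separated by a line break plus a space, as in the report.
Console.WriteLine(ClassTokens.HasClass("class_2\n class_3", "class_3"));

// Hypothetical helper for illustration; not Fizzler's actual implementation.
static class ClassTokens
{
    // ASCII whitespace as defined by the HTML standard: tab, LF, FF, CR, space.
    static readonly char[] HtmlWhitespace = { '\t', '\n', '\f', '\r', ' ' };

    // Returns true when className is one of the whitespace-separated tokens
    // of the attribute value. Class names are compared case-sensitively.
    public static bool HasClass(string attributeValue, string className)
    {
        if (attributeValue == null) throw new ArgumentNullException(nameof(attributeValue));
        if (string.IsNullOrEmpty(className)) return false;
        return attributeValue
            .Split(HtmlWhitespace, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className, StringComparer.Ordinal);
    }
}
```

The same tokenization would apply to the `[att~=val]` selector, which is defined in terms of a whitespace-separated list of words.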

The ":only-child" selector does not capture the HTML element

Code to reproduce (using version 1.2):

using var client = new HttpClient();
var doc = new HtmlDocument();
doc.LoadHtml(await client.GetStringAsync("https://www.example.com/"));
foreach (var e in doc.DocumentNode.QuerySelectorAll(":only-child"))
    Console.WriteLine(e.Name);

It finds only 2 elements:

div
a

whereas using document.querySelectorAll(':only-child') in Chrome finds html in addition to the above two.

"sourcelink test" fails for 2 documents

Running sourcelink test on Fizzler.Systems.HtmlAgilityPack.1.2.1-ci-20200407t1951.symbols.nupkg (also attached) for 2ca0a2c produces errors:

1 Documents without URLs:
8edef9391a47680caec71f4ded6c435fd60d04067808138edf6d8fb16a81b307 sha256 csharp C:\Users\appveyor\AppData\Local\Temp\1\.NETStandard,Version=v1.3.AssemblyAttributes.cs
1 Documents with errors:
b76e2bda9a5853a1c1e97a6badca21a35a6039bb21e06cf89156470b5a02ee1d sha256 csharp C:\projects\hazz\src\obj\Release\netstandard1.3\Fizzler.Systems.HtmlAgilityPack.AssemblyInfo.cs
https://raw.githubusercontent.com/atifaziz/Hazz/2ca0a2ceed2146d0049d81654cd66ea75da2e607/src/obj/Release/netstandard1.3/Fizzler.Systems.HtmlAgilityPack.AssemblyInfo.cs
error: url failed NotFound: Not Found
sourcelink test failed
failed for lib/netstandard1.3/Fizzler.Systems.HtmlAgilityPack.pdb
1 Documents without URLs:
a6e03ae4df13fe05345e9022d1f1cd24ecae4bfd66db4843697c855d9f9335f4 sha256 csharp C:\Users\appveyor\AppData\Local\Temp\1\.NETStandard,Version=v2.0.AssemblyAttributes.cs
1 Documents with errors:
b76e2bda9a5853a1c1e97a6badca21a35a6039bb21e06cf89156470b5a02ee1d sha256 csharp C:\projects\hazz\src\obj\Release\netstandard2.0\Fizzler.Systems.HtmlAgilityPack.AssemblyInfo.cs
https://raw.githubusercontent.com/atifaziz/Hazz/2ca0a2ceed2146d0049d81654cd66ea75da2e607/src/obj/Release/netstandard2.0/Fizzler.Systems.HtmlAgilityPack.AssemblyInfo.cs
error: url failed NotFound: Not Found
sourcelink test failed
failed for lib/netstandard2.0/Fizzler.Systems.HtmlAgilityPack.pdb
2 files did not pass in dist\Fizzler.Systems.HtmlAgilityPack.1.2.1-ci-20200407t1951.symbols.nupkg

nth-of-type

Hello there.

I am creating a WYSIWYG editor in Blazor and am using CSS nth-of-type to create a unique selector.
When I run my code I get the following error:

System.FormatException: Unknown functional pseudo 'nth-of-type'. Only nth-child and nth-last-child are supported.

For simple, non-nested elements, like SECTION H1:nth-of-type(1), I can use a workaround:

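HtmlNode element = null;
if (sel.Contains("nth-of-type"))
{
	// Split "SECTION H1:nth-of-type(1)" into the base selector and the pseudo-class.
	var splitty = sel.Split(':');
	// Materialize the results so they can be indexed (requires System.Linq).
	var elements = rootNode.QuerySelectorAll(splitty[0]).ToList();
	// Match all digits so that indexes above 9 also work.
	var n = int.Parse(Regex.Match(splitty[1], @"\d+").Value);
	element = elements[n - 1];
}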

But I hit a snag for more complex structures.
Any chance of implementing nth-of-type?
You have nth-child, how much work would it be to have nth-of-type?

Thanks
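
For reference, `E:nth-of-type(n)` matches an element that is the n-th sibling with the same element name under its own parent, so the count has to be taken per parent. That is why indexing into a flat result list stops working for nested structures. The sketch below illustrates the per-parent count using System.Xml.Linq as a stand-in for HtmlAgilityPack nodes (an assumption made to keep the example dependency-free). `TypeSelectors.NthOfType` is a hypothetical name, and element names are compared case-sensitively here, as XML does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var root = XElement.Parse(
    "<body><section><p/><h1>a</h1><h1>b</h1></section><section><h1>c</h1></section></body>");

// Roughly "h1:nth-of-type(1)": the first h1 under each parent.
foreach (var h1 in TypeSelectors.NthOfType(root, "h1", 1))
    Console.WriteLine(h1.Value);

// Hypothetical helper for illustration; not part of Fizzler's API.
static class TypeSelectors
{
    // Yields every descendant named `name` that has exactly n - 1 preceding
    // siblings with the same name, i.e. the n-th of its type (1-based).
    public static IEnumerable<XElement> NthOfType(XElement root, string name, int n)
    {
        if (root == null) throw new ArgumentNullException(nameof(root));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        return root.Descendants(name)
                   .Where(e => e.ElementsBeforeSelf(name).Count() == n - 1);
    }
}
```

Note that the `<p/>` sibling does not affect the count for `h1`, which is the difference from `nth-child`.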

Target .NET Standard 2.0

It would be nice if the NuGet package also targeted .NET Standard 2.0. Otherwise a lot of extra dependencies are added if you are using this package to target a platform that is .NET Standard 2.0 compliant.

HtmlNodeSelection.CachableCompile fails on secondary threads

HtmlNodeSelection.CachableCompile throws an instance of NullReferenceException when called from any secondary thread, as demonstrated below:

static void Main()
{
    void Test()
    {
        Console.WriteLine("Running on thread #" + Thread.CurrentThread.ManagedThreadId);
        HtmlNodeSelection.CachableCompile("p");
    }
    Test(); // succeeds
    var t = new Thread(Test);
    t.Start(); // fails
    t.Join();
}
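
One common cause of this symptom in C# is a `[ThreadStatic]` field with an inline initializer: the initializer runs only once, on the thread that first initializes the type, so the field is null on every other thread. Whether that is the actual cause here is an assumption. The sketch below reproduces the pitfall and shows lazy per-thread initialization as one possible fix; `BrokenCache` and `FixedCache` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

bool brokenOnSecondary = true, fixedOnSecondary = false;

// Touch the type on the main thread so its initializer runs here.
_ = BrokenCache.IsInitialized;

var worker = new Thread(() =>
{
    brokenOnSecondary = BrokenCache.IsInitialized;
    fixedOnSecondary = FixedCache.IsInitialized;
});
worker.Start();
worker.Join();

Console.WriteLine($"broken: {brokenOnSecondary}, fixed: {fixedOnSecondary}");

// Pitfall: the initializer runs for one thread only; other threads see null.
static class BrokenCache
{
    [ThreadStatic] static Dictionary<string, object> _cache = new Dictionary<string, object>();
    public static bool IsInitialized => _cache != null;
}

// One possible fix: create the per-thread instance lazily on first access.
static class FixedCache
{
    [ThreadStatic] static Dictionary<string, object> _cache;
    static Dictionary<string, object> Cache => _cache ??= new Dictionary<string, object>();
    public static bool IsInitialized => Cache != null;
}
```

`ThreadLocal<T>` with a value factory would be another way to get the same per-thread initialization.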

QuerySelectorAll() is case-sensitive for element name (tagName)

Given a selector that includes an element name (i.e., tagName), the method
IEnumerable<HtmlNode> HtmlNode.QuerySelectorAll(string selector)
will perform a case-sensitive search for the name.

In other words:

HtmlDocument document = new HtmlDocument();
document.LoadHtml("<html><body><a></a></body></html>");
document.DocumentNode.QuerySelectorAll("A"); // should select the <a> but will select nothing.
document.DocumentNode.QuerySelectorAll("a"); // works fine.

W3C docs specify that CSS selectors are not case-sensitive, except for specific attributes such as class or ID.
This is also how browsers' querySelectorAll() behaves (at least in my Chrome).

see also: https://stackoverflow.com/questions/7559205/are-css-selectors-case-sensitive/7559251
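
A minimal sketch of the comparison being asked for, assuming an HTML document (in XML documents element names stay case-sensitive). `ElementNames.Matches` is a hypothetical helper, not Fizzler's code.

```csharp
using System;

Console.WriteLine(ElementNames.Matches("a", "A"));

// Hypothetical helper for illustration; not part of Fizzler's API.
static class ElementNames
{
    // Compares an element's name with the type selector's name, ignoring case.
    // An ordinal comparison avoids culture-specific casing rules.
    public static bool Matches(string elementName, string selectorName) =>
        string.Equals(elementName, selectorName, StringComparison.OrdinalIgnoreCase);
}
```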

QuerySelectorAll doesn't validate arguments eagerly

From @raiytu4 on June 25, 2018 7:41

I demonstrate this problem here:
https://github.com/raiytu4/GetEnumeratorOnFizzler

In short:

static void Main(string[] args)
{
    var doc = Browser.GetDoc("http://truyenfull.vn/dai-dao-trieu-thien/").DocumentNode;
    var categoryTextNodes = FindLiStoryDes(doc, "thể loại").QuerySelectorAll("div.cp2 > a");
    if (categoryTextNodes != null)
    {
        // categoryTextNodes is not null, so calling GetEnumerator() is fine,
        // but calling categoryTextNodes.GetEnumerator().MoveNext() throws NullReferenceException.
        categoryTextNodes.GetEnumerator();

        // This throws NullReferenceException too!
        categoryTextNodes.Count();
    }
    Console.ReadLine();
}

Copied from original issue: atifaziz/Fizzler#66
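
For background, a C# iterator method defers its entire body, including any argument checks, until the first MoveNext() call. That fits the report above, where the returned sequence is non-null but fails only once it is enumerated. The usual remedy is to split the method into an eager validating wrapper and a private iterator. The sketch below shows that pattern with hypothetical names and a plain substring filter standing in for selector matching; it is not Fizzler's implementation.

```csharp
using System;
using System.Collections.Generic;

try
{
    // Throws here, at the call site, rather than later during enumeration.
    var result = Query.Select(null, "a");
}
catch (ArgumentNullException e)
{
    Console.WriteLine("eager: " + e.ParamName);
}

// Hypothetical stand-in for a QuerySelectorAll-style method.
static class Query
{
    // Not an iterator itself, so the checks run as soon as it is called.
    public static IEnumerable<string> Select(IEnumerable<string> nodes, string selector)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        return SelectIterator(nodes, selector);
    }

    // The deferred part: runs only when the result is enumerated.
    static IEnumerable<string> SelectIterator(IEnumerable<string> nodes, string selector)
    {
        foreach (var node in nodes)
            if (node.Contains(selector))
                yield return node;
    }
}
```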
