PowerHTML
HTML Agility Pack implementation in PowerShell for parsing and manipulating HTML
Initially, this project provides the ConvertFrom-Html cmdlet, which parses HTML without requiring Internet Explorer and its HTML document object.
License: MIT License
Update to the latest version of the HtmlAgilityPack assemblies (v1.11.16)
Possible solution: use package management to download the NuGet package for the HtmlAgilityPack assemblies.
Hi there. I really appreciate being able to use this module with PowerShell Core. Thank you for your work!
While scraping some European websites, I came across an issue with special characters such as ü, ä, ö, é, and ß.
For some reason, ConvertFrom-Html cannot handle these characters and turns them into question marks. It seems to be related to the encoding, which cannot be specified through any parameter.
Any ideas on how to solve this?
The Invoke-WebRequest content shows the "ü" character correctly:
$Result = Invoke-WebRequest -Uri "https://www.compart.com/en/unicode/U+00FC"
$Result.Content -split "<" | Where-Object {$_ -like '*span class="box">*'}
>> span class="box">ü
ConvertFrom-Html turns it into "??":
$Html = ConvertFrom-Html -Content $Result
$Html.SelectNodes('//span[@class="box"]')
>> NodeType Name AttributeCount ChildNodeCount ContentLength InnerText
>> -------- ---- -------------- -------------- ------------- ---------
>> Element  span 1              1              2             ??
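The output reports a ContentLength of 2 for a single "ü". That is consistent with the two UTF-8 bytes of ü (0xC3 0xBC) being decoded with a single-byte encoding such as ASCII somewhere along the way. This is a hypothesis, not a confirmed diagnosis of ConvertFrom-Html. The snippet below reproduces the same "??" symptom using only .NET encoding calls:

```powershell
# Illustration only: reproduce a "??" result using nothing but .NET encodings.
$u       = [string][char]0x00FC                           # "ü" (U+00FC)
$utf8    = [System.Text.Encoding]::UTF8.GetBytes($u)      # 0xC3 0xBC
$asAscii = [System.Text.Encoding]::ASCII.GetString($utf8) # each non-ASCII byte becomes "?"
$asUtf8  = [System.Text.Encoding]::UTF8.GetString($utf8)  # round-trips to "ü"

'{0} byte(s): {1}' -f $utf8.Length, (($utf8 | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' ')  # 2 byte(s): 0xC3 0xBC
"Decoded as ASCII: '$asAscii' (length $($asAscii.Length))"   # Decoded as ASCII: '??' (length 2)
"Decoded as UTF-8: '$asUtf8' (length $($asUtf8.Length))"     # Decoded as UTF-8: 'ü' (length 1)
```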
The response headers show the correct Content-Type (charset=utf-8):
$Result.Headers
>> Key             Value
>> ---             -----
>> Server          {nginx}
>> Date            {Sun, 03 Oct 2021 10:22:56 GMT}
>> Connection      {keep-alive}
>> X-Powered-By    {Express}
>> Accept-Ranges   {bytes}
>> Cache-Control   {public, max-age=0}
>> ETag            {W/"aabd-17a2d88a25f"}
>> X-Response-Time {0}
>> Vary            {Accept-Encoding}
>> Content-Type    {text/html; charset=utf-8}
>> Content-Length  {43709}
>> Last-Modified   {Mon, 21 Jun 2021 07:46:07 GMT}
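One possible workaround, assuming the problem is an encoding mismatch rather than something in the parser itself, is to decode the raw response bytes with the charset named in the Content-Type header and pass the resulting string to ConvertFrom-Html. `ConvertTo-DecodedHtml` is a hypothetical helper, not part of PowerHTML, and the UTF-8 fallback is an assumption. Whether this avoids the "??" depends on where the bytes are actually misdecoded, which this issue does not establish. The demo runs offline. The commented lines show how the helper would be applied to the `$Result` from the issue.

```powershell
# Hypothetical helper (not part of PowerHTML): decode raw response bytes with the
# charset named in a Content-Type value, so no default encoding is involved.
function ConvertTo-DecodedHtml {
    param(
        [Parameter(Mandatory)] [byte[]] $Bytes,
        [string] $ContentType = 'text/html; charset=utf-8'
    )
    $charset = ([System.Net.Mime.ContentType]::new($ContentType)).CharSet
    if ([string]::IsNullOrEmpty($charset)) { $charset = 'utf-8' }   # assumed fallback
    [System.Text.Encoding]::GetEncoding($charset).GetString($Bytes)
}

# Offline demo: UTF-8 bytes for <span class="box">ü</span>
$expected = '<span class="box">' + [char]0x00FC + '</span>'
$sample   = [System.Text.Encoding]::UTF8.GetBytes($expected)
$decoded  = ConvertTo-DecodedHtml -Bytes $sample -ContentType 'text/html; charset=utf-8'
$decoded

# Applied to the response from this issue (needs network access and PowerHTML):
# $bytes = $Result.RawContentStream.ToArray()
# $type  = @($Result.Headers['Content-Type'])[0]
# $Html  = ConvertFrom-Html -Content (ConvertTo-DecodedHtml -Bytes $bytes -ContentType $type)
# $Html.SelectNodes('//span[@class="box"]')
```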
Version info
$PSVersionTable
>> Name                      Value
>> ----                      -----
>> PSVersion                 7.1.3
>> PSEdition                 Core
>> GitCommitId               7.1.3
>> OS                        Darwin 20.2.0 Darwin Kernel Version 20.2.0: Wed Dec 2 20:40:21 PST 2020; root:xnu-7195.60.75~1/RELEASE_ARM64_T8101
>> Platform                  Unix
>> PSCompatibleVersions      {1.0, 2.0, 3.0, 4.0…}
>> PSRemotingProtocolVersion 2.3
>> SerializationVersion      1.1.0.1
>> WSManStackVersion         3.0
Get-Module PowerHTML
>> ModuleType Version PreRelease Name      ExportedCommands
>> ---------- ------- ---------- ----      ----------------
>> Script     0.1.7              PowerHTML ConvertFrom-Html
Please update to the latest version of the HtmlAgilityPack assemblies (v1.11.46).
As you said in #4 (comment), it would be wonderful to bump the dependency automatically as new versions are released.
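One way to track new HtmlAgilityPack releases automatically is to read the package's version list from NuGet's v3 flat-container endpoint and select the newest stable entry. This is a sketch; nothing in this thread says the module does this today. `Get-LatestStableVersion` is a hypothetical helper name, and the demo list is illustrative rather than real NuGet data. Comparing versions as `[version]` rather than as plain strings matters here: as strings, '1.11.9' sorts after '1.11.46'.

```powershell
# Pick the newest stable version from a NuGet-style list of version strings.
function Get-LatestStableVersion {
    param([Parameter(Mandatory)] [string[]] $Versions)
    $Versions |
        Where-Object { $_ -notmatch '-' } |    # drop prerelease versions such as 1.12.0-beta1
        Sort-Object { [version]$_ } |          # numeric comparison, not string comparison
        Select-Object -Last 1
}

# Offline demo with an illustrative list (not real NuGet data).
$demo = Get-LatestStableVersion -Versions '1.11.9', '1.11.46', '1.11.16', '1.4.9.5', '1.12.0-beta1'
$demo   # 1.11.46

# Live lookup against NuGet's flat-container endpoint (needs network access):
# $index = Invoke-RestMethod -Uri 'https://api.nuget.org/v3-flatcontainer/htmlagilitypack/index.json'
# Get-LatestStableVersion -Versions $index.versions
```

A build script could then compare the result with the version of the bundled assemblies and refresh them when they differ.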
How can I wait for JavaScript to load and run?
I'm using PowerHTML in one of my modules. My module contains a Types folder with multiple .ps1xml files. When I am in my module's root directory and try to import the PowerHTML module, it fails with this error:
PS /Users/evan.yeung/git/OktaPS> Import-Module PowerHTML
Import-Module: The member 'FormatsToProcess' in the module manifest is not valid: The path cannot be processed because it resolved to more than one file; only one file at a time can be processed.. Verify that a valid value is specified for this field in the '/Users/evan.yeung/.local/share/powershell/Modules/PowerHTML/0.1.7/PowerHTML.psd1' file.
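The error says the FormatsToProcess value "resolved to more than one file," and it appears only when the current directory contains several .ps1xml files. One explanation consistent with that is that the manifest entry is a wildcard or relative path resolved against the current location instead of the module folder. This is an assumption, since the manifest's actual value is not quoted here. The snippet below reproduces that condition in a throwaway temp folder containing two hypothetical .ps1xml files:

```powershell
# Illustration only: a wildcard path resolved in a folder that holds several .ps1xml files.
# Whether PowerHTML's manifest actually uses such a path is an assumption.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
foreach ($name in 'Users.Types.ps1xml', 'Groups.Types.ps1xml') {   # hypothetical file names
    New-Item -ItemType File -Path (Join-Path $dir $name) | Out-Null
}

Push-Location $dir
try {
    # Resolved against the current location, the pattern matches both files.
    $resolved = @(Resolve-Path -Path '*.ps1xml')
    "'*.ps1xml' resolved to $($resolved.Count) file(s)"
}
finally {
    Pop-Location
    Remove-Item -Path $dir -Recurse -Force
}
```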