Comments (5)
Yes, this is a limitation of Go's html package: it only supports UTF-8 encoded documents. From the docs:
It is the caller's responsibility to ensure that the Reader provides UTF-8 encoded HTML.
I believe CJK requires UTF-16, right? I'll leave this issue open for future reference, as this is an issue that needs to be resolved at a lower level than goquery.
You can use iconv for that.
Thanks for the info. Following up on this, I found the library https://code.google.com/p/go-charset/ (which has an iconv interface) by Roger Peppe, so it should be of high quality. Full doc: http://godoc.org/code.google.com/p/go-charset/charset
If someone feels like writing a little example, drop me a line and I'll add it to the doc (or the wiki).
There is no "CJK encoding".
UTF-16, the variants of Shift-JIS and EUC-JP, and the myriad Chinese encodings are all different encodings.
For UTF-16, check syscall_windows.go.
For other encodings (such as GBK), you'll need iconv; go-charset doesn't support them.
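For the UTF-16 case, the standard library's `unicode/utf16` package is a portable alternative to the Windows-only helpers in `syscall`. Below is a minimal sketch. A leading byte order mark (BOM) selects the byte order. Without a BOM, the sketch assumes little-endian, and that default is an assumption. The helper name `decodeUTF16` is hypothetical. Shift-JIS, EUC-JP and GBK are not covered and still need a real converter.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 (hypothetical helper) converts UTF-16 bytes to a UTF-8 Go string.
// A leading BOM selects the byte order (FF FE = little-endian, FE FF = big-endian);
// without a BOM this sketch assumes little-endian.
func decodeUTF16(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("odd number of bytes for UTF-16")
	}
	var order binary.ByteOrder = binary.LittleEndian
	if len(b) >= 2 {
		switch {
		case b[0] == 0xFF && b[1] == 0xFE:
			b = b[2:] // little-endian BOM, keep the default order
		case b[0] == 0xFE && b[1] == 0xFF:
			order = binary.BigEndian
			b = b[2:]
		}
	}
	// Collect the 16-bit code units in the chosen byte order.
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = order.Uint16(b[2*i:])
	}
	// utf16.Decode joins surrogate pairs into runes; string() encodes them as UTF-8.
	return string(utf16.Decode(units)), nil
}

func main() {
	// "日本" in UTF-16BE with a BOM: 日 = U+65E5, 本 = U+672C.
	be := []byte{0xFE, 0xFF, 0x65, 0xE5, 0x67, 0x2C}
	s, err := decodeUTF16(be)
	fmt.Println(s, err)
}
```

The resulting string is UTF-8 and could be wrapped in a `strings.Reader` before being handed to the HTML parser.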
By the way, I found this package: https://code.google.com/p/go.text