antchfx / xquery
Extract data or evaluate value from HTML/XML documents using XPath
Home Page: https://github.com/antchfx/xpath
License: MIT License
I'll just write this in Chinese.
<span class="fr date">
<span style="color:#999;font-size:8px;">
<script type="text/javascript">
//something
</script>
</span>
2017-05-11
</span>
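First question:
Expression: //span[@class='date']
It panics right away with a matching error. Doesn't it return an err?
Second question:
Expression: //span[@class]
This returns normally, but why does the value returned by InnerText also contain the script code instead of plain text?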
An invalid expression triggers a panic:
node := htmlquery.FindOne(root, "span[@Class='test')") // this line
if node == nil {
//something
}
span[@class='test')
should be span[@class='test']
Is there a way to suppress the panic?
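Until the library reports bad expressions as errors, a caller can convert the panic into an error with Go's recover. `safeFind` below is a hypothetical wrapper, not part of htmlquery, and the panicking closure only stands in for a FindOne call with a malformed expression. The antchfx/xpath package also exposes xpath.Compile, which returns a syntax error as an error value, so compiling the expression first is another option if your version provides it.

```go
package main

import "fmt"

// safeFind is a hypothetical wrapper (not part of htmlquery): it runs find
// and converts a panic raised inside it, such as one caused by a malformed
// XPath expression, into an ordinary error.
func safeFind[T any](find func() T) (result T, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("xpath query failed: %v", r)
		}
	}()
	return find(), nil
}

func main() {
	// The closure stands in for htmlquery.FindOne(root, "span[@Class='test')").
	_, err := safeFind(func() *int { panic("malformed expression") })
	fmt.Println(err) // xpath query failed: malformed expression
}
```

In real code the closure would wrap the actual call, e.g. `func() *html.Node { return htmlquery.FindOne(root, expr) }`.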
I have the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<info>
This is the first sentence in the info tag.
<ext>http://example.org/1</ext>
This sentence is between two ext tags.
<ext>http://test.org/2</ext>
This sentence is at the end of the info tag.
</info>
The text node between the two <ext> tags ("This sentence is between two ext tags.") is ignored when parsing the XML with xmlquery.Parse().
Example code:
package main
import (
"bytes"
"log"
"github.com/antchfx/xquery/xml"
)
func main() {
b := bytes.NewBuffer([]byte(`<?xml version="1.0" encoding="UTF-8"?>
<info>
This is the first sentence in the info tag.
<ext>http://example.org/1</ext>
This sentence is between two ext tags.
<ext>http://test.org/2</ext>
This sentence is at the end of the info tag.
</info>`))
root, err := xmlquery.Parse(b)
if err != nil {
log.Fatalf("Error parsing XML: %v", err)
}
log.Print(root.OutputXML(false))
}
Output:
<xml version="1.0" encoding="UTF-8"></xml>
<info>This is the first sentence in the info tag.
<ext>http://example.org/1</ext>
<ext>http://test.org/2</ext>
This sentence is at the end of the info tag.
</info>
As you can see, the text between the two <ext> tags is missing.
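To confirm that the missing sentence is really part of the document's text content, the standard library's encoding/xml tokenizer can list the non-blank text runs directly under <info>. This is a comparison sketch, not xmlquery code, and `directText` is a hypothetical name; a tree-building parser would be expected to create one text node for each of these runs.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

const infoDoc = `<?xml version="1.0" encoding="UTF-8"?>
<info>
This is the first sentence in the info tag.
<ext>http://example.org/1</ext>
This sentence is between two ext tags.
<ext>http://test.org/2</ext>
This sentence is at the end of the info tag.
</info>`

// directText returns the non-blank text runs that are direct children of
// the document's root element, in document order.
func directText(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []string
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth++
		case xml.EndElement:
			depth--
		case xml.CharData:
			// depth == 1 means we are directly inside the root element.
			if s := strings.TrimSpace(string(t)); depth == 1 && s != "" {
				out = append(out, s)
			}
		}
	}
}

func main() {
	texts, err := directText(infoDoc)
	if err != nil {
		panic(err)
	}
	for _, s := range texts {
		fmt.Println(s)
	}
}
```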
package main
import (
"fmt"
"github.com/antchfx/xquery/xml"
)
func main() {
root, err := xmlquery.LoadURL("http://cn.engadget.com/rss.xml")
if err != nil {
panic(err)
}
item := xmlquery.FindOne(root, "//channel/item[1]/dc:creator")
fmt.Println(item == nil) // prints true: no node is found
}
item is nil
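`dc:creator` is a namespace-prefixed name. A namespace-aware parser such as the standard library's encoding/xml reports it as the namespace URI bound to `dc` plus the local name `creator`, not as the literal string `dc:creator`, which is one plausible reason a prefix-based match finds nothing. Whether xmlquery compares prefixes or URIs in this version is an assumption to verify. The feed below is a hypothetical reduction that binds `dc` to the Dublin Core URI, and `firstCreator` is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// feed is a hypothetical, trimmed-down RSS document.
const feed = `<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel><item><title>t</title><dc:creator>someone</dc:creator></item></channel>
</rss>`

// firstCreator returns the resolved name and text of the first element
// whose local name is "creator".
func firstCreator(src string) (xml.Name, string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return xml.Name{}, "", fmt.Errorf("no creator element")
		}
		if err != nil {
			return xml.Name{}, "", err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "creator" {
			var text string
			if err := dec.DecodeElement(&text, &se); err != nil {
				return xml.Name{}, "", err
			}
			// Token has already replaced the "dc" prefix with its URI.
			return se.Name, text, nil
		}
	}
}

func main() {
	name, text, err := firstCreator(feed)
	if err != nil {
		panic(err)
	}
	fmt.Println(name.Space, name.Local, text)
}
```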
<span>2017/05/08<span>
//...some code
node := htmlquery.FindOne(root, "//span")
text := htmlquery.InnerText(node)
The resulting text is not 2017/05/08 but 2017-05-08 00:00:00.0. Is this a bug?
for example:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.reuters.com/~d/styles/itemcontent.css"?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
....
</rss>
When xquery parses this XML file, which contains multiple processing-instruction (PI) statements, it panics with the error: xml: document is invalid.
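Several processing instructions before the root element are well-formed XML: the prolog allows any number of PIs after the optional XML declaration. A sketch with the standard library's encoding/xml reads the head of the feed above and reports each PI as a separate token, so a tree builder could keep each one as a node instead of rejecting the document. The empty <rss> element stands in for the elided feed body, and `procInstTargets` is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

const feedHead = `<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.reuters.com/~d/styles/itemcontent.css"?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"></rss>`

// procInstTargets lists the target of every processing instruction in src.
func procInstTargets(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var targets []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return targets, nil
		}
		if err != nil {
			return nil, err
		}
		if pi, ok := tok.(xml.ProcInst); ok {
			targets = append(targets, pi.Target)
		}
	}
}

func main() {
	targets, err := procInstTargets(feedHead)
	if err != nil {
		panic(err)
	}
	fmt.Println(targets)
}
```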
import (
"bytes"

"golang.org/x/net/html"
)

// OutputHTML returns the HTML of n, including n's own tag.
// It returns an empty string if rendering fails.
func OutputHTML(n *html.Node) string {
var buf bytes.Buffer
if err := html.Render(&buf, n); err != nil {
return ""
}
return buf.String()
}
My XML (with a typo in the closing tag of three):
<?xml version="1.0"?>
<Scenarios>
<scenario>
<one>1</one>
<two>2</two>
<three>3</tree>
</scenario>
</Scenarios>
When I do this:
allScenarios := xmlquery.Find(memXML, "//scenario")
fmt.Println(len(allScenarios))
It never gets to printing the length; instead it raises an error (and I'm not able to handle it).
Hi,
First, thank you for your work on xpath and xquery for Go. I'm adapting your xpath library to another DOM structure I have, and I looked at the code here to understand how the NodeNavigator works.
I think I found inconsistencies in your NodeNavigator around attribute nodes, because your DOM structure doesn't have first-class attribute nodes.
In particular, the XPath expression /a/b/@attr/.. on the document <a><b attr="1"/></a> should select <b attr="1"/>, but your implementation seems to select <a>...</a> instead.
I think this is because, when an attribute is selected, MoveToParent() moves to the grandparent of the attribute instead of its parent, i.e. to the parent of the element that owns the attribute.
When I tried to test this in xml/query_test.go with
if node := FindOne(doc, "//book/@id/..[1]"); node.SelectAttr("id") != "bk101" {
t.Fatal("//book/@id/..[1]/@id != bk101")
}
it seems to select no node: node is nil, and I get a stack trace.
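The behavior the reporter expects can be sketched with a navigator that stores an attribute index next to the current element. All types here (`Node`, `Attr`, `Navigator`) are hypothetical stand-ins for a DOM without first-class attribute nodes, not xquery's actual types. The key point is that the parent of an attribute is the element that owns it, so MoveToParent only has to leave "attribute mode".

```go
package main

import "fmt"

// Attr and Node model a hypothetical DOM whose attributes are stored on
// the element rather than as child nodes.
type Attr struct{ Name, Value string }

type Node struct {
	Name   string
	Parent *Node
	Attrs  []Attr
}

// Navigator points at an element (attr == -1) or at one of the current
// element's attributes (attr is its index in curr.Attrs).
type Navigator struct {
	curr *Node
	attr int
}

// MoveToAttribute positions the navigator on the named attribute.
func (n *Navigator) MoveToAttribute(name string) bool {
	for i, a := range n.curr.Attrs {
		if a.Name == name {
			n.attr = i
			return true
		}
	}
	return false
}

// MoveToParent moves to the parent of the current position. An attribute's
// parent is its owning element, which is already n.curr.
func (n *Navigator) MoveToParent() bool {
	if n.attr != -1 {
		n.attr = -1
		return true
	}
	if n.curr.Parent != nil {
		n.curr = n.curr.Parent
		return true
	}
	return false
}

// Current names the node the navigator is on ("@name" for attributes).
func (n *Navigator) Current() string {
	if n.attr != -1 {
		return "@" + n.curr.Attrs[n.attr].Name
	}
	return n.curr.Name
}

func main() {
	// <a><b attr="1"/></a>
	a := &Node{Name: "a"}
	b := &Node{Name: "b", Parent: a, Attrs: []Attr{{Name: "attr", Value: "1"}}}
	nav := &Navigator{curr: b, attr: -1}
	nav.MoveToAttribute("attr")
	nav.MoveToParent()
	fmt.Println(nav.Current()) // b
}
```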
Hi,
When I try to install it on Ubuntu 16.04 with the following command:
go get github.com/antchfx/xquery
I get the following error:
can't load package: package github.com/antchfx/xquery: no buildable Go source files in /home/ps06756/go/src/github.com/antchfx/xquery
<ul>
<li><p><a href="test.html"></a></p></li>
</ul>
expr: //ul/li/p/a/@href
htmlquery.FindOne returns a Node, not the attribute. I want the attribute string, but to get it with htmlquery.SelectAttr I also need to split the expression:
split := strings.Split("//ul/li/p/a@href", "@")
node := htmlquery.FindOne(root, split[0])
attribute := htmlquery.SelectAttr(node, split[1])
That's too many steps.
Can I do this with a single expression? XPath itself can select the attribute, so it should be possible without htmlquery.SelectAttr.
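Until FindOne can return attribute values directly, the split workaround can at least accept the standard `/@attr` form instead of the non-standard `a@href`. `splitAttrExpr` is a hypothetical helper; its element path would go to htmlquery.FindOne and the attribute name to htmlquery.SelectAttr, as in the snippet above. It deliberately rejects descendant forms such as `//@href`, whose element part would be empty or end in a slash.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAttrExpr splits an XPath expression whose last step is an attribute
// ("/@name") into the element path and the attribute name. ok is false
// when the expression does not end in a simple attribute step.
func splitAttrExpr(expr string) (elemPath, attr string, ok bool) {
	i := strings.LastIndex(expr, "/@")
	if i < 0 {
		return "", "", false
	}
	elem, name := expr[:i], expr[i+2:]
	// Reject "//@attr" forms and names that contain further steps or predicates.
	if elem == "" || strings.HasSuffix(elem, "/") || name == "" || strings.ContainsAny(name, "/[]") {
		return "", "", false
	}
	return elem, name, true
}

func main() {
	for _, expr := range []string{"//ul/li/p/a/@href", "//ul/li/p/a", "//@href"} {
		elem, attr, ok := splitAttrExpr(expr)
		fmt.Printf("%q -> %q %q %v\n", expr, elem, attr, ok)
	}
}
```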
<tr>
<td class="test">
<p></p><span></span>
</td>
</tr>
node := htmlquery.FindOne(root, `/tr/td`)
fmt.Println(htmlquery.OutputHTML(node))
This prints:
<td class="test">
<p></p><span></span>
</td>
But I want to get:
<p></p><span></span>
How can I do this?
More of a question:
To get an attribute in HTML:
href := htmlquery.SelectAttr(tag, "href")
To get it in XML:
debug := tag.SelectAttr("debug")
Why the difference?
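strings.TrimSpace does some annoying things, such as stripping leading and trailing newlines, spaces, and NBSPs, but what it strips is content. It would be better to output the raw text as-is and let the caller decide how to handle it.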
Hello!
//*[@id="whatsnew"]/div/div[2]/strong/a//@href
doesn't work as expected.
My code:
package main
import (
"fmt"
"net/url"
"github.com/antchfx/xquery/html"
)
func main() {
root, err := htmlquery.LoadURL("https://www.apkmirror.com/apk/niantic-inc/pokemon-go")
if err != nil {
panic(err)
}
urlToAPK := htmlquery.InnerText(htmlquery.FindOne(root, "//*[@id=\"whatsnew\"]/div/div[2]/strong/a//@href"))
fmt.Println(urlToAPK)
}
and I got Pokémon GO 0.55.0 instead of the link.
Is it my fault or yours?
go get github.com/antchfx/xquery results in
can't load package: package github.com/antchfx/xquery: no buildable Go source files in /Users/i335366/go/src/github.com/antchfx/xquery, please help.
I wrote a unit test demonstrating that matching with multiple predicates does not work and returns false positives.
if list := Find(doc, "//book[genre='Fantasy'][author='Corets, Eva']"); len(list) != 1 {
t.Fatal("//book[genre='Fantasy'][author='Corets, Eva'] items count is not equal 1")
}
https://github.com/drauschenbach/xquery/blob/master/xml/query_test.go
When I run:
package main
import (
"fmt"
"log"
"net/http"
"github.com/antchfx/xquery/html"
"golang.org/x/net/html"
)
func main() {
resp, err := http.Get("http://www.lostfilm.tv/")
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
root, err := html.Parse(resp.Body)
if err != nil {
log.Fatal(err)
}
var xpath string
xpath = `//title`
node := htmlquery.FindOne(root, xpath)
fmt.Println(htmlquery.InnerText(node))
}
I get:
panic: unknown HTML node type: 5
goroutine 1 [running]:
panic(0x609180, 0xc042121620)
C:/Go/src/runtime/panic.go:500 +0x1af
github.com/antchfx/xquery/html.(*htmlNodeNavigator).NodeType(0xc04211d700, 0x0)
D:/Projects/gopath/src/github.com/antchfx/xquery/html/query.go:120 +0x130
github.com/antchfx/gxpath/internal/build.axisPredicate.func1(0x769000, 0xc04211d700, 0x40f700)
D:/Projects/gopath/src/github.com/antchfx/gxpath/internal/build/build.go:45 +0x4e
github.com/antchfx/gxpath/internal/query.(*DescendantQuery).Select.func1(0x63cc80, 0xc0421df350)
D:/Projects/gopath/src/github.com/antchfx/gxpath/internal/query/query.go:236 +0xa9
github.com/antchfx/gxpath/internal/query.(*DescendantQuery).Select(0xc0421df320, 0x763760, 0xc04211d6c0, 0x0, 0xc04211d6c0)
D:/Projects/gopath/src/github.com/antchfx/gxpath/internal/query/query.go:243 +0x3d
github.com/antchfx/gxpath.(*NodeIterator).MoveNext(0xc04211d6c0, 0xc04211d640)
D:/Projects/gopath/src/github.com/antchfx/gxpath/select.go:22 +0x50
github.com/antchfx/xquery/html.FindOne(0xc042010150, 0x65de4e, 0x7, 0x0)
D:/Projects/gopath/src/github.com/antchfx/xquery/html/query.go:33 +0xb4
main.main()
D:/Projects/gopath/src/Test2/main.go:30 +0x2d8
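In golang.org/x/net/html, node type 5 is DoctypeNode, so the panic is most likely triggered by the page's <!DOCTYPE ...> declaration, which the navigator's NodeType switch does not handle. The XPath data model has no doctype node, so a navigator can simply step over it. The stdlib sketch below shows a switch that covers that case and returns an error instead of panicking for unknown types; the constants mirror x/net/html's ordering and `xpathKind` is hypothetical.

```go
package main

import "fmt"

// NodeType mirrors the ordering of golang.org/x/net/html's NodeType
// constants, copied so the sketch needs only the standard library.
type NodeType uint32

const (
	ErrorNode NodeType = iota
	TextNode
	DocumentNode
	ElementNode
	CommentNode
	DoctypeNode // value 5, the type reported in the panic above
)

// xpathKind is a hypothetical stand-in for a navigator's XPath node kinds.
type xpathKind int

const (
	rootKind xpathKind = iota
	elementKind
	textKind
	commentKind
	skipKind // no XPath equivalent; the navigator should step over it
)

// classify maps an HTML node type to an XPath node kind, returning an
// error instead of panicking for types it does not know.
func classify(t NodeType) (xpathKind, error) {
	switch t {
	case DocumentNode:
		return rootKind, nil
	case ElementNode:
		return elementKind, nil
	case TextNode:
		return textKind, nil
	case CommentNode:
		return commentKind, nil
	case DoctypeNode:
		return skipKind, nil
	default:
		return 0, fmt.Errorf("unknown HTML node type: %d", t)
	}
}

func main() {
	k, err := classify(5)
	fmt.Println(k == skipKind, err) // true <nil>
}
```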
I found some XML files that lack this declaration: <?xml version="1.0" encoding="UTF-8"?>. For them, the xmlquery parser returns an invalid-XML error.
In most cases, the xmlquery package should handle this: if an XML file lacks the declaration, it should assume <?xml version="1.0" ?> as the default declaration.