
Comments (4)

mna commented on September 26, 2024

Ah ok, sorry for the confusion, I didn't see where the user-agent could come into play in goquery. Well, this is really just a helper function; you can't customize anything on the request as it is.

What you can do (untested, but along these lines) is this:

req, err := http.NewRequest("GET", url, nil)
if err != nil {
    // handle error
}
// Set the custom User-Agent before sending the request.
req.Header.Set("User-Agent", ua)
res, err := http.DefaultClient.Do(req)
if err != nil {
    // handle error
}
defer res.Body.Close()
// Parse the body with golang.org/x/net/html, then wrap the root node.
root, err := html.Parse(res.Body)
if err != nil {
    // handle error
}
d := goquery.NewDocumentFromNode(root)
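
To check that the custom header really reaches the server, the request part of the snippet above can be tried against a local test server. The following is only a minimal sketch using the standard library: `newEchoServer`, `fetchWithUserAgent`, and the `my-bot/1.0` value are hypothetical names chosen for illustration, and the goquery parsing step is left out so the example has no external dependency.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newEchoServer starts a local test server that replies with the
// User-Agent header it received (hypothetical helper for this demo).
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
}

// fetchWithUserAgent issues a GET request with a custom User-Agent and
// returns the response body as a string (hypothetical helper).
func fetchWithUserAgent(url, ua string) (string, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", ua)

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	srv := newEchoServer()
	defer srv.Close()

	got, err := fetchWithUserAgent(srv.URL, "my-bot/1.0")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(got) // the server echoes the User-Agent it saw
}
```

In real use, `res.Body` would be handed to `html.Parse` instead of being read into a string.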

from goquery.

mynameiskreang commented on September 26, 2024

@raichu Hello, maybe this code can help you.

req, err := http.NewRequest("GET", link, nil)
if err != nil {
    // handle error
}
// A GET request has no body, so the Content-Type header is not needed.
req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36")
if resp, err := http.DefaultClient.Do(req); err == nil {
    // NewDocumentFromResponse closes resp.Body once the document is parsed.
    if doc, err := goquery.NewDocumentFromResponse(resp); err == nil {
        // use doc here
        _ = doc
    }
}
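
When many pages are fetched, setting the header on each request gets repetitive. One possible alternative, not proposed in this thread, is to wrap the client's transport so that every request carries the same User-Agent. The sketch below uses only the standard library; `uaTransport`, `newClient`, `seenUserAgent`, and the `my-bot/1.0` value are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// uaTransport is a hypothetical http.RoundTripper that sets a fixed
// User-Agent on every outgoing request before delegating to base.
type uaTransport struct {
	base http.RoundTripper
	ua   string
}

func (t *uaTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper should not modify the caller's request, so use a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("User-Agent", t.ua)
	return t.base.RoundTrip(clone)
}

// newClient returns a client whose requests all carry the given User-Agent.
func newClient(ua string) *http.Client {
	return &http.Client{Transport: &uaTransport{base: http.DefaultTransport, ua: ua}}
}

// seenUserAgent sends one GET to a local test server and returns the
// User-Agent the server observed.
func seenUserAgent(client *http.Client) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	res, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	got, err := seenUserAgent(newClient("my-bot/1.0"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(got)
}
```

A response obtained from such a client can then be passed to `goquery.NewDocumentFromResponse` in the same way as in the snippet above.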


mna commented on September 26, 2024

I think you're on the wrong project page; you must be referring to gocrawl (https://github.com/PuerkitoBio/gocrawl)?

If so, yes, you can configure the user-agent (both the one the crawler uses when requesting robots.txt and the one used to request pages). It's set on the Crawler.Options field (https://github.com/PuerkitoBio/gocrawl#options).


raichu commented on September 26, 2024

I'm referring to this line specifically

