Comments (9)
The bug is reproducible without colly.
The following code demonstrates the problem:
package main

import (
	"net/http"
	"net/http/cookiejar"
)

func main() {
	jar, _ := cookiejar.New(nil)
	client := &http.Client{Jar: jar}
	client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
		lastRequest := via[len(via)-1]
		req.Header = lastRequest.Header
		return nil
	}
	client.Get("https://en.wikipedia.org/")
}
It seems the bug only appears when the client has a cookie jar and a custom redirect handler that writes to http.Request.Header, and the connection uses HTTP/2.
from colly.
I can't reproduce the bug; the above code runs without errors. What is your environment (OS, Go version, colly version)?
from colly.
$ go version
go version go1.9.1 linux/amd64
colly version: d7069d1 (master)
from colly.
Hmm, I can reproduce it if I add the -race flag to the go run command, but I don't really understand why this happens.
from colly.
@papa-stiflera you didn't paste the whole code you're using, so we can't reproduce this. The end of your logs points to /home/skruglov/Projects/go/src/crawler/main.go:19 +0xe4, but the code you gave us isn't that long (maybe because of the import statements? I don't know), so I assume your data race is somewhere else and has nothing to do with the net/http package or with colly. Please give us the necessary information so we can help you, thank you!
My output (win 10 x64, go 1.9.1):
from colly.
@kataras thanks for the debugging. My first assumption was the same, but then I tried the above code and the bug is reproducible, even with the snippet below:
package main

import (
	"github.com/asciimoo/colly"
)

func main() {
	colly.NewCollector().Visit("https://en.wikipedia.org/")
}
The strange thing is that if I change the URL to a service which doesn't support HTTP/2 (e.g. https://httpbin.org/), the race disappears.
UPDATE:
I'm fairly sure the bug is somehow connected to HTTP/2 support.
This command doesn't fail:
GODEBUG='http2client=0' go run -race t/test_36.go
and this fails:
GODEBUG='http2client=1' go run -race t/test_36.go
from colly.
@asciimoo It's funny, because the race detector didn't find it on the first try; I had to re-run the program more than 4 times to see the race log...
Update:
I managed to "fix" that by locking around client.Do; see below and test it yourself. If that works, just put locks there.
from colly.
@kataras unfortunately your suggested solution is not applicable, because it forbids parallelism in httpBackend, and the error doesn't disappear for me if I run GODEBUG='http2client=1' go run -race xy.go.
from colly.
I know @asciimoo... I suggested it as a temporary solution. I don't know the whole code base, so I can't help any further for now, but if it's a net/http issue then you have to file an issue there :/
from colly.