Comments (9)

asciimoo commented on May 22, 2024

The bug is reproducible without colly.
The following code demonstrates the problem:

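package main

import (
    "log"
    "net/http"
    "net/http/cookiejar"
)

func main() {
    jar, err := cookiejar.New(nil)
    if err != nil {
        log.Fatal(err)
    }
    client := &http.Client{Jar: jar}
    client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
        // The redirected request now shares the previous request's header map.
        lastRequest := via[len(via)-1]
        req.Header = lastRequest.Header
        return nil
    }
    resp, err := client.Get("https://en.wikipedia.org/")
    if err != nil {
        log.Fatal(err)
    }
    resp.Body.Close()
}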

It seems the bug only appears when the client has a cookie jar and a custom redirect handler that writes to http.Request.Header, and the connection uses HTTP/2.

from colly.

asciimoo commented on May 22, 2024

I can't reproduce the bug; the above code runs without errors. What is your environment (OS, Go version, colly version)?

papa-stiflera commented on May 22, 2024
$ go version
go version go1.9.1 linux/amd64

colly version: d7069d1 (master)

asciimoo commented on May 22, 2024

Hmm, I can reproduce it if I add the -race flag to the go run command, but I don't really understand why it happens.

kataras commented on May 22, 2024

@papa-stiflera you didn't paste the whole code you're using, so we can't reproduce this. The end of your log points to /home/skruglov/Projects/go/src/crawler/main.go:19 +0xe4, but the code you gave us is not that long (maybe because of the import statements? I don't know), so I assume your data race is somewhere else and has nothing to do with the net/http package or with colly. Please give us the necessary information so we can help you. Thank you!

My output (Windows 10 x64, Go 1.9.1):

[screenshot: colly_issue_36_output]

asciimoo commented on May 22, 2024

@kataras thanks for the debugging. My first assumption was the same, but then I tried the above code and the bug is reproducible, even with the snippet below:

package main

import (
    "github.com/asciimoo/colly"
)

func main() {
    colly.NewCollector().Visit("https://en.wikipedia.org/")
}

The strange thing is that if I change the URL to a service that doesn't support HTTP/2 (e.g. https://httpbin.org/), the race disappears.

UPDATE:
I'm fairly sure the bug is somehow connected to HTTP/2 support.
This command doesn't fail:
GODEBUG='http2client=0' go run -race t/test_36.go
and this one fails:
GODEBUG='http2client=1' go run -race t/test_36.go

kataras commented on May 22, 2024

@asciimoo It's funny, because the race detector didn't find it on the first try; I had to re-run the program more than 4 times to see the race log...

Update:

I managed to "fix" it by locking around client.Do. See below and test it yourself; if that works, just put locks there:

[screenshot: colly_changes]

asciimoo commented on May 22, 2024

@kataras unfortunately your suggested solution is not applicable: it prevents parallelism in httpBackend, and the error still doesn't disappear for me when I run GODEBUG='http2client=1' go run -race xy.go.

kataras commented on May 22, 2024

I know, @asciimoo... I suggested it as a temporary solution. I don't know the whole code base, so I can't help any further for now, but if it's a net/http issue then you'll have to file an issue there :/
