
Comments (6)

LeMoussel commented on May 21, 2024

Why do you use an errChan channel?
Is it correct to do this?

     go func() {
         defer c.wg.Done()
 
         errChan := c.scrape(...)

         // Do some stuff ....
 
     }()
 

from colly.

THE108 commented on May 21, 2024

Why do you use an errChan channel?

I use it here to let the caller wait for an error on that channel.

errChan := scrapeAsync(...)

... do stuff ...

err := <-errChan

Is it correct to do this?

It depends on what you are going to do with that error (c.scrape(...) returns an error).
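
The errChan pattern described above can be sketched roughly as follows. This is only an illustration, not colly's code: scrape here is a hypothetical stand-in for the blocking call, and the channel is typed as receive-only (<-chan error) on the assumption that the caller only ever reads from it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scrape is a hypothetical stand-in for a blocking scrape call.
// It only checks the URL scheme so the example stays self-contained.
func scrape(url string) error {
	if !strings.HasPrefix(url, "http") {
		return errors.New("unsupported URL: " + url)
	}
	return nil
}

// scrapeAsync runs scrape in its own goroutine and hands the caller
// a channel that will receive exactly one value: the error or nil.
func scrapeAsync(url string) <-chan error {
	// Buffer of 1 so the goroutine can finish even if nobody reads.
	errChan := make(chan error, 1)
	go func() {
		errChan <- scrape(url)
	}()
	return errChan
}

func main() {
	errChan := scrapeAsync("ftp://example.com")

	// ... do other work while the scrape runs ...

	if err := <-errChan; err != nil {
		fmt.Println("scrape failed:", err)
	}
}
```

The buffered channel is a design choice: a caller that ignores the returned channel does not leak a goroutine blocked on the send.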


asciimoo commented on May 21, 2024

@THE108 you are right, the current behavior is not suitable for your needs. As a workaround you can wrap c.Visit with a function and use a WaitGroup there.
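
A minimal sketch of that workaround, assuming only that the collector exposes a blocking Visit(url) error method. The visitor interface, fakeCollector, and visitAll are hypothetical names used for illustration; whether a real Collector may be shared across goroutines this way is not settled in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// visitor is a hypothetical interface standing in for anything
// with a blocking Visit method.
type visitor interface {
	Visit(url string) error
}

// fakeCollector is a stub so the example runs without network access.
type fakeCollector struct{}

func (fakeCollector) Visit(url string) error {
	if url == "" {
		return fmt.Errorf("empty URL")
	}
	return nil
}

// visitAll calls v.Visit for every URL in its own goroutine, waits for
// all of them with a WaitGroup, and returns the errors in input order.
func visitAll(v visitor, urls []string) []error {
	errs := make([]error, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			errs[i] = v.Visit(u) // each goroutine writes only its own slot
		}(i, u)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := visitAll(fakeCollector{}, []string{"http://example.com/a", ""})
	for i, err := range errs {
		fmt.Println(i, err)
	}
}
```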

The func (c *Collector) scrapeAsync(...) chan<- error approach is interesting. How do you imagine the public API for this? Duplicating all the Visit, Post, PostRaw, etc. functions of the Collector and the Request with an Async prefix wouldn't be ideal.
Perhaps Collector could be extended with an Async bool member, and if Collector.Async is true then scrapeAsync would be called instead of scrape. The only downside of this solution is that it cannot return an error channel, but errors can be handled in OnError callbacks. What do you think?
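
To make the Async flag idea concrete, here is a rough sketch under stated assumptions. The Collector type below is a hypothetical, stripped-down stand-in and not colly's real type; the OnError signature, the Wait method, and the scrape body are all invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Collector is a hypothetical, minimal type used only for this sketch.
type Collector struct {
	Async   bool
	wg      sync.WaitGroup
	onError []func(url string, err error)
}

// OnError registers an error callback. In this sketch, callbacks must
// be registered before the first Visit call.
func (c *Collector) OnError(f func(url string, err error)) {
	c.onError = append(c.onError, f)
}

// scrape is a stand-in for the blocking request. It reports failures
// to the OnError callbacks and also returns them.
func (c *Collector) scrape(url string) error {
	if url == "" {
		err := errors.New("empty URL")
		for _, f := range c.onError {
			f(url, err)
		}
		return err
	}
	return nil
}

// Visit keeps a single public method: it blocks when Async is false
// and returns immediately when Async is true.
func (c *Collector) Visit(url string) error {
	if !c.Async {
		return c.scrape(url)
	}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		_ = c.scrape(url) // the error is delivered through OnError only
	}()
	return nil
}

// Wait blocks until all asynchronous visits have finished.
func (c *Collector) Wait() { c.wg.Wait() }

func main() {
	c := &Collector{Async: true}
	c.OnError(func(url string, err error) {
		fmt.Printf("error visiting %q: %v\n", url, err)
	})
	c.Visit("")
	c.Visit("http://example.com")
	c.Wait()
}
```

In async mode the callbacks may run on several goroutines at once, so any state they share needs its own synchronization.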


THE108 commented on May 21, 2024

I think it's better to add new functionality than to change existing behavior (breaking backward compatibility).

Anyway, I think neither option is perfect. So, maybe we can think about:

  1. Let the caller manage concurrency. Just accept a sync.WaitGroup as a param (possibly nil).

  2. Manage concurrency internally. Collector could handle its own goroutine pool (which would provide better goroutine reuse and request rate limiting).
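
The second option could look something like the sketch below: a fixed set of worker goroutines pulling URLs from a shared channel. runPool and fetchStub are hypothetical names, and this only bounds the number of concurrent requests; true rate limiting (requests per second) would need an additional mechanism.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetchStub is a hypothetical stand-in for a blocking scrape call.
func fetchStub(url string) error {
	if url == "" {
		return errors.New("empty URL")
	}
	return nil
}

// runPool starts a fixed number of workers that read URLs from a
// shared channel, so at most `workers` fetches run at the same time.
// It returns the result of each fetch keyed by URL.
func runPool(workers int, urls []string, fetch func(string) error) map[string]error {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no workers
	}
	jobs := make(chan string)
	results := make(map[string]error, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				err := fetch(u)
				mu.Lock()
				results[u] = err
				mu.Unlock()
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // lets the workers exit their range loops
	wg.Wait()
	return results
}

func main() {
	res := runPool(2, []string{"http://example.com/a", "", "http://example.com/b"}, fetchStub)
	for u, err := range res {
		fmt.Printf("%q: %v\n", u, err)
	}
}
```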


asciimoo commented on May 21, 2024

I think it's better to add new functionality than to change existing behavior (breaking backward compatibility).

The solution I suggested doesn't break the existing API.


THE108 commented on May 21, 2024

Sorry, I misunderstood your suggestion.

Perhaps Collector could be extended with an Async bool member and if Collector.Async is true then scrapeAsync would be called instead of scrape.

I think that should work.

