I am running a Pi-hole and thought I would try goclone against some websites. I ran into the following panic, raised in goclone's Extractor from inside a colly HTML callback:
$ goclone https://searchcode.com/
Extracting --> https://searchcode.com/
Css found --> /static/css/newstyles.css
Extracting --> https://searchcode.com/static/css/newstyles.css
Js found --> //cdn.carbonads.com/carbon.js?zoneid=1673&serve=C6AILKT&placement=searchcodecom
Extracting --> https://cdn.carbonads.com/carbon.js?zoneid=1673&serve=C6AILKT&placement=searchcodecom
panic: Get "https://cdn.carbonads.com/carbon.js?zoneid=1673&serve=C6AILKT&placement=searchcodecom": dial tcp 0.0.0.0:443: connect: connection refused
goroutine 35 [running]:
github.com/imthaghost/goclone/pkg/crawler.Extractor({0x14000407c00, 0x55}, {0x140003a6000, 0x21})
/Users/ghost/go/src/github.com/imthaghost/goclone/pkg/crawler/extractor.go:35 +0x24c
github.com/imthaghost/goclone/pkg/crawler.Collector.func2(0x140004aec60)
/Users/ghost/go/src/github.com/imthaghost/goclone/pkg/crawler/collector.go:37 +0x120
github.com/gocolly/colly/v2.(*Collector).handleOnHTML.func1(0x0, 0x140004a1560)
/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:1074 +0x70
github.com/PuerkitoBio/goquery.(*Selection).Each(0x140004a1530, 0x14000073e30)
/Users/ghost/go/pkg/mod/github.com/!puerkito!bio/[email protected]/iteration.go:10 +0x50
github.com/gocolly/colly/v2.(*Collector).handleOnHTML(0x140003ac000, 0x140003c06c0)
/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:1064 +0x288
github.com/gocolly/colly/v2.(*Collector).fetch(0x140003ac000, {0x140003a4060, 0x17}, {0x10531f364, 0x3}, 0x1, {0x0, 0x0}, 0x0, 0x1400038c210, ...)
/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:676 +0x7a0
created by github.com/gocolly/colly/v2.(*Collector).scrape
/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:574 +0x43c
This only occurs on websites that reference blocked content: the blocked domain resolves to 0.0.0.0, the connection is refused, and goclone panics as shown above. Portions of the site are still cloned, but an error like this seems like something that should be handled rather than crashing the run.
Disabling the Pi-hole resolves the issue. I understand that running behind a Pi-hole is not the expected path, but I imagine other DNS setups that block domains could produce the same panic.