Unable to access the registry when a specific User-Agent header is set (registry.k8s.io, closed)

kwilczynski commented on September 2, 2024

Unable to access the registry when a specific User-Agent header is set

from registry.k8s.io.

Comments (13)

kwilczynski commented on September 2, 2024 3

@BenTheElder, thank you for some of this. Appreciated.

The rule that is catching this specific User-Agent is most definitely the:

  rule {
    action   = "deny(403)"
    priority = "920"
    match {
      expr {
        expression = "evaluatePreconfiguredWaf('scannerdetection-v33-stable', {'sensitivity': 1})"
      }
    }
    description = "Scanner detection"

    preview = false
  }

Per Google's own documentation:

Google Cloud Armor preconfigured WAF rules overview

This will lead us to the following:

This is where the OWASP folks decided to block some tools, including the "BFAC" project. That said, some of the project names on that list do pass, so it's a bit puzzling which version of the ruleset Google is using exactly.

That said, I don't think there is anything that we could sensibly do here....

- Removing the ruleset from WAF will invite bots, spam and scams, so not ideal.
- Trying to manually replicate the "scanner" ruleset from OWASP would be an unmaintainable headache.
- Adding a sort of allowlist that includes most of the popular container runtimes would invite abuse eventually.

Some of the projects already use different User-Agent strings, often to mimic curl or popular browsers, so there is no helping it here too, sadly.

As such, we on the CRI-O side will strip the extra build and release information from the User-Agent, which should limit the possibility of running into some other combination of letters, like "bfac", that would match WAF rules.
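
A minimal sketch of what stripping that information could look like in Go. The helper name `minimalUserAgent`, the `product/version` shape and the sample version string are assumptions made for illustration; this is not CRI-O's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// minimalUserAgent is a hypothetical helper: it keeps only "product/version"
// and drops pre-release and build metadata (anything after the first '-' or
// '+' in the version), so commit hashes never reach the header.
func minimalUserAgent(product, version string) string {
	if i := strings.IndexAny(version, "-+"); i >= 0 {
		version = version[:i]
	}
	return product + "/" + version
}

func main() {
	// The version string below is made up for illustration.
	fmt.Println(minimalUserAgent("cri-o", "1.31.0-dev+9bfac12")) // cri-o/1.31.0
}
```

The trade-off is that the server side can no longer tell development builds apart, which seems acceptable given that nothing there parses the header.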

On the note of the WAF rules... I wish these were a bit tighter, such that they would match User-Agents more precisely, rather than just a specific word or letter combination anywhere within the entire header. However, it's faster this way and requires less maintenance over time, so it is what it is.
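
To illustrate the difference between matching a word anywhere in the header and matching a product token, here is a small Go sketch. The one-entry `scanners` list and both User-Agent strings are hypothetical, and neither function is how Cloud Armor or the OWASP ruleset is actually implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// scanners is a one-entry stand-in for a scanner blocklist (assumption).
var scanners = []string{"bfac"}

// looseMatch reports whether any scanner name appears anywhere in the
// User-Agent, ignoring case.
func looseMatch(ua string) bool {
	ua = strings.ToLower(ua)
	for _, s := range scanners {
		if strings.Contains(ua, s) {
			return true
		}
	}
	return false
}

// tightMatch reports whether any whitespace-separated token in the
// User-Agent has a product name (the part before '/') equal to a scanner name.
func tightMatch(ua string) bool {
	for _, tok := range strings.Fields(strings.ToLower(ua)) {
		name, _, _ := strings.Cut(tok, "/")
		for _, s := range scanners {
			if name == s {
				return true
			}
		}
	}
	return false
}

func main() {
	// Hypothetical development-build User-Agent whose commit hash contains "bfac".
	dev := "cri-o/1.31.0-dev commit/8bfac01d"
	fmt.Println(looseMatch(dev), tightMatch(dev)) // true false
	fmt.Println(looseMatch("BFAC/1.0"), tightMatch("BFAC/1.0")) // true true
}
```

The loose version needs no knowledge of how a User-Agent is structured, which is presumably why it is cheaper to maintain.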

So, this is it, I suppose. Unless you have some more thoughts?


BenTheElder commented on September 2, 2024 2

At least this appears to be limited to non-release-tagged versions? But that's still going to impact someone at some point.

I was thinking about this some more: I think we could actually write some pretty simple rules that reject most garbage requests at the edge purely based on path, hope that's sufficient, and drop the standard WAF rules.

WIP at kubernetes/k8s.io#6969

It will be a little bit more annoying to support additional endpoints in the future, but that seems OK
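
As a rough illustration of path-only filtering, here is a Go middleware sketch. The allowlist (`/` plus anything under `/v2/`, the prefix used by the OCI distribution API) is an assumption for the example and not the rule set from kubernetes/k8s.io#6969; the real rules would live in the Cloud Armor policy rather than in application code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// pathAllowed is a hypothetical allowlist: the root page plus the "/v2/"
// prefix. Supporting a new endpoint means editing this function.
func pathAllowed(p string) bool {
	return p == "/" || p == "/v2" || strings.HasPrefix(p, "/v2/")
}

// filterByPath rejects requests for unknown paths before they reach next,
// without looking at any request header.
func filterByPath(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !pathAllowed(r.URL.Path) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one GET through the filter and returns the response code.
func statusFor(path, userAgent string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.Header.Set("User-Agent", userAgent)
	rec := httptest.NewRecorder()
	filterByPath(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/v2/pause/manifests/3.9", "anything bfac")) // 200
	fmt.Println(statusFor("/wp-login.php", "curl/8.0"))               // 403
}
```

Because the decision ignores headers entirely, a User-Agent containing "bfac" passes as long as the path is valid.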


BenTheElder commented on September 2, 2024 1

Would you be able to verify Cloud Armor configuration, just out of curiosity and to make sure it is indeed it?

We're using standard rules, the full configuration is open source:

https://registry.k8s.io => https://github.com/kubernetes/registry.k8s.io

The community deployment configs are documented in the k8s.io repo with the rest of the community infra deployments, but primarily here:

https://github.com/kubernetes/k8s.io/tree/main/infra/gcp/terraform/k8s-infra-oci-proxy-prod is the main deployment

The armor rules are here: https://github.com/kubernetes/k8s.io/blob/main/infra/gcp/terraform/modules/oci-proxy/cloud-armor.tf


BenTheElder commented on September 2, 2024 1

I'm not sure which ruleset contains this, but we can drop most of these.

We shouldn't disable Cloud Armor entirely because we're using a custom policy for rate limiting, but most of these rule sets are probably irrelevant.

We can iterate on the staging instance (DO NOT depend on this endpoint, but for testing purposes we can iterate at registry-sandbox.k8s.io).


BenTheElder commented on September 2, 2024 1

The other complication: The main reason we've kept these WAF rules is actually to deny spammy vuln scanner noise at the edge.

We get a TON of noisy requests from automated scanning (... and pull-through caches attempting to pull anything and everything), and any request we can deny at the load balancer saves the project funds compared with letting it through to the application we use to split load for valid requests between the different cloud storage endpoints. Those are funds we can use for CI etc. instead.

So, on balance, we'll still want to block known "attack" requests with WAF, and it's much easier to use a pre-supplied ruleset than to develop and maintain our own.


dims commented on September 2, 2024 1

we on the CRI-O side will strip the extra build and release information from the User-Agent

Yes please. That's it. (Rules are going to be constantly updated no matter what to keep up with new spam/bot crap)


AkihiroSuda commented on September 2, 2024 1

/cc @AkihiroSuda

So Suda-san can take a look at User-Agent in containerd.

This was once discussed and rejected

containerd/containerd#6474


BenTheElder commented on September 2, 2024 1

This is now deployed, though I can't make promises about the behavior of any leaky backend hosts we redirect to.

We're considering handling that differently but it would be more of a long term project.

I don't think anything we currently use would block requests purely based on header substrings anymore, only invalid request paths, or excessive usage.


dims commented on September 2, 2024

@kwilczynski thanks for digging deep into this. You can see all the code we use for responding to the curl command here: https://github.com/kubernetes/registry.k8s.io/tree/main/cmd/archeio

It's a Cloud Run application running in Google infra. While we do get the client IP, we do not try to parse the User-Agent. You can see some of the code here:

registry.k8s.io/pkg/net/clientip/clientip.go

Lines 27 to 37 in 5443169

 // Get gets the client IP for an http.Request
 //
 // NOTE: currently only two scenarios are supported:
 // 1. no loadbalancer, local testing
 // 2. behind Google Cloud LoadBalancer (as in cloudrun)
 //
 // Note that in particular we do not support hitting the CloudRun endpoint
 // directly (though we could easily do so here). Cloud Armor is on the GCLB,
 // so directly accessing the CloudRun endpoint would bypass that.
 //
 // At this time we have no need to complicate it further.

Please feel free to clone the repo and peek if you spot something!
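
For readers who do not want to open the repo, the two scenarios in that comment could be handled roughly as in the Go sketch below. It is not the code in clientip.go; the function names are hypothetical, and it assumes the Google Cloud load balancer appends `<client-ip>, <load-balancer-ip>` to the `X-Forwarded-For` header.

```go
package main

import (
	"fmt"
	"net/http"
	"net/netip"
	"strings"
)

// clientIPFrom is a hypothetical sketch, not the code in clientip.go.
// With no X-Forwarded-For value (local testing) it uses the connection's
// remote address. Otherwise it assumes the load balancer appended
// "<client-ip>, <load-balancer-ip>", so the client is the second-to-last entry.
func clientIPFrom(xff, remoteAddr string) (netip.Addr, error) {
	if xff == "" {
		ap, err := netip.ParseAddrPort(remoteAddr)
		if err != nil {
			return netip.Addr{}, err
		}
		return ap.Addr(), nil
	}
	parts := strings.Split(xff, ",")
	if len(parts) < 2 {
		return netip.Addr{}, fmt.Errorf("unexpected X-Forwarded-For: %q", xff)
	}
	return netip.ParseAddr(strings.TrimSpace(parts[len(parts)-2]))
}

// clientIP applies clientIPFrom to an incoming request.
func clientIP(r *http.Request) (netip.Addr, error) {
	return clientIPFrom(r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
}

func main() {
	r, _ := http.NewRequest(http.MethodGet, "http://example.test/", nil)
	// Documentation-range addresses; the first entry is client-supplied and ignored.
	r.Header.Set("X-Forwarded-For", "198.51.100.7, 203.0.113.5, 192.0.2.1")
	ip, err := clientIP(r)
	fmt.Println(ip, err) // 203.0.113.5 <nil>
}
```

Note that nothing in this path reads the User-Agent, which is consistent with the 403 coming from somewhere in front of the application.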

Scanning the github-verse quickly, the 403 may be an attempt by some application firewall (Cloud Armor) to reject traffic from some tools they consider hostile?
https://github.com/mazen160/bfac/blob/18fb0b5dc05005d4f39c242609bbf2347ca0d421/bfac#L257-L259

(No, I have no clue what other strings may be considered in the same fashion!)


kwilczynski commented on September 2, 2024

[...]

it's a cloud run application running in google infra. While we do get the client IP, we do not try to parse User-Agent, you can see some of the code here:
[...]
Scanning the github-verse quickly, the 403 may be an attempt by some application firewall (Cloud Armor) to reject traffic from some tools they consider hostile? mazen160/bfac@18fb0b5/bfac#L257-L259

@dims, since the registry service itself is very simple, and we didn't expect it to be anything but, what blocks these requests is probably set up somewhere as part of the infrastructure that Google donates that runs and supports the registry itself.

You mentioned Cloud Armor. We were thinking that there is perhaps some sort of transparent proxy or WAF (Web Application Firewall) deployed somewhere, or even that the registry is fronted by Cloudflare or similar (which is also popular).

The IP address 34.96.108.209 we get back for registry.k8s.io, which resolves to the same IP from different networks and locations, is within Google's 34.64.0.0/10 network. As such, I bet it's a WAF/Cloud Armor setting of some sort (Cloud Armor is quite sophisticated) that looks for the string "bfac" anywhere within the User-Agent value it gets as part of the request.

whois for 34.96.108.209

NetRange:       34.64.0.0 - 34.127.255.255
CIDR:           34.64.0.0/10
NetName:        GOOGL-2
NetHandle:      NET-34-64-0-0-1
Parent:         NET34 (NET-34-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       
Organization:   Google LLC (GOOGL-2)
RegDate:        2018-09-28
Updated:        2018-09-28
Ref:            https://rdap.arin.net/registry/ip/34.64.0.0



OrgName:        Google LLC
OrgId:          GOOGL-2
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
PostalCode:     94043
Country:        US
RegDate:        2006-09-29
Updated:        2019-11-01
Comment:        *** The IP addresses under this Org-ID are in use by Google Cloud customers *** 
Comment:        
Comment:        Direct all copyright and legal complaints to 
Comment:        https://support.google.com/legal/go/report
Comment:        
Comment:        Direct all spam and abuse complaints to 
Comment:        https://support.google.com/code/go/gce_abuse_report
Comment:        
Comment:        For fastest response, use the relevant forms above.
Comment:        
Comment:        Complaints can also be sent to the GC Abuse desk 
Comment:        ([email protected]) 
Comment:        but may have longer turnaround times.
Comment:        
Comment:        Complaints sent to any other POC will be ignored.
Ref:            https://rdap.arin.net/registry/entity/GOOGL-2


OrgAbuseHandle: GCABU-ARIN
OrgAbuseName:   GC Abuse
OrgAbusePhone:  +1-650-253-0000 
OrgAbuseEmail:  [email protected]
OrgAbuseRef:    https://rdap.arin.net/registry/entity/GCABU-ARIN

OrgNOCHandle: GCABU-ARIN
OrgNOCName:   GC Abuse
OrgNOCPhone:  +1-650-253-0000 
OrgNOCEmail:  [email protected]
OrgNOCRef:    https://rdap.arin.net/registry/entity/GCABU-ARIN

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google LLC
OrgTechPhone:  +1-650-253-0000 
OrgTechEmail:  [email protected]
OrgTechRef:    https://rdap.arin.net/registry/entity/ZG39-ARIN
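
The containment claim can be double-checked without whois. The Go sketch below uses the standard `net/netip` package; the helper name `inPrefix` is made up for the example, and the address and range are the ones quoted in the whois output.

```go
package main

import (
	"fmt"
	"net/netip"
)

// inPrefix reports whether addr falls inside the CIDR prefix.
func inPrefix(addr, prefix string) (bool, error) {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return false, err
	}
	return p.Contains(a), nil
}

func main() {
	ok, err := inPrefix("34.96.108.209", "34.64.0.0/10")
	fmt.Println(ok, err) // true <nil>
}
```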

Would you be able to verify Cloud Armor configuration, just out of curiosity and to make sure it is indeed it?

Re: https://github.com/mazen160/bfac: the project has an option to randomly pick another User-Agent so that it appears to be a popular browser, etc. As such, I am not sure how much "bad traffic" simply blocking "bfac" sheds; perhaps not a lot.


BenTheElder commented on September 2, 2024

@dims I think containerd also includes git commit for pre-release builds, but that's maybe less concerning since tagged releases don't (I think??) ... we should probably take a look at how likely we are to run into this again with other common tools.

I don't love any of the options here. We could invest in custom rules, but I think it would take a lot of time and effort to maintain. At the moment this is pretty hands-off, and we're spending a lot of time on other sustainability areas.


kwilczynski commented on September 2, 2024

/cc @AkihiroSuda

So Suda-san can take a look at User-Agent in containerd.


kwilczynski commented on September 2, 2024

[...]

I don't love any of the options here. We could invest in custom rules, but I think it would take a lot of time and effort to maintain. At the moment this is pretty hands-off, and we're spending a lot of time on other sustainability areas.

@BenTheElder, yeah. Like I said, it would be a headache, indeed.

Protecting the registry, whichever way we can, takes precedence here. This goes without saying.


