
Comments (13)

kwilczynski commented on September 2, 2024

@BenTheElder, thank you for some of this. Appreciated.

The rule that is catching this specific User-Agent is most definitely this one:

  rule {
    action   = "deny(403)"
    priority = "920"
    match {
      expr {
        expression = "evaluatePreconfiguredWaf('scannerdetection-v33-stable', {'sensitivity': 1})"
      }
    }
    description = "Scanner detection"

    preview = false
  }

Per Google's own documentation:

This will lead us to the following:

This is where the OWASP folks decided to block some of the tools, including the "BFAC" project. However, some of the project names on that list do pass... so it's a bit puzzling which version of the ruleset Google is using exactly.

That said, I don't think there is anything that we could sensibly do here....

  • Removing the ruleset from WAF will invite bots, spam and scams, so not ideal
  • Trying to manually replicate the "scanner" ruleset from OWASP would be an unmaintainable headache
  • Adding a sort of an allowlist that includes most of the popular container runtimes would invite abuse eventually

Some of the projects already use different User-Agent strings, often to mimic curl or popular browsers, so nothing can be done about that either, sadly.

As such, we on the CRI-O's side will strip the extra build and release information from the User-Agent, which should limit the possibility of running into some other combination of letters, like "bfac", that would match WAF rules.
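As a rough illustration of what stripping the build and release information could look like, here is a minimal sketch. The User-Agent value used in the usage note is hypothetical, not CRI-O's actual format, and the helper name `strip_build_metadata` is made up for this example; the real change on the CRI-O side may well look different.

```python
import re

# Matches a leading dotted numeric version such as "1.31.0".
_VERSION_PREFIX = re.compile(r"\d+(?:\.\d+)*")


def strip_build_metadata(user_agent: str) -> str:
    """Reduce every "name/version" token to "name/<numeric version>".

    Anything after the numeric part (pre-release tags, commit hashes)
    is dropped. Tokens without a "/" are kept unchanged. If the version
    does not start with a digit, only the name is kept.
    """
    cleaned = []
    for token in user_agent.split():
        name, sep, version = token.partition("/")
        if not sep:
            cleaned.append(token)
            continue
        match = _VERSION_PREFIX.match(version)
        cleaned.append(f"{name}/{match.group(0)}" if match else name)
    return " ".join(cleaned)
```

With a hypothetical value such as `cri-o/1.31.0-dev+git9bfac01`, the helper returns `cri-o/1.31.0`, which no longer carries the commit-hash fragment that happened to contain "bfac".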

On the note of the WAF rules... I wish these were a bit tighter, such that they would match User-Agents more precisely, rather than just a specific word or letter combination anywhere within the entire header. However, it's faster this way and requires less maintenance over time, so it is what it is.
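To make the difference between "anywhere within the header" and a tighter match concrete, here is a small sketch comparing the two approaches. It is not how Cloud Armor or the OWASP ruleset is implemented; both function names and the User-Agent values in the usage note are assumptions for illustration only.

```python
def substring_match(user_agent: str, signatures: set[str]) -> bool:
    """Loose match: a signature anywhere in the header triggers a hit."""
    lowered = user_agent.lower()
    return any(signature in lowered for signature in signatures)


def product_token_match(user_agent: str, signatures: set[str]) -> bool:
    """Tighter match: only the product name before "/" is compared."""
    products = {token.partition("/")[0].lower() for token in user_agent.split()}
    return any(signature in products for signature in signatures)
```

For the hypothetical value `cri-o/1.31.0-dev+git9bfac01` and the signature set `{"bfac"}`, the loose match fires while the product-token match does not. A client that actually announces itself as `bfac/1.0` (again, a made-up value) is caught by both.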

So, this is it, I suppose. Unless you have some more thoughts?

from registry.k8s.io.

BenTheElder commented on September 2, 2024

At least this appears to be limited to non-release-tagged versions? But that's still going to impact someone at some point.

I was thinking about this some more. I think we could actually write some pretty simple rules that reject most garbage requests at the edge purely based on path, hope that's sufficient, and drop the standard WAF rules.

WIP at kubernetes/k8s.io#6969

It will be a little bit more annoying to support additional endpoints in the future, but that seems OK.
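The idea of rejecting requests purely by path can be sketched as a simple allowlist check. The paths below are assumptions for illustration (the root page plus anything under the `/v2/` prefix used by the OCI distribution API); the actual rules in the linked work-in-progress change may differ.

```python
# Hypothetical allowlist: the root page and the registry API prefix.
ALLOWED_EXACT = {"/", "/v2", "/v2/"}
ALLOWED_PREFIXES = ("/v2/",)


def is_allowed_path(path: str) -> bool:
    """Return True if the request path (without query string) is served."""
    return path in ALLOWED_EXACT or path.startswith(ALLOWED_PREFIXES)
```

Under these assumptions, `/v2/pause/manifests/3.9` passes while typical scanner probes such as `/wp-login.php` or `/.env` are rejected without looking at any header. The trade-off mentioned above is visible here as well: every new endpoint needs an entry in the allowlist.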


BenTheElder commented on September 2, 2024

Would you be able to verify Cloud Armor configuration, just out of curiosity and to make sure it is indeed it?

We're using standard rules, the full configuration is open source:

https://registry.k8s.io => https://github.com/kubernetes/registry.k8s.io

The community deployment configs are documented in the k8s.io repo with the rest of the community infra deployments, but primarily here:

https://github.com/kubernetes/k8s.io/tree/main/infra/gcp/terraform/k8s-infra-oci-proxy-prod is the main deployment

The armor rules are here: https://github.com/kubernetes/k8s.io/blob/main/infra/gcp/terraform/modules/oci-proxy/cloud-armor.tf


BenTheElder commented on September 2, 2024

I'm not sure which ruleset contains this, but we can drop most of these.

We shouldn't disable armor entirely because we're using a custom policy for rate limiting, but most of these rule sets are probably irrelevant.

We can iterate on the staging instance (DO NOT depend on this endpoint, but for testing purposes we can iterate at registry-sandbox.k8s.io).


BenTheElder commented on September 2, 2024

The other complication: The main reason we've kept these WAF rules is actually to deny spammy vuln scanner noise at the edge.

We get a TON of noisy requests from automated scanning (... and pull-through caches attempting to pull anything and everything), and any request we can deny at the load balancer saves the project funds compared with letting it through to the application we use to split load for valid requests between the different cloud storage endpoints. Those are funds we can use for CI etc. instead.

So, on balance, we'll still want to block known "attack" requests with WAF, and it's much easier to use a pre-supplied ruleset than to develop and maintain our own.


dims commented on September 2, 2024

we on the CRI-O's side will strip the extra build and release information from the User-Agent

Yes please. That's it. (Rules are going to be constantly updated no matter what to keep up with new spam/bot crap)


AkihiroSuda commented on September 2, 2024

/cc @AkihiroSuda

So Suda-san can take a look at User-Agent in containerd.

This was once discussed and rejected


BenTheElder commented on September 2, 2024

This is now deployed, though I can't make promises about the behavior of any leaky backend hosts we redirect to.

We're considering handling that differently but it would be more of a long term project.

I don't think anything we currently use would block requests purely based on header substrings anymore, only invalid request paths, or excessive usage.


dims commented on September 2, 2024

@kwilczynski thanks for digging deep into this. You can see all the code we use for responding to the curl command here - https://github.com/kubernetes/registry.k8s.io/tree/main/cmd/archeio

it's a cloud run application running in google infra. While we do get the client IP, we do not try to parse User-Agent, you can see some of the code here:

// Get gets the client IP for an http.Request
//
// NOTE: currently only two scenarios are supported:
// 1. no loadbalancer, local testing
// 2. behind Google Cloud LoadBalancer (as in cloudrun)
//
// Note that in particular we do not support hitting the CloudRun endpoint
// directly (though we could easily do so here). Cloud Armor is on the GCLB,
// so directly accessing the CloudRun endpoint would bypass that.
//
// At this time we have no need to complicate it further.

please feel free to clone the repo and peek if you spot something!

Scanning the github-verse quickly, the 403 may be an attempt by some application firewall (Cloud Armor) to reject traffic from some tools they consider hostile?
https://github.com/mazen160/bfac/blob/18fb0b5dc05005d4f39c242609bbf2347ca0d421/bfac#L257-L259

(No, i have no clue what other strings may be considered in the same fashion!)


kwilczynski commented on September 2, 2024

[...]

it's a cloud run application running in google infra. While we do get the client IP, we do not try to parse User-Agent, you can see some of the code here:
[...]
Scanning the github-verse quickly, the 403 may be an attempt by some application firewall (Cloud Armor) to reject traffic from some tools they consider hostile? mazen160/bfac@18fb0b5/bfac#L257-L259

@dims, since the registry service itself is very simple, and we didn't expect it to be anything but, what blocks these requests is probably set up somewhere as part of the infrastructure that Google donates that runs and supports the registry itself.

You mentioned Cloud Armor—we were thinking that there perhaps is some sort of a transparent proxy or WAF (Web Application Firewall) deployed somewhere or even that the registry is perhaps fronted by Cloudflare or such (which is also popular).

The IP address 34.96.108.209 that we get back for registry.k8s.io, which resolves to the same IP from different networks/locations, is within Google's 34.64.0.0/10 network. As such, I bet it's a WAF/Cloud Armor setting of sorts (Cloud Armor is quite sophisticated) that looks for the string "bfac" anywhere within the User-Agent value it gets as part of the request.

whois for 34.96.108.209
NetRange:       34.64.0.0 - 34.127.255.255
CIDR:           34.64.0.0/10
NetName:        GOOGL-2
NetHandle:      NET-34-64-0-0-1
Parent:         NET34 (NET-34-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       
Organization:   Google LLC (GOOGL-2)
RegDate:        2018-09-28
Updated:        2018-09-28
Ref:            https://rdap.arin.net/registry/ip/34.64.0.0



OrgName:        Google LLC
OrgId:          GOOGL-2
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
PostalCode:     94043
Country:        US
RegDate:        2006-09-29
Updated:        2019-11-01
Comment:        *** The IP addresses under this Org-ID are in use by Google Cloud customers *** 
Comment:        
Comment:        Direct all copyright and legal complaints to 
Comment:        https://support.google.com/legal/go/report
Comment:        
Comment:        Direct all spam and abuse complaints to 
Comment:        https://support.google.com/code/go/gce_abuse_report
Comment:        
Comment:        For fastest response, use the relevant forms above.
Comment:        
Comment:        Complaints can also be sent to the GC Abuse desk 
Comment:        ([email protected]) 
Comment:        but may have longer turnaround times.
Comment:        
Comment:        Complaints sent to any other POC will be ignored.
Ref:            https://rdap.arin.net/registry/entity/GOOGL-2


OrgAbuseHandle: GCABU-ARIN
OrgAbuseName:   GC Abuse
OrgAbusePhone:  +1-650-253-0000 
OrgAbuseEmail:  [email protected]
OrgAbuseRef:    https://rdap.arin.net/registry/entity/GCABU-ARIN

OrgNOCHandle: GCABU-ARIN
OrgNOCName:   GC Abuse
OrgNOCPhone:  +1-650-253-0000 
OrgNOCEmail:  [email protected]
OrgNOCRef:    https://rdap.arin.net/registry/entity/GCABU-ARIN

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google LLC
OrgTechPhone:  +1-650-253-0000 
OrgTechEmail:  [email protected]
OrgTechRef:    https://rdap.arin.net/registry/entity/ZG39-ARIN
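The network membership claimed above can be double-checked without whois. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `in_network` is made up for this example.

```python
import ipaddress


def in_network(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
```

Calling `in_network("34.96.108.209", "34.64.0.0/10")` confirms that the address sits inside the 34.64.0.0 - 34.127.255.255 range shown in the whois output. This only shows that the address belongs to Google's allocation; it says nothing about which product is answering on it.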

Would you be able to verify Cloud Armor configuration, just out of curiosity and to make sure it is indeed it?

Re: https://github.com/mazen160/bfac: the project has an option to randomly pick another User-Agent so that it appears to be a popular browser, etc. As such, I am not sure how much "bad traffic" simply blocking "bfac" sheds; perhaps not a lot.


BenTheElder commented on September 2, 2024

@dims I think containerd also includes git commit for pre-release builds, but that's maybe less concerning since tagged releases don't (I think??) ... we should probably take a look at how likely we are to run into this again with other common tools.

I don't love any of the options here. We could invest in custom rules but I think it would take a lot of time and effort to maintain, at the moment this is pretty hands-off and we're spending a lot of time on other sustainability areas.


kwilczynski commented on September 2, 2024

/cc @AkihiroSuda

So Suda-san can take a look at User-Agent in containerd.


kwilczynski commented on September 2, 2024

[...]

I don't love any of the options here. We could invest in custom rules but I think it would take a lot of time and effort to maintain, at the moment this is pretty hands-off and we're spending a lot of time on other sustainability areas.

@BenTheElder, yeah. Like I said, it would indeed be a headache.

Protecting the registry, whichever way we can, takes precedence here. This goes without saying.

