
Comments (5)

konklone avatar konklone commented on September 15, 2024

This is intended behavior, so perhaps we should make the README more clear. The README right now says:

A domain has "valid HTTPS" if it responds on port 443 at its canonical hostname with an unexpired valid certificate for the hostname.

The canonical hostname only refers to the canonical endpoint on the original domain. So if you're scanning amazonaws.com, the canonical hostname can only be amazonaws.com or www.amazonaws.com.

One of the goals of measuring Valid HTTPS, for an SLD or any of its subdomains, is to identify "is there anything preventing preloading?" As it stands, if amazonaws.com were preloaded, the redirect to aws.amazon.com would break and not function. So, I believe Valid HTTPS should be false.
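To make the restriction concrete, here is a minimal sketch of the idea that only the root and www hostnames on the scanned domain are eligible to be canonical. The helper names `candidate_hostnames` and `is_internal_redirect` are hypothetical and are not taken from pshtt's code.

```python
from urllib.parse import urlsplit


def candidate_hostnames(domain):
    """Return the only two hostnames eligible to be canonical for a domain."""
    domain = domain.lower().rstrip(".")
    return (domain, "www." + domain)


def is_internal_redirect(domain, location):
    """True if a redirect target stays on the root or www hostname of the domain."""
    host = urlsplit(location).hostname
    return host is not None and host in candidate_hostnames(domain)


# The redirect from amazonaws.com to aws.amazon.com leaves the scanned domain,
# so it cannot supply the canonical hostname.
print(candidate_hostnames("amazonaws.com"))
print(is_internal_redirect("amazonaws.com", "https://aws.amazon.com/"))
```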

from pshtt.

jsha avatar jsha commented on September 15, 2024

Ah, yes, I was confused by:

Canonical URL - A judgment call based on the observed redirect logic of the domain.

But now I see the difference between "URL" vs "hostname." Maybe just delete "at its canonical hostname" from the "Valid HTTPS" definition?


konklone avatar konklone commented on September 15, 2024

It's the other way around - Canonical URL is a judgment call based on the observed internal redirect logic of the domain.

For example.com, the Canonical URL can have one of four values:

  • http://example.com
  • http://www.example.com
  • https://example.com
  • https://www.example.com

It can get slightly confusing when talking about canonical endpoints, because there are actually two calculations: which is the canonical hostname (www or the root), and what is the canonical protocol (http: or https:)? The "Canonical URL" is effectively the union of those two calculations.

Sometimes only one of those calculations is relevant for a given field. For example, whether a domain has "Valid HTTPS" means looking at the validity (e.g. open on port 443, cert is valid for the hostname) of either https://example.com or https://www.example.com. So the code identifies the canonical hostname and then examines HTTPS at that hostname.

So it could be the case that "Valid HTTPS" is based on a scan of https://www.example.com, but that http://www.example.com is the "Canonical URL".
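A rough way to picture the two calculations is as two independent fields, with the "Canonical URL" built from both and the "Valid HTTPS" endpoint built from the hostname alone. The `Canonical` class below is a hypothetical illustration, not pshtt's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    hostname: str  # "example.com" or "www.example.com"
    protocol: str  # "http" or "https"

    @property
    def url(self):
        """The Canonical URL combines both calculations."""
        return f"{self.protocol}://{self.hostname}"

    @property
    def https_endpoint(self):
        """Valid HTTPS is judged here, whatever the canonical protocol is."""
        return f"https://{self.hostname}"


canonical = Canonical(hostname="www.example.com", protocol="http")
print(canonical.url)
print(canonical.https_endpoint)
```

In this sketch the Canonical URL uses HTTP while the endpoint examined for "Valid HTTPS" is still the HTTPS endpoint on the same hostname.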

Is this making sense? It's definitely more complicated than I was originally imagining when I first wrote some of this logic, but it is designed for extreme resilience to arguments made by people undergoing compliance audits informed by this tool's criteria.


jsha avatar jsha commented on September 15, 2024

Looking at it again, I realize I was misunderstanding "Canonical URL." Intuitively, I think the canonical URL for amazonaws.com is https://aws.amazon.com. But since the Canonical URL can only encompass www/non-www variants, that's not the Canonical URL; instead, it's the domain itself. Maybe this:

Canonical URL - A judgment call based on the observed redirect logic of the domain.
Valid HTTPS - A domain has "valid HTTPS" if it responds on port 443 at its canonical hostname with an unexpired valid certificate for the hostname.

Should say:

Canonical URL - One of the four endpoints described above; a judgment call based on the observed redirect logic of the domain.
Valid HTTPS - A domain has "valid HTTPS" if it responds on port 443 at the hostname in its Canonical URL with an unexpired valid certificate for the hostname. This can be true even if the Canonical URL uses HTTP.
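As an illustration of what the proposed "Valid HTTPS" wording asks for, the sketch below uses only Python's standard library to test whether a hostname answers on port 443 with a certificate that is unexpired, trusted, and valid for that hostname. This is an assumption about how such a check could look, not a description of how pshtt performs it; the function name `has_valid_https` and its parameters are hypothetical.

```python
import socket
import ssl


def has_valid_https(hostname, port=443, timeout=5.0):
    """Return True if the TLS handshake succeeds with full verification.

    The default context checks the certificate chain, the expiry dates,
    and that the certificate matches the hostname.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and
        # ssl.SSLError (including certificate verification errors).
        return False
```

Under the proposed definition, the hostname passed in would be the one from the Canonical URL (for example www.example.com), even when the Canonical URL itself uses HTTP.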


konklone avatar konklone commented on September 15, 2024

Those are good edits!

And I can understand why the definition might not feel intuitive. The best way I can frame the rationale is -

  • Valid HTTPS - "If I preloaded the domain, would this break?"
  • Canonical URL - "If I were to hyperlink example.com, what should the href attribute of the link be?"

