

Pushing HTTPS 🔒


pshtt ("pushed") is a tool to scan domains for HTTPS best practices. It saves its results to a CSV (or JSON) file.

pshtt was developed to push organizations (especially large ones like the US Federal Government 🇺🇸) to adopt HTTPS across the enterprise. Federal agencies must comply with M-15-13, a 2015 memorandum from the White House Office of Management and Budget, and BOD 18-01, a 2017 directive from the Department of Homeland Security, both of which require federal agencies to enforce HTTPS on their public web services. Much has been done, but there's more yet to do.

pshtt is a collaboration between the Cybersecurity and Infrastructure Security Agency's National Cybersecurity Assessments and Technical Services (NCATS) team and the General Services Administration's 18F team, with contributions from NASA, Lawrence Livermore National Laboratory, and various non-governmental organizations.

Getting started

pshtt can be installed as a module, or run directly from the repository.

Installed as a module

pshtt can be installed directly via pip:

pip install pshtt

It can then be run directly:

pshtt example.com [options]

Running directly

To run the tool locally from the repository, without installing, first install the requirements:

pip install -r requirements.txt

Then run it as a module via python -m:

python -m pshtt.cli example.com [options]

Usage and examples

pshtt [options] DOMAIN...
pshtt [options] INPUT

pshtt dhs.gov
pshtt --output=homeland.csv --debug dhs.gov us-cert.gov usss.gov
pshtt --sorted current-federal.csv

Note: if INPUT ends with .csv, domains will be read from the first column of the CSV. CSV output will always be written to disk (unless --json or --markdown is specified), defaulting to results.csv.
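The first-column convention can be sketched with the standard csv module. This is a simplified illustration only; pshtt's actual loader may handle headers and comments differently, and `load_domains` is a hypothetical name:

```python
import csv
from pathlib import Path

def load_domains(input_path):
    """Read domains from the first column of a CSV, or one per line otherwise.

    Illustrative sketch, not pshtt's real loader.
    """
    path = Path(input_path)
    if path.suffix == ".csv":
        with path.open(newline="") as f:
            # Keep only non-empty first-column values.
            return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
    with path.open() as f:
        return [line.strip() for line in f if line.strip()]
```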

Options

  -h --help                     Show this message.
  -s --sorted                   Sort output by domain, A-Z.
  -o --output=OUTFILE           Name output file. (Defaults to "results".)
  -j --json                     Get results in JSON. (Defaults to CSV.)
  -m --markdown                 Get results in Markdown. (Defaults to CSV.)
  -d --debug                    Print debug output.
  -u --user-agent=AGENT         Override user agent.
  -t --timeout=TIMEOUT          Override timeout (in seconds).
  -c --cache-third-parties=DIR  Cache third-party data in the specified directory.
  -f --ca-file=PATH             Specify a custom CA bundle (PEM format).

Using your own CA bundle

By default, pshtt relies on the root CAs that are trusted in the Mozilla root store. If you work behind a corporate proxy or have your own certificates that aren't publicly trusted, you can specify your own CA bundle:

pshtt --ca-file=/etc/ssl/ca.pem server.internal-location.gov
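Under the hood, a custom bundle amounts to building a TLS context from that file instead of the default trust store. A minimal sketch with the standard library (`make_tls_context` is a hypothetical helper, not pshtt's actual implementation):

```python
import ssl

def make_tls_context(ca_file=None):
    """Build an SSL context trusting either the default store or a custom
    PEM bundle. Hypothetical helper for illustration only.
    """
    if ca_file:
        # Trust only the CAs in the supplied PEM bundle.
        return ssl.create_default_context(cafile=ca_file)
    # Fall back to the platform's default trust store.
    return ssl.create_default_context()
```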

What's checked?

A domain is checked on its four endpoints:

  • http://
  • http://www
  • https://
  • https://www

Domain and redirect info

The following values are returned in results.csv:

  • Domain - The domain you're scanning!
  • Base Domain - The base domain of Domain. For example, for a Domain of sub.example.com, the Base Domain will be example.com. Usually this is the second-level domain, but pshtt will download and factor in the Public Suffix List when calculating the base domain. (To cache the Public Suffix List, use --cache-third-parties as documented above.)
  • Canonical URL - One of the four endpoints described above; a judgment call based on the observed redirect logic of the domain.
  • Live - The domain is "live" if any endpoint is live.
  • HTTPS Live - The domain is "HTTPS live" if any HTTPS endpoint is live.
  • HTTPS Full Connection - The domain is "fully connected" if any HTTPS endpoint is fully connected. A "fully connected" HTTPS endpoint is one with which pshtt could make a full TLS connection.
  • HTTPS Client Auth Required - A domain requires client authentication if any HTTPS endpoint requires it for a full TLS connection.
  • Redirect - The domain is a "redirect domain" if at least one endpoint is a redirect, and all endpoints are either redirects or down.
  • Redirect to - If a domain is a "redirect domain", where does it redirect to?
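The base-domain calculation described above amounts to a longest-suffix match against the Public Suffix List. This toy sketch hardcodes a few suffixes purely to show the algorithm; the real tool downloads and uses the full list:

```python
# Hypothetical, simplified base-domain calculation. The tiny suffix set
# below stands in for the full Public Suffix List.
PUBLIC_SUFFIXES = {"com", "gov", "co.uk"}

def base_domain(hostname):
    """Return the registrable (base) domain: the longest public suffix
    plus one more label to its left."""
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES and i > 0:
            return ".".join(labels[i - 1:])
    return hostname
```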

Landing on HTTPS

  • Valid HTTPS - A domain has "valid HTTPS" if it responds on port 443 at the hostname in its Canonical URL with an unexpired valid certificate for the hostname. This can be true even if the Canonical URL uses HTTP.
  • HTTPS Publicly Trusted - A domain is "publicly trusted" if its canonical endpoint has a publicly trusted certificate.
  • HTTPS Custom Truststore Trusted - A domain is "custom truststore trusted" if its canonical endpoint has a certificate that is trusted by the custom truststore.
  • Defaults to HTTPS - A domain "defaults to HTTPS" if its canonical endpoint uses HTTPS.
  • Downgrades HTTPS - A domain "downgrades HTTPS" if HTTPS is supported in some way, but its canonical HTTPS endpoint immediately redirects internally to HTTP.
  • Strictly Forces HTTPS - This is different than whether a domain "defaults" to HTTPS. A domain "Strictly Forces HTTPS" if one of the HTTPS endpoints is "live", and if both HTTP endpoints are either down or redirect immediately to any HTTPS URI. An HTTP redirect can go to HTTPS on another domain, as long as it's immediate. (A domain with an invalid cert can still be enforcing HTTPS.)
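The "Strictly Forces HTTPS" rule can be expressed directly as a predicate over the four endpoints. A sketch using illustrative field names, not pshtt's actual data model:

```python
def strictly_forces_https(http, httpwww, https, httpswww):
    """Sketch of the 'Strictly Forces HTTPS' logic described above.

    Each argument is a hypothetical dict like:
      {"live": bool, "redirect_immediately_to_https": bool}
    """
    any_https_live = https["live"] or httpswww["live"]

    def ok(endpoint):
        # An HTTP endpoint passes if it is down, or redirects
        # immediately to any HTTPS URI (even on another domain).
        return (not endpoint["live"]) or endpoint["redirect_immediately_to_https"]

    return any_https_live and ok(http) and ok(httpwww)
```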

Common errors

  • HTTPS Bad Chain - A domain has a bad chain if either HTTPS endpoint contains a bad chain.
  • HTTPS Bad Hostname - A domain has a bad hostname if either HTTPS endpoint fails hostname validation.
  • HTTPS Expired Cert - A domain has an expired certificate if either HTTPS endpoint has an expired certificate.
  • HTTPS Self-Signed Cert - A domain has a self-signed certificate if either HTTPS endpoint has a self-signed certificate.
  • HTTPS Probably Missing Intermediate Cert - A domain is "probably missing intermediate certificate" if the canonical HTTPS endpoint is probably missing an intermediate certificate.

HSTS

  • HSTS - A domain has HTTP Strict Transport Security enabled if its canonical HTTPS endpoint has HSTS enabled.
  • HSTS Header - This field provides a domain's HSTS header at its canonical endpoint.
  • HSTS Max Age - A domain's HSTS max-age is its canonical endpoint's max-age.
  • HSTS Entire Domain - A domain has HSTS enabled for the entire domain if its root HTTPS endpoint (not the canonical HTTPS endpoint) has HSTS enabled and uses the HSTS includeSubDomains flag.
  • HSTS Preload Ready - A domain is HSTS "preload ready" if its root HTTPS endpoint (not the canonical HTTPS endpoint) has HSTS enabled, has a max-age of at least 18 weeks, and uses the includeSubDomains and preload flags.
  • HSTS Preload Pending - A domain is "preload pending" when it appears in the Chrome preload pending list with the include_subdomains flag equal to true.
  • HSTS Preloaded - A domain is HSTS preloaded if its domain name appears in the Chrome preload list with the include_subdomains flag equal to true, regardless of what header is present on any endpoint. The intent of pshtt is to make sure that the user is fully protected, so it only counts domains as HSTS preloaded if they are fully HSTS preloaded (meaning that all subdomains are included as well).
  • Base Domain HSTS Preloaded - A domain's base domain is HSTS preloaded if its base domain appears in the Chrome preload list with the include_subdomains flag equal to true. This is subtly different from HSTS Entire Domain, which inspects headers on the base domain to see if HSTS is set correctly to encompass the entire zone.
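The HSTS fields above all derive from parsing the header's directives. A simplified parser sketch (real-world headers can be messier, with quoting and odd casing, and this is not pshtt's actual parser):

```python
def parse_hsts(header):
    """Parse an HSTS header into (max_age, include_subdomains, preload).

    Simplified sketch for illustration.
    """
    max_age = None
    include_subdomains = False
    preload = False
    for directive in header.split(";"):
        directive = directive.strip().lower()
        if directive.startswith("max-age="):
            try:
                max_age = int(directive.split("=", 1)[1].strip('"'))
            except ValueError:
                pass  # malformed max-age value; leave as None
        elif directive == "includesubdomains":
            include_subdomains = True
        elif directive == "preload":
            preload = True
    return max_age, include_subdomains, preload
```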

Scoring

These three fields use the previous results to come to high-level conclusions about a domain's behavior.

  • Domain Supports HTTPS - A domain "Supports HTTPS" when it doesn't downgrade and has valid HTTPS, or when it doesn't downgrade and has a bad chain but not a bad hostname (a bad hostname makes it clear the domain isn't actively attempting to support HTTPS, whereas an incomplete chain is just a mistake). Domains with a bad chain "support" HTTPS, but user-side errors can be expected.
  • Domain Enforces HTTPS - A domain "Enforces HTTPS" if it "Supports HTTPS" and defaults to HTTPS. Websites (where Redirect is false) are allowed to eventually redirect to an https:// URI. "Redirect domains" (where Redirect is true) must immediately redirect clients to an https:// URI (even one on another domain) to be said to enforce HTTPS.
  • Domain Uses Strong HSTS - A domain "Uses Strong HSTS" when its max-age is ≥ 31536000 (one year, in seconds).
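Taken together, the scoring rules reduce to a few boolean combinations. A compressed sketch over illustrative dict keys; the redirect-domain nuance of "Enforces HTTPS" is omitted for brevity, and this is not pshtt's actual scoring code:

```python
def score(domain):
    """Sketch of the three scoring fields, using illustrative keys."""
    # Supports HTTPS: no downgrade, and either valid HTTPS or only a
    # bad chain (not a bad hostname).
    supports = (not domain["downgrades"]) and (
        domain["valid_https"]
        or (domain["bad_chain"] and not domain["bad_hostname"])
    )
    # Enforces HTTPS: supports it and defaults to it.
    enforces = supports and domain["defaults_https"]
    # Strong HSTS: enabled with max-age of at least one year.
    strong_hsts = domain["hsts"] and (domain["hsts_max_age"] or 0) >= 31536000
    return supports, enforces, strong_hsts
```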

General information

  • IP - The IP for the domain.
  • Server Header - The server header from the response for the domain.
  • Server Version - The server version, as extracted from the server header.
  • HTTPS Cert Chain Length - The certificate chain length for the canonical HTTPS endpoint.
  • Notes - A field where free-form notes about the domain can be stored.

Uncommon errors

  • Unknown Error - A Boolean value indicating whether or not an unexpected exception was encountered when testing the domain. The purpose of this field is to flag any odd websites for further debugging.

Troubleshooting

DNS blackhole / DNS assist

One issue that can occur when running pshtt, particularly on home/residential networks with standard ISPs, is the use of "DNS Assist" features, a.k.a. "DNS blackholes".

In these environments, you may see inconsistent results from pshtt because your ISP, detecting a request for an unknown site with no DNS record, redirects you to a search page for that site. This means that an endpoint which should resolve as "not live" will instead resolve as "live", owing to the detection of the live search result page.

If you would like to disable this "feature", several ISPs offer the ability to opt out of this service and maintain their own instructions for doing so.
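One way to detect such an environment before trusting scan results is to resolve a random name that almost certainly does not exist; if it resolves anyway, the resolver is likely a blackhole. A hypothetical helper (not part of pshtt) with the resolver injected so the check is testable offline:

```python
import random
import string

def dns_assist_active(resolve, tld="com"):
    """Heuristically detect ISP 'DNS Assist' by resolving a random,
    almost certainly nonexistent name.

    `resolve` is an injected callable (e.g. a wrapper around
    socket.gethostbyname). Hypothetical helper for illustration.
    """
    label = "".join(random.choices(string.ascii_lowercase, k=24))
    try:
        resolve(f"{label}.{tld}")
    except OSError:
        return False  # NXDOMAIN behaved normally
    return True  # a bogus name resolved: likely a DNS blackhole
```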

Who uses pshtt?

Acknowledgements

This code was modeled after Ben Balter's site-inspector, with significant guidance from Eric Mill.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is in the worldwide public domain.

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

pshtt's People

Contributors

afeld, amirian28, arcsector, bengardiner, bengardiner-at-irdeto, cablej, dav3r, dependabot[bot], echudow, egyptiankarim, ericlaw1979, felddy, garrettr, h-m-f-t, hillaryj, ianlee1521, jasonodoom, jmorrowomni, jsf9k, jsha, klauern, konklone, kyleevers, mcdonnnj, saptaks, siccovansas, teancom, ultiferrago


pshtt's Issues

Being nicer to hstspreload.org, chromium.googlesource.com, and publicsuffix.org

The pshtt tool now hits 3 external endpoints per scan:

  1. hstspreload.org for the Preload Pending status
  2. chromium.googlesource.com for the Preloaded status
  3. publicsuffix.org to incorporate Public Suffix checking into calculation of the "parent domain"

I've gotten reports from @lgarron that hstspreload.org is getting hit really hard by what are apparently large batches of pshtt runs. It's possible they're coming via domain-scan, and possible that they're not. And though I thought we did, we actually have no caching facility whatsoever for hstspreload.org. (Though if pshtt's internal batching is being used, it will only hit hstspreload.org once.)

So I think we definitely need to add caching for the hstspreload.org hit.

I also wonder if it's best to incentivize downstream users of pshtt who are doing their own batching to take advantage of the caching features we have built in for (2) and (3), by requiring locations of cached files to be specified for those features to be enabled.

But that would punish folks who are using pshtt's internal batching by adding additional complexity, since internal batching just holds the contents of the 3 requests in-memory the whole time as each domain is being processed.

Any ideas on ways we could help external batchers do the right thing, without making built-in batchers' lives harder?

Output a row to the CSV after each scan

And catch any crashes with a generic exception handler just to close the CSV file, to ensure that any partial scan results still get written out in the case of a crash.
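A sketch of that behavior: write the header up front, flush after every row, and close the file in a finally block so partial results survive a crash. `scan` is a hypothetical per-domain callable, and the columns are illustrative:

```python
import csv

def scan_all(domains, scan, out_file):
    """Scan each domain and persist its CSV row immediately.

    `scan` returns one row per domain; `out_file` is an open, writable
    text file. Illustrative sketch, not pshtt's actual writer.
    """
    writer = csv.writer(out_file)
    writer.writerow(["Domain", "Live"])
    try:
        for domain in domains:
            writer.writerow(scan(domain))
            out_file.flush()  # each row hits disk before the next scan starts
    finally:
        out_file.close()  # even on a crash, rows written so far are kept
```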

HSTS subdomain detection happening at the root should be more explicit

The Guardian just deployed HSTS, and wanted to know why their score on Secure the News hadn't been updated yet. After a bit of digging, we realized the problem appears to be caused by a bug in pshtt.

Here's the relevant output from pshtt --json theguardian.com:

[
  {
    "Base Domain": "theguardian.com", 
    "Canonical URL": "https://www.theguardian.com", 
    "Defaults to HTTPS": true, 
    "Domain": "theguardian.com", 
    "Downgrades HTTPS": false, 
    "HSTS": true, 
    "HSTS Entire Domain": null, 
    "HSTS Header": "max-age=31536000; includeSubDomains; preload", 
    "HSTS Max Age": 31536000, 
    "HSTS Preload Ready": null, 
    "HSTS Preloaded": false,
  }
]

As you can see, pshtt correctly parsed the max-age from the HSTS header, but failed to correctly parse includeSubDomains or preload, leading to incorrect results for "HSTS Entire Domain" and "HSTS Preload Ready".

'str' object has no attribute 'custom_ca_file'

Traceback (most recent call last):
  File "/usr/bin/pshtt", line 9, in <module>
    load_entry_point('pshtt==0.1.5', 'console_scripts', 'pshtt')()
  File "/usr/lib/python2.7/site-packages/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "/usr/lib/python2.7/site-packages/pshtt/pshtt.py", line 873, in inspect_domains
    preload_list = create_preload_list()
  File "/usr/lib/python2.7/site-packages/pshtt/pshtt.py", line 800, in create_preload_list
    request = requests.get(file_url)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 2] No such file or directory
[root@secengfedora25 Desktop]# pshtt www.state.gov
Traceback (most recent call last):
  File "/usr/bin/pshtt", line 9, in <module>
    load_entry_point('pshtt==0.1.5', 'console_scripts', 'pshtt')()
  File "/usr/lib/python2.7/site-packages/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "/usr/lib/python2.7/site-packages/pshtt/pshtt.py", line 873, in inspect_domains
    preload_list = create_preload_list()
  File "/usr/lib/python2.7/site-packages/pshtt/pshtt.py", line 800, in create_preload_list
    request = requests.get(file_url)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 2] No such file or directory

thoughts?
v/r
Q

Change License to MIT?

Would it be possible to change the license for external (non-Federal Employee) contributions to an MIT (or other open source license)?

I've recently encountered an issue where @LLNL isn't able to allow me to contribute as part of my work due to the CC0 provisions around "waiving copyright", which it can't / won't do.

MIT is the license used in, at least, https://github.com/deptofdefense/anet which the lab is more amenable to contributing to.

"Downgrades HTTPS" result inconsistent for economist.com

Secure the News' score for economist.com just changed from a D to a C, indicating that HTTPS is available (that is, the site does not downgrade HTTPS to HTTP). Unfortunately, a quick visit to https://www.economist.com shows that the site is actually still downgrading HTTPS.

Even stranger, it appears that pshtt's output is inconsistent in this regard. Here's the output of three calls to pshtt run in short succession (within a 10-second span or so):

$ pshtt --json economist.com | grep Downgrades
Unexpected other requests exception.
Unexpected SSL protocol (or other) error during retry.
    "Downgrades HTTPS": true, 
$ pshtt --json economist.com | grep Downgrades
Unexpected SSL protocol (or other) error during retry.
    "Downgrades HTTPS": true, 
$ pshtt --json economist.com | grep Downgrades
Unexpected SSL protocol (or other) error during retry.
    "Downgrades HTTPS": false, 

I ran curl a couple of times in quick succession and it appears they might be doing something strange on their server. I mostly got 302 redirects to http, but I got a 200 as well:

$ curl -I https://www.economist.com
HTTP/2.0 302
age:4723
cache-control:max-age = 10800
content-length:0
content-type:text/html; charset=utf-8
date:Fri, 10 Feb 2017 18:48:38 GMT
expires:Sun, 11 Mar 1984 12:00:00 GMT
grace:none
last-modified:Fri, 10 Feb 2017 17:29:55 GMT
location:http://www.economist.com/
server:Economist Web Server
set-cookie:ec_device=false; expires=Sat, 11-Feb-2017 17:29:55 GMT; Max-Age=86400; path=/
set-cookie:rvjourney=a/30.00/a;Domain=.economist.com;Path=/
set-cookie:rvuuid=744b0bd84e575276324fa01093461e0b;Domain=.economist.com;Path=/;Max-Age=2147483647
vary:Cookie
x-cache-hits:29
x-varnish-cache:HIT
set-cookie:visid_incap_121505=HXAyBFaLQWKFm/nqsoBqAgYLnlgAAAAAQUIPAAAAAADwlOfSlUu0EI8GgwL+4Upy; expires=Sat, 10 Feb 2018 10:25:11 GMT; path=/; Domain=.economist.com
set-cookie:nlbi_121505=3yScb414szOLXcfyE5bw2QAAAADAhkTqK1/KRpaaZa4daL8M; path=/; Domain=.economist.com
set-cookie:incap_ses_569_121505=1qhbLsp9ryhozCqHk37lBwYLnlgAAAAAKqVXtKjW07UOcH/BjLdXAw==; path=/; Domain=.economist.com
x-iinfo:10-7083031-7083032 NNNN CT(70 71 0) RT(1486752518432 0) q(0 0 1 -1) r(2 2) U5
x-cdn:Incapsula

$ curl -I https://www.economist.com
HTTP/2.0 200
accept-ranges:bytes
cache-control:max-age = 60
content-encoding:gzip
content-type:text/html; charset=utf-8
date:Fri, 10 Feb 2017 18:48:40 GMT
grace:none
server:Economist Web Server
set-cookie:rvjourney=b/30.00/b;Domain=.economist.com;Path=/
set-cookie:rvuuid=532470bb8ad6db7abc2a970f239b658c;Domain=.economist.com;Path=/;Max-Age=2147483647
vary:accept-encoding
x-cache-hits:30
x-varnish-cache:HIT
set-cookie:visid_incap_121505=TiXLmJnHTUqM19wvN4y1iQgLnlgAAAAAQUIPAAAAAAC/iMw5/f8vr6M44vbQ55YR; expires=Sat, 10 Feb 2018 10:25:11 GMT; path=/; Domain=.economist.com
set-cookie:nlbi_121505=n9eibqePrSd2yIr5E5bw2QAAAACCRx6DVVu8jtBnp1CZbxla; path=/; Domain=.economist.com
set-cookie:incap_ses_569_121505=ME2eQncy5G5lziqHk37lBwgLnlgAAAAAmcBvinRCjhfzcwH/XTQzAA==; path=/; Domain=.economist.com
x-iinfo:3-4725038-4725039 NNNN CT(70 70 0) RT(1486752520224 0) q(0 0 1 -1) r(3 3) U5
x-cdn:Incapsula

I'm not sure if there's anything pshtt can do about this: pshtt's results are inconsistent because the Economist's server responses are inconsistent. Maybe they're A/B testing their HTTPS rollout? 😝

I decided to file the issue anyway because the various Unexpected errors might deserve a closer look as well.

Allow the tool to be used in either a Python 2 or Python 3 environment

This means replacing the use of sslyze with something that can work in Python 3, or separating the sslyze component out in some way (by having the user rely on Docker or pyenv or something).

For what the tool currently does (analysis of HTTP behavior and HTTPS failure behavior), I think sslyze is wholly replaceable. sslyze, or something like it, would be more necessary for analyzing other TLS qualities, like use of ciphers or protocol versions.

Replacing sslyze would mean digging into requests' API more and analyzing certificate validation errors. This would also have the benefit of speeding up use of the tool in development considerably, because these would be included in the transparent requests caching added in #3.

However, we do want to still allow the tool to be used in a Python 2 environment, if at all possible. I believe that's doable.

Potential bug on canonical URL detection

We heard from NASA that in the situation where the HTTP endpoints are off, but the HTTPS endpoints have cert errors, the "canonical URL" is the HTTP version, which is confusing. I haven't verified yet.

cc @egyptiankarim for details or an example hostname

"Downgrades HTTPS" is null for abcnews.go.com

When you access https://abcnews.go.com in a browser, it downgrades the connection to HTTP; however, it does not use an HTTP redirect:

$ curl -I https://abcnews.go.com
HTTP/1.1 200 OK
[..snip..]

Instead, it uses Javascript to redirect:

<script>
        if (window.location.protocol == "https:" && window.parent.location.hostname.indexOf("outbrain") == -1) {
                var _sslurl = window.location.href.replace("https://", "http://");
                window.location.replace(_sslurl);
                window.location.href = _sslurl;
        }
</script>

pshtt's results are confusing. pshtt --json abcnews.go.com returns "Valid HTTPS": true but also returns "Downgrades HTTPS": null. I would expect that if "Valid HTTPS" is true, then "Downgrades HTTPS" should be one of true or false, but not null. null only makes sense if the site doesn't have valid HTTPS, because then the question of whether or not it downgrades it is moot.

What do y'all think? Is this a bug?

Note that I am not saying that pshtt should (necessarily) detect downgrades done with Javascript. While it would be possible (and possibly cool) to implement that, you would need to implement browser automation to do it right, which is a significant expansion of the crawler's current design. However, I do think there seems to be a bug in this case, where the result should be true or false (not null).

Rename pshtt_cli to pshtt

The tool should just use its own name as its CLI executable. This shouldn't interfere with the main file being pshtt.py (though that should probably be moved, along with models.py, into a lib/ directory).

Possible (long standing) bugs in logic to review

Some issues we've seen in Pulse might merit logical changes or tweaks.

So this issue is looking into specific identified edge cases, validating whether they are still a problem, and if so, fixing them.

Error with internal network, self-signed CA

Does this tool only work with publicly facing websites, or should it be able to work within internal networks with a self-signed cert? I am not able to run it inside my corporate network, with the following error:

10:43 $ pshtt server.internal-place.com --json
Traceback (most recent call last):
  File "/usr/bin/pshtt", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "/usr/lib/python2.7/site-packages/pshtt/pshtt.py", line 870, in inspect_domains
    preload_list = create_preload_list()
  File "/usr/lib/python2.7/site-packages/pshtt/pshtt.py", line 798, in create_preload_list
    request = requests.get(file_url)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)

Consider factoring in meta refresh tags when calculating redirects

Not necessarily for relaxing compliance standards around using server-side 80->443 redirects, but just to detect a broader swathe of agency behavior.

For example, segurosocial.gov seems to redirect to socialsecurity.gov, but it actually uses a <meta> tag to do the refresh. And further, it redirects to an insecure URL:

curl https://segurosocial.gov
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>SEGUROSOCIAL</TITLE>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META content="MSHTML 5.00.2314.1000" name=GENERATOR>
<META HTTP-EQUIV="refresh" CONTENT="0; URL=http://www.socialsecurity.gov/espanol">
</HEAD>
<BODY aLink=#ff0000 bgColor=#ffffff link=#000ff text=#000000 vLink=#0000ff>
</BODY></HTML>

However, this doesn't show up in pshtt at all, so there's no way to detect this kind of thing.

It'd be a new thing to look at (and parse) HTML content instead of just HTTP headers and status codes, but if it's simple enough, it may be worth it, and offering a new field or set of fields (separate from the fields there now for server redirects) for downstream tools who care about them.
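Detecting this case would only require a small HTML pass over the response body. A sketch of the proposed check using the standard-library html.parser (an illustration of the idea, not current pshtt behavior):

```python
from html.parser import HTMLParser

class MetaRefreshParser(HTMLParser):
    """Extract the URL from a <meta http-equiv="refresh"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "meta" and attrs.get("http-equiv", "").lower() == "refresh":
            # content looks like: "0; URL=http://example.com/"
            for part in attrs.get("content", "").split(";"):
                part = part.strip()
                if part.lower().startswith("url="):
                    self.refresh_url = part[4:].strip()

def meta_refresh_url(html):
    """Return the meta-refresh target URL, or None if there isn't one."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return parser.refresh_url
```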

If given a www. subdomain, remove the www. before scanning

In other words, running pshtt on www.nasa.gov should be the same as running it on nasa.gov. The tool already makes www a special case -- that's a fundamental assumption of the entire tool -- so this is an appropriate way to normalize input.

This is especially relevant when checking fourth-level subdomains, e.g. www.something.example.com, which are unfortunately common and come up frequently in subdomain data.
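The proposed normalization is a one-line strip of a leading www. label, guarded so a bare second-level name like www.com is left alone. A sketch (`normalize_domain` is a hypothetical name):

```python
def normalize_domain(domain):
    """Strip a single leading 'www.' label before scanning.

    Illustrative sketch of the normalization proposed above.
    """
    domain = domain.strip().lower().rstrip(".")
    # Only strip when at least two labels remain afterwards,
    # so e.g. "www.com" is left untouched.
    if domain.startswith("www.") and domain.count(".") >= 2:
        domain = domain[4:]
    return domain
```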

bfelob.gov blocks user agents with "github" in it

As far as I can tell, it's "github" that does it. "githuc" works fine, "github" doesn't. We should start by asking bfelob.gov (I believe this is the OMB MAX team) to drop this restriction, as it would be nice to put a URL to this project into the user-agent.

Unit tests for HTTP endpoints

We need unit tests to measure behavior, to prevent regressions and to catch issues early. Testing against live endpoints is not recommended, nor is testing against a localhost HTTP server.

I think we want to use something like vcrpy or betamax to capture and replay requests from particular endpoints. We could set up mock endpoints as necessary to capture other expected situations so that they remain recorded for future test runs.

Resolving this issue doesn't require unit testing every single thing, but it does require getting a solid mocked test harness in place with unit tests for at least a few example endpoint configurations, so we can build on that test suite over time.

Uncaught openssl error

Found this when scanning access.dot.gov:

Traceback (most recent call last):
  File "/home/eric/space/pshtt/pshtt_cli", line 73, in <module>
    main()
  File "/home/eric/space/pshtt/pshtt_cli", line 53, in main
    results = pshtt.inspect_domains(domains, options)
  File "/home/eric/space/pshtt/pshtt.py", line 784, in inspect_domains
    results.append(inspect(domain))
  File "/home/eric/space/pshtt/pshtt.py", line 61, in inspect
    basic_check(domain.https)
  File "/home/eric/space/pshtt/pshtt.py", line 140, in basic_check
    req = ping(endpoint.url)
  File "/home/eric/space/pshtt/pshtt.py", line 123, in ping
    timeout=TIMEOUT
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/sessions.py", line 585, in send
    r = adapter.send(request, **kwargs)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/adapters.py", line 403, in send
    timeout=timeout
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 578, in urlopen
    chunked=chunked)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 362, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/httplib.py", line 1057, in request
    self._send_request(method, url, body, headers)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/httplib.py", line 1097, in _send_request
    self.endheaders(body)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/httplib.py", line 873, in send
    self.sock.sendall(data)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 253, in sendall
    sent = self._send_until_done(data[total_sent:total_sent + SSL_WRITE_BLOCKSIZE])
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 242, in _send_until_done
    return self.connection.send(data)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1271, in send
    self._raise_ssl_error(self._ssl, result)
  File "/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1182, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')

Debian 8 - PIP error trying to install pshtt

If I use pip or pip3, this errors out on me when trying to install pshtt.

Running setup.py (path:/tmp/pip-build-wdme68eo/nassl/setup.py) egg_info for package nassl
Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/tmp/pip-build-wdme68eo/nassl/setup.py", line 67, in <module>
    OPENSSL_LIB_INSTALL_PATH = OPENSSL_INSTALL_PATH_DICT[CURRENT_PLATFORM]
KeyError: None
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/tmp/pip-build-wdme68eo/nassl/setup.py", line 67, in <module>
    OPENSSL_LIB_INSTALL_PATH = OPENSSL_INSTALL_PATH_DICT[CURRENT_PLATFORM]
KeyError: None

Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip-build-wdme68eo/nassl
Exception information:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 290, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/lib/python3/dist-packages/pip/req.py", line 1230, in prepare_files
    req_to_install.run_egg_info()
  File "/usr/lib/python3/dist-packages/pip/req.py", line 326, in run_egg_info
    command_desc='python setup.py egg_info')
  File "/usr/lib/python3/dist-packages/pip/util.py", line 716, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command python setup.py egg_info failed with error code 1 in /tmp/pip-build-wdme68eo/nassl

Cache HSTS preload pending list

There's no reason not to cache the HSTS pending-preload call: it's a third-party call, and downstream clients doing batched processing would reasonably be fine relying on a snapshot of it for the duration of the batch.

Filing another issue to address the cache/flag glut that this would create.

slate.com is a lie

At first glance, slate.com appears to be available over HTTPS:

$ curl -I https://slate.com | head -n 1
HTTP/1.1 200 OK

pshtt agrees, and as a result so does Secure the News. Unfortunately, upon closer inspection, Slate's secure homepage is a lie:

(screenshot from 2017-02-07 showing slate.com's broken HTTPS homepage)

I'm filing this issue simply to bring it to everyone's attention, and potentially to start a discussion about what we might do to fix it. Ultimately, I don't think there are any easy answers here: it's hard to automate "this site looks the way it should: y/n". It may simply be that projects built on pshtt, like Secure the News (which I had hoped would be fully automated and self-updating), will still require a degree of manual intervention to correct for false positives such as this one.

Other ideas or feedback greatly appreciated!

Factor in redirects when calculating HSTS

I mentioned some of the setup to this problem in the TTS #https-partner-support Slack channel, but the long and the short of it is that in https://github.com/dhs-ncats/pshtt/blob/00ff246f40acbea185d478d838c7fcd6652b9aa8/pshtt/pshtt.py#L80 the check is, as best I can tell, being done on something other than the final HTTPS URL (possibly it's actually doing this randomly...).

So for instance, as I'm looking at lc.llnl.gov, the redirects are:

And only the last hop (final URL) has the HSTS bits set up. This is the case for both lc.llnl.gov and lc-idm.llnl.gov, but for mylc.llnl.gov (which pshtt is currently saying is good for HSTS) the only difference I can find is in the middle hop:

Although, it also is not showing HSTS on the middle hop.

I'm still continuing to dig into this, but thought I would get the ticket opened while I work (since I still can't contribute directly yet).
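The behavior I'd expect can be sketched as a small check over the redirect chain: HSTS should be judged on the final hop's headers, not an intermediate hop's. This is a hypothetical standalone helper, not pshtt's actual code:

```python
def hsts_on_final_hop(hops):
    """hops: ordered list of (url, headers_dict), one entry per redirect hop.

    Returns True only if the *final* response sets an HSTS header --
    intermediate hops shouldn't count toward the domain's HSTS status.
    """
    if not hops:
        return False
    _, headers = hops[-1]
    # Header names are case-insensitive, so normalize before checking.
    return any(k.lower() == "strict-transport-security" for k in headers)
```

A chain like lc.llnl.gov's, where only the last hop sets the header, would pass this check; a chain where only a middle hop sets it would not.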

Crash on AttributeError or LocationValueError

When scanning ecommerce.barclays.com I see two different possible crashes, seemingly at random:

~/pshtt-docker/pshtt$ docker run --rm -it --name pshtt -v $(pwd):/data -e USER_ID=1042 -e GROUP_ID=1042 pshtt/cli --output=april1.csv --debug --timeout=15 --preload-cache=PRELOAD ecommerce.barclays.com
Fetching Chrome preload list from source...
Starting new HTTPS connection (1): chromium.googlesource.com
"GET /chromium/src/net/+/master/http/transport_security_state_static.json?format=TEXT HTTP/1.1" 200 None
Caching preload list at PRELOAD
Fetching Chrome pending preload list...
Starting new HTTPS connection (1): hstspreload.org
"GET /api/v2/pending HTTP/1.1" 200 325150
Pinging http://ecommerce.barclays.com...
Starting new HTTP connection (1): ecommerce.barclays.com
"GET / HTTP/1.1" 302 213
Starting new HTTP connection (1): ecommerce.barclays.com
"GET / HTTP/1.1" 302 193
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/src/app/pshtt/cli.py", line 73, in <module>
    main()
  File "/usr/src/app/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "pshtt/pshtt.py", line 892, in inspect_domains
    results.append(inspect(domain))
  File "pshtt/pshtt.py", line 61, in inspect
    basic_check(domain.http)
  File "pshtt/pshtt.py", line 216, in basic_check
    ultimate_req = ping(endpoint.url, allow_redirects=True, verify=False)
  File "pshtt/pshtt.py", line 131, in ping
    timeout=TIMEOUT
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 606, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 179, in resolve_redirects
    **adapter_kwargs
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 585, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 370, in send
    conn = self.get_connection(request.url, proxies)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 279, in get_connection
    conn = self.poolmanager.connection_from_url(url)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.py", line 143, in connection_from_url
    return self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.py", line 114, in connection_from_host
    raise LocationValueError("No host specified.")
requests.packages.urllib3.exceptions.LocationValueError: No host specified.
~/pshtt-docker/pshtt$ docker run --rm -it --name pshtt -v $(pwd):/data -e USER_ID=1042 -e GROUP_ID=1042 pshtt/cli --output=april1.csv --debug --timeout=15 --preload-cache=PRELOAD ecommerce.barclays.com
Fetching Chrome preload list from source...
Starting new HTTPS connection (1): chromium.googlesource.com
"GET /chromium/src/net/+/master/http/transport_security_state_static.json?format=TEXT HTTP/1.1" 200 None
Caching preload list at PRELOAD
Fetching Chrome pending preload list...
Starting new HTTPS connection (1): hstspreload.org
"GET /api/v2/pending HTTP/1.1" 200 325150
Pinging http://ecommerce.barclays.com...
Starting new HTTP connection (1): ecommerce.barclays.com
"GET / HTTP/1.1" 302 193
Starting new HTTP connection (1): ecommerce.barclays.com
"GET / HTTP/1.1" 302 213
Starting new HTTPS connection (1): ecommerce.barcap.com
"GET / HTTP/1.1" 301 243
"GET /online HTTP/1.1" 301 244
"GET /online/ HTTP/1.1" 302 291
Starting new HTTPS connection (1): live.barcap.com
"GET /UAB/S/ecom/logon/1/default?returnUrl=https%3a%2f%2fecommerce.barcap.com%2fonline%2f HTTP/1.1" 200 10250
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/src/app/pshtt/cli.py", line 73, in <module>
    main()
  File "/usr/src/app/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "pshtt/pshtt.py", line 892, in inspect_domains
    results.append(inspect(domain))
  File "pshtt/pshtt.py", line 61, in inspect
    basic_check(domain.http)
  File "pshtt/pshtt.py", line 234, in basic_check
    base_immediate = parent_domain_for(subdomain_immediate)
  File "pshtt/pshtt.py", line 736, in parent_domain_for
    return str.join(".", hostname.split(".")[-2:])
AttributeError: 'NoneType' object has no attribute 'split'

Incorrect "Valid HTTPS" results for amazonaws.com?

amazonaws.com doesn't support HTTPS, but its HTTP version redirects eventually to https://aws.amazon.com. I believe this should be labeled as "Valid HTTPS" according to the rules in the README, but when I run pshtt I get "Valid HTTPS": false.

$ curl -IL http://amazonaws.com/
HTTP/1.1 301 Moved Permanently
Date: Mon, 13 Mar 2017 19:57:29 GMT
Server: Server
Location: http://aws.amazon.com
nnCoection: close
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 301 Moved Permanently
Date: Mon, 13 Mar 2017 19:57:29 GMT
Server: Server
Location: https://aws.amazon.com/
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Server: Server
Date: Mon, 13 Mar 2017 19:57:30 GMT
Content-Type: text/html;charset=UTF-8
Content-Length: 366144
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
x-amz-id-1: V93TTB078C28FJYNEV6Y
Last-Modified: Fri, 10 Mar 2017 20:07:16 GMT
Vary: Accept-Encoding,User-Agent
Set-Cookie: aws_lang=en; Domain=.amazon.com; Path=/

[
  {
    "Base Domain": "amazonaws.com",
    "Canonical URL": "http://amazonaws.com",
    "Defaults to HTTPS": false,
    "Domain": "amazonaws.com",
    "Domain Enforces HTTPS": null,
    "Domain Supports HTTPS": null,
    "Domain Uses Strong HSTS": false,
    "Downgrades HTTPS": false,
    "HSTS": false,
    "HSTS Entire Domain": null,
    "HSTS Header": null,
    "HSTS Max Age": null,
    "HSTS Preload Pending": false,
    "HSTS Preload Ready": false,
    "HSTS Preloaded": false,
    "HTTPS Bad Chain": null,
    "HTTPS Bad Hostname": null,
    "HTTPS Expired Cert": null,
    "Live": true,
    "Redirect": true,
    "Redirect To": "https://aws.amazon.com/",
    "Strictly Forces HTTPS": false,
    "Valid HTTPS": false,
    "endpoints": {
      "http": {
        "headers": {
          "Cneonction": "close",
          "Content-Length": "229",
          "Content-Type": "text/html; charset=iso-8859-1",
          "Date": "Mon, 13 Mar 2017 19:55:51 GMT",
          "Location": "http://aws.amazon.com",
          "Server": "Server"
        },
        "live": true,
        "redirect": true,
        "redirect_eventually_to": "https://aws.amazon.com/",
        "redirect_eventually_to_external": true,
        "redirect_eventually_to_http": false,
        "redirect_eventually_to_https": true,
        "redirect_eventually_to_subdomain": false,
        "redirect_immediately_to": "http://aws.amazon.com",
        "redirect_immediately_to_external": true,
        "redirect_immediately_to_http": true,
        "redirect_immediately_to_https": false,
        "redirect_immediately_to_subdomain": false,
        "redirect_immediately_to_www": null,
        "status": 301,
        "url": "http://amazonaws.com"
      },
      "https": {
        "headers": {},
        "hsts": false,
        "hsts_all_subdomains": null,
        "hsts_header": null,
        "hsts_max_age": null,
        "hsts_preload": null,
        "https_bad_chain": null,
        "https_bad_hostname": null,
        "https_expired_cert": null,
        "https_valid": null,
        "live": false,
        "redirect": null,
        "redirect_eventually_to": null,
        "redirect_eventually_to_external": null,
        "redirect_eventually_to_http": null,
        "redirect_eventually_to_https": null,
        "redirect_eventually_to_subdomain": null,
        "redirect_immediately_to": null,
        "redirect_immediately_to_external": null,
        "redirect_immediately_to_http": null,
        "redirect_immediately_to_https": null,
        "redirect_immediately_to_subdomain": null,
        "redirect_immediately_to_www": null,
        "status": null,
        "url": "https://amazonaws.com"
      },
      "httpswww": {
        "headers": {},
        "hsts": false,
        "hsts_all_subdomains": null,
        "hsts_header": null,
        "hsts_max_age": null,
        "hsts_preload": null,
        "https_bad_chain": null,
        "https_bad_hostname": null,
        "https_expired_cert": null,
        "https_valid": null,
        "live": false,
        "redirect": null,
        "redirect_eventually_to": null,
        "redirect_eventually_to_external": null,
        "redirect_eventually_to_http": null,
        "redirect_eventually_to_https": null,
        "redirect_eventually_to_subdomain": null,
        "redirect_immediately_to": null,
        "redirect_immediately_to_external": null,
        "redirect_immediately_to_http": null,
        "redirect_immediately_to_https": null,
        "redirect_immediately_to_subdomain": null,
        "redirect_immediately_to_www": null,
        "status": null,
        "url": "https://www.amazonaws.com"
      },
      "httpwww": {
        "headers": {
          "Cneonction": "close",
          "Content-Length": "229",
          "Content-Type": "text/html; charset=iso-8859-1",
          "Date": "Mon, 13 Mar 2017 19:55:52 GMT",
          "Location": "http://aws.amazon.com",
          "Server": "Server"
        },
        "live": true,
        "redirect": true,
        "redirect_eventually_to": "https://aws.amazon.com/",
        "redirect_eventually_to_external": true,
        "redirect_eventually_to_http": false,
        "redirect_eventually_to_https": true,
        "redirect_eventually_to_subdomain": false,
        "redirect_immediately_to": "http://aws.amazon.com",
        "redirect_immediately_to_external": true,
        "redirect_immediately_to_http": true,
        "redirect_immediately_to_https": false,
        "redirect_immediately_to_subdomain": false,
        "redirect_immediately_to_www": null,
        "status": 301,
        "url": "http://www.amazonaws.com"
      }
    }
  }
]

Version information varies widely

Determining which version of pshtt one is running can be difficult. For example:

$ pshtt  --version
v0.0.1
$ pip show pshtt
Name: pshtt
Version: 0.2.3
Summary: Scan websites for HTTPS deployment best practices
Home-page: https://www.dhs.gov/cyber-incident-response
Author: Department of Homeland Security, National Cybersecurity Assessments and Technical Services team
Author-email: [email protected]
License: License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Location: /usr/local/lib/python3.4/site-packages
Requires: wget, requests, requests-cache, pyopenssl, docopt, publicsuffix, sslyze, pytablewriter
$ grep version= setup.py
    version='0.2.2',

The --version flag reports v0.0.1, pip thinks it's 0.2.3, and setup.py says 0.2.2. Let's resolve this madness: the version should be defined in a single place and reported consistently everywhere. I'll open a PR to standardize on 0.2.3 in all three locations immediately.
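One common way to get a single source of truth is to define `__version__` once in the package and have `setup.py` read it at build time. A sketch of that pattern (not necessarily what the PR will do; the regex approach assumes a simple quoted literal):

```python
import re

def read_version(path="pshtt/__init__.py"):
    """Extract __version__ from the package so setup.py never duplicates it.

    Assumes the file contains a line like: __version__ = "0.2.3"
    """
    with open(path) as f:
        match = re.search(r"__version__\s*=\s*['\"]([^'\"]+)['\"]", f.read())
    if not match:
        raise RuntimeError("Unable to find __version__ string.")
    return match.group(1)
```

With this in place, `setup.py` would call `read_version()` and the CLI's `--version` output would print `pshtt.__version__`, so the three values can't drift apart.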

Move some policy decisions into pshtt

Right now, pulse.cio.gov makes some additional policy decisions on top of pshtt data to prepare its published values:

This logic should be moved down into pshtt so that others don't have to port the above logic to their own demesnes. And users who want to make different policy decisions will still have the direct measurements already in pshtt to do that with.

Use more formal/complete HSTS parser

Ideally, there would already be a good tested Python library for doing this.

For reference, here's the parser I wrote in Ruby (with careful attention to the HSTS RFC):
https://github.com/benbalter/site-inspector/blob/erics-mode/lib/site-inspector.rb#L28-L84

Fixing this issue should include unit tests validating that the parser meets the HSTS spec. An example of that in a previous (Ruby) project can be seen here:

https://github.com/benbalter/site-inspector/blob/erics-mode/test/test_hsts.rb
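As a starting point, a minimal directive parser along the lines of RFC 6797 might look like the sketch below. This is a rough illustration, not a spec-complete implementation; quoted-string handling and the RFC's rules on duplicate directives would still need work (and unit tests):

```python
def parse_hsts(header):
    """Parse a Strict-Transport-Security header value into a dict.

    Directives are semicolon-separated; names are case-insensitive.
    """
    result = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header.split(";"):
        directive = directive.strip()
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        if name == "max-age":
            try:
                result["max_age"] = int(value)
            except ValueError:
                pass  # Malformed max-age: leave it as None.
        elif name == "includesubdomains":
            result["include_subdomains"] = True
        elif name == "preload":
            result["preload"] = True
    return result
```

For example, `parse_hsts("max-age=31536000; includeSubDomains; preload")` yields a max-age of 31536000 with both flags set.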

Exception if redirect with no 'Location' header

Working on compliance for lc-idm.llnl.gov, I ran into an issue: if the server returns a 3xx (redirect) status but NOT a 'Location' header, pshtt throws an exception. Adding a check for this works around it.

The exception is thrown by: https://github.com/dhs-ncats/pshtt/blob/master/pshtt/pshtt.py#L206

A possible fix I have locally is to update the endpoint.redirect determination here: https://github.com/dhs-ncats/pshtt/blob/master/pshtt/pshtt.py#L199 to make sure that req.headers.get('Location') is not None.
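The guard described above might look something like this (a sketch of the proposed check; `is_redirect` here is a hypothetical standalone helper, not pshtt's actual function):

```python
def is_redirect(resp):
    """Treat a response as a redirect only if it is a 3xx AND actually
    carries a Location header; a bare 3xx with no Location can't be followed.
    """
    if resp is None:
        return False
    status = getattr(resp, "status_code", None)
    if status is None or not (300 <= status < 400):
        return False
    return resp.headers.get("Location") is not None
```

With this check, a 302 that omits Location is simply treated as a non-redirect instead of crashing the scan.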

requirements.txt error

This relates to a comment I left on one of the recent pull requests: the requirements.txt file actually contains an error that you hit if you try to install from it:

$ pip install -r requirements.txt
Invalid requirement: 'pyopenssl=17.2.0'
= is not a valid operator. Did you mean == ?

I would recommend changing it to simply pyopenssl without the pinned version.

/cc @konklone

Combine caching of HSTS preload, pending, and PSL into one command

I think it's unlikely that a user would ever want to cache only the PSL but not the HSTS preload list, or vice versa. In general, users of any of those options likely want all of them -- in other words, they're comfortable using the same frozen state of third-party sources for the duration of a bulk scan.

Keeping the individual options is fine, but I think it'd be useful to have a general option to cache third party resources (something like --third-party-cache), and to give it a directory to stuff each one into at predictable non-conflicting filenames.

This also lets users doing batching avoid getting suddenly hit with a new third party reference added on version updates -- which, depending on the size of the batching, could be quite costly. So this would free up the project to add new resources without worrying about harming downstream users.
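A sketch of how the proposed option could lay files out on disk; the `--third-party-cache` flag, the filenames, and the `fetchers` mapping are all hypothetical, not existing pshtt behavior:

```python
import os

def cache_third_party(cache_dir, fetchers):
    """Write each third-party resource to a predictable, non-conflicting
    filename under cache_dir, fetching only resources not already cached.

    fetchers maps a filename (e.g. "hsts-pending.json") to a zero-argument
    callable that returns the resource body as a string.
    """
    os.makedirs(cache_dir, exist_ok=True)
    paths = {}
    for name, fetch in fetchers.items():
        path = os.path.join(cache_dir, name)
        if not os.path.exists(path):
            # Only hit the network for resources missing from the cache,
            # so a new resource added in a version update is fetched once.
            with open(path, "w") as f:
                f.write(fetch())
        paths[name] = path
    return paths
```

Because existing files are left alone, a batch job keeps its frozen snapshot even if a new third-party resource is added mid-run.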

Mysterious connection failures on some domains when not verifying certificates

For a number of domains:

  • https://www.cupcao.gov
  • https://www.mitigationcommission.gov
  • https://www.lmrcouncil.gov

When making a requests.get() call with just the URL and default args, you get a certificate verification error:

In [68]: requests.get("https://www.cupcao.gov")

...

/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    475         except (_SSLError, _HTTPError) as e:
    476             if isinstance(e, _SSLError):
--> 477                 raise SSLError(e, request=request)
    478             elif isinstance(e, ReadTimeoutError):
    479                 raise ReadTimeout(e, request=request)

SSLError: hostname 'www.cupcao.gov' doesn't match 'www.usbr.gov'

But then if you disable certificate validation, you get an ECONNRESET error that seems to crash earlier in the underlying request process workflow:

In [71]: requests.get("https://www.mitigationcommission.gov", verify=False)

...

/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    451 
    452         except (ProtocolError, socket.error) as err:
--> 453             raise ConnectionError(err, request=request)
    454 
    455         except MaxRetryError as e:

ConnectionError: ('Connection aborted.', error("(104, 'ECONNRESET')",))

This suggests to me that disabling certificate validation doesn't just cause requests to not validate the certificate when presented, but to actually change its approach to the TLS handshake in some way that causes a connection reset with this particular subset of domains.

cc @alex in case he has any ideas what's happening here.

Drop specific point version pins in requirements.txt

We're falling behind: we're currently pinned, in effect, to cryptography 1.9 and can't move to 2.1. I think it's worth considering a dependency strategy that relies on semantic versioning, since this comes up a lot.

Outputs should include the values when HTTPS {Bad Chain, Bad Hostname, Expired Cert} are TRUE

The first question to be asked when HTTPS {Bad Chain, Bad Hostname, Expired Cert} is TRUE will be "What's the value that the scanner saw?" pshtt CSV/JSON outputs should include three additional fields (which should be blank when the corresponding field is FALSE).

Presuming we stick with sslyze:

  • HTTPS Bad Chain Value: return the value for "Certificate Chain Received"
  • HTTPS Bad Hostname Value: return the value for "Hostname Validation"
  • HTTPS Expired Certificate Value: return the value for "Not After"

Crash on encoding error

Got a requests.exceptions.ChunkedEncodingError during scanning that caused a stack trace; it should be handled more gracefully.

Traceback (most recent call last):
  File "/opt/install/pyenv/versions/2.7.11/bin/pshtt", line 9, in <module>
    load_entry_point('pshtt==0.1.3', 'console_scripts', 'pshtt')()
  File "/opt/scan/pshtt/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "/opt/scan/pshtt/pshtt/pshtt.py", line 871, in inspect_domains
    preload_pending = fetch_preload_pending()
  File "/opt/scan/pshtt/pshtt/pshtt.py", line 770, in fetch_preload_pending
    request = requests.get(pending_url)
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/sessions.py", line 606, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/sessions.py", line 179, in resolve_redirects
    **adapter_kwargs
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/sessions.py", line 617, in send
    r.content
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/models.py", line 741, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/opt/install/pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/models.py", line 667, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
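One way to handle this more gracefully is to wrap the fetch and treat transport-level failures as an unreachable endpoint instead of a fatal error. This is a sketch, and the specific exception set is my assumption about what a bulk scanner would want to swallow:

```python
import requests

def safe_get(url, **kwargs):
    """Fetch a URL, returning None instead of crashing the whole scan
    when the server misbehaves at the transport or encoding level."""
    try:
        return requests.get(url, **kwargs)
    except (requests.exceptions.ChunkedEncodingError,
            requests.exceptions.ConnectionError,
            requests.exceptions.Timeout):
        # The endpoint gets recorded as not-live; the scan continues.
        return None
```

The caller then checks for None wherever it currently assumes the request succeeded.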

ChunkedEncodingError for youth.gov

Youth.gov has a bad-hostname cert on both HTTPS endpoints, and its HTTP endpoints work in a browser and in libcurl (site-inspector) -- but they fail in requests with a ChunkedEncodingError:

In [85]: requests.get("http://www.youth.gov", headers={'User-agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"})

...

/home/eric/.pyenv/versions/2.7.11/lib/python2.7/site-packages/requests/models.pyc in generate()
    665                         yield chunk
    666                 except ProtocolError as e:
--> 667                     raise ChunkedEncodingError(e)
    668                 except DecodeError as e:
    669                     raise ContentDecodingError(e)

ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

It looks like something under the hood in urllib3 is handling odd server behavior non-gracefully, where browsers and libcurl seem to manage okay.

AttributeError thrown when there is a hostname validation error

Hit an unhandled exception while scanning cbc.ca:

$ pshtt cbc.ca --json
Certificate did not match expected hostname: cbc.ca. Certificate: {'subjectAltName': [('DNS', '*.akamaihd.net'), ('DNS', '*.akamaihd-staging.net'), ('DNS', '*.akamaized-staging.net'), ('DNS', '*.akamaized.net'), ('DNS', 'a248.e.akamai.net')], 'subject': ((('commonName', u'a248.e.akamai.net'),),)}
Traceback (most recent call last):
  File "/usr/local/bin/pshtt", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/pshtt/cli.py", line 54, in main
    results = pshtt.inspect_domains(domains, options)
  File "/usr/local/lib/python2.7/site-packages/pshtt/pshtt.py", line 882, in inspect_domains
    results.append(inspect(domain))
  File "/usr/local/lib/python2.7/site-packages/pshtt/pshtt.py", line 63, in inspect
    basic_check(domain.https)
  File "/usr/local/lib/python2.7/site-packages/pshtt/pshtt.py", line 173, in basic_check
    https_check(endpoint)
  File "/usr/local/lib/python2.7/site-packages/pshtt/pshtt.py", line 335, in https_check
    except nassl.x509_certificate.X509HostnameValidationError:
AttributeError: 'module' object has no attribute 'x509_certificate'

Make into an installable PyPi package

For general use as a tool, users should be able to pip install pshtt and then be able to run the CLI right away, without cloning the repo and setting up developer dependencies.

Logic issue with is_redirect method

The way is_redirect is currently written, a domain will get flagged as a redirect if all endpoints are down or otherwise returning 4xx codes. For example, imagine domain.tld, which only has its https endpoint up and happens to be returning a 404 for its index. That will return true for is_redirect, which is confusing (and happening a bunch where I am).

I'm curious whether that whole logic block can be simplified to check whether any of the endpoints are returning a 3xx code. The entire block starting on line 527 might reasonably be reduced to something like:

redirection_codes = range(300, 309)

return (https.status in redirection_codes or
        http.status in redirection_codes or
        httpswww.status in redirection_codes or
        httpwww.status in redirection_codes)

I'll submit a pull request and we can hash it out there.

Distinguish incomplete chains from untrusted roots

At least in a naïve fashion, distinguishing likely incomplete chains from untrusted roots should be feasible by counting the number of certificates returned in "Certificate Chain Received" from sslyze. requests may also return something that could be useful.

I recall openssl returns a 'depth' value; when a site's chain is less than 2 deep, that's a strong indication intermediate certs are not being served, making the chain incomplete. If depth < 2 and the certificate is not trusted in the Mozilla store, this seems to indicate an incomplete chain, while depth >= 2 seems to indicate an untrusted root.
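The naïve heuristic above, expressed as a pure function (a hypothetical helper; `chain_depth` would be the count of certificates the server actually sent, and `trusted` would come from validating against the Mozilla store):

```python
def classify_cert_failure(chain_depth, trusted):
    """Apply the depth heuristic: an untrusted cert with a short chain
    likely means intermediates weren't served (incomplete chain), while
    an untrusted cert with a full-length chain points at an untrusted root.
    """
    if trusted:
        return "trusted"
    if chain_depth < 2:
        # Leaf only: the server probably isn't sending its intermediates.
        return "incomplete-chain"
    return "untrusted-root"
```

This is deliberately naïve; cross-signed chains and servers that send extra roots would need more careful handling.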

Installation on Raspberry Pi 3... big headache

I'm trying to install pshtt on a Raspberry Pi 3. After numerous errors, which I fixed one by one, I'm stuck on just this one: building the wheel for nassl (a package pshtt requires) simply does not succeed, probably because nassl does not support ARM processors.

Is there a way to replace nassl by something else during the pshtt installation?

The problem is mainly here actually:

/tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libssl.a: error adding symbols: File in wrong format
collect2: error

Full log:

Running setup.py install for nassl ... error
Complete output from command /usr/local/bin/python3.5 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-nngih_0d/nassl/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-i5j52ciu-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-armv7l-3.5
creating build/lib.linux-armv7l-3.5/nassl
copying nassl/__init__.py -> build/lib.linux-armv7l-3.5/nassl
copying nassl/ssl_client.py -> build/lib.linux-armv7l-3.5/nassl
copying nassl/debug_ssl_client.py -> build/lib.linux-armv7l-3.5/nassl
copying nassl/ocsp_response.py -> build/lib.linux-armv7l-3.5/nassl
running build_ext
building 'nassl._nassl' extension
creating build/temp.linux-armv7l-3.5
creating build/temp.linux-armv7l-3.5/nassl
creating build/temp.linux-armv7l-3.5/nassl/_nassl
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_SSL_CTX.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_SSL_CTX.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_SSL.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_SSL.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_X509.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_X509.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_errors.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_errors.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_BIO.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_BIO.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_X509_EXTENSION.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_X509_EXTENSION.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_X509_NAME_ENTRY.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_X509_NAME_ENTRY.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_SSL_SESSION.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_SSL_SESSION.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/openssl_utils.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/openssl_utils.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/nassl_OCSP_RESPONSE.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_OCSP_RESPONSE.o -Wall
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ibin/openssl/include -Inassl/_nassl -I/usr/local/include/python3.5m -c nassl/_nassl/python_utils.c -o build/temp.linux-armv7l-3.5/nassl/_nassl/python_utils.o -Wall
gcc -pthread -shared build/temp.linux-armv7l-3.5/nassl/_nassl/nassl.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_SSL_CTX.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_SSL.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_X509.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_errors.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_BIO.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_X509_EXTENSION.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_X509_NAME_ENTRY.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_SSL_SESSION.o build/temp.linux-armv7l-3.5/nassl/_nassl/openssl_utils.o build/temp.linux-armv7l-3.5/nassl/_nassl/nassl_OCSP_RESPONSE.o build/temp.linux-armv7l-3.5/nassl/_nassl/python_utils.o /tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libssl.a /tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libcrypto.a /tmp/pip-build-nngih_0d/nassl/bin/zlib/linux32/libz.a -o build/lib.linux-armv7l-3.5/nassl/_nassl.cpython-35m-arm-linux-gnueabihf.so
/usr/bin/ld: /tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libssl.a(s2_meth.o): Relocations in generic ELF (EM: 3)
/usr/bin/ld: /tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libssl.a(s2_meth.o): Relocations in generic ELF (EM: 3)
/usr/bin/ld: /tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libssl.a(s2_meth.o): Relocations in generic ELF (EM: 3)
/tmp/pip-build-nngih_0d/nassl/bin/openssl/linux32/libssl.a: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
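The root cause of the link failure above is an architecture mismatch: "Relocations in generic ELF (EM: 3)" reports the `e_machine` field of the object files inside the bundled `libssl.a`, and machine type 3 is `EM_386` (32-bit x86), while the build host is armv7l (`EM_ARM`, machine type 40). The linker therefore refuses the prebuilt x86 static libraries. As a minimal, self-contained sketch (synthetic header bytes for illustration; on a real system you would run `readelf -h` on an object extracted from the archive), the field can be decoded like this:

```python
import struct

# ELF machine types relevant here (from the ELF specification).
EM_386 = 3   # 32-bit x86 -- what the bundled libssl.a was built for
EM_ARM = 40  # ARM -- what an armv7l host needs

def elf_machine(header: bytes) -> int:
    """Return the e_machine value from the start of an ELF file.

    The ELF header begins with a 16-byte e_ident block; e_type occupies
    bytes 16-17 and e_machine bytes 18-19 (little-endian on these targets).
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return struct.unpack_from("<H", header, 18)[0]

# Synthetic 20-byte header mimicking an object from the x86 libssl.a:
# magic + 12 padding bytes for the rest of e_ident, then e_type=ET_REL(1)
# and e_machine=EM_386(3).
x86_object_header = b"\x7fELF" + bytes(12) + struct.pack("<HH", 1, EM_386)

print(elf_machine(x86_object_header))  # -> 3, i.e. the "EM: 3" in the ld error
```

In practice this means the pip-bundled nassl binaries cannot be linked on ARM; the fix is to build OpenSSL and nassl for the target architecture rather than using the shipped x86 archives.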