Mozilla HTTP Observatory

The Mozilla HTTP Observatory is a set of tools to analyze your website and inform you if you are utilizing the many available methods to secure it.

It is split into three projects:

  • http-observatory (this repository) - the scanner/grader and API
  • observatory-cli - the Node.js command line interface
  • http-observatory-website - the web interface

Scanning sites with the HTTP Observatory

Sites can be scanned using:

  • observatory.mozilla.org - the online version
  • observatory-cli - the official Node.js command line interface
  • java-http-observatory-api - a third party Java library, courtesy of stoennies

Development

Prerequisites

  • Python 3.11
  • Git
  • pip

Notes

These instructions assume you have a working Python 3.11 development environment with pip installed and capable of building requirements, which may require installing an additional Python OS package (-dev, -devel).

# Clone the code
$ git clone https://github.com/mozilla/http-observatory.git
$ cd http-observatory
# Install poetry
$ pip install poetry
# Install the project dependencies and scripts
$ poetry install
# Activate the virtual environment
$ poetry shell
# Install the pre-commit hooks
$ pre-commit install
# copy and edit the config file
$ cp httpobs/conf/httpobs.conf ~/.httpobs.conf
$ nano ~/.httpobs.conf
# start the dev server
$ httpobs-server

Running tests

$ nosetests httpobs/tests --with-coverage --cover-package=httpobs

Running a scan from the local codebase, without DB, for continuous integration

# Install the HTTP Observatory
$ git clone https://github.com/mozilla/http-observatory.git
$ cd http-observatory
$ pip install poetry
$ poetry install

Using the scanner function calls

>>> from httpobs.scanner import scan
>>> scan('observatory.mozilla.org')  # a scan with default options
>>> scan('observatory.mozilla.org',  # all the custom options
         http_port=8080,             # http server runs on port 8080
         https_port=8443,            # https server runs on port 8443
         path='/foo/bar',            # don't scan /, instead scan /foo/bar
         cookies={'foo': 'bar'},     # set the "foo" cookie to "bar"
         headers={'X-Foo': 'bar'},   # send an X-Foo: bar HTTP header
         verify=False)               # treat self-signed certs as valid for tests like HSTS

The same, but with the local CLI

$ poetry shell
$ httpobs-local-scan --http-port 8080 --https-port 8443 --path '/foo/bar' \
    --cookies '{"foo": "bar"}' --headers '{"X-Foo": "bar"}' --no-verify mozilla.org

Authors

  • April King

License

  • Mozilla Public License Version 2.0

http-observatory's People

Contributors

allanjude, amuntner, april, async-costelo, calebhearth, cptwafflez, craigfrancis, danielhartnell, dependabot[bot], dpox, floatingatoll, fmeum, fox-rose, gdestuynder, gene1wood, jarondl, joshag, kingthorin, konarkmodi, lchski, leomca, mbbx6spp, mozilla-github-standards, thecontrarycat

http-observatory's Issues

Support running against localhost

In order to quickly test changes to cookie settings, CSP, and other things, it would be really good if httpobs could be run against a localhost environment. At the moment, this doesn't work.
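
For what it's worth, the function-call interface documented above already accepts custom ports and a verify flag, so the call one would like to work might look like this (ports and verify=False are illustrative for a local dev server; this is exactly what currently fails):

>>> from httpobs.scanner import scan
>>> scan('localhost',
         http_port=8000,    # local app serves HTTP here
         https_port=8443,   # and HTTPS here
         verify=False)      # a self-signed local cert would otherwise fail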

Issue quality badges

Sites should be able to list quality badges on their home or github pages. For example:

(example badge images: "observatory: A" and "observatory: F")

"cross-origin-resource-sharing-implemented-with" does not indicate when XML is involved

While diagnosing a production site, we ended up having to trace the httpobs source code line by line to understand a cross-origin-resource-sharing-implemented-with-universal error. It turned out that there was a crossdomain.xml present on the site being scanned, something we did not know was present, nor did we know that httpobs checks for it.

Please consider using a slightly different error code when the presence of an XML file downgrades the CORS policy, e.g. cross-origin-resource-sharing-XML-implemented-with, with the accompanying documentation, so that it's more readily apparent that an XML file was found and defined such a policy. (Or otherwise indicating, somehow, that an XML file was taken into account when evaluating the site.)
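
A minimal sketch of the requested distinction, with entirely illustrative names (these are not the actual httpobs identifiers):

def cors_result(universal_access, via_crossdomain_xml):
    # Hypothetical helper: surface the policy's source in the result code,
    # so an XML-derived downgrade is visible at a glance
    if universal_access and via_crossdomain_xml:
        return 'cross-origin-resource-sharing-xml-implemented-with-universal-access'
    if universal_access:
        return 'cross-origin-resource-sharing-implemented-with-universal-access'
    return 'cross-origin-resource-sharing-implemented'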

Duplicate headers result in 'cannot be recognized'

This particular error could maybe be handled separately from a completely unrecognized header. It is the result of a reverse proxy and the actual web server both setting the header to nosniff in this case (though e.g. X-Frame-Options, X-XSS-Protection and similar checks seem to be affected by this too).

        "x-content-type-options": {
            "expectation": "x-content-type-options-nosniff",
            "name": "x-content-type-options",
            "output": {
                "data": "nosniff, nosniff"
            },
            "pass": false,
            "result": "x-content-type-options-header-invalid",
            "score_description": "X-Content-Type-Options header cannot be recognized",
            "score_modifier": -5
        },
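
One possible fix, sketched with illustrative names (not actual httpobs code): fold duplicate headers that agree into a single value before validating, and only fail when the duplicates genuinely conflict.

def normalize_duplicated_header(value):
    # Duplicate response headers arrive folded as 'v1, v2'; if all copies
    # agree, collapse them to one value, otherwise report a real conflict
    parts = {part.strip().lower() for part in value.split(',')}
    return parts.pop() if len(parts) == 1 else None

normalize_duplicated_header('nosniff, nosniff')  # -> 'nosniff'
normalize_duplicated_header('nosniff, sniff')    # -> None (conflicting)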

Improve "Site down" experience in general (e.g. 404s)

Hello,
I was testing one of our domains where IIS performs an HTTP redirect when you navigate to the root (i.e. when you browse to www.example.com). In the spirit of not showing the IIS landing page, an HTTP 301 was implemented.
However, during deployment the redirect was not set and the server threw an HTTP 404, so I got the message from the Observatory indicating that the site was down.

I have a couple of questions here:

  • Is it OK that the tool cannot scan the domain when an HTTP 404 is returned? I wonder if the cipher suite tests can/should be performed at least?
  • Should the HTTP header analysis be performed? If you do an HTTP HEAD you should get a picture of which headers are exposed, even when the response is an HTTP 404.

I appreciate your feedback!

sri-not-implemented-and-external-scripts-not-loaded-securely has the wrong score

https://github.com/mozilla/http-observatory/blob/master/httpobs/scanner/grader/grade.py#L237

Currently the observatory scores https: subresources w/o SRI at -20 and http: subresources at -50.

These scores don't seem to reflect a reasonable security assessment. We can debate the value of SRI, but because browsers (or at least any browser that supports SRI) don't load active mixed content, an http: subresource is basically a broken link, which is more secure than the https: subresource, which is a potential security issue (at least within the SRI threat model). And of course, if the main page is http:, then SRI doesn't help. At minimum these scores should be the same, and arguably the http: one should be a less serious demerit.

Adding HttpOnly results in worse score

[ -30] Session cookie set without using the HttpOnly flag

turns into

[ -40] Session cookie set without using the Secure flag or set over http

on an otherwise unchanged server once the PHP option

session.cookie_httponly

is turned on.

I would have expected the score for using none of HTTPS, HttpOnly, Secure to be worse than (or equal to) the one for using HttpOnly but not the other two.
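
For reference, the three configurations being compared, with an illustrative cookie name (score annotations copied from the report text above):

Set-Cookie: PHPSESSID=abc123; path=/                    # -30: no HttpOnly, no Secure
Set-Cookie: PHPSESSID=abc123; path=/; HttpOnly          # -40: HttpOnly but no Secure
Set-Cookie: PHPSESSID=abc123; path=/; Secure; HttpOnly  # what the grader rewards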

Specify custom user agent upon scan

Allow the user to either 1) select from a dropdown of different user agents (e.g. mobile, IE, Firefox, Chrome, etc.) or 2) specify one in a text box upon scan.

This way, the user can test different versions of a website (mobile vs. desktop, for example).
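
In the meantime, a custom user agent can arguably be passed through the documented headers option of the local scanner, assuming the scanner does not override it (the UA string here is just an example):

>>> from httpobs.scanner import scan
>>> scan('example.com',
         headers={'User-Agent': 'Mozilla/5.0 (Android 14; Mobile; rv:125.0) Gecko/125.0 Firefox/125.0'})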

/etc/httpobs.conf for api

Hi,

Reading docker-compose.yml, we see that you give a special password for the api process.

Unfortunately, httpobs.conf does not contain a line that shows how/where to configure database access for the API process.

Could you please tell me how to do this?

Thanks in advance.

cED

Separate out 'data' from the scans into a separate artifacts table

  1. We still want to be able to run queries based on how the internet is doing
  2. If we store a list of cookies and XML files and everything with each test, then the database could grow quite large

We should have an artifacts table in the database that stores the most recent version of each of these files.

design fault!

This test is based on the basic premise that generally everything should be transferred over SSL.

But why should I trust external sources when I visit a site? And why is this not tested?

When I visit the site of, e.g., a newspaper, I get the main page from the company's server and many other components from different companies, e.g. counters, analytics, APIs, and font and tool providers.
But why should I trust them in the same way I trust the newspaper company? And why should I trust a company I do not even know that I am using the services of?

In the past we have seen more than once that when a shared resource has a bug, many servers are infected, and the people exploiting that bug have many servers to feed from. So the rationale for shared resources of fonts, APIs, tools, etc. does not hold.

For example: this test gives us negative points because I do not activate HSTS, but it does not look at the sources I use for all of my content. Yes, we could activate HSTS, but this is not a security feature! On the other hand, when you open a page of our company, you will get only components from our subnet. We do not use any external resources (we use a local mirror for many of these resources, where we activate all changes manually after a human review), and we do this for security reasons.

If you want to judge the security of a site, you should not follow the mainstream. You should trust no one, and that means you should not trust shared resources in the first place.

SRI listed as not in use

On my results I get a warning: "Subresource Integrity (SRI) not implemented, but all external scripts are loaded over https"

https://observatory.mozilla.org/analyze.html?host=scotthelme.co.uk

SRI is deployed on my site with the exception of one <link> element that fetches content from the Google Fonts API. You can't use SRI with the Fonts API as "the Fonts API serves a stylesheet generated for the specific user agent making the request".

https://developers.google.com/fonts/docs/technical_considerations#what_is_the_google_fonts_api_serving

If this is the reason I'm being marked as not having SRI, then perhaps an exception needs to be made for <link> elements with an href attribute pointing to fonts.googleapis.com.
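
A minimal sketch of such an exception, assuming a hypothetical allowlist (this is not how httpobs is actually structured):

from urllib.parse import urlparse

# Hosts whose responses vary per user agent, so a stable SRI hash is impossible
SRI_EXEMPT_HOSTS = {'fonts.googleapis.com'}

def link_needs_sri(href):
    # Skip the SRI requirement for stylesheets served from exempt hosts
    return urlparse(href).hostname not in SRI_EXEMPT_HOSTS

link_needs_sri('https://fonts.googleapis.com/css?family=Roboto')  # -> False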

Return third party scan results in API

It'd be nice if the third party scan results collected by the Observatory could be retrieved from the API as well.
They're available on the website, but apparently not from the API.

problem starting tls scanner

According to the directions to start the scanner we should run the following command:
HTTPOBS_DATABASE_USER="httpobsscanner" HTTPOBS_DATABASE_PASS="password" /opt/http-observatory/httpobs/scripts/httpobs-scan-worker

Upon inspecting the postgres log:
sudo tail /var/log/postgresql/postgresql-9.5-main.log

2016-09-04 05:46:45 PDT [19623-1] httpobsscanner@http_observatory LOG: could not receive data from client: Connection reset by peer

The API starts without issue; it's just the TLS scanner that I am having trouble with.

Any ideas on how to troubleshoot this?
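
One hedged first step: verify that those credentials can reach the database at all, outside the scanner (user and database names copied from the log line above):

$ psql -h localhost -U httpobsscanner -d http_observatory -c 'SELECT 1;'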

Check for contribute.json seems unwarranted

Most of the checks in HTTP Observatory are reasonable security best practices that alter the functioning of the site. However, the check for a contribute.json file seems markedly different; it doesn't represent any kind of accepted best practice, and the presence or absence of such a file doesn't materially alter the functioning of the site in any way.

Although I agree that the goal of making information about the provenance of a site available is laudable, the approach taken by contribute.json follows a design pattern of relegating metadata to some default-invisible location. This has been tried in many contexts on the web and has repeatedly failed (in the sense that tools which intended to consume such data found the data quality so low that they could not reliably do so). Unless the data is visible in the site UI, people will forget to update it when reality changes, and the file will become de facto untrustworthy. Scoring people on the presence of the file, without any way to validate that the information contained therein is actually meaningful, does not seem like it will help with this problem. Therefore I suggest that this check be removed from the tool and that the approach to surfacing this (potentially useful) data be reevaluated, taking into account the fact that people are much more likely to keep it up to date if it is actually visible on their site.

Suggestion: Legend

It would be nice to have a legend somewhere explaining the letter grade and maybe a link to our guidelines on how to improve it. I realize each test is linked to something - but most people are not going to want to click on 10 different links to understand.

select_site_headers never actually works

The function httpobs.database.select_site_headers queries the database, asking for public_headers, private_headers, cookies FROM sites. But I have not found where these columns ever get filled. I believe they always remain NULL, and that this does nothing. The function always returns {'headers': None, 'cookies': None}, except when the DB is empty, in which case it returns {}, breaking the API, or when there is no DB, in which case it crashes.

You may ask why I care, and the reason is that I'm trying to run the scanner without a DB, and the retriever calls this function and errors out. That is the one thing stopping me from running a DB-less scan.

PS. The code could have been simpler with headers = row.get('public_headers', {}) in the three occurrences here.
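
A sketch of the simplification the PS suggests, with row standing in for the database record (illustrative, not the actual function body):

def select_site_headers(row):
    # If public_headers/cookies are always NULL, defaulting them means the
    # DB-less path gets a usable empty dict instead of None or a crash
    return {
        'headers': row.get('public_headers') or {},
        'cookies': row.get('cookies') or {},
    }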

Interpret and use TLS Observatory scores

The TLS Observatory API currently returns a 0-100 score, in addition to other things.
It would be nice if that score were integrated into the overall HTTP Observatory score, so that sites can get a unified rating.
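
A back-of-the-envelope sketch of one way to fold the two scores together; the weights here are invented purely for illustration:

def unified_score(http_score, tls_score):
    # Hypothetical weighting: HTTP hygiene dominates, TLS quality adjusts it
    return round(0.8 * http_score + 0.2 * tls_score)

unified_score(90, 100)  # -> 92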

error during setup of scanner

Tested on:
Ubuntu 16.04 (fresh install)

Following the documentation, running the following command yields some errors:
pip3 install -r requirements.txt

Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 1006, in check_if_exists
    self.satisfied_by = pkg_resources.get_distribution(str(no_marker))
  File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/__init__.py", line 535, in get_distribution
    dist = get_provider(dist)
  File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/__init__.py", line 415, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/__init__.py", line 695, in find
    raise VersionConflict(dist, req)
pkg_resources.VersionConflict: (beautifulsoup4 4.4.1 (/usr/lib/python3/dist-packages), Requirement.parse('beautifulsoup4==4.5.1'))

To fix I had to run the following command:
sudo pip3 install --upgrade pip

Option to not follow redirects

When scanning http://site/ 301 --A-> https://site/ 303 --B-> https://offsite/ -->C, it would be extremely helpful if we could ask the scanner to stop at B so we can validate HSTS and HPKP headers for our site, rather than for the target site C.
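
As a manual workaround (independent of httpobs), the headers at hop B can be inspected by disabling redirect following in requests:

>>> import requests
>>> r = requests.get('https://site/', allow_redirects=False)   # stop at B, don't follow to C
>>> r.status_code, r.headers.get('Strict-Transport-Security')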

patch to Access-Control-Allow-Origin

http-observatory/httpobs/scanner/analyzer/misc.py

+    # http://www.w3.org/TR/access-control/#access-control-allow-origin-response-header
+    if 'null' in domains:
+        output['result'] = 'cross-origin-resource-sharing-is-null'
+
     if '*' in domains:
         output['result'] = 'cross-origin-resource-sharing-implemented-with-universal-access'

http-observatory/httpobs/scanner/grader/grade.py

     # Cross-origin resource sharing
+
+    # http://www.w3.org/TR/access-control/#access-control-allow-origin-response-header
+    'cross-origin-resource-sharing-is-null': {
+        'description': 'Content is not visible via cross-origin resource sharing (CORS) files or headers',
+        'modifier': +5,
+    },

Using Observatory behind corporate proxy ?

Hi,

First, thank you for this great project !

I want to use http-observatory for internal corporate tests. I just installed the tool, but when starting it, the code in utils.py tries to retrieve the Google HSTS Preload list from "https://chromium.googlesource.com/chromium/src/net/+/master/http/transport_security_state_static.json?format=TEXT". As we are behind a proxy for Internet access, I tried setting the http_proxy and https_proxy environment variables, pointing to a local cntlm proxy, but it's not working.
I also tried to modify utils.py to use a proxy configuration, without success (Python noob writing).

import sys
from base64 import b64decode

import requests

proxiesList = {
    'https': 'http://localhost:3128'
}

# Download the Google HSTS Preload List through the proxy
# (HSTS_URL is defined earlier in utils.py)
try:
    print('Retrieving the Google HSTS Preload list', file=sys.stderr)
    r = b64decode(requests.get(HSTS_URL, proxies=proxiesList, timeout=5).text).decode('utf-8').split('\n')
except requests.exceptions.RequestException as e:
    print('Failed to retrieve the preload list:', e, file=sys.stderr)

What am I missing ?

Check SRI on link elements

Currently it appears that the HTTP Observatory checks for SRI on <script> elements only. The specification states that SRI is also valid on <link> elements (section 3.4).

I'm not sure how this would be incorporated into the Observatory or how it would affect scoring.
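
For reference, what SRI on a <link> element looks like per the spec (the digest below is a placeholder, not a real hash):

<link rel="stylesheet"
      href="https://cdn.example.com/styles.css"
      integrity="sha384-PLACEHOLDERdigestOfTheExactBytesServed"
      crossorigin="anonymous">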

Add support for scanning more than just the root document

Currently http-observatory is limited to scanning a domain; this assumes that all pages served on the domain have an equal security profile, which isn't always the case.

It would be useful to have the ability to specify a full URL to scan.
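
The local scanner's documented path option (see the README section above) gets partway there for a single page, though it still scans only one host:

>>> from httpobs.scanner import scan
>>> scan('example.com', path='/account/login')  # scan a specific page instead of /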

Official (supported) single-shot Python API

We (the Servo project) are looking to directly run the observatory in CI for our infrastructure code, but without complex dependencies.

I tried out the (public website for the) observatory when it first came out and am very excited about this project. Our site currently has a terrible score, so we'd like to add the observatory to our CI to automatically monitor our progress and prevent regressions. Since we only need to test one site, we would like to run the observatory without Postgres, Redis, and Celery, i.e. load + scan + analyze one site in a "single-shot" mode, via a Python API. Docker helps with the dependency problem, but IMO should still be unnecessary for what could be done in pure Python.

@jarondl has been working on implementing this for us by hooking into observatory internals.
However, I think this could also help a lot of other teams automate this in CI and drive adoption.
How would you feel about supporting an official Python API for this, so that the observatory can be used as a regular Python module locally? We can help upstream improvements for this if this is something you're interested in.
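
For what it's worth, the "without DB" function call documented earlier in this README is essentially this single-shot mode; a minimal CI sketch might look like the following (the structure of the returned dict is not guaranteed here, so inspect it before asserting on fields):

>>> from httpobs.scanner import scan
>>> result = scan('servo.org')   # single shot, no Postgres/Redis/Celery involved
>>> print(result)                # inspect grade/score fields and fail the build as needed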

Access-Control-Allow-Origin

The purpose of a compliance assessment is to check whether implementation A is sound and complete with respect to the standard specification B. Although a formal correctness proof would be necessary, in practical cases such as this one, where compliance is not a matter of life or death, we can skip the proof. In your case, however, not only do you skip the proof, but you take the liberty of misreading the standard and ignoring the issue repeatedly. In fact, this is the third (and last) time that I am reporting this!

Normative reference
https://www.w3.org/TR/access-control/#access-control-allow-origin-response-header

5.1 Access-Control-Allow-Origin Response Header

The Access-Control-Allow-Origin header indicates whether a resource can be shared based by
returning the value of the Origin request header, "*", or "null" in the response. ABNF:

Access-Control-Allow-Origin = "Access-Control-Allow-Origin" ":" origin-list-or-null | "*"

In practice the origin-list-or-null production is more constrained. Rather than allowing a space-separated list of origins, it is either a single origin or the string "null".

When no resources are shared, the specification says explicitly that the value of Access-Control-Allow-Origin is the string "null".

The Mozilla compliance test ignores the above and returns a penalty value of -50;
that is, you get a penalty if you implement the standard correctly!

This patch corrects the compliance test, and returns a value of +5 instead.

http-observatory/httpobs/scanner/analyzer/misc.py

+    # http://www.w3.org/TR/access-control/#access-control-allow-origin-response-header
+    if 'null' in domains:
+        output['result'] = 'cross-origin-resource-sharing-is-null'
+
     if '*' in domains:
         output['result'] = 'cross-origin-resource-sharing-implemented-with-universal-access'

http-observatory/httpobs/scanner/grader/grade.py

     # Cross-origin resource sharing
+
+    # http://www.w3.org/TR/access-control/#access-control-allow-origin-response-header
+    'cross-origin-resource-sharing-is-null': {
+        'description': 'Content is not visible via cross-origin resource sharing (CORS) files or headers',
+        'modifier': +5,
+    },

Check for Referrer Policy

As discussed on Twitter (https://twitter.com/estark37/status/770875377682001920), it would be cool if the observatory checked for the presence of Referrer Policy.

<meta name="referrer"> is already widely supported. The Observatory could also check for the Referrer-Policy HTTP header, though it's not shipped in any browser yet.

There were a couple of ideas for how to score referrer policies: you could simply make it a bonus for having one but no penalty for not having one (like HPKP); or you could give a penalty for a policy of unsafe-url and a bonus for any other policy; or you could choose a reasonable baseline policy like origin-when-cross-origin and penalize anything less restrictive than that.

But basically I think the important thing is to mention Referrer Policy at all, regardless of the actual scoring, because I suspect many devs don't know it exists.
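
For reference, the two delivery mechanisms under discussion, using origin-when-cross-origin as the example policy:

<meta name="referrer" content="origin-when-cross-origin">

Referrer-Policy: origin-when-cross-origin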

Don't penalize 'unsafe-inline' if hash or nonce source is used, in CSP test

Hello,

After analyzing blackfire.io https://observatory.mozilla.org/analyze.html?host=blackfire.io , the report tells me:

Content Security Policy (CSP) implemented, but allows 'unsafe-inline' inside script-src

This is true in the case of CSP Level 1. However, since Level 2, it's recommended to add 'unsafe-inline' to the script-src directive when you're using a nonce or a hash. In that case, 'unsafe-inline' is discarded, except by browsers that do not support CSP Level 2.

What about taking care of this?
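
For reference, the CSP Level 2 fallback pattern in question: Level 2 browsers ignore 'unsafe-inline' once a nonce (or hash) is present, while Level 1 browsers ignore the nonce and fall back to 'unsafe-inline' (the nonce value is illustrative):

Content-Security-Policy: script-src 'nonce-R4nd0mV4lu3' 'unsafe-inline'

<script nonce="R4nd0mV4lu3">/* runs under both CSP levels */</script>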

No warning when HSTS includeSubDomains is miscapitalized

Per hstspreload.appspot.com, on a domain with the incorrect capitalization:

Status: people-mozilla.org is not preloaded.
Eligibility: people-mozilla.org is eligible for preloading, although we recommend fixing the following warnings:

⚠️ Warning: Non-standard capitalization of includeSubDomains
Header contains the token `includeSubdomains`. The recommended capitalization is `includeSubDomains`.

The Observatory score report for the domain gives no indication that the header is capitalized incorrectly, especially in conjunction with ; preload.
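
For reference, the recommended capitalization in a full header:

Strict-Transport-Security: max-age=63072000; includeSubDomains; preload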

CSP scoring could be improved significantly

Scoring CSP is a very complex problem, and the simple scoring heuristic implemented here could use some improvements.

A policy such as:

default-src 'none'; script-src https:

allowing an attacker to simply inject:

<script src="https://attacker.com/evil.js"></script>

gets the maximum score for CSP. This is bad.

Furthermore, object-src is missing from that policy, allowing bypasses via Flash (see Making CSP great again! - Slide 13 - and please read the whole presentation to see other potential issues, such as CDNs hosting Angular or sites with JSONP).

Finally, 'strict-dynamic' is not supported, and common fallback mechanisms are not understood (for instance, 'unsafe-inline' when used in combination with a nonce or a hash).
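
For reference, the kind of backwards-compatible 'strict-dynamic' policy the scorer currently doesn't understand: CSP3 browsers honor the nonce and 'strict-dynamic' and ignore the rest, while older browsers fall back to https: and 'unsafe-inline' (the nonce value is illustrative):

Content-Security-Policy: script-src 'nonce-r4nd0m' 'strict-dynamic' https: 'unsafe-inline'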

Cannot specify a port for specific environment testing

Some servers, especially a localhost environment, will be presenting content on a port other than 80 and 443.

It would be great to be able to specify a port for httpobs to test against, for example httpPort=8999&httpsPort=8443.
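
The local CLI shown earlier in this README already takes port flags, which covers the localhost case at least:

$ httpobs-local-scan --http-port 8999 --https-port 8443 localhost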

no progress on Retrieving the Google HSTS Preload list

I am trying to start the scanner using this command:
HTTPOBS_DATABASE_USER="httpobsscanner" HTTPOBS_DATABASE_PASS="password" /opt/http-observatory/httpobs/scripts/httpobs-scan-worker

Initially I got this error

python3: can't open file 'httpobs/scanner/main.py': [Errno 2] No such file or directory

I updated the file httpobs-scan-worker to have it reference the full path to main.py, like so:
python3 /opt/http-observatory/httpobs/scanner/main.py

On the second attempt to start the scanner, the above error went away, but now I am stuck at the output below - it never times out, but I waited long enough with no progress; same result on Ubuntu 15.10 and 16.04 (VMware images).

Retrieving the Google HSTS Preload list
Retrieving the Google HSTS Preload list

However, I am able to read the chromium link directly from python cli:

>>> import requests
>>> requests.get('https://chromium.googlesource.com/chromium/src/net/+/master/http/transport_security_state_static.json')
<Response [200]>

I am now stuck without direction and have spent hours to get this far; any assistance getting me past this would be much appreciated.
