https-everywhere-checker's People

Contributors

graingert, grimreaper, hiviah, jsha, pde, thorsten-sick

https-everywhere-checker's Issues

String formatting not working properly.

ERROR %(filename)s: Not enough tests (%(actual_count)d vs %(needed_count)d) for %(rule)s

One possible fix is to replace .format(problem) with % problem, so that

logging.error("%(filename)s: Not enough tests (%(actual_count)d vs %(needed_count)d) for %(exclusion)s".format(problem))

becomes

logging.error("%(filename)s: Not enough tests (%(actual_count)d vs %(needed_count)d) for %(exclusion)s"%problem)
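
The difference between the two calls can be reproduced outside the checker. This is a minimal sketch; the contents of the `problem` dict are hypothetical stand-ins for whatever the checker actually builds.

```python
# Hypothetical values; the real dict is assembled inside the checker.
problem = {
    "filename": "Example.xml",
    "actual_count": 1,
    "needed_count": 3,
    "exclusion": "^http://example\\.com/",
}
template = (
    "%(filename)s: Not enough tests "
    "(%(actual_count)d vs %(needed_count)d) for %(exclusion)s"
)

# str.format() only substitutes {}-style fields, so a %-style template
# comes back unchanged. This is the behavior reported above.
unformatted = template.format(problem)

# The % operator fills %(name)s fields from a mapping.
formatted = template % problem

print(unformatted)
print(formatted)
```

The logging module also accepts a single mapping as its argument, so logging.error(template, problem) should behave the same way and defers the formatting until the record is emitted.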

Self-signed root certificate causes failure

Some HTTPS sites include the self-signed root cert in the certificate chain they provide. This is always useless: that root cert is either in the trust store or not, and including it doesn't help prove the identity of the site. However, it's harmless except for wasted bytes and browsers don't treat it as an error.

Fetching such a site with the checker produces an error, though:

cd https-everywhere
./fetch-test.sh src/chrome/content/rules/Matt_Wilcox.net.xml
...
ERROR src/chrome/content/rules/Matt_Wilcox.net.xml: Fetch error: http://www.mattwilcox.net/ => https://www.mattwilcox.net/: (60, 'SSL certificate problem: self signed certificate in certificate chain')

openssl s_client -connect www.mattwilcox.net:443 -showcerts | grep '[si]:'
...
 3 s:/C=IL/O=StartCom Ltd./CN=StartCom Certification Authority G2
   i:/C=IL/O=StartCom Ltd./CN=StartCom Certification Authority G2

Fetching with command-line curl works fine, surprisingly:

$ curl -I https://mattwilcox.net
HTTP/1.1 200 OK

Target matching semantics are subtly different from HTTPS Everywhere's

In HTTPS Everywhere, right wildcards don't match arbitrarily deep (unlike left wildcards, which do). For example, google.* will match google.com but not google.com.au. However, I think https-everywhere-checker does match arbitrarily deep. We should fix this to match the HTTPS Everywhere behavior, which is intentional.

Up until recently this was a little ambiguous on https://www.eff.org/https-everywhere/rulesets, so I've updated it to clarify.
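
The intended semantics described above could be sketched roughly as follows. The function name `target_matches` is hypothetical and this is not the checker's code. Treating a left wildcard as not matching the bare domain is an assumption.

```python
def target_matches(target, host):
    """Return True if `host` matches the ruleset `target` pattern.

    Hypothetical helper illustrating the semantics from this issue.
    """
    if target.startswith("*."):
        # Left wildcard: any number of labels may stand in for "*".
        suffix = target[1:]  # e.g. ".example.com"
        # Assumption: the bare domain itself is not matched.
        return host.endswith(suffix) and len(host) > len(suffix)
    if target.endswith(".*"):
        # Right wildcard: exactly one label may stand in for "*".
        prefix = target[:-1]  # e.g. "google."
        if not host.startswith(prefix):
            return False
        rest = host[len(prefix):]
        return rest != "" and "." not in rest
    # No wildcard: exact match only.
    return target == host


print(target_matches("google.*", "google.com"))
print(target_matches("google.*", "google.com.au"))
print(target_matches("*.example.com", "a.b.example.com"))
```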

PEP8 cleanup

Hi,

My IDE complains about some PEP8 issues (especially tabs). If you want, I can fix these issues, replace the tabs with spaces, and create a PR. What do you think?
It would make my IDE happy...

Mysterious failures on CDG_Commerce rule

python2.7 https-everywhere-checker/src/https_everywhere_checker/check_rules.py https-everywhere-checker/manual.checker.config src/chrome/content/rules/CDG_Commerce.com.xml 

ERROR src/chrome/content/rules/CDG_Commerce.com.xml: Fetch error: http://cdgcommerce.com/ => https://cdgcommerce.com/: (28, 'Operation timed out after 15001 milliseconds with 0 bytes received')
ERROR src/chrome/content/rules/CDG_Commerce.com.xml: Fetch error: http://myapp.cdgcommerce.com/ => https://myapp.cdgcommerce.com/: (28, 'Operation timed out after 15001 milliseconds with 0 bytes received')
ERROR src/chrome/content/rules/CDG_Commerce.com.xml: Fetch error: http://secure.cdgcommerce.com/ => https://secure.cdgcommerce.com/: (28, 'Operation timed out after 15001 milliseconds with 0 bytes received')
ERROR src/chrome/content/rules/CDG_Commerce.com.xml: Fetch error: http://www.cdgcommerce.com/ => https://www.cdgcommerce.com/: (28, 'Operation timed out after 15001 milliseconds with 0 bytes received')

These domains fetch fine from command line curl and in the browser.

Python 3

There are a few bits remaining which are Python 2 syntax.

> python3 -m pyflakes src/https_everywhere_checker/*.py setup.py 
src/https_everywhere_checker/check_rules.py:90:20: invalid syntax
                        except Exception, e:
                                        ^
src/https_everywhere_checker/gvgen.py:125:60: invalid syntax
                                print "/* (newLink): Cannot get the destination edge */"
                                                                                       ^
src/https_everywhere_checker/http_client.py:425:22: invalid syntax
        except BaseException,e: #this will trap KeyboardInterrupt as well
                            ^
src/https_everywhere_checker/metrics.py:21: undefined name 'NotImlementedError'
src/https_everywhere_checker/metrics.py:76: undefined name 'basestring'
src/https_everywhere_checker/rules.py:151:20: invalid syntax
                        except Exception, e:
                                        ^
src/https_everywhere_checker/rule_trie.py:123:11: invalid syntax
                print " "*offset,
                        ^
setup.py:3: 'os' imported but unused
setup.py:4: 'sys' imported but unused
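
For reference, the Python 3 spellings of the constructs flagged above look roughly like this. It is a standalone sketch, not a patch against the checker; `risky`, `Metric`, and `is_text` are hypothetical names used only for illustration.

```python
def risky():
    raise ValueError("boom")


# Python 2: except Exception, e:
try:
    risky()
except Exception as e:
    message = str(e)

# Python 2: print "text"
print("/* (newLink): Cannot get the destination edge */")

# Python 2: print " "*offset,   (trailing comma suppresses the newline)
offset = 4
print(" " * offset, end=" ")
print("indented")


# metrics.py:21 misspells the built-in NotImplementedError.
class Metric(object):
    def distance(self, a, b):
        raise NotImplementedError()


# basestring does not exist in Python 3; str covers text there.
def is_text(value):
    return isinstance(value, str)
```

The `except ... as e` form also works on Python 2.6 and later, so that part of the change need not break Python 2.7 users.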

License and fork sync

As you are no doubt aware, there is a lively fork at https://github.com/EFForg/https-everywhere/tree/master/test/rules , but it is not installable as they have deleted the __init__.py.

They found a license problem (EFForg/https-everywhere#6466), and I have raised stricaud/gvgen#11 to see if it can be solved properly.

EFF is using the same license, so it should be fine to pull their changes back into this repo. If you don't intend to maintain this, maybe we can fix the EFF fork to be installable and ask them to publish a new version periodically, perhaps under a new name.

Checker incorrectly reports missing Location for bitmex.com

Steps to reproduce:

  1. Check out https-everywhere repo and init submodules
  2. exec python2.7 https-everywhere-checker/src/https_everywhere_checker/check_rules.py https-everywhere-checker/manual.checker.config src/chrome/content/rules/BitMEX.com.xml

Expected results:

Success

Actual results:

ERROR src/chrome/content/rules/BitMEX.com.xml: Fetch error: http://bitmex.com/ => https://bitmex.com/: Redirect for 'https://bitmex.com/' missing Location

Fetching both the HTTP and HTTPS versions with the curl command-line tool shows Location headers at each hop of both.

Allow certain failures for 'from' part of rewrite

Right now the checker fetches both the HTTP version and the rewritten HTTPS version of a URL, and fails if either one fails.

There are a small number of cases in which we don't mind if the HTTP version fails. Specifically, a connection failure or timeout on the HTTP version is more or less okay. These failures prevent us from measuring a delta between the HTTP and HTTPS versions, but as long as the rule is simple (rewrite everything on HTTP to the same path on HTTPS), it's okay to assume the resources are equivalent.
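
One way the proposed policy might be expressed is sketched below. Everything here is hypothetical: the failure-kind strings, the `rule_is_simple` flag, and the returned labels are illustrative and do not reflect the checker's actual types.

```python
# Hypothetical failure kinds; the checker's real error types may differ.
TOLERATED_HTTP_FAILURES = {"connection_failed", "timeout"}


def evaluate_fetch(http_error, https_error, rule_is_simple):
    """Classify one test URL.

    `http_error` and `https_error` are a failure kind or None on success.
    `rule_is_simple` means the rule rewrites HTTP to the same path on HTTPS.
    """
    if https_error is not None:
        # The rewritten URL must always be fetchable.
        return "fail"
    if http_error is None:
        # Both sides fetched; the responses can be compared as today.
        return "compare"
    if http_error in TOLERATED_HTTP_FAILURES and rule_is_simple:
        # HTTPS works and HTTP is merely unreachable: accept without a delta.
        return "pass_without_comparison"
    return "fail"


print(evaluate_fetch("timeout", None, True))
print(evaluate_fetch("timeout", None, False))
```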

Does not catch problematic certificate chains

This one is inspired by EFForg/https-everywhere#4280 (comment) and EFForg/https-everywhere#2585. I tried running the checker against a ruleset that would cause problems in the wild and that I expected the checker to catch. Rulesets with targets serving broken chains show up in pull requests every now and then, so it would be nice if the existing test framework caught them. Unfortunately, it does not.

Steps to reproduce:

  • Run the checker on a ruleset with target hosts serving bad chains (such as this one).

Expected results:

  • An output of the form ERROR rules/Drugcom.de.xml: Fetch error: http://www.drugcom.de/ => https://www.drugcom.de/: (6, 'SSL certificate problem: unable to get local issuer certificate')

Actual results:

  • Test passes

Require minimum test URLs per target host

For ruleset coverage testing, we currently check that each rule and exclusion has a number of covering test URLs. We should add logic that checks that each target host has at least one test URL, and left-wildcard target hosts have at least three, and right-wildcard target hosts have at least ten.
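
A rough sketch of the proposed check follows. The helper names are hypothetical, only leading "*." and trailing ".*" wildcards are handled, and right wildcards are assumed to match a single label, as in HTTPS Everywhere.

```python
import re
from urllib.parse import urlparse


def required_tests(target):
    """Minimum number of test URLs proposed for a target host."""
    if target.endswith(".*"):
        return 10  # right wildcard
    if target.startswith("*."):
        return 3   # left wildcard
    return 1       # plain host


def target_regex(target):
    """Compile a target pattern into a regex over hostnames."""
    if target.startswith("*."):
        # Left wildcard: one or more labels before the fixed suffix.
        return re.compile(r"^.+\." + re.escape(target[2:]) + "$")
    if target.endswith(".*"):
        # Right wildcard: exactly one label after the fixed prefix.
        return re.compile("^" + re.escape(target[:-2]) + r"\.[^.]+$")
    return re.compile("^" + re.escape(target) + "$")


def undercovered_targets(targets, test_urls):
    """Return (target, actual, needed) for targets with too few test URLs."""
    hosts = [urlparse(url).hostname or "" for url in test_urls]
    problems = []
    for target in targets:
        pattern = target_regex(target)
        actual = sum(1 for host in hosts if pattern.match(host))
        needed = required_tests(target)
        if actual < needed:
            problems.append((target, actual, needed))
    return problems


targets = ["example.com", "*.example.com", "example.*"]
test_urls = [
    "http://example.com/",
    "http://www.example.com/",
    "http://a.example.com/",
    "http://b.c.example.com/",
    "http://example.org/",
]
print(undercovered_targets(targets, test_urls))
```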
