
onionscan's Introduction

What is OnionScan?


OnionScan is a free and open source tool for investigating the Dark Web. For all the amazing technological innovations in the anonymity and privacy space, one constant threat has no effective technological patch - human error.

Whether through operational security leaks or software misconfiguration, attacks on anonymity most often come not from breaking the underlying systems, but from ourselves.

OnionScan has two primary goals:

  • We want to help operators of hidden services find and fix operational security issues with their services. We want to help them detect misconfigurations and we want to inspire a new generation of anonymity engineering projects to help make the world a more private place.

  • Secondly, we want to help researchers and investigators monitor and track Dark Web sites. In fact, we want to make this as easy as possible. Not because we agree with the goals and motives of every investigating force out there - most often we don't. But by making these kinds of investigations easy, we hope to create a powerful incentive for new anonymity technology (see goal #1).

Installing

A Note on Dependencies

OnionScan requires either Go 1.6 or 1.7.

In order to install OnionScan you will need the following dependencies, which are not provided by the core Go standard library:

  • golang.org/x/net/proxy - For the Tor SOCKS proxy connection.
  • golang.org/x/crypto - For PGP parsing.
  • golang.org/x/net/html - For HTML parsing.
  • github.com/rwcarlsen/goexif - For EXIF data extraction.
  • github.com/HouzuoGuo/tiedot/db - For the crawl database.

See the wiki for guidance.

Grab with go get

go get github.com/s-rah/onionscan

Compile/Run from git cloned source

Once you have cloned the repository into somewhere that Go can find it, you can run go install github.com/s-rah/onionscan and then run the binary from $GOPATH/bin/onionscan.

Alternatively, you can run it without compiling by doing go run onionscan.go from the root of the cloned repository.

Quick Start

For a simple report detailing the high, medium and low risk areas found with a hidden service:

onionscan notarealhiddenservice.onion

The most interesting output comes from the verbose option:

onionscan --verbose notarealhiddenservice.onion

There is also a JSON output, if you want to integrate with another program or application:

onionscan --jsonReport notarealhiddenservice.onion

If you would like to use a proxy server listening on something other than 127.0.0.1:9050, then you can use the --torProxyAddress flag:

onionscan --torProxyAddress=127.0.0.1:9150 notarealhiddenservice.onion

More detailed documentation on usage can be found in doc.

What is scanned for?

A list of privacy and security problems which are detected by OnionScan can be found here.

You can also directly configure the types of scanning that onionscan does using the scans parameter.

./bin/onionscan --scans web notarealhiddenservice.onion

Running the OnionScan Correlation Lab

If you are a researcher monitoring multiple sites you will definitely want to use the OnionScan Correlation Lab - a web interface hosted by OnionScan that allows you to discover, search and tag different identity correlations.

You can find a full guide on the OnionScan correlation lab here.

onionscan's People

Contributors

antoniaklja, dballard, josephgregg, korons, laanwj, mapmeld, neilzone, s-rah, sainslie, unkaktus


onionscan's Issues

SSH-key checking

A lot of hidden services (close to 3% in my last big scan) are configured so that the .onion address serves all ports.

If SSH is being served, you can grab the key fingerprint and sometimes uncloak the HS by checking it against Shodan or your own database of scans.

Example code (in Python) to do this is here: https://github.com/0x27/ssh_keyscanner
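
For comparison, here is a minimal Go sketch of the same idea: grab a host key fingerprint through the local Tor SOCKS proxy using a recent golang.org/x/crypto/ssh plus the golang.org/x/net/proxy package OnionScan already depends on. The proxy address and onion hostname are placeholders, and this is not OnionScan's actual implementation:

package main

import (
    "fmt"
    "log"
    "net"

    "golang.org/x/crypto/ssh"
    "golang.org/x/net/proxy"
)

func main() {
    // Dial through the local Tor SOCKS proxy (placeholder address).
    dialer, err := proxy.SOCKS5("tcp", "127.0.0.1:9050", nil, proxy.Direct)
    if err != nil {
        log.Fatal(err)
    }
    target := "notarealhiddenservice.onion:22"
    conn, err := dialer.Dial("tcp", target)
    if err != nil {
        log.Fatal(err)
    }
    config := &ssh.ClientConfig{
        // Record the host key rather than verifying it. The handshake will
        // later fail for lack of auth methods, which is fine for our purposes.
        HostKeyCallback: func(hostname string, remote net.Addr, key ssh.PublicKey) error {
            fmt.Println(ssh.FingerprintSHA256(key))
            // ssh.FingerprintLegacyMD5(key) gives the MD5 form that scan
            // databases have historically displayed.
            return nil
        },
    }
    if c, _, _, err := ssh.NewClientConn(conn, target, config); err == nil {
        c.Close()
    }
}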

Getting SOCKS connection errors after it was working previously

bin/onionscan -torProxyAddress 127.0.0.1:9050 -verbose -jsonReport http://legionhiden4dqh4.onion/
2016/04/11 20:48:30 Starting Scan of http://legionhiden4dqh4.onion/
2016/04/11 20:48:30 This might take a few minutes..

2016/04/11 20:48:30 Error running scanner: Get http://http://legionhiden4dqh4.onion/: Can't complete SOCKS5 connection.

Tor's log message:

20:48:30 [NOTICE] Application asked to connect to port 0. Refusing. [18 duplicates hidden]

18 SocksPort 127.0.0.1:9050 # Default: Bind to localhost:9050 for local connections.

Tor is being started by arm.
Tor is the latest git pull, running on the latest Yosemite.

Consider CARONTE's ideas for location leaks

There's a 2015 research paper that presented a tool, CARONTE, that attempts to find a number of configuration issues that can cause location leaks:
https://software.imdea.org/~juanca/papers/caronte_ccs15.pdf

In particular, they talk about:

  • Finding "broken" links on onion pages (Section 3.3.1).
  • Finding unique strings on onion pages, and then googling for them (Section 3.3.2).
  • Finding reused X.509 certificates (Section 3.3.3).

The techniques turned out to be mildly successful:

We apply CARONTE to 1,974 hidden services, fully recovering the IP address of 100 (5%) of them.

Maybe some of it is helpful to onionscan!

Flag Unintentional Leaks in HTML Comments

Sometimes we find identifiers like Bitcoin addresses commented out in code - we still extract these because we do a very simple regex across the page snapshot. OnionScan should tell the user when we have found something in plain sight versus when we have discovered it unintentionally.

This will likely involve filtering out the text as part of the spider crawl and storing it in Page - perhaps also filtering comments into their own section - so that we don't need the entire page snapshot.

Output should be via SimpleReport.
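
As a sketch of what the comment extraction could look like (not the actual implementation), golang.org/x/net/html - already a dependency - exposes comment nodes directly; the Bitcoin address below is just an illustrative string:

package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

// extractComments walks the parsed tree and collects every comment node.
func extractComments(n *html.Node, out *[]string) {
    if n.Type == html.CommentNode {
        *out = append(*out, n.Data)
    }
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        extractComments(c, out)
    }
}

func main() {
    page := `<html><!-- donate: 1BoatSLRHtKNngkdXEeobR76b53LETtpyT --><body>hi</body></html>`
    doc, err := html.Parse(strings.NewReader(page))
    if err != nil {
        panic(err)
    }
    var comments []string
    extractComments(doc, &comments)
    for _, c := range comments {
        fmt.Println(c) // candidates for the "unintentional" bucket
    }
}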

Database Support

As a follow-on from #3, it would be interesting to have an option to connect results and scanning up to a database.

Correlating data across multiple onions can lead to some amusing results, such as finding that a bunch of onions all share the same SSH fingerprint (hosting providers...). The data by itself is not much help, but once you correlate it all, you can end up deanonymizing a whole cluster of onions in one go, because one of the hosts on the server is leaking something interesting.

This could be connected up to external APIs such as Shodan, on a schedule, with re-checking of things like SSH keys every now and then.

Automated Fingerprinting

At the moment any fingerprinting would have to be done by an extra process (or by hand) - it would be nice to automate some of this within the scanner, most likely requiring a database of pre-scanned hidden services.

Design 3rd Party Database Lookups

There have been discussions and suggestions in #3 and #6 about using external services such as TinEye and Shodan to compare collected fingerprints. Since hidden services may be scanned with OnionScan, possibly even by their own operators, all clearnet accesses made by OnionScan should have the option of being routed through an anonymising service, to reduce the chance of correlating scanners with sites.

Refactor Standard Page Scan

Expanding OnionScan to check the whole site - for example, to find encryption keys (#20) - means refactoring protocols/standard_page_scan.go to:

  • Extract each check into its own module (we need to be able to turn these on and off, and to test them)
  • Run checks once per page; some of the current code assumes each check happens only once (like grabbing the page title)
  • Ensure we don't scan pages, images or directories multiple times (see also: #22)

Service detection for non-standard ports

While running OnionScan today, I noted an unusual edge case where a service is not on the expected port.

2016/08/11 09:57:12 ERROR: Get http://xxxxxxxxxxxxxxxx.onion/images: malformed HTTP response "SSH-2.0-OpenSSH_7.2"

Effectively, OnionScan tries to run HTTP fingerprinting against an SSH service, because the SSH daemon is bound to port 80.

The suggested enhancement that I can think of is to do some rudimentary banner parsing on connect and engage the correct fingerprinting engine based on the response.
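
A rough sketch of that idea, with an illustrative (and deliberately tiny) protocol table - not OnionScan's actual detection logic:

package main

import (
    "bufio"
    "fmt"
    "net"
    "strings"
    "time"
)

// detectProtocol peeks at the first line a service sends after connect.
func detectProtocol(conn net.Conn) string {
    conn.SetReadDeadline(time.Now().Add(5 * time.Second))
    banner, err := bufio.NewReader(conn).ReadString('\n')
    if err != nil && banner == "" {
        // No greeting banner: likely a client-speaks-first protocol like HTTP.
        return "http?"
    }
    switch {
    case strings.HasPrefix(banner, "SSH-"):
        return "ssh"
    case strings.HasPrefix(banner, "220 "):
        return "smtp-or-ftp"
    default:
        return "unknown"
    }
}

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:22") // placeholder target
    if err != nil {
        fmt.Println(err)
        return
    }
    defer conn.Close()
    fmt.Println(detectProtocol(conn))
}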

Getting an error trying to run onionscan

I'm running Ubuntu GNOME 15.04 with golang package installed.

~/onionscan$ ./onionscan.go
./onionscan.go: line 1: package: command not found
./onionscan.go: line 3: syntax error near unexpected token `newline'
./onionscan.go: line 3: `import ('

Load domains to check from a file

I've heard from more than a couple of people already who have wrapped OnionScan in python/perl/bash just to scan more than 1 domain.

Let's make this easy and provide a -f option, taking a file with one domain per line.
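
A minimal sketch of what the -f handling might look like, where scanDomain is a hypothetical stand-in for kicking off a real scan:

package main

import (
    "bufio"
    "flag"
    "fmt"
    "log"
    "os"
)

func scanDomain(domain string) {
    fmt.Println("scanning", domain) // placeholder for the real scan
}

func main() {
    list := flag.String("f", "", "file containing one onion domain per line")
    flag.Parse()
    f, err := os.Open(*list)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        if domain := scanner.Text(); domain != "" {
            scanDomain(domain)
        }
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}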

Internationalization for SimpleReport

To be able to give the report in the user's native language, we could upload the strings for the SimpleReport to a service such as Transifex to get them translated into various languages, then periodically import these translations and map them onto the output when requested, or based on the user's locale.

Better Timeout Policies

Some of the new improvements, e.g. the spider/ and bitcoin changes, have dramatically increased the expected running time for certain sites. For example, scanning for onion peers in bitcoin takes a rather long time, and a user who configures that alongside a small timeout should likely be warned that it is a bad idea.

On top of that, we need to put some thought into why timeouts exist and how they can be helpful. Some thoughts:

  • Some sites are really really slow but for large scans we want to wait for them.
  • If the first protocol scan succeeds we probably want to ignore timeouts and keep checking since we know at least something is there.
  • As said above, some protocol scans are really slow and can potentially contradict and confuse user-specified timeouts.
  • Currently timeouts are applied at two levels - per web page and on the scan as a whole - which makes predicting how long a scan will take pretty difficult.

Update README.md / Docs for Onionscan 0.2

OnionScan 0.2 will add a large number of scans and features. These need to be documented somewhere - main features in the README and more detailed usage examples somewhere new (to be defined...maybe readthedocs or something)

Look for cryptocurrency private keys?

May make sense to look for Bitcoin (etc) private keys in Wallet Import Format.

These are also represented in base58 but have different prefix bytes and sizes, so would be a relatively small change to deanonymization/check_bitcoin_addresses.go.

This would be another CRITICAL (or at least HIGH) risk for the simple report.
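
For illustration: mainnet WIF keys are base58 strings of 51 characters starting with '5' (uncompressed) or 52 characters starting with 'K' or 'L' (compressed). A hedged sketch of the matching follows - a real check would also base58-decode and verify the 0x80 version byte and the checksum; the key below is the well-known documentation example, not a live key:

package main

import (
    "fmt"
    "regexp"
)

// Matches uncompressed (5...) and compressed (K.../L...) mainnet WIF keys.
var wifRegexp = regexp.MustCompile(`\b(5[HJK][1-9A-HJ-NP-Za-km-z]{49}|[KL][1-9A-HJ-NP-Za-km-z]{51})\b`)

func main() {
    page := "backup: 5HueCGU8rMjxEXxiPuD5BDku4MkFqeZyd4dZ1jvhTVqvbTLvyTJ"
    fmt.Println(wifRegexp.FindAllString(page, -1))
}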

Scan Hanging.

I've noted that some sites seem to trigger a scan hang. I have set the timeout (-timeout 1) but the scan still seems to hang right after the mod_status check.

onionscan -timeout 1 -depth 0 -verbose hafacwgmrntoolno.onion
2016/06/14 11:43:56 Starting Scan of hafacwgmrntoolno.onion
2016/06/14 11:43:56 This might take a few minutes..

2016/06/14 11:43:56 Checking hafacwgmrntoolno.onion http(80)
2016/06/14 11:43:56 Found potential service on http(80)
2016/06/14 11:43:59 HTTP response headers:
2016/06/14 11:43:59 CONTENT-TYPE : text/html
2016/06/14 11:43:59 VARY : Accept-Encoding
2016/06/14 11:43:59 X-FRAME-OPTIONS : sameorigin
2016/06/14 11:43:59 X-XSS-PROTECTION : 1; mode=block
2016/06/14 11:43:59 ACCEPT-RANGES : bytes
2016/06/14 11:43:59 X-CONTENT-TYPE-OPTIONS : nosniff
2016/06/14 11:43:59 DATE : Tue, 14 Jun 2016 18:46:51 GMT
2016/06/14 11:43:59 SERVER : Apache
2016/06/14 11:43:59 LAST-MODIFIED : Wed, 02 Sep 2015 09:26:18 GMT
2016/06/14 11:43:59 ETAG : "ac27-51ec04248c771-gzip"
2016/06/14 11:44:00 Apache mod_status Not Exposed...Good!

Any idea why this might be occurring? I'm running a new version of OnionScan that I just grabbed.

Add Support for i2p

OnionScan should support connecting to .i2p eepsites - they can suffer from the same opsec issues as Tor hidden services.
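
For comparison with the Tor SOCKS path, a sketch of fetching an eepsite through I2P's conventional local HTTP proxy (127.0.0.1:4444 by default; the hostname is a placeholder):

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "net/url"
)

func main() {
    proxyURL, err := url.Parse("http://127.0.0.1:4444") // I2P's default HTTP proxy
    if err != nil {
        log.Fatal(err)
    }
    client := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
    resp, err := client.Get("http://notarealeepsite.i2p/")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(len(body), "bytes fetched")
}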

Split Resources into Chunks for Storing

Currently we drop resources larger than 2MB because of limitations with the database - regardless of which backing store we end up with in the future, we are likely always going to need to chunk blobs.

We should

  • Configure a maximum object size for downloaded resources (probably defaulting to around 10MB)

  • Split resources smaller than this into <1MB chunks and store them in the DB in a way that can be put back together later, e.g.

      type ResourceChunk struct {
          URL       url.URL // from net/url
          Data      []byte
          NextChunk int // id of the resource chunk containing the next part of the data
      }
    
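A rough sketch of the splitting step, building on the struct above (it assumes a "net/url" import); here NextChunk is an index into the returned slice standing in for a real database id, with -1 marking the final chunk:

func splitIntoChunks(u url.URL, data []byte, chunkSize int) []ResourceChunk {
    var chunks []ResourceChunk
    for i := 0; i < len(data); i += chunkSize {
        end := i + chunkSize
        if end > len(data) {
            end = len(data)
        }
        next := -1 // no next chunk
        if end < len(data) {
            next = len(chunks) + 1 // the next chunk is appended immediately after
        }
        chunks = append(chunks, ResourceChunk{URL: u, Data: data[i:end], NextChunk: next})
    }
    return chunks
}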

Protocol Detection/Correction

I have occasionally observed some onions serving traffic on the wrong port, e.g. SSH on port 25 or SSH on port 5900 - these behaviors could be intentional or misconfigurations. We probably need to refactor the flow to: Check Port -> Detect Protocol -> Fingerprint -> Scan.

Add Port Scanning & Eventing Base

To nicely implement #3 & #7 we should first have OnionScan scan various ports, and if a successful connection is made then fire off a goroutine to go run the associated tests. This would then allow us to more easily build out other protocols.
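
A minimal sketch of that flow, with runProtocolTests as a hypothetical placeholder for the per-protocol test suites:

package main

import (
    "fmt"
    "net"
    "sync"
    "time"
)

func runProtocolTests(port int, conn net.Conn) {
    defer conn.Close()
    fmt.Printf("port %d open, running tests\n", port) // placeholder for the real tests
}

func main() {
    ports := []int{22, 80, 443, 6667, 8333}
    var wg sync.WaitGroup
    for _, port := range ports {
        conn, err := net.DialTimeout("tcp", fmt.Sprintf("127.0.0.1:%d", port), 5*time.Second)
        if err != nil {
            continue // closed or filtered; nothing to test
        }
        wg.Add(1)
        go func(p int, c net.Conn) {
            defer wg.Done()
            runProtocolTests(p, c)
        }(port, conn)
    }
    wg.Wait()
}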

Extract Page Title

It would be nice to have some grasp of what a page is about - <title> should be good enough, and it is another possible fingerprinting mechanism.

Configuration for Attacks instead of Hard Coding

It would be nice to specify new attacks as JSON files that can be interpreted by the scanner. E.g. something like:

{
    "name":"Apache mod_status is Accessible",
    "location":"/server-status",
    "requirements": [
        {"equals": ["http-status-code", 200]},
        {"contains":["contents","Server Version: (.*)</dt>"]}
    ]
}

With extra reporting options and such, this would clean the reporting code up.
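
A sketch of how such definitions might be decoded in Go; the struct fields simply mirror the example above and are not an existing OnionScan API:

package main

import (
    "encoding/json"
    "fmt"
)

// Attack mirrors the JSON structure proposed above.
type Attack struct {
    Name         string                     `json:"name"`
    Location     string                     `json:"location"`
    Requirements []map[string][]interface{} `json:"requirements"`
}

func main() {
    raw := `{
        "name": "Apache mod_status is Accessible",
        "location": "/server-status",
        "requirements": [
            {"equals": ["http-status-code", 200]},
            {"contains": ["contents", "Server Version: (.*)</dt>"]}
        ]
    }`
    var a Attack
    if err := json.Unmarshal([]byte(raw), &a); err != nil {
        panic(err)
    }
    for _, req := range a.Requirements {
        for op, args := range req {
            fmt.Println(a.Location, op, args) // dispatch to the matching check here
        }
    }
}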

Depth Limiting on Directory Scanning

While following directory listings can be fruitful, it can also be fairly expensive in terms of time and bandwidth. Provide a -d option to limit how deep we scan (default: 0, meaning scan everything).

can't 'go run onionscan.go'

user@ubuntu:~/go/src/github.com/s-rah/onionscan$ go run onionscan.go

golang.org/x/crypto/ed25519

/home/user/onion/src/golang.org/x/crypto/ed25519/ed25519.go:54:66: error: reference to undefined identifier ‘crypto.SignerOpts’
func (priv PrivateKey) Sign(rand io.Reader, message []byte, opts crypto.SignerOpts) (signature []byte, err error) {
^
lool@ubuntu:~/go/src/github.com/s-rah/onionscan$

Fingerprint Images on Site

Take the sha1/md5/whatever of each image on the front page of the site. This will feed into the fingerprint later on.

Use cases:

  • Detecting onioncloner copies
  • #6
  • General fingerprinting of resources.
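
A one-function sketch of the hashing step (SHA-1 shown, but any stable digest works):

package main

import (
    "crypto/sha1"
    "fmt"
)

// fingerprintImage returns a hex digest of an image's raw bytes.
func fingerprintImage(data []byte) string {
    return fmt.Sprintf("%x", sha1.Sum(data))
}

func main() {
    fmt.Println(fingerprintImage([]byte("placeholder image bytes")))
}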

Scan Each Page of the Site

Most sites aren't a single page; we should crawl the site to find issues, e.g. a PGP key on the /contact page. Depends on #32.

Analytics Framework

Currently SimpleReport is the only kind of post-analytics we do. This can definitely be expanded.

Some examples of post-processing steps we likely want in the core onionscan base:

  • Use the new Database functionality to compare reports to automatically find correlations.
  • Automatically identify new onion services.
  • Perform site-specific follow-on actions, e.g. if we know that a site stores all user profiles under /user?name={username}, we can use that information to prepare an even more detailed report.

Any analytics performed by OnionScan should be modular and configurable. It might make sense for OnionScan to accept a json formatted config file detailing the exact flow that it should undertake.

At the same time, we should try to minimize the amount of code dedicated to analytics that is best performed by other dedicated applications (one example that comes to mind is stylometry that requires ML models and databases of known samples - we likely do not want to support that).

Extensible Messaging and Presence Protocol

I feel it might be helpful to add support for popular XMPP (Extensible Messaging and Presence Protocol) servers.

Fingerprinting popular XMPP servers might yield a significant amount of useful data, and I don't feel it's at all an unrealistic scenario: it's not uncommon for public-facing XMPP servers to also cater for access through special-use suffixes such as .onion.

It would also help to collect and parse XMPP servers' X.509 certificates, as they may contain useful identifying information - other hostnames or IP addresses - and could help ascertain whether other XMPP servers exist or establish potential co-hosting of other services.

This might also aid the identification and correlation of public-facing XMPP servers or other services, building upon prior assumptions.

Suggest a tool & Online Service

Unlike other darknet hosts, I have been running an ethical/lawful website for years, and the appearance of this tool makes me... nervous.

The good news is I don't use Apache, and the server is Tor-exclusive.

Do you have a plan to release an online web tool that anyone can use to check easily (like Qualys SSL Labs)?

Also, what do you recommend as an "image anonymizer" (change timestamps & strip metadata) for Linux?

Better IRC Scanning

We currently only check known IRC ports; we don't do much in the way of confirmation. We should actually connect (and, in the case of IRCS, pull the X.509 certificate).

We may also want to consider snapshotting the IRC welcome message and channel list.
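
A quick sketch of the IRCS certificate pull, with a placeholder host and the conventional IRCS port 6697; in practice the dial would go through the Tor SOCKS proxy (omitted here for brevity), and verification is skipped because onion services rarely present CA-signed certificates:

package main

import (
    "crypto/tls"
    "fmt"
    "log"
)

func main() {
    conn, err := tls.Dial("tcp", "notarealhiddenservice.onion:6697", &tls.Config{
        InsecureSkipVerify: true, // record the cert; don't expect a CA chain
    })
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    for _, cert := range conn.ConnectionState().PeerCertificates {
        // Subjects and SANs here can leak co-hosted names.
        fmt.Println(cert.Subject.CommonName, cert.DNSNames)
    }
}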

onionscan doesn't compile on 32-bit (dependency issue)

One of the dependencies, HouzuoGuo/tiedot, explicitly only works on x86_64 right now, and fails to build on ARM32:

tiedot should be compiled/run on x86-64 systems.
If you decide to compile tiedot on 32-bit systems, the following integer-smear algorithm will cause compilation failure due to 32-bit integer overflow; therefore you must modify the algorithm.
Do not remove the integer-smear process, and remember to run test cases to verify your mods.

It doesn't seem that difficult to solve (possibly just replacing int with uint64); however, this needs to be resolved before onionscan can run on most Android devices and such.

Edit: There's an upstream issue for this (HouzuoGuo/tiedot#68) and it's being worked on; there is even a 32-bit branch, but it isn't integrated into mainline yet.

Scan Intensity, e.g. Fingerprint versus Full Scan

Currently we assume all or nothing. It would be nice if we could configure this behavior. I can imagine a few levels (where each level includes the ones before it) like:

  1. Just Fingerprint the Service: Open Protocols, Front Page images/keys, SSH Fingerprint etc.
  2. Check for Major Flaws e.g. Localhost Bypasses.
  3. Deeper fingerprint of all web pages / other indepth protocol analysis.
  4. Full blown invasive scan, follow all directories, check every image etc.

Collect links to external sites by analysing HTML source code

We can collect all external links by checking the HTML source code (and optionally JavaScript) for links to clearnet sites (High Risk) and other onion sites (Low Risk), looking at attributes such as:

  • <img src=...>
  • <a href=...>
  • <script src=...>
  • <iframe src=...>

and so on (a rough sketch follows below).

What do you think about it?
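
A possible sketch of that collection using the tokenizer from golang.org/x/net/html, with the simple clearnet/onion risk split suggested above:

package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    page := `<a href="https://example.com/">x</a><img src="http://notarealhiddenservice.onion/a.png">`
    z := html.NewTokenizer(strings.NewReader(page))
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            break // io.EOF or a parse error: stop
        }
        if tt != html.StartTagToken && tt != html.SelfClosingTagToken {
            continue
        }
        _, hasAttr := z.TagName() // TagName must be called before TagAttr
        for hasAttr {
            key, val, more := z.TagAttr()
            if k := string(key); k == "href" || k == "src" {
                link := string(val)
                risk := "High Risk (clearnet)"
                if strings.Contains(link, ".onion") {
                    risk = "Low Risk (onion)"
                }
                fmt.Println(risk, link)
            }
            hasAttr = more
        }
    }
}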
