commonspeak2's Introduction

Commonspeak2

Commonspeak2 leverages publicly available datasets from Google BigQuery to generate content discovery and subdomain wordlists.

As these datasets are updated on a regular basis, the wordlists generated via Commonspeak2 reflect the current technologies used on the web.

By using the Golang client for BigQuery, we can stream the data and process it very quickly. The future of this project will revolve around improving the quality of wordlists generated by creating automated filters and substitution functions.

Let's turn creating wordlists from a manual task into a reproducible and reliable science with BigQuery.

I just want the wordlists...

We will update the wordlists.assetnote.io website with any wordlists generated by the Commonspeak2 tool.

Wordlists are automatically generated at the end of each month and uploaded to this site. Further details here: https://github.com/assetnote/wordlists

Instructions & Usage

If you're compiling or running Commonspeak2 from source:

If you're using the pre-built binaries:

  • Download the newest release here

Upon completing the above steps, Commonspeak2 can be used in the following ways:

Subdomains

Currently subdomains are extracted from HackerNews and HTTPArchive's latest scans. Unlike the previous revision of Commonspeak, the datasets and queries have been optimised to contain valid data that occurs often in the wild.

⟩ ./commonspeak2 --project crunchbox-160315 --credentials credentials.json subdomains -o subdomains.txt

INFO[0000] Generated SQL template for HackerNews.        Mode=Subdomains
INFO[0000] Generated SQL template for HTTPArchive.       Mode=Subdomains
INFO[0000] Executing BigQuery SQL... this could take some time.  Mode=Subdomains Source=hackernews
INFO[0019] Total rows extracted 71415.                   Mode=Subdomains Silent=false Source=hackernews Verbose=false
INFO[0019] Executing BigQuery SQL... this could take some time.  Mode=Subdomains Source=httparchive
INFO[0075] Total rows extracted 484701.                  Mode=Subdomains Silent=false Source=httparchive Verbose=false

Words with extensions

Using a single query on GitHub's dataset, we can extract every path filtered by file extension. This can be done with:

⟩ ./commonspeak2 --project crunchbox-160315 --credentials credentials.json ext-wordlist -e jsp -l 100000 -o jsp.txt

INFO[0000] Executing BigQuery SQL... this could take some time.  Extensions=jsp Limit=100000 Mode=WordsWithExt Source=Github
INFO[0013] Total rows extracted 100000.                  Mode=WordsWithExt Source=Github

Any set of extensions can be passed as a comma-separated list via the -e flag, e.g. -e aspx,php,html,js.

Deleted files

Contributed by mhmdiaa

Using GitHub's commits dataset, we can extract files that developers appear to have deleted from their public repositories. These files may contain sensitive data. This can be done with:

⟩ ./commonspeak2 --project crunchbox-160315 --credentials credentials.json deleted-files -l 50000 -o deleted.txt

INFO[0000] Executing BigQuery SQL... this could take some time.  Limit=50000 Mode=DeletedFiles Source=Github
INFO[0013] Total rows extracted 50000.                  Mode=DeletedFiles Source=Github

Features in Active Development

Feel free to send pull requests to complete the features below, add datasets or improve the architecture of this project. Thank you!

Routes Based Extraction

We can create SQL statements that cover routing patterns in almost any web framework. For now, paths can be extracted from the following web frameworks:

  • Rails [working implementation ✅]
  • NodeJS [to be implemented ❎]
  • Tomcat [to be implemented ❎]

This data can be extracted using the following command:

⟩ ./commonspeak2 --project crunchbox-160315 --credentials credentials.json routes --frameworks rails -l 100000 -o rails-routes.txt

WARNING: running the above query will cost you lots of money (over $20 per framework). Commonspeak2 will prompt to confirm that this is OK. To skip this prompt use the --silent flag.

When this is run for Rails routes, Commonspeak2 does the following:

  1. Pulls Rails routes from config/routes.rb using regular expressions and the latest GitHub dataset.
  2. Processes the data, converts it into paths and makes contextual replacements so that each path is valid (e.g. converting /:id to /1234).
  3. Normalizes each path, then saves the results to disk once all processing is complete.
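
Steps 2 and 3 can be sketched in Go as follows. This is a minimal illustration and not the code in the command/ folder: the names paramPattern, substitute and concretePath are hypothetical, and the only substitution taken from this README is /:id to /1234. Treating every parameter ending in _id as numeric and replacing all other parameters with the literal "example" are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// paramPattern matches Rails-style dynamic segments such as ":id" or ":user_id".
var paramPattern = regexp.MustCompile(`:([A-Za-z_][A-Za-z0-9_]*)`)

// substitute picks a concrete value for a parameter name.
// Assumption: "id" and "*_id" parameters are numeric; everything else
// receives the placeholder word "example".
func substitute(name string) string {
	if name == "id" || strings.HasSuffix(name, "_id") {
		return "1234"
	}
	return "example"
}

// concretePath replaces dynamic segments and normalizes the result:
// it guarantees a leading slash, collapses duplicate slashes and drops
// any trailing slash.
func concretePath(route string) string {
	replaced := paramPattern.ReplaceAllStringFunc(route, func(m string) string {
		return substitute(strings.TrimPrefix(m, ":"))
	})
	return path.Clean("/" + replaced)
}

func main() {
	routes := []string{
		"/admin/users/:id",
		"posts/:slug//comments/:comment_id/",
	}
	for _, r := range routes {
		fmt.Println(concretePath(r))
	}
}
```

Keeping the substitution rule in its own function makes it easy to add further context-aware replacements without touching the normalization step.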

Scheduled Wordlist Generation

Planned feature to use a cron-like system to allow for wordlist generation from BigQuery to happen continuously.

Once this feature is introduced, the --schedule parameter can be added to any of the existing commands covered in this README, like so:

⟩ ./commonspeak2 --project crunchbox-160315 --credentials credentials.json --schedule weekly routes --frameworks nodejs,tomcat -l 100000 -o nodejs-tomcat-routes.txt

The above command will run the BigQuery query weekly and save the output to ./nodejs-tomcat-routes.txt.

Substitutions and Alterations

Generate smart substitutions and alterations for the datasets where they make sense. For example, converting string values from /admin/users/:id to /admin/users/1234 (contextually aware that the parameter is a number).

Credits

Shubham Shah @infosec_au

Michael Gianarakis @mgianarakis

License

   Copyright 2018 Assetnote

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

Assetnote Pty. Ltd. - Twitter @assetnote

commonspeak2's Issues

httparchive query is invalid

It looks like some changes have been made to the httparchive dataset:

INFO[0000] Generated SQL template for HackerNews.        Mode=Subdomains
INFO[0000] Generated SQL template for HTTPArchive.       Mode=Subdomains
INFO[0000] Executing BigQuery SQL... this could take some time.  Mode=Subdomains Source=hackernews

INFO[0022] Total rows extracted 74160.                   Mode=Subdomains Silent=false Source=hackernews Verbose=false
INFO[0022] Executing BigQuery SQL... this could take some time.  Mode=Subdomains Source=httparchive
FATA[0025] Error executing BigQuery SQL.                 Error="googleapi: Error 400: Unrecognized name: origin at [17:16], invalidQuery" Mode=Subdomains Source=httparchive

FATA[0011] Error executing BigQuery SQL. gs://commonspeak-udf/URI.min.js

FATA[0011] Error executing BigQuery SQL. Error="googleapi: Error 403: Access Denied: BigQuery BigQuery: Error getting metadata for external code resource, please verify you have provided a valid path and/or that you have access to the resource: gs://commonspeak-udf/URI.min.js, accessDenied" Mode=WordsWithExt Source=httparchive

has no field or method GlobalBool

Default installation is not working.

$ go get github.com/assetnote/commonspeak2

github.com/assetnote/commonspeak2/command/deletedfiles

go/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:33:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:34:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:35:14: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:36:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
go/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:37:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)

github.com/assetnote/commonspeak2/command/routes

go/src/github.com/assetnote/commonspeak2/command/routes/routes.go:37:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/routes/routes.go:38:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/routes/routes.go:39:14: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/routes/routes.go:40:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
go/src/github.com/assetnote/commonspeak2/command/routes/routes.go:41:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)

github.com/assetnote/commonspeak2/command/subdomains

go/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:32:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:33:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:34:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
go/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:35:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)

github.com/assetnote/commonspeak2/command/wordswithext

go/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:31:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:32:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
go/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:33:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
go/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:34:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)

commonspeak doesn't work correctly when it comes to generating routes wordlists

When I try to generate Rails routes wordlists using this command: ./commonspeak2 --project XXXX --credentials XXXXX routes --frameworks rails -l 100000 -o rails-routes.txt, it doesn't return the expected result. All it returns is:

INFO[0000] Generated SQL template for Rails routes. Mode=Routes
INFO[0000] Executing BigQuery SQL... this could take some time. Framework=rails Mode=Routes
id
staff_id
event_id
smart_proxy_id
key
loggable_id
forum_id
topic_id
conversation_id
index
record
account_id
page
invitation_token
start
end
date
project_id
nonprofit_id
user_id
agent_id
discussion_id
feed_uid
commit_id
key_id
paper_uid
group_id
auth_token
category_id
prison_id
recording_id
confirmation_token
comment_id
host_id
parent_slug
back_id
call_id
1
activation_code
days
new_id
issue_id
govtrack_id
step
provider
name
action
User
body
permalink
legislative_term
paper
slug
component_1
component_2
component_3
loggable_type
repo
address
year
month
locale
token
tags
username
archetype
link
check
apiv
query
filename
day
feed
article_name
controller
status
secret
user
code
tab
service
text
Filemanager
currency
model
platform
idp
color
url_token
admin_key
client
subscription
letter
ministry
filter
url_start
url_end
session
application
newsletter
format
category_slug
organization
INFO[0009] Total rows extracted 0. Framework=rails Mode=Routes Source=Github

FATA[0020] Error executing BigQuery SQL. Error="googleapi: Error 400: Unrecognized name: origin at [17:16], invalidQuery" Mode=Subdomains Source=httparchive

Hello
I get the error in the title when commonspeak2 tries to fetch subdomains from httparchive specifically. How can I fix this?

Command: ./commonspeak2 --project XXXXX --credentials XXXXX subdomains -o subdomains.txt

Error 403 Access Denied

Hi,

I'm getting this error when getting a subdomain list with this command:

commonspeak2 --project crunchbox-160315 --credentials creds.json subdomains -o subdomains.txt

and this as a result:

INFO[0000] Generated SQL template for HackerNews.        Mode=Subdomains
INFO[0000] Generated SQL template for HTTPArchive.       Mode=Subdomains
INFO[0000] Executing BigQuery SQL... this could take some time.  Mode=Subdomains Source=hackernews
FATA[0001] Error executing BigQuery SQL.                 Error="googleapi: Error 403: Access Denied: Project crunchbox-160315: User does not have bigquery.jobs.create permission in project crunchbox-160315., accessDenied" Mode=Subdomains Source=hackernews

How to fix this?

runtime error?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x82f66c]

goroutine 1 [running]:
github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery.(*JobIDConfig).createJobRef(0xc4200d7d40, 0x0, 0x0)
/go/src/github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery/job.go:154 +0x3c
github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery.(*Query).newJob(0xc4200d7d40, 0x9f0668, 0xa34e20, 0xc4201ee990)
/go/src/github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery/query.go:294 +0x6c
github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery.(*Query).Run(0xc4200d7d40, 0xa34e20, 0xc4201ee990, 0x0, 0x0, 0x0)
/go/src/github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery/query.go:277 +0xb5
github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery.(*Query).Read(0xc4200d7d40, 0xa34da0, 0xc4200a4050, 0x4, 0xc4200c6280, 0x34)
/go/src/github.com/assetnote/commonspeak2/vendor/cloud.google.com/go/bigquery/query.go:302 +0x43
github.com/assetnote/commonspeak2/command/wordswithext.query(0x0, 0xa34da0, 0xc4200a4050, 0xc4200ca0b0, 0xae, 0x1, 0x1, 0x0)
/go/src/github.com/assetnote/commonspeak2/command/wordswithext/helper.go:32 +0x90
github.com/assetnote/commonspeak2/command/wordswithext.CmdStatus(0xc4200bcf20, 0xc4200b6600, 0xc4200bcf20)
/go/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:88 +0x8c0
github.com/assetnote/commonspeak2/vendor/github.com/urfave/cli.HandleAction(0x8d4120, 0x9f0610, 0xc4200bcf20, 0x0, 0xc4200b6600)
/go/src/github.com/assetnote/commonspeak2/vendor/github.com/urfave/cli/app.go:501 +0xc8
github.com/assetnote/commonspeak2/vendor/github.com/urfave/cli.Command.Run(0x9b9361, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9d11be, 0x3c, 0x0, ...)
/go/src/github.com/assetnote/commonspeak2/vendor/github.com/urfave/cli/command.go:165 +0x47d
github.com/assetnote/commonspeak2/vendor/github.com/urfave/cli.(*App).Run(0xc42009a1c0, 0xc4200ac000, 0xc, 0xc, 0x0, 0x0)
/go/src/github.com/assetnote/commonspeak2/vendor/github.com/urfave/cli/app.go:259 +0x6e8
main.main()
/go/src/github.com/assetnote/commonspeak2/main.go:36 +0x175

Write BigQuery SQL for routes from NodeJS and Tomcat

See the data/sql/ folder for examples of how we currently create SQL queries for rails routes.

See the command/ folder for examples of how we modify the routes returned to produce better quality data.

Create SQL queries for NodeJS and Tomcat, create appropriate filtering and substitution methods to improve quality of output.

Compilation issue!

Hello,
While compiling commonspeak2 from source I am getting the following error:

github.com/assetnote/commonspeak2/command/deletedfiles
gotools/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:33:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:34:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:35:14: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:36:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
gotools/src/github.com/assetnote/commonspeak2/command/deletedfiles/deleted.go:37:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
github.com/assetnote/commonspeak2/command/routes
gotools/src/github.com/assetnote/commonspeak2/command/routes/routes.go:37:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/routes/routes.go:38:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/routes/routes.go:39:14: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/routes/routes.go:40:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
gotools/src/github.com/assetnote/commonspeak2/command/routes/routes.go:41:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
github.com/assetnote/commonspeak2/command/subdomains
gotools/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:32:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:33:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:34:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
gotools/src/github.com/assetnote/commonspeak2/command/subdomains/subdomains.go:35:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
github.com/assetnote/commonspeak2/command/wordswithext
gotools/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:31:17: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:32:16: c.GlobalBool undefined (type *cli.Context has no field or method GlobalBool)
gotools/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:33:14: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)
gotools/src/github.com/assetnote/commonspeak2/command/wordswithext/words.go:34:18: c.GlobalString undefined (type *cli.Context has no field or method GlobalString)

Keep compiled wordlists separate?

While I love the idea of having pre-made wordlists, I wonder if keeping them in this same repo might clutter things up or make it too big.

Would it make more sense to have a separate repo (like commonspeak2-wordlists or similar) to keep those in?

How to use commonspeak2

./commonspeak2 --project crunchbox-160315 --o subdomains.txt
Incorrect Usage. flag provided but not defined: -o

NAME:
Commonspeak 2 - Generate wordlists using BigQuery by analysing datasets that evolve constantly.

USAGE:
commonspeak2 [global options] command [command options] [arguments...]

VERSION:
0.1.0

AUTHOR:
Assetnote

COMMANDS:
ext-wordlist Generate wordlists based on extensions provided by the user.
subdomains Generates a list of subdomains from all available BigQuery public datasets.
routes Generate wordlists based on routes extracted from popular frameworks.
help, h Shows a list of commands or help for one command

GLOBAL OPTIONS:
--project value, -p value The Google Cloud Project to use for the queries.
--credentials value, -c value Refer to: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-go [credentials.json]
--verbose Enable verbose output.
--silent, -s If this is set to true, the results will be written to a file but not to STDOUT.
--test, -t If this is set to true, Commonspeak2 will execute queries against smaller, testing datasets.
--help, -h show help
--version, -v print the version
04:02:00 root kali /root/Desktop/commonspeak2_0.1.4_Linux_x86_64

./commonspeak2 --project crunchbox-160315 --o /root/Desktop/alive.txt

Incorrect Usage. flag provided but not defined: -o

