har-sanitizer's Introduction

HAR Sanitizer

Description

HAR files are JSON-formatted "recordings" of web traffic activity for a user's browser session, which are often used to troubleshoot web front-ends, REST APIs, authentication issues, etc. However, HAR files will capture everything in a web session, including passwords, sensitive form information, authentication cookies and headers, and any content embedded in HTTP requests. This makes HAR files extremely sensitive, and highly prone to privacy breaches if handled incorrectly.
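
To make the structure concrete, here is a minimal sketch of where names and values live inside a HAR document. The sample data and the helper `collect_names` are hypothetical and written for illustration only; they are not taken from har-sanitizer's code. The field names (`log`, `entries`, `request`, `response`, `cookies`, `headers`, `queryString`) follow the HAR format.

```python
# A tiny hand-written HAR fragment (illustrative values only).
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {
                    "url": "https://example.com/login?next=%2Fhome",
                    "cookies": [{"name": "sessionid", "value": "abc123"}],
                    "headers": [{"name": "Authorization", "value": "Bearer xyz"}],
                    "queryString": [{"name": "next", "value": "/home"}],
                },
                "response": {
                    "cookies": [],
                    "headers": [{"name": "Set-Cookie", "value": "sessionid=abc123"}],
                    "content": {"mimeType": "text/html", "text": "<html></html>"},
                },
            }
        ]
    }
}


def collect_names(har, field):
    """Return the sorted, unique names stored under `field` (e.g. "cookies")
    in every request and response of a HAR dict."""
    names = set()
    for entry in har.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            for item in entry.get(side, {}).get(field, []):
                names.add(item["name"])
    return sorted(names)


print(collect_names(SAMPLE_HAR, "cookies"))  # ['sessionid']
print(collect_names(SAMPLE_HAR, "headers"))  # ['Authorization', 'Set-Cookie']
```

Even this one-entry sample carries a session cookie and a bearer token in plain text, which is why an unredacted HAR file should be treated like a credential.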

This tool aims to help mitigate these concerns by offering a simple, flexible interface for redacting potentially sensitive information from HAR files. It collects the names and values of all passwords, cookies, headers, URLQuery/POSTData/HTML-Form parameters, and embedded content mimetypes, and redacts values that are either already known to be sensitive or specified by the user. It currently exists as both a client-side web tool and a Flask REST API.
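
The general idea of name-based redaction can be sketched as follows. This is an illustration under stated assumptions, not har-sanitizer's actual implementation: the function `redact_har`, the `[REDACTED]` placeholder, and the case-insensitive matching are all choices made for this example.

```python
import copy

REDACTED = "[REDACTED]"  # placeholder chosen for this sketch


def redact_har(har, wordlist, fields=("cookies", "headers", "queryString")):
    """Return a deep copy of `har` in which every name/value pair whose name
    appears in `wordlist` (compared case-insensitively) has its value replaced."""
    targets = {word.lower() for word in wordlist}
    scrubbed = copy.deepcopy(har)  # never mutate the caller's data
    for entry in scrubbed.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            section = entry.get(side, {})
            for field in fields:
                for item in section.get(field, []):
                    if item.get("name", "").lower() in targets:
                        item["value"] = REDACTED
    return scrubbed


har = {
    "log": {
        "entries": [
            {
                "request": {
                    "cookies": [{"name": "sessionid", "value": "abc123"}],
                    "headers": [{"name": "Accept", "value": "*/*"}],
                },
                "response": {},
            }
        ]
    }
}

clean = redact_har(har, ["SessionID"])
print(clean["log"]["entries"][0]["request"]["cookies"])
```

Working on a deep copy keeps the original recording intact, so a preview of the scrubbed result can be compared against the input before anything is exported.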

A live version is available at https://har-sanitizer.appspot.com/

(This is NOT an official Google product)

Installation

$ git clone https://github.com/google/har-sanitizer.git
$ cd har-sanitizer
$ virtualenv -p $(which python2.7) venv --no-site-packages
$ source venv/bin/activate
$ pip install -r requirements.txt

Local Flask site (run from the repository root directory, "./har-sanitizer/")

  1. If the virtual environment is not already activated, activate it:
$ source venv/bin/activate
  2. If desired, change the static files location in config.json. Examples:
{
  "static_files": "./harsanitizer/static"
}

-or-

{
  "static_files": "https://storage.googleapis.com/har-sanitizer/static"
}

  3. Change the port, debug, and other options in ./harsanitizer/harsan_api.py under:
app.run(...)
  4. Launch the Flask server:
$ PYTHONPATH=. python ./harsanitizer/harsan_api.py
  5. Load the HAR Sanitizer web tool by visiting "http://localhost:8080" in Chrome or Firefox (substituting '8080' with the port number, if modified).
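
As a quick sanity check before launching, the two config.json variants shown in the steps can be told apart with a few lines of Python. This is a hypothetical helper for illustration; `describe_static_location` is not part of har-sanitizer, and how the tool itself interprets "static_files" is not described here.

```python
import json
from urllib.parse import urlparse


def describe_static_location(config_text):
    """Classify the "static_files" entry of a config.json string as
    "remote" (http/https URL) or "local" (anything else, e.g. a path)."""
    config = json.loads(config_text)
    location = config["static_files"]
    scheme = urlparse(location).scheme
    return "remote" if scheme in ("http", "https") else "local"


local_cfg = '{"static_files": "./harsanitizer/static"}'
remote_cfg = '{"static_files": "https://storage.googleapis.com/har-sanitizer/static"}'

print(describe_static_location(local_cfg))   # local
print(describe_static_location(remote_cfg))  # remote
```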

Usage

Web Tool

  1. Load HAR JSON file using 'Load HAR' button.

  2. Select names of cookies/headers/parameters/content mimetypes to scrub.

  3. Preview changes before committing, modifying scrub options as necessary.

  4. Export scrubbed HAR file once ready.

API Endpoints

  • /get_wordlist - Returns default HarSanitizer wordlist.

  • /default_mimetype_scrublist - Returns default HarSanitizer mimeTypes scrub list.

  • /cookies - Returns all cookie names found in the POSTed HAR (JSON). Example (Python with the 'requests' package):

    import json, requests
    # Load the HAR file into a Python dict.
    with open("har_file.har", "r") as har_file:
        har = json.load(har_file)
    url = 'http://localhost:8080/cookies'
    headers = {"Content-Type": "application/json"}
    # POST the HAR itself as the JSON request body.
    r = requests.post(url, data=json.dumps(har), headers=headers)
    
  • /headers - Returns all header names found in the POSTed HAR (JSON). See /cookies for an example.

  • /params - Returns all URL Query and POSTData parameter names found in the POSTed HAR (JSON). See /cookies for an example.

  • /mimetypes - Returns all content mimeTypes found in the POSTed HAR (JSON). See /cookies for an example.

  • /scrub_har - Full scrub/redaction of sensitive HAR fields.

    Args:

    • har: the har json to be scrubbed

    • wordlist=None, (list of strs) appends to default wordlist

    • content_list=None, (list of strs) appends to default content_list

    • all_cookies=False, (Boolean) Redacts all cookies

    • all_headers=False, (Boolean) Redacts all headers

    • all_params=False, (Boolean) Redacts all URLQuery/POSTData parameters

    • all_content_mimetypes=False (Boolean) Redacts all content mimeTypes

    Example:

    import json, requests
    # Load the HAR file into a Python dict.
    with open("har_file.har", "r") as har_file:
        har = json.load(har_file)
    url = 'http://localhost:8080/scrub_har'
    headers = {"Content-Type": "application/json"}
    # Wrap the HAR together with the scrub options listed under Args.
    data = {"har": har, "wordlist": ['mycookie', 'mycookie2'], "all_params": True}
    r = requests.post(url, data=json.dumps(data), headers=headers)
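
The /scrub_har arguments listed above can also be assembled with a small helper that rejects misspelled option names before anything is sent. This is a sketch using only the Python 3 standard library; `build_scrub_payload`, `post_json`, and `BASE_URL` are hypothetical names introduced here, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: the default local port

# Boolean options documented for /scrub_har.
ALLOWED_FLAGS = {"all_cookies", "all_headers", "all_params", "all_content_mimetypes"}


def build_scrub_payload(har, wordlist=None, content_list=None, **flags):
    """Build the JSON-serializable body for /scrub_har, validating flag names."""
    unknown = set(flags) - ALLOWED_FLAGS
    if unknown:
        raise ValueError("unknown option(s): %s" % ", ".join(sorted(unknown)))
    payload = {"har": har}
    if wordlist is not None:
        payload["wordlist"] = list(wordlist)
    if content_list is not None:
        payload["content_list"] = list(content_list)
    payload.update({name: bool(value) for name, value in flags.items()})
    return payload


def post_json(path, payload):
    """POST `payload` as JSON to BASE_URL + path and decode the reply.
    Assumes the server answers with a JSON body."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the payload needs no running server:
example = build_scrub_payload({"log": {"entries": []}},
                              wordlist=["mycookie"], all_params=True)
print(json.dumps(example, sort_keys=True))

# With the Flask server running locally, the call would look like:
# scrubbed = post_json("/scrub_har", example)
```

Separating payload construction from the network call makes the request easy to inspect or log before a potentially large HAR file is uploaded.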
    

TODO

  1. Tests are badly needed and should be the current priority. Use pytest and Jasmine.

  2. Other issues are tracked in the GitHub issue tracker.

Contact

Garrett Anderson:

Greg Cochard:

Geoffrey Coulter:

License

Copyright 2017, Google Inc.

Authors: Garrett Anderson

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

har-sanitizer's People

Contributors

thefunkjunky


har-sanitizer's Issues

Takes a long time to process HAR file

Processing a 30 MB HAR file takes too long. In addition, when the file is too large, the browser console reports the error "Regular expression is too large".

Is there a way to fix this?

Thanks!

Incompatible with Python 3

The "import urllib2" statement in harsanitizer.py and harsan_api.py needs to be refactored to "from urllib.request import urlopen".
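
One common way to support both interpreters during such a migration is a guarded import. This is a general Python pattern shown as a sketch; it assumes the modules only need `urlopen`, which this issue does not confirm.

```python
try:
    # Python 3: urlopen lives in urllib.request.
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback: the same function is provided by urllib2.
    from urllib2 import urlopen

# The rest of the module can now call urlopen(...) on either version.
print(callable(urlopen))
```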

Selecting any key found within multiple types will select them all

Selecting a key that shares its name with keys in other types (cookies, headers, params, etc.) effectively keeps all of them selected until every one of them is unselected at the same time. This happens because the "wordlist" API doesn't categorize the words into different types. Fixing it will likely require a redesign of the API.
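
One possible shape for a categorized selection is sketched below. This is purely hypothetical: the category names and the `should_redact` helper are invented for illustration and do not reflect any planned har-sanitizer API.

```python
def should_redact(selection, category, name):
    """Return True only if `name` is selected within this specific category."""
    return name in selection.get(category, set())


# Each type keeps its own set of selected names, so the same name can be
# selected for cookies and params while staying unselected for headers.
selection = {
    "cookies": {"token"},
    "headers": set(),
    "params": {"token"},
}

print(should_redact(selection, "cookies", "token"))  # True
print(should_redact(selection, "headers", "token"))  # False
```

Keying the selection by type removes the coupling described in this issue, at the cost of changing the flat "wordlist" argument that clients currently send.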
