pralab / secml_malware Goto Github PK

View Code? Open in Web Editor NEW

200.0 9.0 46.0 4.76 MB

Create adversarial attacks against machine learning Windows malware detectors

Home Page: https://secml-malware.readthedocs.io/

License: GNU General Public License v3.0

Python 85.11% Jupyter Notebook 14.73% Dockerfile 0.16%

adversarial-machine-learning infosec python attack machine-learning security

secml_malware's Introduction

SecML Malware

Python library for creating adversarial attacks against Windows Malware detectors. Built on top of SecML, SecML Malware includes most of the attack proposed in the state of the art. We include a pre-trained MalConv model trained by EndGame, used for testing.

Included Attacks

Partial DOS Header manipulation, formulated by Demetrio et al.
Padding attack, formulated by Kolosnjaji et al.
GAMMA, formulated by Demetrio et al.
FGSM padding + slack formulated by Kreuk et al. and Suciu et al.
Content shifting and DOS header extension formulated by Demetrio et al.
Header Fields inspired by Nisi et al.

Installation

Navigate to the folder where you want to clone the project. I recommend creating a new environment (I use conda):

conda create -n secml_malware_env python=3.9
conda activate secml_malware_env
pip install git+https://github.com/zangobot/ember.git
pip install secml-malware

If you are an Apple Silicon user, please install lightgbm from conda:

conda install -c conda-forge lightgbm

How to use

Activate your environment, and import the secml_malware package inside your script:

import secml_malware
print(secml_malware.__version__)

The tests included in this project show how the library can be used for applying the manipulation to the input programs. There is also an example Jupyter notebook tutorial that shows how to build a apply a standard attack.

Docker

There is also a Dockerfile that can be used to start a container and test the library without messing with virtual environments!

docker build --tag secml_malware:0.2.9.1 .
docker run --rm -it secml_malware:0.2.9.1 bash

The container is also shipped with ipython, for a more interactive experience with this library.

Cite

If you use our library, please cite us!

@misc{demetrio2021secmlmalware,
      title={secml-malware: A Python Library for Adversarial Robustness Evaluation of Windows Malware Classifiers}, 
      author={Luca Demetrio and Battista Biggio},
      year={2021},
      eprint={2104.12848},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

Also, depending on the manipulations / formalization you are using, please cite our work:

Content shifting and DOS header extension manipulations or RAMEn formalization

@article{demetrio2021adversarial,
    title={Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection},
    author={Luca Demetrio and Scott E. Coull and Battista Biggio and Giovanni Lagorio and Alessandro Armando and Fabio Roli},
    journal={ACM Transactions on Privacy and Security},
    year={2021},
    publisher={ACM}
}

GAMMA

@article{demetrio2021functionality,
  title={Functionality-preserving black-box optimization of adversarial windows malware},
  author={Demetrio, Luca and Biggio, Battista and Lagorio, Giovanni and Roli, Fabio and Armando, Alessandro},
  journal={IEEE Transactions on Information Forensics and Security},
  year={2021},
  publisher={IEEE}
}

Partial DOS manipulation

@inproceedings{demetrio2019explaining,
  title={Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries},
  author={Luca Demetrio and Battista Biggio and Giovanni Lagorio and Fabio Roli and Alessandro Alessandro},
  booktitle={ITASEC19},
  volume={2315},
  year={2019}
}

Bug reports

If you encounter something strange, feel free to open an issue! I am working a lot, and bugs are present everywhere. Let me know, and I'll try to fix them as soon as possible.

Testing

I provide a small test suite for the attacks I have developed inside the plugin. If you want to run them, ADD GOODWARE/MALWARE samples! There are two distinct folders:

secml_malware/data/goodware_samples
secml_malware/data/malware_samples/test_folder

Please, add samples to both folders (if and only if you want to run the internal tests).

secml_malware's People

Contributors

Stargazers

Watchers

secml_malware's Issues

How to apply secml_malware to my multiclass malconv classifier？

Hi @zangobot , I am using the malconv to deal with the malware family classification problem. It's a Multi-classification tasks so I modify the malconv's last fc layer's out_features dimension and I train the malconv from scratch. If I want to apply secml_malware library to attack the multiclass malconv, what should I do?

Thanks in advance :)

Improve logging

Remove all the debug prints, and use a standard logger that can be customized with a config file or at import time.

Fix SOREL model

SOREL models have many outputs, and currently all the attack focuses only on goodware/malware class.
The goal is to enrich the SOREL wrappers with metadata that specifies which quantity must be fine tuned.

Confidence on Microsoft Malware Classification Challenge

Describe the bug
Running the Pretrained Malconv included in the repo on the Microsoft malware classification dataset produces a confidence of [0.5,0.5].

To Reproduce
Link to Paper: https://arxiv.org/pdf/1802.10135.pdf
Link to Dataset: https://www.kaggle.com/c/malware-classification
I am using the code that is provided by the tutorial.

Expected behavior
I expected the confidence for each Malware sample to be higher. I think because the model think both benign and malware are possible, the attacks are not working properly.

Library info
Lastest Version of SecML

System info (please complete the following information):
Linux - Ubuntu
Additional context
Any advice on how I am might get a better confidence for the malware samples?

AttributeError: 'NoneType' object has no attribute 'dos_header'

I follow the tutorials and I have this problem.

Here, are my code

import lief

exe_path = '/Boom'
exe_object: lief.PE = lief.parse(exe_path)

print('DOS Header')
print(exe_object.dos_header)

print('PE Header')
print(exe_object.header)

print('Optional Header')
print(exe_object.optional_header)

print('Sections')
for s in exe_object.sections:
print(s.name, s.characteristics_lists)

The error:

AttributeError Traceback (most recent call last)
Input In [42], in <cell line: 2>()
1 print('DOS Header')
----> 2 print(exe_object.dos_header)
4 print('PE Header')
5 print(exe_object.header)

AttributeError: 'NoneType' object has no attribute 'dos_header'

How could I fix it?

CGammaSectionsEvasionProblem attack budget

Hi when im using my Sorel NeuralNet and CGammaSectionsEvasionProblem attack, the engine.confidences_ seems like the attack was not successful. Do you know why and what is the parameter to increase the budget for this attack. I've read but haven't found that yet. Thanks alot
[0.946287214756012, 0.8430443406105042, 0.8625335693359375, 0.8607925176620483, 0.8579151630401611, 0.8579151630401611, 0.8637216687202454, 0.8637216687202454, 0.8637216687202454, 0.8637216687202454, 0.8637216687202454]

Adding support for QuoVadis models

Include wrappers for QuoVadis, by leveraging the fusion models provided inside the original repository by @dtrizna.
It should be implemented as a CQuoVadisClassifier and its wrapper for blackbox attack CQuoVadisWrapperPhi.

The only problem is data preprocessing, as the original code leverage file paths (and later it extracts binaries from them).

Include Binary Rewriting techniques that changes opcodes

As a new feature, it would be interesting to port the perturbations proposed by Lucas et al. in their research paper.
Maybe, SecML Malware could use this repository as a dependency, or by including portions of it.

Fix numpy retrocompatibility for CClassifierEmber

Describe the bug
When using a CClassifierEmber, an error occurs since it uses np.int in c_classifier_ember.py, line 45 and np.int (deprecated after Numpy v1.20) no longer exists when using a Numpy version >= 1.20.

To Reproduce

bytes_buffer = np.frombuffer(data, dtype=np.uint8) # data : str of bytes
convertedData: CArray = CArray(bytes_buffer).atleast_2d()

# model_init
tree = CClassifierEmber(tree_path='ember_model.txt')
features = tree.extract_features(convertedData)
y_pred = tree.predict(features)

Expected behavior
A prediction is expected.

Library info
secml 0.15.6
secml_malware v0.2.7

System info (please complete the following information):

OS: Linux XUbuntu
Version 18

Additional context
None

errors in obtain index_to_perturb

Hi 🙋 ,
one more question,

confusion in function _get_list_adv_example(self,x0) in file c_format_exploit_evasion.py

step1: list1 <-- get the index_to_perturb_sections from content shift attack

step2: list2 <-- get the index_to_perturb_pe from DOS extension attack

for my understanding, there should not have a overlap between list1 and list2 since there are from totally different parts of PE files, and all corresponding offset already shifted

therefore, the final indexes_to_perturb should concatenate two lists (e.g., list1+list2), while you shift the list2 with a length of (list1), e.g.,
"indexes_to_perturb = index_to_perturb_pe + [i + len(index_to_perturb_pe) for i in index_to_perturb_sections]"

Maybe I miss something here, hope you can clarify it. Thank you ☺️ !!

Implementation of Extend and Shift

Hi,
I had a look into your code (especially into c_format_exploit_evasion.py) but I can't figure out how to implement an attack which only utilizes either Extend or Shift.
I would really appreciate it if you could explain me how to use the Keyword Arguments in order to use the Extend- or Shift-attack.

Thank you very much!!

Test suite that do not require malware to run

In order to better automate the testing part, the library should use some mock classifier to be used to test the algorithms (rather than using real malware / real networks).

Support for ensemble models

Hello,

I am working on a fork of the repository to add support for ensembles of models for attacks (while crafting adversarial examples, so shadow models). While working my way through it, I noticed that one of the calls to the secml library seems rather odd. Specifically, here:

secml_malware/secml_malware/attack/whitebox/c_end2end_evasion.py

Line 42 in 916dee0

end2end_model,

end2end_model seems to be provided as an argument twice, which ends up being set as both the classfier and y_target` in the parent class. Is this by design (the fact that it appears twice makes me wonder if it may have be an unintional error?)

Train models

I want to fine tune the model using new data, I see that we need to pass boolean value to train model but however there are no supporting functions available for such operations

Error while running the sample attack code from blackbox_tutorial.ipynb

Hi,

I have been trying to run white-box and black-box attacks using attack tutorials that came with secml_malware. While running the sample code from the black-box tutorial using malware samples, I got the below traceback. I have attached the code sample with the issue. It seems there is a bug in the implementation shown in the black-box attack tutorial.

Traceback (most recent call last):
File "/ramen/secml_malware/black_box_attack.py", line 45, in y_pred, adv_score, adv_ds, f_obj = engine.run(sample, CArray(label[1]))
File "/ramen/secml_malware/secml_malware/attack/blackbox/ga/c_base_genetic_engine.py", line 135, in run x_opt, f_opt = self._run(x[k, :], y[k], x_init=xi)
File "/ramen/secml_malware/secml_malware/attack/blackbox/ga/c_base_genetic_engine.py", line 167, in _run minimization_results = self._compute_black_box_optimization()
File "/ramen/secml_malware/secml_malware/attack/blackbox/ga/c_base_genetic_engine.py", line 216, in _compute_black_box_optimization toolbox.mate(child1, child2)
File "/usr/local/lib/python3.9/site-packages/deap/tools/crossover.py", line 31, in cxOnePoint cxpoint = random.randint(1, size - 1) File "/usr/local/lib/python3.9/random.py", line 338, in randint return self.randrange(a, b+1)
File "/usr/local/lib/python3.9/random.py", line 316, in randrange raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))

Differences Between WhiteBox Attacks

Hi,

There are four padding attacks implemented in the library that add in bytes: c_padding_evasion, c_kreuk_evasion, c_fast_gradient_sign_evasion, c_suciu_evasion. I know that c_kreuk_evasion uses c_fast_gradient_sign_evasion when no indexes_to_perturb is given. But c_suciu_evasion doesn't seem to have its own _run() function, instead it calls c_kreuk_evasion. Are c_kreuk_evasion and c_suciu_evasion doing the same thing? Additionally, if I wanted to find indexes_to_perturb outside the ones calculated by c_kreuk_evasion is that possible?

c_blackbox_problem.py VisibleDeprecationWarning

BUG
When executing c_blackbox_problem.py line 100 expanded_sequence = np.array(expanded_sequence) produces a VisibleDeprecationWarning because creating an ndarray from ragged nested sequences is deprecated.

FIX
Changing line 100 to expanded_sequence = np.array(expanded_sequence, dtype=object) should do the trick.

can't attack EMBER model

Describe the bug
When EMBER predicts malware, it prompts that LightGBMError: The number of features in data (73802) is not the same as it was in training data (2381).

I searched for this bug and found that the version of lief did not match. But python 3.9 can't install lief==0.9.0

To Reproduce

Expected behavior
The malware can be predicted normally by EMBER.

Library info
lief == 0.12.1
python == 3.9.12

System info (please complete the following information):

OS: [Windows]
Version [0.2.4]

Additional context

The following is the detailed error reporting information

LightGBMError Traceback (most recent call last)
Input In [18], in <cell line: 6>()
18 max_length = max(max_length, len(code))
19 x = CArray(np.frombuffer(code, dtype=np.uint8)).atleast_2d()
---> 20 print(net.predict(x, return_decision_function=False))

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\secml\ml\classifiers\c_classifier.py:293, in CClassifier.predict(self, x, return_decision_function)
266 def predict(self, x, return_decision_function=False):
267 """Perform classification of each pattern in x.
268
269 If preprocess has been specified,
(...)
291
292 """
--> 293 scores = self.decision_function(x, y=None)
295 # The classification label is the label of the class
296 # associated with the highest score
297 labels = scores.argmax(axis=1).ravel()

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\secml\ml\classifiers\c_classifier.py:222, in CClassifier.decision_function(self, x, y)
194 def decision_function(self, x, y=None):
195 """Computes the decision function for each pattern in x.
196
197 If a preprocess has been specified, input is normalized
(...)
220
221 """
--> 222 scores = self.forward(x, caching=False)
223 return scores if y is None else scores[:, y].ravel()

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\secml\ml\c_module.py:204, in CModule.forward(self, x, caching)
202 # Transform data using inner preprocess, if defined
203 x = self._forward_preprocess(x=x, caching=caching)
--> 204 return self._forward(x)

File ~\Desktop\secml_malware\secml_malware\models\c_classifier_ember.py:63, in CClassifierEmber._forward(self, x)
61 def _forward(self, x):
62 x = x.atleast_2d()
---> 63 scores = self._lightgbm_model.predict(x.tondarray())
64 confidence = [[1 - c, c] for c in scores]
65 confidence = CArray(confidence)

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\lightgbm\basic.py:3538, in Booster.predict(self, data, start_iteration, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape, **kwargs)
3536 else:
3537 num_iteration = -1
-> 3538 return predictor.predict(data, start_iteration, num_iteration,
3539 raw_score, pred_leaf, pred_contrib,
3540 data_has_header, is_reshape)

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\lightgbm\basic.py:848, in _InnerPredictor.predict(self, data, start_iteration, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
846 preds, nrow = self.__pred_for_csc(data, start_iteration, num_iteration, predict_type)
847 elif isinstance(data, np.ndarray):
--> 848 preds, nrow = self.__pred_for_np2d(data, start_iteration, num_iteration, predict_type)
849 elif isinstance(data, list):
850 try:

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\lightgbm\basic.py:938, in _InnerPredictor.__pred_for_np2d(self, mat, start_iteration, num_iteration, predict_type)
936 return preds, nrow
937 else:
--> 938 return inner_predict(mat, start_iteration, num_iteration, predict_type)

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\lightgbm\basic.py:908, in _InnerPredictor.__pred_for_np2d..inner_predict(mat, start_iteration, num_iteration, predict_type, preds)
906 raise ValueError("Wrong length of pre-allocated predict array")
907 out_num_preds = ctypes.c_int64(0)
--> 908 _safe_call(_LIB.LGBM_BoosterPredictForMat(
909 self.handle,
910 ptr_data,
911 ctypes.c_int(type_ptr_data),
912 ctypes.c_int32(mat.shape[0]),
913 ctypes.c_int32(mat.shape[1]),
914 ctypes.c_int(C_API_IS_ROW_MAJOR),
915 ctypes.c_int(predict_type),
916 ctypes.c_int(start_iteration),
917 ctypes.c_int(num_iteration),
918 c_str(self.pred_parameter),
919 ctypes.byref(out_num_preds),
920 preds.ctypes.data_as(ctypes.POINTER(ctypes.c_double))))
921 if n_preds != out_num_preds.value:
922 raise ValueError("Wrong length for predict results")

File ~\anaconda3\envs\secml_malware_env\lib\site-packages\lightgbm\basic.py:125, in _safe_call(ret)
117 """Check the return value from C API call.
118
119 Parameters
(...)
122 The return value from C API calls.
123 """
124 if ret != 0:
--> 125 raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))

LightGBMError: The number of features in data (73802) is not the same as it was in training data (2381).
You can set predict_disable_shape_check=true to discard this error, but please be aware what you are doing.

Implementing header fields perturbations

Including perturbations of header fields, as proposed in previous work.

attack_tutorial.ipynb incorrect reference in for loop

Hi I'm not sure if this is intended but

you first add all the sample malware into X and y

But later in the for loop you use CArray(x) which would be static and based on the last processed code.
Perhaps you meant to use CArray(sample)?

No data preprocessing for SorelNet?

In the Sorel-20M repository, in the train.py, the train_network() function calls get_generator() which initializes the Generator class, which in turn calls the Dataset class that calls the LMDBReader class. LMDBReader has a function called features_postproc_func which per my understanding is applying some logarithmic function on the ember features before using them. This chain is not followed in the training of the LGB model where the Ember features are read directly from the numpy arrays and no pre-processing is applied (as expected).

Looking at the code in secml_malware I see that the ember features are fed directly to the neural network without any preprocessing and I'm wandering if this should be added in the feature extractor.

As a side note, in my testing of the Sorel models and data, if I don't apply the features_postproc_func I get really bad results with the pretrained sorel nets, so I think this is needed.

About "random"

Describe the bug
ValueError: empty range for randrange() (1,1, 0)

To Reproduce
blackbox_tutorial.ipynb/[4] y_pred, adv_score, adv_ds, f_obj = engine.run(sample, CArray(label[1])).

Expected behavior
Perform evolution smoothly.

System info (please complete the following information):

OS: [Linux ]
Version [16.0.4]

Tutorial for FGSM Attack

Hello, Thank you for maintaining the library. It is very useful.

I was wondering if you have any tutorials on the FGSM / kreuk?

I am aware of the class and how to import it. But, due to my lack of knowledge on what to do after importing, it is hard. For example, I tried to substitute the attacks in the blackbox tutorial with FGSM and i got the error "valueerror: classifier is not a cclassifier".

If it is not too much trouble, could you provide a minimal reproducible FGSM example? I can continue from there.

Thanks in advance :)

DOS extension attack debugging

Describe the bug
Hi,

Again, I am still leveraging you tools to do my project, which is an amazing tool!!

Recently, I tried to validate if the functionality really preserved after applied your adversarial attacks (partial/full dos, dos extend and content shift attacks). I used binary editor (HxD) to change the bytes. By following your algorithms, partial/full DOS attacks preserve the functionality, while the DOS extend and content shift attacks not.

Theoretically, these attacks should be functionality-preserved, but I am not sure whether I was missing some steps when applied dos extend and content shift attacks. I am wondering have you ever tried to prove these attacks are actually practical, even though these are make sense in theory. if you did, could you share the way how to validate? if not, could you give some advice and suggestions about the way I did?

Here is my steps for two attacks which are not able to preserve the functionality:
DOS extend: 1) modify PE entry point in DOS header based on the amount of extending bytes; 2)shift PE header start offset based on the amount of extending bytes and initial theses new created space with 0 values; 3) shift each section header offset based on the amount of extending bytes. (these steps are based on you code, while there is one step presented in your paper algorithm 3 line5 is not included in above steps, wondering if this is the reason?)

Content shift: 1)shift each section header offset based on the amount of shifting bytes; 2)insert 0 values between PE header and first section based on the amount of shifting bytes

All these steps are based on your code and I applied the corresponding manipulations in binary editor (HxD) to craft the program, however, the modified file is not functional.

Looking forward to your reply1

Thanks,

Hao

TypeError: Can't instantiate abstract class with abstract methods

Involved file

secml_malware/attack_tutorial.ipynb

Bug

Command:

from secml_malware.attack.whitebox.c_header_evasion import CHeaderEvasion
partial_dos = CHeaderEvasion(net, random_init=False, iterations=50, optimize_all_dos=False, threshold=0.5)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-16ce7795f43f> in <module>
      1 from secml_malware.attack.whitebox.c_header_evasion import CHeaderEvasion
      2 
----> 3 partial_dos = CHeaderEvasion(net, random_init=False, iterations=50, optimize_all_dos=False, threshold=0.5)

TypeError: Can't instantiate abstract class CHeaderEvasion with abstract methods f_eval, grad_eval, objective_function, objective_function_gradient

System info

Linux 5.4.0-45-generic #49-Ubuntu x86_64 x86_64 x86_64 GNU/Linux

FGSM Attacking Running for days

Hello Community,

I am trying to run FGSM given as part of Sample code, but it's been a week since I started the execution and It's still running, The FGSM is supposed to run faster than the gradient-based approach.
Malware 32-bit windows malware

python - 3.8.16

GPU - YES

please point to me If I am missing anything,

Thank You,
Prabhath Mummaneni.

lief.bad_format error

Hey!
When I execute the test_evasion_attack.py I get this error:

======================================================================
ERROR: test_pe_shift_attack (__main__.EvasionEnd2EndTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_evasion_attack.py", line 140, in test_pe_shift_attack
    self.assert_evasion_result(shift_attack)
  File "test_evasion_attack.py", line 143, in assert_evasion_result
    y_pred, _, _, _ = attack.run(self.X, self.Y)
  File "/root/miniconda3/envs/012-secml-malware-env/lib/python3.7/site-packages/secml/adv/attacks/evasion/c_attack_evasion.py", line 101, in run
    x_opt, f_opt = self._run(x[k, :], y[k], x_init=xi, *args, **kargs)
  File "/root/012/secml_malware/secml_malware/attack/whitebox/c_format_exploit_evasion.py", line 36, in _run
    x_init, _ = self._craft_perturbed_c_array(x0)
  File "/root/012/secml_malware/secml_malware/attack/whitebox/c_format_exploit_evasion.py", line 40, in _craft_perturbed_c_array
    x_init, indexes_to_perturb = self._generate_list_adv_example(x0)
  File "/root/012/secml_malware/secml_malware/attack/whitebox/c_format_exploit_evasion.py", line 56, in _generate_list_adv_example
    x_init, index_to_perturb_pe = shift_pe_header_by(x_init, preferable_extension_amount=self.pe_header_extension)
  File "/root/012/secml_malware/secml_malware/utils/extend_pe.py", line 49, in shift_pe_header_by
    liefpe = lief.PE.parse(x) #change
lief.bad_format: This file is not a PE binary

======================================================================
ERROR: test_section_shift_attack (__main__.EvasionEnd2EndTestSuite)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_evasion_attack.py", line 130, in test_section_shift_attack
    self.assert_evasion_result(shift_attack)
  File "test_evasion_attack.py", line 143, in assert_evasion_result
    y_pred, _, _, _ = attack.run(self.X, self.Y)
  File "/root/miniconda3/envs/012-secml-malware-env/lib/python3.7/site-packages/secml/adv/attacks/evasion/c_attack_evasion.py", line 101, in run
    x_opt, f_opt = self._run(x[k, :], y[k], x_init=xi, *args, **kargs)
  File "/root/012/secml_malware/secml_malware/attack/whitebox/c_format_exploit_evasion.py", line 36, in _run
    x_init, _ = self._craft_perturbed_c_array(x0)
  File "/root/012/secml_malware/secml_malware/attack/whitebox/c_format_exploit_evasion.py", line 40, in _craft_perturbed_c_array
    x_init, indexes_to_perturb = self._generate_list_adv_example(x0)
  File "/root/012/secml_malware/secml_malware/attack/whitebox/c_format_exploit_evasion.py", line 55, in _generate_list_adv_example
    x_init, index_to_perturb_sections = shift_section_by(x_init, preferable_extension_amount=self.preferable_extension_amount)
  File "/root/012/secml_malware/secml_malware/utils/extend_pe.py", line 63, in shift_section_by
    liefpe = lief.PE.parse(x) #change
lief.bad_format: This file is not a PE binary

----------------------------------------------------------------------

Expected behavior / Additional context
I tried to execute the script with different PE-Samples to be sure the issue isn't on my side.
I suspect this occurs because the liefpe-variable somewhat doesn't initialize correctly resulting in a NoneType-Object.

Would be cool if you could take a look at it :)

Attack own created model

How could I use your created attacks for a custom model?
Thank you.

lief error: This file is not a PE binary (Ember black box attack)

Describe the bug
I am trying to use the Ember model instead of Malconv for a black box attack. When the model is called to calculate the initial confidence value there is an error from Lief library that complains that the file is not a PE. However a confidence value is calculated. When I run ember to calculate directly the confidence value of the same PE file there is no error and the confidence value is different. The confidence value changes pending on the datatype conversion in x = CArray(np.frombuffer(data, dtype=np.float32)).

Please see below the code. From a quick look in the wrapper class I could not find the place where the error occurs, but there are some things in the code that I'm not sure I understand. Maybe I am doing something wrong?

Let me know if you need the malware file to verify.

To Reproduce

net = CClassifierEmber(tree_path=os.path.join(model_dir, 'ember_model_2018.txt'))
net = CEmberWrapperPhi(net)

folder = data_dir  #INSERT MALWARE IN THAT FOLDER
X = []
y = []
file_names = []
for i, f in enumerate(os.listdir(folder)):
    print(f)
    path = os.path.join(folder, f)
    if "PE32" not in magic.from_file(path):
        continue
    with open(path, "rb") as file_handle:
        data = file_handle.read()
    
    x = CArray(np.frombuffer(data, dtype=np.float32))
    _, confidence = net.predict(CArray(x), True)
       
    if confidence[0, 1].item() < 0.5:
        continue

    print(f"> Added {f} with confidence {confidence[0,1].item()}")
    X.append(x)
    conf = confidence[1][0].item()
    y.append([1 - conf, conf])
    file_names.append(path)

08c7fe8e6248b90a7d9e7765fec09fb6e24f502c6bea44b90665ab522f863176
lief error:  This file is not a PE binary
> Added 08c7fe8e6248b90a7d9e7765fec09fb6e24f502c6bea44b90665ab522f863176 with confidence 0.9606711512418338

When I use the ember model directly there is no lief error and a different confidence value:

import lightgbm as lgb
import ember 

lgbm_model = lgb.Booster(model_file=os.path.join(model_dir, 'ember_model_2018.txt'))

binary_path = os.path.join(data_dir, '08c7fe8e6248b90a7d9e7765fec09fb6e24f502c6bea44b90665ab522f863176')
file_data = open(binary_path, "rb").read()

extractor = ember.PEFeatureExtractor(2, print_feature_warning=False)
features = np.array(extractor.feature_vector(file_data), dtype=np.float32)
lgbm_model.predict([features])[0]

0.9999999908623362

System info (please complete the following information):

OS: Ubuntu
Version 18.04

Additional context
This is tested with a fresh installation of secml and all required dependencies.

real sample generation

I am generating real samples using sec_malware library, working on white-box padding attack but the generated adversarial samples are same in size with the original one.

please suggest if any changes are required in code

SOREL ATTACK

Hi im currently trying to run Attack for sorel but i don't have goodware so are there any attack i can use to run Sorel (I tried several attack like Exploit Evasion... but it did not work at this liefpe = lief.PE.parse(x) as this returns NoneType (Can't read the file). In addition what format of malware file should i use (normal exe for sth) and how many goodwares needed for the c gamma evasion attack ? Thanks

Implementation of Fast Gradient Sign Attack

Hi,

I had a look into your code the c_fast_gradient_sign_evasion.py file but I can't figure out how to implement an attack.
I would really appreciate it if you could explain me what some of the functions do and how to use them with reference to the tutorial attack

Thank you very much!

lightGBM and SOREL model weights?

Is there a pointer/way to obtain lightGBM model weights, and for SOREL too?
Currently, the repository seems to have weights for only MalConv

How to run lightGBM and SOREL model using secml_malware?

I am currently looking into ways of using the lightGBM and SOREL models apart from the Malconv to see how effective are the attacks against those models. But I am not able to get a clear idea from either the code or documentation regarding how to integrate those models with the existing implementation of the secml_malware library in order to test it?

Does the library currently supports testing attacks against the Malconv and not against other models? If yes, kindly provide some directions in order to test against other models?

strange 'type error" when apply gamma_sections_attacks

Describe the bug
"type error" appears in when implement "lief_adv: lief.PE.Binary = lief.PE.parse(x_adv[0])"

I am wondering lief.PE.parse can only work on a exe binary file, but x_adv[0] here is the one in numpy format [0,256].

Couldn't figure out the issue here

To Reproduce
Code for reproducing the error.

Expected behavior
A clear and concise description of what you expected to happen.

System info (please complete the following information):

OS: [e.g. Windows, macOS, Linux distro]
Version [e.g. 22]

Additional context
Add any other context about the problem here.

self.stagnation value is not respected when global min is stuck at inf

Description
In black-box GAMMA attacks sometimes the global min gets stuck at inf. There is a numpy warning after 5 iterations and the optimization continues instead of stopping. Please see below the output for the exact issue:

>0 - Global min: inf
>1 - Global min: inf
>2 - Global min: inf
>3 - Global min: inf
>4 - Global min: inf
2021-06-30 15:20:30,339 - py.warnings - WARNING - /home/mari/miniconda3/envs/adv_mal/lib/python3.8/site-packages/secml_malware/attack/blackbox/ga/c_base_genetic_engine.py:237: RuntimeWarning: invalid value encountered in subtract
  if len(last_n_best_fits) == self.stagnation and all((np.array(last_n_best_fits) - best_fitness) < 1e-6):

>5 - Global min: inf
>6 - Global min: inf
>7 - Global min: inf
>8 - Global min: inf
>9 - Global min: inf
>10 - Global min: inf
>11 - Global min: inf
>12 - Global min: inf
>13 - Global min: inf
>14 - Global min: inf

Expected behavior
The optimization should end with a Stagnating result! as output in case debug is enabled.

System info:

OS: Ubuntu 18.04.05

Wrong ember prediction

model.predict() retrieves class '1' even if the prediction score is under the ember model threshold

No such file or directory: 'secml_malware/data/malware_samples/test_folder'

As the question shows, does this file or folder need to be created by yourself? What does this folder do?

How to realize the class ExeVisualizationTests？

Hello

Would you mind sharing the implementation of visualizing the integrated gradients applied to malware programs?I remember you described it in your paper.I want to try to express what does MalConv learn intuitively.

Many Thanks

bug in CFormatExploitEvasion

Hi, may be I find a bug in CFormatExploitEvasion, my code is as below:

import os
import magic
import secml_malware
from secml.array import CArray

from secml_malware.attack.whitebox import CKreukEvasion, CFormatExploitEvasion
from secml_malware.models.malconv import MalConv
from secml_malware.models.c_classifier_end2end_malware import CClassifierEnd2EndMalware, End2EndModel
from secml_malware.attack.whitebox.c_header_evasion import CHeaderEvasion
net = MalConv()
net = CClassifierEnd2EndMalware(net)
net.load_pretrained_model()

partial_dos = CFormatExploitEvasion(
   net,
   preferable_extension_amount=0x200,
   pe_header_extension=0,
   iterations=2,
   is_debug=True,
   threshold=9.644263859742708e-12
)
folder = "./secml_malware/data/test_folder"
X = []
y = []
file_names = []
for i, f in enumerate(os.listdir(folder)):
   path = os.path.join(folder, f)
   with open(path, "rb") as file_handle:
       code = file_handle.read()
   x = End2EndModel.bytes_to_numpy(
       code, net.get_input_max_length(), 256, False
   )
   _, confidence = net.predict(CArray(x), True)

   if confidence[0, 1].item() > 0.5:
       continue

   print(f"> Added {f} with confidence {confidence[0,1].item()}")
   X.append(x)
   conf = confidence[1][0].item()
   y.append([1 - conf, conf])
   file_names.append(path)

for sample, label in zip(X, y):
   y_pred, adv_score, adv_ds, f_obj = partial_dos.run(CArray(sample), CArray(label[1]))
   print(partial_dos.confidences_)
   print(f_obj)

adv_x = adv_ds.X[0,:]
real_adv_x = partial_dos.create_real_sample_from_adv(file_names[0], adv_x, new_file_path="./secml_malware/data//1.exe")
print(len(real_adv_x))
real_x = End2EndModel.bytes_to_numpy(real_adv_x, net.get_input_max_length(), 0, True)
_, confidence = net.predict(CArray(real_x), True)


print(confidence)
print(type(real_adv_x))

I wrote a exe file, which only prints "Hello world" string.
The code successfully create an adversarial exe file, but the exe file crashes when I run it.
If I use the CKreukEvasion, the adversarial exe file works normally.
Maybe there is a bug in CFormatExploitEvasion?

Question Shift-Attack

Hey!

First of all great project!
I'm currently struggling to figure out (1) why the shuffling of the section content frees additional space and (2) which gradient descent method is used for crafting the actual payload (which is later inserted into the additional space).

Thank you for your efforts!
Cheers

request data

Hello, can you share the data in secml malware/data，we intend to quote some cases from the paper as part of the laboratory instruction manual . If everything goes well, this part of the work will be part of a future book publication .

the function to get PE position

in "c_header_evasion.py": pe position can be obtained by follows:
pe_position = x_init[0x3C:0x40].tondarray().astype(np.uint16)[0]
pe_position = struct.unpack("<I", bytes(pe_position.astype(np.uint8)))[0]
in "extend_pe.py": pe position can be obtained by follows:
pe_position = liefpe.dos_header.addressof_new_exeheader
my method:
pe_header_index = x_init.find(b'PE')

for my understanding, those three methods should return the same results, while 2 is different to 1 and 3. (1 and 3 are the same). I am wondering maybe package lief has an error in terms of identifying the e_lfanew (address of new exe header)

undefined symbol: PySlice_Unpack

I followed your instruction to create new virtual environment and pip install -r requirements.txt. But when I ran your notebook, I got an error like below. Can you help me address it?

ImportError Traceback (most recent call last)
in ()
1 import os
2 import magic
----> 3 import secml_malware
4 from secml.array import CArray
5 from secml_malware.models.malconv import MalConv
...
ImportError: /home/viet/anaconda3/envs/pesidious_secml/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack

CContentShiftingEvasion bug for programs compiled with VS C++ on debug (not release)

Describe the bug
When applying the CContentShiftingEvasion to a binary, if it has been compiled for debug, the manipulation corrupts the file.

To Reproduce
Compile an executable on Windows, using Visual Studio C++ in debug mode.
Apply the CContentShiftingEvasion to it.

Expected behavior
The program should run as expected, but it crashes instead.

Library info
secml-malware 0.2.5.1

issue installing secml-malware with pip with python 3.12

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Ponte\AppData\Local\Temp\pip-install-gkfs8wet\matplotlib_e822e290f1734ff89cd1031315ef9c0e\setup.py", line 53, in
version = versioneer.get_version()
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Ponte\AppData\Local\Temp\pip-install-gkfs8wet\matplotlib_e822e290f1734ff89cd1031315ef9c0e\versioneer.py", line 1410, in get_version
return get_versions()["version"]
^^^^^^^^^^^^^^
File "C:\Users\Ponte\AppData\Local\Temp\pip-install-gkfs8wet\matplotlib_e822e290f1734ff89cd1031315ef9c0e\versioneer.py", line 1344, in get_versions
cfg = get_config_from_root(root)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Ponte\AppData\Local\Temp\pip-install-gkfs8wet\matplotlib_e822e290f1734ff89cd1031315ef9c0e\versioneer.py", line 401, in get_config_from_root
parser = configparser.SafeConfigParser()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'configparser' has no attribute 'SafeConfigParser'. Did you mean: 'RawConfigParser'?
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

Question: what's parameter 'random_init" means?

Hi,

when I am testing the white-box attacks, there is a parameter "random_init", and I find that that evasion rate is a little bit higher when random_init=True. and I read the paper, there is no explanation about this setting. I am wondering could you explain the specific function of this parameter? Thank you so much in advance!!

GAMMA section injections might load sections at random

GAMMA section injection has listdir to load all samples from a folder. But this could load samples out of order, reducing reproducibility.

pralab / secml_malware Goto Github PK

secml_malware's Introduction

SecML Malware

Included Attacks

Installation

How to use

Docker

Cite

Bug reports

Testing

secml_malware's People

Contributors

Stargazers

Watchers

Forkers

secml_malware's Issues

The error:

The following is the detailed error reporting information

Recommend Projects

Recommend Topics

Recommend Org