phishingbaseline's Introduction

Phishing baseline

Implementations of phishing detection and identification baselines

EMD: Fu, A. Y., Wenyin, L., & Deng, X. (2006). Detecting phishing web pages with visual similarity assessment based on earth mover's distance (EMD). IEEE Transactions on Dependable and Secure Computing, 3(4), 301-311. This paper uses the earth mover's distance to measure the similarity between two webpage screenshots.
Phishzoo: Afroz, S., & Greenstadt, R. (2011, September). Phishzoo: Detecting phishing websites by looking at them. In 2011 IEEE Fifth International Conference on Semantic Computing (pp. 368-375). IEEE. This work applies the SIFT algorithm to quantify the similarity between two webpage screenshots.
VisualPhishnet: Abdelnabi, S., Krombholz, K., & Fritz, M. (2020, October). VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (pp. 1681-1698). This work trains a deep-learning Siamese model to compare two webpage screenshots.
StackModel: Li, Y., Yang, Z., Chen, X., Yuan, H., & Liu, W. (2019). A stacking model using URL and HTML features for phishing webpage detection. Future Generation Computer Systems, 94, 27-39.
URLNet: Le, H., Pham, Q., Sahoo, D., & Hoi, S. C. (2018). URLNet: Learning a URL representation with deep learning for malicious URL detection. arXiv preprint arXiv:1802.03162.
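
To give a feel for the distance behind the EMD baseline listed above, here is a minimal sketch of the earth mover's distance in its simplest form: two 1-D histograms with unit distance between adjacent bins. This is a simplification for illustration only. The paper compares signatures built from webpage screenshots, and the function names `normalize` and `emd_1d` are hypothetical, not part of this repository.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    if total == 0:
        raise ValueError("histogram is empty")
    return [h / total for h in hist]


def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two equal-length 1-D histograms.

    With unit distance between adjacent bins, the EMD equals the sum of
    absolute differences between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    a, b = normalize(hist_a), normalize(hist_b)
    carried = 0.0   # mass that still has to be moved to later bins
    distance = 0.0
    for pa, pb in zip(a, b):
        carried += pa - pb
        distance += abs(carried)
    return distance


# Identical histograms cost nothing; moving all mass further costs more.
same = emd_1d([4, 0, 0], [4, 0, 0])
shifted_one = emd_1d([1, 0, 0], [0, 1, 0])
shifted_two = emd_1d([1, 0, 0], [0, 0, 1])
print(same, shifted_one, shifted_two)
```

A smaller value means the two distributions are more alike, which is the property a similarity-based detector relies on.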

Requirements

python == 3.6
opencv-python == 3.4.2.17
opencv-contrib-python == 3.4.2.17
tensorflow == 1.13.1

Instructions

The data folder should be organized in this format

To run EMD

cd EMD/
# -m is the testing mode, i.e. the ground-truth label for the folder
python emd.py -f [path_to_data_folder] \
              -m [benign|phish] \
              -t [path_to_targetlist_folder]

To run PhishZoo

cd PhishZoo/
# -m is the testing mode, i.e. the ground-truth label for the folder
python phishzoo.py -f [path_to_data_folder] \
                   -m [benign|phish] \
                   -t [path_to_targetlist_folder]
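
The EMD and PhishZoo commands above take the same three flags, so they can also be assembled from Python. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and the example folder names are placeholders taken from the issue report further down.

```python
import sys

VALID_MODES = ("benign", "phish")


def build_command(script, data_folder, mode, targetlist_folder):
    """Build the argument list for emd.py or phishzoo.py.

    Only the -f, -m and -t flags shown in the instructions are used.
    """
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (VALID_MODES, mode))
    return [
        sys.executable, script,
        "-f", data_folder,
        "-m", mode,              # ground-truth label of the data folder
        "-t", targetlist_folder,
    ]


cmd = build_command("emd.py", "benign_sample_30k", "benign", "targetlist_fit")
print(" ".join(cmd[1:]))
# To launch it from the baseline's directory, one could then call:
#   subprocess.run(cmd, cwd="EMD", check=True)
```

Passing the arguments as a list avoids shell line continuations altogether, so a stray comment cannot cut the command short.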

For VisualPhishnet (forked from https://github.com/S-Abdelnabi/VisualPhishNet.git)

Download the pretrained model here, along with the target list embedding, target list labels, and target list filename list

cd VisualPhishnet/
python visualphish_manual.py -f [path_to_data_folder] \
                             -r [txt_path_to_save_result]

For StackModel

Download pretrained model here

cd StackModel
python test.py -f [path_to_data_folder] \
               -o [directory_to_save_output]

For URLNet (forked from https://github.com/Antimalweb/URLNet)

Download pretrained model here

python test.py \
  --model.emb_mode 5 \
  --data.data_dir [path_to_data_folder] \
  --log.checkpoint_dir output_5/checkpoints/model-2430 \
  --log.output_dir [txt_path_to_save_result] \
  --data.word_dict_dir output_5/words_dict.p \
  --data.char_dict_dir output_5/chars_dict.p \
  --data.subword_dict_dir output_5/subwords_dict.p


phishingbaseline's Issues

This error can't be solved!

When I run the command python emd.py -f benign_sample_30k -m benign -t targetlist_fit
this error occurs, and I really can't figure it out:
Traceback (most recent call last):
  File "emd.py", line 302, in <module>
    main(args.data_folder, args.mode, args.output_basedir, args.targetlist)
  File "emd.py", line 216, in main
    signatureB_this, md_colorB_this = get_signature(img2url)
  File "emd.py", line 123, in get_signature
    img = Image.open(path)
  File "/home/flame/anaconda3/envs/baseline/lib/python3.6/site-packages/PIL/Image.py", line 2975, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'targetlist_fit/targetlist_fit_copy_half_rename/homepage.png'

Can you tell me why? Thanks very much!
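
The path in the error message is relative, and Python resolves relative paths against the current working directory. One way to narrow the problem down is to print where such a path actually points and whether a file exists there. This is a minimal diagnostic sketch, not part of the repository, and `describe_path` is a hypothetical helper name.

```python
import os


def describe_path(path, base_dir=None):
    """Return (absolute_path, exists) for a path resolved against base_dir.

    base_dir defaults to the current working directory, which is what
    open() uses for relative paths.
    """
    base = os.path.abspath(base_dir or os.getcwd())
    resolved = os.path.normpath(os.path.join(base, path))
    return resolved, os.path.isfile(resolved)


# Path copied from the FileNotFoundError message above.
reported = "targetlist_fit/targetlist_fit_copy_half_rename/homepage.png"
resolved, exists = describe_path(reported)
print("resolved to:", resolved)
print("file exists:", exists)
```

Running this from the same directory as the failing command shows which absolute location the script was looking in.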

Thresholds for the different baselines

Hi, sorry to disturb you again.

I am working on reproducing Figure 11 of PhishIntention. Could you please share the thresholds you used for the different models? For example, the default threshold for VisualPhishNet is 1.5, but reproducing Figure 11 requires more thresholds.

Moreover, when testing screenshots with VisualPhishNet, it does not seem to get the embedding correctly, which leads to almost all images having the minimal distance to the same target image. I am wondering whether there is a problem. Could you please also share the 9334 target screenshots?
Thanks

From: Fujiao
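
Evaluating a distance-based detector at several thresholds, as asked about above, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figure: it assumes a page is flagged as phishing when its minimum distance to the target list is at or below the threshold, the distances and labels are made up, and `sweep_thresholds` is a hypothetical helper.

```python
def sweep_thresholds(distances, labels, thresholds):
    """Return a (threshold, tpr, fpr) tuple for each threshold.

    distances: minimum distance of each test page to the target list
    labels:    1 for phishing, 0 for benign (ground truth)
    Assumption: a page is flagged as phishing when distance <= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    rows = []
    for t in thresholds:
        flagged = [d <= t for d in distances]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        rows.append((t, tpr, fpr))
    return rows


# Made-up values for illustration only.
distances = [0.4, 1.2, 2.5, 0.9, 1.8, 3.0]
labels = [1, 1, 1, 0, 0, 0]
rows = sweep_thresholds(distances, labels, [0.5, 1.0, 1.5, 2.0])
for t, tpr, fpr in rows:
    print("threshold=%.1f  TPR=%.2f  FPR=%.2f" % (t, tpr, fpr))
```

Each row gives one point on a trade-off curve, so a finer grid of thresholds yields a smoother curve.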


Repository: lindsey98 / phishingbaseline
