
phishingbaseline's Introduction

Phishing baseline

Implementations of phishing detection and identification baselines

  • EMD: Fu, A. Y., Wenyin, L., & Deng, X. (2006). Detecting phishing web pages with visual similarity assessment based on earth mover's distance (EMD). IEEE transactions on dependable and secure computing, 3(4), 301-311. This paper uses the Earth Mover's Distance to measure the similarity between two webpage screenshots.

  • Phishzoo: Afroz, S., & Greenstadt, R. (2011, September). Phishzoo: Detecting phishing websites by looking at them. In 2011 IEEE fifth international conference on semantic computing (pp. 368-375). IEEE. This work applies the SIFT algorithm to quantify the similarity between two webpage screenshots.

  • VisualPhishnet: Abdelnabi, S., Krombholz, K., & Fritz, M. (2020, October). VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (pp. 1681-1698). This work trains a deep-learning Siamese model to compare two webpage screenshots.

  • StackModel: Li, Y., Yang, Z., Chen, X., Yuan, H., & Liu, W. (2019). A stacking model using URL and HTML features for phishing webpage detection. Future Generation Computer Systems, 94, 27-39.

  • URLNet: Le, H., Pham, Q., Sahoo, D., & Hoi, S. C. (2018). URLNet: Learning a URL representation with deep learning for malicious URL detection. arXiv preprint arXiv:1802.03162.
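To give a feel for the distance used by the EMD baseline, here is a minimal sketch of the Earth Mover's Distance in its simplest setting: two 1-D histograms over the same equally spaced bins. This is a deliberate simplification and not the method of the paper or of `emd.py`; the function names `normalize` and `emd_1d` are hypothetical. In this 1-D case the distance reduces to the sum of absolute differences between the two cumulative distributions.

```python
from itertools import accumulate


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two histograms on the same bins.

    Assumes adjacent bins are one unit apart. Under that assumption the
    minimal transport cost equals the sum of absolute differences of
    the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    cdf_a = accumulate(normalize(hist_a))
    cdf_b = accumulate(normalize(hist_b))
    return sum(abs(x - y) for x, y in zip(cdf_a, cdf_b))


if __name__ == "__main__":
    # Toy histograms (made-up values, not taken from any screenshot).
    print(emd_1d([4, 2, 0], [0, 2, 4]))
```

Identical histograms give a distance of 0, and moving all mass from the first bin to the last of three bins gives 2, since every unit of mass travels two bins.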

Requirements

python == 3.6
opencv-python == 3.4.2.17
opencv-contrib-python == 3.4.2.17
tensorflow == 1.13.1
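
Because these pins are exact, it can help to check the active environment before running a baseline. The following is a rough sketch, not part of the repository: `PINS`, `installed_version`, and `find_mismatches` are hypothetical names, and the lookup falls back to `pkg_resources` on interpreters that lack `importlib.metadata` (such as Python 3.6).

```python
import sys

# Version pins copied from the Requirements list above.
PYTHON_PIN = (3, 6)
PINS = {
    "opencv-python": "3.4.2.17",
    "opencv-contrib-python": "3.4.2.17",
    "tensorflow": "1.13.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.6 / 3.7

    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins, lookup=installed_version, python_version=None):
    """Return a list of (name, wanted, found) tuples for every unmet pin."""
    if python_version is None:
        python_version = sys.version_info[:2]
    python_version = tuple(python_version)
    problems = []
    if python_version != PYTHON_PIN:
        found = "{}.{}".format(*python_version)
        problems.append(("python", "3.6", found))
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    for name, wanted, found in find_mismatches(PINS):
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

The `lookup` parameter is injectable so the comparison logic can be exercised without the pinned packages being installed.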

Instructions

The data folder should be organized in this format

To run EMD

cd EMD/
# -m is the testing mode, i.e. the ground-truth label for the folder
python emd.py -f [path_to_data_folder] \
              -m [benign|phish] \
              -t [path_to_targetlist_folder]

To run PhishZoo

cd PhishZoo/
# -m is the testing mode, i.e. the ground-truth label for the folder
python phishzoo.py -f [path_to_data_folder] \
                   -m [benign|phish] \
                   -t [path_to_targetlist_folder]

To run VisualPhishNet

Download the pretrained model here, together with the target list embedding, target list labels, and target list filename list

cd VisualPhishnet/
python visualphish_manual.py -f [path_to_data_folder] \
                             -r [txt_path_to_save_result]
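
As a rough illustration of what a visual-similarity comparison of this kind amounts to, the sketch below matches a query embedding against a set of target embeddings and accepts the nearest target only if its distance is within a threshold. This is an assumption-laden sketch, not the code in `visualphish_manual.py`: the function names, the use of plain L2 distance, the `<=` comparison, and the toy embeddings are all hypothetical. The default of 1.5 is the threshold value quoted in the issue further down this page.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_target(query, targets):
    """Return (name, distance) of the target embedding closest to query.

    `targets` maps a target name to its embedding vector.
    """
    if not targets:
        raise ValueError("target list is empty")
    best_name, best_dist = None, math.inf
    for name, embedding in targets.items():
        dist = l2_distance(query, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


def match_target(query, targets, threshold=1.5):
    """Return (name, distance), with name set to None if nothing is close enough."""
    name, dist = nearest_target(query, targets)
    if dist <= threshold:
        return name, dist
    return None, dist


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from the trained model.
    toy_targets = {"brand_a": [0.0, 0.0], "brand_b": [3.0, 4.0]}
    print(match_target([0.0, 1.0], toy_targets))
    print(match_target([10.0, 10.0], toy_targets))
```

If nearly every query returns the same nearest target, as the issue below reports, checking that the query embeddings actually differ from one another is a reasonable first diagnostic.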

For StackModel

Download pretrained model here

cd StackModel
python test.py -f [path_to_data_folder] \
               -o [directory_to_save_output]

For URLNet

Download pretrained model here

python test.py \
  --model.emb_mode 5 \
  --data.data_dir [path_to_data_folder] \
  --log.checkpoint_dir output_5/checkpoints/model-2430 \
  --log.output_dir [txt_path_to_save_result] \
  --data.word_dict_dir output_5/words_dict.p \
  --data.char_dict_dir output_5/chars_dict.p \
  --data.subword_dict_dir output_5/subwords_dict.p 

phishingbaseline's People

Contributors

lindsey98


phishingbaseline's Issues

This error can't be solved!

When I run the command python emd.py -f benign_sample_30k -m benign -t targetlist_fit
this error occurred. I really can't deal with it!
Traceback (most recent call last):
  File "emd.py", line 302, in <module>
    main(args.data_folder, args.mode, args.output_basedir, args.targetlist)
  File "emd.py", line 216, in main
    signatureB_this, md_colorB_this = get_signature(img2url)
  File "emd.py", line 123, in get_signature
    img = Image.open(path)
  File "/home/flame/anaconda3/envs/baseline/lib/python3.6/site-packages/PIL/Image.py", line 2975, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'targetlist_fit/targetlist_fit_copy_half_rename/homepage.png'

Can you tell me why? Thanks very much!

Thresholds for different baselines

Hi, sorry to disturb you again.

I am working on Figure 11 of PhishIntention. Could you please share the thresholds you used for the different models? For example, the default threshold for VisualPhishNet is 1.5, but to get Figure 11 we need more thresholds.

Moreover, when testing screenshots with VisualPhishNet, it does not seem to get the embedding, so almost all images end up with the minimal distance to the same target image. I am wondering whether something is wrong. Could you please also share the 9334 target screenshots?
Thanks

From: Fujiao
