We present a novel approach, called SecureReqNet, for automatically identifying whether issues in bug or issue tracking systems describe security-related content that should be given careful attention. Our approach consists of a two-phase deep learning architecture that operates purely on the natural language descriptions of issues. The first phase learns high-dimensional sentence embeddings, via an unsupervised learning process, from hundreds of thousands of descriptions of software vulnerabilities listed in the CVE database and issue descriptions extracted from open source projects. The second phase then uses this semantic ontology of embeddings to train a deep convolutional neural network capable of predicting whether a given issue contains security-related information.
Hi, danaderp. Many thanks for your work. I have a question: in the notebook "alpha_securereqnet.ipynb", there is a line "from vectorize_sentence import Embeddings", but I cannot figure out where this module should be imported from. Is it possible that you did not upload this file? Thanks for your help! Best regards!
02_Statistical Test and 03_Clustering appear to be tests of code that was necessary for running SecureReqNet but is now used elsewhere. Whether these notebooks are still needed should be verified with coverage tests or by asking the project manager.
This is a final step, to be done at the end of development. While working on a development fork it is necessary to change the path to the library documents; this change needs to be reverted before or after merging back into the main project so that the documents generate properly.
The description, project name, and keywords section needs to be uncommented. The official description should be added and the nbdev keywords decided on. The project must be named exactly as it will be named for package installation; index.ipynb in particular needs this information to compile correctly.
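For reference, the section to uncomment lives in nbdev's settings.ini. A sketch of what it might look like once filled in (every value below is a placeholder, not the project's decided name, description, or keywords):

```ini
; hypothetical settings.ini fragment -- values to be replaced with the
; official ones once they are decided
lib_name = securereqnet
description = Identify security-related issues in issue trackers
keywords = nbdev security issues deep-learning
```

nbdev reads these fields when generating the package metadata and the docs index, which is why index.ipynb fails to compile while they remain commented out.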
The ExampleGen component of the TFX pipeline officially supports only a few data formats for ingestion. Since we have decided to go with CSV, the contents of augmented_dataset/ need to be converted from .txt to .csv.
An initial conversion has been made in augmented_dataset_csv/, but beam_dag_runner.py rejects the files even though they appear to be in the correct format.
It would also be useful to merge the rows in ground_truth.csv with their respective issues, but doing so crashes my machine for an unknown reason.
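A minimal sketch of the .txt-to-.csv conversion described above. The column names ("id", "text") and the one-file-per-row layout are assumptions, since the required CSV schema is not stated in the issue; CsvExampleGen only needs a consistent header row it can map to features:

```python
import csv
import os

def convert_txt_dir_to_csv(src_dir, dest_csv, text_column="text"):
    """Write every .txt file in src_dir as one row of dest_csv.

    Hypothetical helper: column names and row layout are assumptions,
    not the schema SecureReqNet's pipeline actually expects.
    """
    with open(dest_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["id", text_column])
        for name in sorted(os.listdir(src_dir)):
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(src_dir, name), encoding="utf-8") as f:
                # Collapse newlines so each issue stays on a single CSV row;
                # csv.writer handles quoting of commas and quotes itself.
                body = " ".join(f.read().split())
            writer.writerow([os.path.splitext(name)[0], body])
```

Letting the csv module do the quoting is one plausible explanation for beam_dag_runner.py rejecting a hand-rolled conversion: unescaped embedded commas or newlines produce files that look like CSV but do not parse.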
The read_data module's get_test_and_training method doesn't retrieve the data correctly. It needs to be modified to look inside the zipped data file. The path from the root to the content folder is ./data/augmented_dataset_augmented_dataset.zip
Before moving 08_alpha_securereqnet into nbdev, it needs changes to allow it to run with the current code architecture. This issue depends on the read_data issue being completed first.
Create an initial Flask backend prototype once port forwarding is complete. Prototype endpoints will mirror the input of the TFX Serving models; output will be translated to True/False. This also depends on the Transform component being complete.
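A minimal sketch of such a prototype, assuming a single /predict endpoint. The route name, JSON shape, and 0.5 threshold are all assumptions, and the model call is stubbed out where the real prototype would POST to the TF Serving REST endpoint:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

SECURITY_THRESHOLD = 0.5  # assumed cutoff; would be tuned against the model

def classify(text):
    """Stub for the TFX Serving call.

    The real prototype would send the (transformed) text to the serving
    model and read back a probability; this keyword check only exists so
    the route can be exercised standalone.
    """
    return 1.0 if "security" in text.lower() else 0.0

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json(force=True).get("text", "")
    score = classify(text)
    # Translate the model score into the True/False answer the issue asks for.
    return jsonify({"security_related": score >= SECURITY_THRESHOLD})
```

Keeping the score-to-boolean translation in the Flask layer, as the issue suggests, lets the serving model keep returning raw probabilities unchanged.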
Vastly speed up preprocessing/training by utilizing the multi-threading functionality of Python/TensorFlow. Make this optional in case the user lacks the necessary computing resources.
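On the Python side, the optional parallelism could be sketched with concurrent.futures. The function and parameter names below are illustrative, not taken from the SecureReqNet code:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess_all(texts, preprocess, parallel=True, max_workers=4):
    """Apply `preprocess` to every text, optionally across worker threads.

    `parallel=False` keeps the original single-threaded path for users
    without spare computing resources; names here are hypothetical.
    """
    if not parallel:
        return [preprocess(t) for t in texts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `texts`.
        return list(pool.map(preprocess, texts))
```

For the TensorFlow side, tf.data's num_parallel_calls argument to Dataset.map serves the same purpose and can likewise default to a serial value when the option is off.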