We present a novel approach, called SecureReqNet, for automatically identifying whether issues in bug or issue tracking systems describe security-related content that should be given careful attention. Our approach consists of a two-phase deep learning architecture that operates purely on the natural language descriptions of issues. The first phase learns high-dimensional sentence embeddings, via an unsupervised learning process, from hundreds of thousands of descriptions of software vulnerabilities listed in the CVE database and issue descriptions extracted from open source projects. The second phase then uses this semantic ontology of embeddings to train a deep convolutional neural network capable of predicting whether a given issue contains security-related information.
Hi, danaderp. Many thanks for your work. I have a question: in the notebook "alpha_securereqnet.ipynb", there is a line "from vectorize_sentence import Embeddings", but I cannot figure out where this module should be imported from. Is it possible that you did not upload this file? Thanks for your help! Best regards!
02_Statistical Test and 03_Clustering appear to be tests of code that was necessary for running SecureReqNet but is now used elsewhere. Whether these notebooks are still needed should be verified with coverage tests or by asking the project manager.
This is a final step, to be done at the end of development. While working on a development fork it is necessary to change the path to the library documents; this change needs to be reverted before or after merging back into the main project so that the documents generate properly.
The description, project name, and keywords section needs to be uncommented. The official description should be added and the nbdev keywords decided on. The project must be named exactly as it will be named for package installation; index.ipynb in particular needs this information to compile correctly.
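For reference, the section to uncomment lives in nbdev's settings.ini. A sketch of what it might look like once filled in (every value below is a placeholder, not the project's decided name, description, or keywords):

```ini
; hypothetical settings.ini fragment -- values to be replaced with the
; official ones once they are decided
lib_name = securereqnet
description = Identify security-related issues in issue trackers
keywords = nbdev security issues deep-learning
```

nbdev reads these fields when generating the package metadata and the docs index, which is why index.ipynb fails to compile while they remain commented out.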
The ExampleGen component of the TFX pipeline officially supports only a few data formats for ingestion. Since we have decided to go with CSV, the contents of augmented_dataset/ need to be converted from .txt to .csv.
An initial conversion has been made in augmented_dataset_csv/, but beam_dag_runner.py rejects the files even though they appear to be in the correct format.
It would also be useful to merge the rows in ground_truth.csv with their respective issues, but doing so crashes my machine for an unknown reason.
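A minimal sketch of the .txt-to-.csv conversion described above. The column names ("id", "text") and the one-file-per-row layout are assumptions, since the required CSV schema is not stated in the issue; CsvExampleGen only needs a consistent header row it can map to features:

```python
import csv
import os

def convert_txt_dir_to_csv(src_dir, dest_csv, text_column="text"):
    """Write every .txt file in src_dir as one row of dest_csv.

    Hypothetical helper: column names and row layout are assumptions,
    not the schema SecureReqNet's pipeline actually expects.
    """
    with open(dest_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["id", text_column])
        for name in sorted(os.listdir(src_dir)):
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(src_dir, name), encoding="utf-8") as f:
                # Collapse newlines so each issue stays on a single CSV row;
                # csv.writer handles quoting of commas and quotes itself.
                body = " ".join(f.read().split())
            writer.writerow([os.path.splitext(name)[0], body])
```

Letting the csv module do the quoting is one plausible explanation for beam_dag_runner.py rejecting a hand-rolled conversion: unescaped embedded commas or newlines produce files that look like CSV but do not parse.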
The read_data module's get_test_and_training method doesn't retrieve the data correctly. It needs to be modified to look inside the zipped data file. The path from the root to the content folder is ./data/augmented_dataset_augmented_dataset.zip
Before moving 08_alpha_securereqnet into nbdev, it needs changes to allow it to run with the current code architecture. This issue depends on the read_data issue being completed first.
Create an initial Flask backend prototype once port forwarding is complete. Prototype endpoints will mirror the input of the TFX Serving models; output will be translated to True/False. This also depends on the Transform component being complete.
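A minimal sketch of such a prototype, assuming a single /predict endpoint. The route name, JSON shape, and 0.5 threshold are all assumptions, and the model call is stubbed out where the real prototype would POST to the TF Serving REST endpoint:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

SECURITY_THRESHOLD = 0.5  # assumed cutoff; would be tuned against the model

def classify(text):
    """Stub for the TFX Serving call.

    The real prototype would send the (transformed) text to the serving
    model and read back a probability; this keyword check only exists so
    the route can be exercised standalone.
    """
    return 1.0 if "security" in text.lower() else 0.0

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json(force=True).get("text", "")
    score = classify(text)
    # Translate the model score into the True/False answer the issue asks for.
    return jsonify({"security_related": score >= SECURITY_THRESHOLD})
```

Keeping the score-to-boolean translation in the Flask layer, as the issue suggests, lets the serving model keep returning raw probabilities unchanged.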
Vastly speed up preprocessing/training by utilizing the multi-threading functionality of Python/TensorFlow. Make this optional in case the user lacks the necessary computing resources.
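On the Python side, the optional parallelism could be sketched with concurrent.futures. The function and parameter names below are illustrative, not taken from the SecureReqNet code:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess_all(texts, preprocess, parallel=True, max_workers=4):
    """Apply `preprocess` to every text, optionally across worker threads.

    `parallel=False` keeps the original single-threaded path for users
    without spare computing resources; names here are hypothetical.
    """
    if not parallel:
        return [preprocess(t) for t in texts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `texts`.
        return list(pool.map(preprocess, texts))
```

For the TensorFlow side, tf.data's num_parallel_calls argument to Dataset.map serves the same purpose and can likewise default to a serial value when the option is off.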