bbowman410 / vgraph
Vulnerable code clone detection technique. Published in EuroS&P 2020.
Home Page: https://bbowman410.github.io/
Hi @bbowman410
I would like to derive optimal values for the thresholds.
The paper talks about looking at some cumulative distribution function: "we compute the Cumulative Distribution Function (CDF) of the context triplet scores, and the positive triplet scores, for each VGRAPH against all other functions in the database."
Is this done by plotting "cvg_score" and "pvg_score" for a target database built from the source repositories listed in repos.config, or by some other means?
Or
Is it done over the source repositories (repos.config) only? We don't need to involve the target repositories (on which we run find_vulns.py) to determine the thresholds, right?
Could you explain this in more detail?
Is there a script that plots the CDF, so that we can read optimal threshold values off the resulting graph?
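For reference, an empirical CDF of a list of scores, and one plausible way to read a threshold from it, can be sketched as follows. This is not the repository's script: it assumes the per-VGRAPH scores (for example the "cvg_score" or "pvg_score" values against unrelated functions) have already been collected into a plain list of numbers, and it picks as threshold the smallest score at or below which a chosen fraction q of those scores fall. The names `empirical_cdf`, `threshold_for_quantile`, `cvg_scores`, and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def empirical_cdf(scores: List[float]) -> Tuple[List[float], List[float]]:
    """Return (sorted scores, F(score)) where F(x) = fraction of scores <= x."""
    xs = sorted(scores)
    n = len(xs)
    # bisect_right counts every copy of a tied value, giving a proper step CDF.
    ys = [bisect_right(xs, x) / n for x in xs]
    return xs, ys


def threshold_for_quantile(scores: List[float], q: float) -> float:
    """Smallest observed score s with F(s) >= q, for 0 < q <= 1."""
    xs, ys = empirical_cdf(scores)
    for x, y in zip(xs, ys):
        if y >= q:
            return x
    return xs[-1]


# Hypothetical context-triplet scores of one VGRAPH against unrelated functions.
cvg_scores = [10.0, 20.0, 20.0, 30.0, 40.0]
xs, ys = empirical_cdf(cvg_scores)
print(list(zip(xs, ys)))
print(threshold_for_quantile(cvg_scores, 0.8))
```

To inspect the curve visually, the (xs, ys) pairs can be passed to a plotting library, e.g. matplotlib's `plt.step(xs, ys, where="post")`. Which quantile corresponds to the paper's thresholds is an open question for the authors.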
Hi,
Thanks for sharing your code. I encountered an issue.
When I run evaluate_vgraph.py, the following error occurs:
No such file or directory: './manual_labels.txt'.
Could you provide a way to obtain ./manual_labels.txt?
Hi @bbowman410
Can you share the steps to install this particular version of Joern (v0.2.5)?
Also, how do I set up joern-parse?
Many thanks
Hi, I was trying out your project. After running find_vulns.py and generating hits.txt, we get the name and path of the old file containing the original vulnerability and of the new file in which a similar vulnerability was found.
However, these files contain more than 1,000 lines of code. How can we localize the vulnerable part?
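Since the scores in the paper are computed per function ("against all other functions in the database"), one way to narrow a hit down is to extract just the matched function from the large file. The sketch below assumes the name of the matched function is known (whether hits.txt records it is an assumption here) and uses a naive, hypothetical helper `find_function_lines` that locates a C function definition by name and matches braces. It ignores braces inside strings, comments, and macros, so treat its output as approximate.

```python
import re
from typing import Optional, Tuple


def find_function_lines(source: str, func_name: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (start_line, end_line) of the first definition of func_name."""
    pattern = re.compile(r"\b" + re.escape(func_name) + r"\s*\(")
    for m in pattern.finditer(source):
        # Skip to the ')' matching the '(' right after the function name.
        i, depth = m.end(), 1
        while i < len(source) and depth > 0:
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1
        # A definition has a '{' after the parameter list; calls and prototypes do not.
        while i < len(source) and source[i].isspace():
            i += 1
        if i >= len(source) or source[i] != "{":
            continue
        # Match braces to find the end of the function body.
        depth = 0
        while i < len(source):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    start_line = source.count("\n", 0, m.start()) + 1
                    end_line = source.count("\n", 0, i) + 1
                    return start_line, end_line
            i += 1
        return None  # unbalanced braces
    return None
```

With the line range in hand, the vulnerable and matched functions can be compared side by side (or diffed) instead of scanning the entire file.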
Hi @bbowman410
A few quick questions:
Step 1: Mine the patched and vulnerable source code (.c/.cpp) and identify the vulnerable function names.
Question 1: What logic is used to detect which functions in a file were modified by a commit?
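A common way to answer this in general (not necessarily the logic vgraph uses) is to read the hunk headers of the commit's unified diff, collect the line numbers in the old (vulnerable) file that were removed or modified, and report every function whose line range contains one of them. The sketch below assumes a single-file diff and that the function line ranges are already known; `changed_old_lines`, `modified_functions`, and `func_ranges` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Unified-diff hunk header: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def changed_old_lines(diff_text: str) -> Set[int]:
    """1-based lines of the OLD file touched by a single-file unified diff."""
    changed: Set[int] = set()
    old_line: Optional[int] = None  # None until the first hunk header is seen
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            old_line = int(m.group(1))
            continue
        if old_line is None or line.startswith("\\"):
            continue  # file headers, or "\ No newline at end of file"
        if line.startswith("-"):
            changed.add(old_line)  # removed or modified line
            old_line += 1
        elif line.startswith("+"):
            # Pure insertion: attribute it to the preceding old line.
            changed.add(max(old_line - 1, 1))
        else:
            old_line += 1  # context line
    return changed


def modified_functions(diff_text: str,
                       func_ranges: Dict[str, Tuple[int, int]]) -> List[str]:
    """Names of functions (start, end lines in the old file) touched by the diff."""
    changed = changed_old_lines(diff_text)
    return sorted(name for name, (start, end) in func_ranges.items()
                  if any(start <= ln <= end for ln in changed))
```

Attributing pure insertions to the preceding old line is a heuristic choice; an insertion right at a function boundary could be credited to the neighbouring function.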
Step 2: Take the mined source code of the patched and vulnerable functions, generate a CPG for each, and extract triplets from those CPGs.
Question 2: Among the generated files, apart from the CPG and the triplets there is also a .vec file. What is the .vec file used for, and what does it represent? I can't find any reference to a .vec file in the paper.
In short, I expected two files for each function, a ".gpickle" for the CPG and a ".triples" for the triplets, but there is a third, ".vec" file. What is it?
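As a rough illustration of the "triplets from a CPG" step (not the repository's code), a graph can be flattened into (source code, edge type, target code) triplets. The sketch assumes the graph is given as a plain mapping from node id to normalized code string plus a list of (source id, target id, edge type) edges; the helper `extract_triplets` and the edge labels in the example are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]


def extract_triplets(node_code: Dict[int, str],
                     edges: List[Tuple[int, int, str]]) -> Set[Triplet]:
    """Build (source_code, edge_type, target_code) triplets from a graph.

    Edges whose endpoints have no code entry are skipped.
    """
    triplets: Set[Triplet] = set()
    for src, dst, etype in edges:
        if src in node_code and dst in node_code:
            triplets.add((node_code[src], etype, node_code[dst]))
    return triplets


# Hypothetical three-statement function graph.
nodes = {1: "x = read ( )", 2: "x > 10", 3: "buf [ x ] = 0"}
edges = [(1, 2, "FLOWS_TO"), (2, 3, "CONTROLS"), (1, 3, "REACHES")]
print(sorted(extract_triplets(nodes, edges)))
```

Because the triplets are keyed on code text rather than node ids, they can be compared across different functions and repositories with plain set operations.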
Step 3: Convert the triplets to a VGraph (positive triplets + negative triplets + context triplets).
Question 3: Again, I expected three files to be generated for each function, but I see six. What are the other three files, and how are they used in the later triplet-matching steps?
"_pvg.pkl", "_nvg.pkl", "_cvg.pkl", "_p.pkl", "_v.pkl" and "_vec.pkl"
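A simplified, set-based picture of the three VGraph components may help frame this question. In the sketch below, positive triplets are those present only in the vulnerable function, negative triplets are those introduced only by the patch, and context triplets are those shared by both versions; a match score is the percentage of a component's triplets found in a target function. This is an assumption-laden simplification, not the repository's implementation: the paper may restrict the context set (for example to triplets related to the changed code) and may match triplets differently. `build_vgraph` and `match_score` are hypothetical names.

```python
from typing import Set, Tuple

Triplet = Tuple[str, str, str]


def build_vgraph(vuln: Set[Triplet], patched: Set[Triplet]):
    """Split triplets into (positive, negative, context) sets (simplified)."""
    positive = vuln - patched  # present only in the vulnerable function
    negative = patched - vuln  # introduced by the patch
    context = vuln & patched   # shared by both versions
    return positive, negative, context


def match_score(component: Set[Triplet], target: Set[Triplet]) -> float:
    """Percentage of a VGraph component's triplets found in a target function."""
    if not component:
        return 0.0
    return 100.0 * len(component & target) / len(component)
```

Under this reading, a target function would look vulnerable when it scores high on the context and positive components and low on the negative one.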
I am going through the code to understand this, but I thought I would ask for clarification as well.
Many thanks
Hi @bbowman410
While generating the target graph database for any C/C++ repository cloned from GitHub, I get the following error at line 49 of parsed_to_networkx.py:
Nodes: parsed_target/content/linux/mm/rmap.c/nodes.csv
Edges: parsed_target/content/linux/mm/rmap.c/edges.csv
Output: /content/VGraph/data/target_graph_db
Traceback (most recent call last):
File "parsed_to_networkx.py", line 49, in
write_graph(g['graph'], output_dir + '/' + base_dir, g['name'])
TypeError: list indices must be integers or slices, not str
Any thoughts?
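The TypeError itself says that `g` at line 49 is a list, so `g['graph']` (string indexing) fails. One hedged workaround, assuming the object is sometimes a list of records shaped like `{'graph': ..., 'name': ...}` rather than a single such dict, is to normalize it before writing. The helper `as_records` and the record shape are hypothetical; the actual structure returned in parsed_to_networkx.py should be checked first.

```python
from typing import Any, Dict, List, Union

GraphRecord = Dict[str, Any]


def as_records(g: Union[GraphRecord, List[GraphRecord]]) -> List[GraphRecord]:
    """Normalize a single record or a list of records to a list of records."""
    if isinstance(g, dict):
        return [g]
    if isinstance(g, list):
        return [item for item in g if isinstance(item, dict)]
    raise TypeError(f"unexpected graph container: {type(g).__name__}")


def reproduce_error() -> str:
    """Show that string-indexing a list raises the TypeError from the traceback."""
    g = [{"graph": None, "name": "example_func"}]
    try:
        g["graph"]
    except TypeError as exc:
        return str(exc)
    return ""
```

With that helper, line 49 could loop over the records, e.g. `for rec in as_records(g): write_graph(rec['graph'], output_dir + '/' + base_dir, rec['name'])`.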