
mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.

License: MIT License

CMake 1.48% Shell 0.07% C++ 95.09% C 2.16% Python 1.13% Dockerfile 0.06%

llvm-bitcode llvm-slicer dependence-graph static-analysis static-code-analysis reaching-definitions slicing dependency-graph program-analysis slice

dg's Introduction

DG

DG is a library containing various bits for program analysis. However, the main motivation of this library is program slicing. The library contains implementations of pointer analysis, data dependence analysis, control dependence analysis, and an analysis of relations between values in LLVM bitcode. All of the analyses target LLVM bitcode, but most of them are written in a generic way, so they do not depend on LLVM in particular.

Further, DG contains an implementation of dependence graphs and a static program slicer for LLVM bitcode. Some documentation can be found in the doc/ directory.

You can find a high-level description of DG in the papers "DG: a program analysis library" or "DG: Analysis and slicing of LLVM bitcode". More detailed information about DG is in the doc/ folder or in my master's thesis.

You can report issues by e-mail to [email protected] (or file an issue on GitHub).


dg's Issues

fix offsets on 32-bit bitcode

Assertion `Offset.getBitWidth() == DL.getPointerSizeInBits(getPointerAddressSpace()) && "The offset must have exactly as many bits as our pointer."' failed.

Use offset ranges in reaching-defs

Instead of mapping pointer -> set of nodes,
use records of the form
memory [from : to] -> set of nodes
meaning that the memory at offsets in the range [from, to] was defined by these nodes. This will simplify the analysis a lot.
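A minimal sketch of the proposed records, assuming a memory object is identified by an opaque pointer and a defining node by an integer ID (both hypothetical stand-ins for dg's own types). Each record states that bytes [from, to] of an object were defined by a set of nodes; a strong update cuts the overwritten bytes out of older records, while a weak update only adds a new record.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for dg's memory-object and node types.
using MemoryObject = const void *;
using NodeId = unsigned;

// One record: bytes [from, to] (inclusive) of an object were defined by `nodes`.
struct DefRecord {
    uint64_t from;
    uint64_t to;
    std::set<NodeId> nodes;
};

class RangeDefinitions {
    std::map<MemoryObject, std::vector<DefRecord>> defs;

public:
    // Record that `node` writes bytes [from, to] of `obj`. With a strong
    // update, older definitions of exactly those bytes are killed.
    void define(MemoryObject obj, uint64_t from, uint64_t to, NodeId node,
                bool strong) {
        assert(from <= to);
        std::vector<DefRecord> &recs = defs[obj];
        if (strong) {
            std::vector<DefRecord> kept;
            for (const DefRecord &r : recs) {
                if (r.to < from || r.from > to) { // no overlap: keep as is
                    kept.push_back(r);
                    continue;
                }
                // Keep only the parts of the old record outside [from, to].
                if (r.from < from)
                    kept.push_back({r.from, from - 1, r.nodes});
                if (r.to > to)
                    kept.push_back({to + 1, r.to, r.nodes});
            }
            recs.swap(kept);
        }
        recs.push_back({from, to, {node}});
    }

    // All nodes that may have defined some byte in [from, to] of `obj`.
    std::set<NodeId> reachingDefs(MemoryObject obj, uint64_t from,
                                  uint64_t to) const {
        std::set<NodeId> result;
        auto it = defs.find(obj);
        if (it == defs.end())
            return result;
        for (const DefRecord &r : it->second)
            if (!(r.to < from || r.from > to))
                result.insert(r.nodes.begin(), r.nodes.end());
        return result;
    }
};
```

Splitting records on strong updates keeps the untouched parts of older definitions precise, which a plain pointer -> set of nodes mapping cannot express.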

rename Node::getBasicBlock() to getBBlock()

We use BBlock everywhere, so use it here too

points-to: handle arguments of main

Create a memory object for argv and, inside it, a memory object for the "strings" (probably a single one with UNKNOWN_OFFSET).

bug in computing control dependences

There's a bug in computing post-dominance frontiers. I don't know where: the code is taken from LLVM (check whether the output is the same as in LLVM), and the slicing ignores the frontiers in some cases...
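To check the frontiers independently of the code taken from LLVM, one option is the frontier computation of Cooper, Harvey and Kennedy applied to the reversed CFG. The sketch assumes that immediate post-dominators are already computed and that blocks are numbered integers with successor lists; both are simplifications, not dg's data structures.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Blocks are numbered 0..n-1; succs[b] lists the CFG successors of b.
// ipdom[b] is the immediate post-dominator of b, or -1 for the exit block.
std::vector<std::set<int>>
postDominanceFrontiers(const std::vector<std::vector<int>> &succs,
                       const std::vector<int> &ipdom) {
    assert(succs.size() == ipdom.size());
    std::vector<std::set<int>> pdf(succs.size());
    for (int b = 0; b < static_cast<int>(succs.size()); ++b) {
        // Only branch points can appear in a post-dominance frontier.
        if (succs[b].size() < 2)
            continue;
        for (int s : succs[b]) {
            // Walk up the post-dominator tree from the successor until we
            // reach b's immediate post-dominator; every block on the way
            // has b in its frontier (i.e. is control dependent on b).
            int runner = s;
            while (runner != -1 && runner != ipdom[b]) {
                pdf[runner].insert(b);
                runner = ipdom[runner];
            }
        }
    }
    return pdf;
}
```

For an if-then-else diamond, both branch blocks get the condition block in their frontier, i.e. they are control dependent on it; comparing such output with the current implementation could help locate the bug.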

Cannot handle infinite loops

For infinite loops that can be identified at compile time (e.g. while(1) {...}), there is no post-dominator tree, since the loop has no end. Therefore there are no control dependencies, and the slicer incorrectly removes the loop even when it should be in the slice.
Fix it (probably) by implementing the algorithm from [1].

[1] Danicic, S., Barraclough, R. W., et al. A unifying theory of control dependence and its application to arbitrary program structures.
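The sketch below is not the algorithm from [1]. It only shows how to find the blocks for which classic post-dominance is undefined, namely those from which no path reaches the exit, using a backward search from the exit block. The integer CFG encoding is an assumption made for illustration.

```cpp
#include <cassert>
#include <vector>

// succs[b] lists the CFG successors of block b; `exit` is the exit block.
// Returns, for every block, whether the exit is reachable from it.
// Blocks for which this is false (e.g. the body of while (1) {...})
// have no post-dominators, so no control dependences are computed for them.
std::vector<bool> reachesExit(const std::vector<std::vector<int>> &succs,
                              int exit) {
    const int n = static_cast<int>(succs.size());
    assert(exit >= 0 && exit < n);
    // Build predecessor lists so that we can search backwards from the exit.
    std::vector<std::vector<int>> preds(n);
    for (int b = 0; b < n; ++b)
        for (int s : succs[b])
            preds[s].push_back(b);

    std::vector<bool> reached(n, false);
    std::vector<int> worklist{exit};
    reached[exit] = true;
    while (!worklist.empty()) {
        int b = worklist.back();
        worklist.pop_back();
        for (int p : preds[b]) {
            if (!reached[p]) {
                reached[p] = true;
                worklist.push_back(p);
            }
        }
    }
    return reached;
}
```

For example, for the CFG 0 -> {1, 3}, 1 -> {2}, 2 -> {1} with exit block 3, blocks 1 and 2 (the loop) are reported as not reaching the exit.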

call via unknown pointer is unsound

When we call a function via an unknown pointer, we slice it away. We must assume that any function with the same prototype can be called.
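A minimal sketch of the conservative rule, using a hypothetical FunctionInfo record that describes prototypes by type names; in LLVM bitcode, one would compare function types instead. Every function whose prototype matches the call site is kept as a possible callee.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a function's prototype.
struct FunctionInfo {
    std::string name;
    std::string returnType;
    std::vector<std::string> paramTypes;
};

// Conservatively collect every function that a call through an unknown
// pointer may invoke: all functions whose prototype matches the call site.
std::vector<const FunctionInfo *>
possibleCallees(const std::vector<FunctionInfo> &functions,
                const std::string &retType,
                const std::vector<std::string> &argTypes) {
    std::vector<const FunctionInfo *> result;
    for (const FunctionInfo &f : functions) {
        // Both the return type and all parameter types must match.
        if (f.returnType == retType && f.paramTypes == argTypes)
            result.push_back(&f);
    }
    return result;
}
```

This is still only an approximation: a call through a pointer that was cast to a different function type would not match any prototype.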

check if we handle bitcast correctly

What if, for example, we have an int and cast it to char * to access particular bytes?
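A small test case for this question might look like the following (the function name is hypothetical): the int is written only through an unsigned char pointer, so if the cast is not handled, the slicer may drop the byte stores even though the returned value depends on them.

```cpp
#include <cassert>
#include <cstddef>

// Writes every byte of an int through an unsigned char pointer. In
// typed-pointer LLVM IR, &x is typically bitcast to i8* for these stores.
int setAllBytes() {
    int x = 0;
    unsigned char *bytes = reinterpret_cast<unsigned char *>(&x);
    for (std::size_t i = 0; i < sizeof x; ++i)
        bytes[i] = 0xFF;
    // All bits set: -1 on two's-complement targets, regardless of endianness.
    return x;
}
```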

make slicing more precise

We need to keep it correct, but we can make it more precise. Take a look at test4: it is correct, but it can be sliced much more.

if we leave an instruction in the function, we should keep the entry node

At the moment, we have control dependency only on the first block's node and the arguments. If we slice all of this away but leave some other instruction there, the function will go away.

store params in an array/vector

We always know in advance how many of them there will be.

handle intrinsic functions

This is the initialization of a structure with function pointers:

%struct.callbacks = type { i32 (i32)*, void (i32*)* }
@main.C = private unnamed_addr constant %struct.callbacks { i32 (i32)* @inc, void (i32*)* @zero }, align 8

  %C = alloca %struct.callbacks, align 8
  %0 = bitcast %struct.callbacks* %C to i8* 
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast (%struct.callbacks* @main.C to i8*), i64 16, i32 8, i1 false)

Memory is also zeroed using these functions, which should be treated as setting the pointers to null.
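Instead of treating llvm.memcpy as an undefined function, a points-to analysis can copy the pointers stored in the source object into the destination object. The sketch assumes a hypothetical representation in which each memory object maps byte offsets to sets of pointed-to targets. For the example above, copying the 16 bytes of @main.C into %C would transfer the pointer to @inc (offset 0) and the pointer to @zero (offset 8).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical memory object: byte offset -> set of targets stored there.
using MemoryObject = std::map<uint64_t, std::set<std::string>>;

// Model memcpy(dst + dstOff, src + srcOff, len): pointers stored at offsets
// [srcOff, srcOff + len) of src appear at the same relative offsets in dst.
// Only dst is modified. This is a weak update: existing entries are kept.
void modelMemcpy(MemoryObject &dst, uint64_t dstOff,
                 const MemoryObject &src, uint64_t srcOff, uint64_t len) {
    for (auto it = src.lower_bound(srcOff);
         it != src.end() && it->first < srcOff + len; ++it)
        dst[dstOff + (it->first - srcOff)].insert(it->second.begin(),
                                                  it->second.end());
}
```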

points-to: does not handle negative offsets

If a GEP has a constant negative offset (e.g. -8), it is converted to an unsigned value and cropped, because the value is almost UINT64_MAX. We get UNKNOWN_OFFSET, so the result should be correct, but it is imprecise.
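The wrap-around is easy to reproduce: an offset of -8 reinterpreted as an unsigned 64-bit value becomes UINT64_MAX - 7. A more precise alternative, sketched below, keeps the GEP offset signed while adding it to the current offset and falls back to an unknown offset only when the result leaves the object. Here UNKNOWN_OFFSET is modeled as std::nullopt, an assumption for illustration, not dg's definition.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Adds a signed GEP offset to a known, non-negative offset inside an
// object of `objectSize` bytes. Returns std::nullopt (standing in for
// UNKNOWN_OFFSET) if the result would fall outside [0, objectSize].
std::optional<uint64_t> addGepOffset(uint64_t current, int64_t gepOffset,
                                     uint64_t objectSize) {
    if (gepOffset < 0) {
        // Negate in unsigned arithmetic so that INT64_MIN does not overflow.
        uint64_t magnitude = 0 - static_cast<uint64_t>(gepOffset);
        if (magnitude > current)
            return std::nullopt; // would point before the object
        return current - magnitude;
    }
    uint64_t result = current + static_cast<uint64_t>(gepOffset);
    if (result < current || result > objectSize)
        return std::nullopt; // overflow or past the one-past-the-end position
    return result;
}
```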

If we slice away a BB, other BBs still have references to it

The references remain in post-dominators and post-dominance frontiers.

TODO

Just a meta-bug for taking notes

fix handling undefined functions #40
handle function pointers #19
- handle constant global function pointers
- handle intrinsic functions initializers
handle intrinsic functions
we handle them as undefined functions, which should be correct,
but not so precise (e.g. memcpy marks both pointers as modified,
but it actually modifies only the destination)
handle null pointers
handle constant expressions
full support for globals
interprocedural slicing

split def-use computation into a standalone analysis

It is
we can then just replace the reaching definitions analysis and keep the def-use analysis

points-to: undefined values are ignored by points-to

Loads of pointers that were not initialized do not have any points-to set.

def-use: use new functions

in addOutParamsEdges on line 295, use getObjectRange instead of manually iterating over the definitions

fix #include

Use <> and set correct include paths instead of "" with relative paths

implement test-suite

Any! Something simple is enough.

DG2Dot: it can break compilation

There are no guards (ENABLE_CFG) around the parts of the code that use the CFG.
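A minimal sketch of the missing guard, assuming the build defines ENABLE_CFG when CFG support is compiled in; the Block type and the dumpBlocks function are hypothetical placeholders for the CFG-dependent parts of DG2Dot.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

struct Block { int id; }; // hypothetical stand-in for a CFG basic block

// Emits one dot node per basic block. Everything that touches the CFG
// is compiled only when ENABLE_CFG is defined, so the printer still
// builds (and prints nothing for blocks) when CFG support is disabled.
void dumpBlocks(std::ostream &out, const std::vector<Block> &blocks) {
#ifdef ENABLE_CFG
    for (const Block &b : blocks)
        out << "  bb" << b.id << " [shape=box];\n";
#else
    (void) out;
    (void) blocks;
#endif
}
```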

slicer fails due to intrinsic functions

It fails on this program:

#include <assert.h>
#include <stddef.h>

char *
remove_newline(char *str)
{
        char *p = str;
        if (!p)
                return NULL;

        while (*p) {
                if (*p == '\n') {
                        *p = 0;
                        break;
                }

                ++p;
        }

        return str;
}

int main(void)
{
        char str[] = "This is a string\n";
        remove_newline(str);
        /* '\n' was at index sizeof str - 2; index sizeof str - 1 is the terminating NUL */
        assert(str[sizeof str - 2] == 0);
        return 0;
}

add tests

Tests for:

NodesWalk and BBlockWalk (DFS, BFS)
removing nodes and bblocks
- there are still bugs, add new tests

-- Future --

slicing
- tests for recursive programs

filter command: fix parsing of IDs

filter accepts as an ID anything that starts with a number (probably because of atoi?); use str_to_uint instead.

(wldbg) f d 1e
Didn't find filter with id '1'
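For reference, a strict parser rejects "1e" instead of reading its leading 1. The sketch below uses std::strtoul and checks that the whole string was consumed; its name and interface are hypothetical and are not those of the existing str_to_uint.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Parses `str` as a decimal unsigned int. Unlike atoi(), it fails on
// trailing garbage ("1e"), empty strings, signs, and out-of-range values.
// Returns true and stores the value in *out on success.
bool parseId(const char *str, unsigned *out) {
    // strtoul() skips leading whitespace and accepts a sign, so require
    // the string to start with a digit.
    if (!str || !std::isdigit(static_cast<unsigned char>(str[0])))
        return false;
    errno = 0;
    char *end = nullptr;
    unsigned long value = std::strtoul(str, &end, 10);
    if (errno == ERANGE || *end != '\0' || value > UINT_MAX)
        return false;
    *out = static_cast<unsigned>(value);
    return true;
}
```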

def-use: StoreInst has a bug

If there's a ConstantExpr in the pointer operand, then no use is recorded for the ConstantExpr. For example:

store i32 3, i32* getelementptr (i32* @g, i32 0)

This instruction uses @g, and that should be reflected by an edge from the last definition of @g.

points-to: initialization of global arrays is not implemented

When we have
const char *array[] = { "first", "second" };
then we should have this points-to info (with 8-byte pointers):
node pts-to array
mem[0] pts-to "first"
mem[8] pts-to "second"

but we have only node pts-to array.
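The expected offsets follow from the element size: element i of an array of pointers starts at byte i * sizeof(pointer), which is why the second string appears at mem[8] with 8-byte pointers. A minimal sketch that derives these entries, assuming the initializer is given as a list of target names (a hypothetical simplification of walking the initializer constant in LLVM):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Given the initializer of a global array of pointers, return the
// points-to entries of the array's memory: offset -> pointed-to object.
std::map<uint64_t, std::string>
arrayInitPointsTo(const std::vector<std::string> &elements,
                  uint64_t pointerSize) {
    std::map<uint64_t, std::string> pts;
    for (uint64_t i = 0; i < elements.size(); ++i)
        pts[i * pointerSize] = elements[i]; // element i starts at i * size
    return pts;
}
```

Taking the pointer size from the target's data layout rather than hard-coding 8 also keeps 32-bit bitcode correct.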

do not use the fuc*ing auto all the time and everywhere

If it is used everywhere, the code is not more readable...

undefined functions taking parameters are buggy

We must assume that they modify the memory; otherwise the analysis is unsound. Also, I'm not sure we are adding def-use edges properly, because the CallInst can be in a constant expression (bitcast), and in that case we won't add the def-use edges at all.

bug in adding parameter globals

We look only at stores and loads, but globals can also be used in a CallInst (even via a constant expression).

handle function pointers

critical

make return node a parameter of func

so that we can slice more precisely

llvm: rename LLVMDependenceGraph to llvmdg::DependenceGraph

Make use of namespaces when we have them! This is not C.

DGParams contains garbage (re-opened: add tests)

The BBs in DGParams have no key set, but we cannot set them to nullptr, because the key does not have to be a pointer.
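One possible fix, sketched with a hypothetical BBlockSketch template rather than dg's real classes: store the key as std::optional<KeyT>, so a block can be explicitly marked as having no key without requiring the key type to be a pointer.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical basic-block wrapper parametrized by the key type. The key
// may be any type (a pointer, an integer, a string...), so "no key" is
// represented by an empty optional instead of nullptr.
template <typename KeyT>
class BBlockSketch {
    std::optional<KeyT> key;

public:
    BBlockSketch() = default; // e.g. blocks created for parameters
    explicit BBlockSketch(KeyT k) : key(std::move(k)) {}

    bool hasKey() const { return key.has_value(); }
    const KeyT &getKey() const {
        assert(hasKey() && "block has no key");
        return *key;
    }
};
```

With this representation, a pointer key that happens to be nullptr is still distinguishable from a block that has no key at all.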

unify add/setParameters

For nodes it is addParameters and for graphs it is setParameters. I'm in favor of setParameters for both.

add DependenceGraph generic interface

Now we have only a generic template, and we specialize it, so we cannot use the full power of inheritance. Add a generic abstract interface, so that we can do something like:

DependenceGraph *dg = new llvmdg::DependenceGraph();
Node *node = new llvmdg::Node(value);

dg->addNode(node);
dg->slice(...);

NOTE: this depends on adding our edges iterator
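A minimal sketch of such an interface, with hypothetical class names; Value stands in for llvm::Value so that the example compiles without LLVM, and only node insertion and a node count are sketched, since the parameters of slice() are not specified above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Generic, LLVM-independent interfaces.
class Node {
public:
    virtual ~Node() = default;
};

class DependenceGraph {
public:
    virtual ~DependenceGraph() = default;
    virtual void addNode(Node *node) = 0;
    virtual std::size_t size() const = 0;
};

namespace llvmdg {
struct Value {}; // stand-in for llvm::Value

class Node : public ::Node {
    const Value *value;

public:
    explicit Node(const Value *v) : value(v) {}
    const Value *getValue() const { return value; }
};

class DependenceGraph : public ::DependenceGraph {
    std::map<const Value *, std::unique_ptr<Node>> nodes;

public:
    // Takes ownership of `node`, which must be an llvmdg::Node;
    // other nodes are rejected and stay owned by the caller.
    void addNode(::Node *node) override {
        auto *n = dynamic_cast<Node *>(node);
        assert(n && "expected an llvmdg::Node");
        if (!n)
            return;
        std::unique_ptr<Node> &slot = nodes[n->getValue()];
        if (slot.get() != n)
            slot.reset(n); // replaces (and frees) an older node for the value
    }
    std::size_t size() const override { return nodes.size(); }
};
} // namespace llvmdg
```

Code that needs only the generic operations can then work with a DependenceGraph pointer, while the LLVM-specific class keeps its own node mapping.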

#include <assert.h>

int a;
void foo(void)
{
        assert(a == 1); 
}

int main(void)
{
        a = 1;
        foo();
        return 0;
}
