Git Product home page Git Product logo

mchalupa / dg Goto Github PK

View Code? Open in Web Editor NEW
466.0 20.0 130.0 5.86 MB

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.

License: MIT License

CMake 1.48% Shell 0.07% C++ 95.09% C 2.16% Python 1.13% Dockerfile 0.06%
llvm-bitcode llvm-slicer dependence-graph static-analysis static-code-analysis reaching-definitions slicing dependency-graph program-analysis slice

dg's Introduction

DG

Linux CI macOS CI

DG is a library containing various bits for program analysis. However, the main motivation of this library is program slicing. The library contains implementation of a pointer analysis, data dependence analysis, control dependence analysis, and an analysis of relations between values in LLVM bitcode. All of the analyses target LLVM bitcode, but most of them are written in a generic way, so they are not dependent on LLVM in particular.

Further, DG contains an implementation of dependence graphs and a static program slicer for LLVM bitcode. Some documentation can be found in the doc/ directory.


You can find a high-level description of DG in DG: a program analysis library or DG: Analysis and slicing of LLVM bitcode papers. More detailed information about dg is in the doc/ folder or in my master thesis.

You can write e-mails with issues to [email protected] (or file issue in github).

dg's People

Contributors

davidhofman avatar dtzwill avatar gadget114514 avatar giraffereversed avatar guilhermeleobas avatar h1994st avatar hotpeperoncino avatar justme0 avatar lzaoral avatar mchalupa avatar moroxus avatar pablo-aledo avatar pzread avatar thierry-tct avatar tomsik68 avatar vmihalko avatar vwvw avatar xlauko avatar xvitovs1 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dg's Issues

fix offsets on 32-bit bitcode

Assertion `Offset.getBitWidth() == DL.getPointerSizeInBits(getPointerAddressSpace()) && "The offset must have exactly as many bits as our pointer."' failed.

Use offset ranges in reaching-defs

Instead of pointer-> nodes set
use records of the form
memory [from : to] -> nodes set
meaning that memory with offset from to to was defined on these nodes. It will simplify it a lot

bug in computing control dependences

There's bug in computing post-dominance frontiers. Don't know where - the code is from llvm (check if the output is the same as in llvm) and the slicing ignores the frontiers in some cases...

Cannot handle indefinite loops

For indefinite loops (that can be identified while compiling, e. g. while(1) {...} ) there's no post-dominator tree, since it has no end. Therefore there are no control dependencies and the slicer incorrectly removes the loop even when it should be in the slice.
Fix it (probably) by implementing algorithm from [1]

[1] Danicic Sebastian, Barraclough, R. W., et al. A unifying theory of control dependence and its application to arbitrary program structures

call via unknown pointer is unsound

When we call a functoin via unknown pointer, we slice it away. We must assume that any function that has the same prototype can be called

make slicing more precise

we need to keep it correct, but we can make it more precise. Take a look at test4, it is correct, but can be sliced much more

handle intrinsic functions

this is initialization of structure with function pointers

%struct.callbacks = type { i32 (i32)*, void (i32*)* }
@main.C = private unnamed_addr constant %struct.callbacks { i32 (i32)* @inc, void (i32*)* @zero }, align 8

  %C = alloca %struct.callbacks, align 8
  %0 = bitcast %struct.callbacks* %C to i8* 
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast (%struct.callbacks* @main.C to i8*), i64 16, i32 8, i1 false)

and there are also zeroing using these function that should be considered as a setting pointer to null

points-to: does not handle negative offsets

if GEP has constant negative offset (e. g. -8), it is converted to unsigned value and cropped, because the value is almost UINT64_MAX. We get UNKOWN_OFFSET, so it should be correct, but it is unprecise.

TODO

Just metabug for taking notes

  • add tests
    • for heap memory handling
    • for recursive programs
    • try e. g. list or similar dynamic ADT
    • floats? it should be working, but check
    • points-to, def-use and reaching defs analysis
    • post-dominance frontiers and control dependencies
  • implement better slicing
    • implement summary edges
    • implement NodesWalk via summary edges
    • slice first BBs away and then slice away instructions from the rest of BBs
      so that we'll keep the projection to llvm consistent
  • control dependencies
    • infinite loops are broken
    • implement better algorithm (that handles inifinite loops)
  • handle instructions
    • varargs
    • phi nodes
    • switch (not handled but looks working)
    • indirectbr
    • [x ] inlined assembler
      • either just give up with inlined assembler or go through it, gather the
        top-level variables that can be modified in it (if it is possible) and
        mark them as modified, so it'll be unprecise but sound
    • inttoptr, ptrtoint
      • workaround it by always setting to unknown location for now
      • make it working properly!
    • cmpxchg (modifies memory)
    • atomicrmw (modifies memory)
    • exctractvalue (aggregate type - like struct, array etc.)
      similar to GEP
    • (exctractelement) vectors
    • (invoke, resume, catchpad) WTH?
  • handle dynamic allocation (malloc, calloc, etc.)
    • has malloc without store a definition? It's like alloca, hadle we like alloca?
    • handle realloc
  • handle carefully unknown offsets in pointers #20 and reaching defs
  • refactor old ugly code

  • fix handling undefined functions #40
  • handle function pointers #19
    • handle constant global function pointers
    • handle intrinsic functions initializers
  • handle intrinsic functions
    we handle them as undefined functions, which should be correct,
    but not so precise (e. g. memcpy marks both pointers as modified,
    but actually it modifies only one)
  • handle null pointers
  • handle constant expressions
  • full support for globals
  • interprocedural slicing

def-use: use new functions

in addOutParamsEdges on line 295, use getObjectRange instead of manually iterating over the definitions

fix #include

Use <> and set correct include paths instead of "" with relative paths

slicer fails due to intrinsic functions

fails on this program

char *
remove_newline(char *str)
{
        char *p = str;
        if (!p)
                return NULL;

        while (*p) {
                if (*p == '\n') {
                        *p = 0;
                        break;
                }

                ++p;
        }

        return str;
}

int main(void)
{
        char str[] = "This is a string\n";
        remove_newline(str);
        assert(str[sizeof str - 1] == 0); 
        return 0;
}

add tests

Tests for:

  • NodesWalk and BBlockWalk (DFS, BFS)
  • removing nodes and bblocks
    • there are still bugs, add new tests

-- Future --

  • slicing
    • tests for recursive programs

filter command: fix parsing id's

filter takes as id anything that starts with a number (probably atoi?) use str_to_uint

(wldbg) f d 1e
Didn't find filter with id '1'

def-use: StoreInst has a bug

If there's ConstantExpr in pointer operand, then the ConstantExpr has no use

store i32 3, getelementptr i32 *@g, (i32 0)

this insturuction uses @g and that should be reflected by the edge from last definition of @g

undefined functions taking parameters are buggy

we must assume they have modified the memory, otherwise it is unsound. Also I'm not sure we are adding def-use edges properly, because the callinst can be in constant expr (bitcast) and in that case we won't add the def-use edges at all

unify add/setParameters

for node is is addParameters and for graph it is setParameters. I'm pro setParameters for both

add DependenceGraph generic inteface

Now we have only generic template and we specialize it. So we cannot use the full power of inheritace. Add generic abstract interface, so that we can do something like:

DependenceGraph *dg = new llvmdg::DependenceGraph();
Node *node = new llvmdg::Node(value);

dg->addNode(value);
dg->slice(...)

NOTE: this depends on adding our edges iterator

handle malloc carefully

most of malloc's pointers have cropped offset, because the type of returned memory is *i8, which is of size 1, and handleGep then cuts off the offsets

DFS does not go over CFG edges

BBlockDFS is OK, but DFS inherits from NodesWalk and NodesWalk uses CD and DD for walking graph, so DFS is not DFS...

add DFSRunner

A class that will take DG and runs DFS on it -- instead of hardcoding the DFS right into the code. Create some Generic class and then derive DFS and BFS from it

cannot run NodesWalk nested

If we run DataFlowAnalysis and in a runOnBlock/Node for some reason e. g. a DFS, then it does not work as expected - it is because both classes are derived from NodesWalk and nodes walk has only one global walk_run_counter.

We could fix it by replacing walk_run_counter by the address of the analysis (it is a number too), but two analyses can have the same address (new -> delete -> new)...

DFS looks like BFS

According to dfsorder numbers printed by DG2Dot it looks like BFS or some random order, not DFS

Make points-to analysis back-end independent

Let analysis use pointer state subgraph that will be build by backend but in some independent, generic way, so that points-to analysis won't need to know the implementation of nodes

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.