trailofbits / vast

VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or further program abstraction.

Home Page: https://trailofbits.github.io/vast/

License: Apache License 2.0

Languages: CMake 5.78%, C++ 73.96%, Python 1.30%, C 18.18%, Dockerfile 0.10%, Shell 0.67%
Topics: clang, mlir, program-analysis, c, compiler-frontend, compilers, cpp, intermediate-representation

vast's Introduction

VAST: MLIR for Program Analysis

VAST is a library for program analysis and instrumentation of C/C++ and related languages. VAST provides a foundation for customizable program representation for a broad spectrum of analyses. Using the MLIR infrastructure, VAST provides a toolset to represent C/C++ programs at various stages of compilation and to transform the representation into the best-fit program abstraction.

For further information, see trailofbits.github.io/vast/.

Try VAST

You can experiment with VAST on Compiler Explorer and use it to produce MLIR in its various dialects. To select the desired MLIR output, use the -vast-emit-mlir=<dialect> option. Currently, the supported options are:

  • -vast-emit-mlir=hl to generate high-level dialect.
  • -vast-emit-mlir=llvm to generate LLVM MLIR dialect.

Refer to the vast-front documentation for additional details.

License

VAST is licensed under the Apache 2.0 license. VAST links against and uses the Clang and LLVM APIs. Clang is also licensed under Apache 2.0, with LLVM exceptions.

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Distribution Statement A – Approved for Public Release, Distribution Unlimited

vast's People

Contributors

artemdinaburg, frabert, jezurko, kumarak, lkorenc, pappasbrent, pgoodman, sinotca529, sschriner, surovic, xlauko

vast's Issues

Create a low-level dialect

We eventually want to lower the high-level dialect to the LLVM dialect. A low-level dialect should sit in between, closer to the LLVM dialect. It differs from the LLVM dialect in that the VAST low-level dialect will keep additional VAST information, e.g., provenance. The low-level dialect might reuse parts of the LLVM dialect.

  • create dialect definition
  • create operation/types definitions
  • register low-level dialect to the vast tool

Implement high-level basic C types

Add definitions of builtin types to the high-level dialect, and implement the transformation from the Clang AST:

  • integer types
  • floating point types
  • pointer types
  • type qualifiers

SymbolTables & Scopes

Redesign scoping to reflect the actual C/C++ scope rules for symbols,
i.e., the following operations should become a SymbolTable:

  • TranslationUnit (Module)
  • Functions/Methods
  • Every block
  • Capture
  • Record
  • Enums
  • Namespace

Create a command-line tool to list symbol uses, in order to test the symbol API.
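A minimal sketch of the lookup discipline this redesign targets: every scope-introducing construct owns its own table, and resolving a use walks from the innermost table outward, so inner declarations shadow outer ones. `ScopedSymbolTable` is a hypothetical stand-alone class, not `mlir::SymbolTable`, and it simplifies C by rejecting every redeclaration within the same scope (C actually permits some, such as compatible function redeclarations).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: one map per open scope (translation unit, function,
// block, record, ...). Lookup searches from the innermost scope outward.
class ScopedSymbolTable {
public:
    // Enter a new scope, e.g. when visiting a compound statement.
    void push_scope() { scopes_.emplace_back(); }

    // Leave the innermost scope; its symbols become invisible.
    void pop_scope() {
        assert(!scopes_.empty() && "no scope to pop");
        scopes_.pop_back();
    }

    // Declare a symbol in the innermost scope. Returns false if the name is
    // already declared in that same scope (simplified redeclaration rule).
    bool declare(const std::string &name, int decl_id) {
        assert(!scopes_.empty() && "declare outside of any scope");
        return scopes_.back().emplace(name, decl_id).second;
    }

    // Resolve a use: the innermost declaration wins, giving C-style shadowing.
    std::optional<int> lookup(const std::string &name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            if (auto found = it->find(name); found != it->end())
                return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, int>> scopes_;
};
```

A command-line tool that lists symbol uses could drive such a structure while walking the IR, printing the declaration each use resolves to.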

ABI compatible function prototype lowering

We want to emit an ABI-compatible low-level dialect. In Clang, this is done before lowering to LLVM. We want to produce the same result when lowering from the high-level dialect to the low-level dialect. Lowering requires data layout modeling first.

Problems to solve:

  • implement an MLIR type transformation pass that splits types, merges types, or promotes them to allocas according to ABI rules
  • keep provenance to the original type

Add missing Symbol interfaces.

Each operation that defines a new symbol should implement this interface; otherwise, mlir::SymbolTable won't be usable.

hl: Add support for literals.

Implement lowering of:

  • clang::CharacterLiteral
  • clang::CompoundLiteralExpr
  • clang::FloatingLiteral
  • clang::IntegerLiteral
  • clang::StringLiteral

Create provenance metadata API

The task is to provide an API to attach metadata to MLIR primitives and to obtain it from them.
Provenance is tracked as a unique 64-bit ID that relates a source location (a Clang AST node) and an MLIR primitive:

using primitive_t = DeclOrStmtOrType;
using meta_t = uint64_t;

void add_meta(primitive_t prim, meta_t meta);

primitive_t get_primitive(meta_t meta);

meta_t get_meta(primitive_t prim);
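The three functions above can be backed by a pair of hash maps, one per direction. This is a sketch under stated assumptions: `DeclOrStmtOrType` is not defined in the issue, so a plain `const void *` (the node's address) stands in for it; `ProvenanceTable` is a hypothetical class name; and ID 0 is assumed to be reserved as "no provenance".

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumption: an AST node (Decl, Stmt or Type) is identified by its address.
using primitive_t = const void *;
using meta_t = std::uint64_t;

// Hypothetical bidirectional table relating AST nodes and provenance IDs.
class ProvenanceTable {
public:
    // Record that `prim` carries provenance ID `meta`, in both directions.
    void add_meta(primitive_t prim, meta_t meta) {
        // Drop stale mappings so the two directions stay consistent.
        if (auto old = meta_of_.find(prim); old != meta_of_.end())
            prim_of_.erase(old->second);
        if (auto old = prim_of_.find(meta); old != prim_of_.end())
            meta_of_.erase(old->second);
        meta_of_[prim] = meta;
        prim_of_[meta] = prim;
    }

    // Returns nullptr if no primitive was registered under `meta`.
    primitive_t get_primitive(meta_t meta) const {
        auto it = prim_of_.find(meta);
        return it == prim_of_.end() ? nullptr : it->second;
    }

    // Returns 0 (assumed reserved "no provenance" ID) for unknown primitives.
    meta_t get_meta(primitive_t prim) const {
        auto it = meta_of_.find(prim);
        return it == meta_of_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<primitive_t, meta_t> meta_of_;
    std::unordered_map<meta_t, primitive_t> prim_of_;
};
```

Keeping both maps makes each query a single hash lookup, at the cost of updating two maps on every insertion.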

hl: Add trait to allow multiple bool-like types

Currently, the compare operation allows only hl.bool as its result type; however, once we lower types, we want it to return i1. This is not possible today because the operation hard-codes the requirement of hl.bool.

Proposed solution: Add trait that accepts both i1 and hl.bool.
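The proposed trait boils down to a predicate that accepts either result type. The sketch below expresses it over hypothetical stand-in types (`HLBoolType`, `IntegerType`), not the real MLIR or VAST classes; in the actual dialect this would presumably be written as a type constraint in the operation definitions.

```cpp
#include <cassert>
#include <variant>

// Hypothetical stand-ins for hl.bool and a builtin integer type.
struct HLBoolType {};
struct IntegerType { unsigned width; };
using Type = std::variant<HLBoolType, IntegerType>;

// Proposed constraint: a compare result may be hl.bool or i1.
bool is_bool_like(const Type &ty) {
    if (std::holds_alternative<HLBoolType>(ty))
        return true;
    // Only two alternatives exist, so this must be an integer type.
    return std::get<IntegerType>(ty).width == 1;
}
```

A verifier for the compare operation would then check `is_bool_like(result_type)` instead of testing for hl.bool alone.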

Implement function lowering from clang ast

Implement a transformation that generates the high-level function representation from the Clang AST.

  • function prototype lowering

For now we plan to use mlir::FunctionType, until we reach its limits.

cf: Add support for control flow lowering.

Implement lowering to the standard dialect of the high-level control-flow structures defined in:

https://github.com/trailofbits/vast/blob/master/include/vast/Dialect/HighLevel/HighLevelCF.td

  • hl::IfOp
  • hl::WhileOp
  • hl::ForOp
  • hl::DoOp
  • hl::BreakOp
  • hl::ContinueOp
  • hl::SwitchOp, hl::CaseOp, hl::DefaultOp

A stub of LowerControlFlowPass is already prepared here:

https://github.com/trailofbits/vast/blob/master/lib/vast/Dialect/HighLevel/Transforms/LowerControlFlow.cpp

To run the control-flow lowering, use the vast-opt tool on high-level MLIR:

vast-opt -vast-hl-lower-cf <file.mlir>

To transform a source file directly, pipe the output of the vast compiler into vast-opt:

vast-cc --from-source <source.c> | vast-opt -vast-hl-lower-cf
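To illustrate the shape of the output such a lowering has to produce, the sketch below turns a structured `while (cond) { body }` into three basic blocks on a toy, text-based CFG. `Block` and `lower_while` are hypothetical and not the MLIR API; in a real pass, the predecessor block would end with `br ^header`, and hl::BreakOp / hl::ContinueOp inside the body would become branches to the exit and header blocks respectively.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy basic block: textual ops, the last one being the terminator.
struct Block {
    std::string name;
    std::vector<std::string> ops;
};

// Lower `while (cond) { body }` into header/body/exit blocks:
//   header: evaluate cond, conditionally branch to body or exit
//   body:   run the loop body, then branch back to header
//   exit:   continuation after the loop (filled by later code)
std::vector<Block> lower_while(const std::string &cond,
                               const std::vector<std::string> &body) {
    Block header{"header", {"%c = " + cond, "cond_br %c, ^body, ^exit"}};
    Block loop{"body", body};
    loop.ops.push_back("br ^header");  // back edge
    Block after{"exit", {}};
    return {header, loop, after};
}
```

The other loop forms follow the same pattern: hl::DoOp places the body before the condition check, and hl::ForOp adds an increment block on the back edge.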

Assert failure with lowering BinaryOperator

I see an assertion failure while lowering the BinaryOperator statement in make_value_builder: numResults of the mlir::Operation is zero.

BinaryOperator 0x11906e460 'unsigned char' '='
|-UnaryOperator 0x11906e3f8 'unsigned char' lvalue prefix '*' cannot overflow
| `-UnaryOperator 0x11906e3e0 'unsigned char *' postfix '++'
|   `-MemberExpr 0x11906e3b0 'unsigned char *' lvalue ->_p 0x11a0dd218
|     `-ImplicitCastExpr 0x11906e398 'FILE *' <LValueToRValue>
|       `-DeclRefExpr 0x11906e378 'FILE *' lvalue ParmVar 0x11906ded0 '_p' 'FILE *'
`-ImplicitCastExpr 0x11906e448 'unsigned char' <IntegralCast>
  `-ImplicitCastExpr 0x11906e430 'int' <LValueToRValue>
    `-DeclRefExpr 0x11906e410 'int' lvalue ParmVar 0x11906de58 '_c' 'int'
    frame #3: 0x00007ff8095ac0be libsystem_c.dylib`__assert_rtn + 314
    frame #4: 0x00000001015d2d0d mx-index`auto vast::hl::CodeGenVisitor::make_value_builder(this=0x0000700003ebc190, bld=0x0000700003ebd3d8, loc=Location @ 0x0000700003ebbd78)::'lambda'(auto&, auto)::operator()<mlir::OpBuilder, mlir::Location>(auto&, auto) const at HighLevelVisitor.hpp:494:17
    frame #5: 0x00000001015d2c8a mx-index`void llvm::function_ref<void (mlir::OpBuilder&, mlir::Location)>::callback_fn<vast::hl::CodeGenVisitor::make_value_builder(callable=123145368093072, params=0x0000700003ebd3d8, params=Location @ 0x0000700003ebbdb8)::'lambda'(auto&, auto)>(long, mlir::OpBuilder&, mlir::Location) at STLExtras.h:177:12
    frame #6: 0x0000000106c7f5c9 mx-index`llvm::function_ref<void (mlir::OpBuilder&, mlir::Location)>::operator(this=0x0000700003ebbe90, params=0x0000700003ebd3d8, params=Location @ 0x0000700003ebbe08)(mlir::OpBuilder&, mlir::Location) const at STLExtras.h:200:12
    frame #7: 0x0000000106c7f539 mx-index`vast::hl::detail::build_region(bld=0x0000700003ebd3d8, st=0x0000700003ebc060, callback=vast::hl::BuilderCallback @ 0x0000700003ebbe90)>) at HighLevelOps.cpp:29:17
    frame #8: 0x0000000106c823e7 mx-index`vast::hl::ExprOp::build(bld=0x0000700003ebd3d8, st=0x0000700003ebc060, rty=vast::hl::Type @ 0x0000700003ebbf28, expr=vast::hl::BuilderCallback @ 0x0000700003ebbf18)>) at HighLevelOps.cpp:325:9
    frame #9: 0x00000001015fa501 mx-index`auto mlir::OpBuilder::create<vast::hl::ExprOp, mlir::Type, vast::hl::CodeGenVisitor::make_value_builder(clang::Stmt*)::'lambda'(auto&, auto)>(this=0x0000700003ebd3d8, location=Location @ 0x0000700003ebc050, args=0x0000700003ebc1a0, args=0x0000700003ebc190)::'lambda'(auto&, auto)&&) at Builders.h:402:5
    frame #10: 0x00000001015fa269 mx-index`auto vast::hl::HighLevelBuilder::make<vast::hl::ExprOp, mlir::Location&, mlir::Type&, vast::hl::CodeGenVisitor::make_value_builder(clang::Stmt*)::'lambda'(auto&, auto)&>(this=0x0000700003ebd3d0, args=0x0000700003ebc298, args=0x0000700003ebc288, args=0x0000700003ebc270)::'lambda'(auto&, auto)&) at HighLevelBuilder.hpp:35:28

Setup CI

Set up GitHub Actions to build the project and run the lit tests.

Prepare for LWIP challenge

Challenge info:

Isolate Domain objects and operations

Isolate the IPv4 domain: discover all structures and functions that use the IPv4 structure as arguments,
local variables, or fields.
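The query above can be sketched over a simplified, already-extracted view of the program's declarations. Everything here is hypothetical: the `FunctionInfo` / `RecordInfo` records, the `isolate_domain` helper, and the assumption that the IPv4 header type is spelled `struct ip_hdr`. Types are compared by spelling after stripping a leading `const` and trailing pointer declarators; a real implementation would compare canonical types instead.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified declarations extracted from the program.
struct FunctionInfo {
    std::string name;
    std::vector<std::string> param_types;
    std::vector<std::string> local_types;
};
struct RecordInfo {
    std::string name;
    std::vector<std::string> field_types;
};

// "const struct ip_hdr *" -> "struct ip_hdr" (spelling-based approximation).
std::string base_type(std::string ty) {
    if (ty.rfind("const ", 0) == 0)
        ty.erase(0, 6);
    while (!ty.empty() && (ty.back() == '*' || ty.back() == ' '))
        ty.pop_back();
    return ty;
}

bool uses_domain(const std::vector<std::string> &types, const std::string &domain) {
    return std::any_of(types.begin(), types.end(),
                       [&](const std::string &t) { return base_type(t) == domain; });
}

// Collect functions (via parameters or locals) and records (via fields)
// that directly use the domain type.
std::vector<std::string> isolate_domain(const std::vector<FunctionInfo> &fns,
                                        const std::vector<RecordInfo> &records,
                                        const std::string &domain) {
    std::vector<std::string> hits;
    for (const auto &f : fns)
        if (uses_domain(f.param_types, domain) || uses_domain(f.local_types, domain))
            hits.push_back(f.name);
    for (const auto &r : records)
        if (uses_domain(r.field_types, domain))
            hits.push_back(r.name);
    return hits;
}
```

This only finds direct uses; records that embed an IPv4-using record would need a transitive pass.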

Refine DSL

  1. Define DSL for parsing received IPv4 packets.

  2. Change the IPv4 checksum to use CRC-16

  3. Change the IPv4 checksum to use CRC-32

  4. Add a means of tunneling by encapsulating IP over IP
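The standard IPv4 header checksum is a 16-bit one's-complement sum; task 3 replaces it with a CRC-32. For reference, a bitwise CRC-32 (IEEE 802.3 polynomial in reflected form 0xEDB88320, initial value and final XOR 0xFFFFFFFF) is sketched below. The function name `crc32` is hypothetical, and how the refined DSL would splice it into packet parsing is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected, poly 0xEDB88320, init/xorout 0xFFFFFFFF).
// A table-driven version would be faster; this keeps the sketch short.
std::uint32_t crc32(const unsigned char *data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            // If the low bit is set, shift and XOR in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The usual check value for this variant is 0xCBF43926 over the ASCII string "123456789". A CRC-16 variant for task 2 has the same structure but a 16-bit register and a different polynomial, which the task would still need to choose.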

Address space when lowering pointers

Currently, hl::PointerType is converted to an unranked mlir::MemRef. That conversion requires address-space information; address space 0 is used by default, which may require attention down the road.

Verify value categories of cast operations.

Verify that cast operations produce the correct value categories. Add constraints (type traits) on the conversion of lvalues and non-lvalues.

This will require splitting the cast kinds into multiple operations.
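One way to picture the constraint: give each cast kind a fixed signature (required operand category, produced category) and reject casts whose operand does not match. The three kinds below mirror names from `clang::CastKind` (without the `CK_` prefix); the `CastSignature` table and `verify_cast` function are hypothetical, and splitting casts into separate operations would let each operation carry exactly one such signature.

```cpp
#include <cassert>
#include <optional>

enum class ValueCategory { LValue, RValue };

// A few Clang cast kinds, named after clang::CastKind without the CK_ prefix.
enum class CastKind { LValueToRValue, ArrayToPointerDecay, IntegralCast };

// Hypothetical per-kind signature: required operand category and result category.
struct CastSignature { ValueCategory operand; ValueCategory result; };

CastSignature signature_of(CastKind kind) {
    switch (kind) {
    case CastKind::LValueToRValue:      return {ValueCategory::LValue, ValueCategory::RValue};
    case CastKind::ArrayToPointerDecay: return {ValueCategory::LValue, ValueCategory::RValue};
    case CastKind::IntegralCast:        return {ValueCategory::RValue, ValueCategory::RValue};
    }
    assert(false && "unknown cast kind");
    return {ValueCategory::RValue, ValueCategory::RValue};
}

// Returns the result category, or nullopt if the operand category is invalid.
std::optional<ValueCategory> verify_cast(CastKind kind, ValueCategory operand) {
    const CastSignature sig = signature_of(kind);
    if (sig.operand != operand)
        return std::nullopt;
    return sig.result;
}
```

For example, an IntegralCast applied directly to an lvalue (without the intervening LValueToRValue cast that Clang emits) would be rejected.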

Forward Typedefs

Allow a typedef to be used in an MLIR document without defining it in that document.

  • However, this means having some kind of underlying type mapping (sizing, or something similar) for it.
    Maybe type aliases are actually special, though, and I can relax the requirement that they need definitions,
    but I'd like to be able to use them as if they were defined. Does a "hinted sizeof" and "hinted alignof" make sense?
  • It is a symbolic expression with a hinted concrete value.
  • From an API standpoint, any time a (re)definition type is synthesized, I'd want to know.
  • An alternative, wholly crazy notion: we could abstract structure field access entirely behind function calls. I can see how that would also be a big ask, and not necessarily any better,
  • though field access by function call would have some desirable lowering and provenance-maintenance properties.
    Specifically, field access would operate on an opaque pointer, so we could lower down to LLVM-like levels, where we lose the true "structure" of a structure but more accurately maintain the access patterns.
  • It would also be interesting from a def/use perspective of "finding everywhere a field is accessed",
    unless there is a way of using some kind of token value in place of a GEP index, like LLVM does?

Figure out re-usability of existing transformation

When we transform or translate IR in VAST, there is metadata we want to keep attached to operations to help us track provenance. In our own passes this is fine, as we can ensure the metadata is emitted properly. However, there are many transformations that we did not write and yet would like to use (for example, std -> llvm).

If the external transformation works only in a 1:N fashion, we should be able to reuse it by utilising the Listener class, which can be registered with the Builder (from which the rewriter should inherit).

  • I am not sure Listener is 100% reliable (it does not seem to be used much in the codebase).
  • We must be certain that the external transformation always creates and deletes nodes through the API it is supposed to use.

The core idea is that the Listener class gives each newly created node the attribute of its origin. This will require a possibly verbose wrapper around the external transformation, but that is still better than rewriting it from scratch.

If the transformation is N:M (a group of instructions is replaced by another group), I am not sure what to do.
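For the 1:N case, the idea can be sketched as an observer that stamps every newly created operation with the provenance of the operation currently being rewritten. This toy mirrors the role of MLIR's OpBuilder::Listener but uses self-contained stand-in classes (`Op`, `Builder`, `ProvenanceListener`), not the MLIR API; the attribute name `vast.provenance` is also hypothetical. It relies on the assumption stated above: the external transformation creates operations only through the builder.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy operation with a string attribute dictionary.
struct Op {
    std::string name;
    std::map<std::string, std::string> attrs;
};

// Observer notified of every operation the builder creates.
struct Listener {
    virtual ~Listener() = default;
    virtual void on_created(Op &op) = 0;
};

// Toy builder; ops are heap-allocated so returned references stay valid.
class Builder {
public:
    explicit Builder(Listener *listener) : listener_(listener) {}
    Op &create(const std::string &name) {
        ops_.push_back(std::make_unique<Op>(Op{name, {}}));
        if (listener_)
            listener_->on_created(*ops_.back());
        return *ops_.back();
    }
private:
    Listener *listener_;
    std::vector<std::unique_ptr<Op>> ops_;
};

// Copies the origin's provenance onto every op created while it is set (1:N).
class ProvenanceListener : public Listener {
public:
    void set_origin(const Op &origin) {
        auto it = origin.attrs.find("vast.provenance");
        origin_ = it == origin.attrs.end() ? "" : it->second;
    }
    void on_created(Op &op) override {
        if (!origin_.empty())
            op.attrs["vast.provenance"] = origin_;
    }
private:
    std::string origin_;
};
```

The wrapper around an external pattern would call `set_origin` before the pattern rewrites an operation. The N:M case remains open, since a single "current origin" no longer identifies where each new operation came from.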

Implement data layout modeling

In the lowering procedure from high-level dialect to low-level dialect, we need to know the source data layout to model ABI compatible code. We will leverage MLIR's data layout utilities:
https://mlir.llvm.org/docs/DataLayout/

Problems to solve:

  • extraction of data layout from pasta source
  • representation of data layout in vast dialects
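As an example of what the modeled layout must reproduce, here is the classic C struct layout rule: each field is placed at the next offset aligned to its own alignment, the struct takes the strictest field alignment, and the total size is padded to a multiple of it. This is a simplified sketch that ignores bit-fields, packed/aligned attributes and other target-specific rules; `FieldLayout`, `RecordLayout` and `layout_record` are hypothetical names, and the per-field sizes and alignments are assumed to come from the target's data layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FieldLayout { std::size_t size; std::size_t align; };
struct RecordLayout {
    std::vector<std::size_t> offsets;
    std::size_t size;
    std::size_t align;
};

// Round `value` up to the next multiple of `align` (align must be non-zero).
std::size_t align_to(std::size_t value, std::size_t align) {
    return (value + align - 1) / align * align;
}

// Simplified C struct layout (no bit-fields, no packing attributes).
RecordLayout layout_record(const std::vector<FieldLayout> &fields) {
    RecordLayout rl{{}, 0, 1};
    for (const auto &f : fields) {
        rl.size = align_to(rl.size, f.align);  // padding before the field
        rl.offsets.push_back(rl.size);
        rl.size += f.size;
        if (f.align > rl.align)
            rl.align = f.align;
    }
    rl.size = align_to(rl.size, rl.align);     // tail padding
    return rl;
}
```

Assuming a target where char, int and short have size and alignment 1, 4 and 2, `struct { char c; int i; short s; }` gets offsets 0, 4 and 8, size 12 and alignment 4.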

Lower enum types

Lowering will require more extensive modification of the IR, since enums introduce named constant values.
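Lowering enums to plain integers means every enumerator must be materialized as a constant. A minimal sketch of computing those values using the C rule: an enumerator without an initializer is 0 if it is the first, otherwise the previous value plus one. The `Enumerator` alias and `enumerator_values` helper are hypothetical, and overflow and the enum's underlying type are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// An enumerator as written in source: name plus optional explicit initializer.
using Enumerator = std::pair<std::string, std::optional<std::int64_t>>;

// C rule: first implicit value is 0; each later implicit value is previous + 1.
std::map<std::string, std::int64_t> enumerator_values(const std::vector<Enumerator> &decls) {
    std::map<std::string, std::int64_t> values;
    std::int64_t next = 0;
    for (const auto &[name, init] : decls) {
        const std::int64_t value = init ? *init : next;
        values[name] = value;
        next = value + 1;
    }
    return values;
}
```

For `enum { A, B = 5, C };` this yields A = 0, B = 5 and C = 6; each use of an enumerator in the lowered IR would then be replaced by the corresponding integer constant.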

Exception on getting the symbol attributes for direct call expr

When lowering the function declaration, an exception is thrown if it contains an expression with a direct call to a function: the symbol reference attribute cannot be found.

  * frame #0: 0x0000000106eed0ac mx-index`mlir::detail::StorageUserBase<mlir::DictionaryAttr, mlir::Attribute, mlir::detail::DictionaryAttrStorage, mlir::detail::AttributeUniquer, mlir::SubElementAttrInterface::Trait>::getImpl(this=0x0000000000000038) const at StorageUniquerSupport.h:157:68
    frame #1: 0x0000000106eed085 mx-index`mlir::DictionaryAttr::getValue(this=0x0000000000000038) const at BuiltinAttributes.cpp.inc:147:76
    frame #2: 0x0000000106ef0f3c mx-index`mlir::DictionaryAttr::getNamed(this=0x0000000000000038, name=(Data = "sym_name", Length = 8)) const at BuiltinAttributes.cpp:187:37
    frame #3: 0x0000000106ef0e75 mx-index`mlir::DictionaryAttr::get(this=0x0000000000000038, name=(Data = "sym_name", Length = 8)) const at BuiltinAttributes.cpp:177:35
    frame #4: 0x0000000106dc9a49 mx-index`mlir::Operation::getAttr(this=0x0000000000000000, name=(Data = "sym_name", Length = 8)) at Operation.h:332:52
    frame #5: 0x00000001070072e5 mx-index`mlir::StringAttr mlir::Operation::getAttrOfType<mlir::StringAttr>(this=0x0000000000000000, name=(Data = "sym_name", Length = 8)) at Operation.h:338:12
    frame #6: 0x0000000106ee3bce mx-index`mlir::Builder::getSymbolRefAttr(this=0x0000700009414338, value=0x0000000000000000) at Builders.cpp:213:14
    frame #7: 0x0000000106ca4b1b mx-index`vast::hl::CallOp::build(odsBuilder=0x0000700009414338, odsState=0x0000700009412e00, callee=FuncOp @ 0x0000700009412cc8, operands=ValueRange @ 0x0000700009412cb8) at HighLevel.cpp.inc:4926:50
    frame #8: 0x00000001057b9fdc mx-index`vast::hl::CallOp mlir::OpBuilder::create<vast::hl::CallOp, mlir::FuncOp, llvm::SmallVector<mlir::Value, 2u> >(this=0x0000700009414338, location=Location @ 0x0000700009412df0, args=0x0000700009412f48, args=0x0000700009412f88) at Builders.h:402:5
    frame #9: 0x00000001057b9ccb mx-index`auto vast::hl::HighLevelBuilder::make<vast::hl::CallOp, mlir::Location&, mlir::FuncOp&, llvm::SmallVector<mlir::Value, 2u>&>(this=0x0000700009414330, args=0x0000700009413080, args=0x0000700009413070, args=0x00007000094130a8) at HighLevelBuilder.hpp:35:28
    frame #10: 0x0000000105782fd7 mx-index`mlir::Value vast::hl::HighLevelBuilder::make_value<vast::hl::CallOp, mlir::Location&, mlir::FuncOp&, llvm::SmallVector<mlir::Value, 2u>&>(this=0x0000700009414330, args=0x0000700009413080, args=0x0000700009413070, args=0x00007000094130a8) at HighLevelBuilder.hpp:40:20
    frame #11: 0x0000000105782eea mx-index`vast::hl::CodeGenVisitor::VisitDirectCall(this=0x0000700009414328, expr=0x000000011b03e060) at HighLevelVisitor.cpp:458:24
    frame #12: 0x00000001057831f8 mx-index`vast::hl::CodeGenVisitor::VisitCallExpr(this=0x0000700009414328, expr=0x000000011b03e060) at HighLevelVisitor.cpp:470:36

I see the following call expr:

CallExpr 0x11a0f2ed0 'int'
|-ImplicitCastExpr 0x11a0f2eb8 'int (*)(int, int)' <FunctionToPointerDecay>
| `-DeclRefExpr 0x11a0f2e30 'int (int, int)' Function 0x11a0f28b0 'internal' 'int (int, int)'
|-ImplicitCastExpr 0x11a0f2f00 'int' <LValueToRValue>
| `-DeclRefExpr 0x11a0f2e50 'int' lvalue Var 0x11a0f2c58 'one' 'int'
`-ImplicitCastExpr 0x11a0f2f18 'int' <LValueToRValue>
  `-DeclRefExpr 0x11a0f2e70 'int' lvalue Var 0x11a0f2d10 'two' 'int'

`time_t` tracking

Rethink time_t tracking, and just how "deep" down the stack we can reasonably keep it around or ensure that we know a given i32 is actually a time_t.

Unnamed function arguments

Currently, an error is emitted (and the module is therefore possibly ill-formed).
Input:
void foo(unsigned) {}
