fraunhofer-aisec / cpg

A library to extract Code Property Graphs from C/C++, Java, Go, Python, Ruby and every other language through LLVM-IR.

Home Page: https://fraunhofer-aisec.github.io/cpg/

License: Apache License 2.0

Kotlin 91.70% Java 0.90% C++ 1.32% C 0.90% Go 0.32% Shell 0.08% Python 0.12% TypeScript 0.13% JavaScript 0.02% LLVM 4.51% Ruby 0.01%

analysis c code code-property-graph cpg cpp golang graph java llvm-ir python ruby

cpg's Issues

EOG of IfStatement is incorrect

Consider the following simple snippet:

if (inlen <= 0)
{
   EVP_EncryptUpdate(myctx, outbuf, &outlen, inbuf, inlen);
}
EVP_EncryptFinal_ex(ctx, outbuf, &outlen);

It results in the following EOG:

There is a single EOG edge from the IfStatement to ctx (the argument of the function call after the then block), but there are no EOG edges leading into the then block. That is, code within then and else blocks is never reachable via EOG edges.

This is incorrect and should be fixed as follows:

In parallel to CONDITION, there should be an EOG edge from the IfStatement to the condition (the opposite EOG edge from the condition to the IfStatement is correct and can remain).
From the IfStatement there should be EOG edges
- to the first expression in the then block (this one is missing)
- to the first expression in the else block or, if no explicit else exists, to the first expression after the IfStatement.

It should be guaranteed that an IfStatement has at least two outgoing EOG edges, plus the one to the condition, and that all nodes are reachable via EOG edges.
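
The expected edges can be written down as a small check. This is only a sketch under stated assumptions: the graph is a plain adjacency map, and the node names (IfStatement, condition, thenFirst, afterIf) are placeholders for the snippet above without an explicit else, not the library's node classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IfEogSketch {
    // Hypothetical adjacency map (node name -> EOG successors) for an if without else.
    static Map<String, List<String>> expectedEog() {
        Map<String, List<String>> eog = new LinkedHashMap<>();
        // Edge to the condition, to the then block, and to the code after the if.
        eog.put("IfStatement", List.of("condition", "thenFirst", "afterIf"));
        eog.put("condition", List.of("IfStatement"));
        eog.put("thenFirst", List.of("afterIf"));
        eog.put("afterIf", List.of());
        return eog;
    }

    // Collects every node reachable from start by following EOG edges.
    static Set<String> reachableFrom(Map<String, List<String>> eog, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (seen.add(node)) {
                for (String next : eog.getOrDefault(node, List.of())) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> eog = expectedEog();
        System.out.println("outgoing edges of IfStatement: " + eog.get("IfStatement").size());
        System.out.println("all nodes reachable: "
            + reachableFrom(eog, "IfStatement").equals(eog.keySet()));
    }
}
```

A test along these lines could assert both properties (edge count and reachability) for every IfStatement in a parsed file.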

Add support to configure symbols

Mostly relevant for C++. CDT already seems to support this, so we just need to make it configurable in a TranslationConfiguration.

Refers-to broken

E.g. for testLocalVarsCpp in branch julian-16-wrong-fields, there are FieldDeclarations for x, foo, and a.

The problem might be in VariableUsageResolver.resolveLocalVarUsage

FQN of types and method names

Seems to be very inconsistent. Some examples (can expand on this later):

C++ namespaces, such as std::string, are converted to Java-style std.string. Is this intentional? It looks very weird.
FQNs of methods in Java are indistinguishable from FQNs of types, i.e. the FQN for println is java.io.PrintStream.println. We could stick to the Javadoc style of java.io.PrintStream#println. On the other hand, C++ also does not really distinguish between types and method references.

Auto-Release Strategy

Once (in the future) we are stable enough so that we do not need to fiddle around on master, @titze and I discussed the following strategy:

All development work should be done on branches and merged into master using pull requests
The master branch will automatically release to maven central and increase its version number, i.e. 1.2.X or 1.X. If more functionality was added, the developer is in charge of manually bumping the version number to a major version or similar
Optionally, also a GitHub release could be created, although it will not add too much benefit

Any thoughts on this? I am not 100 % sure yet how we can achieve the automatic versioning, but I think it should be doable.

The expression -2147483648 cannot be parsed

int narf = -2147483648;

cannot be parsed, as Javaparser thinks the right side is a UnaryExpression (MINUS) containing an IntegerLiteralExpression (2147483648).
We then try to parse the integer, which is no longer in the int range, and this results in a NumberFormatException.

We could directly parse a unary minus containing an integer as a negative integer, but this would be a bit of a hacky fix.
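
To illustrate the range problem and the workaround mentioned above, here is a minimal sketch. The helper parseIntLiteral and its parameters are made up for this example and are not part of the frontend; it only relies on Integer.parseInt from the JDK.

```java
final class NegativeLiteralSketch {
    // Folds a leading unary minus into the literal text before range checking,
    // so that Integer.MIN_VALUE can be parsed.
    static int parseIntLiteral(boolean negated, String digits) {
        return Integer.parseInt(negated ? "-" + digits : digits);
    }

    public static void main(String[] args) {
        // With the sign folded in, the value fits into an int.
        System.out.println(parseIntLiteral(true, "2147483648"));
        try {
            // Without the sign, 2147483648 is one above Integer.MAX_VALUE.
            parseIntLiteral(false, "2147483648");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException without the sign");
        }
    }
}
```

An alternative that avoids special-casing the unary minus would be to parse the digits as a long or BigInteger first and narrow the type afterwards.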

Improve test coverage again

The test coverage dropped to less than 70 % after I removed the regression tests. However, they were not really testing anything, just checking if the CPG did not crash.

I will aim to write some additional unit tests probably after Christmas.

Nodes referring to Java fields are missing parent and location

The following snippet instantiates a field cipher in an incorrect/unsafe way. However, the node referring to cipher does not have a parent, nor a range and consequently no location.

public class SealObject {
    private final Cipher cipher;

    public SealObject(String Password) throws Exception  {
        cipher = Cipher.getInstance("SomeUnsafeCipher");
    }
}

DeclarationHandler#handleFunctionDefinition adds dummy return statement even though it already contains a return statement

Seems to be a bug that an extra dummy return statement is added, if the function contains exactly one statement.

InitializerHandler::handleConstructorInitializer returns a ConstructExpression

handleConstructorInitializer handles a CPPASTConstructorInitializer. This is an initializer in the declaration of a constructor (and not the instantiation of it), yet it incorrectly returns a ConstructExpression.

It is not quite clear what a ConstructExpression actually is. In ExpressionHandler::handleObjectCreationExpr it seems to be used as an initializer of a NewExpression.

DFG is not control-flow, context, or field sensitive

The samples show that the DFG is not control flow sensitive.
int a = 1;
if (args.length > 3) {
  a = 2;
} else {
  System.out.println(a);
}

The usage of a in the print statement has a DFG edge to the declaration. The branch where a is used and the branch where a gets the value 2 are mutually exclusive. There should be a DFG path from 1 to this usage of a, but not from 2.
In other words, the DFG edges are currently drawn intra-transactionally and are not transitive: paths of edges cannot be considered true-positive data flows.
We could change the DFG construction to follow the EOG and keep the last definitions of a variable, e.g. the right-hand side of an assignment, and draw edges between those, that is, between the right-hand side of a definition and the references in expressions where the variable is used.
We have ongoing work where context and field sensitivity will be solved as well, so this issue could be solved temporarily by the method described above, or by waiting until the more sophisticated approach with push-down systems or slicing is incorporated.
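
The "keep the last definitions while following the EOG" idea can be sketched for the sample above. Everything here is an assumption made for illustration: definitions are tracked as a map from variable name to the set of right-hand sides that may reach a point, and the names define and merge are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LastDefinitionSketch {
    // Returns a copy of defs in which variable has exactly one last definition.
    static Map<String, Set<String>> define(
        Map<String, Set<String>> defs, String variable, String value) {
        Map<String, Set<String>> copy = new HashMap<>(defs);
        copy.put(variable, Set.of(value));
        return copy;
    }

    // Joins two mutually exclusive branches: either definition may reach the join point.
    static Map<String, Set<String>> merge(
        Map<String, Set<String>> left, Map<String, Set<String>> right) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> side : List.of(left, right)) {
            for (Map.Entry<String, Set<String>> entry : side.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), k -> new HashSet<>())
                    .addAll(entry.getValue());
            }
        }
        return merged;
    }

    // Definitions of a that reach System.out.println(a) in the else branch.
    static Set<String> reachingElse() {
        Map<String, Set<String>> beforeIf = define(new HashMap<>(), "a", "1");
        return beforeIf.get("a"); // the else branch never sees a = 2
    }

    // Definitions of a that reach the code after the if statement.
    static Set<String> reachingAfterIf() {
        Map<String, Set<String>> beforeIf = define(new HashMap<>(), "a", "1");
        Map<String, Set<String>> thenBranch = define(beforeIf, "a", "2");
        return merge(thenBranch, beforeIf).get("a");
    }

    public static void main(String[] args) {
        System.out.println("else branch sees: " + reachingElse());
        System.out.println("after the if: " + reachingAfterIf());
    }
}
```

DFG edges would then be drawn only from the definitions in the set that is current at each usage, so the print statement gets an edge from 1 but not from 2.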

DFGSample.zip

EOG edges missing for Short-circuit evaluation of && and ||

In both Java and C++, the evaluation of the first operand of && and || determines whether the second operand is evaluated or skipped: true || x = true and false && x = false.
Therefore the EOG must have two edges originating from the first operand, one going to the second operand and the other returning to the operator root node.
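
The short-circuit behaviour itself is easy to observe in plain Java. This small sketch counts how often the second operand is evaluated; the class and method names are made up for the example.

```java
final class ShortCircuitSketch {
    static int evaluations = 0;

    // Second operand with a visible side effect.
    static boolean second() {
        evaluations++;
        return true;
    }

    // Returns how often the second operand was evaluated for the given first operand.
    static int countFor(boolean first, boolean useOr) {
        evaluations = 0;
        boolean result = useOr ? (first || second()) : (first && second());
        return evaluations;
    }

    public static void main(String[] args) {
        System.out.println("true  || x evaluates x " + countFor(true, true) + " time(s)");
        System.out.println("false || x evaluates x " + countFor(false, true) + " time(s)");
        System.out.println("false && x evaluates x " + countFor(false, false) + " time(s)");
        System.out.println("true  && x evaluates x " + countFor(true, false) + " time(s)");
    }
}
```

The two cases with zero evaluations correspond to the EOG edge that goes from the first operand directly back to the operator node.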

Unparseable function pointer declaration crashes CPG

See

cpg/src/main/java/de/fraunhofer/aisec/cpg/frontends/cpp/DeclaratorHandler.java

Lines 286 to 290 in 5898c90

  // not in a record and not in a field, strange. This should not happen
  log.error(
      "Function pointer declaration that is neither in a function nor in a record. "
          + "This should not happen!");
  return null;

There are occasions where this can happen outside of a function. I will provide an example tomorrow.

Regardless, I guess we should NOT just return null here but rather return an empty declaration. Unfortunately, the rest of the code is fragile enough to crash if a handler returns null (which is also not good).

Crash related to function pointer call

Nondeterministic crash in CallResolver for this code:

#include <iostream>

class A {
  public:
    void fun() {
      std::cout << "Hello" << std::endl;
    }

    int mainA() {
      A a;

      void (A::* f_ptr) () = &A::fun;
      a.fun();
      (a.*f_ptr)();
    }
};

int main() {
  A a;

  void (A::* f_ptr) () = &A::fun;
  a.fun();
  (a.*f_ptr)();
}

Additionally, some quick checks showed some further problems in this snippet:

(a.*f_ptr)() has name ".", which makes no sense
a.fun() is currently not resolved as the member call on class A that it actually is
Function pointer handling needs to be added to the call resolver
Tests for function pointer call resolving are needed

Scope Manager is needed for all passes

Frontends are now created per file, and therefore the contained ScopeManager is always recreated. Because in the end only one language frontend is set on all passes after the frontend run, the ScopeManager must be independent of the frontends. Scopes are created and can then be reused in later passes, but the ScopeManager must differ between analysis runs (no singleton).
This allows passes to enter scopes and their parent scopes with random access, without having to traverse the AST in the right order.
This is done as part of fixing #52 and re-enabling the VariableUsageResolver to work with the ScopeManager.
NOTE: If we allow different file types in one analysis (.java, .cpp), we should also change the way frontends are given to passes: either we create a frontend per file, or files of the same language share one, i.e. only one frontend per supported language.

Change refersTo to distinguish between class or instance and different instance accesses

We have to distinguish whether a reference points to an instance of a class or to the class itself. Also, at some point, when taking data flow analysis results into consideration, we should try to distinguish between instances that are known to be different.
Currently I am considering multi-edges or adding an intermediate node to transform ref --refersTo-->Class into ref --refersTo-->instance --instanceOf-->Class.
Every instance will increase the graph's size, so we should be conservative and not add a node for every appearance of an unknown reference. Instead, we should have an edge or node representing all unknown references, preferably an edge, that prevents the code from assuming that references pointing there use the same instance.

GitHub actions always fails on forked repositories

The current CI workflow executes sonarqube analysis and is dependent on the presence of the secret SONAR_TOKEN which is (of course) not available on forks. This is ok, however the build always fails, making it hard to judge whether the code will successfully run. We should detect the presence of a fork in our GitHub actions workflow file and skip the sonarqube task.

Passes get the last LanguageFrontend that was created.

// Set frontend so passes know what language they are working on.
for (Pass pass : config.getRegisteredPasses()) {
  pass.setLang(frontend);
}
This is executed at the end of runFrontends with frontend being set in the loop that executes the frontends on the files. Therefore only the last frontend is set to all passes.
This can stay as long as frontends are static in relation to the processed files (no TranslationUnit dependent state is internally stored), but will have to change as soon as we allow the analysis of code that uses different Frontends.

Better error logging

Just a few examples where things can be improved:

All problems should be logged as errors or at least a warning, i.e. in

cpg/src/main/java/de/fraunhofer/aisec/cpg/passes/VariableUsageResolver.java

Line 219 in 5898c90

 log.info("did not find a declaration for {} (line {})", current.getCode(), startLine); 

I like the idea of logging the line where an error occurred. All error logs should do this (maybe we should have a special function to enforce it), but they should then also log the file; otherwise the line is useless in a multi-file/header scenario.
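
Such a "special function" could look roughly like the following. This is just a sketch: the helper name withLocation, its parameters, and the file name in the usage are hypothetical, and it only builds the message string, so it would still have to be wired into the existing logger.

```java
final class LocatedLogSketch {
    // Prefixes a log message with file and line so it stays useful across multiple files.
    static String withLocation(String file, int line, String message) {
        return String.format("%s:%d: %s", file, line, message);
    }

    public static void main(String[] args) {
        // Hypothetical file name and line, only for demonstration.
        System.out.println(withLocation("SimpleClass.h", 7, "did not find a declaration for foo"));
    }
}
```

Routing all error logs through one helper like this would make it hard to forget the location.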

Other candidates include

cpg/src/main/java/de/fraunhofer/aisec/cpg/frontends/Handler.java

Line 110 in d6d9b46

log.error("Parsing of type {} is not supported (yet)", ctx.getClass());

cpg/src/main/java/de/fraunhofer/aisec/cpg/frontends/cpp/DeclarationHandler.java

Lines 159 to 163 in 02dd217

      log.error(
          "Unknown Declspecifier in SimpleDeclaration: {}", ctx.getDeclSpecifier().getClass());
    }
  } else {
    log.error("Declspecifier is null");

Pretty much every time we log an error, we have either a raw AST node or a CPG node available.

constructors and methods in cpp file are not parsed properly / not linked to declaration of header

Consider the following code (parse it with loadIncludes(true)):

CPP file

#include "SimpleClass.h"

SimpleClass::SimpleClass()
{
}

int SimpleClass::SomeFunction()
{
	return 0;
}

Header file

#pragma once
class SimpleClass
{
	SimpleClass();

	int SomeFunction();
};

This will create a graph with 3 declarations:

A RecordDeclaration for SimpleClass containing the declaration of the constructor and the function
and
A FunctionDeclaration for SimpleClass::SimpleClass, which is the constructor, incorrectly parsed and
A FunctionDeclaration for SimpleClass::SomeFunction

In reality, of course, those two function declarations should at least be method declarations, the first one should be a constructor declaration, and they should somehow be linked to the record declaration.

Support anonymous functions in the Graph

Something like Java lambda functions or C++ lambda expressions, such as:

SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, [](int preverify_ok, X509_STORE_CTX *x509_ctx) {
    return 1;
  });

Create a class representing the graph node. Extend from FunctionDeclaration? Is it a FunctionDeclaration without name?
Assign a handler in CDT to parse it
Assign a handler in JavaParser to parse it
Adjust CFG passes to handle it

NullPointerException in ScopeManager

I am running the CPG on a large code base and I discovered this NPE. I am trying to isolate the file where it occurs, unfortunately I cannot share the whole code base.

Stack trace

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at com.aybaze.AnalyzeU.AppKt.main(App.kt:37)
Caused by: java.lang.NullPointerException
	at de.fraunhofer.aisec.cpg.passes.scopes.ScopeManager.lambda$leaveScope$7(ScopeManager.java:253)
	at de.fraunhofer.aisec.cpg.passes.scopes.ScopeManager.getFirstScopeThat(ScopeManager.java:275)
	at de.fraunhofer.aisec.cpg.passes.scopes.ScopeManager.getFirstScopeThat(ScopeManager.java:269)
	at de.fraunhofer.aisec.cpg.passes.scopes.ScopeManager.leaveScope(ScopeManager.java:253)
	at de.fraunhofer.aisec.cpg.passes.EvaluationOrderGraphPass.handleDeclaration(EvaluationOrderGraphPass.java:211)
	at de.fraunhofer.aisec.cpg.passes.EvaluationOrderGraphPass.handleDeclaration(EvaluationOrderGraphPass.java:190)
	at de.fraunhofer.aisec.cpg.passes.EvaluationOrderGraphPass.handleDeclaration(EvaluationOrderGraphPass.java:177)
	at de.fraunhofer.aisec.cpg.passes.EvaluationOrderGraphPass.accept(EvaluationOrderGraphPass.java:100)
	at de.fraunhofer.aisec.cpg.passes.EvaluationOrderGraphPass.accept(EvaluationOrderGraphPass.java:59)
	at de.fraunhofer.aisec.cpg.TranslationManager.lambda$analyze$0(TranslationManager.java:104)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run$$$capture(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

Potential unnecessary or wrong computation of region for Java-CaseSwitchStatements

cpg/src/main/java/de/fraunhofer/aisec/cpg/frontends/java/StatementAnalyzer.java

Lines 446 to 477 in 9231e14

    if (optionalTokenRange.isPresent()) {
      /*
      TODO: not sure if this is really necessary, it seems to be the same location as
      parentLocation, except that column starts 1 character later and I am not sure if
      this is correct anyway
      */
      // Compute region and code for self generated default statement to match the c++ versions
      caseTokens =
          getOuterTokensWithText(
              "default",
              ":",
              optionalTokenRange.get().getBegin(),
              optionalTokenRange.get().getEnd());
    }
    DefaultStatement defaultStatement =
        NodeBuilder.newDefaultStatement(getCodeBetweenTokens(caseTokens.a, caseTokens.b));
    defaultStatement.setLocation(
        getLocationsFromTokens(parentLocation, caseTokens.a, caseTokens.b));
    return defaultStatement;
  }
  if (optionalTokenRange.isPresent()) {
    // Compute region and code for self generated case statement to match the c++ versions
    caseTokens =
        getOuterTokensWithText(
            "case", ":", optionalTokenRange.get().getBegin(), optionalTokenRange.get().getEnd());
  }
  CaseStatement caseStatement =
      NodeBuilder.newCaseStatement(getCodeBetweenTokens(caseTokens.a, caseTokens.b));
  caseStatement.setCaseExpression(
      (Expression) lang.getExpressionHandler().handle(caseExpression));

I am not really sure if this piece of code is necessary. I just made it work like before, but I do not really see the point of it. I have included a comment about it, so we might decide to deal with it later.

Originally posted by @oxisto in #77

Problems with SonarQube PR decorator

It seems that the PR decorator is not running if the branch that the PR is based on was already analyzed as a branch in SonarCloud.

As a (temporary) workaround, I suggest running SonarQube on master and PRs only.

Cannot parse zero literals in c++

Seems to be a bug introduced by #36. The following literals all crash the CPG:

  long l_with_suffix = 0l;
  long long l_long_long_with_suffix = 0ll;
  unsigned long long l_unsigned_long_long_with_suffix = 0ll;

It seems that they are all recognized as octal; the offset is then set to 1 and the suffix is stripped, leaving an empty string, which crashes the BigInteger constructor.

cpg/src/main/java/de/fraunhofer/aisec/cpg/frontends/cpp/ExpressionHandler.java

Lines 797 to 806 in 5898c90

  } else if (value.startsWith("0") && value.length() > 1) {
    radix = 8; // octal
    offset = 1; // len("0")
  }

  String suffix = getSuffix(value);
  String strippedValue = value.substring(offset, value.length() - suffix.length());

  // basically we parse everything as BigInteger and then decide what to do
  bigValue = new BigInteger(strippedValue, radix);
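
One possible way around this is to strip the suffix before deciding on the radix, so that 0l is seen as the single digit 0 and not as an octal prefix. The following is a sketch under assumptions, not the handler's actual code: the suffix stripping is a simplified stand-in for getSuffix that only removes trailing u/l characters, and digit separators are not handled.

```java
import java.math.BigInteger;

final class ZeroLiteralSketch {
    // Parses a C/C++ integer literal such as 0l, 017, 0x1F or 42u into a BigInteger.
    static BigInteger parseIntegerLiteral(String value) {
        String lower = value.toLowerCase();

        // Strip the suffix first (simplified: any trailing run of 'u' and 'l').
        int end = lower.length();
        while (end > 0 && (lower.charAt(end - 1) == 'u' || lower.charAt(end - 1) == 'l')) {
            end--;
        }
        String digits = lower.substring(0, end);

        int radix = 10;
        int offset = 0;
        if (digits.startsWith("0x")) {
            radix = 16;
            offset = 2;
        } else if (digits.startsWith("0b")) {
            radix = 2;
            offset = 2;
        } else if (digits.startsWith("0") && digits.length() > 1) {
            // Only octal if digits remain after the leading zero.
            radix = 8;
            offset = 1;
        }
        return new BigInteger(digits.substring(offset), radix);
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"0l", "0ll", "017", "0x1F", "42u"}) {
            System.out.println(literal + " -> " + parseIntegerLiteral(literal));
        }
    }
}
```

Because the length check now runs on the digits without the suffix, the three literals from the report end up as a plain decimal 0.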

MemberExpression references a wrongly created field

The sample "InstanceShenanagans" shows how the Base edge points to a field of the class. This field is not part of the original class and has the type and name of the variable that is used. This is definitely wrong.
The Base currently seems to reference many different things, e.g. RecordDeclarations, fields, etc.
IMO it should only reference the variable used for the access, via a DeclaredReferenceExpression.
For now I will investigate this problem and remove this wrongly added field. A PR will be used for the fix and for further changes if we decide to limit the node types that the base can point to.

InstanceShenenegans.zip

Create a ThisExpression

Currently, this is handled as a dummy field of a record declaration. While this works, it is not very clean in my opinion and also leads to funny errors, such as object.this.call() being parsed.

We should replace it with a ThisExpression that refers to the record declaration and can be used as a base for member field access and calls.

Refactor package structure for nodes

We shouldn't have one big graph package containing both nodes and any related helper classes. I would propose having at least a node subpackage, probably even node.declaration, node.expression and node.statement, so that the broader inheritance structure is reflected. This would make things a lot more organized while requiring close to zero refactoring effort. Of course this is a breaking change for library users, but nothing semantic, so it can be resolved simply by fixing imports.

Surely this is not of top priority, but I still would like to have this inside an issue for further discussion.

EOG is incorrect for assignments

Consider this snippet:

int main() {
  int foo;
  Test t();
  foo = t.call();
}

I would expect the following EOG order: int foo-->t()-->t-->call()-->foo-->foo = t.call(). But in fact it is as shown in the image: int foo-->t()-->foo-->t-->call()-->foo = t.call().

Wrong Base-Reference for Static calls

for

private static byte[] setMasterPass(String mastPass, byte[] salt) throws IOException, NoSuchAlgorithmException {
        //Create new hash with SHA512
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        bout.write(salt);
        bout.write(mastPass.getBytes());
        return md.digest(bout.toByteArray());
    }

(MessageDigest.getInstance is a static Call)

the base of getInstance should not be this!

Extend the README.md to be more informative

It would benefit people who are unfamiliar with Code Property Graphs to get a short introduction to the topic in the README.md. Links for further reading would be really nice too.

Serialize additional type information into database

Currently, we combine all type information, such as modifiers, directly into the DB as the type field. We might consider serialising the additional information as well, similar to the RegionConverter.

Non-deterministic error: FieldDeclaration created instead of VariableDeclaration

The following C++ code is part of a test. In most cases it works, however sometimes and non-deterministically, the CPG represents foo as a FieldDeclaration of class Test, which is obviously wrong and fails the constant resolution.

class Test {
public:
 int call(int a) {
  return a + 1;
 }
};

int main() {
  int foo = 1;
  foo = (1,2,3,4,42);
  Test t;
  t.call(foo);
}

The attached pictures show two graphs created by two test runs of exactly the same code. The graph containing the (dark) green FieldDeclaration is wrong.

FQN wrong for some static calls

For the following snippet, the fqn is Cipher.getInstance(...) but should be javax.crypto.Cipher.getInstance(...)

import javax.crypto.Cipher;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class BCProviderCipher {

	public static void main(String[] args) {
		Cipher c4 = Cipher.getInstance("AES", new BouncyCastleProvider());
	}
}

This probably has something to do with BouncyCastleProvider not being available. But still, I imagine the CPG could "guess" the FQN quite easily here.

Typedef are not parsed correctly

Typedef statements are parsed incorrectly or ignored
Example:

typedef aaaa int; //ignored
typedef bbbb aaaa; // parsed as variable declaration

int main(){
	typedef cccc int; // empty statement
	typedef dddd cccc; // parsed as variable declaration
}

Handle type of ConstructExpression

From @JulianSchuette:

To exemplify: This code ...

const MyType my_variable(arg_0, arg_1);

... is turned into a VariableDeclaration for my_variable with type const MyType and an edge INITIALIZER pointing to the ConstructExpression (which does not have a type property).

Is this intended behavior?

Unused DeclaredReferenceExpression still connected via DFG

When parsing the following code, we get a DeclaredReferenceExpression "i" that is only connected to the graph via a DFG edge:

class Inner {
public:
    int value;

    int init(int a) {
      value = a;
      return a;
    }
};

class Outer {
public:
    Inner inner;

    int init(Inner i) {
        inner = i;
        return i.value;    // <-- HERE
    }
};



int main() {
  Outer o;
  Inner i;

  i.init(17);

  return o.init(i);
}

According to its region, the unused node is related to return i.value in line 17. I suspect that this is the DeclaredReferenceExpression that was originally a child of the MemberExpression for i.value. It has since been replaced by a direct connection between the MemberExpression and the corresponding FieldDeclaration, so it is no longer present in the official AST. But as there is still a remaining DFG edge, it is not completely purged from the graph.

Missed parsing of constructor calls

Analyzing the following C++ Test file, it seems like the constructor calls for a1-a3 are missing.

class A {
public:
    A() {}
    A(int x) {}
    A(int x, int y) {}
};

int main() {
   A a1;
   A a2(5);
   A a3(5,6);
   A a4 = A();
   A* a5 = new A;
   A* a6 = new A();
   return 0;
}

missing Constructor

For

void nok1() {
    Botan p = new Botan(1);
    p.set_key(key); // not allowed as start
}

there is no ConstructExpression in the CPG. This seems to be a bug.

refersTo of DeclaredReferenceExpression is empty for some expressions

In C++ it is valid (however with a warning) to directly use an expression as a statement. The way we populate the refersTo variable of a DeclaredReferenceExpression in C++ (using a function called CXXLanguageFrontend#expressionRefersToDeclaration) is that we only do it in a couple of expressions:

Unary Expressions
Function Call Expressions
Binary Expressions (for rhs and lhs)

However, there are potentially more expressions, for example Array Subscription Expressions and it is a pain to do it this way.

It seems that for Java it is done differently, however not sure how at the moment.

Steps to reproduce:

int main() {
  int x[] = { 1, 2, 3 };

  x[0]; // yes this will produce a warning but is still valid and easier to parse for the test
}

In the DeclaredReferenceExpression for x the refersTo will be null.

Type-System needs rework

The current way we parse and store types will likely not work in all cases, especially for C-Types.

Code like static const char * const somearray[] can currently not be parsed, and I am not sure if it can even be stored correctly in the adjustment and type-modifier

Inner Classes are not added to CPG when parsing C++ code

When parsing C++ code, the handler is called for the inner class, but the resulting RecordDeclaration is never added to the CPG object structure.
Also, the fully qualified class name does not use the prefix of outer classes or namespaces.

Missing CPP/Java Features

Java

javaparser.ast.expr.SuperExpr, Example: ConnPoolByRoute.java
javaparser.ast.stmt.LocalClassDeclarationStmt, Example: TestGroup.java
javaparser.ast.expr.LambdaExpr, Example: CommunityPojo.java
javaparser.ast.body.AnnotationDeclaration, Example: RatingFile.java

C++

Utility helper function to create consistent language frontends for unit tests

Just a suggestion, @konradweiss: Would it make sense to have an (abstract) test class somewhere that defines a utility function like newLanguageFrontend for unit tests, ensuring that all tests (at least the ones without pass invocations) create the language frontend in a consistent way? That is, it would combine:

The desired language frontend class
A "dummy" translation manager configuration
Path to the correct file (as argument)
A new ScopeManager

This function could then be used to document that this will NOT run the passes. Otherwise we have 20+ tests that need to be adjusted and this way it is only 1 function.

Originally posted by @oxisto in #56 (comment)

Add child evaluation order directly to nodes

Even before constructing the EOG, we sometimes need to ensure that we visit nodes in their specific evaluation order. Otherwise things like #16 happen, and it might not be quite clear what the actual cause of the nondeterministic behavior is. Thus we need to add a method to Node that allows the retrieval of direct AST children in their exact evaluation order instead of some arbitrary sorting.

Example use case: When resolving variables, we could encounter things like the following:

public class X {
    private int field = 0;

    public int foo() {
        field = 42;  // refers to the actual field
        int field = 123;
        return field;  // refers to the shadowing local var!
    }
}

Here, the correctness of any variable resolving relies on the correct visiting order of the statements.
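
The shadowing can be confirmed with a runnable variant of the example. The class name ShadowingSketch and the accessor fieldValue are added here only to make the effect observable.

```java
final class ShadowingSketch {
    private int field = 0;

    int foo() {
        field = 42;       // no local variable declared yet: writes the actual field
        int field = 123;  // from here on, the local variable shadows the field
        return field;     // reads the local variable
    }

    // Accessor added for the demonstration only.
    int fieldValue() {
        return this.field;
    }

    public static void main(String[] args) {
        ShadowingSketch x = new ShadowingSketch();
        System.out.println("foo() returns " + x.foo());
        System.out.println("field is now " + x.fieldValue());
    }
}
```

A resolver that visits the three statements in any other order would bind at least one of the two usages to the wrong declaration.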

CXXLanguageFrontend#getRegionFromRawNode parses region incorrect if includes are used

To reproduce, parse any cpp file with the loadIncludes option on and supply the path to the include files.

It seems that it tries to parse the region by accessing the source field and storing it in the parentRawSig variable. However, this contains the source for the parent cpp source file and not the header files that are parsed as a result of it. This leads to errors and in many cases crashes since the graph nodes of the include file are looking for code/region in the source of the cpp file.

To crash, simply make the header file substantially longer than the cpp file, this way, the nodes of the header file will definitely refer to code positions that do not exist in the cpp file.

Is the alternative with using the supplied getRawSignature that much worse performance-wise? This function seems to be very hacky.

Endless parsing for some files

Sometimes, when I try to get the CPG for a single file, the process seems to run endlessly. I don't get any errors, but it takes more than an hour, at which point I stop it.

I can provide several such files.

Interfile analysis

First of all, thank you for such a great tool.

I would like to ask the following question: does the tool support interfile analysis?

Refactor EvaluationOrderGraphPass class

It currently has 840 lines of code and multiple functions with a LOT of if/else/else-if branches, and not a lot of comments or explanations. Unfortunately it has become very hard to read and understand.

Looks like a good candidate for some refactoring if we have some spare time.

ExpressionHandler#expressionTypeProxy should return cpg type class

ExpressionHandler#expressionTypeProxy currently returns an IType (CDT type). Depending on the handling function that calls it, the type is then parsed completely differently: sometimes it looks for pointers, sometimes it does not. The function should directly return a CPG Type to reduce these differences in handling.
