vazexqi / jflow
Interactive Source-to-source Transformations for Flow-based Applications
License: Other
Find a way to pretty-print, store, and filter the potentially thousands of PointerKeys. We need to make sense of them so that we can decide whether parallelization is possible. Some of the JDK classes are definitely safe to point to, but we need to compile that list first. We will focus on Lire since that is our real application for now.
Write tests for our PDGExtractClosure class. For now, test the scalar variables.
This instruction only handles the updating. It assumes that x has been killed with a previous statement x := nil (done through some preprocessor).
OoOJava evaluated its approach on 10 programs from several domains. What do those programs look like and where are the sources of parallelism? Summarize them.
There are currently too many things in the WALA IR (SSA form, Phi nodes, etc) that make it hard to test my dataflow solver. Create a simplified version of the IR that mimics what we have in the paper Solving Shape-Analysis Problems in Languages with Destructive Updating and then proceed from there.
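A minimal sketch of what such a simplified IR could look like, covering only the five statement forms used in the Solving Shape-Analysis Problems paper (x := nil, x := y, x := y.sel, x.sel := y, x := new). The class and field names are illustrative, not an existing design:

```java
// Sketch of a simplified IR restricted to the statement forms of the
// shape-analysis paper. No SSA, no Phi nodes -- just named variables.
public class SimpleStatement {
    enum Kind { ASSIGN_NIL, COPY, LOAD, STORE, NEW }

    final Kind kind;
    final String lhs;   // variable being written (x)
    final String rhs;   // source variable (y), or null for ASSIGN_NIL/NEW
    final String sel;   // selector for LOAD (x := y.sel) / STORE (x.sel := y)

    SimpleStatement(Kind kind, String lhs, String rhs, String sel) {
        this.kind = kind;
        this.lhs = lhs;
        this.rhs = rhs;
        this.sel = sel;
    }

    static SimpleStatement assignNil(String x) {
        return new SimpleStatement(Kind.ASSIGN_NIL, x, null, null);
    }

    static SimpleStatement store(String x, String sel, String y) {
        return new SimpleStatement(Kind.STORE, x, y, sel);
    }
}
```

A dataflow solver can then switch over `kind` without worrying about the full generality of the WALA IR.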
All interprocedural analyses, e.g., modref, rely on the call graph. However, nodes in the call graph (CGNode) are compared for equality using reference equality (!). This means that if we construct two different call graphs (at two different times) we are going to run into issues of not being able to locate certain values properly.
As a concrete example, consider points-to sets. A lookup takes a PointerKey. Some PointerKeys, such as LocalPointerKey, take the CGNode into consideration. Thus, when we compare them we have to take CGNode equality into consideration. This happens often, especially since PointerKey (the KEY in the name) implies that it will be used as the key to some map.
Additionally, the IRs must probably match too, since instructions can also be factored in when doing a comparison.
See also #45 and the weird issues of points-to-set being empty that we were having.
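The failure mode can be reproduced with a plain HashMap and a hypothetical key class (a stand-in for CGNode, not the WALA type) that inherits reference equality from Object:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a key type that, like CGNode, uses reference
// equality: two structurally identical keys are never equals().
class RefKey {
    final String method;
    RefKey(String method) { this.method = method; }
    // No equals()/hashCode() override: inherits identity semantics from Object.
}

public class RefEqualityDemo {
    public static boolean lookupFails() {
        Map<RefKey, Integer> pointsTo = new HashMap<>();
        pointsTo.put(new RefKey("main"), 42);
        // A second key built from the same data (e.g., from a call graph
        // rebuilt later) misses, because equality is identity-based.
        return pointsTo.get(new RefKey("main")) == null;
    }

    public static void main(String[] args) {
        System.out.println(lookupFails()); // true: the value cannot be found
    }
}
```

This is exactly what happens when PointerKeys built against one call graph are used to query results computed against another.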
It makes sense to try the tool out on our canonical example as we build it, to identify issues as early as possible.
This must also include fields (current class, superclasses, ancestors). Basically this should capture the full dependence list of variables that need to be "copied-in" like the OpenMP directive. By determining the copy-in variables, we can then decide on how to schedule the execution of different tasks.
This will probably be done using the JDT infrastructure since it is cleaner when trying to identify the right variable (no need to worry about mapping back from WALA IR).
Try not to introduce any duplicate functionality that we already have in the PDGAnalyzer.
Once we are confident enough with our own PDGAnalyzer, we should just delegate the ExtractClosureAnalyzer to check for simple errors (like selecting a for loop, selecting partial statements, etc).
This helps with debugging and also to double-check that we are not screwing something up.
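A minimal sketch of the "copy-in" idea, with illustrative names only: the extracted task receives every free variable it reads, locals and fields alike, as explicit parameters, much like OpenMP's copyin/firstprivate clauses:

```java
// Sketch: the extracted region reads a local and a field; the refactored
// version passes both dependences in explicitly so the task is self-contained.
public class CopyInSketch {
    private int fieldValue = 10; // field dependence: must also be captured

    public int original() {
        int local = 5;
        // --- region to extract: reads `local` and `this.fieldValue` ---
        int result = local + fieldValue;
        // --- end region ---
        return result;
    }

    // Extracted task: the full dependence list becomes the parameter list.
    static int extractedTask(int local, int fieldValue) {
        return local + fieldValue;
    }

    public int refactored() {
        int local = 5;
        return extractedTask(local, this.fieldValue);
    }
}
```

Computing that parameter list is precisely the copy-in analysis described above; once it is known, the tasks can be scheduled independently.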
See #33
When trying to resolve the dependencies for "this" we run into issues because of the way we search for a variable named "this" in the scope. There is no such variable!
Disallow anything smaller than statement level. For flow-based parallelization, it simply doesn't make sense to go finer-grained than statements.
Let's have some simple automated tests for different scenarios so we can verify certain weird situations with PDGs (e.g., method params, variable names, etc).
The tests must evolve since at this time, the interface to the PDG is not stable yet.
Follows Figure 14 of the paper Solving Shape-Analysis Problems in Languages with Destructive Updating.
If we have statements like these:
Object a; // 1
a.doSomething(); // 2
a.doSomethingElse(); // 3
someMethod(a); // 4
Currently, on line 4, a is still data dependent only on the a object on line 1. This is correct. But we want something a bit more program-point specific, almost like SSA for heap variables. So that anytime a statement accesses (modifies, if we can be that precise, because reads are safe) a heap object, we should have a dependency on the heap object at that point.
The ability to store the program point where the last access happened is very useful when we start splitting things into threads since we need to track which thread wrote which value and compute some ordering.
Ignore control dependencies for now and assume that the old FlowAnalyzer with the checks can handle it. We just want to rely on our PDG for full data dependencies. We will start with scalar dependencies and then progress to heap dependencies.
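One way to record "the object as of program point N" is a side table keyed by object identity; this is a sketch under assumed names, not existing jflow code:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: track the last program point that wrote each heap object, so a
// later use can be made data-dependent on that point rather than only on
// the allocation site. IdentityHashMap matches heap-object identity.
public class HeapAccessTracker {
    private final Map<Object, Integer> lastAccess = new IdentityHashMap<>();

    // Record that the statement at `programPoint` modified `heapObject`.
    public void recordAccess(Object heapObject, int programPoint) {
        lastAccess.put(heapObject, programPoint);
    }

    // The program point a new use of `heapObject` should depend on;
    // -1 means no write has been seen (depend on the allocation only).
    public int dependsOn(Object heapObject) {
        return lastAccess.getOrDefault(heapObject, -1);
    }
}
```

With such a table, each statement in the example above would depend on the most recent modification of a, not on line 1, which is also the information needed to order writes across threads.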
There are some special semantics with the way that mapper is used (if something doesn't exist then its default value is false) and it is easier to implement that logic inside a new subclass than to always check for null conditions.
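The "missing means false" semantics can be captured in a small subclass so callers never see null; a minimal sketch with an illustrative class name:

```java
import java.util.HashMap;

// Sketch: a map whose lookup defaults to false for absent keys, so the
// calling code needs no null checks.
public class DefaultFalseMap<K> extends HashMap<K, Boolean> {
    @Override
    public Boolean get(Object key) {
        Boolean value = super.get(key);
        return value == null ? Boolean.FALSE : value;
    }
}
```

Centralizing the default in one override is easier to keep correct than repeating the null check at every call site.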
See https://groups.google.com/d/topic/wala-sourceforge-net/Cs2VcUwX0ak/discussion about how the front end might actually contain more information and less optimization. This could help our mapping difficulties.
These conversions are necessary for the code generation part to work.
Because we decided to use WALA exclusively to do the analysis (see #20), we need a way to map back the results to the actual physical file. Mapping SSA variables to local variables is one part of it. Mapping WALA IR back to physical line numbers is another.
Does WALA come with a Liveness Analysis that we can leverage? Or does its internal SSA form help us enough with liveness analysis?
Useful to be able to follow some of the pointers in an interactive manner using the WALAViewer. Since WALAViewer is written using Swing and we are using SWT (because we need an instance of Eclipse running for JDTSourceAnalysisEngine), we need to change the way WALAViewer is invoked.
See http://www.eclipse.org/articles/article.php?file=Article-Swing-SWT-Integration/index.html
One of the problems with static analysis is the scalability. Sometimes this scalability problem arises just because we run out of memory. However, implementing parsimonious data structures is difficult. What if we could just use one of the existing solutions, e.g., in-memory databases?
Other researchers have used some query languages on top of effective data structures in the past:
Instead of using a research project (which is less supported and less tested) to do this, can we use a commercial product such as neo4j or even datomic? neo4j is preferred since it uses Java whereas datomic is mostly Clojure.
I need to compare this view to our own PDG view. To make the comparison fair I should only enable data dependencies on Wala's PDG and disable control dependencies.
I should also clean up the view so that we are not displaying the full context until mouse hover.
The first time I invoke the view, it takes a few seconds to display - not immediate, but tolerable. The second time I invoke it, Eclipse proceeds to crunch away (it does not seem to be memory bound since there is still plenty of RAM left) and takes a few minutes to display.
If you just select a particular statement in the method that you are interested in, the PDG will display the nodes but the edges are missing.
You also see the following trace:
!STACK 0
java.lang.NullPointerException
	at edu.illinois.jflow.ui.tools.pdg.PDGLabelProvider.getText(PDGLabelProvider.java:45)
	at org.eclipse.zest.core.viewers.internal.GraphItemStyler.styleItem(GraphItemStyler.java:83)
	at org.eclipse.zest.core.viewers.internal.AbstractStylingModelFactory.styleItem(AbstractStylingModelFactory.java:121)
	at org.eclipse.zest.core.viewers.internal.AbstractStylingModelFactory.createConnection(AbstractStylingModelFactory.java:182)
	at org.eclipse.zest.core.viewers.internal.GraphModelEntityFactory.doBuildGraph(GraphModelEntityFactory.java:98)
	at org.eclipse.zest.core.viewers.internal.AbstractStylingModelFactory.refreshGraph(AbstractStylingModelFactory.java:288)
	at org.eclipse.zest.core.viewers.AbstractStructuredGraphViewer.internalRefresh(AbstractStructuredGraphViewer.java:359)
	at org.eclipse.jface.viewers.StructuredViewer$7.run(StructuredViewer.java:1508)
	at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1443)
	at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1404)
	at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1506)
	at org.eclipse.zest.core.viewers.GraphViewer.refresh(GraphViewer.java:307)
	at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1465)
	at edu.illinois.jflow.ui.tools.pdg.PDGContentProvider.inputChanged(PDGContentProvider.java:23)
	at org.eclipse.jface.viewers.ContentViewer.setInput(ContentViewer.java:276)
	at org.eclipse.jface.viewers.StructuredViewer.setInput(StructuredViewer.java:1690)
	at edu.illinois.jflow.wala.ui.tools.graph.view.WalaGraphView.updateGraph(WalaGraphView.java:50)
	at edu.illinois.jflow.ui.tools.pdg.ViewPDGAction.run(ViewPDGAction.java:38)
	at org.eclipse.jface.action.Action.runWithEvent(Action.java:498)
	...
When a method has parameters, we should add those dependencies. This is particularly important for virtual methods where the "this" parameter is implicitly passed as the first parameter.
There is the class JDTPosition in Wala. Could we use that information for mapping? It seems quite precise and might be better than what we are doing in #33.
Basically, this is similar to Extract Method except that we will also include dependencies from fields (Extract Method does not by default). We will replace the flow analyzer from Eclipse with the one that we have built using Wala.
These classes are pretty much value classes with final (immutable) fields. So we don't have to create too many references to them.
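The value-class shape described above looks roughly like this; the class and field names are illustrative, not the actual jflow classes. Making equals/hashCode structural means instances can be shared freely and used as map keys:

```java
import java.util.Objects;

// Sketch of a value class: all fields final, structural equality, safe to
// share between data structures without aliasing surprises.
public final class ProgramPoint {
    private final String method;   // illustrative fields only
    private final int index;

    public ProgramPoint(String method, int index) {
        this.method = method;
        this.index = index;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ProgramPoint)) return false;
        ProgramPoint p = (ProgramPoint) o;
        return index == p.index && method.equals(p.method);
    }

    @Override
    public int hashCode() {
        return Objects.hash(method, index);
    }
}
```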
I believe that the CI slicer is the ThinSlicer and that the normal CS slicer is the SDG. Either way, let's try it and see what happens.
Might be able to use this to explain when a data race can happen.
This will give me some chance to experiment with how the SSA form in WALA works. Some questions that I have:
Technically we can extract a closure without doing modref analysis since everything is local. But if we don't incorporate modref analysis early, we could end up with closures that cannot be parallelized later. So we should somehow try to incorporate this analysis early. Might even decide to use //placemarkers to extract the closures.
// Stage
statement
statement
// End stage
// Stage
statement
statement
// End stage
This will be similar to the BitVectorFramework that already exists for Java except that we will be operating on the join complete semi-lattice of static shape graphs. The ordering and join follow the definitions in the paper Solving Shape-Analysis Problems in Languages with Destructive Updating.
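The join and partial order on that semilattice can be sketched by approximating a static shape graph as a set of edges (strings here, purely for illustration): join is set union and the order is subset inclusion, as in the paper:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the powerset join semilattice the framework would operate on.
// A shape graph is modeled as a set of edges; join = union, order = subset.
public class ShapeGraphLattice {
    public static Set<String> join(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // x <= y in the lattice iff every edge of x is also in y.
    public static boolean lessOrEqual(Set<String> x, Set<String> y) {
        return y.containsAll(x);
    }
}
```

The fixed-point solver then iterates transfer functions until every node's graph stops growing, exactly as BitVectorFramework does for bit vectors.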
Start to slowly replace parts of the non-flow sensitive FlowAnalyzer with ours. Retain the FlowAnalyzer so that we make use of its precondition checking, i.e., its ability to check for weird control flow of selected statements.
I just followed how I saw other UnaryOperators define their equals as
@Override
public boolean equals(Object o) {
    return (o instanceof AssignInstruction.SSGAssign);
}
But this is quite weird and it is worth investigating.
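If SSGAssign really is stateless, instanceof-based equality is internally consistent, but it should then also override hashCode so that equal operators hash alike. A minimal sketch with stand-in classes (not the WALA types):

```java
// Sketch: class-based equality for a stateless operator. If equals() treats
// all instances as equal, hashCode() must return the same value for all of
// them, or hash-based collections will misbehave.
abstract class UnaryOperator { }

class SSGAssign extends UnaryOperator {
    @Override
    public boolean equals(Object o) {
        return o instanceof SSGAssign;
    }

    @Override
    public int hashCode() {
        return SSGAssign.class.hashCode(); // all instances hash alike
    }
}
```

An alternative that avoids the question entirely is to make the operator a singleton and rely on reference equality.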
This instruction only handles the updating. It assumes that x has been killed with a previous statement x := nil (done through some preprocessor).
Here's the stack trace:
Caused by: com.ibm.wala.util.debug.UnimplementedError: Unexpected type: class com.ibm.wala.analysis.typeInference.JavaPrimitiveType
at com.ibm.wala.util.debug.Assertions.UNREACHABLE(Assertions.java:55)
at com.ibm.wala.analysis.typeInference.PointType.meet(PointType.java:69)
at com.ibm.wala.analysis.typeInference.TypeInference$PrimitivePropagateOperator.evaluate(TypeInference.java:392)
at com.ibm.wala.cast.java.analysis.typeInference.AstJavaTypeInference$PrimAndStringOp.evaluate(AstJavaTypeInference.java:142)
at com.ibm.wala.analysis.typeInference.TypeInference$PrimitivePropagateOperator.evaluate(TypeInference.java:1)
at com.ibm.wala.fixedpoint.impl.GeneralStatement.evaluate(GeneralStatement.java:36)
at com.ibm.wala.fixedpoint.impl.AbstractFixedPointSolver.solve(AbstractFixedPointSolver.java:149)
at com.ibm.wala.analysis.typeInference.TypeInference.solve(TypeInference.java:126)
at com.ibm.wala.analysis.typeInference.TypeInference.solve(TypeInference.java:117)
at com.ibm.wala.analysis.typeInference.TypeInference.<init>(TypeInference.java:113)
at com.ibm.wala.cast.analysis.typeInference.AstTypeInference.<init>(AstTypeInference.java:70)
at com.ibm.wala.cast.java.analysis.typeInference.AstJavaTypeInference.<init>(AstJavaTypeInference.java:98)
at edu.illinois.jflow.jflow.wala.dataflowanalysis.ProgramDependenceGraph.<init>(ProgramDependenceGraph.java:80)
at edu.illinois.jflow.jflow.wala.dataflowanalysis.ProgramDependenceGraph.makeWithSourceCode(ProgramDependenceGraph.java:65)
at edu.illinois.jflow.core.transformations.code.ExtractClosureRefactoring.createPDGAnalyzer(ExtractClosureRefactoring.java:302)
Here's the offending code:
package extractclosure;

public class Project1 {
    public static void main(String[] args) {
        int a = producer(1);
        /*[*/
        int b = producer(a);
        int c = producer(b);
        System.out.println("a: " + a + ", c: " + c); // <--- This line here; if we remove it, all is well.
        /*]*/
    }

    static int producer(int input) {
        return input + 2;
    }
}
When we are doing an extract method, it is important to be able to get the type of the variable. In JDT land, this information is quite easily obtained through an IVariableBinding. How can we get the same information for WALA IR?
Does the TypeInference class in WALA work properly? Incorporate it into our PDG and try it out.