vazexqi / jflow
Interactive Source-to-source Transformations for Flow-based Applications
License: Other
Find a way to pretty-print, store, and filter the potentially thousands of PointerKeys. We need to make sense of them so that we can decide whether parallelization is possible. Some of the JDK classes are definitely safe to point to, but we need to compile that list first. We will focus on Lire since that is our real application for now.
Write tests for our PDGExtractClosure class. For now, test the scalar variables.
This instruction only handles the updating. It assumes that x has been killed with a previous statement x := nil (done through some preprocessor).
OoOJava evaluated its approach on 10 programs from several domains. What do those programs look like and where are the sources of parallelism? Summarize them.
There are currently too many things in the WALA IR (SSA form, Phi nodes, etc) that make it hard to test my dataflow solver. Create a simplified version of the IR that mimics what we have in the paper Solving Shape-Analysis Problems in Languages with Destructive Updating and then proceed from there.
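A minimal sketch of what such a simplified IR could look like, covering only the five statement forms used in the Solving Shape-Analysis Problems paper (x := nil, x := y, x := y.sel, x.sel := y, x := new). The class and field names are illustrative, not an existing design:

```java
// Sketch of a simplified IR restricted to the statement forms of the
// shape-analysis paper. No SSA, no Phi nodes -- just named variables.
public class SimpleStatement {
    enum Kind { ASSIGN_NIL, COPY, LOAD, STORE, NEW }

    final Kind kind;
    final String lhs;   // variable being written (x)
    final String rhs;   // source variable (y), or null for ASSIGN_NIL/NEW
    final String sel;   // selector for LOAD (x := y.sel) / STORE (x.sel := y)

    SimpleStatement(Kind kind, String lhs, String rhs, String sel) {
        this.kind = kind;
        this.lhs = lhs;
        this.rhs = rhs;
        this.sel = sel;
    }

    static SimpleStatement assignNil(String x) {
        return new SimpleStatement(Kind.ASSIGN_NIL, x, null, null);
    }

    static SimpleStatement store(String x, String sel, String y) {
        return new SimpleStatement(Kind.STORE, x, y, sel);
    }
}
```

A dataflow solver can then switch over `kind` without worrying about the full generality of the WALA IR.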
All interprocedural analyses, e.g., modref, rely on the call graph. However, nodes in the call graph (CGNode) are compared for equality using reference equality (!). This means that if we construct two different call graphs (at two different times) we are going to run into issues of not being able to locate certain values properly.
As a concrete example, consider points-to sets. A lookup takes a PointerKey. Some PointerKeys, such as LocalPointerKey, take the CGNode into consideration. Thus, when we compare them we have to take CGNode equality into consideration. This happens often, especially since PointerKey (the KEY in the name) implies that it will be used as the key to some map.
Additionally, the IRs must probably match too, since instructions can also be factored in when doing a comparison.
See also #45 and the weird issues of points-to-set being empty that we were having.
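The failure mode can be reproduced with a plain HashMap and a hypothetical key class (a stand-in for CGNode, not the WALA type) that inherits reference equality from Object:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a key type that, like CGNode, uses reference
// equality: two structurally identical keys are never equals().
class RefKey {
    final String method;
    RefKey(String method) { this.method = method; }
    // No equals()/hashCode() override: inherits identity semantics from Object.
}

public class RefEqualityDemo {
    public static boolean lookupFails() {
        Map<RefKey, Integer> pointsTo = new HashMap<>();
        pointsTo.put(new RefKey("main"), 42);
        // A second key built from the same data (e.g., from a call graph
        // rebuilt later) misses, because equality is identity-based.
        return pointsTo.get(new RefKey("main")) == null;
    }

    public static void main(String[] args) {
        System.out.println(lookupFails()); // true: the value cannot be found
    }
}
```

This is exactly what happens when PointerKeys built against one call graph are used to query results computed against another.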
It makes sense to try the tool out on our canonical example as we build it, to identify issues as early as possible.
This must also include fields (current class, superclasses, ancestors). Basically this should capture the full dependence list of variables that need to be "copied-in" like the OpenMP directive. By determining the copy-in variables, we can then decide on how to schedule the execution of different tasks.
This will probably be done using the JDT infrastructure since it is cleaner when trying to identify the right variable (no need to worry about mapping back from WALA IR).
Try not to introduce any duplicate functionality that we already have in the PDGAnalyzer.
Once we are confident enough with our own PDGAnalyzer, we should just delegate the ExtractClosureAnalyzer to check for simple errors (like selecting a for loop, selecting partial statements, etc).
This helps with debugging and also to double-check that we are not screwing something up.
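A minimal sketch of the "copy-in" idea, with illustrative names only: the extracted task receives every free variable it reads, locals and fields alike, as explicit parameters, much like OpenMP's copyin/firstprivate clauses:

```java
// Sketch: the extracted region reads a local and a field; the refactored
// version passes both dependences in explicitly so the task is self-contained.
public class CopyInSketch {
    private int fieldValue = 10; // field dependence: must also be captured

    public int original() {
        int local = 5;
        // --- region to extract: reads `local` and `this.fieldValue` ---
        int result = local + fieldValue;
        // --- end region ---
        return result;
    }

    // Extracted task: the full dependence list becomes the parameter list.
    static int extractedTask(int local, int fieldValue) {
        return local + fieldValue;
    }

    public int refactored() {
        int local = 5;
        return extractedTask(local, this.fieldValue);
    }
}
```

Computing that parameter list is precisely the copy-in analysis described above; once it is known, the tasks can be scheduled independently.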
See #33
When trying to resolve the dependencies for "this" we run into issues because of the way we search for a variable named "this" in the scope. There is no such variable!
Disallow anything smaller than statement level. For flow-based parallelization, it simply doesn't make sense to go finer-grained than statements.
Let's have some simple automated tests for different scenarios so we can verify certain weird situations with PDGs (e.g., method params, variable names, etc).
The tests must evolve since at this time, the interface to the PDG is not stable yet.
Follows Figure 14 of the paper Solving Shape-Analysis Problems in Languages with Destructive Updating.
If we have statements like these:
Object a; // 1
a.doSomething(); // 2
a.doSomethingElse(); // 3
someMethod(a); // 4
Currently, on line 4, a is still data dependent only on the a object on line 1. This is correct. But we want something a bit more program-point specific, almost like SSA for heap variables. So that anytime a statement accesses (modifies, if we can be that precise, because reads are safe) a heap object, we should have a dependency on the heap object at that point.
The ability to store the program point where the last access happened is very useful when we start splitting things into threads since we need to track which thread wrote which value and compute some ordering.
Ignore control dependencies for now and assume that the old FlowAnalyzer with the checks can handle it. We just want to rely on our PDG for full data dependencies. We will start with scalar dependencies and then progress to heap dependencies.
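One way to record "the object as of program point N" is a side table keyed by object identity; this is a sketch under assumed names, not existing jflow code:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: track the last program point that wrote each heap object, so a
// later use can be made data-dependent on that point rather than only on
// the allocation site. IdentityHashMap matches heap-object identity.
public class HeapAccessTracker {
    private final Map<Object, Integer> lastAccess = new IdentityHashMap<>();

    // Record that the statement at `programPoint` modified `heapObject`.
    public void recordAccess(Object heapObject, int programPoint) {
        lastAccess.put(heapObject, programPoint);
    }

    // The program point a new use of `heapObject` should depend on;
    // -1 means no write has been seen (depend on the allocation only).
    public int dependsOn(Object heapObject) {
        return lastAccess.getOrDefault(heapObject, -1);
    }
}
```

With such a table, each statement in the example above would depend on the most recent modification of a, not on line 1, which is also the information needed to order writes across threads.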
There are some special semantics with the way that mapper is used (if something doesn't exist then its default value is false) and it is easier to implement that logic inside a new subclass than to always check for null conditions.
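The "missing means false" semantics can be captured in a small subclass so callers never see null; a minimal sketch with an illustrative class name:

```java
import java.util.HashMap;

// Sketch: a map whose lookup defaults to false for absent keys, so the
// calling code needs no null checks.
public class DefaultFalseMap<K> extends HashMap<K, Boolean> {
    @Override
    public Boolean get(Object key) {
        Boolean value = super.get(key);
        return value == null ? Boolean.FALSE : value;
    }
}
```

Centralizing the default in one override is easier to keep correct than repeating the null check at every call site.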
See https://groups.google.com/d/topic/wala-sourceforge-net/Cs2VcUwX0ak/discussion about how the front end might actually contain more information and less optimization. This could help our mapping difficulties.
These conversions are necessary for the code generation part to work.
Because we decided to use WALA exclusively to do the analysis (see #20), we need a way to map back the results to the actual physical file. Mapping SSA variables to local variables is one part of it. Mapping WALA IR back to physical line numbers is another.
Does WALA come with a Liveness Analysis that we can leverage? Or does its internal SSA form help us enough with liveness analysis?
Useful to be able to follow some of the pointers in an interactive manner using the WALAViewer. Since WALAViewer is written using Swing and we are using SWT (because we need an instance of Eclipse running for JDTSourceAnalysisEngine), we need to change the way WALAViewer is invoked.
See http://www.eclipse.org/articles/article.php?file=Article-Swing-SWT-Integration/index.html
One of the problems with static analysis is the scalability. Sometimes this scalability problem arises just because we run out of memory. However, implementing parsimonious data structures is difficult. What if we could just use one of the existing solutions, e.g., in-memory databases?
Other researchers have used some query languages on top of effective data structures in the past:
Instead of using a research project (which is less supported and less tested) to do this, can we use a commercial product such as neo4j or even datomic? neo4j is preferred since it uses Java whereas datomic is mostly Clojure.
I need to compare this view to our own PDG view. To make the comparison fair I should only enable data dependencies on Wala's PDG and disable control dependencies.
I should also clean up the view so that we are not displaying the full context until mouse hover.
The first time I invoke the view, it takes a few seconds to display - not immediate, but tolerable. The second time I invoke it, Eclipse proceeds to crunch away (it does not seem to be memory bound since there is still plenty of RAM left) and takes a few minutes to display.
If you just select a particular statement in the method that you are interested in, the PDG will display the nodes but the edges are missing.
You also see the following trace:
!STACK 0
java.lang.NullPointerException
	at edu.illinois.jflow.ui.tools.pdg.PDGLabelProvider.getText(PDGLabelProvider.java:45)
	at org.eclipse.zest.core.viewers.internal.GraphItemStyler.styleItem(GraphItemStyler.java:83)
	at org.eclipse.zest.core.viewers.internal.AbstractStylingModelFactory.styleItem(AbstractStylingModelFactory.java:121)
	at org.eclipse.zest.core.viewers.internal.AbstractStylingModelFactory.createConnection(AbstractStylingModelFactory.java:182)
	at org.eclipse.zest.core.viewers.internal.GraphModelEntityFactory.doBuildGraph(GraphModelEntityFactory.java:98)
	at org.eclipse.zest.core.viewers.internal.AbstractStylingModelFactory.refreshGraph(AbstractStylingModelFactory.java:288)
	at org.eclipse.zest.core.viewers.AbstractStructuredGraphViewer.internalRefresh(AbstractStructuredGraphViewer.java:359)
	at org.eclipse.jface.viewers.StructuredViewer$7.run(StructuredViewer.java:1508)
	at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1443)
	at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1404)
	at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1506)
	at org.eclipse.zest.core.viewers.GraphViewer.refresh(GraphViewer.java:307)
	at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1465)
	at edu.illinois.jflow.ui.tools.pdg.PDGContentProvider.inputChanged(PDGContentProvider.java:23)
	at org.eclipse.jface.viewers.ContentViewer.setInput(ContentViewer.java:276)
	at org.eclipse.jface.viewers.StructuredViewer.setInput(StructuredViewer.java:1690)
	at edu.illinois.jflow.wala.ui.tools.graph.view.WalaGraphView.updateGraph(WalaGraphView.java:50)
	at edu.illinois.jflow.ui.tools.pdg.ViewPDGAction.run(ViewPDGAction.java:38)
	at org.eclipse.jface.action.Action.runWithEvent(Action.java:498)
	...
When a method has parameters, we should add those dependencies. This is particularly important for virtual methods where the "this" parameter is implicitly passed as the first parameter.
There is the class JDTPosition in Wala. Could we use that information for mapping? It seems quite precise and might be better than what we are doing in #33.
Basically, this is similar to Extract Method except that we will also include dependencies from fields (Extract Method does not by default). We will replace the flow analyzer from Eclipse with the one that we have built using Wala.
These classes are pretty much value classes with final (immutable) fields. So we don't have to create too many references to them.
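The value-class shape described above looks roughly like this; the class and field names are illustrative, not the actual jflow classes. Making equals/hashCode structural means instances can be shared freely and used as map keys:

```java
import java.util.Objects;

// Sketch of a value class: all fields final, structural equality, safe to
// share between data structures without aliasing surprises.
public final class ProgramPoint {
    private final String method;   // illustrative fields only
    private final int index;

    public ProgramPoint(String method, int index) {
        this.method = method;
        this.index = index;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ProgramPoint)) return false;
        ProgramPoint p = (ProgramPoint) o;
        return index == p.index && method.equals(p.method);
    }

    @Override
    public int hashCode() {
        return Objects.hash(method, index);
    }
}
```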
I believe that the CI slicer is the ThinSlicer and that the normal CS slicer is the SDG. Either way, let's try it and see what happens.
Might be able to use this to explain when a data race can happen.
This will give me some chance to experiment with how the SSA form in WALA works. Some questions that I have:
Technically we can extract a closure without doing modref analysis since everything is local. But if we don't incorporate modref analysis early, we could end up with closures that cannot be parallelized later. So we should somehow try to incorporate this analysis early. Might even decide to use //placemarkers to extract the closures.
// Stage
statement
statement
// End stage
// Stage
statement
statement
// End stage
This will be similar to the BitVectorFramework that already exists for Java except that we will be operating on the join complete semi-lattice of static shape graphs. The ordering and join follow the definitions in the paper Solving Shape-Analysis Problems in Languages with Destructive Updating.
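The join and partial order on that semilattice can be sketched by approximating a static shape graph as a set of edges (strings here, purely for illustration): join is set union and the order is subset inclusion, as in the paper:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the powerset join semilattice the framework would operate on.
// A shape graph is modeled as a set of edges; join = union, order = subset.
public class ShapeGraphLattice {
    public static Set<String> join(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // x <= y in the lattice iff every edge of x is also in y.
    public static boolean lessOrEqual(Set<String> x, Set<String> y) {
        return y.containsAll(x);
    }
}
```

The fixed-point solver then iterates transfer functions until every node's graph stops growing, exactly as BitVectorFramework does for bit vectors.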
Start to slowly replace parts of the non-flow sensitive FlowAnalyzer with ours. Retain the FlowAnalyzer so that we make use of its precondition checking, i.e., its ability to check for weird control flow of selected statements.
I just followed how I saw other UnaryOperators define their equals as
@Override
public boolean equals(Object o) {
    return (o instanceof AssignInstruction.SSGAssign);
}
But this is quite weird and it is worth investigating.
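If SSGAssign really is stateless, instanceof-based equality is internally consistent, but it should then also override hashCode so that equal operators hash alike. A minimal sketch with stand-in classes (not the WALA types):

```java
// Sketch: class-based equality for a stateless operator. If equals() treats
// all instances as equal, hashCode() must return the same value for all of
// them, or hash-based collections will misbehave.
abstract class UnaryOperator { }

class SSGAssign extends UnaryOperator {
    @Override
    public boolean equals(Object o) {
        return o instanceof SSGAssign;
    }

    @Override
    public int hashCode() {
        return SSGAssign.class.hashCode(); // all instances hash alike
    }
}
```

An alternative that avoids the question entirely is to make the operator a singleton and rely on reference equality.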
This instruction only handles the updating. It assumes that x has been killed with a previous statement x := nil (done through some preprocessor).
Here's the stack trace:
Caused by: com.ibm.wala.util.debug.UnimplementedError: Unexpected type: class com.ibm.wala.analysis.typeInference.JavaPrimitiveType
at com.ibm.wala.util.debug.Assertions.UNREACHABLE(Assertions.java:55)
at com.ibm.wala.analysis.typeInference.PointType.meet(PointType.java:69)
at com.ibm.wala.analysis.typeInference.TypeInference$PrimitivePropagateOperator.evaluate(TypeInference.java:392)
at com.ibm.wala.cast.java.analysis.typeInference.AstJavaTypeInference$PrimAndStringOp.evaluate(AstJavaTypeInference.java:142)
at com.ibm.wala.analysis.typeInference.TypeInference$PrimitivePropagateOperator.evaluate(TypeInference.java:1)
at com.ibm.wala.fixedpoint.impl.GeneralStatement.evaluate(GeneralStatement.java:36)
at com.ibm.wala.fixedpoint.impl.AbstractFixedPointSolver.solve(AbstractFixedPointSolver.java:149)
at com.ibm.wala.analysis.typeInference.TypeInference.solve(TypeInference.java:126)
at com.ibm.wala.analysis.typeInference.TypeInference.solve(TypeInference.java:117)
at com.ibm.wala.analysis.typeInference.TypeInference.<init>(TypeInference.java:113)
at com.ibm.wala.cast.analysis.typeInference.AstTypeInference.<init>(AstTypeInference.java:70)
at com.ibm.wala.cast.java.analysis.typeInference.AstJavaTypeInference.<init>(AstJavaTypeInference.java:98)
at edu.illinois.jflow.jflow.wala.dataflowanalysis.ProgramDependenceGraph.<init>(ProgramDependenceGraph.java:80)
at edu.illinois.jflow.jflow.wala.dataflowanalysis.ProgramDependenceGraph.makeWithSourceCode(ProgramDependenceGraph.java:65)
at edu.illinois.jflow.core.transformations.code.ExtractClosureRefactoring.createPDGAnalyzer(ExtractClosureRefactoring.java:302)
Here's the offending code:
package extractclosure;

public class Project1 {
    public static void main(String[] args) {
        int a = producer(1);
        /*[*/
        int b = producer(a);
        int c = producer(b);
        System.out.println("a: " + a + ", c: " + c); // <--- This line here; if we remove it, all is well.
        /*]*/
    }

    static int producer(int input) {
        return input + 2;
    }
}
When we are doing an extract method, it is important to be able to get the type of the variable. In JDT land, this information is quite easily obtained through an IVariableBinding. How can we get the same information for WALA IR?
Does the TypeInference class in WALA work properly? Incorporate it into our PDG and try it out.