
Comments (24)

jrfaller avatar jrfaller commented on July 19, 2024

Yes, I think this would be a good idea. GumTree has an internal XML format to load ASTs, so the only thing required is to write a script that outputs XML in this format. Then you need to write a tree generator that reads that standard output. You can look at how it is done in the gen.c module, which works exactly like that.

from gumtree.

macsj200 avatar macsj200 commented on July 19, 2024

Would it be possible to just read in JSON directly, using something like Google's Gson library for example? Alternatively, maybe we could use something like Jython or pyjamas to traverse the AST using Python-native APIs from Java, similar to the JavaScript Rhino approach.


jrfaller avatar jrfaller commented on July 19, 2024

I am not sure there is already an internal JSON format for GumTree, but it would not be hard to implement. Why is it easier to generate JSON?


HalleyYoung avatar HalleyYoung commented on July 19, 2024

Hi,
I'm very interested in using GumTree for my research, which involves comparing Python programs. I already have a representation of Python code in XML. How difficult would it be for me to try to add Python functionality? Or if someone is working on this already, can I be of any use? :-)


jrfaller avatar jrfaller commented on July 19, 2024

Hi! This is very cool, since Python support is a top wanted feature. If you have an XML representation of a Python program you are halfway there; the only remaining step is to convert the XML to GumTree's Tree format. Maybe you can post a sample of the XML representation, and I will check whether all the required information is there. BTW, how do you obtain this representation? Via a Python program?


HalleyYoung avatar HalleyYoung commented on July 19, 2024

I've attached an example file, saved as a text file to fit GitHub's requirements. Basically, the syntactic parts are the XML tags, and the individual values (such as the number "1" in a Num element) are stored under the attribute "value". The program that generates the XML from a Python file is itself a Python program.
example_python.txt
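For readers without access to the attachment, here is a minimal sketch of that kind of converter. It is not the attached script: the function names are hypothetical, the choice of which fields go into "value" is an assumption, and it assumes Python 3.8+, where number literals appear as Constant nodes rather than Num.

```python
import ast
import xml.etree.ElementTree as ET

# Assumption: these string/number fields are the "individual values" worth keeping.
VALUE_FIELDS = ("value", "id", "name", "attr", "arg")


def node_to_xml(node: ast.AST) -> ET.Element:
    """Convert an ast node into an XML element named after the node type."""
    elem = ET.Element(type(node).__name__)
    for name, field in ast.iter_fields(node):
        if isinstance(field, ast.AST):
            # A single child node becomes a nested element.
            elem.append(node_to_xml(field))
        elif isinstance(field, list):
            # Lists of child nodes (e.g. a body) are flattened in order;
            # list items that are not ast nodes are skipped in this sketch.
            for item in field:
                if isinstance(item, ast.AST):
                    elem.append(node_to_xml(item))
        elif field is not None and name in VALUE_FIELDS:
            # Leaf payloads such as the 1 in "x = 1" go under "value".
            elem.set("value", str(field))
    return elem


def source_to_xml(source: str) -> str:
    """Parse Python source text and return its AST serialized as XML."""
    return ET.tostring(node_to_xml(ast.parse(source)), encoding="unicode")
```

Calling source_to_xml("x = 1") yields a Module element containing an Assign, whose Name and Constant children carry the identifier and the literal in their "value" attributes.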


jrfaller avatar jrfaller commented on July 19, 2024

OK, that's cool, there is almost everything we need. It only lacks the position of the nodes (absolute, or line/character) in the original source file. Do you think it's possible to add that?
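As an illustration of the two position styles, the sketch below converts the line/column pair that Python's ast module attaches to a node into an absolute character offset. The helper name is hypothetical. It relies on the documented fact that col_offset counts UTF-8 bytes rather than characters, and it assumes lines are separated by "\n".

```python
import ast


def absolute_offset(source: str, lineno: int, col_offset: int) -> int:
    """Turn a 1-based line and a UTF-8 byte column into a character offset."""
    lines = source.split("\n")
    # Characters on all earlier lines, plus one newline per line.
    before = sum(len(line) + 1 for line in lines[: lineno - 1])
    # col_offset is a byte count, so decode the prefix to count characters.
    prefix = lines[lineno - 1].encode("utf-8")[:col_offset]
    return before + len(prefix.decode("utf-8"))


source = "x = 1\ny = 'é' + z\n"
name_z = ast.parse(source).body[1].value.right  # the Name node for z
offset = absolute_offset(source, name_z.lineno, name_z.col_offset)
```

Here source[offset] is the character "z", even though the non-ASCII literal earlier on the line makes the byte column differ from the character column.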


HalleyYoung avatar HalleyYoung commented on July 19, 2024

How is this?
example_python.txt


jrfaller avatar jrfaller commented on July 19, 2024

OK, this is really nice.

The only thing lacking is the end position, but if this information is not present in Python's parser, it can be deduced from the positions of the children and siblings. Once you have all this information, you can take a look at gen.srcml, which basically does the same thing we want to do, except that it is based on the XML produced by srcml.
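One possible reading of "deduced from the children" is sketched below; it is an illustration, not the approach the attached script ended up using. It prefers the end_lineno and end_col_offset attributes that Python 3.8+ attaches to most nodes, and otherwise falls back to the furthest end position among the node's children. The function name is hypothetical.

```python
import ast


def end_position(node: ast.AST):
    """Return (end_lineno, end_col_offset), or None if it cannot be determined."""
    end = (getattr(node, "end_lineno", None), getattr(node, "end_col_offset", None))
    if None not in end:
        return end
    # Fallback: a parent ends where its furthest positioned child ends.
    child_ends = [
        child_end
        for child_end in map(end_position, ast.iter_child_nodes(node))
        if child_end is not None
    ]
    return max(child_ends) if child_ends else None


tree = ast.parse("total = a + b\n")
module_end = end_position(tree)  # Module has no position of its own
```

Nodes that carry no position and have no positioned descendants (for example the Store context of a name) yield None, so a caller still has to decide what to do with them.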


HalleyYoung avatar HalleyYoung commented on July 19, 2024

Is there an example of the format of the XML that needs to be used (for example, a sample Java program in XML format)?


jrfaller avatar jrfaller commented on July 19, 2024

You do not have to produce one particular format, as each tree generator furnishes its own "decoder" to produce a GumTree AST.


HalleyYoung avatar HalleyYoung commented on July 19, 2024

Alright, here it is with the end column/line values. Am I correct that all I need is something akin to the Ruby example (https://github.com/GumTreeDiff/gumtree/blob/develop/gen.ruby/src/main/java/com/github/gumtreediff/gen/ruby/RubyTreeGenerator.java) and something like SrcmlCTreeGenerator.java? Where do I specify what each XML symbol means?
example_python.txt


jrfaller avatar jrfaller commented on July 19, 2024

OK this is perfect.

Now take a look at the srcml generator, which is very similar: it launches a native srcml command that produces XML, which is then converted to a TreeContext from GumTree's API. srcml supports three languages, so there is one base class and three subclasses; for your purpose you only need one. See this one: https://github.com/GumTreeDiff/gumtree/blob/develop/gen.srcml/src/main/java/com/github/gumtreediff/gen/srcml/AbstractSrcmlTreeGenerator.java
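The real generator is Java code inside GumTree, so the following is only a Python sketch of the same flow (run an external command, capture the XML it prints, convert it to a tree). The function names and the dict layout are hypothetical and do not mirror GumTree's TreeContext API.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_parser(command: list) -> ET.Element:
    """Run an external parser and parse the XML it prints on standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return ET.fromstring(result.stdout)


def to_tree(elem: ET.Element) -> dict:
    """Convert an XML element into a plain nested dict (type, label, children)."""
    return {
        "type": elem.tag,
        "label": elem.get("value", ""),
        "children": [to_tree(child) for child in elem],
    }


# Stand-in for the real parser: a tiny command that prints fixed XML.
fake_parser = [sys.executable, "-c", "print('<Module><Num value=\"1\"/></Module>')"]
tree = to_tree(run_parser(fake_parser))
```

With check=True, a parser that exits with an error raises CalledProcessError instead of handing back partial XML, which is usually easier to debug than a downstream parse failure.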


HalleyYoung avatar HalleyYoung commented on July 19, 2024

Sorry, but I'm still a little confused (my knowledge of Java and github isn't great). My assumption is that the public getXML method is where I specify how to convert the file to XML, and the getTreeContext method is where I convert the XML into the working data structure. However, I don't see exactly where in each of these methods I do these things, so I don't know what to modify. I'm looking at SrcmlCTreeGenerator.java, and it doesn't specify either of these processes. How exactly am I letting the program know how to convert the file and interpret the resulting XML?


jrfaller avatar jrfaller commented on July 19, 2024

Hi Halley,

You do not have to use the getXML method. You need to subclass the TreeGenerator class and implement the TreeContext generate(Reader) method, which will implement the conversion from your XML to a TreeContext. By the way, maybe we can create a repository in the GumTreeDiff organization so that we also have the source code for the Python tool, and I will be able to integrate it. Is that okay?


HalleyYoung avatar HalleyYoung commented on July 19, 2024

That would be perfect! I've also attached the Python file, stored as a .txt file, so you can see what it looks like. Running "python parsepython.py filename.py" on the command line prints out filename.py's XML representation. I assume it will be easy to call this from the Java program?
parsepython.txt


jrfaller avatar jrfaller commented on July 19, 2024

OK nice!

I have created the repo and I will grant you access to it in case it needs debugging. I will try to add a generator ASAP. BTW, does this work with Python 2, Python 3, or both?


jrfaller avatar jrfaller commented on July 19, 2024

Okay Halley, I have a tree generator for Python; however, there are some bugs. Some XML elements have neither a start nor an end position, and some have no end position. Also, some positions are not correct, e.g. <body lineno="defaultdict(<class 'jsontree.jsontree'>, {'__trunc__': defaultdict(<class 'jsontree.jsontree'>, {})})" col="defaultdict(<class 'jsontree.jsontree'>, {})"> (I get that by running pythonparser on pythonparser itself). Can you fix it? (Normally you have access to the repo.)


HalleyYoung avatar HalleyYoung commented on July 19, 2024

Hi,

This should be better!

(Where in the repo do you want me to stick it? I'm a beginner at github.)

parsepython.txt


jrfaller avatar jrfaller commented on July 19, 2024

Hi!

I integrated this one, but for the next ones, you can clone the repository I created for pythonparser here: https://github.com/GumTreeDiff/pythonparser
You have push access to it.

There is one new bug : sometimes there are forbidden characters in the value attribute (example: value="
Usage:
parse_python.py

")

I have fixed this bug by using this function for escaping: from xml.sax.saxutils import quoteattr. It seems to do the job well.

I will release the Python module so you can try it; it is very probable that there are some other bugs.
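A short sketch of what quoteattr does for such a value follows. The sample text and the Str element name are made up for illustration, and value_attribute is a hypothetical helper, not code from pythonparser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def value_attribute(text: str) -> str:
    """Build a value="..." attribute with newlines, quotes, <, > and & escaped."""
    # quoteattr returns the text already wrapped in suitable quote characters.
    return "value=" + quoteattr(text)


# Illustrative text with newlines, both quote styles, and markup characters.
text = "\nUsage:\n  tool \"in\" 'out' <file> & more\n"
xml = "<Str " + value_attribute(text) + "/>"
round_tripped = ET.fromstring(xml).get("value")
```

The resulting element parses cleanly and round_tripped equals the original text, newlines included, because quoteattr emits them as character references.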


HalleyYoung avatar HalleyYoung commented on July 19, 2024

When I tried running, I got the following:

java.io.IOException: Cannot run program "pythonparser" (in directory "/var/folders/cr/zv3cc1gd16nbn32s01h5yhkm0000gn/T"): error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at com.github.gumtreediff.gen.python.PythonTreeGenerator.getXml(PythonTreeGenerator.java:130)
at com.github.gumtreediff.gen.python.PythonTreeGenerator.generate(PythonTreeGenerator.java:60)
at com.github.gumtreediff.gen.TreeGenerator.generateFromReader(TreeGenerator.java:35)
at com.github.gumtreediff.gen.TreeGenerator.generateFromFile(TreeGenerator.java:41)
at com.github.gumtreediff.gen.Generators.getTree(Generators.java:43)
at com.github.gumtreediff.client.diff.AbstractDiffClient.getTreeContext(AbstractDiffClient.java:128)
at com.github.gumtreediff.client.diff.AbstractDiffClient.getSrcTreeContext(AbstractDiffClient.java:114)
at com.github.gumtreediff.client.diff.AbstractDiffClient.matchTrees(AbstractDiffClient.java:105)
at com.github.gumtreediff.client.diff.TextDiff.run(TextDiff.java:98)
at com.github.gumtreediff.client.Run.startClient(Run.java:87)
at com.github.gumtreediff.client.Run.main(Run.java:115)
Caused by: java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:339)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:270)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
... 12 more
** Error while running client diff: java.lang.NullPointerException

Thanks!


jrfaller avatar jrfaller commented on July 19, 2024

You need pythonparser in the path.
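A quick way to check this from Python is sketched below. The stack trace shows Java trying to run a program named "pythonparser", so an executable with that name must sit in one of the directories listed in PATH. The helper name and the extra_dirs parameter are hypothetical.

```python
import os
import shutil


def find_executable(name: str, extra_dirs=()):
    """Return the full path of an executable found via extra_dirs and PATH, or None."""
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


location = find_executable("pythonparser")
if location is None:
    print("pythonparser is not on the PATH")
else:
    print("pythonparser found at", location)
```

If it is not found, a common fix on Unix-like systems is to make the script executable and add its directory to PATH in the shell that launches GumTree, for example export PATH="$PATH:/path/to/directory" (the directory is a placeholder).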


jrfaller avatar jrfaller commented on July 19, 2024

OK, normally Python support has landed. More testing is needed, but this will be handled in separate issues.


njuxc avatar njuxc commented on July 19, 2024

"you need pythonparser in the path."

How can I add pythonparser to my path? Thanks.

