
Comments (4)

skye95git commented on September 13, 2024

Can CodeGen's preprocessing steps extract comments? I see that TransCoder has a --keep_comments parameter, but CodeGen does not seem to have one.

from codegen.

malachaux commented on September 13, 2024

1/2 - The pattern python.*[0-4][0-9][0-9].json.gz matches python.000000000000.json.gz because of the *, so it works with that naming as well. If the pattern were the problem, you would have hit an error at the very first step: "no json files found in folder". The error you are seeing is very often caused by asking to learn 50k codes on only a single small Python JSON file. Consider learning fewer codes and learning on more data.

3 - The other files are intermediate files needed to run the pipeline. The only ones you want to keep are the .pth files and their symlinks.

4 - There is no feature in the pipeline for producing descriptions.

5 - We only keep keep-comments=True by default, since that is what we use in our models, but you can easily add a parameter for it and set it to False: all the tokenization and function extraction functions still take a keep-comments parameter.
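To check the point in 1/2 about the `*`, here is a small sketch using Python's standard `fnmatch` module. It assumes the pipeline's file pattern follows ordinary shell-glob semantics. The helper `matches` and the extra shard names are hypothetical and only there for illustration.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the thread.
PATTERN = "python.*[0-4][0-9][0-9].json.gz"


def matches(filename: str) -> bool:
    """Return True if filename matches PATTERN under shell-glob rules."""
    return fnmatchcase(filename, PATTERN)


# The `*` absorbs the leading digits, so the last three digits are the
# only ones constrained by [0-4][0-9][0-9].
first_shard = matches("python.000000000000.json.gz")

# Hypothetical shard whose last three digits are 599: the [0-4] class
# rejects the 5, so this name is not matched.
late_shard = matches("python.000000000599.json.gz")

print(first_shard, late_shard)
```

Under these assumptions the pattern selects every file whose final three digits fall in 000 to 499, regardless of how many digits precede them.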


skye95git commented on September 13, 2024

So keep-comments=True is the default, and I don't need to set it myself in CodeGen, right?
But when I run the preprocessing code, the extracted functions have no comments:
[screenshot: 捕获5 (Capture 5)]

If keep-comments is set to True, shouldn't the comments in the code be kept? Why do the extracted functions have no comments?


skye95git commented on September 13, 2024

By "description" in the fourth question I meant comments. I can now extract functions, but the comments are not extracted with them. How can I extract functions together with their comments?
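For reference, the behaviour being asked about can be illustrated with the Python standard library alone. The sketch below is not CodeGen's extraction code. `extract_functions` and its `keep_comments` flag are hypothetical names that only mirror the option discussed in this thread. It assumes Python 3.9 or later.

```python
import ast


def extract_functions(source: str, keep_comments: bool = True) -> list[str]:
    """Return the text of each function defined in source.

    Hypothetical helper, not part of CodeGen.
    """
    tree = ast.parse(source)
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if keep_comments:
                # Slice the original text, so comments inside the
                # function body are preserved verbatim.
                functions.append(ast.get_source_segment(source, node))
            else:
                # Regenerate code from the AST, which carries no comments.
                functions.append(ast.unparse(node))
    return functions


SAMPLE = '''
def add(a, b):
    # sum two numbers
    return a + b
'''

with_comments = extract_functions(SAMPLE, keep_comments=True)
without_comments = extract_functions(SAMPLE, keep_comments=False)
print(with_comments[0])
print(without_comments[0])
```

The design point is that comments survive only when the extractor works from the original text. Any step that rebuilds code from a parsed tree drops them unless it is written to carry them along.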
