I have downloaded the raw source code on my machine, for example, …

How to extract functions? (codegen issue, closed)

facebookresearch commented on September 13, 2024

How to extract functions?

from codegen.

Comments (4)

skye95git commented on September 13, 2024

Can CodeGen's preprocessing steps extract comments? I found a --keep_comments parameter in TransCoder, but CodeGen doesn't have one.

malachaux commented on September 13, 2024

1/2 - The pattern python.*[0-4][0-9][0-9].json.gz also matches python.000000000000.json.gz because of the *, so files in that format are picked up too. If this were the problem, you would have gotten an error at the very first step: "no json files found in folder". The error you are seeing is very often caused by asking to learn 50k codes on a single small Python JSON file. You should learn fewer codes, or learn them on more data.

3 - The other files are intermediate files needed to run the pipeline; the only files you need to keep are the .pth files and their symlinks.

4 - There is no feature in the pipeline to extract descriptions.

5 - We only keep keep-comments=True by default because that is what we use in our models, but you can easily add a parameter for this and set it to False, since all the tokenization and function-extraction functions still take a keep-comments parameter.
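The pattern claim in point 1/2 can be checked with Python's standard `fnmatch` module, which implements the same shell-style wildcards that `glob` applies to each file name. This is only an illustrative sketch: apart from python.000000000000.json.gz, the file names below are made up, and the code does not reproduce how the CodeGen pipeline itself lists its input files.

```python
import fnmatch

# Pattern quoted in point 1/2 above.
PATTERN = "python.*[0-4][0-9][0-9].json.gz"


def matches(filename: str) -> bool:
    """Return True if `filename` matches PATTERN (case-sensitive, as glob is on Linux)."""
    return fnmatch.fnmatchcase(filename, PATTERN)


# The first name comes from the thread; the other three are hypothetical.
for name in [
    "python.000000000000.json.gz",
    "python.000.json.gz",
    "python.500.json.gz",
    "python.json.gz",
]:
    print(f"{name}: {matches(name)}")
```

Because `*` can absorb any number of leading digits, only the last three characters before `.json.gz` have to satisfy `[0-4][0-9][0-9]`.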

skye95git commented on September 13, 2024

So keep-comments=True is the default? Then I don't need to set keep-comments=True in CodeGen, right?
But when I run the preprocessing code, I get functions without comments:

When keep-comments is set to True, doesn't that mean the comments are kept in the code? Why do the extracted functions have no comments?

skye95git commented on September 13, 2024

By "description" in the fourth question I meant comments. I can now extract functions, but their comments are not extracted. How can I extract functions together with their comments?
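malachaux's point 5 says the extraction functions still take a keep-comments parameter. As a rough, standalone illustration of what such a flag can do (this is not CodeGen's implementation, which the thread does not show), the sketch below uses only Python's standard `ast` and `tokenize` modules. `extract_functions` and `comment_columns` are hypothetical names; the sketch only handles top-level `def`s, and it assumes that comment lines sitting directly above a `def` belong to that function when keep_comments is True.

```python
from __future__ import annotations

import ast
import io
import tokenize


def comment_columns(source: str) -> dict[int, int]:
    """Map each 1-based line number holding a '#' comment to the comment's start column."""
    columns: dict[int, int] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            columns[row] = col
    return columns


def extract_functions(source: str, keep_comments: bool = True) -> dict[str, str]:
    """Return {name: source_text} for every top-level function in `source` (hypothetical helper).

    keep_comments=True keeps '#' comments inside the function plus the comment
    block directly above its def; keep_comments=False strips all of them.
    """
    lines = source.splitlines()
    comments = comment_columns(source)
    functions: dict[str, str] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue  # classes, imports, etc. are skipped in this sketch
        # Start at the first decorator if there is one (ast line numbers are 1-based).
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        if keep_comments:
            # Pull in comment-only lines sitting directly above the function.
            while start > 1 and (start - 1) in comments and lines[start - 2].lstrip().startswith("#"):
                start -= 1
        kept = []
        for lineno in range(start, node.end_lineno + 1):
            line = lines[lineno - 1]
            if not keep_comments and lineno in comments:
                line = line[: comments[lineno]].rstrip()
                if not line:
                    continue  # the whole line was a comment
            kept.append(line)
        functions[node.name] = "\n".join(kept)
    return functions


sample = '''import os

# Add two numbers.
def add(a, b):
    # sum them
    return a + b  # result
'''
print(extract_functions(sample)["add"])
print(extract_functions(sample, keep_comments=False)["add"])
```

The sketch slices the original source lines instead of regenerating code from the syntax tree: an AST does not store comments, so anything rebuilt from it (for example with `ast.unparse`) comes out without them.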
