Can CodeGen's preprocessing steps extract comments? I found a `--keep_comments` parameter in TransCoder, but CodeGen doesn't have one.
1/2 - The pattern `python.*[0-4][0-9][0-9].json.gz` also matches `python.000000000000.json.gz` because of the `*`, so your files already fit this format. If this were the problem, you would have gotten an error at the very first step: "no json files found in folder". The error you are seeing is very often caused by asking to learn 50k codes on only a single small Python JSON file. You should consider learning fewer codes, or learning on more data.
3 - The other files are intermediate files needed to run the pipeline; the only ones you need to keep are the .pth files and their symlinks.
4 - The pipeline has no feature to extract descriptions.
5 - We keep keep-comments=True by default because that is what we use in our models, but you can easily add a parameter to control this and set it to False, since all the tokenization and function-extraction functions still accept a keep-comments parameter.
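The pattern claim in point 1/2 can be checked with Python's standard `fnmatch` module, which implements the shell-style wildcards that `glob` uses. This is only a sketch: it assumes the pipeline matches file names with glob semantics, and the two file names other than `python.000000000000.json.gz` are hypothetical examples.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the reply above.
PATTERN = "python.*[0-4][0-9][0-9].json.gz"


def matches(name: str) -> bool:
    """Return True if a file name matches the shell-style pattern (case-sensitive)."""
    return fnmatchcase(name, PATTERN)


if __name__ == "__main__":
    for name in [
        "python.000000000000.json.gz",  # the file name from the question
        "python.000000000499.json.gz",  # hypothetical: last three digits start with 0-4
        "python.000000000500.json.gz",  # hypothetical: last three digits start with 5
    ]:
        print(name, matches(name))
```

The `*` absorbs the leading zeros, so only the last three digits before `.json.gz` are constrained by the bracket expressions.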
Do you keep keep-comments=True by default? Then I don't need to set keep-comments=True in CodeGen, right?
But when I run the preprocessing code, the extracted functions have no comments:
When keep-comments is set to True, doesn't that mean comments are kept in the code? Why do the extracted functions have no comments?
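To illustrate what a keep-comments switch usually means at the tokenization step, here is a minimal sketch built on Python's standard `tokenize` module. The function name `tokenize_code` and its `keep_comments` argument are hypothetical stand-ins, not CodeGen's actual API; the sketch only shows that such a flag decides whether `COMMENT` tokens are emitted.

```python
import io
import tokenize

# Layout tokens that are not interesting for this illustration.
_SKIP = {
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def tokenize_code(code: str, keep_comments: bool = True) -> list[str]:
    """Split Python source into token strings, optionally dropping comments.

    Hypothetical helper for illustration only; not CodeGen's API.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type in _SKIP:
            continue  # drop newlines and indentation markers
        if tok.type == tokenize.COMMENT and not keep_comments:
            continue  # the flag only controls whether comments are emitted
        tokens.append(tok.string)
    return tokens


SAMPLE = "def add(a, b):\n    # sum two numbers\n    return a + b\n"

if __name__ == "__main__":
    print(tokenize_code(SAMPLE, keep_comments=True))
    print(tokenize_code(SAMPLE, keep_comments=False))
```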
By "description" in the fourth question, I meant comments. I can now extract functions, but not their comments. How can I extract functions together with their comments?
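One possible way to extract functions together with their comments, sketched with Python's standard `ast` module rather than CodeGen's own extractor: slice each top-level function's raw source lines (so docstrings and comments inside the body are kept), then extend the span upwards over any contiguous `#` lines directly above the `def` or its decorators. The helper name `extract_functions_with_comments` is hypothetical, and methods inside classes are not handled.

```python
import ast


def extract_functions_with_comments(source: str) -> list[str]:
    """Return the source of each top-level function, including the comment
    lines immediately above it. Hypothetical helper, not CodeGen's API."""
    lines = source.splitlines()
    tree = ast.parse(source)
    functions = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Start at the first decorator if there is one, else at the `def` line (1-based).
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        # Walk upwards over contiguous comment lines directly above the function.
        while start > 1 and lines[start - 2].lstrip().startswith("#"):
            start -= 1
        # Slicing raw lines keeps docstrings and comments inside the body;
        # a comment after the last statement is not included.
        functions.append("\n".join(lines[start - 1 : node.end_lineno]))
    return functions


SAMPLE = '''import math

# Area of a circle.
# r must be non-negative.
def area(r):
    """Return pi * r**2."""
    return math.pi * r ** 2  # inline comment

def double(x):
    # multiply by two
    return 2 * x
'''

if __name__ == "__main__":
    for func in extract_functions_with_comments(SAMPLE):
        print(func)
        print("-" * 20)
```

Working on raw source lines instead of regenerating code from the AST is what preserves the comments, since `ast` itself discards them.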