Comments (6)
For Sam:
- Run Code2Vec and check out the architecture
- Read the three papers and discuss the results
- Set up the DS4SE environment
from ds4se.
Updated work plan:
- Check and analyze the literature on structural models in deep learning
- Set up DS4SE environment
- Test out the original code2vec
- Use code2vec as a library to generate embedding vectors from CodeSearchNet-Java
- Use RNNs to build an autoencoder that compresses the dimension of the embedding vectors
- Switch our autoencoder to use the code2vec encoder
- Later, use a Transformer decoder
- Evaluate our model on clone detection
- Test our autoencoder for interpretability by actively transforming the data and evaluating the output
- If time permits, extend the work to traceability and code generation
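The RNN autoencoder in the plan needs an encoder that folds a variable-length sequence of vectors into one fixed-size vector. A minimal, dependency-free sketch of a GRU encoder doing that follows; the `GRUEncoder` class, the random untrained weights, and the dimensions are illustrative assumptions rather than the project's implementation, which would normally use a framework layer such as Keras `GRU`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUEncoder:
    """Toy GRU (untrained, hypothetical) that compresses a sequence into one hidden state."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: update "z", reset "r", candidate "h".
        self.params = {
            gate: (rand_matrix(hidden_dim, input_dim, rng),
                   rand_matrix(hidden_dim, hidden_dim, rng),
                   [0.0] * hidden_dim)
            for gate in ("z", "r", "h")
        }

    def step(self, x, h):
        (Wz, Uz, bz), (Wr, Ur, br), (Wh, Uh, bh) = (
            self.params["z"], self.params["r"], self.params["h"])
        z = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
        r = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]
        # Reset gate applied before the recurrent product (Keras reset_after=False style).
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b + c)
                for a, b, c in zip(matvec(Wh, x), matvec(Uh, reset_h), bh)]
        # New state interpolates between the old state and the candidate.
        return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, sequence):
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h  # fixed-size summary of the whole sequence


# Usage: 10 token embeddings of size 16 compressed into a 4-dimensional vector.
rng = random.Random(42)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(10)]
encoder = GRUEncoder(input_dim=16, hidden_dim=4)
code_vector = encoder.encode(tokens)
print(len(code_vector))  # 4
```

The output length depends only on `hidden_dim`, not on how many tokens the snippet has, which is what lets snippets of different lengths be compared in one vector space.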
Meeting note 2021-03-18:
- Use the Keras autoencoder as a template to train against CodeSearchNet-Java.
- Use google/sentencepiece for input tokenization.
- Add an embedding layer after it.
- Use multiclass (categorical) cross-entropy as the loss function (binary cross-entropy doesn't work).
- Don't deploy code2vec yet. It doesn't fit.
References:
- https://blog.keras.io/building-autoencoders-in-keras.html
- https://www.tensorflow.org/tutorials/text/nmt_with_attention
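The loss choice in the note can be illustrated without a framework. Each decoder position predicts exactly one token ID out of the SentencePiece vocabulary, which is a multiclass problem; binary cross-entropy instead scores every vocabulary entry as an independent yes/no label, which may be one reason it did not work here. A minimal sketch of the multiclass loss over a reconstructed sequence follows; the function names and toy logits are assumptions, and in Keras the comparable built-in is `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`.

```python
import math


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_cross_entropy(logits_per_step, target_ids):
    """Mean multiclass cross-entropy of decoder outputs against target token IDs."""
    losses = [-math.log(softmax(logits)[target])
              for logits, target in zip(logits_per_step, target_ids)]
    return sum(losses) / len(losses)


# Usage: vocabulary of 4 tokens, a 2-token target sequence.
uniform = sequence_cross_entropy([[0.0] * 4, [0.0] * 4], [1, 3])
confident = sequence_cross_entropy([[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]], [1, 3])
print(round(uniform, 4))  # 1.3863, i.e. ln(4): the model is guessing
```

A model that has learned nothing sits at ln(vocabulary size), so comparing the training loss against that baseline is a quick check on whether the loss is actually decreasing.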
Meeting note 2021-03-25:
- Ignore snippets of excessive length
- Prepare for "design of the case studies"
- Refine previous sections
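Dropping over-long snippets before training can be done with a simple filter. In this sketch the `max_tokens=256` cutoff is an assumed value, and `str.split` stands in for the SentencePiece tokenizer mentioned in the 2021-03-18 note so the example runs without extra dependencies.

```python
def filter_long_snippets(snippets, tokenize, max_tokens=256):
    """Keep snippets whose token count is at most max_tokens; count the rest."""
    kept, dropped = [], 0
    for snippet in snippets:
        if len(tokenize(snippet)) <= max_tokens:
            kept.append(snippet)
        else:
            dropped += 1
    return kept, dropped


# Usage with whitespace splitting as a stand-in tokenizer.
snippets = ["int a = 1;", "return x + y;", " ".join(["tok"] * 300)]
kept, dropped = filter_long_snippets(snippets, str.split)
print(len(kept), dropped)  # 2 1
```

Reporting the dropped count makes it easy to check how much of CodeSearchNet-Java a given cutoff discards.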
Update 2021-04-01:
- Finally got the autoencoder training running.
- The network's dimensions were shrunk to avoid out-of-memory (OOM) errors.
- Not seeing significant loss reduction.
- Need help connecting to the GPU inside Docker. (Someone has eaten the VRAM.)
- Need some help with "design of the case studies"; if possible, a written guideline.
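Shrinking the network helps with OOM partly because the decoder's output logits grow with batch size, snippet length, and vocabulary size at once. A back-of-the-envelope helper follows; the batch size of 64, length of 256, and vocabulary of 32,000 are assumed values, and activations kept for backpropagation, gradients, and optimizer state add further memory on top.

```python
def logits_memory_bytes(batch_size, seq_len, vocab_size, bytes_per_value=4):
    """Bytes needed for one float32 logits tensor of shape (batch, seq_len, vocab)."""
    return batch_size * seq_len * vocab_size * bytes_per_value


GIB = 2 ** 30
full = logits_memory_bytes(64, 256, 32000)
shorter = logits_memory_bytes(64, 128, 32000)
print(full / GIB)     # 1.953125
print(shorter / GIB)  # 0.9765625
```

Halving the snippet length (or the batch size) halves this tensor, which is often a cheaper fix than shrinking the hidden dimensions.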
Meeting note 2021-04-02:
Tasks:
- Complete sampling
- How to incorporate code2vec
Sampling (2 types):
- focus on encoder, obtain the middle vectors (<- for now)
- create random noise and see how the decoder generates code from it
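The second sampling type, feeding random noise to the decoder, can be sketched as drawing a latent vector from a standard normal distribution and decoding it greedily. The `toy_decoder_step` function below is a hypothetical stand-in for a trained GRU decoder (it ignores the latent vector), and the token IDs used for start and end of sequence are assumptions; sampling from the softmax instead of taking the arg-max is a common alternative to greedy decoding.

```python
import random


def sample_noise(latent_dim, seed=None):
    """Draw one latent vector from a standard normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


def greedy_decode(decoder_step, latent, bos_id, eos_id, max_len=20):
    """Generate token IDs one at a time, always taking the highest-scoring token."""
    tokens, state = [bos_id], latent
    for _ in range(max_len):
        scores, state = decoder_step(tokens[-1], state)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]  # drop the start-of-sequence marker


def toy_decoder_step(prev_id, state, vocab_size=5):
    # Hypothetical decoder: always prefers the token after prev_id and ignores the
    # latent state; a trained decoder would condition its scores on it.
    scores = [0.0] * vocab_size
    scores[(prev_id + 1) % vocab_size] = 1.0
    return scores, state


z = sample_noise(latent_dim=8, seed=1)
print(greedy_decode(toy_decoder_step, z, bos_id=0, eos_id=4))  # [1, 2, 3, 4]
```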
Case studies:
The experiments we are going to run.
- Checking the clones (clone library provided)
- Test the GRU encoder & decoder
- The other case: encoder is code2vec, decoder is GRU
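For the clone-checking case study, one simple way to use the encoder's middle vectors is to flag pairs whose cosine similarity exceeds a threshold and then compare the flagged pairs against the provided clone library. The 0.9 threshold and the toy vectors below are assumptions, and the clone library's own format is not shown here.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def detect_clones(vectors, threshold=0.9):
    """Return index pairs whose code vectors are at least `threshold` similar."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Usage: vectors 0 and 1 point in almost the same direction; vector 2 does not.
vectors = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.1], [-1.0, 1.0, 0.0]]
print(detect_clones(vectors))  # [(0, 1)]
```

The pairwise loop is quadratic in the number of snippets, so a large corpus would call for a nearest-neighbour index instead.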
Related Issues (20)
- do(code): A Causal Inference Framework to Understand and Explain Source Code Properties
- Integrate pydriller tool
- Integrate Comet into DS4SE
- BPE 32K and 128K SACP
- DS4SE Analysis
- Related work for traceability
- Causality library exploration
- Complete part 1 and 2 of the doWhy tutorial presented at ACM KDD 2018
- Learn how to instantiate the CausalModel class
- Learn how to call the identify_effect method of the CausalModel class
- Learn how to call the estimate_effect method of the CausalModel class
- Learn how to call the refute_estimate method of the CausalModel class
- Look into the new do-sampler feature of doWhy
- Integrate traceability data into the causal prototype established
- Develop potential outcome graphical model with preprocessing as intervention
- Create .gml representation of causal graph for preprocessing intervention
- Final tasks iS2S
- Final Task do(code)
- Unconditional Generation Status