Comments (6)

danaderp avatar danaderp commented on July 17, 2024

For Sam:

  • Run Code2Vec and check out the architecture
  • Read the three papers and discuss the results
  • Set up the ds4se environment

from ds4se.

m13253 avatar m13253 commented on July 17, 2024

Updated work plan:

  • Review and analyze the literature on structural models in deep learning
  • Set up the DS4SE environment
  • Test out the original code2vec
  • Use code2vec as a library to generate embedding vectors from CodeSearchNet-Java
  • Use RNNs to build an autoencoder that compresses the dimension of the embedding vectors
  • Move our autoencoder to use the code2vec encoder
  • Later, switch to a transformer decoder
  • Evaluate our model for clone detection
  • Test our autoencoder for interpretability by actively transforming the data and evaluating the output
  • If time permits, extend the work to traceability and code generation

m13253 avatar m13253 commented on July 17, 2024

Meeting note 2021-03-18:

  • Use the Keras autoencoder as a template to train against CodeSearchNet-Java.
  • Use google/sentencepiece for input tokenization.
  • Add an embedding layer after it.
  • Use multiclass cross-entropy as the loss function (binary doesn't work).
  • Don't deploy code2vec yet. It doesn't fit.
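
To illustrate the loss choice above: the note does not say why binary cross-entropy failed, but one plausible reading is that each output position has to pick exactly one token id out of the whole vocabulary, which is a multiclass problem. The sketch below computes that loss by hand in plain Python. The function names and the 4-token toy vocabulary are assumptions for illustration only; in Keras the equivalent would normally be the built-in sparse categorical cross-entropy loss rather than hand-written code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_cross_entropy(logits_per_position, target_ids):
    """Mean negative log-likelihood of the target token id at each position."""
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy setup (assumed): vocabulary of 4 token ids, sequence of 2 positions.
# The decoder would emit one logit vector per position.
logits = [[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]]
good = sequence_cross_entropy(logits, [0, 2])  # targets sit on the peaks
bad = sequence_cross_entropy(logits, [1, 3])   # targets miss the peaks
print(good < bad)
```

The reconstruction target is the input token sequence itself, so the loss drops only when the decoder puts probability mass on the right token at each position.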

m13253 avatar m13253 commented on July 17, 2024

Meeting note 2021-03-25:

  • Ignore excessively long snippets
  • Prepare for "design of the case studies"
  • Refine previous sections
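
A minimal sketch of the length filter from the first item, assuming a fixed token cutoff. The value 128 and the whitespace split are placeholders, not from the meeting; in the real pipeline the count would presumably come from the sentencepiece tokenizer.

```python
MAX_TOKENS = 128  # hypothetical cutoff, not decided in the meeting

def keep_snippet(snippet, max_tokens=MAX_TOKENS):
    """Return True if the snippet is short enough to train on."""
    # Whitespace splitting stands in for the real tokenizer here.
    return len(snippet.split()) <= max_tokens

def filter_snippets(snippets, max_tokens=MAX_TOKENS):
    """Drop snippets whose token count exceeds the cutoff."""
    return [s for s in snippets if keep_snippet(s, max_tokens)]

snippets = [
    "int add(int a, int b) { return a + b; }",  # short, kept
    "x " * 500,                                  # 500 tokens, dropped
]
kept = filter_snippets(snippets)
print(len(kept))
```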

m13253 avatar m13253 commented on July 17, 2024

Update 2021-04-01:

  • Finally got the autoencoder training running.
  • Shrank the network's dimensions to avoid out-of-memory (OOM) errors.
  • Not seeing significant loss reduction.
  • Need help connecting to the GPU inside Docker. (Someone has eaten the VRAM.)
  • Need some help with the "design of the case studies"; a written guideline would be ideal, if possible.

m13253 avatar m13253 commented on July 17, 2024

Meeting note 2021-04-02:

Tasks:

  1. Complete sampling
  2. Decide how to incorporate code2vec

Sampling (2 types):

  1. Focus on the encoder and obtain the middle vectors (<- for now)
  2. Create random noise and see what code the decoder generates from it
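
The two sampling types could be sketched roughly as follows. `encode` and `decode` stand for the trained encoder and decoder halves; the toy versions here are stand-ins so the example runs, and the Gaussian noise distribution is an assumption the notes do not specify.

```python
import random

def sample_latents(encode, snippets):
    """Type 1: run only the encoder and collect the middle (latent) vectors."""
    return [encode(s) for s in snippets]

def sample_from_noise(decode, latent_dim, n, seed=0):
    """Type 2: draw random latent vectors and decode each one."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        outputs.append(decode(z))
    return outputs

# Toy stand-ins for the trained model halves (hypothetical).
def toy_encode(snippet):
    return [float(len(snippet)), float(snippet.count(" "))]

def toy_decode(z):
    return " ".join(["tok"] * (1 + int(abs(z[0]))))

latents = sample_latents(toy_encode, ["a b", "abc"])
generated = sample_from_noise(toy_decode, latent_dim=2, n=3)
print(latents, len(generated))
```

Type 1 only needs the encoder, which is why it is the easier starting point; type 2 additionally depends on the decoder producing sensible output for latent vectors it never saw during training.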

Case studies:

The experiments we are going to run.

  • Checking the clones (clone library provided)
  • Test the GRU encoder and decoder
  • The other case: encoder is code2vec, decoder is GRU
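
For the clone check, one way to compare two snippets is by the similarity of their encoder vectors. The notes do not say how clones will be scored, so cosine similarity and the 0.9 threshold below are assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def is_clone(u, v, threshold=0.9):
    """Flag a pair as a clone candidate; the threshold is hypothetical."""
    return cosine_similarity(u, v) >= threshold

# Vectors would come from the encoder (GRU or code2vec); these are made up.
print(is_clone([1.0, 2.0], [2.0, 4.0]))  # same direction
print(is_clone([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

The same comparison works for both cases in the list, since only the source of the vectors changes between the GRU encoder and the code2vec encoder.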
