
myroslavarm / bachelor_thesis


Bachelor thesis on the topic of Improving Code Completion in Pharo Using N-gram Language Models

TeX 100.00%

bachelor_thesis's Introduction

Hi there 👋

about me

product manager. ex software engineering and research intern. CS grad
used to work with pharo a lot, you can find my projects here

get in touch

linkedin: https://www.linkedin.com/in/myroslavarm/
medium blog: https://medium.com/@myroslavarm

bachelor_thesis's Issues

Add screenshots where needed

Benchmarking in the implementation chapter
Examples of code (how the models / sorting strategies were implemented)

Add a summary for every chapter

To briefly wrap up everything that was explained.

Perform manual qualitative evaluation and record the results

Find 10 cases and test them on all three sorting strategies and compare the accuracy of results.
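
One way to compare the strategies on the same cases is a small top-k accuracy harness. The sketch below is only an illustration in Python, not the thesis implementation: the candidate pool, the two stand-in strategies (`alphabetical`, `shortest_first`), the test cases and the top-k definition of accuracy are all assumptions made for the example. The real three sorting strategies would be plugged in as further `suggest` functions.

```python
def top_k_accuracy(cases, suggest, k=3):
    """Fraction of cases whose expected completion is among the first k suggestions."""
    if not cases:
        raise ValueError("at least one test case is required")
    hits = sum(1 for typed, expected in cases if expected in suggest(typed)[:k])
    return hits / len(cases)


# Toy candidate pool (hypothetical, not taken from the thesis).
CANDIDATES = ["collect:", "collection", "color", "copy"]


def alphabetical(typed):
    """Stand-in strategy: matching candidates in alphabetical order."""
    return sorted(c for c in CANDIDATES if c.startswith(typed))


def shortest_first(typed):
    """Stand-in strategy: matching candidates ordered by length."""
    return sorted((c for c in CANDIDATES if c.startswith(typed)), key=len)


# Each case is (what was typed, the completion the developer actually wanted).
cases = [("col", "collection"), ("co", "copy")]

results = {
    name: top_k_accuracy(cases, strategy, k=1)
    for name, strategy in [("alphabetical", alphabetical),
                           ("shortest_first", shortest_first)]
}
print(results)
```

Running every strategy over the identical list of cases keeps the comparison fair; the value of `k` is a free choice and should be reported together with the results.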

General ToDo

Evaluation ToDo

qualitative

example 1 : collection vs col -- does it give a better result?
example 2 : what is the token that it takes here?
example 3 : fix formulation without the 'only'
example 4 : fix token type to tokens
example 6 : check 'word before' if we are at the beginning
example 7 : check why it suggests so many wrong things (maybe)

quantitative

check manual results & do a write-up on why bigram performs worse
i.e. does it deal worse with parentheses, brackets, etc.

general

in the conclusion chapter, under future work, describe what I would do with the qualitative and quantitative evaluation if I had more time

Explain why bigram performs worse

There are several ideas, but the main suspicions are:

During training, punctuation and delimiters such as ; ] ) cause problems.
During training, there are tokenisation problems in the n-gram model: in many cases something like parentheses might be missed, so the bigram dependency is trained differently. Think of something = 1 ifTrue: here the completion itself is wrong because it is offered for an integer, and the n-gram sequence we train on is also number-based. Then, during the actual evaluation, when we compare against the code that was actually typed and calculate the accuracy, the result is way off because our typing/completion differs from the end result.
Error nodes are another issue: we might be getting the wrong AST, and hence wrong placeholders for the actual code at the end, so again there is a discrepancy between the training data and the actual code.
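
The tokenisation suspicion can be illustrated with a toy bigram count. This is a minimal sketch in Python rather than Pharo, and both token lists are hypothetical: they assume the real code is `(something = 1) ifTrue:` and that the parentheses are lost before training. It only shows how the trained bigrams stop matching the ones present in the final code.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs in a token sequence."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical token stream of the code as it is finally written.
with_parens = ["(", "something", "=", "1", ")", "ifTrue:"]
# The same expression if the parentheses are missed during tokenisation.
without_parens = ["something", "=", "1", "ifTrue:"]

trained = bigram_counts(without_parens)
actual = bigram_counts(with_parens)

# Bigrams the model learned that never occur in the real code, and vice versa.
only_in_training = set(trained) - set(actual)
only_in_actual = set(actual) - set(trained)

print("learned but absent from the real code:", only_in_training)
print("in the real code but never learned:", only_in_actual)
```

In this toy case the model learns that `ifTrue:` follows a number, while in the real code it follows a closing parenthesis, which is the kind of training/actual discrepancy described above.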
