automated-essay-scoring's Introduction

Automated Student Assessment and Essay Generator

Background

The Hewlett Foundation has provided a set of high school student essays along with scores generated by human expert graders. The initial data was released in 2012 [1] as part of a Kaggle competition to produce an automated student assessment algorithm that closely matches the human scores. Scores are evaluated with the quadratic weighted kappa metric, which measures the agreement between two raters.
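As a rough illustration of the metric, the sketch below computes quadratic weighted kappa for two lists of integer scores. The function name `quadratic_weighted_kappa` and its optional `min_rating`/`max_rating` parameters are assumptions made for this example and are not taken from the notebooks; the sample scores are invented.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """Quadratic weighted kappa between two equal-length lists of integer ratings."""
    rater_a, rater_b = list(rater_a), list(rater_b)
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    # Fall back to the observed score range if none is given (an assumption).
    lo = min(rater_a + rater_b) if min_rating is None else min_rating
    hi = max(rater_a + rater_b) if max_rating is None else max_rating
    n_ratings = hi - lo + 1
    if n_ratings < 2:
        return 1.0  # a single possible score means trivial agreement

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0
    denominator = 0.0
    for i in range(lo, hi + 1):
        for j in range(lo, hi + 1):
            # Disagreement is penalised by the squared score distance.
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        return 1.0  # both raters gave the same constant score
    return 1.0 - numerator / denominator


# Invented example scores, for illustration only.
human = [1, 2, 3, 4, 4]
model = [1, 2, 3, 3, 4]
print(quadratic_weighted_kappa(human, model))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.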

Since then, a few teams have published their attempts to match or improve on the original challenge. Initial results were somewhat disappointing [2], with kappa scores around 0.5, but scores improved significantly once modern NLP was combined with neural networks [3].

Sadly, it is no longer possible to submit kernels, and the human-graded scores for the validation and test sets have not been made public. It is therefore perhaps unreasonable to compare kappa scores computed on the training data set with the Kaggle leaderboard, where models were trained on the full training data set, validated on the validation set and evaluated on the test data set.

Further confounding the issue, some published kappa scores are based on a subset of essay topics [4], or are evaluated by combining all scores together instead of individually by topic.
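To see why pooling matters, consider the sketch below. It assumes that topics are scored on different ranges and uses deliberately extreme, invented toy data: the hypothetical model reverses the human ranking within each topic, yet still lands in the right score range for that topic. The helper names `qwk` and `kappa_by_topic` are made up for this example.

```python
from collections import Counter, defaultdict


def qwk(a, b):
    """Quadratic weighted kappa over the score range observed in a and b."""
    lo, hi = min(a + b), max(a + b)
    if hi == lo:
        return 1.0
    n = len(a)
    observed, hist_a, hist_b = Counter(zip(a, b)), Counter(a), Counter(b)
    num = den = 0.0
    for i in range(lo, hi + 1):
        for j in range(lo, hi + 1):
            w = (i - j) ** 2 / (hi - lo) ** 2
            num += w * observed[(i, j)]
            den += w * hist_a[i] * hist_b[j] / n
    return 1.0 - num / den if den else 1.0


def kappa_by_topic(rows):
    """rows: iterable of (topic, human_score, model_score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for topic, human, model in rows:
        grouped[topic][0].append(human)
        grouped[topic][1].append(model)
    return {topic: qwk(h, m) for topic, (h, m) in grouped.items()}


# Toy data: topic "A" is scored 0-3, topic "B" is scored 8-11.
rows = [
    ("A", 0, 3), ("A", 1, 2), ("A", 2, 1), ("A", 3, 0),
    ("B", 8, 11), ("B", 9, 10), ("B", 10, 9), ("B", 11, 8),
]

per_topic = kappa_by_topic(rows)
mean_kappa = sum(per_topic.values()) / len(per_topic)
pooled_kappa = qwk([h for _, h, _ in rows], [m for _, _, m in rows])

print("per topic:", per_topic)
print("mean of per-topic kappas:", mean_kappa)
print("pooled kappa:", pooled_kappa)
```

In this toy case the pooled kappa is strongly positive even though the per-topic kappas are as bad as they can be, because simply knowing the topic already predicts most of the score distance. Pooled and per-topic figures are therefore not directly comparable.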

Approach

My goal is to see if current NLP algorithms can improve upon the 2012 attempts. Additionally, can the essay and score combinations be used to automatically generate essays?

Assessment can be performed in a variety of ways. For example, in an unsupervised approach, topic modeling can be performed to assign scores based on derived word probabilities.

A supervised approach is possible using machine learning on extracted features, such as named entities, syntax or labelled dependencies.
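The following is a minimal sketch of the supervised idea, not the method used in the notebooks. As stand-ins for named entities, syntax or dependency features (which would need an NLP library), it extracts three simple surface features and predicts a score with a k-nearest-neighbour average. All function names, the choice of features, the choice of k-nearest neighbours, and the toy essays and scores are assumptions made for illustration.

```python
import math
import re


def surface_features(essay):
    """Return [word count, mean word length, unique-word ratio] for an essay."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    return [
        float(len(words)),
        sum(len(w) for w in words) / len(words),
        len(set(words)) / len(words),
    ]


def fit_scaler(rows):
    """Per-feature mean and standard deviation (std falls back to 1.0 if zero)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, stds


def scale(row, means, stds):
    """Standardise one feature row so no single feature dominates the distance."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def predict_score(essay, train_essays, train_scores, k=3):
    """Average the scores of the k training essays closest in feature space."""
    feats = [surface_features(e) for e in train_essays]
    means, stds = fit_scaler(feats)
    scaled = [scale(f, means, stds) for f in feats]
    query = scale(surface_features(essay), means, stds)
    nearest = sorted(
        range(len(scaled)), key=lambda i: math.dist(query, scaled[i])
    )[:k]
    return round(sum(train_scores[i] for i in nearest) / len(nearest))


# Invented toy training set, for illustration only.
train_essays = [
    "Dogs run.",
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "Education shapes society because informed citizens evaluate evidence, "
    "debate respectfully, and vote thoughtfully.",
]
train_scores = [1, 2, 4]

print(predict_score("Cats sleep.", train_essays, train_scores, k=1))
```

In practice the feature extractor would be replaced by richer linguistic features and the predictor by a model fitted separately per topic, since each topic has its own score range.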

Neural networks have the advantage of working very well with word embeddings and their typically large number of features (dimensions).

Essay generation can be performed with recurrent neural network algorithms.

The data is provided as separate training, test and validation sets. The training data contains about 1700 essays for each of 7 topics and about 500 essays for an eighth topic. Essays are either source-dependent responses or persuasive/narrative/expository essays on a given topic. The code is executed across four notebooks:

1 EDA and Topic Modeling with LDA

2 Automatic Scoring with Machine Learning

3 Automatic Scoring with Neural Networks

4 Automatic Student Essay Generation

Outlook

This problem has commercial impact far beyond student assessment, and many applications can be tackled with nearly the same approach. For example:

  • Given a set of financial documents, which one should a manager read first?

  • Which products can be effectively marketed to users based on their social media postings?

  • Detection of fake news vs real news.

  • Sentiment analysis on a graded scale, e.g. very upset - upset - satisfied - happy - very happy.

References

The original Kaggle competition can be found here:

https://www.kaggle.com/c/asap-aes/data

A selection of published work on the Kaggle ASAP data is given below:

1 https://www.kaggle.com/c/asap-aes

2 https://nlp.stanford.edu/courses/cs224n/2013/reports/song.pdf

3 http://aclweb.org/anthology/D/D16/D16-1193.pdf

4 https://github.com/vasu5235/Kaggle-Automated-Essay-Checking-System/blob/master/Capstone%20report/capstone_report.pdf

5 https://github.com/m-chanakya/AutoEssayGrading/blob/master/papers/paper1.pdf

6 http://dspace.bracu.ac.bd/xmlui/bitstream/handle/10361/5399/12101114.pdf?sequence=1&isAllowed=y
