Challenges in the Automatic Analysis of Students' Diagnostic Reasoning

This repository contains the code for our evaluation metrics, which are applicable to multi-label sequence-labelling tasks such as epistemic activity identification. It also provides the code for training single- and multi-output Bi-LSTMs. The new corpora can be obtained on request, making it possible to replicate all experiments in our paper.

Citation

If you find the implementation useful, please cite the following two papers:

@inproceedings{Schulz:2019:AAAI,
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	author = {Schulz, Claudia and Meyer, Christian M. and Gurevych, Iryna},
	publisher = {AAAI Press},
	booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
	year = {2019},
	note = {(to appear)},
	address = {Honolulu, HI, USA}
}

@misc{SchulzEtAl2018_arxiv,
	author = {Schulz, Claudia and Meyer, Christian M. and Sailer, Michael and Kiesewetter, Jan and Bauer, Elisabeth and Fischer, Frank and Fischer, Martin R. and Gurevych, Iryna},
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	year = {2018},
	howpublished = {arXiv:1811.10550},
	url = {https://arxiv.org/abs/1811.10550}
}

Abstract: We create the first corpora of students' diagnostic reasoning self-explanations from two domains, annotated with the epistemic activities hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions. We propose a separate performance metric for each challenge we identified for the automatic identification of epistemic activities, thus providing an evaluation framework for future research:

  1. the correct identification of epistemic activity spans,
  2. the reliable distinction of similar epistemic activities, and
  3. the detection of overlapping epistemic activities.
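
Challenge (3) is what makes the task multi-label: a single token can belong to several epistemic activities at once. A hypothetical illustration of such overlapping labels, abbreviating evidence evaluation as EE and drawing conclusions as DC (an invented example, not taken from the corpora):

# Each token carries a *set* of activity labels rather than a single label.
tokens = ['The', 'low', 'values', 'therefore', 'suggest', 'diagnosis', 'X']
labels = [set(), {'EE'}, {'EE'}, {'EE', 'DC'}, {'DC'}, {'DC'}, {'DC'}]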

Contact person: Claudia Schulz, [email protected]

Alternative contact person: Jonas Pfeiffer, [email protected]

https://www.ukp.tu-darmstadt.de/

http://famulus-project.de

Please send us an e-mail if you want to get access to the corpora. Don't hesitate to contact us to report issues or ask further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Experimental setup

All code is run using Python 3. In all scripts, places where the user has to adapt the code (mostly file paths) are marked with 'USER ACTION NEEDED'.
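
To locate all of these markers at once, a small helper like the following can be used (not part of the repository; a minimal sketch):

import pathlib

# Print every line in the repository's Python files that contains the
# 'USER ACTION NEEDED' marker, with its file name and line number.
for path in pathlib.Path('.').rglob('*.py'):
    for no, line in enumerate(path.read_text(encoding='utf-8').splitlines(), 1):
        if 'USER ACTION NEEDED' in line:
            print(f'{path}:{no}: {line.strip()}')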

Neural Network Experiments

The folder "neuralNetwork_experiments" contains the code required to train the neural networks. Our Bi-LSTM architectures are based on the implementation of Nils Reimers (NR): https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf

  • neuralnets -- contains BiLSTM2.py for the single-output architecture and BiLSTM2_multipleOutput.py for the multi-output architecture
  • util -- various scripts for processing data and other utilities by NR
  • data -- on request we provide train.txt, dev.txt, test.txt for all experimental setups
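
The data files use the token-per-line format of the underlying Reimers implementation. A minimal sketch of reading such a file, assuming tab-separated token and label columns with blank lines between sentences (the exact column layout of the distributed files may differ):

# Hypothetical reader for a token<TAB>label file with sentence breaks.
def read_sentences(path):
    sentences, current = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:                  # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                token, label = line.split('\t')
                current.append((token, label))
    if current:                           # last sentence without trailing blank
        sentences.append(current)
    return sentences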

Setup with virtual environment (Python 3)

Set up a Python virtual environment (optional):

virtualenv --system-site-packages -p python3 env
source env/bin/activate

Install the requirements:

env/bin/pip3 install -r requirements.txt

Get the word embeddings

  • Download the German (text) fastText embeddings from GitHub and place them in the neuralNetwork_experiments folder
  • Run embeddingsFirstLine.py to remove the first line (header)
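
The first line of a fastText .vec file holds the vocabulary size and vector dimension rather than an embedding, which is why it must be removed. A minimal sketch of this step, with hypothetical file names:

# Strip the '<vocab_size> <dim>' header line from a fastText .vec file.
with open('wiki.de.vec', encoding='utf-8') as src, \
        open('wiki.de.noheader.vec', 'w', encoding='utf-8') as dst:
    next(src)              # skip the header line
    for line in src:
        dst.write(line)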

Run the Experiments

  • to train models for prefBaseline, concat, or separate, use train_singleOutput.py
  • to train models for multiOutput, use train_multiOutput.py
  • to use a trained model for prediction, run runModel_singleOutput.py or runModel_multiOutput.py. NOTE: loading multiOutput models assumes a static layer layout; this needs to be changed if the model parameters are changed

Evaluation Metrics

The folder "evaluation" contains the code required to use our evaluation framework. evaluate.py implements our different evaluation metrics.

  • use the runModel scripts to create predictions for all (test) files
  • evaluate.py assumes the following folder structure of prediction results:
    • MeD / TeD for the two domains
      • pref, concat, separate, multiOutput - one folder per method
        • MeD_pref1, MeD_pref2, ... - 10 folders with prediction files, one for each of the 10 models trained for this method
        • note that "separate" has 4 subfolders (separate_dc, separate_hg, separate_ee, separate_eg) for the 4 epistemic activities, each with 10 subfolders for the results of the 10 models
      • goldData - gold annotations for the prediction files
      • human - different set of files used to evaluate human upper bound (all files annotated by all annotators)
        • MeD_human1, ... - annotations of each annotator
        • goldData - gold labels for the files used to evaluate human performance
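
As an illustration of this layout, a sketch of pairing prediction files with their gold counterparts (hypothetical paths; evaluate.py's own loading code may differ, and the extra per-activity nesting of "separate" is omitted here):

import os

domain = 'MeD'                                   # or 'TeD'
for method in ('pref', 'concat', 'multiOutput'):
    method_dir = os.path.join(domain, method)
    for run in sorted(os.listdir(method_dir)):   # e.g. MeD_pref1 ... MeD_pref10
        run_dir = os.path.join(method_dir, run)
        for fname in os.listdir(run_dir):
            prediction_file = os.path.join(run_dir, fname)
            gold_file = os.path.join(domain, 'goldData', fname)
            # ... evaluate prediction_file against gold_file ...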
