Challenges in the Automatic Analysis of Students' Diagnostic Reasoning

This repository contains the code for our evaluation metrics, which are applicable to multi-label sequence-labelling tasks such as epistemic activity identification. It also provides the code for training single- and multi-output Bi-LSTMs. The new corpora can be obtained on request, making it possible to replicate all experiments in our paper.

Citation

If you find the implementation useful, please cite the following two papers:

@inproceedings{Schulz:2019:AAAI,
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	author = {Schulz, Claudia and Meyer, Christian M. and Gurevych, Iryna},
	publisher = {AAAI Press},
	booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
	year = {2019},
	note = {(to appear)},
	address = {Honolulu, HI, USA}
}

@misc{SchulzEtAl2018_arxiv,
	author = {Schulz, Claudia and Meyer, Christian M. and Sailer, Michael and Kiesewetter, Jan and Bauer, Elisabeth and Fischer, Frank and Fischer, Martin R. and Gurevych, Iryna},
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	year = {2018},
	howpublished = {arXiv:1811.10550},
	url = {https://arxiv.org/abs/1811.10550}
}

Abstract: We create the first corpora of students' diagnostic reasoning self-explanations from two domains, annotated with the epistemic activities hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions. We propose a separate performance metric for each challenge we identified for the automatic identification of epistemic activities, thus providing an evaluation framework for future research:

  1. the correct identification of epistemic activity spans,
  2. the reliable distinction of similar epistemic activities, and
  3. the detection of overlapping epistemic activities.
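
Challenge (3) is what makes the task multi-label: a single token can belong to several epistemic activities at once. A hypothetical illustration of such overlapping labels, abbreviating evidence evaluation as EE and drawing conclusions as DC (an invented example, not taken from the corpora):

# Each token carries a *set* of activity labels rather than a single label.
tokens = ['The', 'low', 'values', 'therefore', 'suggest', 'diagnosis', 'X']
labels = [set(), {'EE'}, {'EE'}, {'EE', 'DC'}, {'DC'}, {'DC'}, {'DC'}]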

Contact person: Claudia Schulz, [email protected]

Alternative contact person: Jonas Pfeiffer, [email protected]

https://www.ukp.tu-darmstadt.de/

http://famulus-project.de

Please send us an e-mail if you want to get access to the corpora. Don't hesitate to contact us to report issues or ask further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Experimental setup

All code is run using Python 3. In all scripts, places where the user has to adapt the code (mostly file paths) are marked with 'USER ACTION NEEDED'.
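
To locate all of these markers at once, a small helper like the following can be used (not part of the repository; a minimal sketch):

import pathlib

# Print every line in the repository's Python files that contains the
# 'USER ACTION NEEDED' marker, with its file name and line number.
for path in pathlib.Path('.').rglob('*.py'):
    for no, line in enumerate(path.read_text(encoding='utf-8').splitlines(), 1):
        if 'USER ACTION NEEDED' in line:
            print(f'{path}:{no}: {line.strip()}')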

Neural Network Experiments

The folder "neuralNetwork_experiments" contains the code required to train the neural networks. Our Bi-LSTM architectures are based on the implementation of Nils Reimers (NR): https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf

  • neuralnets -- contains BiLSTM2.py for the single-output architecture and BiLSTM2_multipleOutput.py for the multi-output architecture
  • util -- various scripts for processing data and other utilities by NR
  • data -- on request we provide train.txt, dev.txt, test.txt for all experimental setups
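
The data files use the token-per-line format of the underlying Reimers implementation. A minimal sketch of reading such a file, assuming tab-separated token and label columns with blank lines between sentences (the exact column layout of the distributed files may differ):

# Hypothetical reader for a token<TAB>label file with sentence breaks.
def read_sentences(path):
    sentences, current = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:                  # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                token, label = line.split('\t')
                current.append((token, label))
    if current:                           # last sentence without trailing blank
        sentences.append(current)
    return sentences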

Setup with virtual environment (Python 3)

Set up a Python virtual environment (optional):

virtualenv --system-site-packages -p python3 env
source env/bin/activate

Install the requirements:

env/bin/pip3 install -r requirements.txt

Get the word embeddings

  • Download the German (text) fastText embeddings from GitHub and place them in the neuralNetwork_experiments folder
  • Run embeddingsFirstLine.py to remove the first line (header)
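
The first line of a fastText .vec file holds the vocabulary size and vector dimension rather than an embedding, which is why it must be removed. A minimal sketch of this step, with hypothetical file names:

# Strip the '<vocab_size> <dim>' header line from a fastText .vec file.
with open('wiki.de.vec', encoding='utf-8') as src, \
        open('wiki.de.noheader.vec', 'w', encoding='utf-8') as dst:
    next(src)              # skip the header line
    for line in src:
        dst.write(line)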

Run the Experiments

  • to train models for prefBaseline, concat, or separate, use train_singleOutput.py
  • to train models for multiOutput, use train_multiOutput.py
  • to use a trained model for prediction, run runModel_singleOutput.py or runModel_multiOutput.py. NOTE: loading multiOutput models assumes a static layer layout; this needs to be changed if the model parameters are changed

Evaluation Metrics

The folder "evaluation" contains the code required to use our evaluation framework. evaluate.py implements our different evaluation metrics.

  • use the runModel scripts to create predictions for all (test) files
  • evaluate.py assumes the following folder structure of prediction results:
    • MeD / TeD for the two domains
      • pref, concat, separate, multiOutput - one folder per method
        • MeD_pref1, MeD_pref2, ... - 10 folders with prediction files, one for each of the 10 models trained for this method
        • note that "separate" has 4 subfolders (separate_dc, separate_hg, separate_ee, separate_eg) for the 4 epistemic activities, each with 10 subfolders for the results of the 10 models
      • goldData - gold annotations for the prediction files
      • human - different set of files used to evaluate human upper bound (all files annotated by all annotators)
        • MeD_human1, ... - annotations of each annotator
        • goldData - gold labels for the files used to evaluate human performance
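
As an illustration of this layout, a sketch of pairing prediction files with their gold counterparts (hypothetical paths; evaluate.py's own loading code may differ, and the extra per-activity nesting of "separate" is omitted here):

import os

domain = 'MeD'                                   # or 'TeD'
for method in ('pref', 'concat', 'multiOutput'):
    method_dir = os.path.join(domain, method)
    for run in sorted(os.listdir(method_dir)):   # e.g. MeD_pref1 ... MeD_pref10
        run_dir = os.path.join(method_dir, run)
        for fname in os.listdir(run_dir):
            prediction_file = os.path.join(run_dir, fname)
            gold_file = os.path.join(domain, 'goldData', fname)
            # ... evaluate prediction_file against gold_file ...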
