
EduKTM's Introduction

EduKTM


The Model Zoo of Knowledge Tracing Models.

Brief introduction to KTM

Knowledge Tracing (KT), which aims to monitor students' evolving knowledge states, is a fundamental and crucial task for supporting intelligent educational services. An increasing amount of research attention has therefore been paid to this emerging area, and considerable progress has been made [1]. However, the code of these works may use different programming languages (e.g., Python, Lua) and different deep learning frameworks (e.g., TensorFlow, Torch and MXNet). Furthermore, some works are not organized systematically (e.g., running environments and dependencies are missing), which makes the models difficult to reproduce. To this end, we put forward the Model Zoo of Knowledge Tracing Models, named EduKTM, which collects most of the currently popular works.

List of models

Contribute

EduKTM is still under development. More algorithms and features are going to be added and we always welcome contributions to help make EduKTM better. If you would like to contribute, please follow this guideline.

Citation

If this repository is helpful for you, please cite our work:

@misc{bigdata2021eduktm,
  title        = {EduKTM},
  author       = {bigdata-ustc},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  year         = {2021},
  howpublished = {\url{https://github.com/bigdata-ustc/EduKTM}},
}

Reference

[1] Liu Q, Shen S, Huang Z, et al. A Survey of Knowledge Tracing. arXiv preprint arXiv:2105.15106, 2021.

EduKTM's People

Contributors

0russwest0, fannazya, ljyustc, sone47, tswsxk, weizhehuang0827, xbh0720, xubihan0720


EduKTM's Issues

Data processing of LPKT

Description

There is only code to preprocess the ASSISTChallenge data in EduKTM-LPKT.
I tried to imitate your code to process the ASSIST12 and EdNet data, but the experimental results are far from those reported in the paper.
Could the code for processing the other datasets be open-sourced?

About the data processing of LPKT

[screenshot of the preprocessing code]
In LPKT's preprocessing of the assistment2017 dataset, can't a single problem correspond to multiple skills? With this processing, a problem can map to only one skill in the dictionary, if I am not mistaken...
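For context, here is a minimal sketch of the difference described above, with hypothetical variable names rather than the repository's actual preprocessing code:

from collections import defaultdict

# Hypothetical (problem_id, skill_id) pairs; in ASSISTments-style logs
# the same problem can be tagged with several skills.
rows = [(101, 5), (101, 7), (102, 5)]

# Single-skill dictionary: each later tag silently overwrites the earlier one.
problem2skill = {}
for pid, sid in rows:
    problem2skill[pid] = sid              # problem 101 keeps only skill 7

# Multi-skill dictionary: keep every tag a problem appears with.
problem2skills = defaultdict(set)
for pid, sid in rows:
    problem2skills[pid].add(sid)          # problem 101 keeps {5, 7}

print(problem2skill)                      # {101: 7, 102: 5}
print(dict(problem2skills))               # {101: {5, 7}, 102: {5}}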

The training module can't be found; did you delete the file?


Knowledge State Depiction for DKVMN

Dear Dr. Tong,

first of all, thank you so much for putting this comprehensive overview of KTMs together. Based on your survey on Knowledge Tracing and the examples and algorithms you collected here, I was able to analyse my data using a DKVMN.

While the results of the RNN are very promising, I would now like to generate the knowledge state depictions. Do you by any chance know whether the code for these calculations and visualisations can be found somewhere?

Kind regards,
Carl Klukkert

[Bug: AKT]

🐛 Description

The attention function in AKTNet.py has a padding error, which may cause data leakage.

What have you tried to solve it?

Original:
[screenshot of the original padding code]

Solution:
[screenshot of the proposed fix]

Environment

(Conda) Python 3.8

Additional Info

This bug won't cause any runtime error; it silently uses students' response information that should have been masked out.
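To illustrate the kind of leakage described above, here is a generic sketch of causal masking in scaled dot-product attention; it is not the repository's AKTNet.py code, just an illustration of the failure mode. If the mask includes the diagonal, query step t can attend to step t itself, i.e. to the very response it is supposed to predict:

import torch

def attention_weights(q, k, leaky=False):
    # q, k: (seq_len, d). Returns a (seq_len, seq_len) weight matrix.
    scores = q @ k.T / k.shape[-1] ** 0.5
    n = scores.shape[0]
    if leaky:
        # Off-by-one mask: step t sees keys up to and including t,
        # so the response at step t leaks into its own prediction.
        mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
    else:
        # Strictly causal mask: step t sees only steps before t.
        mask = torch.tril(torch.ones(n, n, dtype=torch.bool), diagonal=-1)
    weights = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
    # Under the strict mask, row 0 attends to nothing and softmaxes to NaN;
    # zero it out (real models handle the first step with explicit padding).
    return torch.nan_to_num(weights, nan=0.0)

q = k = torch.randn(5, 8)
print(attention_weights(q, k, leaky=True).diagonal())   # nonzero: leakage
print(attention_weights(q, k, leaky=False).diagonal())  # all zeros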

Computation logic of the process_raw_pred method

First of all, thank you very much for the code provided by your university. While reading this part, I have a question; I am not sure whether it is a logic problem in the code 🙋

def process_raw_pred(raw_question_matrix, raw_pred, num_questions: int) -> tuple:
    questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions  # torch.nonzero(raw_question_matrix)[1:, 1] gives the positions of non-zeros in raw_question_matrix, i.e. the user's valid answers
    length = questions.shape[0]
    pred = raw_pred[: length]
    pred = pred.gather(1, questions.view(-1, 1)).flatten()
    truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions  # truth is the actual response: 0 means correct and 1 means wrong -- the opposite of the raw data?
    # truth = 1 - truth  # isn't the logic here wrong?! 0 means correct and 1 means wrong, the opposite of the raw data?

    return pred, truth

The key part is here ➡️
truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions
For example, after one-hot encoding, suppose a wrong answer lands at index 125 with num_questions = 100, so encode_q[125] = 1; but when decoding, 125 // 100 = 1, and in the raw data 1 represents a correct answer, doesn't it?

Looking forward to your answer, thank you.

Questions about the `process_raw_pred` function in DKT.py

Hi Dr. Tong,

There's a function named process_raw_pred in EduKTM/EduKTM/DKT/DKT.py.

EduKTM/EduKTM/DKT/DKT.py

Lines 31 to 37 in c9912f0

def process_raw_pred(raw_question_matrix, raw_pred, num_questions: int) -> tuple:
    questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions
    length = questions.shape[0]
    pred = raw_pred[: length]
    pred = pred.gather(1, questions.view(-1, 1)).flatten()
    truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions
    return pred, truth

According to the code below (line 56), we can see that process_raw_pred is used to process the raw input and the output of the DKT model.

EduKTM/EduKTM/DKT/DKT.py

Lines 50 to 58 in c9912f0

for e in range(epoch):
    all_pred, all_target = torch.Tensor([]), torch.Tensor([])
    for batch in tqdm.tqdm(train_data, "Epoch %s" % e):
        integrated_pred = self.dkt_model(batch)
        batch_size = batch.shape[0]
        for student in range(batch_size):
            pred, truth = process_raw_pred(batch[student], integrated_pred[student], self.num_questions)
            all_pred = torch.cat([all_pred, pred])
            all_target = torch.cat([all_target, truth.float()])

I have three questions.

  1. I noticed that questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions in line 32. With [1:, 1] we start from index 1, which means we throw away the first answer at index 0. Do you mean that the first value is not predicted and is meaningless because it depends on no previous answer records?
  2. About pred = raw_pred[: length] in line 34: here we start from index 0. Why don't we throw away the first predicted value just like in line 32? e.g. pred = raw_pred[1 : length + 1]
  3. About truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions in line 36: we use // so that we get 0 if the non-zeros are in the first half (correct answers) and 1 if they are in the second half (wrong answers).
     However, according to the encode_onehot function in examples/DKT/prepare_dataset.ipynb, correct answers are indeed in the first half and wrong answers in the second half, but 1 stands for a correct answer and 0 stands for a wrong answer.
def encode_onehot(sequences, max_step, num_questions):
    result = []

    for q, a in tqdm.tqdm(sequences, 'convert to one-hot format: '): # e.g. q: [1,2,3]  a: [1,0,0]
        length = len(q)
        # append questions' and answers' length to an integer multiple of max_step
        mod = 0 if length % max_step == 0 else (max_step - length % max_step)
        onehot = np.zeros(shape=[length + mod, 2 * num_questions])
        print(length+mod)
        for i, q_id in enumerate(q):
            # if a[i]>0(correct answer),index=question id(first half part),else index=question id + question number(second half part)
            index = int(q_id if a[i] > 0 else q_id + num_questions)
            onehot[i][index] = 1 # correct answers are in the first half part
        result = np.append(result, onehot)
    
    return result.reshape(-1, max_step, 2 * num_questions)

So truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions is not consistent with the encoding. To validate my thoughts, I ran the DKT example and printed torch.nonzero(raw_question_matrix) and truth, comparing them with the encoding results stored in test.txt.

Here are two screenshots, one from the console and one from test.txt:

[screenshot of console output]

[screenshot of test.txt]

This confirms that my understanding is right.
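To make the inversion concrete, here is a minimal standalone check (illustrative code, not from the repository) using the same encoding convention as encode_onehot:

import numpy as np
import torch

num_questions = 100
raw = np.zeros((2, 2 * num_questions))
raw[0][25] = 1        # question 25 answered correctly -> first half
raw[1][125] = 1       # question 25 answered wrongly  -> second half
raw = torch.tensor(raw)

idx = torch.nonzero(raw)[:, 1]    # tensor([ 25, 125])
print(idx % num_questions)        # tensor([25, 25]) -- the question ids
print(idx // num_questions)       # tensor([0, 1]): 0 = correct, 1 = wrong,
                                  # the opposite of the raw labels (1 = correct)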

Then I added a line truth = torch.tensor([1 if i == 0 else 0 for i in truth]) to test the performance.

[screenshot of the added line]

The average AUC is about 0.73, the same as before adding this line.

[screenshots of the AUC results before and after the change]

Sorry for the long description :( Your answers are appreciated! 😀

Question about data preparation

Hello, thank you very much for making the LPKT code public. However, I ran into overfitting when reproducing the results on the ASSIST12 dataset. Could you share the data-processing code for ASSIST12 and EdNet with me?

DKT training fails for batch size of 1

🐛 Description

DKT training fails for a batch size of 1, but works for larger batch sizes (e.g. 64).

Error Message

RuntimeErrorTraceback (most recent call last)
<ipython-input-15-0a528344bb33> in <module>
      5 # Initialize and train model
      6 dkt = DKT(NUM_QUESTIONS, HIDDEN_SIZE, NUM_LAYERS)
----> 7 dkt.train(train_loader, epoch=50)
      8 
      9 # Save weights

/usr/local/lib/python3.6/dist-packages/EduKTM/DKT/DKT.py in train(self, train_data, test_data, epoch, lr)
     61                 # back propagation
     62                 optimizer.zero_grad()
---> 63                 loss.backward()
     64                 optimizer.step()
     65 

/usr/local/lib/python3.6/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    305                 create_graph=create_graph,
    306                 inputs=inputs)
--> 307         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    308 
    309     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    154     Variable._execution_engine.run_backward(
    155         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 156         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    157 
    158 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

To Reproduce

Run the example notebook (https://github.com/bigdata-ustc/EduKTM/blob/main/examples/DKT/DKT.ipynb) and set the BATCH_SIZE variable to 1.

What have you tried to solve it?

Increasing the batch size avoids the error.
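A plausible explanation (my assumption, not confirmed by the maintainers): all_pred in DKT.train starts as torch.Tensor([]), which does not require grad. With a batch size of 1, a batch can end up contributing no valid prediction steps, so all_pred stays empty, the loss has no grad_fn, and loss.backward() raises exactly this error. A minimal reproduction of the mechanism, plus a guard that would avoid it:

import torch

loss_function = torch.nn.BCELoss()

# Mimic DKT.train's accumulators: plain tensors with no grad history.
all_pred, all_target = torch.Tensor([]), torch.Tensor([])

if all_pred.numel() == 0 or not all_pred.requires_grad:
    # Nothing differentiable was concatenated in (possible when
    # batch_size == 1 and the sequence yields no valid steps): skip,
    # instead of letting loss.backward() raise the RuntimeError above.
    print("skipping batch: no differentiable predictions")
else:
    loss = loss_function(all_pred, all_target)
    loss.backward()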

Environment

Environment Information

Operating System: Ubuntu in docker image: tensorflow/tensorflow:2.6.0-gpu-jupyter.
Also tested in Google Colab notebook.

Python Version: Python 3.6.9

pip freeze:

absl-py==0.13.0
aiocontextvars==0.2.2
altair==4.1.0
ansi2html==1.6.0
anyio==3.3.4
argon2-cffi==20.1.0
asgiref==3.4.1
asn1crypto==0.24.0
astor==0.8.1
astunparse==1.6.3
async-generator==1.10
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
backports.zoneinfo==0.2.1
base58==2.1.0
beautifulsoup4==4.10.0
bleach==4.0.0
blinker==1.4
Brotli==1.0.9
bs4==0.0.1
cached-property==1.5.2
cachetools==4.2.2
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
clang==5.0
click==7.1.2
contextlib2==21.6.0
contextvars==2.4
cryptography==2.1.4
cycler==0.10.0
dash==2.0.0
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-table==5.0.0
dataclasses==0.8
decorator==4.4.2
defusedxml==0.7.1
EduData==0.0.18
EduKTM==0.0.9
entrypoints==0.3
et-xmlfile==1.1.0
fastapi==0.70.0
fire==0.4.0
Flask==2.0.2
Flask-Compress==1.10.1
flatbuffers==1.12
gast==0.4.0
gitdb==4.0.8
GitPython==3.1.18
google-auth==1.34.0
google-auth-oauthlib==0.4.5
google-pasta==0.2.0
grpcio==1.39.0
h11==0.12.0
h5py==3.1.0
idna==3.3
immutables==0.16
importlib-metadata==4.6.3
importlib-resources==5.3.0
ipykernel==5.5.6
ipython==7.16.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
itsdangerous==2.0.1
jedi==0.18.0
Jinja2==3.0.1
joblib==1.1.0
json5==0.9.6
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyter-dash==0.4.0
jupyter-http-over-ws==0.0.8
jupyter-server==1.11.1
jupyterlab==3.0.16
jupyterlab-pygments==0.1.2
jupyterlab-server==2.8.2
jupyterlab-widgets==1.0.0
keras==2.6.0
Keras-Preprocessing==1.1.2
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.3.1
loguru==0.5.3
longling==1.3.32
lxml==4.6.3
Markdown==3.3.4
MarkupSafe==2.0.1
matplotlib==3.3.4
mistune==0.8.4
nbclassic==0.3.3
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.4.3
numpy==1.19.5
oauthlib==3.1.1
openpyxl==3.0.9
opt-einsum==3.3.0
opyrator==0.0.12
packaging==21.0
pandas==1.1.5
pandocfilters==1.4.3
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.1
plotly==5.3.1
prometheus-client==0.11.0
prompt-toolkit==3.0.19
protobuf==3.17.3
ptyprocess==0.7.0
pyarrow==5.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pycrypto==2.6.1
pydantic==1.8.2
pydeck==0.6.2
Pygments==2.9.0
PyGObject==3.26.1
pygraphviz==1.6
pyparsing==2.4.7
pyrsistent==0.18.0
python-apt==1.6.5+ubuntu0.7
python-dateutil==2.8.2
pytz==2021.3
pytz-deprecation-shim==0.1.0.post0
pyxdg==0.25
PyYAML==6.0
pyzmq==22.2.1
qtconsole==5.1.1
QtPy==1.9.0
rarfile==4.0
requests==2.26.0
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
retrying==1.3.3
rsa==4.7.2
scikit-learn==0.24.2
scipy==1.5.4
seaborn==0.11.2
SecretStorage==2.3.1
Send2Trash==1.8.0
six==1.15.0
sklearn==0.0
smmap==5.0.0
sniffio==1.2.0
soupsieve==2.2.1
starlette==0.16.0
streamlit==1.1.0
tenacity==8.0.1
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.0
tensorflow-estimator==2.6.0
termcolor==1.1.0
terminado==0.10.1
testpath==0.5.0
threadpoolctl==3.0.0
toml==0.10.2
toolz==0.11.1
torch==1.10.0
tornado==6.1
tqdm==4.62.3
traitlets==4.3.3
typer==0.4.0
typing-extensions==3.7.4.3
tzdata==2021.4
tzlocal==4.0.1
urllib3==1.26.6
uvicorn==0.15.0
validators==0.18.2
watchdog==2.1.6
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.1
Werkzeug==2.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
zipp==3.5.0

An error in the implementation of DKT!!

🐛 Description

In DKT of EduKTM, the loss function is BCEWithLogitsLoss(), but there is already a sigmoid() at the end of forward() of Net. So shouldn't it be BCELoss() here?
When I replace BCEWithLogitsLoss() with BCELoss(), the performance improves greatly (AUC on 2009_skill_builder_data_corrected goes from ~0.75 to ~0.80).

This looks like a big mistake. Does it mean that there are problems with the experimental results of some papers?
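For reference, a generic sketch of the two consistent pairings (not the repository's Net class): either the model ends with sigmoid and the loss is BCELoss, or the model outputs raw logits and the loss is BCEWithLogitsLoss. Applying sigmoid in both places squashes the logits a second time and distorts the loss:

import torch

logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# Pairing 1: model ends with sigmoid -> BCELoss on probabilities.
probs = torch.sigmoid(logits)
loss_a = torch.nn.BCELoss()(probs, targets)

# Pairing 2: model outputs raw logits -> BCEWithLogitsLoss.
loss_b = torch.nn.BCEWithLogitsLoss()(logits, targets)

assert torch.allclose(loss_a, loss_b, atol=1e-6)   # identical by design

# The reported bug: sigmoid inside the model AND inside the loss,
# i.e. sigmoid(sigmoid(logits)) -- a different, flatter objective.
loss_bug = torch.nn.BCEWithLogitsLoss()(probs, targets)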

Knowledge state visualization

In the LPKT code, how can students' knowledge states be visualized? It seems that only one question is answered at a time, so how can we obtain the mastery level of the other knowledge concepts?

Adding the GKT model fails in pytest

I made my local system match the online CI environment and carefully tested the code locally (all tests passed). However, after raising PR #22 to add the GKT model, I encountered the exception E AttributeError: '_io.StringIO' object has no attribute 'buffer'. To work around the problem, I downgraded flake8 from 4.0.1 to 3.9.2 in setup.py, after which I got the pycodestyle error report.

Dataset

I intend to run the program, but I can't download the Assist12 and Assistmentchall datasets from their official website. Would you please share these two datasets?

Question about Figures 1 and 3 in the LPKT paper

Hello, thanks for open-sourcing the code for this project.
I want to track the proficiency change of every knowledge concept during the learning process, as in Figures 1 and 3 of your LPKT paper.
How did you get the proficiency levels between 0.0 and 1.0?
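One common way to produce such per-concept proficiency curves in knowledge tracing (a generic sketch under that assumption, not the LPKT authors' confirmed procedure) is to take the model's knowledge state after each interaction and read out, for every concept, the predicted probability of answering an item on that concept; a sigmoid output is naturally bounded in [0, 1]:

import torch

# Hypothetical names: h is the knowledge state after one interaction,
# concept_emb holds one embedding per concept, and predict mimics a
# KT model's output layer.
hidden_dim, num_concepts = 64, 10
h = torch.randn(hidden_dim)
concept_emb = torch.randn(num_concepts, hidden_dim)
predict = torch.nn.Linear(2 * hidden_dim, 1)

proficiency = torch.sigmoid(
    predict(torch.cat([concept_emb, h.expand(num_concepts, -1)], dim=-1))
).squeeze(-1)

print(proficiency)   # one value in [0, 1] per concept at this time step

Repeating this readout after every interaction yields one curve per concept over time, which is the kind of plot shown in those figures.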
