Data and code for the paper "Topic Aware Graph".
The input format of the neural topic model (NTM) we used is illustrated by ntm/data/gossipcop_lines.txt: each line of the text file is one document of your data.
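As a minimal sketch of this format (not code from this repository), a one-document-per-line file can be read back as a list of documents. The helper name load_docs is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented NTM behaviour.

```python
def load_docs(path):
    """Return one document per non-empty line of a UTF-8 text file.

    Hypothetical helper: blank lines are skipped, which may differ from NTM.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

For example, load_docs("ntm/data/gossipcop_lines.txt") would return a list whose length equals the number of documents in the file.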
First, run the preprocessing script to generate this text file:
$ python preprocess.py
The input data path can be changed in preprocess.py.
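The preprocessing step can be pictured as flattening each news item into a single line. The sketch below is hypothetical: it assumes the JSON file holds a top-level list of records with a "text" field, which may not match the real schema, and it does not reproduce any cleaning that preprocess.py may perform (e.g. tokenization or stopword removal).

```python
import json


def json_to_lines(json_path, txt_path, text_key="text"):
    """Write one document per line; returns the number of documents written.

    Assumption: the JSON file is a list of dicts holding the text under text_key.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(txt_path, "w", encoding="utf-8") as out:
        for rec in records:
            # Collapse all whitespace (including newlines) so each doc stays on one line.
            line = " ".join(str(rec[text_key]).split())
            out.write(line + "\n")
    return len(records)
```

An empty text still produces an (empty) line, so line numbers stay aligned with the item order in the JSON file.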
The dataset we provide, gossipcop_v3_keep_data_in_proper_length.json, contains 15,729 news items; click the gossipcop_v3_keep_data_in_proper_length.json link to download it.
Items 0-3,784 are fake news and the remaining items (from 3,785 on) are real news. Details can be found in GossipCop-LLM.
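Given this ordering, binary labels can be derived from each item's position. A minimal sketch, assuming 0-based indices over the 15,729 items and the hypothetical encoding 1 = fake, 0 = real (main.py may encode labels differently):

```python
N_ITEMS = 15_729
LAST_FAKE_INDEX = 3_784  # items 0..3,784 are fake news


def label_for(index):
    """Return 1 for fake news, 0 for real news (hypothetical encoding)."""
    if not 0 <= index < N_ITEMS:
        raise IndexError(f"index {index} out of range")
    return 1 if index <= LAST_FAKE_INDEX else 0


labels = [label_for(i) for i in range(N_ITEMS)]
```

Under these assumptions the split is 3,785 fake and 11,944 real items.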
Next, train the topic model:
$ cd ntm
$ python GSM_run.py --taskname gossipcop
The trained model checkpoint is saved in ckpt/; update the checkpoint path in embedding.py accordingly, then run it:
$ python embedding.py
This produces the document embeddings in topic space, results/gossipcop_embed.py, of size (n, 16), where n is the number of documents and 16 is the number of topics (we set n_topic = 16).
The documents of each topic are stored in results/gossipcop_topic_{i}.txt, where i ranges from 0 to 15.
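One plausible way to go from an (n, 16) document-topic matrix to per-topic document files is to assign each document to its highest-weight topic. This is an assumption for illustration only; embedding.py may use a different assignment rule, and the helpers split_by_topic and write_groups are hypothetical.

```python
import os

import numpy as np


def split_by_topic(doc_topic, docs, n_topic=16):
    """Group documents by their dominant topic (argmax over the topic axis)."""
    doc_topic = np.asarray(doc_topic)
    if doc_topic.shape != (len(docs), n_topic):
        raise ValueError(f"expected shape {(len(docs), n_topic)}, got {doc_topic.shape}")
    dominant = doc_topic.argmax(axis=1)
    groups = {i: [] for i in range(n_topic)}
    for doc, topic in zip(docs, dominant):
        groups[int(topic)].append(doc)
    return groups


def write_groups(groups, pattern="results/gossipcop_topic_{i}.txt"):
    """Write each topic's documents, one per line, to pattern.format(i=topic)."""
    for i, topic_docs in groups.items():
        path = pattern.format(i=i)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(doc + "\n" for doc in topic_docs)
```

Hard argmax assignment puts every document in exactly one topic file; a soft or thresholded rule would let a document appear under several topics.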
Next, return to the repository root and build the per-topic graphs:
$ cd ..
$ python graph_construct.py
This produces 16 graphs, one per topic, saved as data/sparse_matrix_topic_{j}.npz, where j ranges from 0 to 15.
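Sparse .npz matrices of this kind can typically be read with scipy.sparse.load_npz. A minimal sketch, assuming the files were written with scipy.sparse.save_npz (not confirmed by the source); the helper load_topic_graphs is hypothetical.

```python
import os

import scipy.sparse as sp


def load_topic_graphs(data_dir="data", n_topic=16):
    """Load the per-topic sparse matrices data_dir/sparse_matrix_topic_{j}.npz."""
    graphs = []
    for j in range(n_topic):
        path = os.path.join(data_dir, f"sparse_matrix_topic_{j}.npz")
        # Convert to CSR for efficient row slicing and matrix products.
        graphs.append(sp.load_npz(path).tocsr())
    return graphs
```

Printing adj.shape and adj.nnz for each loaded matrix is a quick sanity check that all 16 graphs were built.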
Finally, train and evaluate the model:
$ python main.py
The scores are printed to the terminal, and the accuracy-per-epoch curve can be viewed in TensorBoard.
More optional arguments can be found in main.py.
This work benefited from the following projects. Please consider citing them if you find this repository useful.
@misc{ZLL2020,
author = {Leilan Zhang},
title = {Neural Topic Models},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/zll17/Neural_Topic_Models}},
commit = {f02e8f876449fc3ebffc66f7635a59281b08c1eb}
}
https://github.com/SZULLM/GossipCop-LLM