
topics_over_time's Introduction

No Maintenance Intended

Topics Over Time

This is an open-source implementation of A Non-Markov Continuous-Time Model of Topical Trends by Xuerui Wang and Andrew McCallum. The paper associates each LDA topic with a beta distribution over timestamps, which characterizes how that topic evolves over time.
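The following is a minimal Python sketch of the per-token weight that a collapsed Gibbs sampler for Topics over Time computes. It is not the code in this repository. The weight is the usual LDA document-topic and topic-word terms, multiplied by the density of the topic's beta distribution at the document's timestamp. The function names, the count layout, and the use of symmetric alpha and beta hyperparameters are assumptions made for illustration.

```python
import math


def beta_pdf(t, a, b):
    """Standard beta density t^(a-1) * (1-t)^(b-1) / B(a, b), for 0 < t < 1."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta_fn)


def tot_topic_weights(doc_topic_counts, topic_word_counts, topic_totals,
                      word, timestamp, psi, alpha, beta, vocab_size):
    """Unnormalized p(z = k | everything else) for one token, for every topic k.

    Hypothetical helper. All counts must EXCLUDE the token being resampled.
    doc_topic_counts[k]:     tokens in this document currently assigned to topic k
    topic_word_counts[k][w]: how often word id w is assigned to topic k
    topic_totals[k]:         total tokens assigned to topic k
    psi[k] = (a_k, b_k):     beta parameters of topic k's timestamp distribution
    timestamp:               the document's timestamp, already rescaled into (0, 1)
    alpha, beta:             symmetric Dirichlet hyperparameters (an assumption)
    """
    weights = []
    for k in range(len(doc_topic_counts)):
        a, b = psi[k]
        doc_part = doc_topic_counts[k] + alpha                 # LDA document-topic term
        word_part = ((topic_word_counts[k][word] + beta)
                     / (topic_totals[k] + vocab_size * beta))  # LDA topic-word term
        time_part = beta_pdf(timestamp, a, b)                  # extra time term in TOT
        weights.append(doc_part * word_part * time_part)
    return weights
```

Normalizing the returned weights gives the probabilities from which the token's new topic is drawn. The time term is what lets two topics with similar word statistics be separated by when their documents were written.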

Instructions

  • Edit main_pnas.py and visualize_pnas.py so that all input directories, input files, and output directories they reference exist.
  • Run python main_pnas.py to run the Topics over Time algorithm.
  • Run python visualize_pnas.py to visualize the topic-word distributions, as well as the beta distributions showing how topics evolve over time.

Dataset

The code is tested on the PNAS titles dataset. The dataset can be found here. The resulting model is pickled and stored in the results folder.
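A beta distribution is only defined on the interval (0, 1). Raw timestamps such as publication years therefore have to be rescaled before they can be scored. The hypothetical helper below shows one way to do this: a linear map with a small margin, so that no timestamp lands exactly on 0 or 1. The repository may normalize its timestamps differently.

```python
def rescale_timestamps(raw_times, margin=1e-3):
    """Map raw timestamps (e.g. years) linearly into the open interval (0, 1).

    Hypothetical helper: the earliest and latest timestamps are pulled in by
    `margin` so the beta density is never evaluated exactly at 0 or 1.
    """
    lo, hi = min(raw_times), max(raw_times)
    if hi == lo:
        # All documents share one timestamp; put them in the middle.
        return [0.5 for _ in raw_times]
    span = hi - lo
    return [margin + (1 - 2 * margin) * (t - lo) / span for t in raw_times]
```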

Results

  • Topic Distributions for PNAS Titles Dataset

Topic Distributions

  • Evolution of Topics for PNAS Titles Dataset

Topic Evolution

License

GNU General Public License

Copyright © 2015 Abhinav Maurya

topics_over_time's People

Contributors

ahmaurya


topics_over_time's Issues

File format

Hi, I am trying your code. Is the number before the time in the alltimes file necessary? For instance, you experimented with email data; what is that number in that case? Thank you in advance!

Updating of psi

I have the following two questions regarding the update of parameter psi:

  1. Why is 1 added to psi[i][0] and psi[i][1] in GetMethodOfMomentsEstimatesForPsi, in addition to the factors mentioned in the paper?
    psi[i][0] = 1 + timestamp_mean*common_factor
    psi[i][1] = 1 + (1-timestamp_mean)*common_factor

  2. Why is the timestamp distribution updated once per document, rather than for each word of every document, in TopicsOverTimeGibbsSampling?

    par['psi'] = self.GetMethodOfMomentsEstimatesForPsi(par)

Thanks
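For reference, here is the method-of-moments estimate for a beta distribution, which is the approach named by GetMethodOfMomentsEstimatesForPsi. It sets common_factor = mean * (1 - mean) / var - 1, then psi_1 = mean * common_factor and psi_2 = (1 - mean) * common_factor. The sketch below is a hypothetical standalone version, not the repository's function. Its add_one flag reproduces the extra 1 asked about in question 1. That shift has one effect, though it is not necessarily the author's motivation: both parameters are at least 1 whenever common_factor is non-negative, so the fitted density stays finite at the ends of (0, 1).

```python
import statistics


def beta_method_of_moments(timestamps, add_one=False, min_var=1e-6):
    """Fit beta parameters (a, b) to timestamps in (0, 1) by matching mean and variance.

    Hypothetical helper. `add_one` mimics the "1 +" shift quoted in the issue;
    `min_var` guards against division by zero when all timestamps are equal.
    """
    mean = statistics.fmean(timestamps)
    var = max(statistics.pvariance(timestamps, mu=mean), min_var)
    common_factor = mean * (1 - mean) / var - 1
    a = mean * common_factor
    b = (1 - mean) * common_factor
    if add_one:
        a, b = a + 1, b + 1
    return a, b
```

For example, timestamps [0.25, 0.75] have mean 0.5 and variance 0.0625, giving common_factor = 3 and a = b = 1.5 (or 2.5 with add_one=True).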

An error in TopicsOverTimeGibbsSampling

Hello,
I ran the code, and after a while the terminal reported this error:

     in TopicsOverTimeGibbsSampling
           new_topic = list(np.random.multinomial(1, topic_probabilities, size=1)[0]).index(1)
     ValueError: 1 is not in list

Can you tell me why this error happens? Thanks very much!
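The issue does not say what triggered the error. One plausible cause, offered only as an assumption, is that topic_probabilities was not a valid probability vector. For example, it might hold NaN or infinite values after an underflow or overflow in the time term. In that case np.random.multinomial may not return a clean one-hot vector, and .index(1) fails. The hypothetical sample_topic helper below draws a topic index directly from unnormalized weights. It raises a descriptive error when the weights are invalid, which makes that kind of problem easier to diagnose. It uses inverse-CDF sampling from the standard library instead of NumPy.

```python
import math
import random


def sample_topic(weights, rng=random):
    """Draw index k with probability weights[k] / sum(weights).

    Hypothetical helper: validates the weights first, so NaN, inf, negative
    values, or an all-zero vector raise a clear ValueError.
    """
    if not weights:
        raise ValueError("empty weight vector")
    for w in weights:
        if not math.isfinite(w) or w < 0:
            raise ValueError(f"invalid topic weight: {w!r}")
    total = math.fsum(weights)
    if total <= 0:
        raise ValueError("all topic weights are zero")
    u = rng.random() * total
    cumulative = 0.0
    for k, w in enumerate(weights):
        cumulative += w
        if u < cumulative:
            return k
    # Round-off can leave u just above the running sum; fall back to the
    # last topic that has a nonzero weight.
    return max(k for k, w in enumerate(weights) if w > 0)
```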

about error

Hello, I got the following error:

     Traceback (most recent call last):
       File "main_pnas.py", line 44, in <module>
         main()
       File "main_pnas.py", line 33, in main
         documents, timestamps, dictionary = tot.GetPnasCorpusAndDictionary(documents_path, timestamps_path, stopwords_path)
       File "/home/mere/topics_over_time/src/tot.py", line 43, in GetPnasCorpusAndDictionary
         stopwords.update(Set(line.lower().strip().split()))
     NameError: name 'Set' is not defined

Code duplication issue?

Hello,

I am noticing some duplication in the code for tot.py which doesn't seem right.

Lines 32-33 show:

		for line in fileinput.input(stopwords_path):
			stopwords.update(set(line.lower().strip().split()))

which seems to repeat in lines 42-43:

		for line in fileinput.input(stopwords_path):
			stopwords.update(Set(line.lower().strip().split()))

The difference is the use of Set in line 43 versus set in line 33.
As far as I know there is no built-in Set in Python 3 (it came from Python 2's sets module), so that's likely an error.

I assume the latter duplicate can be deleted, since it's erroneous?
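Set (capitalized) was the class from Python 2's sets module, which was removed in Python 3. Under Python 3, the built-in set is the drop-in replacement. Below is a minimal sketch of the stopword-loading step using the built-in type. load_stopwords is a hypothetical name, and the sketch reads the file with open() rather than fileinput.input().

```python
def load_stopwords(stopwords_path):
    """Return the set of lower-cased, whitespace-separated words in a stopword file.

    Hypothetical helper using the built-in `set` type, which works under Python 3.
    """
    stopwords = set()
    with open(stopwords_path, encoding="utf-8") as handle:
        for line in handle:
            # str.split() with no argument already ignores leading and trailing
            # whitespace, so an explicit .strip() is unnecessary.
            stopwords.update(line.lower().split())
    return stopwords
```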
