Comments (4)
@bikashg as you say, the Reuters dataset is a document-term matrix, so I think you can copy that format.
For example, you might have this document-term matrix for your data in a data.csv file (one row per document, one column per term, each cell a count):
doc_id,1,2,3,4,5, ...
doc_1,0,0,5,1,2, ...
doc_2,0,1,2,3,4, ...
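If your data starts out as raw tokens rather than counts, a matrix in this shape can be built with the standard library alone. The following is only a rough sketch: the `docs` dictionary, the helper names `build_doc_term_rows` and `write_doc_term_csv`, and the choice of term strings (instead of numeric term ids) as column headers are my assumptions, not anything the lda package requires.

```python
import csv
from collections import Counter


def build_doc_term_rows(docs):
    """Return (header, rows) for a document-term matrix.

    `docs` maps a document id to a list of tokens (a hypothetical input format).
    """
    # Use one fixed, sorted vocabulary so every row has the same column order.
    vocab = sorted({token for tokens in docs.values() for token in tokens})
    header = ["doc_id"] + vocab
    rows = []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)  # missing terms count as 0
        rows.append([doc_id] + [counts[term] for term in vocab])
    return header, rows


def write_doc_term_csv(path, docs):
    """Write the matrix to `path` with a header row, like data.csv above."""
    header, rows = build_doc_term_rows(docs)
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(rows)


# Toy input, made up for illustration.
docs = {
    "doc_1": ["oil", "price", "oil", "market"],
    "doc_2": ["market", "bank", "bank"],
}
header, rows = build_doc_term_rows(docs)
```

Calling `write_doc_term_csv("data.csv", docs)` would then produce a file with a `doc_id` column followed by one count column per term.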
The following script loads in data in the above format and fits the provided LDA model to that data:
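import csv
import numpy as np
import lda

# Load everything as strings first, because the doc_id column (e.g. "doc_1") is not numeric.
matrix = np.loadtxt("data.csv", delimiter=",", skiprows=1, dtype=str, ndmin=2)
# X is the training data (integer term counts); matrix still includes the doc_id column.
X = matrix[:, 1:].astype("int64")
model = lda.LDA(n_topics=100, n_iter=1000, random_state=1)
model.fit(X)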
To export the model results to a CSV you might do this:
doc_ids = matrix[:, 0]
doc_topic = model.doc_topic_
with open('doc_topics.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["doc_id", "topic_id"])
    for i, doc_id in enumerate(doc_ids):
        # Record the single most probable topic for each document.
        writer.writerow([doc_id, doc_topic[i].argmax()])
The above doesn't include printing the terms for each topic, etc., but you get the idea.
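For the per-topic terms, something along these lines might work. It is a sketch, not the package's own method: `top_terms` is a hypothetical helper, and the toy lists stand in for `model.topic_word_` (one row per topic, one column per term) and for a vocabulary that you would have to supply yourself, for instance from the header row of data.csv.

```python
def top_terms(topic_word, vocab, n_top=3):
    """Return, for each topic, the n_top terms with the highest weight."""
    result = []
    for weights in topic_word:
        # Term indices ordered from highest to lowest weight in this topic.
        order = sorted(range(len(vocab)), key=lambda j: weights[j], reverse=True)
        result.append([vocab[j] for j in order[:n_top]])
    return result


# Toy stand-ins (assumptions): in practice pass model.topic_word_ and your own vocab.
toy_topic_word = [
    [0.5, 0.1, 0.3, 0.1],
    [0.05, 0.6, 0.05, 0.3],
]
toy_vocab = ["oil", "bank", "price", "loan"]

for k, terms in enumerate(top_terms(toy_topic_word, toy_vocab, n_top=2)):
    print("Topic {}: {}".format(k, " ".join(terms)))
```

The helper only indexes into each row, so it should accept a NumPy array as well as plain lists.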
from lda.
+1
By the way, just in case what I wrote above was a bit hard to understand, you can explore the structure of data (like the Reuters example dataset) quite easily with the Python interpreter. For example:
>>> import lda.datasets
>>> X = lda.datasets.load_reuters()
>>> X
array([[1, 0, 1, ..., 0, 0, 0],
       [7, 0, 2, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 0, 1, ..., 0, 0, 0],
       [1, 0, 1, ..., 0, 0, 0],
       [1, 0, 1, ..., 0, 0, 0]], dtype=int32)
>>> X[0]
array([1, 0, 1, ..., 0, 0, 0], dtype=int32)
>>> len(X[0])
4258
>>> X[0][0]
1
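If you want the same kind of overview for your own matrix in one go, a small helper like the one below could do it. This is just an illustrative sketch: `describe_matrix` is a name I made up, and the toy matrix is invented; the function only assumes that rows are documents and columns are term counts.

```python
def describe_matrix(X):
    """Summarise a document-term matrix given as rows of term counts."""
    n_docs = len(X)
    n_terms = len(X[0]) if n_docs else 0
    # Total number of tokens across all documents.
    total_count = sum(int(c) for row in X for c in row)
    # Number of (document, term) cells with a non-zero count.
    nonzero_cells = sum(1 for row in X for c in row if c != 0)
    return {
        "n_docs": n_docs,
        "n_terms": n_terms,
        "total_count": total_count,
        "nonzero_cells": nonzero_cells,
    }


# Toy matrix, made up for illustration: 3 documents, 3 terms.
toy_X = [
    [1, 0, 1],
    [7, 0, 2],
    [0, 0, 0],
]
summary = describe_matrix(toy_X)
print(summary)
```

A document with all-zero counts (like the last toy row) shows up as a gap between `n_docs` and the documents that actually contribute tokens, which is worth checking before fitting.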
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.