Document Classification
Classify documents into categories.
Requirements
The required packages are listed in requirements.txt.
To install them, run
$ pip install -r requirements.txt
Training data
The training data is based on the Reuters-21578 collection.
The files are available in .sgm format in
classification/data/
The topics used in the training data are listed in
classification/data/all-topics-strings.lc.txt
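As a rough illustration of the .sgm format, the sketch below pulls the topics and body text out of Reuters-21578 style SGML with regular expressions. The helper name iter_articles and the sample string are hypothetical, and this is not necessarily how train_and_classify_reuters_data.py reads the files.

```python
import re

# Each article in a Reuters-21578 .sgm file is wrapped in <REUTERS ...> ... </REUTERS>.
ARTICLE_RE = re.compile(r"<REUTERS.*?</REUTERS>", re.DOTALL)
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
TOPIC_RE = re.compile(r"<D>(.*?)</D>")
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


def iter_articles(sgm_text):
    """Yield (topics, body) pairs. Hypothetical helper, not part of the repository."""
    for match in ARTICLE_RE.finditer(sgm_text):
        article = match.group(0)
        # Topics are stored as <D>topic</D> entries inside <TOPICS>.
        topics_block = TOPICS_RE.search(article)
        topics = TOPIC_RE.findall(topics_block.group(1)) if topics_block else []
        # Some articles have no <BODY>; fall back to an empty string.
        body_match = BODY_RE.search(article)
        body = body_match.group(1).strip() if body_match else ""
        yield topics, body


# Made-up sample in the same shape as the real files.
sample = (
    '<REUTERS TOPICS="YES" NEWID="1">\n'
    "<TOPICS><D>grain</D><D>wheat</D></TOPICS>\n"
    "<TEXT><TITLE>Example</TITLE>\n"
    "<BODY>Wheat exports rose.\n</BODY></TEXT>\n"
    "</REUTERS>"
)
articles = list(iter_articles(sample))
```

Regular expressions keep the sketch dependency-free; a proper SGML or HTML parser would be more robust against irregular markup.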
To train and test the classifier, run the following scripts from classification/
Train
$ python train_and_classify_reuters_data.py
Classify
$ python test_classifier.py
This predicts the topic of each test article in
classification/data/reuters_test_json/reuters_test1.json
If every test article in the JSON file includes its topics field,
the script also shows the hit-rate score and the confusion matrix.
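The two metrics can be sketched as follows. This assumes that hit-rate means the fraction of articles whose predicted topic matches the expected one, and that each article has a single expected topic. The function names and the sample labels are hypothetical, and test_classifier.py may compute these differently.

```python
from collections import Counter


def hit_rate(expected, predicted):
    """Fraction of articles whose predicted topic equals the expected topic."""
    if not expected:
        return 0.0
    hits = sum(1 for e, p in zip(expected, predicted) if e == p)
    return hits / len(expected)


def confusion_matrix(expected, predicted):
    """Return (labels, matrix); rows are expected topics, columns are predicted topics."""
    labels = sorted(set(expected) | set(predicted))
    counts = Counter(zip(expected, predicted))
    matrix = [[counts[(e, p)] for p in labels] for e in labels]
    return labels, matrix


# Made-up labels for illustration only.
expected = ["grain", "grain", "crude", "earn"]
predicted = ["grain", "crude", "crude", "earn"]

score = hit_rate(expected, predicted)
labels, matrix = confusion_matrix(expected, predicted)
```

Off-diagonal cells in the matrix show which topics are confused with each other, which a single score cannot.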
Example
$ python train_and_classify_reuters_data.py
$ python test_classifier.py