chenhuanwen-source / wikipedia-extractor
This project is forked from bwbaugh/wikipedia-extractor
This is a mirror of the script by Giuseppe Attardi, and contains history from before the official repo started: https://github.com/attardi/wikiextractor
Extracts and cleans text from a Wikipedia database dump and stores the output in a number of files of similar size in a given directory.
Home Page: http://medialab.di.unipi.it/wiki/Wikipedia_Extractor
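
The idea of storing output "in a number of files of similar size" can be illustrated with a minimal sketch. This is not the extractor's actual code: the function name `write_in_chunks`, the `part_NN.txt` naming scheme, and the `max_bytes` parameter are all hypothetical, chosen only to show one way such size-based splitting might work.

```python
import os
import tempfile


def write_in_chunks(docs, out_dir, max_bytes):
    """Write documents to numbered files, starting a new file whenever
    adding the next document would push the current file past max_bytes.

    Hypothetical helper for illustration; not part of the extractor.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    current = None
    written = 0
    for doc in docs:
        data = (doc + "\n").encode("utf-8")
        # Rotate only if the current file already holds something, so a
        # single oversized document still gets written to its own file.
        if current is None or (written > 0 and written + len(data) > max_bytes):
            if current is not None:
                current.close()
            path = os.path.join(out_dir, "part_%02d.txt" % len(paths))
            current = open(path, "wb")
            paths.append(path)
            written = 0
        current.write(data)
        written += len(data)
    if current is not None:
        current.close()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        articles = ["first article text", "second article text", "third"]
        for p in write_in_chunks(articles, tmp, max_bytes=40):
            print(os.path.basename(p), os.path.getsize(p))
```

Documents are never split across files in this sketch, so file sizes are only approximately equal; whether the real script behaves the same way is not stated here.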