persiannlp / persian-raw-text
Persian raw text - about 80 GB of raw Persian text
Thanks for this great raw text corpus! Could you provide more information about the Persian Common Crawl data? How was it extracted from Common Crawl, and is there a way to search the indices of the crawled sites (like http://index.commoncrawl.org)? For example, was it extracted from all index records that have the "languages": "fas" attribute over a specific period of years?
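To make the question concrete, here is a minimal Python sketch of what filtering index records by language could look like. It is not the maintainers' extraction method, which is not documented here. It assumes the index server returns one JSON object per line and that each record may carry a comma-separated "languages" field; the crawl ID and URL pattern are only example values, and both function names are hypothetical.

```python
import json
from urllib.parse import urlencode

INDEX_SERVER = "https://index.commoncrawl.org"


def build_index_query(crawl_id, url_pattern):
    """Build a query URL for one crawl index (crawl_id is an example value)."""
    params = {"url": url_pattern, "output": "json"}
    return f"{INDEX_SERVER}/{crawl_id}-index?{urlencode(params)}"


def persian_records(json_lines, lang_code="fas"):
    """Keep index records whose "languages" field lists lang_code.

    Assumption: "languages" is a comma-separated string such as "fas,eng".
    Records without that field are skipped.
    """
    records = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        languages = record.get("languages", "").split(",")
        if lang_code in languages:
            records.append(record)
    return records


if __name__ == "__main__":
    # Offline example with hand-written records (not real index output).
    sample = [
        '{"url": "https://example.ir/", "languages": "fas,eng"}',
        '{"url": "https://example.com/", "languages": "eng"}',
    ]
    print(build_index_query("CC-MAIN-2023-50", "*.ir/*"))
    for rec in persian_records(sample):
        print(rec["url"])
```

Filtering on the client side keeps the sketch independent of any server-side filter syntax; repeating the query over several crawl IDs would cover a period of years.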
Elegant, thorough documentation is as important as any other quality of a software project.
Would you please add a note to the README explaining, given the thousands of choices you made while designing and building the dataset, which use cases it is best suited for and which it is not?
Hi,
Could you tell me the exact size of the corpus?
I have tried several times to download it, and I am not sure whether the downloaded file is complete.
My file (the whole corpus) is 74991354317 bytes. Is that correct?
PS: It might be a good idea to publish digests (e.g. SHA-1) of the files.
PS2: It might also be better to provide the files in compressed form.
Thanks
Hamzeh
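If digests were published, a download could be checked along these lines. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the expected size and SHA-1 value would have to come from the maintainers, since none are given here.

```python
import hashlib
import os


def file_size_and_sha1(path, chunk_size=1024 * 1024):
    """Return (size in bytes, SHA-1 hex digest) of a file.

    The file is read in chunks so that a corpus of tens of gigabytes
    does not need to fit in memory.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def is_complete(path, expected_size, expected_sha1):
    """Compare a local file against published values (hypothetical inputs)."""
    size, sha1 = file_size_and_sha1(path)
    return size == expected_size and sha1 == expected_sha1.lower()
```

Comparing the size first is a cheap sanity check; only the digest can confirm that the contents are intact.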