Do I now have to read the files using file.read() or somesuch?
Yes, exactly. For most text corpora, you need to write your own code to open these files.
The usual pattern to open a file:
with open(<filepath>) as file_open:
    text = file_open.read()
If you want to open multiple files in a directory, look at this: https://stackoverflow.com/a/3207973.
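For the multiple-files case, here is a minimal sketch that applies the same open-and-read pattern to every file in a directory. The function name `read_text_files`, the `*.txt` pattern, and the UTF-8 encoding are assumptions for illustration; they are not part of the CLTK, so adjust them to match the corpus you downloaded.

```python
from pathlib import Path


def read_text_files(directory, pattern="*.txt", encoding="utf-8"):
    """Return a dict mapping each matching file name to its full text.

    Hypothetical helper: the pattern and encoding are assumptions.
    """
    texts = {}
    # sorted() keeps the order of files the same on every platform.
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            # Same pattern as for a single file, applied to each path.
            with open(path, encoding=encoding) as file_open:
                texts[path.name] = file_open.read()
    return texts
```

You would call it as `texts = read_text_files(<directory>)` and then look up a single file's text by its file name. If the corpus keeps its files in nested subdirectories, `Path(directory).rglob(pattern)` walks them recursively instead.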
Is the general goal to not use the corpora for analysis but to use them as trained data sets to analyze some other texts that we find?
Both, in fact! Some repos are simply for reading (https://github.com/cltk/latin_text_latin_library), while others are just for training (https://github.com/cltk/latin_training_set_sentence_cltk). Others could be used for both: for example, a treebank can be used either to train a predictive model or to compute statistics.
What is your goal in using the CLTK?
Closing, but post back here with any other questions.