This bachelor's thesis project explores several approaches to constructing vector representations (feature spaces) of words and documents. Using these representations, scientific documents are classified into 26 classes, and the resulting models are compared.
- A corpus of scientific documents has been assembled (more than 160 thousand articles).
- Models of distributional semantics are reviewed.
- Vector representations of documents are constructed.
- Classification models have been trained and tested in the document feature space.
- The structure of the document vector space is described.
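
The overall pipeline (build document vectors, then classify in that feature space) can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the TF-IDF weighting, the nearest-centroid classifier, the toy corpus, the two labels, and every function name below are assumptions chosen only to keep the example small.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real corpora need more care)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length; empty vectors stay empty."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def vectorize(doc, idf):
    """Map a document to a unit-length sparse TF-IDF vector; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def fit_centroids(vectors, labels):
    """Average the document vectors of each class and renormalize the result."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, v in vec.items():
            sums[label][t] += v
    return {
        label: normalize({t: v / counts[label] for t, v in s.items()})
        for label, s in sums.items()
    }


def predict(doc, idf, centroids):
    """Assign the class whose centroid is most similar to the document vector."""
    vec = vectorize(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy corpus with two hypothetical classes (the project itself uses 26).
train_docs = [
    "neural network training gradient",
    "gradient descent optimizer network",
    "galaxy star telescope observation",
    "star cluster galaxy redshift",
]
train_labels = ["cs", "cs", "astro", "astro"]

idf = fit_idf(train_docs)
centroids = fit_centroids([vectorize(d, idf) for d in train_docs], train_labels)

print(predict("telescope observation of a distant galaxy", idf, centroids))
print(predict("gradient optimizer", idf, centroids))
```

The same two-stage structure would hold if the TF-IDF step were swapped for averaged word embeddings or another distributional model, and the centroid classifier for any model trained on the resulting vectors.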