https://github.com/allenalvin333/College_Mini/tree/master/CSE4022_NLP
- 17th July, 2020 - Frequency and CF distribution
https://github.com/allenalvin333/College4/blob/master/T1.ipynb - 24th July, 2020 - Preprocessing: Lexicons and Stemming
https://github.com/allenalvin333/College4/blob/master/T2.ipynb - 31st July, 2020 - Text processing pipeline and Web scraping
https://github.com/allenalvin333/College4/blob/master/T3.ipynb - 21st August, 2020 - Text classification and Count vectorizer
https://github.com/allenalvin333/College4/blob/master/T4.ipynb - 9th October, 2020 - Named Entity Recognition and Domain-specific Jargon
https://github.com/allenalvin333/College4/blob/master/T5.ipynb - 16th October, 2020 - Regex parser and Chunking
https://github.com/allenalvin333/College4/blob/master/T6.ipynb
https://github.com/allenalvin333/CollegeP4
https://github.com/allenalvin333/College4/blob/master/DA1.ipynb
SpaCy is a relatively new NLP library built to be fast, streamlined and ready for production use. It is not as widely known as older libraries, but it is a sensible go-to tool for a developer who is new to the field. It is a library for advanced natural language processing in Python and Cython. It is based on recent research and has been designed from day one for use in real products. It includes pre-trained statistical models and word vectors, and natively supports tokenization for 49+ languages. SpaCy is made for those who do practical work: building real products or gathering specific insights. It is quick to install, and the API is clear and productive.
It excels at large-scale information extraction tasks and is written from the ground up in carefully memory-managed Cython. An independent study in 2015 found SpaCy to be the fastest among the systems it compared. If your program needs to process entire database dumps, SpaCy is the library you want to use. Because it integrates smoothly with TensorFlow, PyTorch, Scikit-learn, Gensim and the rest of Python's AI ecosystem, it is a strong choice for preparing text for deep learning. With SpaCy, you can conveniently build linguistically sophisticated statistical models for a range of NLP problems.
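The tokenization support described above can be tried without downloading any pre-trained model. The following is a minimal sketch, assuming SpaCy is installed; the helper name `tokenize` and the sample sentence are illustrative and not taken from the course notebooks.

```python
import spacy

# spacy.blank builds a tokenizer-only pipeline for the given language code,
# so no pre-trained model has to be downloaded for this step.
nlp = spacy.blank("en")


def tokenize(text):
    """Return the text of every token SpaCy produces for `text` (hypothetical helper)."""
    return [token.text for token in nlp(text)]


sample = "SpaCy isn't hard to use."
tokens = tokenize(sample)
print(tokens)
```

Swapping `"en"` for another supported language code gives that language's tokenizer rules; tagging, parsing and entity recognition additionally need a trained pipeline.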
SpaCy, written in Cython, does not offer dozens of alternative solutions to each problem the way NLTK does. Instead, it provides a single recommended approach to each task, which removes the burden of choosing the best route yourself and keeps the resulting models lean and efficient. The design of the platform is already robust, and new features are added regularly, which makes SpaCy a convincing NLP solution. The official website publishes benchmarks that compare SpaCy with other tools on parse accuracy and NER accuracy, and on those figures it stands toe-to-toe with them. In real-world use, too, despite being a newer entrant, it has gained a strong position in the industry and is currently used in many applications.
SpaCy is an open-source library with many capabilities, including part-of-speech (POS) tagging, lemmatization, classification, Named Entity Recognition (NER), Sentence Boundary Detection (SBD), similarity, rule-based matching, serialization, sentiment analysis, dependency parsing, word vectors and tokenization. According to an article on Medium, among all the tools it lists, SpaCy is the only one with no specific drawback given under cons.
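Of the capabilities listed above, rule-based matching is one that works on a blank pipeline, so it makes a compact illustration. This is a sketch under assumptions: the pattern, the rule name `NLP_PHRASE` and the helper `find_phrases` are hypothetical, and the list form of `Matcher.add` assumes SpaCy 2.2.2 or later.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough here: the pattern only uses the
# lowercase form of each token, which the tokenizer alone provides.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical rule: match "natural language processing" in any casing.
pattern = [{"LOWER": "natural"}, {"LOWER": "language"}, {"LOWER": "processing"}]
matcher.add("NLP_PHRASE", [pattern])


def find_phrases(text):
    """Return the text of every span in `text` matched by the rule above."""
    doc = nlp(text)
    return [doc[start:end].text for _match_id, start, end in matcher(doc)]


matches = find_phrases("SpaCy handles Natural Language Processing at scale.")
print(matches)
```

POS tagging, lemmatization, dependency parsing and NER follow the same `nlp(text)` pattern but need a trained pipeline (for example one loaded with `spacy.load`), which has to be installed separately.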