NLP-basic-handson
Basic NLP hands-on exercises: data cleaning, pre-processing, tokenization, and vectorization using the NLTK and sklearn libraries.
-
Write a program that takes three sentences from the user and builds the corpus. Example: suppose the three strings are S1 = "India won the match", S2 = "England won the cricket match", S3 = "Australia won the final match". Then the corpus (the list formed by the union of all words from all strings) is: [India, England, Australia, won, the, match, cricket, final]
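One possible sketch of this task in plain Python is shown below. The function name `build_corpus` is hypothetical, and splitting on whitespace is an assumption (the NLTK tokenizer could be used instead). The example sentences are hard-coded so the snippet runs unattended; the comment shows how to read them from the user.

```python
def build_corpus(sentences):
    """Return the unique words across all sentences, in first-seen order."""
    corpus = []
    seen = set()
    for sentence in sentences:
        for word in sentence.split():  # assumption: whitespace tokenization
            if word not in seen:
                seen.add(word)
                corpus.append(word)
    return corpus


# Example sentences from the task. To read them from the user instead:
# sentences = [input(f"Sentence {i + 1}: ") for i in range(3)]
sentences = [
    "India won the match",
    "England won the cricket match",
    "Australia won the final match",
]

corpus = build_corpus(sentences)
print(corpus)
```

The words come out in first-seen order, which differs from the order in the example above; the set of words is the same.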
-
Write a program that takes three sentences from the user and converts them into vectors. Use the presence or absence of each corpus word to build the vectors.
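A minimal sketch of presence/absence vectorization, assuming whitespace tokenization and a corpus in first-seen order. The helper names `build_corpus` and `binary_vector` are hypothetical. Each vector has one position per corpus word: 1 if the word occurs in the sentence, 0 otherwise.

```python
def build_corpus(sentences):
    """Unique words across all sentences, in first-seen order."""
    return list(dict.fromkeys(w for s in sentences for w in s.split()))


def binary_vector(sentence, corpus):
    """1 where the corpus word is present in the sentence, else 0."""
    words = set(sentence.split())
    return [1 if term in words else 0 for term in corpus]


# Example sentences; replace with input() calls to read from the user.
sentences = [
    "India won the match",
    "England won the cricket match",
    "Australia won the final match",
]

corpus = build_corpus(sentences)
vectors = [binary_vector(s, corpus) for s in sentences]

print("Corpus:", corpus)
for sentence, vector in zip(sentences, vectors):
    print(sentence, "->", vector)
```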
-
Write a program that takes three strings from the user and vectorises them on the basis of word counts.
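Count-based vectorization could be sketched as follows, using `collections.Counter` from the standard library. The function name `count_vector` is hypothetical and whitespace tokenization is assumed. Unlike the presence/absence encoding, a word that occurs twice in a string contributes 2 at its position.

```python
from collections import Counter


def build_corpus(sentences):
    """Unique words across all sentences, in first-seen order."""
    return list(dict.fromkeys(w for s in sentences for w in s.split()))


def count_vector(sentence, corpus):
    """Number of times each corpus word occurs in the sentence."""
    counts = Counter(sentence.split())
    return [counts[term] for term in corpus]  # Counter gives 0 if missing


# Example strings; replace with input() calls to read from the user.
sentences = [
    "India won the match",
    "England won the cricket match",
    "Australia won the final match",
]

corpus = build_corpus(sentences)
vectors = [count_vector(s, corpus) for s in sentences]

for sentence, vector in zip(sentences, vectors):
    print(sentence, "->", vector)
```

Because no word repeats within these example strings, the count vectors here coincide with the presence/absence vectors; a string such as "the match the final" would give a count of 2 for "the".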
-
Write a program that takes three strings from the user, vectorises them using TF-IDF (Term Frequency-Inverse Document Frequency), and prints the strings along with their vectors.
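The sketch below computes TF-IDF by hand so the arithmetic is visible. It assumes the textbook definition, tf = (count of the word in the string) / (number of words in the string) and idf = ln(N / df), where N is the number of strings and df is the number of strings containing the word. The function name `tfidf_vectors` is hypothetical. Note that sklearn's `TfidfVectorizer` uses a smoothed idf and L2-normalises each vector by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return (corpus, vectors) using tf = count/len and idf = ln(N/df)."""
    docs = [s.split() for s in sentences]  # assumption: whitespace tokens
    corpus = list(dict.fromkeys(w for d in docs for w in d))
    n_docs = len(docs)
    # Document frequency: how many strings contain each corpus word.
    df = {t: sum(1 for d in docs if t in d) for t in corpus}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1  # avoid division by zero for an empty string
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in corpus]
        )
    return corpus, vectors


# Example strings; replace with input() calls to read from the user.
sentences = [
    "India won the match",
    "England won the cricket match",
    "Australia won the final match",
]

corpus, vectors = tfidf_vectors(sentences)
print("Corpus:", corpus)
for sentence, vector in zip(sentences, vectors):
    print(sentence, "->", [round(x, 3) for x in vector])
```

Under this formula, words that appear in all three strings ("won", "the", "match") get a weight of 0, since ln(3/3) = 0, while words unique to one string carry all the weight.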