Scraping the web for information :)
- bv_extract.py contains code for crawling the Bharatavani website (a site that hosts dictionaries for Indian languages).
- ntm_translators_db_crawler.py contains code for crawling the National Translation Mission website for its list of translators (needs to be improved).
- align_corpus.py is a simple script for aligning two sentences to build a parallel corpus (needs to be improved).
- crawler.py is a very basic crawler that uses BeautifulSoup for parsing the data.
- extract_data.py contains code for extracting the data between any two HTML tags and arranging it in a specific manner.
- fill_form.py contains code for submitting form data and clicking JavaScript buttons with Selenium (I used this to get newspaper articles from a newspaper called "Sakshi").
- new_crawler.py is used for crawling the PMModi website for information :)
- shabdkosh_eng_tel_crawler.py contains code for crawling the Shabdkosh website (another site that hosts dictionaries for Indian languages). It is implemented with Selenium because the website relies on JavaScript-driven buttons.
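
To give a feel for the "data between two HTML tags" idea behind extract_data.py, here is a minimal sketch. It is not the code from this repo: it uses Python's standard-library `html.parser` instead of BeautifulSoup so it runs without extra installs, and the names `TagTextExtractor`, `extract_between` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while the parser is inside the target tag
        self.chunks = []    # text pieces of the element currently being read
        self.results = []   # one cleaned-up string per matched element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.chunks = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join("".join(self.chunks).split())
                self.results.append(text)

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is kept as well.
        if self.depth > 0:
            self.chunks.append(data)


def extract_between(html, tag):
    """Return the text of every <tag>...</tag> element in the given HTML string."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample page; a real crawler would download the HTML first.
SAMPLE_HTML = "<ul><li>amma : <b>mother</b></li><li>nanna : father</li></ul>"
entries = extract_between(SAMPLE_HTML, "li")
for entry in entries:
    print(entry)
```

With BeautifulSoup the same thing would likely be a short `find_all` loop; the standard-library version is used here only to keep the sketch dependency-free.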