Shangwubu forbids scraping multi-page content; I can't fix this right now
Create a logger
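
A minimal sketch of the logger step, using only the standard `logging` module. The function name `create_logger`, the default logger name `news_pipeline`, and the message format are assumptions for illustration, not taken from the original code.

```python
import logging
import sys


def create_logger(name="news_pipeline", level=logging.INFO):
    """Return a logger that writes timestamped messages to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = create_logger()
logger.info("logger ready")
```

Because `logging.getLogger` returns the same object for the same name, calling `create_logger` from several modules should give them one shared logger.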
Run the Scrapy spider to scrape news from the Shangwubu website
Read in data from different sources: read(), read_from_excel(), read_db()
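
One way the database reader might look, sketched with the standard-library `sqlite3` module. The database engine, the `news` table, its columns, and the helper name `read_db_rows` are all assumptions; the original `read_db()` may use a different backend and return a DataFrame rather than a list of dicts.

```python
import sqlite3


def read_db_rows(conn, table="news"):
    """Load every row of `table` as a list of dicts keyed by column name."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Demo with an in-memory database and made-up rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?)",
    [("Soybean tariffs", "placeholder body"), ("Copper quota", "placeholder body")],
)
rows = read_db_rows(conn)
print(len(rows), "rows loaded")
```

A list of dicts like this can be handed to `pandas.DataFrame(rows)` if the later steps expect a DataFrame.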
Preprocess the loaded DataFrame
Build a pLSA model using gensim to extract the top topics
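
To illustrate what the pLSA step computes, here is a small pure-Python EM sketch. It does not use gensim and is not the original implementation; the function name `plsa`, its parameters, and the toy documents are assumptions. It estimates P(word | topic) and P(topic | document) from tokenized documents.

```python
import random
from collections import Counter


def plsa(docs, n_topics=2, n_iter=50, seed=0):
    """Fit pLSA by EM. `docs` is a list of token lists.

    Returns (vocab, p_w_z, p_z_d), where p_w_z[z][w] = P(word w | topic z)
    and p_z_d[d][z] = P(topic z | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    # Sparse counts: counts[d] maps word index -> n(d, w).
    counts = [Counter(index[tok] for tok in doc) for doc in docs]

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation; each distribution sums to 1.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in vocab]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in docs]

    for _ in range(n_iter):
        # Tiny floor keeps every row normalisable even if it receives no mass.
        new_w_z = [[1e-12] * len(vocab) for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n_dw in doc_counts.items():
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize(
                    [p_z_d[d][z] * p_w_z[z][w] + 1e-300 for z in range(n_topics)]
                )
                # M-step accumulators: counts weighted by the posterior.
                for z in range(n_topics):
                    new_w_z[z][w] += n_dw * post[z]
                    new_z_d[d][z] += n_dw * post[z]
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return vocab, p_w_z, p_z_d


# Toy corpus (made up) just to exercise the function.
docs = [
    "soybean import tariff soybean".split(),
    "copper export quota copper".split(),
    "soybean tariff import".split(),
    "copper quota export".split(),
]
vocab, p_w_z, p_z_d = plsa(docs, n_topics=2)
for z, row in enumerate(p_w_z):
    top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
    print(z, [vocab[w] for w in top])
```

The highest-probability words in each row of `p_w_z` serve as that topic's keywords. Chinese news text would need a word segmenter before this step, which the sketch leaves out.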
Generate the topic/keywords for a given piece of news
Check whether any futures names appear in the keywords of the newly found news
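
The final check can be sketched as a set lookup. The function name `find_futures_in_keywords` and the sample futures and keywords are hypothetical; the original code may match on substrings or contract codes instead of exact names.

```python
def find_futures_in_keywords(futures, keywords):
    """Return the futures names that also appear among the keywords, in input order."""
    # Normalise case and surrounding whitespace so "Soybean" matches "soybean".
    keyword_set = {kw.strip().lower() for kw in keywords}
    return [name for name in futures if name.strip().lower() in keyword_set]


futures = ["soybean", "copper", "crude oil"]
keywords = ["Soybean", "tariff", "import"]
matches = find_futures_in_keywords(futures, keywords)
print(matches)  # ['soybean']
```

Exact matching keeps false positives low but would miss a keyword such as "soybeans"; whether looser matching is wanted depends on how the keywords are generated.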