This is a real-world dataset for cross-domain emotion distribution learning, crawled from the ChinaNews website. Each zipped file contains the news documents of one domain. In total, the dataset covers 39 domains and around 115,000 news articles. In our experiments, we select the four domains with the most documents: Economics_Channel (E), Culture_News (C), Law_News (L), and Society_News (S). The domain names and their original Chinese channel names are listed below.
- Economics_Channel: 财经频道
- Culture_News: 文化新闻
- Law_News: 法治新闻
- Society_News: 社会新闻
- IT_News: IT新闻
- IT_Channel: IT频道
- Economics_Center: 财经中心
- Local_News: 地方新闻
- Legal_Channel: 法制频道
- Legal_News: 法制新闻
- Realty_Channel: 房产频道
- Realty_News: 房产新闻
- SAR_News: 港澳新闻
- Scroll_News: 滚动新闻
- International_News: 国际新闻
- Domestic_News: 国内新闻
- Oversea_Report: 海外华文报摘
- Chinese_News: 华人新闻
- Chinese_Report: 华文报摘
- Chinese_Education: 华文教育
- Healthy_News: 健康新闻
- Education_News: 教育新闻
- Financial_Channel: 金融频道
- Economics_News: 经济新闻
- Abroad_Life: 留学生活
- Energy_Channel: 能源频道
- Automobile_Channel: 汽车频道
- Automobile_News: 汽车新闻
- Overseas_Chinese: 侨乡传真
- Life_Channel: 生活频道
- Life_News: 生活新闻
- The_world_Expo_News: 世博会
- Taiwan_News: **新闻
- Sports_News: 体育新闻
- Photo_News: 图片新闻
- Securities_Channel: 证券频道
- Oversea_News: **侨界
- World_Expo: 中新世博
- Other_News: 其他新闻

The emotion labels and their original Chinese names are:

- "moved": 感动
- "sympathy": 同情
- "boring": 无聊
- "anger": 愤怒
- "funny": 搞笑
- "sad": 难过
- "delighted": 高兴
- "not-interested": 路过

Each news document has the following format:

URL: XXXX
Title: XXXX
Category: XXXX
Time: XXXX年XX月XX日
News content: XXXXXX
Total: XX
感动: X
同情: X
无聊: X
愤怒: X
搞笑: X
难过: X
高兴: X
路过: X
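
For emotion distribution learning, the vote counts at the end of each document are typically turned into a probability distribution over the eight labels. The following is a minimal sketch of that step, not the authors' preprocessing code. It assumes every vote line follows the `label: count` layout shown above; the names `EMOTIONS`, `VOTE_LINE`, and `emotion_distribution` are hypothetical, and accepting a full-width colon is a precaution rather than something the format description states.

```python
import re

# Chinese vote labels as they appear in the files, mapped to the English names above.
EMOTIONS = {
    "感动": "moved",
    "同情": "sympathy",
    "无聊": "boring",
    "愤怒": "anger",
    "搞笑": "funny",
    "难过": "sad",
    "高兴": "delighted",
    "路过": "not-interested",
}

# Assumption: a vote line looks like "<label>: <integer>". A full-width colon
# is also accepted in case the raw files use one.
VOTE_LINE = re.compile(r"^\s*(\S+?)\s*[:：]\s*(\d+)\s*$")


def emotion_distribution(text):
    """Return {english_label: proportion} for one document, or None if it has no votes."""
    counts = {name: 0 for name in EMOTIONS.values()}
    for line in text.splitlines():
        match = VOTE_LINE.match(line)
        # Lines such as "Total: 12" match the pattern but are skipped here
        # because their label is not one of the eight emotions.
        if match and match.group(1) in EMOTIONS:
            counts[EMOTIONS[match.group(1)]] = int(match.group(2))
    total = sum(counts.values())
    if total == 0:
        return None
    return {name: votes / total for name, votes in counts.items()}


# Made-up record for illustration only; it is not taken from the dataset.
sample = "Title: example\nTotal: 4\n感动: 1\n高兴: 3\n"
print(emotion_distribution(sample)["delighted"])  # 0.75
```

The sketch normalizes by the sum of the eight counts instead of the `Total` field, so the result sums to 1 even if the two disagree. Documents without any votes return `None` and can be filtered out before training.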