news-emotion's Introduction

Hi there 👋

  • 🔭 I’m currently working for Bytedance
  • 🌱 I’m currently learning System & Algorithm Design
  • 📫 How to reach me: yuanxin.me
  • 💬 Ask me about Serverless/Cloud/Frontend

news-emotion's People

Contributors

dongyuanxin

news-emotion's Issues

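How to reduce the time complexity of TF-IDF

Help wanted: I am looking for ways to reduce the time complexity of the TF-IDF computation.

The TF-IDF model is implemented here: the words2vec method in news-emotion/operate_data.py.

As the code clearly shows, this implementation has one more loop than the other methods, so its time complexity is N times theirs.
We currently have no cluster available, and even 1,000 training samples run slowly on the server, so the TF-IDF word-vector experiment is suspended for now and will be restored later.

[Image: tf-idf]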
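One common way to avoid an extra corpus-wide loop per term is to count document frequencies once, up front, and then reuse them for every document. The block below is only a minimal sketch of that idea, not the code in words2vec: the function name tfidf_vectors, the pre-tokenized input format, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Hypothetical helper: docs is a list of token lists.

    Returns (vocab, vectors), where vectors[i] is the dense TF-IDF
    vector of docs[i] over the sorted vocabulary.
    """
    n_docs = len(docs)

    # Single pass over the corpus: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    vocab = sorted(doc_freq)
    index = {term: i for i, term in enumerate(vocab)}

    # Smoothed IDF (an assumed variant), computed once per term.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }

    vectors = []
    for tokens in docs:
        vec = [0.0] * len(vocab)
        counts = Counter(tokens)
        total = len(tokens)
        # Only the terms present in this document are visited.
        for term, count in counts.items():
            vec[index[term]] = (count / total) * idf[term]
        vectors.append(vec)
    return vocab, vectors
```

With this layout the work is proportional to the total number of tokens plus the size of the output vectors, instead of rescanning all N documents for every term.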

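About the training samples

Quite a few people have emailed me asking about the training samples, so here is one explanation for everyone.

  1. Source: the wisenews website.
  2. Category: news about Hong Kong stocks; the database currently holds 800,000+ news texts.
    [Image: database]
  3. Training samples: the 1,000 most recent items selected from the 800,000+ news texts above, labeled by hand and then given to the model for training.

Because of project requirements, the labeled texts were not uploaded to the public repository. I will consider uploading the full set of training texts later for anyone interested.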

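Questions about accuracy

Provided the model is not overfitting, the accuracy on the labeled samples is probably the result people most want to see. Here is the accuracy of the models trained on the 1,000 labeled samples, measured with leave-one-out validation.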
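For readers unfamiliar with leave-one-out validation: each sample is held out once, a model is trained on the remaining samples, and accuracy is the share of held-out samples predicted correctly. The following is a rough sketch under assumptions, not the project's evaluation code; leave_one_out_accuracy and fit_nearest_neighbor are hypothetical names, and the 1-nearest-neighbor classifier is only a stand-in for whichever model is being evaluated.

```python
def leave_one_out_accuracy(features, labels, fit):
    """Hypothetical helper: fit(train_X, train_y) must return predict(x)."""
    correct = 0
    for i in range(len(features)):
        # Train on everything except sample i.
        train_X = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predict = fit(train_X, train_y)
        # Test on the single held-out sample.
        if predict(features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def fit_nearest_neighbor(train_X, train_y):
    """Stand-in classifier (an assumption): 1-nearest neighbor."""
    def predict(x):
        distances = [
            sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X
        ]
        return train_y[distances.index(min(distances))]
    return predict


if __name__ == "__main__":
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = [0, 0, 1, 1]
    print(leave_one_out_accuracy(X, y, fit_nearest_neighbor))
```

Because the model is retrained once per sample, 1,000 samples mean 1,000 training runs, which is one reason slow feature extraction hurts so much under this scheme.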

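Binary classification

News is labeled only as positive or negative, the usual scheme in the literature.
[Image: two-tag]

Three-class classification

News texts are divided into three classes: positive, negative, and neutral. Nearly all papers try to avoid the neutral class, yet it does exist in practice. Labeling neutral items also takes careful judgment. Judging by the current results, the three-class performance is acceptable.
[Image: three-tag]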

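Some notes

Because of some problems, the combinations of tf-idf with svm and related models have been removed for now; see the bug issue for the specific reasons. (This is why the results above contain one row and one column that are all 0.)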
