kg-2019-baseline's Introduction

kg-2019-baseline

A baseline for Baidu's 2019 triple extraction competition ( http://lic2019.ccf.org.cn/kg ).

Note: the official version has since been released at https://github.com/bojone/kg-2019

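Model

A BiLSTM performs joint tagging: it first predicts the subject, then, conditioned on that subject, predicts the object and the predicate at the same time. The tagging scheme is a "half-pointer, half-tagging" structure, which I have described before ( https://kexue.fm/archives/5409 ).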
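To make the subject-first, object-and-predicate-second scheme concrete, here is a minimal sketch of how such start/end labels could be built and decoded. It is not taken from kg.py: the function names `build_labels` and `decode_spans`, the example sentence, and the predicate ids are all hypothetical, and it assumes that object positions carry an integer predicate id with 0 meaning "nothing here".

```python
def build_labels(text, triples, predicate_ids):
    """Encode (subject, predicate, object) triples as start/end tag sequences.

    Returns the subject tags for the sentence and, for each subject span,
    the object tags conditioned on that subject. (Hypothetical helper.)
    """
    n = len(text)
    s_start, s_end = [0] * n, [0] * n   # binary: where a subject starts / ends
    objects_by_subject = {}             # (start, end) -> (o_start, o_end)
    for subj, pred, obj in triples:
        si, oi = text.find(subj), text.find(obj)
        if si < 0 or oi < 0:
            continue                    # skip entities that do not occur in the text
        s_span = (si, si + len(subj) - 1)
        s_start[s_span[0]] = 1
        s_end[s_span[1]] = 1
        o_start, o_end = objects_by_subject.setdefault(s_span, ([0] * n, [0] * n))
        # Assumption: object positions store the predicate id, 0 means "no object".
        o_start[oi] = predicate_ids[pred]
        o_end[oi + len(obj) - 1] = predicate_ids[pred]
    return s_start, s_end, objects_by_subject


def decode_spans(start_tags, end_tags):
    """Pair each tagged start with the nearest end at or after it with the same tag."""
    spans = []
    for i, tag in enumerate(start_tags):
        if tag == 0:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j] == tag:
                spans.append((i, j, tag))
                break
    return spans


# Illustrative data only, not from the competition dataset.
text = "Alice founded Acme in Paris"
triples = [("Alice", "founder_of", "Acme"), ("Acme", "located_in", "Paris")]
predicate_ids = {"founder_of": 1, "located_in": 2}

s_start, s_end, objs = build_labels(text, triples, predicate_ids)
print(decode_spans(s_start, s_end))   # [(0, 4, 1), (14, 17, 1)]
for span, (o_start, o_end) in sorted(objs.items()):
    print(span, decode_spans(o_start, o_end))
```

At prediction time the same two steps would run on model outputs instead of gold tags: decode subject spans first, then decode objects once per subject.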

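I designed this tagging scheme myself; I have read many relation extraction papers and found nothing similar. So if you build on this model and go on to win a prize or publish a paper, please cite it (not that I expect too much).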

@misc{
  jianlin2019bdkg,
  title={Hybrid Structure of Pointer and Tagging for Relation Extraction: A Baseline},
  author={Jianlin Su},
  year={2019},
  publisher={GitHub},
  howpublished={\url{https://github.com/bojone/kg-2019-baseline}},
}

Usage

Run python trans.py to convert the data, then python kg.py to train.

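Results

F1 on the dev set should reach 0.71+ within 5 epochs and usually ends up at 0.72~0.73. The model with the best F1 is saved automatically. Some people have reached 0.74 or even 0.75; I cannot explain that either, so it comes down to luck. In any case, the results beat the official baseline.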
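For readers who want to reproduce the dev-set number, the following is a minimal sketch of a triple-level F1 and of the "keep the best F1" bookkeeping. It assumes exact matching of (subject, predicate, object) triples, micro-averaged over sentences; `triple_f1` and `BestF1Tracker` are hypothetical names, and the actual evaluation code in kg.py may differ.

```python
from __future__ import division  # keeps "/" as true division on Python 2.7 as well


def triple_f1(predicted, gold):
    """Micro-averaged precision, recall and F1 over per-sentence triple sets."""
    n_correct = n_pred = n_gold = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        n_correct += len(pred & ref)   # exact-match triples only
        n_pred += len(pred)
        n_gold += len(ref)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * n_correct / (n_pred + n_gold) if (n_pred + n_gold) else 0.0
    return precision, recall, f1


class BestF1Tracker(object):
    """Remember the best dev F1 so far and signal when the weights should be saved."""

    def __init__(self):
        self.best = 0.0

    def update(self, f1):
        if f1 > self.best:
            self.best = f1
            return True    # caller would save the model weights here
        return False


# Illustrative data only.
predicted = [[("a", "r", "b"), ("a", "r", "c")], [("x", "r", "y")]]
gold = [[("a", "r", "b")], [("x", "r", "y"), ("x", "r", "z")]]
precision, recall, f1 = triple_f1(predicted, gold)

tracker = BestF1Tracker()
if tracker.update(f1):
    print("new best F1: %.4f" % f1)
```

In a Keras training loop, a check like `tracker.update(f1)` would typically run at the end of each epoch, for example from a custom callback.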

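Environment

Python 2.7 + Keras 2.2.4 + Tensorflow 1.8. Of these, Python 2.7 probably matters most. If you use Python 3, you will need to change a few lines of code; as for which lines, work it out yourself, I am not your debugger.

Welcome to Keras. Life is short, I use Keras~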

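Disclaimer

You are welcome to test, modify, and use this code, but it is one of my earlier models, and some of the practices in these files have been abandoned in my latest version. So if you later find something unreasonable in it, just don't blame me for deliberately leading everyone astray.

Discussion is welcome, but please try to raise meaningful questions rather than debugging requests. (If you are not familiar with Keras, please teach yourself Keras for a week first.)

Special emphasis: the baseline is intended for contestants to test with. If you have missed the competition dates but still want the training data, please find your own way to request it from the organizers. I do not provide a data download service.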

Links

kg-2019-baseline's Issues

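What about two entities with multiple relations?

Dear Su,
1. If two entities hold several relations at once (as in datasets like NYT10), won't the current double tagging break down? For example: "The relationship between Zhuge Liang and Jiang Wei is both teacher and friend."
2. I have read your earlier "half-pointer, half-tagging" proposal and am a little confused. If "half-tagging" means tagging the begin and end of the answer separately, how should "half-pointer" be understood? Reading comprehension models also predict the start and end separately, so can any separate prediction be called a half-pointer?
Thanks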
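The concern in point 1 can be made concrete with a small sketch. It assumes, hypothetically, that each object position stores a single predicate id; whether this repository labels data that way should be checked against kg.py. The function names, positions, and predicate ids below are invented for illustration, and the per-predicate layout is shown only as one common alternative, not as the author's answer.

```python
def single_label_tags(n, facts):
    """One predicate id per position: a later fact overwrites an earlier one."""
    start = [0] * n
    for obj_start, pred_id in facts:
        start[obj_start] = pred_id
    return start


def multi_label_tags(n, facts, num_predicates):
    """One binary flag per (position, predicate): facts at the same position coexist."""
    start = [[0] * num_predicates for _ in range(n)]
    for obj_start, pred_id in facts:
        start[obj_start][pred_id - 1] = 1
    return start


# Same object start (position 4) under two predicates: 1 = teacher, 2 = friend.
facts = [(4, 1), (4, 2)]

print(single_label_tags(6, facts)[4])    # only one predicate id survives
print(multi_label_tags(6, facts, 2)[4])  # both predicates are kept
```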

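The model barely learns anything

Hello! I trained your model on 21 examples. The loss goes down, but precision and recall do not respond at all. What is going on? Is the dataset too small?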

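About the dataset

Hello Su,
I am debugging your code and found that the training and test datasets are not included.

Registration for the competition has closed, so they can no longer be obtained from the competition website. Could you provide a download link for the training and test datasets?

Thanks for sharing!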

dataset

Could you provide the dataset?
