Comments (5)

xxxxxthhh avatar xxxxxthhh commented on August 22, 2024 9

pkuseg: 北京大学/生前/来/应聘
jieba: 北京/大学生/前来/应聘

Judging from the comments above, pkuseg seems to be rather fixated on '生前' ("before one's death")...

from pkuseg-python.

yaleimeng avatar yaleimeng commented on August 22, 2024 8

Test results:
print(seg.cut('买水果然后来世博园最后去世博会'))

['买', '水', '果然', '后来', '世博园', '最后', '去世', '博', '会']

print(seg.cut('欢迎新老师生前来就餐'))

['欢迎', '新', '老师', '生前', '来', '就', '餐']

mesgavale avatar mesgavale commented on August 22, 2024

The weibo model should perform somewhat better.

jingjingxupku avatar jingjingxupku commented on August 22, 2024

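Quoted from xxxxxthhh's comment:

pkuseg: 北京大学/生前/来/应聘
jieba: 北京/大学生/前来/应聘

Judging from the comments above, pkuseg seems to be rather fixated on '生前' ("before one's death")...

Hello, this is caused by the dictionary in use. If you do not load the dictionary, our segmenter does correctly produce 北京/大学生/前来/应聘. Still, thank you very much for providing the example.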
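
To illustrate how a single dictionary entry can flip a segmentation like the one above, here is a minimal sketch using forward maximum matching over a toy lexicon. This is not pkuseg's algorithm and the lexicon is hypothetical; the function name `forward_max_match` and the word set are assumptions made only for this example.

```python
def forward_max_match(text, lexicon):
    """Greedy longest-match segmentation against a set of known words."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "北京大学生前来应聘"
# Hypothetical toy lexicon, not the dictionary shipped with any library.
base = {"北京", "大学生", "生前", "前来", "应聘"}

with_entry = forward_max_match(sentence, base | {"北京大学"})
without_entry = forward_max_match(sentence, base)

print("/".join(with_entry))     # 北京大学/生前/来/应聘
print("/".join(without_entry))  # 北京/大学生/前来/应聘
```

With the entry 北京大学 present, the greedy match consumes four characters up front and the remainder can only be split as 生前/来; without it, the intended reading is recovered. A statistical segmenter combines dictionary hints with a learned model, so the real interaction is less mechanical than this sketch suggests.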

jingjingxupku avatar jingjingxupku commented on August 22, 2024

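Quoted comment:

I tested the ctb8 model, and it performs slightly worse than jieba's default model. I suggest publishing some best-practice documentation.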

print(seg.cut('工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作'))
['工信', '处女', '干事', '每', '月', '经过', '下属', '科室', '都', '要', '亲口', '交代', '24', '口', '交换机', '等', '技术性', '器件', '的', '安装', '工作']
print(", ".join(jieba.cut("工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作")))
工信处, 女干事, 每月, 经过, 下属, 科室, 都, 要, 亲口, 交代, 24, 口, 交换机, 等, 技术性, 器件, 的, 安装, 工作
print(seg.cut('贱狗奴,鸡巴套子把爸爸加上'))
['贱', '狗', '奴', ',', '鸡', '巴', '套子', '把', '爸爸', '加上']
print(", ".join(jieba.cut("贱狗奴,鸡巴套子把爸爸加上",)))
贱狗奴, ,, 鸡巴, 套子, 把, 爸爸, 加上

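Thank you for providing this data. Seen from another angle, these examples show how important it is to choose a pretrained model that matches the domain of the data. All of the examples you provided are very tricky cases for a pretrained model for the standard news domain, so we are also willing to try using corpora with more internet slang to provide a dedicated pretrained model for handling such cases.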
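
One way to check whether a different model suits a given domain is to run the same sentences through several segmenters and list only the cases where they disagree. The sketch below is a hypothetical helper (`find_disagreements` is not part of pkuseg or jieba) and uses two stand-in segmenters so that it runs without any third-party package.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Segmenter = Callable[[str], List[str]]


def find_disagreements(
    sentences: Iterable[str],
    segmenters: Dict[str, Segmenter],
) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Return (sentence, outputs) pairs where the segmenters differ."""
    disagreements = []
    for sentence in sentences:
        outputs = {name: list(cut(sentence)) for name, cut in segmenters.items()}
        # More than one distinct token sequence means the models disagree.
        if len({tuple(tokens) for tokens in outputs.values()}) > 1:
            disagreements.append((sentence, outputs))
    return disagreements


# Stand-in segmenters for illustration only; swap in real ones as needed.
def by_character(text: str) -> List[str]:
    return list(text)


def whole_sentence(text: str) -> List[str]:
    return [text]


report = find_disagreements(
    ["应聘", "来"],
    {"by_character": by_character, "whole_sentence": whole_sentence},
)
for sentence, outputs in report:
    print(sentence, outputs)
```

With the real libraries installed, the stand-ins could be replaced by something like `pkuseg.pkuseg(model_name="web").cut` and `jieba.lcut`. The model name here is an assumption; the set of available pretrained models depends on the pkuseg version, so check its README before relying on it.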
