eda_nlp_for_chinese's Issues

No stopword list?

No such file or directory: 'stopwords/HIT_stop_words.txt'. Do I have to download it myself?
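The error means the script expects a HIT stopword list at that relative path, resolved from the directory the script is run in. Below is a minimal sketch of how such a file is typically read, assuming one word per line in UTF-8. The helper name `load_stopwords` and its empty-set fallback for a missing file are assumptions for illustration, not the repository's code.

```python
from pathlib import Path


def load_stopwords(path="stopwords/HIT_stop_words.txt"):
    """Read a stopword file with one word per line.

    Hypothetical helper: instead of raising FileNotFoundError when the
    file is missing, it returns an empty set, so augmentation can still
    run (just without stopword filtering).
    """
    p = Path(path)
    if not p.is_file():
        return set()
    with p.open(encoding="utf-8") as f:
        # Strip surrounding whitespace and skip blank lines.
        return {line.strip() for line in f if line.strip()}
```

Silently falling back hides the misconfiguration, so in practice it may be better to print a warning or simply place the stopword file at the expected path.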

The original class label of the sentence is still valid

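"The experimental result is that the latent-space representations of the augmented sentences closely surround those of the original sentences. The authors conclude that if many words in a sentence are changed, the sentence's original class label may no longer be valid." If the augmented representations sit tightly around the original ones, shouldn't the sentences be semantically close? In that case the original class label should still be valid.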

About alpha?

Your example command, python code/augment.py --input=train.txt --output=train_augmented.txt --num_aug=16 --alpha=0.05, uses a single alpha value for all four operations.
But I want to set a different alpha for each operation. What should I do?

alpha_sr=alpha, alpha_ri=alpha, alpha_rs=alpha, alpha_rd=alpha
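As the line above shows, one shared alpha is passed to every operation. A minimal sketch of one way to allow per-operation values: keep --alpha as the default and add optional overrides. The flags --alpha_sr, --alpha_ri, --alpha_rs and --alpha_rd and the function `parse_alphas` are hypothetical, not options the repository already provides.

```python
import argparse

OPERATIONS = ("sr", "ri", "rs", "rd")  # synonym replacement, random insertion, swap, deletion


def parse_alphas(argv=None):
    """Parse a shared --alpha plus optional per-operation overrides.

    Hypothetical flags: --alpha_sr, --alpha_ri, --alpha_rs, --alpha_rd.
    Any override that is not given falls back to the shared --alpha.
    """
    ap = argparse.ArgumentParser()
    ap.add_argument("--alpha", type=float, default=0.1)
    for op in OPERATIONS:
        ap.add_argument(f"--alpha_{op}", type=float, default=None)
    args = ap.parse_args(argv)

    alphas = {}
    for op in OPERATIONS:
        value = getattr(args, f"alpha_{op}")
        # Use the override if present, otherwise the shared value.
        alphas[f"alpha_{op}"] = args.alpha if value is None else value
    return alphas
```

The four resulting values would then replace alpha in the call shown above (alpha_sr=alphas["alpha_sr"], and so on), assuming the augmentation function accepts separate keyword arguments as that line suggests.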

Can I do without the labels?

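I see that each line of the data starts with a 0/1 label. I only want the augmented sentences, for machine translation. Can I do without the labels, or just use sequential numbers such as 0, 1, 2, 3, 4, 5, 6?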
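One workaround is to add a placeholder label before augmentation and strip it afterwards. The sketch below assumes the input format is label, tab, sentence, which the `sentence = parts[1]` line in the traceback further down suggests. The helper names are hypothetical.

```python
def add_dummy_labels(sentences, sequential=False):
    """Prefix each sentence with a placeholder label and a tab.

    With sequential=True the label is the line index (0, 1, 2, ...);
    otherwise every line gets the constant label "0".
    """
    return [f"{i if sequential else 0}\t{s}" for i, s in enumerate(sentences)]


def strip_labels(lines):
    """Drop everything up to and including the first tab on each line."""
    return [line.split("\t", 1)[1] if "\t" in line else line for line in lines]
```

Sequential IDs have a side benefit for machine translation: each augmented line can be traced back to its source sentence, which helps keep it paired with the right target sentence.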

Error when testing with the original data: IndexError: list index out of range

Generating augmented sentences with EDA...
Traceback (most recent call last):
  File "C:\Users\HP-OMEN\Desktop\project\code\EDA_NLP_for_Chinese-master\EDA_NLP_for_Chinese-master\code\augment.py", line 54, in
    gen_eda(args.input, output, alpha=alpha, num_aug=num_aug)
  File "C:\Users\HP-OMEN\Desktop\project\code\EDA_NLP_for_Chinese-master\EDA_NLP_for_Chinese-master\code\augment.py", line 44, in gen_eda
    sentence = parts[1]
IndexError: list index out of range

The generated output.txt contains:
0 今天天气 很棒 哦 。
0 今天天气 不错 哦 。
0 哟 不错 哦 。
0 喔 不错 哦 。
0 今天天气 哈哈哈 不错 哦 。
0 今天天气 不错 吧 哦 。
0 今天天气 不错 哦
0 今天天气 不错 哦
0 今天天气 不错 。 哦
0 yoi 不错 哦 。
0 今天天气 不错 哦 呵呵 。
0 今儿个 今天天气 不错 哦 。
0 今天天气 很棒 哦 。
0 。 不错 哦 今天天气
0 今天天气 不错 哦 。
0 今天天气 不错 。 哦
0 今天天气 不错 哦 。
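The traceback shows `sentence = parts[1]` failing, which means some input line split into fewer than two fields. Since the input file is not shown, the cause is an assumption. Two likely candidates are a blank line at the end of the file, or a label separated from the sentence by spaces instead of a tab. Below is a minimal defensive parser that skips such lines and reports where they are. The function name is hypothetical.

```python
def parse_labeled_lines(lines):
    """Split "label<TAB>sentence" lines, skipping malformed ones.

    A line without a tab (e.g. a blank trailing line, or a label followed
    by spaces) gives a one-element list from split("\t"), so indexing
    parts[1] would raise IndexError. Such lines, and lines with an empty
    sentence, are skipped and their 1-based line numbers are returned.
    """
    pairs, bad = [], []
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.rstrip("\r\n").split("\t")
        if len(parts) < 2 or not parts[1].strip():
            bad.append(lineno)
            continue
        pairs.append((parts[0], parts[1]))
    return pairs, bad
```

Reporting the bad line numbers, rather than silently dropping them, makes it easy to check whether the input file uses tabs as the separator.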

Question about n

n_sr = max(1, int(alpha_sr * num_words))
n_ri = max(1, int(alpha_ri * num_words))
n_rs = max(1, int(alpha_rs * num_words))

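Why compare with 1 here? Doesn't this mean that replacement, deletion or insertion can change at most one word? (I'm not sure whether I've misunderstood; please correct me!)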
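A worked version of the formula above may help. `max(1, ...)` sets a floor, not a ceiling. Because `int()` truncates, alpha * num_words rounds down to 0 for short sentences, and the floor guarantees that at least one word is still changed. The helper name `n_changes` is illustrative only.

```python
def n_changes(alpha, num_words):
    """Number of words an operation touches: int(alpha * num_words), but never less than 1."""
    return max(1, int(alpha * num_words))


# alpha = 0.05, 10 words: int(0.5) == 0, raised to the floor of 1
print(n_changes(0.05, 10))   # 1
# alpha = 0.05, 100 words: int(5.0) == 5, the floor has no effect
print(n_changes(0.05, 100))  # 5
```

In general, the floor only applies when num_words < 1 / alpha. Longer sentences get proportionally more edits.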

Hello, what is the label for?

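Hello, I have also read all 9 issues, as well as the EDA code.
But apart from being read in and written back out, I couldn't find that the label plays any specific role in RS or the other operations.
If I've misunderstood, please let me know.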
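In EDA-style augmentation, the label typically plays no part in SR, RI, RS or RD. That is consistent with what this issue describes. It is simply copied onto every sentence generated from the original, on the assumption that small edits preserve the class. Below is a minimal sketch of that pass-through. `augment` is a hypothetical stand-in for the repository's augmentation function.

```python
from typing import Callable, List


def augment_with_labels(label: str, sentence: str,
                        augment: Callable[[str], List[str]]) -> List[str]:
    """Return "label<TAB>variant" lines for each augmented variant.

    The label is never inspected or modified; it is attached unchanged
    to every sentence that `augment` produces from the original.
    """
    return [f"{label}\t{variant}" for variant in augment(sentence)]
```

Because the label is only carried along, any placeholder value works when the labels are not needed downstream.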
