
Core Dumped during Saving model (sentencepiece issue, closed)

google commented on May 5, 2024

Core Dumped during Saving model


Comments (5)

taku910 commented on May 5, 2024

This error happens when the input training data is too small. Could you try running spm_train with more data?

spm_train first generates a minimum vocabulary set consisting of:

- (almost) all unique characters in the training data, and
- substrings that occur at least twice in the corpus. These substrings never cross a word boundary (whitespace).

For example, if the input corpus is "a b c d a b c", the minimum vocabulary set has size 3 (a, b, and c).
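The two criteria can be illustrated with a small toy function. This is a simplified sketch, not spm_train's implementation: it assumes plain whitespace splitting, keeps every character (it ignores the "(almost)" qualifier), and the name `minimum_vocabulary` and its `min_count` parameter are hypothetical.

```python
from collections import Counter


def minimum_vocabulary(corpus, min_count=2):
    """Toy version of the two criteria (hypothetical; not spm_train's code).

    Returns (characters, repeated_substrings), where
      characters          = every unique non-whitespace character
                            (the "(almost)" qualifier is ignored here), and
      repeated_substrings = substrings seen at least `min_count` times,
                            counted only inside whitespace-separated words
                            so that no substring crosses a word boundary.
    """
    words = corpus.split()
    characters = {ch for word in words for ch in word}

    counts = Counter()
    for word in words:
        # Enumerate every substring of this word: O(len(word)^2), fine for a toy corpus.
        for start in range(len(word)):
            for end in range(start + 1, len(word) + 1):
                counts[word[start:end]] += 1

    repeated = {piece for piece, n in counts.items() if n >= min_count}
    return characters, repeated


chars, repeated = minimum_vocabulary("a b c d a b c")
print(sorted(chars))     # ['a', 'b', 'c', 'd']
print(sorted(repeated))  # ['a', 'b', 'c']
```

On the example corpus, the repeated-substring criterion yields exactly the three pieces a, b, and c; d occurs only once, so only the character criterion covers it.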

spm_train fails when the size of the minimum vocabulary set is smaller than --vocab_size.
I will make the error message easier to understand.


taku910 commented on May 5, 2024

I've added a --hard_vocab_limit flag.
When running ./spm_train --hard_vocab_limit=false, the vocab size is treated as a "soft limit".
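The failure condition and the new flag boil down to one decision. The sketch below is hypothetical: `resolve_vocab_size` and its `available` argument are illustrative names, and it only models the described meaning of --hard_vocab_limit (hard limit: fail when the requested size cannot be reached; false: treat the vocab size as a soft limit), not how spm_train actually counts the pieces it can build.

```python
def resolve_vocab_size(requested, available, hard_vocab_limit=True):
    """Hypothetical sketch of a hard vs. soft vocab-size limit (not spm_train's code).

    requested        -- the value passed as --vocab_size
    available        -- how many pieces training can actually produce from the corpus
    hard_vocab_limit -- mirrors the --hard_vocab_limit flag
    """
    if available >= requested:
        return requested
    if hard_vocab_limit:
        # Hard limit: the requested size is unreachable, so stop with an explicit error.
        raise ValueError(
            f"vocab_size={requested} requested, but only {available} pieces "
            "can be built from this corpus; lower vocab_size or add more data"
        )
    # Soft limit: treat vocab_size as an upper bound and keep whatever is available.
    return available


print(resolve_vocab_size(8000, 10000))                          # 8000
print(resolve_vocab_size(16000, 1200, hard_vocab_limit=False))  # 1200
```

With the soft limit, the resulting vocabulary can end up smaller than the requested --vocab_size instead of training aborting.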

Thank you.


jonsafari commented on May 5, 2024

A similar thing happened to me on a 256 GB Debian 8 x64 machine with --vocab_size=16000 and the unigram model.


zachmayer commented on May 5, 2024

I got the same error and fixed it by decreasing the vocab size. I think a vocabulary that is too large can also cause problems when saving the final results.


codeman38 commented on May 5, 2024

I feel like it would be useful to have an option that treats vocab_size as an upper bound rather than a hard value. That is, if this option were enabled and vocab_size turned out to be too large, spm_train would fall back to the minimum vocabulary rather than aborting.
