[W109] Unable to save user hooks while serializing the doc (spacy-stanza, closed)

gustavengstrom commented on September 24, 2024

Saving the doc raises the warning: [W109] Unable to save user hooks while serializing the doc

from spacy-stanza.

Comments (3)

adrianeboyd commented on September 24, 2024

The user hooks are just for the vectors, if the stanza model provides pretrained word embeddings. This doesn't affect the lexeme attributes like is_punct.

To get the lexeme attributes, you need to reload the doc with the vocab for the correct language, instead of using Vocab(), which is blank and doesn't know anything about the English defaults. What you want instead:

import spacy
from spacy.tokens import Doc

new_nlp = spacy.blank("en")
doc = Doc(new_nlp.vocab).from_disk('filename')
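
The same reload can be wrapped in a small helper. This is only a sketch: `reload_doc` and its `lang` parameter are names made up for illustration and are not part of spaCy or spacy-stanza. It assumes the file was written with `doc.to_disk(...)`.

```python
from pathlib import Path


def reload_doc(path, lang="en"):
    """Load a serialized Doc using the vocab of a blank pipeline for `lang`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Imported inside the function so the helper can be defined
    # without loading spaCy up front.
    import spacy
    from spacy.tokens import Doc

    # A blank pipeline carries the language defaults (including the
    # lexeme attribute getters behind is_punct) but no trained components.
    nlp = spacy.blank(lang)
    return Doc(nlp.vocab).from_disk(Path(path))


# Example usage, assuming 'test.spacy' exists:
# doc = reload_doc("test.spacy")
# print([(t.text, t.is_punct) for t in doc])
```

Using `Vocab()` in place of `nlp.vocab` here would still load the tokens, but attributes such as `is_punct` would not reflect the English defaults.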

In terms of the user hooks, there's a good chance you might not be using them at all? It's only the values you get for token.vector or doc.vector that would be affected.

Here's how they are added initially, if you need to redo this:

spacy-stanza/spacy_stanza/tokenizer.py

Lines 179 to 181 in a87c723

if self.svecs is not None:
    doc.user_token_hooks["vector"] = self.token_vector
    doc.user_token_hooks["has_vector"] = self.token_has_vector
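
If the vectors are not needed, one option is to silence just this warning when saving. If they are needed, the hooks can be put back after reloading. The sketch below shows both. `save_without_w109` and `reattach_vector_hooks` are hypothetical helper names, and the two callables passed in stand for whatever vector functions you have available (in spacy-stanza these are methods of the tokenizer, as in the lines quoted above).

```python
import warnings


def save_without_w109(doc, path):
    """Write `doc` to `path`, ignoring only the W109 user-hooks warning."""
    with warnings.catch_warnings():
        # The regex is matched against the start of the warning message.
        warnings.filterwarnings("ignore", message=r"\[W109\]")
        doc.to_disk(path)


def reattach_vector_hooks(doc, token_vector, token_has_vector):
    """Re-add the two token hooks to a reloaded doc.

    `token_vector` and `token_has_vector` are assumed to be callables
    that take a token, mirroring the hooks set in spacy-stanza's tokenizer.
    """
    doc.user_token_hooks["vector"] = token_vector
    doc.user_token_hooks["has_vector"] = token_has_vector
    return doc
```

The filter is scoped to the `with` block, so other warnings raised elsewhere in the program are still shown.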

gustavengstrom commented on September 24, 2024

Worked! Thanks for the super speedy reply...

import spacy_stanza
from spacy.tokens import Doc

nlp = spacy_stanza.load_pipeline("en")
doc = Doc(nlp.vocab).from_disk('test.spacy')

adrianeboyd commented on September 24, 2024

Glad to hear it's working! (Loading the whole stanza pipeline might be a bit slow if all you're trying to do is reload docs. You get the same "en" language defaults with spacy.blank("en"), which is much faster.)
