Hello, I am currently developing a Japanese model and have been referencing the "ecir2


Inquiry about Configuration Details for "ecir23-scratch-tydi-japanese-splade" Model

kuro96al commented on May 27, 2024

Inquiry about Configuration Details for "ecir23-scratch-tydi-japanese-splade" Model


Comments (4)

carlos-lassance commented on May 27, 2024

Hi @kuro96al,

We pretrained the model from scratch on the Japanese Mr. TyDi corpus (https://github.com/castorini/mr.tydi). We then trained it with a contrastive loss on Japanese mMARCO (https://github.com/unicamp-dl/mMARCO), and finally fine-tuned it on the Japanese Mr. TyDi train query set.

The model uses a DistilBERT architecture (6 layers, hidden size 768), but, as said above, it is initialized randomly and then trained as described in the previous paragraph.
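As an illustration of "initialized randomly", here is a minimal sketch using the Hugging Face `transformers` library. It is not the authors' training code. The vocabulary size is a hypothetical placeholder, since the thread does not state it; in practice it would have to match whatever tokenizer is built for the Japanese corpus.

```python
from transformers import DistilBertConfig, DistilBertForMaskedLM

# Hypothetical value: the thread does not give the real vocabulary size.
VOCAB_SIZE = 32_000

# 6 layers and hidden size 768, as described in the comment above.
config = DistilBertConfig(vocab_size=VOCAB_SIZE, n_layers=6, dim=768)

# Building the model from a config (rather than from_pretrained) gives
# randomly initialized weights, ready for pretraining from scratch.
model = DistilBertForMaskedLM(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} randomly initialized parameters")
```

A masked-language-model head is used here because SPLADE builds its term weights from MLM logits over the vocabulary.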

For more information, here is a paper describing the strategies we used to develop that model and what we were aiming for: https://arxiv.org/pdf/2301.10444.pdf


kuro96al commented on May 27, 2024

Thank you for your response. Is the pre-trained model uploaded on platforms like Hugging Face?


kuro96al commented on May 27, 2024

We attempted to train SPLADE based on the model found at https://huggingface.co/line-corporation/line-distilbert-base-japanese/tree/main, but it seems that there were issues with the vocabulary that prevented successful training.


carlos-lassance commented on May 27, 2024

> Thank you for your response. Is the pre-trained model uploaded on platforms like Hugging Face?

Unfortunately, it is not, and we are not sure we still have it.

> We attempted to train SPLADE based on the model found at https://huggingface.co/line-corporation/line-distilbert-base-japanese/tree/main, but it seems that there were issues with the vocabulary that prevented successful training.

Yeah, we ran into similar problems with many models, which is one of the reasons we went with training a model from scratch.
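The thread does not say what the vocabulary issue was. For context on why the vocabulary matters at all, a SPLADE representation has one dimension per entry in the backbone's MLM vocabulary, with each weight computed as the maximum over input tokens of log(1 + ReLU(logit)). The sketch below shows that pooling in plain Python; the toy vocabulary and logits are made up for illustration.

```python
import math

def splade_weights(token_logits):
    """Max-pool log(1 + ReLU(logit)) over tokens, one weight per vocab entry.

    token_logits: list of per-token lists of MLM logits (one float per vocab entry).
    """
    vocab_size = len(token_logits[0])
    weights = [0.0] * vocab_size
    for logits in token_logits:
        for j, logit in enumerate(logits):
            weights[j] = max(weights[j], math.log1p(max(logit, 0.0)))
    return weights

# Hypothetical toy vocabulary and MLM logits for a two-token input.
vocab = ["[PAD]", "東京", "大学", "猫"]
logits = [
    [-1.0, 2.0, 0.5, -3.0],
    [-2.0, 0.0, 1.0, -0.5],
]

weights = splade_weights(logits)
# Keep only the non-zero dimensions, as a sparse term -> weight mapping.
sparse = {vocab[j]: w for j, w in enumerate(weights) if w > 0}
print(sparse)
```

Because every output dimension is a vocabulary entry, the tokenizer and MLM head of the backbone directly determine which terms the model can index and expand to.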
