Comments (10)
Hi @GeorgeS2019
You could use Seq2SeqSharp to train GPT-x models only if you have a training data set for it. They are all Transformer-based models, and the data set is masked text.
from seq2seqsharp.
Hi @axel578 ,
In the demo and release package, the SNT files are data sets for training and test, not vocab files.
A vocab file can either be generated from an SNT file, or external files can be used as vocab files.
In a vocab file, there is one token per line, and each line has two parts: [token] \t [weight]
[weight] can be any value you want; Seq2SeqSharp doesn't use these [weight] values for now.
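To make the format concrete, here is a toy sketch (not from the project) that parses lines in the `[token]\t[weight]` shape described above; the tokens and weights are made up for illustration:

```python
def parse_vocab(lines):
    """Parse '[token]\\t[weight]' lines into a dict.
    Seq2SeqSharp ignores the weights, so we just keep them as floats."""
    vocab = {}
    for line in lines:
        token, weight = line.rstrip("\n").split("\t")
        vocab[token] = float(weight)
    return vocab

# Illustrative vocab file contents: one token per line, tab-separated weight.
sample = ["the\t1000\n", "cat\t42\n", "</s>\t1\n"]
vocab = parse_vocab(sample)
print(vocab["cat"])  # 42.0
```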
Thanks
Zhongkai Fu
from seq2seqsharp.
Thanks for the answer !
I'd like to know what the two src and target models (enuSpm.model) are in fiction text generation.
.\bin\Seq2SeqConsole\Seq2SeqConsole.exe -Task Test -ModelFilePath .\model\seq2seq_fiction.model -InputTestFile .\data\test\test_fiction.txt -OutputPromptFile .\data\test\test_fiction.txt -OutputFile out_fiction.txt -MaxTestSrcSentLength 256 -MaxTestTgtSentLength 512 -ProcessorType CPU -SrcSentencePieceModelPath .\spm\enuSpm.model -TgtSentencePieceModelPath .\spm\enuSpm.model -BeamSearchSize 1 -DeviceIds 0,1,2,3 -DecodingStrategy Sampling -DecodingRepeatPenalty 10
from seq2seqsharp.
For this command line, "test_fiction.txt" is the input file. It's used as input for the encoder and as the prompt for the decoder. "out_fiction.txt" is the output file generated by the decoder.
from seq2seqsharp.
Different Text Generation Strategies: ArgMax, Beam Search, Top-P Sampling
Just curious, how similar or dissimilar is this language generation implementation to e.g. GPT-x?
from seq2seqsharp.
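As a rough illustration of two of the decoding strategies named above, here is a toy sketch over a hand-made next-token distribution (the tokens and probabilities are invented; this is not Seq2SeqSharp's implementation):

```python
# Hypothetical next-token probability distribution for illustration.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xyzzy": 0.05}

def argmax_decode(probs):
    """ArgMax: always pick the single most likely token."""
    return max(probs, key=probs.get)

def top_p_filter(probs, p=0.9):
    """Top-P (nucleus) sampling keeps the smallest set of tokens whose
    cumulative probability reaches p, then samples from that set."""
    kept, total = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    z = sum(kept.values())
    return {tok: pr / z for tok, pr in kept.items()}  # renormalize

print(argmax_decode(probs))         # the
print(sorted(top_p_filter(probs)))  # ['a', 'cat', 'the']
```

ArgMax is deterministic and tends to repeat itself; top-p sampling trades some determinism for diversity by cutting off the unlikely tail ("xyzzy" above) before sampling.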
Yes, but what is .\spm\enuSpm.model for? Is it for the vocabulary? In my scenario, the vocabulary is a bunch of code, different from the usual vocabulary.
from seq2seqsharp.
enuSpm.model is a SentencePiece model used to encode/decode subword-level tokens. Seq2SeqSharp can directly call the SentencePiece APIs for subword-level encoding and decoding. SentencePiece has its own subword-level vocabulary, which is different from your vocabulary.
I don't think you need to care about it, because with the parameters "-SrcSentencePieceModelPath" and "-TgtSentencePieceModelPath", Seq2SeqSharp can automatically encode words in your vocabulary to subwords in the model vocabulary, and decode subwords back to words. With these two parameters, if you don't have a subword-level vocabulary, you can set "SrcVocab" and "TgtVocab" to empty and ask Seq2SeqSharp to generate the vocabulary from the training set. For inference, the model itself already includes the vocabulary.
from seq2seqsharp.
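To illustrate the idea of subword encoding mentioned above, here is a toy greedy longest-match tokenizer. It is a stand-in for illustration only; SentencePiece's actual algorithms (BPE/unigram) and the vocabulary in enuSpm.model are different, and the subword set below is invented:

```python
# Made-up subword vocabulary for illustration only.
SUBWORDS = {"un", "break", "able", "a", "b", "l", "e"}

def to_subwords(word):
    """Greedily split a word into the longest known subwords,
    falling back to single characters for unknown spans."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown char passes through
            i += 1
    return pieces

print(to_subwords("unbreakable"))  # ['un', 'break', 'able']
```

The point is that a word outside your word-level vocabulary can still be represented as a sequence of known subwords, which is why the subword model's vocabulary does not need to match yours.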
You could use Seq2SeqSharp to train GPT-x models only if you have training data set for it
Is importing trained weights from e.g. a GPT-2 .onnx file into a Seq2SeqSharp model, to avoid training, still part of the long-term plan?
from seq2seqsharp.
Yes... it's still a long-term plan, but I don't have a specific timeline for it.
I actually chatted with the ONNX Runtime team last year, and operator translation between Seq2SeqSharp and ONNX is pretty straightforward. But it's not an urgent task for Seq2SeqSharp right now, because Seq2SeqSharp already supports large-model training and fine-tuning for my daily work, and my work is not based on GPT-X models.
Thanks
Zhongkai Fu
from seq2seqsharp.
I know there is a need for modular and reusable components, but it's not a high priority or urgent for me right now. That's why I say it's a long-term plan.
from seq2seqsharp.
Related Issues (20)
- Didn't save the model? HOT 7
- Error: C# 8.0 language feature HOT 1
- sentencepiece.dll problem in the API HOT 2
- SeqClassification Validation HOT 16
- Exception: 'The weight '.LayerNorm' has been released, you cannot access it.' HOT 10
- CPU_MKL Error converting value "CPU_MKL" to type 'Seq2SeqSharp.ProcessorTypeEnums HOT 6
- sqc.m_srcEmbedding_p.GetNetworkOnDevice(k).GetWeightAt() HOT 1
- GPTconsole HOT 4
- Target vocabulary size fixed to 45000 HOT 5
- Contextual embeddings HOT 22
- Train with general sequences of symbols HOT 2
- Moment of updating weights HOT 4
- Issues to get started with "Seq2SeqClassificationConsole" HOT 33
- Matrix initialization method HOT 4
- No requirement.txt in this repo HOT 2
- Sudden high increase in memory consumption while training a seq2seq model and validation happens HOT 11
- Setting FocalLossGamma = 2 causes weight corruption in the beginning of the seq2seq model training
- The checkpoint to save the model regularly should not depend on validation HOT 1
- Serialization of Seq2seq model is wrong
- SeqLabel model backward compatibility is broken by latest update HOT 3