Comments (10)

zhongkaifu commented on May 18, 2024

Hi @GeorgeS2019

You can use Seq2SeqSharp to train GPT-x models as long as you have a training data set for them. They are all Transformer-based models, and the data set is masked text.

zhongkaifu commented on May 18, 2024

Hi @axel578 ,

In the demo and release package, the SNT files are the data sets for training and testing, not vocab files.

A vocab file can either be generated from an SNT file, or you can supply external files as vocab files.

In a vocab file there is one token per line, and each line has two parts: [token] \t [weight].
[weight] can be any value you want; Seq2SeqSharp doesn't use it for now.
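
For illustration, a minimal vocab file of this shape (made-up tokens and placeholder weights, tab-separated) might look like:

    the     1000
    cat     250
    sat     120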

Thanks
Zhongkai Fu

axel578 commented on May 18, 2024

Thanks for the answer!

I'd like to know what the two src and target models are for fiction text generation (enuSpm.model):

.\bin\Seq2SeqConsole\Seq2SeqConsole.exe -Task Test -ModelFilePath .\model\seq2seq_fiction.model -InputTestFile .\data\test\test_fiction.txt -OutputPromptFile .\data\test\test_fiction.txt -OutputFile out_fiction.txt -MaxTestSrcSentLength 256 -MaxTestTgtSentLength 512 -ProcessorType CPU -SrcSentencePieceModelPath .\spm\enuSpm.model -TgtSentencePieceModelPath .\spm\enuSpm.model -BeamSearchSize 1 -DeviceIds 0,1,2,3 -DecodingStrategy Sampling -DecodingRepeatPenalty 10

zhongkaifu commented on May 18, 2024

For this command line, "test_fiction.txt" is the input file: it's used as input for the encoder and as the prompt for the decoder. "out_fiction.txt" is the output file generated by the decoder.

GeorgeS2019 commented on May 18, 2024

@zhongkaifu

Different text generation strategies: ArgMax, Beam Search, Top-P Sampling.

Just curious: how is this language generation implementation similar to or different from, e.g., GPT-x?
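
For readers comparing the strategies named above, here is a rough Python sketch (an illustration only, not Seq2SeqSharp's actual implementation) of how ArgMax and Top-P Sampling differ when picking the next token from a logits vector. Beam search, by contrast, keeps the top-K scoring partial hypotheses at every step instead of committing to a single token:

    import numpy as np

    def argmax_pick(logits):
        # ArgMax (greedy): always take the single most probable token.
        return int(np.argmax(logits))

    def top_p_pick(logits, p=0.9, rng=None):
        # Top-P (nucleus) sampling: keep the smallest set of tokens whose
        # cumulative probability reaches p, then sample within that set.
        if rng is None:
            rng = np.random.default_rng()
        probs = np.exp(logits - logits.max())          # softmax, numerically stable
        probs /= probs.sum()
        order = np.argsort(probs)[::-1]                # most probable first
        cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
        keep = order[:cutoff]                          # the "nucleus" of tokens
        return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

    logits = np.array([2.0, 1.0, 0.5, -1.0])           # toy next-token scores
    print(argmax_pick(logits))                         # deterministic: always 0
    print(top_p_pick(logits))                          # stochastic: varies per run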

axel578 commented on May 18, 2024

Yes, but what is .\spm\enuSpm.model for? Is it the vocabulary? In my scenario the vocabulary is a bunch of code, quite different from a usual vocabulary.

zhongkaifu commented on May 18, 2024

enuSpm.model is the SentencePiece model used to encode/decode subword-level tokens. Seq2SeqSharp can directly call SentencePiece APIs for subword-level encoding and decoding. SentencePiece has its own subword-level vocabulary, which is different from your vocabulary.

I don't think you need to care about it. With the parameters "-SrcSentencePieceModelPath" and "-TgtSentencePieceModelPath", Seq2SeqSharp can automatically encode words in your vocabulary to subwords in the model vocabulary, and decode subwords back to words. With these two parameters, if you don't have a subword-level vocabulary, you can set "SrcVocab" and "TgtVocab" to empty and ask Seq2SeqSharp to generate the vocabulary from the training set. For inference, the model itself already includes the vocabulary.
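
As a rough sketch of what that SentencePiece round trip does, shown here with the official sentencepiece Python bindings rather than the native API Seq2SeqSharp calls, and assuming an enuSpm.model file on disk:

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="enuSpm.model")
    pieces = sp.encode("The quick brown fox", out_type=str)  # words -> subword pieces
    print(pieces)             # e.g. ['▁The', '▁quick', '▁brown', '▁fox']
    print(sp.decode(pieces))  # subword pieces -> "The quick brown fox"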

GeorgeS2019 commented on May 18, 2024

@zhongkaifu

> You can use Seq2SeqSharp to train GPT-x models as long as you have a training data set for them

Is importing trained weights from, e.g., a GPT-2 .onnx file into a Seq2SeqSharp model (to avoid training) still part of the long-term plan?

zhongkaifu commented on May 18, 2024

> Is importing trained weights from, e.g., a GPT-2 .onnx file into a Seq2SeqSharp model (to avoid training) still part of the long-term plan?

Yes, it's still a long-term plan, but I don't have a specific timeline for it.

I actually chatted with the ONNX Runtime team last year, and operator translation between Seq2SeqSharp and ONNX is pretty straightforward. But it's not an urgent task for Seq2SeqSharp right now, because Seq2SeqSharp already supports the large-model training and fine-tuning I need for my daily work, and my work is not based on GPT-X models.

Thanks
Zhongkai Fu

zhongkaifu commented on May 18, 2024

I know there is a need for modularity and reusability, but it's not high-priority or urgent for me right now. That's why I say it's a long-term plan.
