zhongkaifu / seq2seqsharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). Its highlighted features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

License: Other

C# 98.97% C++ 0.49% C 0.05% Batchfile 0.01% HTML 0.34% CSS 0.01% JavaScript 0.01% Dockerfile 0.02% Python 0.12%

seq2seq encoder-decoder neural-network deep-learning lstm attention-model cuda gpu tensor machine-translation

seq2seqsharp's Issues

Sequence Label task

Hi @zhongkaifu ,
Congratulations on the new lib. I'm very excited about it!

One question: do you think it is viable to implement a Sequence Label task using Seq2SeqSharp, or is RNNSharp better suited for it?

Matrix initialization method

Hi, Zhongkaifu.

I would like to know which initialization method is used for the matrices.

Thanks a lot.

FileNotFoundException

Hi!
I tried to run your program in cmd as follows:
Seq2SeqConsole.exe -TaskName Train -WordVectorSize 512 -HiddenSize 512 -SrcLang en -TgtLang de -TrainCorpusPath ./corpus -ArchType CPU

This raised an Exception:
Unhandled exception: System.IO.FileNotFoundException: The file or assembly "System.Numerics.Vectors, Version=4.1.4.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" or one of its dependencies was not found. The system cannot find the specified file.
at Seq2SeqConsole.Program.ShowOptions(String[] args, Options options)
at Seq2SeqConsole.Program.Main(String[] args) in D:\Seq2SeqSharp_master\Seq2SeqConsole\Program.cs:line 67.

I created a new folder named corpus in the same directory as the exe. In that folder I put two files named train01.de.snt and train01.en.snt, which contain 40 sentences each.

Do you know what I did wrong?
Thanks in advance!

Could Seq2SeqSharp contribute to Tensorflow.NET NLU features in win-win way?

Suggestions

Currently Tensorflow.NET lacks key Keras layers relevant for NLU (e.g. seq2seq and transformer).
For example:
- Normalization layers
  - LayerNormalization
- Attention layers
  - MultiHeadAttention
  - Attention

LayerNormalization, MultiHeadAttention and Attention layers in C# (for example) have been implemented in Seq2SeqSharp

The questions:

Could this Seq2SeqSharp code be useful to speed up the implementation of NLU Keras layers in Tensorflow.NET?
Is there a need for Seq2SeqSharp to adopt Keras layer naming?
- This could help code implemented in Tensorflow.NET to be relevant for Seq2SeqSharp and vice versa.
- This would also make more developers aware of the transformer framework in Seq2SeqSharp.
- Hopefully more .NET developers could join and speed up transformer NLU development for the .NET community.

VisNNFile is not a valid parameter

Seq2SeqConsole.exe -Task VisualizeNetwork -VisNNFile Seq2SeqConsole.png -EncoderType Transformer -EncoderLayerDepth 2 -DecoderLayerDepth 2

VisNNFile is not a valid parameter

Is the Task VisualizeNetwork working?

==> Found the network visualization code. Is there a plan to bring it back in the future?

sentencepiece.dll problem in the API

Dear Zhongkai,

We have a problem running the API. We are running a sequence to sequence topology.

When the Startup procedure fires and reaches the statement "var srcSentPiece = new SentencePiece( Configuration[ "Seq2Seq:SrcSentencePieceModelPath" ] );", a library called sentencepiece is needed.

We put sentencepiece.dll in the bin folder, but this doesn't work.

The fact is that our model was not trained with that tokenizer, so it is useless for us.

Two questions:

Why are "SrcSentencePieceModelPath" and "TgtSentencePieceModelPath" mandatory?
How can we load sentencepiece.dll in your API project?

`
public Startup( IConfiguration configuration )
{
    int maxTestSrcSentLength;
    int maxTestTgtSentLength;
    ProcessorTypeEnums processorType;
    string deviceIds;

    Configuration = configuration;

    if ( !Configuration[ "Seq2Seq:ModelFilePath" ].IsNullOrEmpty() )
    {
        Logger.WriteLine( $"Loading Seq2Seq model '{Configuration[ "Seq2Seq:ModelFilePath" ]}'" );

        var modelFilePath = Configuration[ "Seq2Seq:ModelFilePath" ];
        maxTestSrcSentLength = Configuration[ "Seq2Seq:MaxSrcTokenSize" ].ToInt();
        maxTestTgtSentLength = Configuration[ "Seq2Seq:MaxTgtTokenSize" ].ToInt();
        processorType = Configuration[ "Seq2Seq:ProcessorType" ].ToEnum< ProcessorTypeEnums >();
        deviceIds = Configuration[ "Seq2Seq:DeviceIds" ];

        // The two lines below are the ones that require sentencepiece.
        var srcSentPiece = new SentencePiece( Configuration[ "Seq2Seq:SrcSentencePieceModelPath" ] );
        var tgtSentPiece = new SentencePiece( Configuration[ "Seq2Seq:TgtSentencePieceModelPath" ] );

        Seq2SeqInstance.Initialization( modelFilePath, maxTestSrcSentLength, maxTestTgtSentLength,
                                        processorType, deviceIds, (srcSentPiece, tgtSentPiece) );
    }
}
`
Many thanks

CPU_MKL Error converting value "CPU_MKL" to type 'Seq2SeqSharp.ProcessorTypeEnums

Hi zhongkai:

If we understand correctly, to use MKL we need to:

Copy the files in the dll folder to our current working directory (the place with the console exe?).
Set ProcessorType to CPU_MKL.

The problem is that we get the error "Error converting value "CPU_MKL" to type 'Seq2SeqSharp.ProcessorTypeEnums'".

Is something wrong with our steps?

Thanks a lot

CS1003 Syntax error, ',' expected TensorSharp D:\Seq2SeqSharp-master (1)\Seq2SeqSharp-master\TensorSharp\Cpu\CpuRandom.cs

Hello Zhong,
First, I appreciate the contribution.
Previously I used RNNSharp successfully for an NER task, and I am curious to try this new version too. However, when I try to run it, it generates a huge error list, e.g.
"IntPtr is a type but is used like a variable", especially in the TensorSharp project in the following lines of code:
using (NativeWrapper.BuildTensorRefPtr(input_i, out IntPtr input_iPtr))
using (NativeWrapper.BuildTensorRefPtr(output_i, out IntPtr output_iPtr))
using (NativeWrapper.BuildTensorRefPtr(indices_i, out IntPtr indices_iPtr))
{
CpuOpsNative.TS_SpatialMaxPooling_updateOutput_frame(input_iPtr, output_iPtr, indices_iPtr,
Thanks in Advance

ManagedCuda.CudaException: ErrorOutOfMemory: The API call failed because it was unable to allocate enough memory to perform the requested operation

Can you share your experience, perhaps as an ordered list of which parameters to adjust, to limit the chance of running out of GPU memory?

Below is one use case which leads to the problem.
Seq2SeqConsole.exe -TaskName Train -WordVectorSize 512 -HiddenSize 512 -StartLearningRate 0.002 -EncoderLayerDepth 6 -DecoderLayerDepth 2 -TrainCorpusPath "../../corpus" -ModelFilePath "../../trainings/seq2seq512.model" -SrcLang chs -TgtLang enu -ProcessorType GPU -DeviceIds 0 -MaxEpochNum 2 -EncoderType Transformer -BatchSize 32 -MultiHeadNum 8 -MaxSentLength 128

Post Inference APIs or functions reusability

Consider provision of Seq2SeqSharp Post Inference functions as APIs or externally callable methods.

Currently Seq2SeqSharp has one of the most complete selections of commonly used .NET Transformer post-inference functions,

e.g. Text Generation Strategy: ArgMax, Beam Search, Top-P Sampling.
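
To make the request concrete, here is a minimal sketch of two of the strategies named above, written against plain probability arrays so that it does not depend on any particular framework. The class and method names (`DecodingSketch`, `ArgMax`, `TopPSample`) are hypothetical and are not Seq2SeqSharp's actual API; the input is assumed to be a non-empty array of probabilities that sums to 1.

```csharp
using System;
using System.Linq;

public static class DecodingSketch
{
    // Greedy choice: index of the largest probability.
    public static int ArgMax(double[] probs)
    {
        int best = 0;
        for (int i = 1; i < probs.Length; i++)
        {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }

    // Top-P (nucleus) sampling: keep the smallest set of highest-probability
    // tokens whose cumulative mass reaches topP, then sample within that set.
    public static int TopPSample(double[] probs, double topP, Random rng)
    {
        int[] order = Enumerable.Range(0, probs.Length)
                                .OrderByDescending(i => probs[i])
                                .ToArray();

        // Find where the cumulative probability first reaches topP.
        double mass = 0.0;
        int keep = 0;
        while (keep < order.Length)
        {
            mass += probs[order[keep]];
            keep++;
            if (mass >= topP) break;
        }

        // Draw from the kept tokens in proportion to their probability.
        double r = rng.NextDouble() * mass;
        double running = 0.0;
        for (int k = 0; k < keep; k++)
        {
            running += probs[order[k]];
            if (r < running) return order[k];
        }
        return order[keep - 1]; // guard against floating-point rounding
    }
}
```

Because the functions only take `double[]`, the same code could in principle sit behind ONNX Runtime, ML.NET or TorchSharp outputs once those are converted to a probability array.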

The .NET community needs these functions to work with different ML frameworks, e.g. .NET Onnx.Runtime, ML.NET, TorchSharp etc.

Consider this as one of the first steps towards the standardization of Seq2SeqSharp model import, internal representation and export that is consistent with ONNX.

There is an ongoing discussion on this topic.

In other words, we need these functions to support reusability beyond Seq2SeqSharp by extending to other ML frameworks.

FYI: this is related to many requests discussed before

Ultimately, it is time to accelerate the democratization of .NET Deep NLP, which was in some ways pioneered by Seq2SeqSharp.

What is SNT file and how to create a new file

Hello !

I would like to know if it is possible to create a new vocab snt file. I looked at this file in Notepad, but I'm not sure it's for vocabulary.

In my case I want to generate text that is some sort of XML file with a limited number of tokens.
None of the available snt files are suited to what I'm trying to do.

I would like to know if there is any way to generate a new snt file based on a custom vocab.

Monitor performance on some held-out set

Hi,
I found this project very useful when building my seq2seq models. I am wondering whether it is easy to report performance on a held-out set (validation set) during training. Where in the code should I look?
This would allow selecting the best model.
Thanks!
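
As an illustration of what is being asked for, the sketch below scores a model on held-out pairs and remembers the best checkpoint. It is framework-independent: `BestModelTracker`, `HeldOutDemo` and the exact-match metric are hypothetical names and choices made for this example, not code from Seq2SeqSharp, and the model is represented only as a `Func<string, string>`.

```csharp
using System;

// Hypothetical helper, not part of Seq2SeqSharp: tracks the best validation
// score seen so far so the caller knows when to keep a checkpoint.
public sealed class BestModelTracker
{
    public double BestScore { get; private set; } = double.NegativeInfinity;
    public int BestStep { get; private set; } = -1;

    // Returns true when this score beats every earlier one (higher is better).
    public bool Report(int step, double validScore)
    {
        if (validScore <= BestScore) return false;
        BestScore = validScore;
        BestStep = step;
        return true;
    }
}

public static class HeldOutDemo
{
    // Scores a model on held-out pairs as the fraction of exact matches.
    public static double ExactMatch(Func<string, string> predict,
                                    (string Src, string Tgt)[] heldOut)
    {
        int hits = 0;
        foreach (var (src, tgt) in heldOut)
        {
            if (predict(src) == tgt) hits++;
        }
        return heldOut.Length == 0 ? 0.0 : (double)hits / heldOut.Length;
    }
}
```

In a training loop one would call `ExactMatch` (or a task-appropriate metric such as BLEU) every N updates and save the model only when `Report` returns true.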

Chinese to English Translation model?

Do you have a plan to release a Chinese to English translation model?

GPU train error

Hello, the picture below shows that I used GPU for training, but the loading error occurred.

writing embedding matrix

Hi again:

As you know, we are researching the effect of a pretrained embedding matrix.
For this purpose, we are writing out [SeqClassification].m_srcEmbedding (the embedding matrix) before and after training, while "IsEmbeddingTrainable" is set to "true".

The fact is that the embedding matrix looks the same before and after. Maybe I misunderstand something. Should m_srcEmbedding change, given that the parameter "IsEmbeddingTrainable" is set to "true"?
We are still thinking about this case, but maybe you can advise us.

Thanks a lot
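
One thing worth ruling out in a before/after comparison like this (an assumption about a possible cause, not a diagnosis of the reported behavior) is that the "before" value is a reference to the same array that training later modifies in place. The sketch below uses plain `float[]` arrays and the hypothetical names `EmbeddingDiff`, `Snapshot` and `MaxAbsDiff`; it takes a real copy and reports the largest element-wise change, which also catches differences too small to see in printed output.

```csharp
using System;

public static class EmbeddingDiff
{
    // Copies the weights so later in-place updates cannot alter the snapshot.
    public static float[] Snapshot(float[] weights)
    {
        return (float[])weights.Clone();
    }

    // Largest absolute element-wise difference between two weight arrays.
    public static float MaxAbsDiff(float[] before, float[] after)
    {
        if (before.Length != after.Length)
            throw new ArgumentException("Arrays must have the same length.");

        float max = 0f;
        for (int i = 0; i < before.Length; i++)
        {
            float d = Math.Abs(before[i] - after[i]);
            if (d > max) max = d;
        }
        return max;
    }
}
```

If `MaxAbsDiff` on a true snapshot is still exactly zero after several updates, the embedding rows are most likely not receiving gradient updates at all.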

No handlers have been registered for op relud

Can you fix 'No handlers have been registered for op relud'?
I use the arguments Seq2SeqConsole.exe .... -ProcessorType CPU ...

Which are the right parameters in the configuration file if you want to use pretrained embeddings?

I am starting to use SeqClassification. I have sentences in the source file and a category in the target file.
My question is: which are the right parameters in the configuration file if you want to use pretrained embeddings? I have a txt2vec pretrained model and I want to test its performance.

I have "SrcEmbedding":"model.bin", but I guess some more parameters are needed to tell the exe that it has to use them.

Thanks a lot

Sample data files? Demo?

Thanks for sharing the code! Are you planning to add sample data files and/or a description of the input/output data formats?

Thanks a lot!

Training data format, template features and NER

Dear zhongkaifu,

Thanks for delivering this high-quality library for deep-learning tasks. I was wondering whether you could support the same training data format used in RNNSharp, where each column represented a given feature/label, instead of using separate files for features.
Also, are template features supported in SeqLabel? An example or explanation using SeqLabel for NER would be much appreciated...

Thanks in advance,
Nicolás

SeqSimilarityConsole

Hello, thank you very much for your constant efforts. Could you please provide an example of "SeqSimilarityConsole" ?

Train with general sequences of symbols

Hello

It's not quite obvious to me from the documentation or the examples whether Seq2SeqSharp is capable of handling generic sequences that are not necessarily language words and, if so, how to achieve this (i.e. tokens may not be separated by spaces).

Thanks
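
A common workaround for this kind of data is to preprocess it so that every symbol becomes one whitespace-separated token. The sketch below assumes (this is not confirmed anywhere on this page) that the corpus files expect one sequence per line with tokens separated by single spaces; `SymbolTokenizer` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SymbolTokenizer
{
    // Turns each symbol into one token and joins tokens with single spaces.
    // A space inside a symbol is replaced so it cannot split the token.
    public static string ToTokenLine(IEnumerable<string> symbols)
    {
        return string.Join(" ", symbols.Select(s => s.Replace(' ', '_')));
    }

    // Treats every character of a raw string as its own symbol.
    public static string CharsToTokenLine(string raw)
    {
        return ToTokenLine(raw.Select(c => c.ToString()));
    }
}
```

For example, `CharsToTokenLine("A1B")` yields the line `A 1 B`, and the inverse step after decoding is simply to remove the spaces again.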

Is there a need to integrate Byte-pair encodings (BPE)?

Byte-pair encodings (BPE) are now very commonly used in NLP.

Is there a plan to integrate BPE into Seq2SeqSharp in the future?

If so, will that be a C# wrapper (e.g. like the Swift wrapper) around e.g. FastBPE?

Would you consider a pure C# version of e.g. FastBPE [ link to pure python FastBPE ]?

This issue is more of a feature proposal. Looking forward to some feedback.
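
To give a sense of how small a pure C# version of the core algorithm can be, here is a minimal sketch of the BPE merge-learning step. It is not FastBPE and not a Seq2SeqSharp API: the class name `BpeSketch`, the tie-breaking rule and the omission of end-of-word markers are simplifying assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BpeSketch
{
    // Learns up to numMerges merge rules from word frequencies.
    public static List<(string Left, string Right)> LearnMerges(
        IDictionary<string, int> wordCounts, int numMerges)
    {
        // Each word starts as a list of single-character symbols.
        var words = wordCounts
            .Select(kv => (Symbols: kv.Key.Select(ch => ch.ToString()).ToList(),
                           Count: kv.Value))
            .ToList();
        var merges = new List<(string Left, string Right)>();

        for (int m = 0; m < numMerges; m++)
        {
            // Count every adjacent symbol pair, weighted by word frequency.
            var pairCounts = new Dictionary<(string, string), int>();
            foreach (var (symbols, count) in words)
            {
                for (int i = 0; i + 1 < symbols.Count; i++)
                {
                    var pair = (symbols[i], symbols[i + 1]);
                    pairCounts.TryGetValue(pair, out int seen);
                    pairCounts[pair] = seen + count;
                }
            }
            if (pairCounts.Count == 0) break;

            // Pick the most frequent pair; ties break on ordinal string order.
            var best = pairCounts
                .OrderByDescending(kv => kv.Value)
                .ThenBy(kv => kv.Key.Item1, StringComparer.Ordinal)
                .ThenBy(kv => kv.Key.Item2, StringComparer.Ordinal)
                .First().Key;
            merges.Add(best);

            // Replace each occurrence of the pair with the merged symbol.
            foreach (var (symbols, _) in words)
            {
                for (int i = 0; i + 1 < symbols.Count; i++)
                {
                    if (symbols[i] == best.Item1 && symbols[i + 1] == best.Item2)
                    {
                        symbols[i] = best.Item1 + best.Item2;
                        symbols.RemoveAt(i + 1);
                    }
                }
            }
        }
        return merges;
    }
}
```

This recounts all pairs on every iteration, which is fine for illustration but far slower than optimized implementations on large corpora.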

SeqLabelConsole Sequence tag

Hello, I am learning to use your framework and ran into a problem that I could not find covered in the help documentation. I hope you can help me. My training text format is as follows:
世界 n n 1_n
第 m m 1_a
八 m m -1_m
大 a a 1_n
奇迹 n n 1_v
出现 v v 0_Root

<unk> in vocab and embedding when using pretrained word representations

Hi zhongkai:

Is it necessary to have an <unk> symbol in the vocab file and its corresponding vector row in the embedding matrix?

I want to know how Seq2SeqSharp deals with words in a sequence that are not represented in the vocabulary and hence have no vector in the embedding matrix. Are they automatically mapped to <unk>, whose vector is then retrieved for training?

Thanks a lot

Guille
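
For readers unfamiliar with the convention being asked about, the sketch below shows the usual way out-of-vocabulary words are handled: every unknown word is mapped to the index of a single `<unk>` entry, so it needs exactly one row in the embedding matrix. This illustrates the general technique only; `VocabSketch`, the choice of index 0 for `<unk>` and space-based splitting are assumptions for the example and do not describe Seq2SeqSharp's internal behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class VocabSketch
{
    public const string Unk = "<unk>";
    private readonly Dictionary<string, int> _index = new Dictionary<string, int>();

    // Reserves index 0 for <unk>, then assigns indices to the known words.
    public VocabSketch(IEnumerable<string> words)
    {
        _index[Unk] = 0;
        foreach (string w in words)
        {
            if (!_index.ContainsKey(w)) _index[w] = _index.Count;
        }
    }

    // Words missing from the vocabulary fall back to the <unk> index,
    // so they all share the single <unk> row of the embedding matrix.
    public int IndexOf(string word)
    {
        return _index.TryGetValue(word, out int id) ? id : _index[Unk];
    }

    // Splits a sentence on spaces and maps each token to its index.
    public int[] Encode(string sentence)
    {
        return sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                       .Select(w => IndexOf(w))
                       .ToArray();
    }
}
```

With pretrained embeddings, the practical consequence is that the `<unk>` row must exist in the matrix even if the pretrained file does not provide one, for example by initializing it randomly or to the mean vector.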

Source word 'mama' is UNK

Resolved:
An old model was in the same location => incremental training without the new vocab!

Question on Named Entity Recognition

Hi,
Would it be possible to build a named entity recognition classifier using this project? I saw a 3-year-old post where you replied it was not possible, and then a second one posted a year later where you said that CRF had been implemented.

I am looking for a .NET Core based tool for training and running NER tasks.

Thank you in advance.
Regards,
Sebastian

Request for more challenging Transformer architecture use cases through a better performance tokenizer .NET library

Seq2SeqSharp is a valid alternative for a .NET Transformer architecture solution.

It seems that a cross-platform .NET tokenizer library, especially one with better performance than those provided through Python libraries, would make it less challenging for Seq2SeqSharp to explore other real-world, end-to-end Transformer architecture examples such as GPT2, BERT etc.

I am raising this issue to prompt users here to share their feedback toward a concerted effort on such a .NET tokenization library.

Error: C# 8.0 language feature

I am trying to load the .sln and have .NET Core and VS 2019 installed. I have adjusted the targets in all .csproj files to <TargetFramework>net4.0</TargetFramework>

But in some files (eg. MoEFeedForward.cs line #62) there are using statements that are not compatible with the earlier frameworks, yet it is mentioned directly in the project description that earlier .net versions are supported. Can I use Seq2SeqSharp with .net 4?

Endless computation with CUDA

I installed CUDA 10.0 and built ManagedCUDA (x64, Release) dll libraries.

Before I started training, I copied several files into the same directory as "Seq2SeqConsole.exe":

CudaBlas.dll
ManagedCuda.dll
nvrtc64_100.dll (from my CUDA installation)
nvrtc64-builtins64_100.dll (from my CUDA installation)

I start training with this command:
/Seq2SeqConsole.exe -TaskName train -WordVectorSize 50 -HiddenSize 50 -LearningRate 0.1 -ModelFilePath alarm.model -SrcVocab data_vocab.source -TgtVocab data_vocab.target -SrcLang en -TgtLang lf -TrainCorpusPath ~/Downloads -ArchType 0 -Depth 1

The training starts and prints the following:

info,2/11/2019 11:17:18 AM Command Line = '-TaskName train -WordVectorSize 50 -HiddenSize 50 -LearningRate 0.1 -ModelFilePath alarm.model -SrcVocab data_vocab.source -TgtVocab data_vocab.target -SrcLang en -TgtLang lf -TrainCorpusPath C:/Users/vlad/Downloads -ArchType 0 -Depth 1'
info,2/11/2019 11:17:18 AM Source Language = 'en'
info,2/11/2019 11:17:18 AM Target Language = 'lf'
info,2/11/2019 11:17:18 AM SSE Enable = 'True'
info,2/11/2019 11:17:18 AM SSE Size = '256'
info,2/11/2019 11:17:18 AM Processor counter = '8'
info,2/11/2019 11:17:18 AM Hidden Size = '50'
info,2/11/2019 11:17:18 AM Word Vector Size = '50'
info,2/11/2019 11:17:18 AM Learning Rate = '0.1'
info,2/11/2019 11:17:18 AM Network Layer = '1'
info,2/11/2019 11:17:18 AM Gradient Clip = '5'
info,2/11/2019 11:17:18 AM Dropout Ratio = '0.1'
info,2/11/2019 11:17:18 AM Batch Size = '1'
info,2/11/2019 11:17:18 AM Arch Type = 'GPU_CUDA'
info,2/11/2019 11:17:18 AM Device Ids = '0'
info,2/11/2019 11:17:18 AM Loading model from 'alarm.model'...
info,2/11/2019 11:17:18 AM Initialize device '0'
Precompiling GatherScatterKernels
Precompiling Im2ColKernels
Precompiling IndexSelectKernels
Precompiling ReduceDimIndexKernels
Precompiling CudaReduceAllKernels
Precompiling CudaReduceKernels
Precompiling ElementwiseKernels
Precompiling FillCopyKernels
Precompiling SoftmaxKernels
Precompiling SpatialMaxPoolKernels
Precompiling VarStdKernels
info,2/11/2019 11:24:37 AM Loading model from 'alarm.model'...
info,2/11/2019 11:24:37 AM Initializing weights...
info,2/11/2019 11:24:37 AM Initializing weights for device '0'
info,2/11/2019 11:24:37 AM Initializing encoders and decoders for device '0'...
info,2/11/2019 11:24:37 AM Start to train...
info,2/11/2019 11:24:37 AM Shuffling training corpus...
info,2/11/2019 11:24:37 AM Base learning rate is '0.1' at epoch '0'
info,2/11/2019 11:24:37 AM Cleaning cache of weights optmiazation.'
info,2/11/2019 11:24:37 AM Start to process training corpus.
info,2/11/2019 11:24:37 AM Shuffling training corpus...

Then it gets stuck (it also took a while to do Precompiling steps). One CPU is loaded 100% and 29GB (!) of RAM is used! My system has 64GB of RAM so the RAM appears not to be the issue.

Do you have an idea what is going on? Maybe I missed some important step?

Note: Training using CPU only works just fine.

Thank you

Didn't save the model?

Here is my configuration file:
{
"Task":"Train",
"EmbeddingDim":512,
"HiddenSize":512,
"StartLearningRate":0.001,
"WeightsUpdateCount":0,
"EncoderLayerDepth":6,
"DecoderLayerDepth":6,
"ModelFilePath":"D:\GPU\cutwords\model\seq_cut_word.model",
"SrcVocab":null,
"TgtVocab":null,
"SrcVocabSize":300000,
"TgtVocabSize":300000,
"SharedEmbeddings":false,
"SrcEmbeddingModelFilePath":null,
"TgtEmbeddingModelFilePath":null,
"TrainCorpusPath":".\data\train\cut\train.conll.txt",
"ValidCorpusPaths":null,
"InputTestFile":null,
"OutputTestFile":null,
"ShuffleType":"NoPadding",
"ShuffleBlockSize":-1,
"GradClip":5.0,
"BatchSize":256,
"ValBatchSize":128,
"DropoutRatio":0,
"ProcessorType":"GPU",
"EncoderType":"Transformer",
"MultiHeadNum":8,
"DeviceIds":"0",
"BeamSearchSize":1,
"MaxEpochNum":100,
"MaxTrainSentLength":10000,
"MaxTestSentLength":10000,
"WarmUpSteps":8000,
"VisualizeNNFilePath":null,
"Beta1":0.9,
"Beta2":0.98,
"ValidIntervalHours":1.0,
"EnableCoverageModel":false,
"CompilerOptions":"",
"Optimizer":"Adam"
}
The model was not saved after training.

Issues to get started with "Seq2SeqClassificationConsole"

I have issues getting started with setting up a demo for the Seq2SeqClassificationConsole app.
I assume that, as a newbie, I have not set up the training data correctly. Can you please show me how to set up a demo?

I set up two folders for training and validation.

In the train folder I placed 2 training files:
Train01.CLS.snt
Train01.SRC.snt

In the validate folder I placed 2 validation files:
Validate01.CLS.snt
Validate01.SRC.snt

These are my command line settings:
-Task Train -TrainCorpusPath .\Train -ValidCorpusPaths .\Valid -TgtLang CLS -SrcLang SRC -ProcessorType CPU -DecoderType Transformer -EncoderLayerDepth 6

Error in method BuildVocabs at line "Vocab tgtVocab = tgtVocabs[1];" because tgtVocabs with Index 1 does not exist.

public (Vocab, Vocab, Vocab) BuildVocabs(int srcVocabSize = 45000, int tgtVocabSize = 45000, bool sharedVocab = false)
[...]
    (var srcVocabs, var tgtVocabs) = CorpusBatch.GenerateVocabs(srcVocabSize, tgtVocabSize);

    Vocab srcVocab = srcVocabs[0];
    Vocab clsVocab = tgtVocabs[0];
    Vocab tgtVocab = tgtVocabs[1];  // Error position

Content of Train01.CLS.snt:
What should I do if I have a sore throat and a runny nose? [SEP] I feel sore in my throat after getting up in the morning, and I still have clear water in my nose. I measure my body temperature and I don’t have a fever. Have you caught a cold? What medicine should be taken.
How can I recuperate if my ankle is twisted? [SEP] I twisted my ankle when I went down the stairs, and now it is red and swollen. X-rays were taken and there were no fractures. May I ask how to recuperate to get better as soon as possible.
How to diagnose Alzheimer's Caregiving ? [SEP] Now that your family member or friend has received a diagnosis of Alzheimers disease, its important to learn as much as you can about the disease and how to care for someone who has it. You may also want to know the right way to share the news with family and friends.
What are the treatments for Alzheimer's Caregiving ? [SEP] Currently, no medication can cure Alzheimers disease, but four medicines are approved to treat the symptoms of the disease. - Aricept (donezepil)for all stages of Alzheimers - Exelon (rivastigmine)for mild to moderate Alzheimers - Razadyne (galantamine)--for mild to moderate Alzheimers - Namenda (memantine)for moderate to severe Alzheimers - Namzarec (memantine and donepezil)for moderate to severe Alzheimers Aricept (donezepil)for all stages of Alzheimers Exelon (rivastigmine)for mild to moderate Alzheimers Razadyne (galantamine)
How to diagnose Alzheimer's Caregiving ? [SEP] When you learn that someone has Alzheimers disease, you may wonder when and how to tell your family and friends. You may be worried about how others will react to or treat the person. Others often sense that something is wrong before they are told. Alzheimers disease is hard to keep secret. When the time seems right, be honest with family, friends, and others. Use this as a chance to educate them about Alzheimers disease. You can share information to help them understand what you and the person with Alzheimers are going through. You can also tell them what they can do to help.

Content of Train01.SRC.snt:
Otorhinolaryngology
Orthopedics
Alzheimer1
Alzheimer2
Alzheimer3

Whether incremental training is supported

Is incremental training supported?

TargetInvocationException and DllNotFoundException: DLL file 'CpuOps.dll' could not be loaded

Hi,

Every time I open the project, press Debug and try to execute seqlabel to perform a seqlabel task, these two exceptions show up. I therefore tried to build CpuOps beforehand, but 6 errors appeared, all with the same error message:
Error C2039 'sqrt': is not a member of 'std' CpuOps C:\Users\niko_\Desktop\Seq2SeqSharp-master\CpuOps\LayerNormOps.h 90

Any help would be much appreciated.

Kind regards,
Nicolás

GPTconsole

Hi Zhongkai,

We are definitely using your latest release, only adding a few lines of code to load embeddings from plain text. All apps work properly on CPU, GPU and with MKL.

Nonetheless, we are trying to use your new GPTConsole to build an autoregressive (self-supervised) model and generate embeddings. The problem is that it does not seem to work. Could you please help us get it working for the first time? Maybe it is something in our config...

These are our config and log:

config.txt
Seq2SeqConsole_Train_2023_02_17_13h_22m_56s.log

Thanks a lot

Consider using Dagre.NET to visualize and perhaps edit/export the transformer architecture

Is your feature request related to a problem? Please describe.
We need a way to visualize/edit/export (onnx) transformer architecture for .NET community.

Describe the solution you'd like
Dagre.NET and Dendrite could evolve to support this need. Seq2SeqSharp could export the architecture in Json format for import into Dendrite or use Dagre.NET to export images.

One unified seq2seq interop interface to backend - suggestion

It seems there is a more recent initiative to speed up the completion of the Torch.Sharp binding interop to the Torch C API. There is now a seq2seq example implemented through Torch.Sharp.

If Seq2SeqSharp can adapt its backend to a more unified interface that is compatible with the Torch.Sharp interop API to the Torch C API, this will open a path to more complex Deep NLP seq2seq algorithm development.

I cannot open the zip files

Describe the bug
I cannot open the zip files you provided for data. I mean the Release Packages.

To Reproduce
They downloaded properly, but I cannot open them on my desktop. What am I doing wrong?

Thanks a lot. I appreciate your work.

Contextual embeddings

Hi zhongkai:

We want to write the contextual embeddings at certain timesteps to a text file in test mode.

Something like:

Timestamp_5 to [.................contextualized vector....................]
Timestamp_6 the [.................contextualized vector....................]
Timestamp_7 Cat [.................contextualized vector....................]

We guess that the contextualized vector of an input word could be the hidden state at its timestep, or the output at that same timestep. Do you think this assumption is right?

If so, what is the best way to do it?

Thanks a lot

Example for Seq2Seq for sequence-classification

I would love to use Seq2SeqSharp for Multi-intent classification for Chatbot.
Unfortunately I do not understand your examples offered here. I also can't find the test file.
Is there a C# demo project somewhere that explains this problem?

I must mention that my experiences with AI are just using Microsoft ML.Net.

thank you in advance

What are the next seq2seqSharp examples?

Here are the categories of HuggingFace transformer examples

It seems you have started to build more .NET seq2seqSharp transformer examples (e.g. Text Classification). Perhaps the HuggingFace example categories could guide what could be the next seq2seqSharp examples the .NET community needs :-)

Exception: 'The weight '.LayerNorm' has been released, you cannot access it.'

When we set the parameter "encoderType" to "BiLSTM" an exception arises:

'The weight '.LayerNorm' has been released, you cannot access it.'

In fact, when we use "Transformer" in both encoder and decoder, everything works fine. However, when we try to set the parameter to "AttentionLSTM" as decoder or "BiLSTM" as encoder, the exception arises.

What does the exception mean?

Thanks a lot

Moment of updating weights

Hi Zhongkai:

We want to do the following:

At the moment the weights of the embedding matrix are being updated, we want to also update other words that are not in the sentence. The criterion for updating them is their similarity to the words affected by the update. That is, if a word (the vector in the embedding matrix that represents it) is updated, some other words are also updated proportionally (by a coefficient).
To do that, we need to examine the function or functions involved in the update.
Tentatively, we call this mechanism “family updating”. If we can build it and it works, it will be deployed to help deal with the phenomenon called “systematic compositionality” in rule acquisition (the “stimulus poverty” phenomenon), which some studies report to be a problem, and which Chomsky recently discussed in the New York Times: https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html
We want to deal with it through two things:

An initialized “proto knowledge” in some part of embedding matrix.
A kind of mechanism as “family updating”.
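
To make the proposed “family updating” mechanism concrete, here is one possible reading of it as a sketch over a plain `double[][]` embedding matrix. Everything beyond the description above is an assumption made for illustration: cosine similarity as the similarity measure, the `threshold` cut-off, scaling by `coefficient * similarity`, and the name `FamilyUpdateSketch`. It does not use or reflect Seq2SeqSharp's optimizer code.

```csharp
using System;

public static class FamilyUpdateSketch
{
    // Cosine similarity between two vectors of equal length.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.Sqrt(na * nb);
    }

    // Applies `delta` to row `updated`, then applies a scaled copy of it to
    // every other row whose cosine similarity to that row is >= threshold.
    public static void Apply(double[][] embedding, int updated, double[] delta,
                             double coefficient, double threshold)
    {
        // Measure similarity against the row as it was before the update.
        double[] reference = (double[])embedding[updated].Clone();

        for (int row = 0; row < embedding.Length; row++)
        {
            double scale;
            if (row == updated)
            {
                scale = 1.0;
            }
            else
            {
                double sim = Cosine(reference, embedding[row]);
                if (sim < threshold) continue;
                scale = coefficient * sim;
            }
            for (int j = 0; j < delta.Length; j++)
            {
                embedding[row][j] += scale * delta[j];
            }
        }
    }
}
```

Note that this scans the whole matrix for every updated row, so for a large vocabulary a precomputed neighbor list per word would likely be needed.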

If you are also interested in this kind of work, we can report our first results to you, and you could even participate in papers and white papers. Remember that we are interested in cognitive aspects of the models, but also in applications to solving technical problems. In any case, we understand that you are a busy person with important projects. Seq2SeqSharp is one of them, and we really appreciate it.

So our first questions are:
At what moment are the weights updated, and at what moment is the update complete? We want to manipulate the embedding at those times.
We have explored some parts of the code, but we would like to hear your advice, if possible.

Thanks a lot for all.

Target vocabulary size fixed to 45000

Hi, Zhongkaifu.

I am trying to train a model with GPTConsole and no matter the amount of words there are in my corpus, the embedding matrix always has a dimension fixed to 45000. I have tried to control this by varying some parameters, such as "TgtVocabSize", but it changes nothing.
It seems as if 45000 is an upper limit. Is that the case?

Supported CUDA version

Hi,

Thanks for implementing many new features!

It appears that CUDA 8.0 is implicitly used. Is there a place to select a different version? I have versions 9.0 and 9.2 installed on my system.

At runtime, how does the application know where the CUDA libraries reside? Are environment variables read, or is this defined in the project?

Thank you very much!

Release not found: https://github.com/zhongkaifu/Seq2SeqSharp/releases/tag/20210125

I tried to get the release https://github.com/zhongkaifu/Seq2SeqSharp/releases/tag/20210125

but got 404 code.

Please let me know where to get the release with medical data corpus and trained models.

I would like to test it and try to use it in our applications.

Your 2 samples of medical text classification are very cool.

Thanks
Yury

SeqClassification Validation

Dear Zhongkaifu:

I am trying to validate a model trained on a sequence-classification task, but when trying to execute the program, the following error appears:

"Task 'Valid' is not supported"

In the Usage: SeqClassificationConsole [parameters...] section that pops out after the error, 'Valid' appears as a proper value to the "Task" parameter so I do not understand why it does not work.

I hope you can shed some light on my issue.

Any plan to make the model format more "standard"?

It just happens that I set the .Model to be linked to Netron.

I understand that this is not trivial work. However, I think that as more and more users discover this project, it will no longer be only me giving this feedback :-)

Can some examples for training and evaluation of model be provided?

Hello,
Can you provide example input training files? In particular I was wondering how to use this for named entity recognition. How do we represent the tagged entities in the training data?

sqc.m_srcEmbedding_p.GetNetworkOnDevice(k).GetWeightAt()

Hi,
What is the way to do that:

sqc.m_srcEmbedding_p.GetNetworkOnDevice(k).GetWeightAt(auxLong)

sqc.m_srcEmbedding_p.GetNetworkOnDevice(k).SetWeightAt(new float(), auxLong)

being sqc

SeqClassification sqc

and

long[] auxLong = new long[2]

in the newest versions of seq2seq framework?

The purpose is to set and get the embedding vectors in an auxiliary function.

I see that SeqClassification has changed, hasn't it?

Thanks

A plan to export Seq2SeqSharp translation model to ONNX?

Currently, both the ML.NET (which can import ONNX) and OnnxRuntime C# API samples are missing machine translation.

If it is possible to export a Seq2SeqSharp model to ONNX, then both the ML.NET and ONNX communities will benefit from that.
