Comments (6)
I am currently doing something similar. I plan to build a .NET Core app running onnxruntime + BlingFire, and then a Unity client that sends it HTTP requests. The reason for the extra hop is a concern about compatibility and computing power: the project's extra step is to run this in VR, ideally on stand-alone headsets.
I will, however, explore the option to run everything natively after I've dealt with the deadlines for this project.
Thanks again for your erudite responses! Much appreciated
from blingfire.
I found a solution via trial and error.
Case
The crash happens when the model load runs at the same time as the tokenizer load.
Solution
Let the model load fully, then load the tokenizer only after some delay, e.g. one second. Strangely enough, this works.
Hey Darth-Carrotpie!
I am dealing with precisely the same issue as you are, but I can't quite parse your solution.
This is my code:
And my runtime error message is the same.
Where exactly should I `Invoke("wait", 1)` for the error to go away? If by loading the model you mean GPT-2, I am not doing that yet.
Hi @nicolastamm, sorry for the late response. I hope it helps if I respond at least now :)
My suggestion is to try binding the load to an OnKeyDown handler and not running any loading tasks in Start(). BlingFire is not friendly in that regard; it often throws up something in memory... and since Unity is still handling things when Awake() and Start() are run, they probably collide somehow. Not sure how though; I did not have time to delve that deep.
After spending half a year with BlingFire, though, my humble suggestion would be not to use it at all :)
The main reason I moved away from it, back to the service approach, is that you'd eventually (once you need your own tokenization core) have to rebuild BlingFire from source yourself, because of how the tokenizers are coded there. Which sucks, because they did not make it easy... Also there's barely any support or help, probably because the community is virtually non-existent.
If it helps, here are some code snippets; ping me if you need more:
BlingTokenizer.cs
The main script which handles loading.
```csharp
using System;
using System.IO;
using System.Linq;
using BlingFire;
using UnityEngine;

public class BlingTokenizer : Singleton<BlingTokenizer> {
    ulong tokenizerHandle = 0;
    ulong tokenizerI2WHandle = 0;
    bool isLoaded = false;

    public static ulong GetTokenizerHandle() {
        return Instance.tokenizerHandle;
    }

    public static ulong GetTokenizerI2Handle() {
        return Instance.tokenizerI2WHandle;
    }

    public static void Load() {
        if (Instance.isLoaded) return;
        string tokenizerModelPath = Instance.GetPath("gpt2.bin");
        string tokenizerI2WPath = Instance.GetPath("gpt2.i2w");
        if (File.Exists(tokenizerModelPath)) {
            try {
                Instance.tokenizerHandle = BlingFireUtils.LoadModel(tokenizerModelPath);
                Debug.Log("Path found and loaded: " + tokenizerModelPath);
            } catch (Exception e) {
                Console.WriteLine("{0} Exception caught.", e);
                Debug.Log("Exception: " + tokenizerModelPath);
            }
        }
        if (File.Exists(tokenizerI2WPath)) {
            // This crashes the editor
            try {
                Instance.tokenizerI2WHandle = BlingFireUtils.LoadModel(tokenizerI2WPath);
                Debug.Log("Path found and loaded: " + tokenizerI2WPath);
            } catch (Exception e) {
                Console.WriteLine("{0} Exception caught.", e);
                Debug.Log("Exception: " + tokenizerI2WPath);
            }
        }
        //BlingFireUtils.SetNoDummyPrefix(GetTokenizerHandle(), true);
        //BlingFireUtils.SetNoDummyPrefix(GetTokenizerI2Handle(), true);
        Instance.isLoaded = true;
    }

    private void OnDestroy() {
        BlingFireUtils.FreeModel(tokenizerHandle);
    }

    public static int[] Tokenize(string input_str) {
        byte[] inBytes = System.Text.Encoding.UTF8.GetBytes(input_str);
        int[] ids = new int[128];
        // The max-ids argument is capped at 8 to match the model's fixed input length
        int outputCount = BlingFireUtils.TextToIds(GetTokenizerHandle(), inBytes, inBytes.Length, ids, 8, -100);
        ids = ids.Take(outputCount).ToArray();
        return ids;
    }

    String GetPath(string fileName) {
        return Path.GetFullPath(Path.Combine(new string[] { Application.dataPath, "Models", fileName }));
    }
}
```
LanguageModel.cs
This is where I run the Load() function. Sorry for the mess, no time to make it tasty for open-source consumption :)
```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using BlingFire;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using UnityEngine;

public class LanguageModel : MonoBehaviour {
    public const int VOCAB_SIZE = 50257;
    String modelFilePath = "Models/gpt-neo-125M.onnx";
    OrtEnv ortEnv = OrtEnv.Instance();
    InferenceSession ortSession;

    private void Awake() {
        ortEnv.DisableTelemetryEvents();
        LoadModel();
    }

    void Start() {
        //LoadTokenizer();
    }

    void Update() {
        if (Input.GetKeyDown(KeyCode.A)) {
            LoadTokenizer();
        }
        if (Input.GetKeyDown(KeyCode.B)) {
            //LoadModel();
        }
        if (Input.GetKeyDown(KeyCode.LeftControl)) {
            LoadTokenizer();
            var startTime = System.DateTime.Now;
            string testText = "I am a man and I";
            int[] inputIds = BlingTokenizer.Tokenize(testText);
            var preProcTime = System.DateTime.Now;
            Tensor<float> output_ids = Predict(inputIds);
            var gptNeoRunTime = System.DateTime.Now;
            List<List<LMPrediction>> preds = PostProcess(output_ids, 4, inputIds.Length);
            var endTime = System.DateTime.Now;
            Debug.Log("Gpt Neo Runtime: " + (gptNeoRunTime - startTime).TotalMilliseconds);
            Debug.Log("LM PreProc Time: " + (preProcTime - startTime).TotalMilliseconds + "ms; " + "PostProcess runtime: " + (endTime - gptNeoRunTime).TotalMilliseconds + "ms");
            PrintPreds(preds);
            Debug.Log(NaivePredsToText(preds));
        }
    }

    void LoadTokenizer() {
        BlingTokenizer.Load();
    }

    void LoadModel() {
        String modelPath = GetPath();
        ortSession = new InferenceSession(modelPath);
    }

    Tensor<float> Predict(int[] input_ids) {
        Debug.Log(input_ids.Length);
        // Pad the input to the model's fixed sequence length of 8, using -100 as padding
        Tensor<Int64> input_tensor = new DenseTensor<Int64>(new[] { 1, 1, 8 });
        for (int i = 0; i < 8; i++) {
            input_tensor[0, 0, i] = i < input_ids.Length ? input_ids[i] : -100;
        }
        var model_inputs = new List<NamedOnnxValue>() {
            NamedOnnxValue.CreateFromTensor("input_ids", input_tensor)
        };
        var model_outputs = ortSession.Run(model_inputs);
        var token_activation_output = model_outputs.First((v) => v.Name == "output_0").AsTensor<float>();
        //var token_activation_output12 = model_outputs.First((v) => v.Name == "output_12").AsTensor<float>();
        Debug.Log($"Got an output tensor [{String.Join(",", token_activation_output.Dimensions.ToArray())}]");
        return token_activation_output;
    }

    List<LMPrediction> CreatePredictions(Tensor<float> token_activation_output, int top_k = -1, int skip = 0) {
        // Take the logits for one position, softmax them, and keep the top_k most confident tokens
        var logits = token_activation_output.AsEnumerable<float>().Skip(skip * VOCAB_SIZE).Take(VOCAB_SIZE);
        float sum = logits.Sum(x => (float)Math.Exp(x));
        IEnumerable<float> softmax = logits.Select(x => (float)Math.Exp(x) / sum);
        var test_sorted_predictions = softmax
            .Select((x, i) => new LMPrediction() { token = i, confidence = x })
            .OrderByDescending(x => x.confidence)
            .Take(top_k > 0 ? top_k : VOCAB_SIZE);
        return test_sorted_predictions.ToList();
    }

    public List<List<LMPrediction>> PostProcess(Tensor<float> token_activation_output, int top_k = -1, int inputSize = 1, int genLength = 8) {
        List<List<LMPrediction>> output_predictions = new List<List<LMPrediction>>();
        for (int i = 0; i < genLength - inputSize; i++) {
            output_predictions.Add(CreatePredictions(token_activation_output, top_k, inputSize + i));
        }
        return output_predictions;
    }

    string NaivePredsToText(List<List<LMPrediction>> preds) {
        string output = "";
        foreach (List<LMPrediction> predList in preds) {
            output += predList.FirstOrDefault().ToWord() + " ";
        }
        return output;
    }

    void PrintPreds(List<List<LMPrediction>> preds) {
        string printVal = "";
        foreach (List<LMPrediction> predList in preds) {
            printVal += " \n---------------\n" + String.Join("\n", predList.Select(x => x.ToConfidenceStr()));
        }
        Debug.Log(printVal);
    }

    String GetPath() {
        return Path.Combine(new string[] { Application.dataPath, modelFilePath });
    }
}
```
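As a cross-check of what `CreatePredictions` computes, the same softmax + top-k step can be sketched in plain Python with just the standard library (`top_k_predictions` is a hypothetical name, not part of the project):

```python
import math

def top_k_predictions(logits, top_k):
    """Softmax one row of logits and return the top_k (token_id, confidence)
    pairs, mirroring the LINQ chain in CreatePredictions above."""
    exp = [math.exp(x) for x in logits]
    total = sum(exp)
    softmax = [e / total for e in exp]
    # Pair each probability with its token id, then sort by confidence
    ranked = sorted(enumerate(softmax), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]
```

For example, `top_k_predictions([1.0, 3.0, 2.0], 2)` ranks token 1 first, then token 2, since the largest logit wins after softmax.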
But yeah, I'm not using "BlankFire" anymore.
Hey! Thanks for the code snippets and the extensive response. Unfortunately, after trying both your approach and much more trial and error, I still keep getting the same error... LoadModel is definitely trying to access weird stuff, since if I attach to the process and debug from Visual Studio it sometimes works. It seems really random; sometimes it's the .i2w file, sometimes the .bin file, and sometimes both!
I was mostly interested in their performance claims, since I believe a huge problem for text-generating models deployed in Unity is how slow the inference runs are... So what other services have you tried, to forgo BlingFire altogether?
Thanks again for the help!
Hi @nicolastamm! My initial idea, which did not crash, was to load the model on one button click, then wait a bit and load the tokenizer on another. Just bind the functions to different keys and experiment with that. Also, have you tried different Unity versions? For me it was 2021.2.7f1, which ran most consistently.
Well, I can't complain about the speed: when it worked, it worked fast. The problems, however, are obviously consistency and adaptability, as I see you've experienced too :)
I did not use other services per se. I went back to the service-based approach:
- Load your model in Python locally;
- Run an API service locally (or on a server if you have one);
- In Unity, write a client that calls the API with a JSON request;
- Receive and parse the response in Unity, and do whatever you want with it thereafter.
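The server side of that loop can be sketched with nothing but the Python standard library; a minimal sketch, assuming a placeholder model (the `generate_text` stub below is hypothetical — a real service would run tokenization and ONNX/PyTorch inference there instead):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_text(prompt: str) -> str:
    # Hypothetical stand-in for the model call; a real service would run
    # the tokenizer + model here and return the generated continuation.
    return prompt + " ..."

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON body sent by the Unity client
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Run "inference" and reply with JSON
        reply = json.dumps({"text": generate_text(payload["prompt"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

def run_server(port: int = 8000) -> None:
    # Blocks until interrupted; call this from your launch script
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()
```

On the Unity side, a `UnityWebRequest` POST with a JSON body (and a parse of the JSON response) completes the loop; a framework like FastAPI or Flask would replace this stdlib handler in anything beyond a prototype.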
That was my initial approach last year, and it worked quite well. Nevertheless, running things natively with less intermediary code would be faster, so I decided to try libraries. None worked at first, and some were too limited (e.g. NatML). I tried Barracuda, Microsoft.ML.NET, and Microsoft.ML.OnnxRuntime, which finally ran things with BlingFire's help.
Anyway, when running inference the local-service way, I managed to dish out 30+ FPS with 128 x 128 images in segmentation tasks. There's a throttling problem, but you can code around that. My only suggestion would be not to use raw libraries or frameworks made for experimentation, but to try ONNX Runtime instead; it proved to be 2-3 times faster in some cases. A headache might be exporting the model to the ONNX format, but many models out there already have ONNX exports, e.g. most Hugging Face transformers-hosted models do, and, let's be honest, that covers most language models nowadays! On the other hand, Coqui-TTS doesn't, so you'd have to DIY through it.