Comments (6)

nicolastamm commented on May 18, 2024

I am currently doing something similar. I plan to build a .NET Core app running onnxruntime + BlingFire, and then a Unity client that sends it an HTTP request. The reason for the extra hoop is a concern about compatibility and computing power: the project's next step is to run this in VR, ideally on stand-alone headsets.
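
For what it's worth, a minimal sketch of what the .NET Core side could look like (assuming ASP.NET Core minimal APIs; the route and the record shapes are my own placeholders, not a finished design):

using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// One endpoint: tokenize with BlingFire, run onnxruntime, return the text.
app.MapPost("/generate", (GenRequest req) => {
    // int[] ids = BlingFire tokenization of req.Prompt;
    // logits = onnxruntime InferenceSession.Run(...) over ids;
    return Results.Ok(new GenResponse("<generated text>"));
});

app.Run();

record GenRequest(string Prompt);
record GenResponse(string Text);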

I will, however, explore the option to run everything natively after I've dealt with the deadlines for this project.

Thanks again for your erudite responses! Much appreciated

Darth-Carrotpie commented on May 18, 2024

I found a solution via trial and error.
Case
The crash happens when the model load runs at the same time as the tokenizer load.
Solution
Let the model load fully, and load the tokenizer only after some delay, e.g. a 1-second delay.
Strangely enough, this works.
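
In code, the workaround looks roughly like this (a minimal sketch; the class and method names are illustrative, and the actual load bodies are elided):

using System.Collections;
using UnityEngine;

public class DelayedTokenizerLoad : MonoBehaviour {
    // Load the ONNX model first, then defer the BlingFire tokenizer
    // load by about a second so the two loads never overlap.
    IEnumerator Start() {
        LoadModel();                          // e.g. create the onnxruntime InferenceSession
        yield return new WaitForSeconds(1f);  // let the model load settle
        LoadTokenizer();                      // e.g. the BlingFireUtils.LoadModel(...) calls
    }

    void LoadModel() { /* ... */ }
    void LoadTokenizer() { /* ... */ }
}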

nicolastamm commented on May 18, 2024

Hey Darth-Carrotpie!

I am dealing with precisely the same issue as you are, but I can't really parse your solution.
This being my code:
[code screenshot]

And my runtime error message is the same.

Where exactly should I Invoke("wait", 1) for the error to go away? If by loading the model you mean gpt-2, I am not doing that yet.

Darth-Carrotpie commented on May 18, 2024

Hi @nicolastamm, sorry for the late response. I hope it helps if I respond at least now 😄
My suggestion is to try binding it to an OnKeyDown instead of running any loading tasks in Start(). BlingFire is not friendly in that regard; it often throws something up in memory... and since Unity is still handling things when Awake() and Start() run, they probably collide somehow. Not sure how, though; I did not have time to delve that deep.

After spending half a year with BlingFire though, my humble suggestion would be not to use it at all 👎
The main reason I moved away from it, back to the service approach, is that you'd eventually (once you need your own tokenization core) have to rebuild BlingFire from source yourself, because of how the tokenizers are coded there. Which sucks, because they did not make it easy... 😺 Also there's barely any support and help 😆 probably because the community is virtually non-existent.

If it helps, here are some code snippets; ping me if you need more:

BlingTokenizer.cs

The main script which handles loading.

using System;
using System.IO;
using System.Linq;
using BlingFire;
using UnityEngine;
public class BlingTokenizer : Singleton<BlingTokenizer> {
    ulong tokenizerHandle = 0;
    ulong tokenizerI2WHandle = 0;
    bool isLoaded = false;
    public static ulong GetTokenizerHandle() {
        return Instance.tokenizerHandle;
    }
    public static ulong GetTokenizerI2Handle() {
        return Instance.tokenizerI2WHandle;
    }

    public static void Load() {
        if (Instance.isLoaded) return;
        string tokenizerModelPath = Instance.GetPath("gpt2.bin");
        string tokenizerI2WPath = Instance.GetPath("gpt2.i2w");

        if (File.Exists(tokenizerModelPath)) {
            try {
                Instance.tokenizerHandle = BlingFireUtils.LoadModel(tokenizerModelPath);
                Debug.Log("Path Found and loaded: " + tokenizerModelPath);
            } catch (Exception e) {
                Console.WriteLine("{0} Exception caught.", e);
                Debug.Log("Exception: " + tokenizerModelPath);
            }
        }
        if (File.Exists(tokenizerI2WPath)) {
            //This Crashes the Editor
            try {
                Instance.tokenizerI2WHandle = BlingFireUtils.LoadModel(tokenizerI2WPath);
                Debug.Log("Path Found and loaded: " + tokenizerI2WPath);
            } catch (Exception e) {
                Console.WriteLine("{0} Exception caught.", e);
                Debug.Log("Exception: " + tokenizerI2WPath);
            }
        }
        //BlingFireUtils.SetNoDummyPrefix(GetTokenizerHandle(), true);
        //BlingFireUtils.SetNoDummyPrefix(GetTokenizerI2Handle(), true);
        Instance.isLoaded = true;
    }
    private void OnDestroy() {
        // Free both native model handles when this object is destroyed.
        BlingFireUtils.FreeModel(tokenizerHandle);
        BlingFireUtils.FreeModel(tokenizerI2WHandle);
    }
    public static int[] Tokenize(string input_str) {
        byte[] inBytes = System.Text.Encoding.UTF8.GetBytes(input_str);
        int[] ids = new int[128];
        // The 5th argument caps the output at 8 ids to match the model's
        // fixed sequence length; -100 is the id used for unknown tokens.
        int outputCount = BlingFireUtils.TextToIds(GetTokenizerHandle(), inBytes, inBytes.Length, ids, 8, -100);
        ids = ids.Take(outputCount).ToArray();
        return ids;
    }

    String GetPath(string fileName) {
        return Path.GetFullPath(Path.Combine(new string[] { Application.dataPath, "Models", fileName }));
    }
}
LanguageModel.cs

This is where I run the Load() function. Sorry for the mess, no time to make it tasty for open source consumption 😆

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using BlingFire;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using UnityEngine;

public class LanguageModel : MonoBehaviour {
  public const int VOCAB_SIZE = 50257;
  String modelFilePath = "Models/gpt-neo-125M.onnx";

  OrtEnv ortEnv = OrtEnv.Instance();
  InferenceSession ortSession;
  private void Awake() {
      ortEnv.DisableTelemetryEvents();
      LoadModel();
  }
  void Start() {
      //LoadTokenizer();
  }

  void Update() {
      if (Input.GetKeyDown(KeyCode.A)) {
          LoadTokenizer();
      }
      if (Input.GetKeyDown(KeyCode.B)) {
          //LoadModel();
      }
      if (Input.GetKeyDown(KeyCode.LeftControl)) {
          LoadTokenizer();
          var startTime = System.DateTime.Now;
          string testText = "I am a man and I";
          int[] inputIds = BlingTokenizer.Tokenize(testText);
          var preProcTime = System.DateTime.Now;
          Tensor<float> output_ids = Predict(inputIds);
          var gptNeoRunTime = System.DateTime.Now;
          List<List<LMPrediction>> preds = PostProcess(output_ids, 4, inputIds.Length);
          var endTime = System.DateTime.Now;
          Debug.Log("Gpt Neo Runtime: " + (gptNeoRunTime - startTime).TotalMilliseconds);
          Debug.Log("LM PreProc Time: " + (preProcTime - startTime).TotalMilliseconds + "ms;   " + "PostProcess runtime: " + (endTime - gptNeoRunTime).TotalMilliseconds + "ms");
          PrintPreds(preds);
          Debug.Log(NaivePredsToText(preds));
      }
  }
  void LoadTokenizer() {
      BlingTokenizer.Load();
  }
  void LoadModel() {
      String modelPath = GetPath();
      ortSession = new InferenceSession(modelPath);
  }
  Tensor<float> Predict(int[] input_ids) {
      Debug.Log(input_ids.Length);
      // The exported model expects a fixed [1, 1, 8] input; pad short
      // inputs with -100 to fill the window.
      Tensor<Int64> input_tensor = new DenseTensor<Int64>(new [] { 1, 1, 8 });
      for (int i = 0; i < 8; i++) {
          input_tensor[0, 0, i] = i < input_ids.Length ? input_ids[i] : -100;
      }

      var model_inputs = new List<NamedOnnxValue>() {
          NamedOnnxValue.CreateFromTensor("input_ids", input_tensor)
      };
      var model_outputs = ortSession.Run(model_inputs);

      var token_activation_output = model_outputs.First((v) => v.Name == "output_0").AsTensor<float>();
      //var token_activation_output12 = model_outputs.First((v) => v.Name == "output_12").AsTensor<float>();
      Debug.Log($"Got an output tensor [{String.Join(",", token_activation_output.Dimensions.ToArray())}]");
      return token_activation_output;
  }
  List<LMPrediction> CreatePredictions(Tensor<float> token_activation_output, int top_k = -1, int skip = 0) {
      // Take the logits for one output position and softmax them.
      var logits = token_activation_output.AsEnumerable<float>().Skip(skip * VOCAB_SIZE).Take(VOCAB_SIZE).ToArray();
      // Subtract the max logit before exponentiating so Exp() cannot overflow.
      float max = logits.Max();
      float sum = logits.Sum(x => (float)Math.Exp(x - max));
      IEnumerable<float> softmax = logits.Select(x => (float)Math.Exp(x - max) / sum);
      var sorted_predictions = softmax
          .Select((x, i) => new LMPrediction() { token = i, confidence = x })
          .OrderByDescending(x => x.confidence)
          .Take(top_k > 0 ? top_k : VOCAB_SIZE);
      return sorted_predictions.ToList();
  }
  public List<List<LMPrediction>> PostProcess(Tensor<float> token_activation_output, int top_k = -1, int inputSize = 1, int genLength = 8) {
      List<List<LMPrediction>> output_predictions = new List<List<LMPrediction>>();
      for (int i = 0; i < genLength - inputSize; i++) {
          output_predictions.Add(CreatePredictions(token_activation_output, top_k, inputSize + i));
      }
      return output_predictions;
  }

  string NaivePredsToText(List<List<LMPrediction>> preds) {
      string output = "";
      foreach (List<LMPrediction> predList in preds) {
          output += predList.FirstOrDefault().ToWord() + " ";
      }
      return output;
  }

  void PrintPreds(List<List<LMPrediction>> preds) {
      string printVal = "";
      foreach (List<LMPrediction> predList in preds) {
          printVal += " \n---------------\n" + String.Join("\n", predList.Select(x => x.ToConfidenceStr()));
      }
      Debug.Log(printVal);
  }

  String GetPath() {
      return Path.Combine(new string[] { Application.dataPath, modelFilePath });
  }
}

But yeah, I'm not using "BlankFire" anymore 🐶

nicolastamm commented on May 18, 2024

Hey! Thanks for the code snippets and the extensive response. Unfortunately, after trying both your approach and much more trial and error, I still keep getting the same error... LoadModel is definitely trying to access weird stuff, since if I attach to the process and debug from Visual Studio it sometimes works. It seems really random. Sometimes it's the .i2w file, sometimes the .bin file, or both!

I was mostly interested in their performance claims, since I believe a huge problem for text-generating models deployed in Unity is how slow the inference runs are... So what other services have you tried in order to forgo BlingFire altogether?

Thanks again for the help!

Darth-Carrotpie commented on May 18, 2024

Hi, @nicolastamm! My initial approach, which did not crash, was to load the model on one button press, then wait a bit and load the tokenizer on another. Just bind the functions to different keys and experiment with that. Also, have you tried different Unity versions? For me it was 2021.2.7f1, which ran most consistently.

Well, I can't complain about the speed. When it worked, it worked fast; the problems, however, are obviously consistency and adaptability, as I see you've experienced too 😄

I did not use other services per se. I went back to a service-based approach (see the Unity-side sketch after the list):

  • Load your model in Python locally;
  • Run an API service locally (or on a server, if you have one);
  • In Unity, write a client that calls the API with a JSON request;
  • Receive and parse the response in Unity, and do whatever you want with it thereafter.
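
On the Unity side, the client can be as small as this (a minimal sketch; the endpoint URL and the JSON field names are assumptions, so match them to whatever your Python service actually exposes):

using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class GenerationClient : MonoBehaviour {
    const string Endpoint = "http://localhost:8000/generate"; // hypothetical local service

    [System.Serializable] class GenRequest { public string prompt; public int max_tokens; }
    [System.Serializable] class GenResponse { public string text; }

    public void Generate(string prompt) {
        StartCoroutine(PostPrompt(prompt));
    }

    IEnumerator PostPrompt(string prompt) {
        string payload = JsonUtility.ToJson(new GenRequest { prompt = prompt, max_tokens = 32 });
        using (var req = new UnityWebRequest(Endpoint, UnityWebRequest.kHttpVerbPOST)) {
            req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(payload));
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            yield return req.SendWebRequest();

            if (req.result != UnityWebRequest.Result.Success) {
                Debug.LogError("Generation request failed: " + req.error);
            } else {
                var resp = JsonUtility.FromJson<GenResponse>(req.downloadHandler.text);
                Debug.Log("Generated: " + resp.text);
            }
        }
    }
}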

That was my initial attempt at doing things last year, and it worked quite well. Nevertheless, running things natively with less intermediary code would be faster, so I decided to try libraries. None worked initially, and some were too limited (i.e. NatML). I tried Barracuda, Microsoft.ML.NET, and Microsoft.ML.OnnxRuntime, which finally ran things with BlingFire's help.

Anyway, when running inference the local-service way, I managed to dish out 30+ FPS with 128 × 128 images in segmentation tasks. There's a throttling problem, but you can code around that. My only suggestion would be not to use raw libraries or frameworks made for experimentation, but to try ONNX Runtime instead; it proved to be 2-3 times faster in some cases. A headache might be exporting the model to the ONNX format, but many models already have ONNX exports out there, e.g. most 🤗 transformers-hosted models do, which, let's be honest, covers most language models nowadays! On the other hand, Coqui-TTS doesn't, so you'd have to DIY through it.
