
Comments (7)

museun commented on August 25, 2024

The display-name tag can be localized (e.g. written in non-ASCII scripts). It's a fairly new feature (around a year old, I think) that lets non-Western users provide a name in their native script.

Looking at the code, I don't know if I can fix this in 0.14.x. There is a fatal flaw here:

for (i, ch) in input.char_indices().skip(1) {
    let i = i + 1;

I would need to calculate the byte offset of every subsequent char, which is something I don't really want to do.
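To make the flaw concrete: char_indices yields the byte offset where each char starts, so stepping by + 1 only reaches the next char boundary when the char is a single byte (ASCII). The correct step is char::len_utf8. The sketch below uses the display name 月饼 from this thread, where each char takes 3 bytes in UTF-8. char_spans is a hypothetical helper for illustration, not twitchchat code, and it assumes the + 1 in the loop above is meant to mark the end of the current char.

```rust
/// Hypothetical helper: for each char in `s`, returns
/// (start, naive_end, correct_end) as byte offsets.
fn char_spans(s: &str) -> Vec<(usize, usize, usize)> {
    s.char_indices()
        // `i` is where `ch` starts; `i + 1` assumes a 1-byte (ASCII) char,
        // while `ch.len_utf8()` is the char's real encoded length.
        .map(|(i, ch)| (i, i + 1, i + ch.len_utf8()))
        .collect()
}

fn main() {
    let name = "月饼";
    for (start, naive_end, correct_end) in char_spans(name) {
        // Slicing `&name[start..naive_end]` would panic: not a char boundary.
        println!(
            "start={start} naive_end={naive_end} (char boundary: {}) -> {}",
            name.is_char_boundary(naive_end),
            &name[start..correct_end]
        );
    }
}
```

Run as-is, it reports that neither naive end falls on a char boundary, while the len_utf8-based ends slice out 月 and 饼 cleanly.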

I kind of want to get rid of the whole super-cheap indices approach. Currently, each message uses a single allocation and exposes its 'sub-strings' as indices that refer back to that single &str/String. But getting rid of this and moving to a naive struct of individual &str/String fields would probably break semver.
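A minimal sketch contrasting the two layouts described above, assuming a message whose only sub-string of interest is its command. IndexedMessage, BorrowedMessage and parse_borrowed are hypothetical names; twitchchat's real types are different. The indexed version stores one String plus byte ranges and must keep those ranges on char boundaries itself; the borrowed version gets its slices directly from split_once, which can only cut at a char boundary.

```rust
use std::ops::Range;

/// Hypothetical index-based message: one owned buffer plus byte ranges into it.
struct IndexedMessage {
    raw: String,
    command: Range<usize>,
}

impl IndexedMessage {
    fn parse(raw: String) -> Option<Self> {
        // str::find returns a byte offset that is always on a char boundary.
        let end = raw.find(' ')?;
        Some(Self { raw, command: 0..end })
    }

    fn command(&self) -> &str {
        // Slicing panics if the range is not on UTF-8 char boundaries,
        // which is exactly what a hand-computed `i + 1` can violate.
        &self.raw[self.command.clone()]
    }
}

/// Hypothetical "struct of &str" alternative: each sub-string is its own slice.
struct BorrowedMessage<'a> {
    command: &'a str,
    rest: &'a str,
}

fn parse_borrowed(raw: &str) -> Option<BorrowedMessage<'_>> {
    let (command, rest) = raw.split_once(' ')?;
    Some(BorrowedMessage { command, rest })
}

fn main() {
    let line = "PRIVMSG #chan :月饼";
    let indexed = IndexedMessage::parse(line.to_string()).unwrap();
    let borrowed = parse_borrowed(line).unwrap();
    println!("{} / {} {}", indexed.command(), borrowed.command, borrowed.rest);
}
```

The trade-off is the one raised in the comment: the indexed form needs only one allocation, but every range is a place where a byte-offset bug can hide, while the borrowed form changes the shape of the public structs.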

This would let me use UTF-8-aware splitting without having to worry about char boundaries myself; the standard library would handle that. I already have quite a bit pushed to the 0.15.x branch (#226). I can take this into consideration, but I'm still looking at shipping one last 0.14.x release. I'm going to think about how to change the internal memory representation of all the types without breaking semver.
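As an illustration of letting the standard library handle boundaries, here is a sketch of splitting an IRCv3 tag string with str::split and str::split_once, assuming the leading '@' has already been stripped and ignoring tag escaping. parse_tags is a hypothetical helper; the sample values (display-name 月饼, user-id 86293428) come from this thread.

```rust
/// Hypothetical helper: splits `key=value;key=value` into pairs.
/// `split` and `split_once` always cut on char boundaries, so multi-byte
/// values come back intact. A tag without '=' gets an empty value.
fn parse_tags(tags: &str) -> Vec<(&str, &str)> {
    tags.split(';')
        .filter(|pair| !pair.is_empty())
        .map(|pair| pair.split_once('=').unwrap_or((pair, "")))
        .collect()
}

fn main() {
    let tags = "display-name=月饼;user-id=86293428";
    for (key, value) in parse_tags(tags) {
        println!("{key} => {value}");
    }
}
```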

from twitchchat.

museun commented on August 25, 2024

I see. Was this from Twitch, or were you parsing custom messages?

I should look into issue templates; all I really need is the input data (or a sample), the problem, and the crate version. The bug is in the tags, which I last touched when adding string escaping. I'll see where it's getting the indices wrong. I also assume it was the display-name tag.

olback commented on August 25, 2024

Yes, this was from Twitch. I don't really know how a non-ASCII character even gets into the display-name, since it has to match your username apart from capitalization.

When titling the issue, I assumed that all tags are parsed the same way; the problem is with display-name, though.

Edit: I managed to get the user-id from one of the crashing messages. Here is a partial response from the /helix/users endpoint for the user in question:

{
  "data": [
    {
      "id": "86293428",
      "login": "yuebing233",
      "display_name": "月饼",
      ...
    }
  ]
}

museun commented on August 25, 2024

After a bit more thought: I can just change the tags representation, since it already has to allocate a boxed slice:

pub struct TagIndices {
    pub(super) map: Box<[(MaybeOwnedIndex, MaybeOwnedIndex)]>,
}

I can just make this a Box<[(Cow<'a, str>, Cow<'a, str>)]> internally, and it wouldn't change the public API. This would ensure that the tag data and its (now removed) indices can never disagree about character (code point/scalar value) boundaries.
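A minimal sketch of what a Cow-based tag map along these lines could look like. It assumes IRCv3 tag-value escaping (\: for ;, \s for space, \\ for \, \r and \n), which is presumably the "string escaping" mentioned earlier. CowTags, parse, get and unescape are hypothetical names, not twitchchat's API. Values borrow from the input unless they contain an escape, in which case unescape allocates an owned String, so no byte indices into a shared buffer are needed.

```rust
use std::borrow::Cow;

/// Unescapes an IRCv3 tag value (`\:` -> `;`, `\s` -> space, `\\` -> `\`,
/// `\r` -> CR, `\n` -> LF). Borrows when there is nothing to unescape.
fn unescape(value: &str) -> Cow<'_, str> {
    if !value.contains('\\') {
        return Cow::Borrowed(value);
    }
    let mut out = String::with_capacity(value.len());
    let mut chars = value.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        match chars.next() {
            Some(':') => out.push(';'),
            Some('s') => out.push(' '),
            Some('\\') => out.push('\\'),
            Some('r') => out.push('\r'),
            Some('n') => out.push('\n'),
            Some(other) => out.push(other), // unknown escape: drop the backslash
            None => {}                      // trailing lone backslash: drop it
        }
    }
    Cow::Owned(out)
}

/// Hypothetical stand-in for the tag map: each key/value pair is its own
/// string (borrowed or owned), so there are no byte indices to get wrong.
struct CowTags<'a> {
    map: Box<[(Cow<'a, str>, Cow<'a, str>)]>,
}

impl<'a> CowTags<'a> {
    fn parse(tags: &'a str) -> Self {
        let map = tags
            .split(';')
            .filter(|pair| !pair.is_empty())
            .map(|pair| {
                let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
                (Cow::Borrowed(key), unescape(value))
            })
            .collect();
        Self { map }
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.map
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v.as_ref())
    }
}

fn main() {
    let tags = CowTags::parse(r"display-name=月饼;system-msg=hello\sworld");
    println!("{:?}", tags.get("display-name"));
    println!("{:?}", tags.get("system-msg"));
}
```

Because get hands back a &str either way, callers cannot tell whether a value was borrowed or unescaped into an owned copy, which is the property that lets the internal representation change without touching the public API.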

I would basically just remove the indices and make this simpler:

#[derive(Clone, PartialEq)]
pub struct Tags<'a> {
    pub(crate) data: &'a MaybeOwned<'a>,
    pub(crate) indices: &'a TagIndices,
}

I don't think I expose most of this to users, so it shouldn't be a breaking change.

olback commented on August 25, 2024

Looks great.

> display-name can be localized (e.g. non-ASCII scripts). It's a newish feature (perhaps a year now) that allows non-western users to provide a native name.

Ah, neat.

museun commented on August 25, 2024

I found a workaround. It's not ideal, but it's transparent for the most part.

I've published 0.14.6 which fixes this problem.

olback commented on August 25, 2024

Thank you!