
Comments (8)

ctron commented on July 21, 2024 2

In the (unlikely) event that someone urgently needs the amount of grapheme clusters to be correct at the end of the string, you could maybe think about opting in to a feature.

Aside from PRs welcome … one can still manually create the TruncateContent::Middle enum variant. And bring your own logic.

from patternfly-yew.

aDogCalledSpot commented on July 21, 2024 2

Also, amount of characters is a very inaccurate way of measuring how much text you want to keep. 10 "i"s is a lot shorter than 10 "m"s and graphemes could be much much wider again.


helio-frota commented on July 21, 2024

and anything better than that will require an extra dependency. So a comment would probably be good,

+1 to add a comment.

Apparently the docs say nothing about this case:
https://www.patternfly.org/components/truncate/design-guidelines/

And they are not using external deps: https://github.com/patternfly/patternfly-react/blob/main/packages/react-core/src/components/Truncate/Truncate.tsx#L46


ctron commented on July 21, 2024

Right, I am not sure how JavaScript handles Unicode and UTF-8, so I am not sure this can be compared. In Rust, you have "characters", but they are more like "technical" characters. Not “user-perceived characters” (aka grapheme clusters). Also see: https://unicode.org/reports/tr29/ and maybe https://stackoverflow.com/questions/58770462/how-to-iterate-over-unicode-grapheme-clusters-in-rust

Now it gets even more complicated: strings are indexed by bytes (not characters). So split_at (https://doc.rust-lang.org/std/primitive.str.html#method.split_at) will panic if one picks a byte index that falls inside a multi-byte Unicode "code point".
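To make the panic concrete, here is a minimal sketch using only the standard library. The helper name `try_split_at` is made up for illustration and is not part of patternfly-yew; it checks `str::is_char_boundary` before calling `split_at`, so an index inside a multi-byte code point yields `None` instead of a panic.

```rust
/// Hypothetical helper: splits `s` at byte index `mid`, returning `None`
/// instead of panicking when `mid` is out of range or falls inside a
/// multi-byte code point.
fn try_split_at(s: &str, mid: usize) -> Option<(&str, &str)> {
    // `is_char_boundary` is true for 0, for `s.len()`, and for the first
    // byte of every UTF-8 encoded code point. It is false for any index
    // greater than `s.len()`.
    if s.is_char_boundary(mid) {
        Some(s.split_at(mid))
    } else {
        None
    }
}
```

With `"h\u{e9}llo"` (the "é" takes two bytes, at indices 1 and 2), `try_split_at(s, 1)` and `try_split_at(s, 3)` succeed, while `try_split_at(s, 2)` returns `None`. A plain `s.split_at(2)` would panic here.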

The current implementation is pretty naive, and just looks for the next location which can be split, based on the byte index. That's a rather quick operation. Assuming Latin 1, it is most likely a hit at the first attempt. Otherwise, it takes just a few more tries.

The downside is that it might be horribly wrong. With mostly non-Latin-1 characters, the byte index doesn't work well. Still, it's quick.
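The approach described above could look roughly like the following. This is a sketch under assumptions, not the actual patternfly-yew code: the function name `tail_by_bytes` and its signature are invented here. It treats the requested count as a byte budget from the end of the string and then moves forward to the next valid split point.

```rust
/// Hypothetical sketch: returns the tail of `s`, starting roughly
/// `num_bytes` bytes before the end, adjusted forward to a char boundary.
fn tail_by_bytes(s: &str, num_bytes: usize) -> &str {
    // Start `num_bytes` before the end, or at 0 if the string is shorter.
    let mut idx = s.len().saturating_sub(num_bytes);
    // Move forward until the index is a valid split point. A UTF-8 code
    // point is at most 4 bytes long, so this takes at most 3 steps, and
    // `s.len()` is always a boundary, so the loop terminates.
    while !s.is_char_boundary(idx) {
        idx += 1;
    }
    &s[idx..]
}
```

For ASCII input this behaves as expected: `tail_by_bytes("hello world", 5)` is `"world"`. For `"日本語テキスト"`, where every character takes three bytes, the same call returns only `"ト"`, which is the "horribly wrong" case: fewer characters are kept than requested, never more.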

Searching for the x-th character from the end of a string is a terribly inefficient operation, as one needs to count all "characters" from the beginning of the string. And even then, one would actually need to count all "grapheme clusters" from the start of the string.
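For code points specifically (not grapheme clusters), the standard library's `char_indices()` iterator can also be walked in reverse, so a sketch like the one below only visits the tail of the string. The function name `tail_by_chars` is invented for illustration. It still steps through one code point at a time, and it does not solve the grapheme cluster problem, which needs the segmentation rules from UAX #29 and is not covered by std.

```rust
/// Hypothetical sketch: returns the last `num_chars` code points of `s`
/// (or all of `s` if it has fewer), walking backwards from the end.
fn tail_by_chars(s: &str, num_chars: usize) -> &str {
    if num_chars == 0 {
        return "";
    }
    // `rev().nth(k - 1)` yields the k-th code point counted from the end,
    // together with the byte index at which it starts.
    match s.char_indices().rev().nth(num_chars - 1) {
        Some((idx, _)) => &s[idx..],
        // Fewer than `num_chars` code points: keep everything.
        None => s,
    }
}
```

`tail_by_chars("日本語テキスト", 5)` returns `"語テキスト"`. The grapheme caveat shows up with `"cafe\u{301}"` ("e" followed by a combining acute accent): asking for one character returns only the combining mark `"\u{301}"`, cutting the user-perceived "é" in half.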

Maybe it's just better to drop this use case. Then again the current thing is better than not having it.


ctron commented on July 21, 2024

Just documented the current impl: 1dccf8a


helio-frota commented on July 21, 2024

Thanks for the clarification. I was basically comparing patternfly and patternfly-yew.
With the extra info, the original comment in the issue also makes more sense to me now.

I am not sure how JavaScript handles Unicode and UTF-8, so I am not sure this can be compared.

For some quick info, from https://exploringjs.com/impatient-js/ch_strings.html#atoms-of-text:

* Code points are the atomic parts of Unicode text. Each code point is 21 bits in size.

* JavaScript strings implement Unicode via the encoding format UTF-16. It uses one or two 16-bit code units to encode a single code point.

    * Each JavaScript character (as indexed in strings) is a code unit. In the JavaScript standard library, code units are also called char codes.

* Grapheme clusters (user-perceived characters) represent written symbols, as displayed on screen or paper. One or more code points are needed to encode a single grapheme cluster.

TC39 is working on [Intl.Segmenter](https://github.com/tc39/proposal-intl-segmenter), a proposal for the ECMAScript Internationalization API to support Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).

Until that proposal becomes a standard, we can use one of several libraries that are available (do a web search for “JavaScript grapheme”).
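The same layers can be observed from Rust with the standard library alone. The sketch below (the function name `text_sizes` is made up for illustration) counts UTF-8 bytes, UTF-16 code units (what JavaScript string indices refer to, per the quoted text), and code points for the same string. Grapheme clusters are deliberately left out, since std has no API for them.

```rust
/// Hypothetical helper: returns (UTF-8 bytes, UTF-16 code units, code points).
fn text_sizes(s: &str) -> (usize, usize, usize) {
    // Rust strings are UTF-8, so `len()` is the byte length.
    let utf8_bytes = s.len();
    // Re-encode as UTF-16 and count the 16-bit code units.
    let utf16_units = s.encode_utf16().count();
    // `chars()` iterates over Unicode scalar values (code points).
    let code_points = s.chars().count();
    (utf8_bytes, utf16_units, code_points)
}
```

For `"a"` all three counts are 1. For `"\u{1F604}"` (😄) the result is `(4, 2, 1)`. For the shrug emoji `"\u{1F937}\u{200D}\u{2642}\u{FE0F}"` (🤷‍♂️), which is displayed as a single symbol, the result is `(13, 5, 4)`.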

I found this table, which shows that Firefox apparently does not support it yet.

And the 15 million downloads of the graphemer library also include downloads through its dependents
(so I'm not sure whether end users are using it directly, or the standard Intl.Segmenter API, etc.).

Also see: https://unicode.org/reports/tr29/ and maybe https://stackoverflow.com/questions/58770462/how-to-iterate-over-unicode-grapheme-clusters-in-rust

Thanks for the links. In a comment there, someone mentioned that Rust used to support this in the standard library.

So yeah, now I get the "naive" and the "might be horribly wrong" parts. Thanks for the clarification.

In this case, the comment to be added should be "Warning: this might be horribly wrong!" 😄
Or we could use an external dependency, as @aDogCalledSpot mentioned in the beginning.

Maybe it's just better to drop this use case. Then again the current thing is better than not having it.

Yeah, I agree. A product/project Foo using patternfly-yew will certainly open an issue in the future if needed, so 🤷‍♂️ 👍


aDogCalledSpot commented on July 21, 2024

I think the comment is sufficient. It would also be worth adding a link to this issue so that the discussion is easily accessible.

I also don't think the implementation needs fixing. If someone is using graphemes that need a lot of bytes, then you will always have a few less of those at the end compared to if everything was ASCII, i.e. visually "more" is truncated - truncating less might be problematic but that will never happen. The behavior is documented now so it is easy to investigate why this is happening if someone stumbles upon it. In the (unlikely) event that someone urgently needs the amount of grapheme clusters to be correct at the end of the string, you could maybe think about opting in to a feature.


ctron commented on July 21, 2024

That's why I am just using the ExpandableSection component with the Truncate variant. It uses "x lines" and allows for using HTML content.

