Continuing from <a class="issue-link js-issue-link" data-error-text="Failed to load ti

Right, I am not sure how Java handles Unicode and UTF-8, so I am not sure this c

Just documented the current impl: <a class="commit-link" data-hovercard-type="commit"

[Disucssion] handling the "middle" truncation option about patternfly-yew HOT 8 OPEN

patternfly-yew commented on August 23, 2024

[Disucssion] handling the "middle" truncation option

from patternfly-yew.

Comments (8)

ctron commented on August 23, 2024 2

In the (unlikely) event that someone urgently needs the amount of grapheme clusters to be correct at the end of the string, you could maybe think about opting in to a feature.

Aside from PRs welcome … one can still manually create the TruncateContent::Middle enum variant. And bring your own logic.

from patternfly-yew.

aDogCalledSpot commented on August 23, 2024 2

Also, amount of characters is a very inaccurate way of measuring how much text you want to keep. 10 "i"s is a lot shorter than 10 "m"s and graphemes could be much much wider again.

from patternfly-yew.

helio-frota commented on August 23, 2024

and anything better than that will require an extra dependency. So a comment would probably be good,

+1 to add a comment.

Apparently the docs says nothing about this case:
https://www.patternfly.org/components/truncate/design-guidelines/

And they are not using external deps: https://github.com/patternfly/patternfly-react/blob/main/packages/react-core/src/components/Truncate/Truncate.tsx#L46

from patternfly-yew.

ctron commented on August 23, 2024

Right, I am not sure how JavaScript handles Unicode and UTF-8, so I am not sure this can be compared. In Rust, you have "characters", but they are more like "technical" characters. Not “user-perceived characters” (aka grapheme clusters). Also see: https://unicode.org/reports/tr29/ and maybe https://stackoverflow.com/questions/58770462/how-to-iterate-over-unicode-grapheme-clusters-in-rust

Now it gets even more complicated, strings are indexed by bytes (not characters). So split_at (https://doc.rust-lang.org/std/primitive.str.html#method.split_at) will panic in case one picks a byte which is actually part of a multi-byte unicode "code point".

The current implementation is pretty naive, and just looks for the next location which can he split, based on the byte index. That's a rather quick operation. Assuming Latin 1, most likely is a hit at the first attempt. Otherwise, it takes just a few tries more.

The downside is that it might be horribly wrong. Having mostly non Latin1 characters, the byte index doesn't work well. Still, it's quick.

Searching for the x-th characters from the end of a string, is a terribly imperforment operation. As one needs to count all "characters" from the beginning of the string. And even then, one would actually need to count all "grapheme clusters" from the start of a string.

Maybe it's just better to drop this use case. Then again the current thing is better than not having it.

from patternfly-yew.

ctron commented on August 23, 2024

Just documented the current impl: 1dccf8a

from patternfly-yew.

helio-frota commented on August 23, 2024

Thanks for clarification. I was basically comparing patternfly and patternfly-yew.
The extra info also make more sense to me the original comment in the issue now.

I am not sure how JavaScript handles Unicode and UTF-8, so I am not sure this can be compared.

For a quick info from https://exploringjs.com/impatient-js/ch_strings.html#atoms-of-text

* Code points are the atomic parts of Unicode text. Each code point is 21 bits in size.

* JavaScript strings implement Unicode via the encoding format UTF-16. It uses one or two 16-bit code units to encode a single code point.

    * Each JavaScript character (as indexed in strings) is a code unit. In the JavaScript standard library, code units are also called char codes.

* Grapheme clusters (user-perceived characters) represent written symbols, as displayed on screen or paper. One or more code points are needed to encode a single grapheme cluster.

TC39 is working on [Intl.Segmenter](https://github.com/tc39/proposal-intl-segmenter), a proposal for the ECMAScript Internationalization API to support Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).

Until that proposal becomes a standard, we can use one of several libraries that are available (do a web search for “JavaScript grapheme”).

I found this table where apparently firefox is not supporting yet.

And the 15 million downloads from this library graphemer are also related to the dependents
( So I'm not sure if end users are using this directly or the std API Intl.Segmenter etc...)

Also see: https://unicode.org/reports/tr29/ and maybe https://stackoverflow.com/questions/58770462/how-to-iterate-over-unicode-grapheme-clusters-in-rust

Thanks for the links, in a comment a person shared that rust was supporting it on the stdlib, in the past.

so yeah now I got the naive and the might be horribly wrong parts, thanks for the clarification ...

in this case, the comment to be added should be "Warning: this might be horribly wrong !" 😄
or to use external dependency like @aDogCalledSpot mentioned in the beginning.

Maybe it's just better to drop this use case. Then again the current thing is better than not having it.

yeah I agree, a product/project Foo using patternfly-yew will certainly open an issue in the future, in case needed so 🤷‍♂️ 👍

from patternfly-yew.

aDogCalledSpot commented on August 23, 2024

I think the comment is sufficient. It would also be worth adding a link to this issue so that the discussion is easily accessible.

I also don't think the implementation needs fixing. If someone is using graphemes that need a lot of bytes, then you will always have a few less of those at the end compared to if everything was ASCII, i.e. visually "more" is truncated - truncating less might be problematic but that will never happen. The behavior is documented now so it is easy to investigate why this is happening if someone stumbles upon it. In the (unlikely) event that someone urgently needs the amount of grapheme clusters to be correct at the end of the string, you could maybe think about opting in to a feature.

from patternfly-yew.

ctron commented on August 23, 2024

That's why I am just using the ExpandableSection component with the Truncate variant. It uses "x lines" and allows for using HTML content.

from patternfly-yew.

[Disucssion] handling the "middle" truncation option about patternfly-yew HOT 8 OPEN

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent