Comments (3)
After some investigation, I'm starting to feel that this fix is not trivial.
Separator HTML elements may appear at the end of one paragraph node and at the top of the following paragraph node, so we need a postprocess that looks for duplicated separators across the paragraph nodes. If we could assume that the separator is always `<wbr>`, it would be easy to locate them with `querySelectorAll('wbr')` and the `nextSibling` property. In reality, however, the separator can be an arbitrary node, so the `querySelectorAll(TAG_NAME)` approach won't work. I think we could close this issue if there's no performant way to implement the fix, since the output should still work even if a separator is duplicated. @kojiishi wdyt?
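For reference, here is a minimal sketch of the easy case only, where the separator is assumed to always be `<wbr>`. `NodeLike` and `findRedundantWbrs` are hypothetical names for illustration and are not part of BudouX. The sketch only catches separators that are direct siblings, so it does not cover a separator at the end of one element followed by another at the start of the next, and it does not help when the separator is an arbitrary node.

```typescript
// Minimal structural view of a DOM node; real DOM nodes expose these two properties.
interface NodeLike {
  nodeName: string;
  nextSibling: NodeLike | null;
}

// Returns every <wbr> that is immediately followed by another <wbr>.
// Removing the returned nodes leaves exactly one separator per run.
function findRedundantWbrs<T extends NodeLike>(wbrs: T[]): T[] {
  return wbrs.filter(
    (wbr) => wbr.nextSibling !== null && wbr.nextSibling.nodeName === 'WBR'
  );
}
```

In a browser, the input would presumably come from something like `Array.from(root.querySelectorAll('wbr'))`, where `root` is whatever element BudouX was applied to.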
from budoux.
Avoiding redundant separators is easy, but applying BudouX multiple times is a bit complicated, because once BudouX has been applied, it's difficult to distinguish a `<wbr>` inserted by BudouX from one the author originally inserted.
One way to do this is to remove existing separators before inserting new ones. This looks clean, but it doesn't work well, because it discards author-inserted separators.
The 2nd option: BudouX can check whether there's an existing separator before inserting a new one. This fixes the redundant separators, but if the content changes and we want to apply BudouX again, the separators inserted for the previous content will remain. For example:
- Original content: `abcabcabc`
- Applied once: `abc<wbr>abc<wbr>abc`
- A `c` was removed: `abc<wbr>ab<wbr>abc`
- Applying again doesn't remove the `<wbr>` after `ab`. The result is different from applying to `abcababc`.
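The example above can be reproduced with a rough string-based sketch. Everything here is an assumption for illustration: `toyBreaks` is a hypothetical stand-in for the real parser (it simply breaks after every `abc`), the separator is assumed to be `<wbr>`, and the real implementation works on DOM nodes rather than strings.

```typescript
const SEP = '<wbr>';

// Hypothetical stand-in for the BudouX parser: break after every "abc",
// except at the very end of the text.
function toyBreaks(text: string): number[] {
  const breaks: number[] = [];
  let i = text.indexOf('abc');
  while (i !== -1) {
    const end = i + 3;
    if (end < text.length) breaks.push(end);
    i = text.indexOf('abc', end);
  }
  return breaks;
}

// Option 2: keep every existing separator and add a new one only where
// none exists yet at that text offset.
function applyKeepingExisting(html: string): string {
  const parts = html.split(SEP);
  const text = parts.join('');
  const positions = new Set<number>();
  let offset = 0;
  for (const part of parts.slice(0, -1)) {
    offset += part.length;
    positions.add(offset); // text offset of an existing separator
  }
  for (const b of toyBreaks(text)) positions.add(b);
  let out = '';
  for (let i = 0; i < text.length; i++) {
    if (positions.has(i)) out += SEP;
    out += text.charAt(i);
  }
  return out;
}
```

With this toy parser, `applyKeepingExisting('abc<wbr>ab<wbr>abc')` returns the input unchanged, while `applyKeepingExisting('abcababc')` returns `abc<wbr>ababc`, which is the mismatch described above. Applying it twice to unchanged content gives the same result as applying it once.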
The 3rd option is to mark separators. For example, `xyz<wbr>abcabc` can become `xyz<wbr budoux-origin=author>abc<wbr budoux-origin=auto>abc`.
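A rough sketch of how the marking could work on re-application, again on strings and under assumptions: `everyThree` is a hypothetical stand-in for the parser, unmarked `<wbr>` elements are assumed to be the author's, and the function name is made up for illustration.

```typescript
const PLAIN = '<wbr>';
const AUTHOR = '<wbr budoux-origin=author>';
const AUTO = '<wbr budoux-origin=auto>';

// Hypothetical stand-in for the BudouX parser: a break every three characters.
function everyThree(text: string): number[] {
  const breaks: number[] = [];
  for (let i = 3; i < text.length; i += 3) breaks.push(i);
  return breaks;
}

// Option 3: drop separators marked as auto, keep the author's, then insert
// fresh auto separators wherever the author has not already put one.
function applyWithOriginMarks(
  html: string,
  breaksOf: (text: string) => number[]
): string {
  const parts = html
    .split(AUTO).join('')       // discard separators from a previous run
    .split(PLAIN).join(AUTHOR)  // unmarked separators are treated as the author's
    .split(AUTHOR);
  const text = parts.join('');
  const authorAt = new Set<number>();
  let offset = 0;
  for (const part of parts.slice(0, -1)) {
    offset += part.length;
    authorAt.add(offset);
  }
  const autoAt = new Set<number>(breaksOf(text));
  let out = '';
  for (let i = 0; i < text.length; i++) {
    if (authorAt.has(i)) out += AUTHOR;
    else if (autoAt.has(i)) out += AUTO;
    out += text.charAt(i);
  }
  return out;
}
```

With the toy parser, `applyWithOriginMarks('xyz<wbr>abcabc', everyThree)` yields the marked string from the example, and re-applying after an edit gives the same result as a fresh run on the edited content, at the cost of the extra attribute and bookkeeping.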
The 4th option is for BudouX to insert a different separator and then apply option 1. For example, if the author is supposed to use `<wbr>`, BudouX can insert a zero-width space (U+200B). This is a lot simpler than option 3, but it requires authors to be aware of it.
If we want to fix redundant separators without worrying about applying BudouX multiple times to updated content, option 2 looks reasonable and simple to me.
Thoughts?
Thanks for your thorough consideration. I think the 2nd option is the way to go. We want to respect the word break opportunities the author inserted initially, but from the second run onward we should not try to distinguish whether they were inserted by the author or by BudouX itself.