
Comments (3)

tushuhei commented on May 5, 2024

After some investigation, I'm starting to feel like this fix is not trivial.
Separator HTML elements may appear at the end of one paragraph node and at the top of the following paragraph node, so we need a postprocessing step that looks for duplicated separators across the paragraph.nodes. If we could assume that the separator is always <wbr>, it would be easy to locate them with querySelectorAll('wbr') and the nextSibling property. In reality, however, the separator can be an arbitrary node, so the querySelectorAll(TAG_NAME) approach won't work. I think we could close this issue if there's no performant way to implement the fix, since things should still work even if the separator is duplicated. @kojiishi wdyt?

from budoux.

kojiishi commented on May 5, 2024

Avoiding redundant separators is easy, but applying BudouX multiple times is a bit complicated, because once BudouX has been applied, it's difficult to distinguish a <wbr> inserted by BudouX from one the author originally inserted.

One way to do this is to remove existing separators before inserting new ones. This looks clean, but it doesn't work well, because it also discards author-inserted separators.

The 2nd option: BudouX can check whether a separator already exists before inserting a new one. This fixes the redundant separators, but if the content changes and we want to apply BudouX again, separators left over from the previous content will remain. For example:

  1. Original content abcabcabc
  2. Applied once: abc<wbr>abc<wbr>abc
  3. A c was removed: abc<wbr>ab<wbr>abc
  4. Applying BudouX again doesn't remove the <wbr> after ab, so the result differs from applying it to abcababc directly.
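
The steps above can be reproduced with a small string-based sketch of option 2. Everything here is an assumption made for illustration: `toyBoundaries` is a hypothetical stand-in for the real BudouX model (it simply breaks after every "c"), `insertMissingSeparators` is not a BudouX API, and real code would work on DOM nodes rather than strings.

```typescript
const SEP = "<wbr>";

// Hypothetical stand-in for the BudouX model: propose a break after every
// "c" except at the very end of the text. Not the real segmentation logic.
function toyBoundaries(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length - 1; i++) {
    if (text[i] === "c") out.push(i + 1);
  }
  return out;
}

// Option 2: keep every separator that is already present and add a new one
// only at proposed boundaries that do not have one yet.
function insertMissingSeparators(html: string): string {
  const chunks = html.split(SEP);
  const text = chunks.join("");

  // Text offsets that already carry a separator, whoever inserted it.
  const offsets: number[] = [];
  let offset = 0;
  for (const chunk of chunks.slice(0, -1)) {
    offset += chunk.length;
    if (offsets.indexOf(offset) === -1) offsets.push(offset);
  }

  // Add proposed boundaries, skipping offsets that are already covered.
  for (const b of toyBoundaries(text)) {
    if (offsets.indexOf(b) === -1) offsets.push(b);
  }
  offsets.sort((a, b) => a - b);

  let result = "";
  let prev = 0;
  for (const pos of offsets) {
    result += text.slice(prev, pos) + SEP;
    prev = pos;
  }
  return result + text.slice(prev);
}

console.log(insertMissingSeparators("abcabcabc"));           // abc<wbr>abc<wbr>abc
console.log(insertMissingSeparators("abc<wbr>abc<wbr>abc")); // unchanged, no duplicates
console.log(insertMissingSeparators("abc<wbr>ab<wbr>abc"));  // stale <wbr> after "ab" stays
console.log(insertMissingSeparators("abcababc"));            // abc<wbr>ababc
```

Under this toy model the second call shows that re-applying no longer duplicates separators, while the last two calls show the stale separator described in step 4.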

The 3rd option is to mark separators. For example, xyz<wbr>abcabc can become xyz<wbr budoux-origin=author>abc<wbr budoux-origin=auto>abc.
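
A rough string-based sketch of how marked separators could make re-application safe. The `budoux-origin` attribute comes from the example above; `toyBoundaries` (breaks after every "c") and `reapplyWithMarks` are hypothetical names for illustration, not BudouX APIs.

```typescript
const AUTHOR = "<wbr budoux-origin=author>";
const AUTO = "<wbr budoux-origin=auto>";

// Hypothetical stand-in for the BudouX model: break after every "c"
// except at the very end of the text.
function toyBoundaries(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length - 1; i++) {
    if (text[i] === "c") out.push(i + 1);
  }
  return out;
}

function reapplyWithMarks(html: string): string {
  // 1. Drop separators left by a previous automatic run.
  const withoutAuto = html.split(AUTO).join("");
  // 2. Treat any unmarked <wbr> as author-inserted and mark it.
  const normalized = withoutAuto.split("<wbr>").join(AUTHOR);

  // 3. Record the text offsets of the author separators.
  const chunks = normalized.split(AUTHOR);
  const text = chunks.join("");
  const authorOffsets: number[] = [];
  let offset = 0;
  for (const chunk of chunks.slice(0, -1)) {
    offset += chunk.length;
    authorOffsets.push(offset);
  }

  // 4. Rebuild: author separators win; automatic ones fill the remaining
  //    boundaries. Adjacent author separators collapse into one here.
  const autoOffsets = toyBoundaries(text);
  let result = "";
  for (let i = 0; i <= text.length; i++) {
    if (authorOffsets.indexOf(i) !== -1) result += AUTHOR;
    else if (autoOffsets.indexOf(i) !== -1) result += AUTO;
    if (i < text.length) result += text[i];
  }
  return result;
}

const once = reapplyWithMarks("xyz<wbr>abcabc");
console.log(once); // xyz<wbr budoux-origin=author>abc<wbr budoux-origin=auto>abc

// After a "c" is removed, the stale automatic separator disappears on re-run.
console.log(reapplyWithMarks(AUTHOR.length > 0 ? "xyz" + AUTHOR + "ab" + AUTO + "abc" : ""));
// xyz<wbr budoux-origin=author>ababc
```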

The 4th option is for BudouX to insert a different separator and then apply option 1. For example, if authors are expected to use <wbr>, BudouX can insert &ZeroWidthSpace;. This is a lot simpler than option 3, but requires authors to be aware of the convention.
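
A minimal sketch of this idea, assuming authors only ever use <wbr> and BudouX only ever inserts U+200B (the character behind &ZeroWidthSpace;). As before, `toyBoundaries` (breaks after every "c") and `reapplyWithZwsp` are hypothetical helpers, not BudouX APIs.

```typescript
const ZWSP = "\u200B";      // separator reserved for automatic insertion
const AUTHOR_SEP = "<wbr>"; // separator reserved for authors

// Hypothetical stand-in for the BudouX model: break after every "c"
// except at the very end of the text.
function toyBoundaries(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length - 1; i++) {
    if (text[i] === "c") out.push(i + 1);
  }
  return out;
}

function reapplyWithZwsp(html: string): string {
  // Option 1 applied to the automatic separator only: remove every ZWSP.
  const stripped = html.split(ZWSP).join("");

  // Record the text offsets of author separators so they survive untouched.
  const chunks = stripped.split(AUTHOR_SEP);
  const text = chunks.join("");
  const authorOffsets: number[] = [];
  let offset = 0;
  for (const chunk of chunks.slice(0, -1)) {
    offset += chunk.length;
    authorOffsets.push(offset);
  }

  // Insert a ZWSP at each proposed boundary that has no author separator.
  const autoOffsets = toyBoundaries(text);
  let result = "";
  for (let i = 0; i <= text.length; i++) {
    if (authorOffsets.indexOf(i) !== -1) result += AUTHOR_SEP;
    else if (autoOffsets.indexOf(i) !== -1) result += ZWSP;
    if (i < text.length) result += text[i];
  }
  return result;
}

const first = reapplyWithZwsp("xyz<wbr>abcabc");
console.log(first === "xyz<wbr>abc" + ZWSP + "abc"); // true

// After a "c" is removed, the stale ZWSP is dropped on the next run.
console.log(reapplyWithZwsp("xyz<wbr>ab" + ZWSP + "abc")); // xyz<wbr>ababc
```

The catch noted above still applies: an author who types U+200B by hand would lose it on the next run.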

If we want to fix redundant separators without worrying about applying BudouX multiple times to updated content, option 2 looks reasonable and simple to me.

Thoughts?


tushuhei commented on May 5, 2024

Thanks for your thorough consideration. I think the 2nd option is the way to go. We want to respect the word-break opportunities the author inserted initially, but from the second run onward we should not distinguish whether they were inserted by the author or by BudouX itself.

