null8626 / decancer
A library that removes common Unicode confusables/homoglyphs from strings.
License: MIT License
Hi there,
I'm experiencing an issue where decancer automatically lowercases all uppercase letters. As far as I can tell, this is unintentional, since it is not documented anywhere I looked. If it is intentional, could we have an option that preserves the case of non-violating characters?
decancer version: 2.0.2
import decancer from "decancer"
console.log(decancer("Test").toString()) // Expected output: "Test", actual output: "test"
console.log(decancer("TeSt").toString()) // Expected output: "TeSt", actual output: "test"
Line 233 in 3f4e7df
noundef. This will cause aborts in a couple of Rust versions, where this pattern is compiled to ud2 (undefined instruction, abort instantly).
This is already caught by running clippy locally, under the clippy::uninit_assumed_init lint. I suggest adding clippy to CI.
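The report does not quote line 233 itself. The sketch below assumes it follows the MaybeUninit::uninit().assume_init() pattern that clippy::uninit_assumed_init targets, and contrasts that pattern with two sound alternatives. The function names and the squares data are illustrative only.

```rust
use std::mem::MaybeUninit;

// The pattern clippy::uninit_assumed_init flags is kept only as a comment.
// It is undefined behaviour for integers, references and most other types:
// let table: [u32; 4] = unsafe { MaybeUninit::uninit().assume_init() };

/// Sound option 1: build the array directly, with no unsafe at all.
fn squares() -> [u32; 4] {
    std::array::from_fn(|i| (i as u32) * (i as u32))
}

/// Sound option 2: keep each *element* as MaybeUninit, write every slot,
/// and only then assume the elements are initialized.
fn squares_maybe_uninit() -> [u32; 4] {
    // An array whose elements are MaybeUninit may legitimately start uninitialized.
    let mut slots = [MaybeUninit::<u32>::uninit(); 4];
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write((i as u32) * (i as u32));
    }
    // SAFETY: the loop above wrote all 4 slots, so every element is initialized.
    slots.map(|slot| unsafe { slot.assume_init() })
}
```

The first option is usually preferable. MaybeUninit is only worth the extra care when constructing the value up front is impossible or measurably too slow.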
There currently aren't any good libraries for decancering text in Python.
I'd like Python bindings built with a library such as PyO3.
I tried reinventing the wheel but failed, and I would love to integrate this perfectly working library into my Python projects as well.
Not sure why this is the case. It happens on version 1.6.2 in Rust with the error attempt to subtract with overflow, at src/lib.rs:95:20. This also occurs with cure_char().
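The report does not show src/lib.rs:95, so the helpers below are hypothetical. They only illustrate the usual way to avoid this panic when subtracting one code point from another: use checked_sub instead of the bare - operator.

```rust
/// Hypothetical helper (not decancer's code): offset of `c` inside a lookup
/// table whose first code point is `base`.
///
/// Writing `c as u32 - base` panics in debug builds with
/// "attempt to subtract with overflow" whenever `c` is below `base`
/// (release builds silently wrap instead). `checked_sub` reports that case
/// as `None`, so the caller can fall back instead of crashing.
fn table_offset(c: char, base: u32) -> Option<u32> {
    (c as u32).checked_sub(base)
}

/// Example caller: look `c` up in a table covering `base..base + table.len()`.
/// Characters before or after the table both yield `None`.
fn lookup(c: char, base: u32, table: &[char]) -> Option<char> {
    let offset = table_offset(c, base)?;
    table.get(offset as usize).copied()
}
```

Here, a character that sorts below the table's first entry is simply reported as "not in the table" rather than aborting the whole cure.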
Hi,
I expect text in different languages, Arabic included.
Take this sentence from https://ar.lipsum.com/
import decancer from 'decancer'
const str = decancer('لا أحد يحب الألم بذاته، يسعى ورائه أو يبتغيه').toString()
console.log(str)
This gives very scrambled text:
oseتبs gi osijg seسs ,oتiذب مjiji بcs دci ij
Thank you for providing Decancer :)
Ever since July 2023, I have been thinking about adding back Arabic and Hebrew support to decancer without causing issues because of their right-to-left madness. Then I found unicode-bidi, which re-renders mixed RTL/LTR text in memory as if it were being rendered by a web browser.
The plan is to implement its algorithm here somehow. An attempt has been in the works since then, but due to school and the complexity of Unicode's bidirectional algorithm, it has been on hiatus for months.
Because of this, development on this library has (publicly) stagnated, as attention has been directed to this enhancement.
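Implementing the full algorithm is out of scope here. One cheap first step, offered as an assumption rather than the planned design, is to detect whether a string contains Hebrew or Arabic characters at all, so that only such input takes the slower bidi-aware path. The helper name contains_rtl_script is hypothetical. The block ranges are a coarse stand-in for the Bidi_Class property that the real algorithm (UAX #9) uses.

```rust
/// Rough, hypothetical pre-check: does the string contain characters from the
/// Hebrew or Arabic blocks? This is NOT the Unicode Bidirectional Algorithm,
/// which classifies each character by its Bidi_Class; it only flags input
/// that would need the slower bidi-aware path.
fn contains_rtl_script(s: &str) -> bool {
    s.chars().any(|c| {
        matches!(
            c,
            '\u{0590}'..='\u{05FF}'       // Hebrew
                | '\u{0600}'..='\u{06FF}' // Arabic
                | '\u{0750}'..='\u{077F}' // Arabic Supplement
                | '\u{FB1D}'..='\u{FB4F}' // Hebrew presentation forms
                | '\u{FB50}'..='\u{FDFF}' // Arabic Presentation Forms-A
                | '\u{FE70}'..='\u{FEFC}' // Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
        )
    })
}
```

Purely left-to-right input, which is most input, would then skip the bidi machinery entirely.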
Hi @null8626,
Thank you for providing this very fast and bright library!
In VS Code, I've set up some coding rules with TS and ESLint (only in the IDE; it's still a JS project), and I'm getting an error on the decancer function:
This expression is not callable.
Type 'typeof import("c:/Users/Administrator/WebstormProjects/classified-ads/node_modules/decancer/src/typings")' has no call signatures.
When running JUnit tests that use this library, calling new CuredString("value") results in an exception:
Caused by: java.lang.RuntimeException: [x86_64-unknown-linux-gnu] this operating system (Linux) and/or architecture (x64) is not supported.
original error:
no decancer-x86_64-unknown-linux-gnu in java.library.path: /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
at com.github.null8626.decancer.CuredString.<clinit>(CuredString.java:70)
... 48 more
I have identified that the issue is the special case for loading the library during JUnit tests:
if (CuredString.isJUnit()) {
System.loadLibrary("decancer-" + rustTarget);
}
I assume it was meant to detect test runs for this library's own regression tests, but it cannot distinguish them from tests run in another project. It walks the thread's stack trace and checks each class name, looking for the first appearance of org.junit: element.getClassName().startsWith("org.junit.")
If I remove the JUnit condition and load the library with NativeUtils, as is done in production, it works in the unit tests as well.
Would it be possible to detect more precisely which unit tests are being run?
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.github.null8626</groupId>
      <artifactId>decancer</artifactId>
      <version>v3.2.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>
OS: Ubuntu 24
Java Version: Java 21
Library Version: v3.2.0
I allow users to create simple patterns with asterisks, which boil down to equal/startsWith/endsWith/contains. This is currently the only feature missing before I can adopt this library in production.
test -> equal
test* -> startsWith
*test -> endsWith
*test* -> contains
I'd really love to see these functions added <3. Thanks!
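Until something like this exists in the library, the four cases can be approximated on the cured output, for example the &str returned by as_str() in the Rust binding. The matches_pattern helper below is a hypothetical sketch, not a decancer function. It strips a leading and/or trailing '*' and dispatches to the matching str method, performing plain exact comparisons on text that has already been cured.

```rust
/// Hypothetical helper (not part of decancer): match already-cured text against
/// a pattern where a leading and/or trailing '*' means "anything here".
fn matches_pattern(text: &str, pattern: &str) -> bool {
    // Remove at most one '*' from each end, remembering which ends had one.
    let (core, leading) = match pattern.strip_prefix('*') {
        Some(rest) => (rest, true),
        None => (pattern, false),
    };
    let (core, trailing) = match core.strip_suffix('*') {
        Some(rest) => (rest, true),
        None => (core, false),
    };
    match (leading, trailing) {
        (false, false) => text == core,          // "test"   -> equal
        (false, true) => text.starts_with(core), // "test*"  -> startsWith
        (true, false) => text.ends_with(core),   // "*test"  -> endsWith
        (true, true) => text.contains(core),     // "*test*" -> contains
    }
}
```

Asterisks in the middle of a pattern are treated literally here. That matches the four cases in the request but is not general glob matching.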
For example, when fed the Unicode character “ˑ” (U+02D1, MODIFIER LETTER HALF TRIANGULAR COLON):
fn main() {
println!("{}", decancer::cure("ˑ").as_str());
println!("This never prints.");
}
See #14 for full comments.
Some portions utilize unsafe code to optimize performance. It would be useful for developers (current and future) if the purpose and invariants of the unsafe code were laid out in a comment to prevent violations.
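One common convention for this is a // SAFETY: comment directly above every unsafe block, which Clippy can enforce through its clippy::undocumented_unsafe_blocks lint. The functions below are illustrative, not taken from decancer. The second one shows that the same logic can often be written without unsafe at all.

```rust
/// Reads the byte at `index`, or returns 0 when `index` is out of range.
fn byte_at(bytes: &[u8], index: usize) -> u8 {
    if index < bytes.len() {
        // SAFETY: `index < bytes.len()` was checked just above, so
        // `get_unchecked` stays inside the slice.
        unsafe { *bytes.get_unchecked(index) }
    } else {
        0
    }
}

/// The same behaviour without `unsafe`; the optimizer can often remove the
/// redundant bounds check on its own.
fn byte_at_safe(bytes: &[u8], index: usize) -> u8 {
    bytes.get(index).copied().unwrap_or(0)
}
```

Writing the invariant next to the unsafe block means a later change that breaks it, such as removing the length check, is much easier to spot in review.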
There is no reason for this crate to use unsafe code.
As of v3.2.0, I've noticed that JitPack no longer accepts newer releases of decancer's Java binding, because the repository's history exceeds 500 MB.
Therefore, I am currently migrating from JitPack to another host! Sadly, this may delay future releases.