
widestring-rs's People

Contributors

clubby789, joshwd36, jrwats, kpcyrd, lifthrasiir, mark-summerfield, nicbn, openbytedev, starkat99, yescallop

widestring-rs's Issues

Provide lines iterator

Thank you for this great crate, just started using it.

I wanted to split a Utf16String into lines, similar to the built-in lines() method for UTF-8 strings in the standard library. It would be great if this could be added. More generally, the crate currently lacks splitting iterators like those in the standard library (split_terminator and so on).
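
As a rough illustration of the requested behaviour, here is a minimal sketch that works on a plain `&[u16]` of UTF-16 code units using only the standard library. The function name `utf16_lines` is hypothetical and is not part of widestring; the sketch assumes the same line-ending rules as `str::lines` (split on LF, drop a CR that directly precedes it).

```rust
/// Splits UTF-16 code units into lines, following the rules of `str::lines`:
/// lines end at LF, and a CR immediately before that LF is dropped.
/// Hypothetical helper for illustration; not a widestring API.
fn utf16_lines(units: &[u16]) -> Vec<&[u16]> {
    const LF: u16 = 0x000A;
    const CR: u16 = 0x000D;

    units
        // Keep the LF attached so we can tell terminated lines from the last one.
        .split_inclusive(|&u| u == LF)
        .map(|line| match line.strip_suffix(&[LF]) {
            // Terminated line: also drop a CR that preceded the LF.
            Some(rest) => rest.strip_suffix(&[CR]).unwrap_or(rest),
            // Final line without a terminator: return it unchanged.
            None => line,
        })
        .collect()
}
```

Each returned slice borrows from the input, so no code units are copied.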

WideCString/WideString => &str

Hi there,
this is actually more of a question than an issue, but I'm very interested in your opinion. I have been using your WideCString type in a very small Rust project, and I need to transform a variable of type WideCString into a slice of type &str.
I can see various ways of doing so, specifically:

  • transforming from WideCString => String (via to_string_lossy()) => &str
  • transforming from WideCString => &[u16] (via as_slice) => &str

I would be really interested in your perspective on this. What would be the best way to go?
For me this is not only a question of writing code that just works. I'm currently learning Rust and am really interested in writing "good" code (whatever that means). So I'm trying to understand a little more than I strictly have to.
Thanks
Norbert
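
For context on the two routes above: a `&str` must borrow from a UTF-8 buffer, so a `&[u16]` cannot simply be reinterpreted as one; some owned, re-encoded `String` has to exist first. A minimal sketch using only the standard library (the function names are hypothetical, and the input is a plain `&[u16]` without its nul terminator rather than a widestring type):

```rust
/// Lossy conversion: unpaired surrogates become U+FFFD.
/// Hypothetical helper; operates on raw UTF-16 code units.
fn wide_to_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Strict conversion: returns `None` if the input is not valid UTF-16.
fn wide_to_string_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}
```

The `&str` is then just a borrow of the owned result, for example `let s: &str = &owned;`, and it lives as long as `owned` does.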

panic in version 0.5 when no problem in 0.4

This may not be a bug at all.

When I run my application using widestring 0.4 it works fine. But with 0.5 it crashed:

V:\myapp>rustc -V
rustc 1.56.0 (09c42c458 2021-10-18)

V:\myapp>\test.exe
thread 'main' panicked at 'range end index 18446744073709551615 out of range for slice of length 0', C:\Users\mark\.cargo\registry\src\github.com-1ecc6299db9ec823\widestring-0.5.0\src\ucstring.rs:119:15
stack backtrace:
   0:        0x13fd17c60 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\sys_common\backtrace.rs:46
   1:        0x13fcf25fa - core::fmt::write
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\core\src\fmt\mod.rs:1150
   2:        0x13fd175a8 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\io\mod.rs:1667
   3:        0x13fd16cfd - std::panicking::rust_panic_with_hook
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\panicking.rs:624
   4:        0x13fd1ded5 - std::panicking::begin_panic_handler::closure$0
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\panicking.rs:521
   5:        0x13fd1de49 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure$0,never$>
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\sys_common\backtrace.rs:141
   6:        0x13fd1de04 - std::panicking::begin_panic_handler
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\std\src\panicking.rs:517
   7:        0x13fd30280 - core::panicking::panic_fmt
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\core\src\panicking.rs:101
   8:        0x13fd30387 - core::slice::index::slice_end_index_len_fail
                               at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa\/library\core\src\slice\index.rs:41
   9:        0x13fce6089 - __acrt_rg_country_count
  10:        0x13fce1006 - __acrt_rg_country_count
  11:        0x13fcf110d - main
  12:        0x13fd23a35 - __scrt_common_main_seh
                               at f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253
  13:         0x76e0571d - BaseThreadInitThunk
  14:         0x7706385d - RtlUserThreadStart

Unfortunately, this does not appear to give me any clue as to which widestring function is failing.
In my code I use only two widestring functions, WideCString::from_ptr_str and
WideCString::from_str. The former is only used inside one function:

pub fn str_for_win16(p: *const Wchar) -> String {
    if p.is_null() {
        return String::new();
    }
    unsafe { WideCString::from_ptr_str(p).to_string_lossy() }
}

The docs say that WideCString::from_ptr_str will panic if the pointer is null, but as you can see I always avoid this.

However, I use WideCString::from_str in many places and it turned out that one of these uses was the problem for me. The solution I applied was to replace from_str with from_str_truncate. This changed all my failing tests to passes when using widestring 0.5.
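
The report does not establish why from_str panicked, so the following is only a sketch of what a truncating conversion does in general, written against the standard library. The name `to_wide_nul_truncated` is hypothetical; the assumption is that "truncate" means "stop at the first nul and then terminate".

```rust
/// Encodes `s` as UTF-16, stops at the first nul code unit (if any),
/// and appends a single nul terminator.
/// Hypothetical helper; not the widestring implementation.
fn to_wide_nul_truncated(s: &str) -> Vec<u16> {
    let mut buf: Vec<u16> = s
        .encode_utf16()
        .take_while(|&u| u != 0) // drop everything from the first nul onwards
        .collect();
    buf.push(0); // the terminator is always present, even for empty input
    buf
}
```

Note that the empty string needs no special case here: it yields a buffer holding only the terminator.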

No conversion from Utf32String to Vec<char>

Despite what the documentation says, there is no conversion from Utf32String to Vec<char> (but there is the other way around):

This also means that Utf32String is the same representation as a Vec<char>; indeed conversions between the two exist and are simple typecasts.
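
Until such a conversion exists, the round trip can be approximated on raw `u32` code units with the standard library alone. This is a sketch with hypothetical function names; it validates each unit instead of typecasting, because an arbitrary `u32` is not necessarily a valid `char`.

```rust
/// Converts UTF-32 code units to chars, returning `None` if any unit is
/// not a Unicode scalar value (a surrogate, or a value above U+10FFFF).
fn utf32_to_chars(units: &[u32]) -> Option<Vec<char>> {
    units.iter().map(|&u| char::from_u32(u)).collect()
}

/// The reverse direction cannot fail: every char is a valid code unit.
fn chars_to_utf32(chars: &[char]) -> Vec<u32> {
    chars.iter().map(|&c| c as u32).collect()
}
```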

Add `Display` implementation

It would be useful to be able to print strings directly without converting them to a String, thereby avoiding the allocation.

While trying to implement this locally, I noticed that implementing Display automatically implies a ToString implementation, which conflicts with the current to_string functions as they return an error when encountering invalid code units, whereas my Display implementation performs the conversion lossily. Therefore, implementing Display would be a breaking change.

Solutions I could think of:

  • Rename the conversion methods
    Pro: makes displaying strings convenient
    Con: breaks the API and does not follow the naming guidelines
  • Add a display method similar to Path::display
    Pro: allows configuring different replacement options, like escaping invalid units
    Con: is not as straightforward and complicates usage when using proc-macros like thiserror
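
To make the second option concrete, here is a minimal sketch of a `Path::display`-style adapter over raw UTF-16 code units, using only the standard library. The type name `DisplayUtf16` is hypothetical, and the choice of U+FFFD as the replacement is an assumption; the point is that the wrapper implements `Display` without the string type itself gaining a conflicting `to_string`.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};
use std::fmt::{self, Write as _};

/// Borrowing adapter that formats UTF-16 code units lossily.
/// Hypothetical type for illustration; not a widestring API.
struct DisplayUtf16<'a>(&'a [u16]);

impl fmt::Display for DisplayUtf16<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Decode on the fly and write char by char, so no intermediate
        // String is allocated. Width and padding flags are ignored here.
        for decoded in decode_utf16(self.0.iter().copied()) {
            f.write_char(decoded.unwrap_or(REPLACEMENT_CHARACTER))?;
        }
        Ok(())
    }
}
```

Usage would look like `println!("{}", DisplayUtf16(&units))`.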

Provide a `Pattern`-like API

For Rust's &str, the standard library has the (still unstable) std::str::pattern::Pattern trait. It's used by methods like contains, starts_with, or matches.

This trait should be ported to work on types from this library (ideally, the trait from the standard library should allow different strings).

#1 requested string matching functions for the types exposed by this crate - Pattern would allow this by providing a std-like API.
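
This is not the Pattern trait itself, but the simplest case it would cover (searching for a sub-slice) can be sketched on `&[u16]` with stable slice methods. The function names are hypothetical.

```rust
/// Returns the code-unit index of the first occurrence of `needle`.
/// Hypothetical helper; a naive O(n * m) search for illustration.
fn find_utf16(haystack: &[u16], needle: &[u16]) -> Option<usize> {
    if needle.is_empty() {
        // `windows(0)` would panic; an empty needle matches at the start.
        return Some(0);
    }
    haystack
        .windows(needle.len())
        .position(|window| window == needle)
}

/// `contains` expressed in terms of `find_utf16`.
fn contains_utf16(haystack: &[u16], needle: &[u16]) -> bool {
    find_utf16(haystack, needle).is_some()
}
```

Prefix and suffix checks need no helper at all, since `<[u16]>::starts_with` and `<[u16]>::ends_with` already exist on slices.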

Widestring literals

There is quite a handy crate: https://docs.rs/wchar. It provides the wch_c! macro, which converts a string literal to a &'static [u16] at compile time.

It would be great to have a similar macro in widestring-rs, so you could define &'static U16CStr constants.
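
For a sense of what compile-time conversion involves, here is a deliberately limited sketch using a `const fn` instead of a macro. It handles ASCII input only and requires the caller to spell out the array length; `ascii_to_wide` and `HELLO` are hypothetical names, and it needs a compiler recent enough to allow panics in const contexts.

```rust
/// Converts an ASCII string to a nul-terminated UTF-16 array at compile time.
/// `N` must equal the string length plus one (for the terminator).
/// Hypothetical helper: ASCII only, interior nuls rejected.
const fn ascii_to_wide<const N: usize>(s: &str) -> [u16; N] {
    let bytes = s.as_bytes();
    assert!(bytes.len() + 1 == N, "N must be the string length plus one");
    let mut out = [0u16; N]; // the last element stays 0 as the terminator
    let mut i = 0;
    while i < bytes.len() {
        assert!(bytes[i] != 0, "interior nul is not allowed");
        assert!(bytes[i] < 0x80, "only ASCII is handled in this sketch");
        out[i] = bytes[i] as u16;
        i += 1;
    }
    out
}

/// Evaluated entirely at compile time; a wrong length fails the build.
const HELLO: [u16; 6] = ascii_to_wide("hello");
```

A real macro would also need to handle non-ASCII input (surrogate pairs) and infer the length itself.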

Feature request: insert / insert_str

It would be nice if the UtfNString types had insert and insert_str methods like String, for inserting a character or string (slice) at a given position.
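
A minimal sketch of such an insertion on a `Vec<u16>` of valid UTF-16, using only the standard library. The function name is hypothetical, and panicking when the index would split a surrogate pair is an assumption modelled on how `String::insert_str` panics off a char boundary.

```rust
/// Inserts `s` (re-encoded as UTF-16) at code-unit index `idx`.
/// Panics if `idx` is out of bounds or falls inside a surrogate pair.
/// Hypothetical helper; not a widestring API.
fn insert_str_utf16(buf: &mut Vec<u16>, idx: usize, s: &str) {
    assert!(idx <= buf.len(), "index out of bounds");
    let splits_pair = idx > 0
        && idx < buf.len()
        && (0xD800..=0xDBFF).contains(&buf[idx - 1]) // high surrogate before
        && (0xDC00..=0xDFFF).contains(&buf[idx]); // low surrogate after
    assert!(!splits_pair, "index is not on a character boundary");
    // An empty range removes nothing and inserts the new units in place.
    drop(buf.splice(idx..idx, s.encode_utf16()));
}
```

Inserting a single `char` is the same operation with `c.encode_utf16(&mut [0; 2])` as the source.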

impl `TryFrom` for fallible string conversions

As the name suggests, this would remove some boilerplate in my public code. Specifically, I need TryFrom<OsString> for a U16CString although I don't see why this shouldn't be implemented for all fallible types.
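
To illustrate the shape of such an impl, here is a sketch on a stand-in newtype. `NulTerminatedWide` and `InteriorNul` are hypothetical types, not widestring's, and the source type is `&str` rather than `OsString` to keep the example portable, since converting an `OsString` to UTF-16 is platform-specific.

```rust
use std::convert::TryFrom;

/// Error carrying the code-unit position of the offending nul.
#[derive(Debug, PartialEq)]
struct InteriorNul(usize);

/// Stand-in for a nul-terminated UTF-16 string type.
#[derive(Debug, PartialEq)]
struct NulTerminatedWide(Vec<u16>);

impl TryFrom<&str> for NulTerminatedWide {
    type Error = InteriorNul;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        let mut units: Vec<u16> = s.encode_utf16().collect();
        // A nul inside the data would end the string early on the C side.
        if let Some(pos) = units.iter().position(|&u| u == 0) {
            return Err(InteriorNul(pos));
        }
        units.push(0);
        Ok(NulTerminatedWide(units))
    }
}
```

With this in place, generic code can accept `T: TryInto<NulTerminatedWide>` instead of calling a type-specific constructor.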

Won't build for 1.26.0

As of my PR #12, the library won't compile for Rust 1.26.0.
The options are either to stop supporting this version (which is already pretty old), which would be a breaking change and therefore require a version bump, or to fix it.

I could try fixing it if 1.26.0 support is mandatory, but I don't see why it should be. I'd like some input on this.

Feature request: more string manipulation functions

It would really help if some of these features existed on WideStr and WideCStr:

  • Slicing/indexing (more than just RangeFull)
  • Mutations (remove, pop, trim, etc.)
  • String matching (starts_with, ends_with, contains, find, matches, etc.)
  • String splitting
  • String replacing
  • Iteration (iterate by u16, iterate by (char | malformed utf16), iterate with indices)

It would also be nice to better document the behaviour of to_os_string/from_str on non-windows platforms where there's no canonical 1:1 relationship between an OsString and a WideString.
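
The iteration item in the list above (by char or malformed unit, with indices) can be sketched on a raw `&[u16]` with the standard library's `decode_utf16`. The function name and the choice of `Result<char, u16>` as the item type are assumptions made for illustration.

```rust
use std::char::decode_utf16;

/// Decodes UTF-16 and pairs each item with the code-unit index where it
/// starts. Valid characters are `Ok(char)`; unpaired surrogates are
/// `Err(unit)`. Hypothetical helper; not a widestring API.
fn char_indices_utf16(units: &[u16]) -> Vec<(usize, Result<char, u16>)> {
    let mut out = Vec::new();
    let mut index = 0usize;
    for decoded in decode_utf16(units.iter().copied()) {
        match decoded {
            Ok(c) => {
                out.push((index, Ok(c)));
                index += c.len_utf16(); // 1 for BMP chars, 2 for surrogate pairs
            }
            Err(e) => {
                out.push((index, Err(e.unpaired_surrogate())));
                index += 1; // an unpaired surrogate is always one unit
            }
        }
    }
    out
}
```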

UCStr::from_ptr_with_nul & inconsistent documentation + behaviors around nulls

The documentation for UCStr::from_ptr_with_nul is a little self-contradictory: "Safety" claims the pointer mustn't be null, while "Panics" documents safe behavior (panicking) on null. The implementation, meanwhile, doesn't check for null and may or may not hit debug-only asserts inside of std:

/// `p` must be non-null, even for zero `len`.

/// This function panics if `p` is null or if a nul value is not found at offset `len` of `p`.

widestring-rs/src/ucstr.rs

Lines 148 to 152 in e7236b6

pub unsafe fn from_ptr_with_nul<'a>(p: *const C, len: usize) -> &'a Self {
    assert!(*p.add(len) == UChar::NUL);
    let ptr: *const [C] = slice::from_raw_parts(p, len + 1);
    &*(ptr as *const UCStr<C>)
}

Additionally, while UCStr::from_ptr_with_nul doesn't scan for nuls, it appears UCString::from_ptr_with_nul does (and will truncate). It also handles the len=0, ptr=null case without panicking at all:

pub unsafe fn from_ptr_with_nul(
    p: *const u16,
    len: usize,
) -> Result<Self, MissingNulError<u16>> {
    if len == 0 {
        return Ok(UCString::default());
    }
    assert!(!p.is_null());
    let slice = slice::from_raw_parts(p, len);
    UCString::from_vec_with_nul(slice)
}

calls:

pub fn from_vec_with_nul(v: impl Into<Vec<C>>) -> Result<Self, MissingNulError<C>> {
    let mut v = v.into();
    // Check for nul vals
    match v.iter().position(|&val| val == UChar::NUL) {
        None => Err(MissingNulError { inner: Some(v) }),
        Some(pos) => {
            v.truncate(pos + 1);
            Ok(unsafe { UCString::from_vec_with_nul_unchecked(v) })
        }
    }
}

Should I create a PR for UCStr to try and return &[UChar::NUL] for the length 0 case? (Maybe UCStr can gain a Default impl?)
Should it scan/truncate too (would change the result of str.len())? Perhaps matching function signatures?

The inconsistent "Safety" vs "Panics" documentation crops up in multiple places - should I try and drop this text from all the "Safety" sections where p is already null-checked soundly and documented to be null-checked under "Panics"?

p must be non-null.
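
One way the documented "Panics" behaviour could be made to hold is to check for null before the first dereference. The sketch below is a standalone function over `u16` returning a plain slice, not the crate's generic method; its name is hypothetical, and it still relies on the caller guaranteeing that `p` is valid for reads of `len + 1` elements.

```rust
/// Builds a slice of `len + 1` code units that ends with a nul terminator.
///
/// # Safety
/// If `p` is non-null, it must be valid for reads of `len + 1` elements
/// for the lifetime `'a`.
///
/// # Panics
/// Panics if `p` is null or if the unit at offset `len` is not nul.
unsafe fn wide_from_ptr_with_nul<'a>(p: *const u16, len: usize) -> &'a [u16] {
    // Check for null first, so the read below never dereferences null.
    assert!(!p.is_null(), "pointer must be non-null");
    // SAFETY: the caller guarantees `p` is valid for `len + 1` reads.
    let terminator = unsafe { *p.add(len) };
    assert!(terminator == 0, "missing nul terminator at offset `len`");
    // SAFETY: same guarantee as above.
    unsafe { std::slice::from_raw_parts(p, len + 1) }
}
```

With the null check in place, a non-null requirement in the "Safety" section becomes redundant, which is the documentation clean-up the issue proposes.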

Why is WideCString::from_str_unchecked unsafe?

Hi,

Great work on widestring stuff!

I wonder why WideCString::from_str_unchecked is marked unsafe given that it accepts anything that can be converted to an OsStr?

It forces me to mark a lot of my higher-level code as unsafe too, yet passing a String and eventually converting it into a u16 slice seems to be entirely safe. Am I doing something wrong, maybe? I only want to convert a T: AsRef<OsStr> to a *const u16 and pass it to a WinAPI function.

I realize that any nul values in my string may cause the underlying value to be seen differently by the C environment, but that does not seem to be unsafe as long as Rust manages the underlying memory, and I don't think WideCString is giving it away anyway.

Cheers,
Andrej

`U16String::pop_char` panic with surrogate string

#[test]
fn truncated_with_surrogate() {
    // Character U+24B62, encoded as D852 DF62 in UTF16
    let buf = "𤭢";
    let mut s = widestring::U16String::from_str(buf);
    s.pop_char();
}

output:

thread 'windows::mount::tests::truncated_with_surrogate' panicked at C:\Users\gbleu\.cargo\registry\src\index.crates.io-6f17d22bba15001f\widestring-1.0.2\src\ustring.rs:1286:42:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:645
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:72
   2: core::panicking::panic_bounds_check
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:190
   3: core::slice::index::impl$2::index<u16>
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\slice\index.rs:258
   4: alloc::vec::impl$12::index<u16,usize,alloc::alloc::Global>
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\mod.rs:2732
   5: widestring::ustring::U16String::pop_char
             at C:\Users\gbleu\.cargo\registry\src\index.crates.io-6f17d22bba15001f\widestring-1.0.2\src\ustring.rs:1286
   6: libparsec_platform_mountpoint::windows::mount::tests::truncated_with_surrogate
             at .\tests\unit\windows_volume_label.rs:40
   7: libparsec_platform_mountpoint::windows::mount::tests::truncated_with_surrogate::closure$0
             at .\tests\unit\windows_volume_label.rs:35
   8: core::ops::function::FnOnce::call_once<libparsec_platform_mountpoint::windows::mount::tests::truncated_with_surrogate::closure_env$0,tuple$<> >
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\ops\function.rs:250
   9: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

   Canceling due to test failure: 0 tests still running

I guess this is due to an off-by-one issue here:

let high = self.inner[self.len()];

(should be self.inner[self.len() - 1] )
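
For comparison, here is a sketch of a pop operation on a plain `Vec<u16>` that never indexes past the end. The function name is hypothetical, and returning U+FFFD for an unpaired surrogate is an assumption; the crate may define that case differently.

```rust
/// Removes and returns the last character of a UTF-16 buffer.
/// A trailing surrogate pair is popped as one char; an unpaired surrogate
/// is popped and reported as U+FFFD. Hypothetical helper.
fn pop_char_utf16(buf: &mut Vec<u16>) -> Option<char> {
    let last = buf.pop()?; // `None` on an empty buffer, no indexing needed
    if (0xDC00..=0xDFFF).contains(&last) {
        // `last` is a low surrogate: look at the unit before it, if any.
        if let Some(&high) = buf.last() {
            if (0xD800..=0xDBFF).contains(&high) {
                buf.pop();
                let code = 0x10000
                    + ((u32::from(high) - 0xD800) << 10)
                    + (u32::from(last) - 0xDC00);
                return char::from_u32(code);
            }
        }
    }
    // Single unit: either a BMP scalar value or an unpaired surrogate.
    Some(char::from_u32(u32::from(last)).unwrap_or(char::REPLACEMENT_CHARACTER))
}
```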

Remove implementation based on generics

Currently UStr/UCStr/UString/UCString are generic over u16/u32, to share common implementation details. This leads to both confusing docs and an inability to add const functions, among other problems. So these details should be removed and simply have U16/U32 strings be entirely separate types, using macros where possible to reduce code duplication.
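
A rough sketch of the macro approach, with hypothetical type and method names (`Wide16`, `Wide32`, `from_slice`) that are not the crate's: one `macro_rules!` definition stamps out two unrelated concrete types, and because each is concrete its methods can be `const fn`.

```rust
/// Generates a separate borrowed string type for one code-unit width.
/// Illustrative only; the real types would carry far more API surface.
macro_rules! wide_str_type {
    ($name:ident, $unit:ty) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub struct $name<'a>(&'a [$unit]);

        impl<'a> $name<'a> {
            /// Wraps a slice of code units without validation.
            pub const fn from_slice(units: &'a [$unit]) -> Self {
                Self(units)
            }

            /// Length in code units.
            pub const fn len(&self) -> usize {
                self.0.len()
            }

            /// Whether the string has no code units.
            pub const fn is_empty(&self) -> bool {
                self.0.is_empty()
            }
        }
    };
}

wide_str_type!(Wide16, u16);
wide_str_type!(Wide32, u32);
```

The generated documentation then shows `Wide16` and `Wide32` as independent types, with no generic parameter for readers to resolve.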

0.4.1 fails to build on Debian (rustc 1.34.2)

0.4.0 was fine, but 0.4.1 breaks on 1.34.2:

    Compiling widestring v0.4.1
error[E0658]: use of unstable library feature 'alloc': this library is unlikely to be stabilized in its current form or name (see issue #27783)
   --> /builds/inliniac/suricata-ci/suricata/rust/vendor/widestring/src/lib.rs:195:1
    |
195 | extern crate alloc;
    | ^^^^^^^^^^^^^^^^^^^

We use widestring in Suricata. We use a minimum of rustc 1.34.2 as this is what Debian stable uses and we want to make sure Debian can (continue to) package Suricata.

Add a way to create U32/U16 string from number

I'm using widestring to implement an R7RS/R6RS Scheme VM. To convert a number to a string, I first have to call to_string, which creates a String, and then call U32String::from_str, so the conversion performs two heap allocations.
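
One way to avoid the intermediate String is to let the formatter write straight into the UTF-32 buffer. The sketch below uses only the standard library and a plain `Vec<u32>`; `Utf32Sink` and `number_to_utf32` are hypothetical names, and turning the resulting vector into a U32String is left to whatever constructor the crate offers.

```rust
use std::fmt::{self, Write};

/// Adapter that receives formatted text and stores it as UTF-32 code units.
/// Hypothetical type for illustration.
struct Utf32Sink(Vec<u32>);

impl Write for Utf32Sink {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Every `char` is a valid UTF-32 code unit, so a cast is enough.
        self.0.extend(s.chars().map(|c| c as u32));
        Ok(())
    }
}

/// Formats any `Display` value directly into a `Vec<u32>`.
fn number_to_utf32<T: fmt::Display>(value: T) -> Vec<u32> {
    let mut sink = Utf32Sink(Vec::new());
    write!(sink, "{}", value).expect("writing into a Vec does not fail");
    sink.0
}
```

The only heap allocation left is the growth of the output vector itself.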

library uses u16 rather than wchar_t

I was trying to use this library with a C library that uses wchar_t* strings in its API. Unfortunately widestring decided to use u16 as its “wide character” type, while wchar_t is a 32-bit type on Linux.
Any reason why widestring can't just use wchar_t as its character type? IMO that would be the sensible thing to do…
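
The usual way to follow the platform's wchar_t width is a cfg-selected type alias. The sketch below assumes 16 bits on Windows and 32 bits everywhere else, which matches Windows and Linux but is only an assumption for other targets; `WChar` and `encode_wchar` are hypothetical names.

```rust
/// Code-unit type matching the assumed width of wchar_t on this target.
#[cfg(windows)]
type WChar = u16;
#[cfg(not(windows))]
type WChar = u32;

/// Encodes `s` as UTF-16 where wchar_t is assumed to be 16 bits wide.
#[cfg(windows)]
fn encode_wchar(s: &str) -> Vec<WChar> {
    s.encode_utf16().collect()
}

/// Encodes `s` as UTF-32 where wchar_t is assumed to be 32 bits wide.
#[cfg(not(windows))]
fn encode_wchar(s: &str) -> Vec<WChar> {
    s.chars().map(|c| c as u32).collect()
}
```

Code written against `WChar` then compiles on both kinds of target without hard-coding either width.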

Create an empty `U16CString`

It is possible to create an empty U16String using U16String::new(), but there seems to be no way to create an empty U16CString. It looks like the ::new() method on U16CString has been deprecated, so I would guess that you've been reluctant to introduce such a function until the deprecated one is removed (or never)? Is there any reason why we can't create an empty U16CString using some convenience method?
