burntsushi / memchr
Optimized string search routines for Rust.
License: The Unlicense
error[E0428]: the name `imp` is defined multiple times
--> .cargo/registry/src/github.com-1ecc6299db9ec823/memchr-2.3.1/src/lib.rs:148:5
|
139 | fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
| ------------------------------------------------ previous definition of the value `imp` here
...
148 | fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `imp` redefined here
|
= note: `imp` must be defined only once in the value namespace of this block
This isn't quite a feature request yet, just a question:
Is providing memchr4 considered to be within scope for this crate? What about analogs of the byte searching functions that search for up to 4 u16 values in a &[u16]?
(Context: Escaping a string according to HTML.)
What about up to 5? (Context: the Data state of the HTML parsing algorithm taking into account CR and LF for end-of-line handling.)
memchr 2.2 and up don't compile on the x86_64-unknown-uefi target using the latest nightly, currently rustc 1.47.0-nightly (e15510ca3 2020-08-20), with "LLVM ERROR: Do not know how to split this operator's operand!"
memchr 2.1 and below compile, however.
Related: Issue #57, but that was a year ago and with a custom target, whereas this is an upstream one. Sorry if this is a duplicate; I don't know where else this should be reported.
I've spent the evening thinking about this, and I think it's possible to do and match glibc's performance, but I haven't experimented yet. I'm going to take a crack at this soon.
Could be something like this:
trait CharPred: CharPredSecret {}
struct CharEq(u8); impl CharPred for CharEq {}
struct CharLtSigned(i8); impl CharPred for CharLtSigned {}
struct CharLtUnsigned(u8); impl CharPred for CharLtUnsigned {}
struct CharGtSigned(i8); impl CharPred for CharGtSigned {}
struct CharGtUnsigned(u8); impl CharPred for CharGtUnsigned {}
struct CharOr<A: CharPred, B: CharPred>(A, B);
impl<A: CharPred, B: CharPred> CharPred for CharOr<A, B> {}
fn memchr_pred(needle: impl CharPred, haystack: &[u8]) -> Option<usize> { ... }
This could be useful in scenarios like this:
fn need_escape(s: &str) -> bool {
    let pred = CharOr(
        CharOr(CharEq(b'"'), CharEq(b'\"')),
        CharOr(CharEq(b'\\'), CharLtSigned(32)),
    );
    // Escaping is needed when at least one byte matches the predicate.
    memchr_pred(pred, s.as_bytes()).is_some()
}
fn c_escape(s: &str) -> String {
if !need_escape(s) { return s.to_owned(); }
// else slow iteration over character
}
// trait not exposed to user
trait CharPredSecret {
fn eval<V: Vector>(self, arg: V) -> V;
}
impl CharPredSecret for CharEq {
fn eval<V: Vector>(self, arg: V) -> V {
// compiler should be smart enough to move splat out of the loop
let c = V::splat(self.0);
V::cmpeq(arg, c)
}
}
The ifunc macro cannot be used in such a memchr_pred function because there are no generic statics, but that is probably OK; it should at least be better than the non-SIMD version.
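The composition idea above can be tried out without any SIMD. The following is only a scalar sketch under stated assumptions: the names BytePred, ByteEq, ByteLt, AnyOf, find_pred and needs_escape are hypothetical and are not part of memchr, and the search is a plain byte-by-byte loop rather than the vectorized eval described in the proposal.

```rust
/// Hypothetical scalar stand-in for the proposed predicate trait.
trait BytePred: Copy {
    fn eval(self, b: u8) -> bool;
}

/// Matches one exact byte.
#[derive(Clone, Copy)]
struct ByteEq(u8);

/// Matches bytes strictly below a limit (unsigned comparison).
#[derive(Clone, Copy)]
struct ByteLt(u8);

/// Matches when either inner predicate matches.
#[derive(Clone, Copy)]
struct AnyOf<A, B>(A, B);

impl BytePred for ByteEq {
    fn eval(self, b: u8) -> bool {
        b == self.0
    }
}

impl BytePred for ByteLt {
    fn eval(self, b: u8) -> bool {
        b < self.0
    }
}

impl<A: BytePred, B: BytePred> BytePred for AnyOf<A, B> {
    fn eval(self, b: u8) -> bool {
        self.0.eval(b) || self.1.eval(b)
    }
}

/// Scalar search: index of the first byte satisfying the predicate.
fn find_pred<P: BytePred>(pred: P, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| pred.eval(b))
}

/// True when the string contains a quote, a backslash or a control byte.
fn needs_escape(s: &str) -> bool {
    let pred = AnyOf(AnyOf(ByteEq(b'"'), ByteEq(b'\\')), ByteLt(32));
    find_pred(pred, s.as_bytes()).is_some()
}
```

Because the combinators are generic structs rather than trait objects, the whole predicate is monomorphized into one function, which is the property a SIMD version would also rely on.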
Consider the following code. It's used as a wrapper in one of my projects to avoid bounds checks in safe code.
#[inline]
fn chr(s: &[u8], b: u8) -> Option<usize> {
    memchr::memchr(b, s).map(|i| {
        if i >= s.len() {
            // Relies on memchr only returning indices inside the haystack.
            unsafe { core::hint::unreachable_unchecked() }
        }
        i
    })
}
I wonder if it's practical and sound to insert unreachable hints into the memchr crate, so that all its users could get an increase in performance. It's just a tentative suggestion that needs more discussion :)
The crates.io badge in the Readme (https://meritbadge.herokuapp.com/memchr) no longer loads.
This crate provides a small reasonably well defined API and probably won't ever see any breaking changes. While it may still need new additions or improved platform support, I expect it won't require backwards incompatible changes. Therefore, I propose cutting a 1.0 release in the next few weeks.
The crate does not seem to work well with miri at the moment for multiple reasons. One reason is that the crate uses the x86 / C implementations over the fallback implementation, which miri doesn't support. But even when cfg'ing out those implementations via cfg(miri), the fallback implementation uses a lot of bit math on pointers, which miri also doesn't like.
It seems surprising to me that when using the Memchr iterator, the position that is returned when calling .next() is the position of the needle + 1. Looking at the tests of the crate, this is expected behavior. Could you please explain why it is this way? I could then make a PR with additional documentation and an example to show usage of the Memchr (and Memchr2/3) iterator.
memchr implements a generic SIMD accelerated search that is ideal for implementing something like CheatEngine, where you scan the memory of an executable process to aid reverse engineering. This process involves repeatedly scanning for possibly millions of small values (u16, i32, ...) in the memory of that process. The user might have information ahead of time about the frequency distribution of bytes in the memory being scanned, which may vary wildly between executables. The user might also be able to control the program to ensure certain rare bytes appear in the program's memory at certain times.
There is an issue that prevents memchr from performing optimally when scanning binary executables: the byte frequency table. The core algorithm is based on detecting rare bytes at specific positions in the haystack (the prefilter) and then testing these matches to check if the needle has been found. As mentioned in the incredibly detailed comments, the performance of this algorithm is highly dependent on the byte frequency table used to determine what is a rare byte. While the table that is included in memchr is optimal for the majority of cases, there are some specific data types that have very different byte frequency distributions, which causes memchr to perform worse on those inputs than it otherwise might with a different byte frequency table.
To illustrate this point, consider the following byte frequencies (where ideal is the ideal frequency for an x86 binary):
byte | memchr | ideal
---|---|---
\x00 | 55 | 255
\xdd | 255 | 0
\x8b | 80 | 186
H | 150 | 254
Now, consider scanning for the needle H\x00\xdd\x8b in an x86 binary. memchr would identify \x00 and \x8b as the rarest bytes, when they are in fact common bytes. Even if memchr considered \x00 to be a frequent byte via configuration, it would still choose H and \x8b as the rarest bytes, which are both much more common than \xdd, the only actually rare byte. This would result in a lot of unnecessary false positives, decreasing the throughput. This is a simple case, but it is easy to extend this idea to many other pathological input sequences that defeat the default frequency table, and might also reasonably appear in an executable or be scanned for by a user.
Now consider a haystack that contains HHH\x00\xdd\x8b. The user might know in advance that searching for HHH\x00 and searching for H\x00\xdd\x8b will both return a single unique match, the sub-slice that was mentioned earlier (the exact indices are not identical but that is not the point). The user might also know that \xdd is a very rare byte in their dataset. The user should be able to choose scanning for \xdd instead of a more common byte to speed up their searches. I cannot imagine how to support something like this without providing the user a mechanism for customizing the byte frequency table.
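To make the effect of the table concrete, here is a minimal sketch of rare-byte selection, assuming that a lower rank means a rarer byte. The functions table and two_rarest are hypothetical helpers written for this illustration; they are not memchr's internal code, which is more involved.

```rust
/// Build a 256-entry rank table: every byte gets `default`, except the
/// listed (byte, rank) overrides. Lower rank is assumed to mean "rarer".
fn table(default: u8, overrides: &[(u8, u8)]) -> [u8; 256] {
    let mut t = [default; 256];
    for &(byte, rank) in overrides {
        t[byte as usize] = rank;
    }
    t
}

/// Return the needle offsets of the two bytes with the lowest rank,
/// rarest first. Ties are broken by position because the sort is stable.
fn two_rarest(needle: &[u8], rank: &[u8; 256]) -> Option<(usize, usize)> {
    if needle.len() < 2 {
        return None;
    }
    let mut offsets: Vec<usize> = (0..needle.len()).collect();
    offsets.sort_by_key(|&i| rank[needle[i] as usize]);
    Some((offsets[0], offsets[1]))
}
```

Fed with the four ranks listed in the table above, the memchr column selects offsets 1 and 3 of H\x00\xdd\x8b (the bytes \x00 and \x8b), while the ideal column selects offsets 2 and 3 (\xdd and \x8b). The only thing that changes between the two runs is the table passed in.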
The proposed solution is to allow the user to specify the byte frequency table at runtime by modifying the memchr::memmem::rarebytes::rank function. Currently, this function reads from the global byte frequency table.
My first idea was to create an enum that can be provided to a FinderBuilder and then forwarded to RareNeedleBytes to choose the table:
enum ByteFrequencies<'a> {
Default,
Custom(&'a [u8; 256]),
}
This enum can be stored in the NeedleInfo struct and used at runtime to determine which byte frequency table to use. However, this introduces the lifetime 'a, which may or may not be the same as the needle ('n) and haystack ('h) lifetimes that are stored in related structs. Treating 'a as a separate lifetime requires changing the public API of Finder to add it.
I believe that the extra lifetime might make life more difficult for the compiler, which may be why I observed a small but noticeable (around 10%) impact on the performance of constructing a Finder with the default frequency table on my local machine.
Also, by introducing a new member on the struct NeedleInfo, the size/alignment properties of Finder, Searcher and NeedleInfo changed, which might also be the reason for the performance impact I observed. (If this sounds crazy to anyone, I suggest you take a look at the wonderful performance talk by Emery Berger titled 'Performance Matters' for more details: https://www.youtube.com/watch?v=r-TLSBdHe1A.)
An idea to remove the generic lifetime from ByteFrequencies:
enum ByteFrequencies {
Default,
Custom(&'static [u8; 256]),
}
However, I believe this static API is logically inconsistent with the FinderBuilder API. You can construct millions of unique Finders at runtime and then discard them later, but the same cannot be said for static arrays.
Also, the user might want to perform analysis of their specific corpus at runtime to generate a specialized byte frequency table (like 'pre-training'). This is a very interesting use case in the context of the analysis of binary executables, as there is a lot of information that can only be obtained at runtime and can be useful in optimizing many kinds of searches. Forcing the user to use a static byte frequency table would necessarily prevent this use-case.
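As an illustration of that 'pre-training' step, the sketch below derives a rank table from a sample of the data at runtime. It assumes that a rank is simply the byte's position when all 256 byte values are ordered from least to most frequent; rank_table is a hypothetical helper and this is not how memchr's built-in table was produced.

```rust
/// Derive a rank table from a corpus sample: rank 0 is the least
/// frequent byte value, rank 255 the most frequent one.
fn rank_table(corpus: &[u8]) -> [u8; 256] {
    // Count how often each byte value occurs in the sample.
    let mut counts = [0u64; 256];
    for &b in corpus {
        counts[b as usize] += 1;
    }

    // Order the byte values by ascending count. The sort is stable, so
    // equally frequent bytes keep their numeric order.
    let mut order: Vec<usize> = (0..256).collect();
    order.sort_by_key(|&b| counts[b]);

    // The position in that order becomes the rank (always in 0..=255).
    let mut rank = [0u8; 256];
    for (position, &byte) in order.iter().enumerate() {
        rank[byte] = position as u8;
    }
    rank
}
```

A table produced this way only exists at runtime, which is exactly the case a &'static [u8; 256] parameter cannot accept without leaking memory.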
Another idea to remove the generic lifetime and also allow runtime generation of the byte frequency table:
enum ByteFrequencies {
Default,
Custom(Box<[u8; 256]>),
}
However, an issue with this approach is that the ByteFrequencies enum has a size of 16 bytes which is mostly wasted. Another issue is that it seems that conceptually we should be passing around some kind of reference to a byte table that can be reused, instead of copying the table for each construction, but that ultimately depends on benchmarks. Also, now the standard library and memory allocation are required for an operation that is unrelated to both of those things (Rc, Arc and others have similar issues).
I also tried storing the byte table inline, but this had disastrous results on performance. This is probably because this extra storage pushed important members on related structs into new cache lines, which affected subsequent operations on these members.
enum ByteFrequencies {
Default,
Custom([u8; 256]),
}
One thing I have not tried yet but might be interesting is re-organizing the members and memory layout of any struct that stores a ByteFrequencies object. This might allow using an inline byte frequency table, for example, but would likely result in breaking changes to the layout of public structs in memchr. Even just introducing the ByteFrequencies object already changes the memory layout of certain structs, and I am not sure whether that is undesirable.
All of this culminated in the pull request I submitted, but I realize now it is better to just lay it all out here and figure out the best path forward together. I appreciate any feedback you may have on these suggestions.
P.S. I think memchr is an incredible library and the code quality and detail of documentation definitely helped me greatly in understanding the internals and even being able to suggest this in the first place, so kudos.
I have noticed that the SSE2 implementation of memchr in this crate unrolls the loop 4x. Unfortunately, this seems to lead to a significant performance drop on processors on the Zen 1 architecture. Benchmarked on a TR 1950x, I see about 50-60% better performance compared to this crate when avoiding loop unrolling altogether. Below is a benchmark demonstrating this. The memchr implementation in the repository linked below is written in such a way that you just have to change one constant (UNROLL_SIZE) to change the amount of loop unrolling that the function uses for the main loop.
https://github.com/redzic/memchr-demonstration
Just clone the repository and run cargo run --release to run the benchmark.
Increasing UNROLL_SIZE leads to worse performance on my TR 1950x, with 4x unrolling being basically the same speed as this crate, which makes sense. However, when using UNROLL_SIZE = 8, the custom implementation is again about 10% faster than this crate's implementation (i.e., this crate's SSE2 memchr is 90.9% as fast).
Would it be possible to tune the unroll factor, or possibly even do something similar to OpenBLAS, so that we query information about the CPU such as cache size or even exact CPU model, and dispatch code accordingly? Perhaps this functionality could be implemented behind some kind of feature flag.
First and foremost, thank you for the crate Andrew.
I’ve been employing it for a while, and I’ve recently made one of the libraries I’m working on no_std. Coincidentally, while doing so I altered my benchmarking habits and never noticed that --no-default-features leads to fallback::memchr due to std::is_x86_feature_detected; I only discovered the change via cargo asm.
What do you think of extending the README’s no_std section to mention the otherwise “silent” requirement?
(I am now using std_detect with an extern no_std memchr. Considering you work on that too, I assume you’ll want it to be stable before adopting it.)
See rust-bakery/nom#1313.
nom 5.1 depends on `memchr = "^2.0"`. It has a documented MSRV of 1.37.
Since 2.4.0 is supposed to be semver-compatible, cargo will select memchr 2.4.0, which does not build on 1.37.
I understand that balancing msrv vs semver can be a drag on maintainer productivity, so I have no real expectations here other than documenting that it happened.
I was wondering if it would be possible for this library to also support memrchr?
I can see this might be difficult because memrchr is not a posix-defined functionality.
And another (kinda related) question: Does this crate work on all platforms (especially windows)?
I saw that sometimes haystack comes first and needle second, as in https://docs.rs/memchr/2.4.1/memchr/memmem/fn.find.html, but sometimes it's the opposite, as in https://docs.rs/memchr/2.4.1/memchr/struct.Memchr.html#method.new.
That can lead to mistakes.
I guess this came up before because it's so obvious: Provide an iterator akin to std::str::lines, but using memchr to search for line endings. The impl in core::memchr came up in my benchmarks and it turns out the code below is 2x to 3x faster than what std::str::lines() does, both in synthetic and real-life code.
pub fn lines(inp: &[u8]) -> impl Iterator<Item = &[u8]> {
    let mut inp = inp;
    std::iter::from_fn(move || {
        if inp.is_empty() {
            return None;
        }
        let ending = memchr::memchr(b'\n', inp).unwrap_or(inp.len() - 1) + 1;
        let (mut line, rest) = inp.split_at(ending);
        inp = rest;
        if let Some(b'\n') = line.last() {
            line = &line[..line.len() - 1];
            if let Some(b'\r') = line.last() {
                line = &line[..line.len() - 1];
            }
        }
        Some(line)
    })
}
pub fn str_lines(inp: &str) -> impl Iterator<Item = &str> {
    lines(inp.as_bytes()).map(|sl| unsafe { std::str::from_utf8_unchecked(sl) })
}
Is the performance difference (on my machine!) reason enough to include it in memchr? Feel free if so :-)
error: failed to run custom build command for `memchr v2.4.1`
Caused by:
could not execute process `/mnt/c/Users/asaff/Documents/Dev/Github/tool/target/debug/build/memchr-b9330b1f01949571/build-script-build` (never executed)
Caused by:
No such file or directory (os error 2)
config:
[build]
rustflags = ["-C", "link-arg=-nostdlib", "-C", "link-arg=-static", "-C", "relocation-model=pic"]
When compiling only with link-arg=-nostdlib I get the following crash:
error: failed to run custom build command for `memchr v2.4.1`
Caused by:
process didn't exit successfully: `/mnt/c/Users/asaff/Documents/Dev/Github/tool/target/debug/build/memchr-b9330b1f01949571/build-script-build` (signal: 11, SIGSEGV: invalid memory reference)
warning: build failed, waiting for other jobs to finish...
error: build failed
I was working on something that can parse ninja.build files. It used to be twice as slow as ninja itself, but after replacing just a few lines with memchr and memchr3 from this crate, it is now almost twice as fast as ninja.
Thanks! :)
I am on macOS Big Sur 11.1 and just updated to the newest version of XCode Command Line Tools. After I did (and I assume it is something to do with that, because everything with memchr worked fine before that), I got the following error message when I tried to cargo check anything that had a transitive dependency on memchr.
error: linking with `cc` failed: exit code: 1
|
= note: "cc" "-m64" "-arch" "x86_64" "-L" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.0.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.1.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.10.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.11.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.12.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.13.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.14.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.15.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.2.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.3.rcgu.o" 
"/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.4.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.5.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.6.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.7.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.8.rcgu.o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.build_script_build.55pqr4uh-cgu.9.rcgu.o" "-o" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/build/memchr-23bc3b7c5319ae9c/build_script_build-23bc3b7c5319ae9c.14i7c91is1si092v.rcgu.o" "-Wl,-dead_strip" "-nodefaultlibs" "-L" "/Users/cadenhaustein/MEGA/Coding_Projects/hedgehog/target/debug/deps" "-L" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libstd-cf45c391193686b0.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libpanic_unwind-bfb82cdc97bd35ea.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libobject-0e543fa90fe41090.rlib" 
"/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libaddr2line-f50981f4143e4c69.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libgimli-bbe9b2276f9fe948.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_demangle-c04e87d408a5de4c.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libhashbrown-3865f13d7ece40bb.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_std_workspace_alloc-83f3487f53b2e684.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libunwind-518f93c579715cca.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcfg_if-ab0ea20e972aeb4f.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/liblibc-50e4694516c58a71.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/liballoc-8171c7b795c55f62.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/librustc_std_workspace_core-8357f853e5f39333.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcore-80c77ff1434731cf.rlib" "/Users/cadenhaustein/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcompiler_builtins-8c8eeab435e54e85.rlib" "-lSystem" "-lresolv" "-lc" "-lm"
= note: xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: aborting due to previous error
error: could not compile `memchr`
Any advice on how to fix it?
👋 Hi!
I'm working on docker-activity and memchr is one of my dependencies.
I'm building docker-activity with docker buildx in order to have a single image for both platforms, and I end up with weird behavior. I'm not sure whether I should open this issue in your repo, in buildx, or even in qemu, but I'll try here.
When I build on a real arm64 machine (RPi4), the image builds perfectly, but when I use docker buildx build --platform linux/arm64 I end up with this issue, apparently due to memchr.
#18 33.61 Compiling futures-core v0.3.18
#18 35.36 error: could not compile `memchr` due to previous error
#18 35.37 warning: build failed, waiting for other jobs to finish...
#18 43.90 error: linking with `cc` failed: exit status: 1
#18 43.90 |
#18 43.90 = note: "cc" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crt1.o" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crti.o" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crtbegin.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.0.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.1.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.10.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.11.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.12.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.13.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.14.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.15.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.2.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.3.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.4.rcgu.o" 
"/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.5.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.6.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.7.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.8.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.build_script_build.9009e48d-cgu.9.rcgu.o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d.25mgmn3yz3lxkcgb.rcgu.o" "-Wl,--as-needed" "-L" "/code/target/release/deps" "-L" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib" "-Wl,--start-group" "-Wl,-Bstatic" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libstd-bb69598673ac6378.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libpanic_unwind-347c34ae82bb4da0.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libminiz_oxide-86fc36b502bfb8aa.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libadler-cb14375f652e6e86.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libobject-9e87208331b99476.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libmemchr-ebe0ff89d9e37134.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libaddr2line-b0f16d22595fdd3b.rlib" 
"/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libgimli-57bd3e568b1b69be.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libstd_detect-d2296608bd767c8a.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/librustc_demangle-e4d26fe9e39d3be6.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libhashbrown-8322f07825c42064.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/librustc_std_workspace_alloc-403fa8d4a1124a0d.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libunwind-29e90d90171d4117.rlib" "-lunwind" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libcfg_if-ed66653f82293f20.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/liblibc-ad350ff50825d4f2.rlib" "-lc" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/liballoc-98d6df8d800ab2ff.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/librustc_std_workspace_core-e0db88e40d9c7e0b.rlib" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libcore-fcedc0d4b8cb02ca.rlib" "-Wl,--end-group" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/libcompiler_builtins-0c2242734ae54219.rlib" "-Wl,-Bdynamic" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-nostartfiles" "-L" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib" "-L" 
"/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained" "-o" "/code/target/release/build/futures-core-34f97c22b245c83d/build_script_build-34f97c22b245c83d" "-Wl,--gc-sections" "-static" "-no-pie" "-Wl,-zrelro,-znow" "-nodefaultlibs" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crtend.o" "/usr/local/rustup/toolchains/nightly-aarch64-unknown-linux-musl/lib/rustlib/aarch64-unknown-linux-musl/lib/self-contained/crtn.o"
#18 43.90 = note:
#18 43.90
#18 46.48 error: build failed
------
Dockerfile:16
--------------------
14 | COPY src/exporter /code/src/exporter
15 | COPY src/format /code/src/format
16 | >>> RUN cargo build --release --offline
17 |
18 | FROM alpine
--------------------
error: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c cargo build --release --offline" did not complete successfully: exit code: 101
Do you have any idea where that could come from? If you want, here is the Dockerfile I use.
I'm writing a parser and looking for 4 needles. I tried adjusting the bithacks to analyze 8 bytes at a time since I'm using a 64-bit arch. I anticipated memchr to be faster but I am finding that is not true. Am I doing something wrong? The benchmarks are the following, using rustc 1.31.0-nightly (fc403ad98 2018-09-30). I got similar results on both macOS and Linux.
test tests::bench_hasvalue ... bench: 2 ns/iter (+/- 0)
test tests::bench_memchr ... bench: 13 ns/iter (+/- 1)
The code is the following.
#![feature(test)] // nightly-only; required for `extern crate test`
extern crate byteorder;
extern crate memchr;
extern crate test;
use self::byteorder::{ByteOrder, NativeEndian};

// Taken from "Determine if a word has a zero byte" at http://graphics.stanford.edu/~seander/bithacks.html
// and adjusted for 64-bits and Rust complaining of overflow.
fn haszero(x: u64) -> bool {
    (x.wrapping_sub(0x0101_0101_0101_0101) & !x & 0x8080_8080_8080_8080) != 0
}

// Taken from "Determine if a word has a byte equal to n" at http://graphics.stanford.edu/~seander/bithacks.html
// and adjusted for 64-bits.
// Note: only the first 8 bytes of `haystack` are inspected.
fn hasvalue(haystack: &[u8], needles: &[u8]) -> bool {
    let x = NativeEndian::read_u64(haystack);
    let y = !0 as u64 / 255 as u64;
    for c in needles {
        if haszero(x ^ (y * u64::from(*c))) {
            return true;
        }
    }
    false
}

#[cfg(test)]
mod tests {
    use self::test::Bencher;
    use super::*;

    #[bench]
    fn bench_hasvalue(b: &mut Bencher) {
        b.iter(|| {
            let haystack = test::black_box(b"01234567");
            hasvalue(haystack, &[b'a', b'b', b'c', b'd'])
        });
    }

    #[bench]
    fn bench_memchr(b: &mut Bencher) {
        b.iter(|| {
            let haystack = test::black_box(b"01234567");
            memchr::memchr(b'a', haystack).is_none()
                && memchr::memchr3(b'b', b'c', b'd', haystack).is_none()
        });
    }
}
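For a comparison on haystacks of arbitrary length, the same bithack can be applied one 8-byte word at a time. The following is only a sketch using the standard library (no byteorder); the name contains_any is made up for illustration and is not part of memchr.

```rust
use std::convert::TryInto;

const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

// True if any byte of `x` is zero (the "haszero" bithack).
fn has_zero(x: u64) -> bool {
    (x.wrapping_sub(LO) & !x & HI) != 0
}

// Hypothetical helper: true if any byte of `haystack` equals any of `needles`.
fn contains_any(haystack: &[u8], needles: &[u8]) -> bool {
    let mut chunks = haystack.chunks_exact(8);
    for chunk in &mut chunks {
        let word = u64::from_ne_bytes(chunk.try_into().unwrap());
        // XOR with the needle repeated in every byte: a matching byte becomes zero.
        if needles.iter().any(|&n| has_zero(word ^ (LO * u64::from(n)))) {
            return true;
        }
    }
    // Fewer than 8 bytes left: check them one by one.
    chunks.remainder().iter().any(|b| needles.contains(b))
}
```

This only answers whether a needle occurs, not where, so it still does less work than memchr, which returns a position.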
I think it would be great if someone with access to a windows environment could run benchmarks to revisit the built-in vs this crate's fallback implementation of memchr.
When I wrote the new memmem implementation earlier this year, one thing I did was write the implementation as something that was generic over the vector type:
memchr/src/memmem/genericsimd.rs
Lines 95 to 107 in 186ac04
where an example of it being called, e.g. for AVX2, is:
Line 27 in 186ac04
So basically, the idea here is, you write the nasty SIMD code once, and then write some trivial shims for each target feature you want to support.
The actual use of SIMD in this crate is reasonably simple, so it turns out that the trait defining the API of a vector is quite small:
Lines 21 to 32 in 186ac04
OK, so what's this issue about? I think ideally, we would push the Vector trait up a level in the module hierarchy, port the existing x86 SIMD memchr implementation to a "generic" version, and then replace the existing implementations with shims that call out to the generic version.
This will hopefully let us easily add a WASM implementation of memchr, but adding other implementations in the future would be good too once more intrinsics (e.g., for ARM) are added to std.
(One wonders whether we should just wait for portable SIMD to land in std, but I don't know when that will happen.)
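To make the "write it once, add trivial shims" idea concrete, here is a rough sketch. The Vector trait below is hypothetical and much smaller than whatever the crate actually defines, and Scalar8 is a plain-array stand-in for a real SIMD register so the example runs anywhere.

```rust
// Hypothetical minimal vector API for this sketch; not the crate's real trait.
trait Vector: Copy {
    // Number of byte lanes.
    const LANES: usize;
    fn splat(byte: u8) -> Self;
    // Loads exactly `LANES` bytes from the front of `chunk`.
    fn load(chunk: &[u8]) -> Self;
    // Bit i is set when lane i of `self` equals lane i of `other`.
    fn eq_mask(self, other: Self) -> u32;
}

// Stand-in "vector" backed by an array instead of a SIMD register.
#[derive(Clone, Copy)]
struct Scalar8([u8; 8]);

impl Vector for Scalar8 {
    const LANES: usize = 8;
    fn splat(byte: u8) -> Self {
        Scalar8([byte; 8])
    }
    fn load(chunk: &[u8]) -> Self {
        let mut lanes = [0u8; 8];
        lanes.copy_from_slice(&chunk[..8]);
        Scalar8(lanes)
    }
    fn eq_mask(self, other: Self) -> u32 {
        let mut mask = 0u32;
        for i in 0..8 {
            if self.0[i] == other.0[i] {
                mask |= 1 << i;
            }
        }
        mask
    }
}

// The search logic is written once, generic over the vector type.
fn memchr_generic<V: Vector>(needle: u8, haystack: &[u8]) -> Option<usize> {
    let vn = V::splat(needle);
    let mut i = 0;
    while i + V::LANES <= haystack.len() {
        let mask = V::load(&haystack[i..i + V::LANES]).eq_mask(vn);
        if mask != 0 {
            return Some(i + mask.trailing_zeros() as usize);
        }
        i += V::LANES;
    }
    // Tail shorter than one vector: plain byte-wise scan.
    haystack[i..].iter().position(|&b| b == needle).map(|p| i + p)
}

// A per-target shim is then a one-liner.
fn memchr_scalar8(needle: u8, haystack: &[u8]) -> Option<usize> {
    memchr_generic::<Scalar8>(needle, haystack)
}
```

A real shim would additionally carry the matching #[target_feature] attribute and use an intrinsic-backed vector type.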
This could branch out to {memchr, memchr2, memchr3} depending on the first 4 bits of the needle: char argument. Alas, we have no memchr4 function.
Hello,
While reading through the code I noticed all the #[cfg(...)] statements, and I was wondering if you knew cfg-if existed? https://github.com/alexcrichton/cfg-if
This could help clean up the code and make sure that only 1 fn ever gets compiled in the code.
I'd be willing to implement this.
After upgrading to 2.5.0, I'm getting build errors on Android like this:
[CONTEXT] stderr: error[E0531]: cannot find tuple struct or tuple variant `GenericSIMD128` in this scope
[CONTEXT] --> third-party/rust/vendor/memchr-2.5.0/src/memmem/mod.rs:885:13
[CONTEXT] |
[CONTEXT] 885 | GenericSIMD128(gs) => GenericSIMD128(gs),
[CONTEXT] | ^^^^^^^^^^^^^^ not found in this scope
[CONTEXT]
It appears that the GenericSIMD128 enum variant is defined with cfg target_arch = "x86_64" or memchr_runtime_wasm128, but then it is used in the code without a cfg check limiting it to those platforms, causing it to fail to compile.
[192.168.18.146] out: error[E0428]: the name `imp` is defined multiple times
[192.168.18.146] out: --> /home/aram/.cargo/registry/src/github.com-1ecc6299db9ec823/memchr-2.3.1/src/lib.rs:148:5
[192.168.18.146] out: |
[192.168.18.146] out: 139 | fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
[192.168.18.146] out: | ------------------------------------------------ previous definition of the value `imp` here
[192.168.18.146] out: ...
[192.168.18.146] out: 148 | fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
[192.168.18.146] out: | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `imp` redefined here
[192.168.18.146] out: |
[192.168.18.146] out: = note: `imp` must be defined only once in the value namespace of this block
Seems [cfg(all(target_arch = "x86_64", memchr_runtime_simd, not(miri)))] and [cfg(all(memchr_libc, not(all(target_arch = "x86_64", memchr_runtime_simd, miri))))] are not mutually exclusive.
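One way to guarantee that exactly one imp is compiled is to have every later branch negate the complete condition of the earlier ones. The sketch below shows the pattern with built-in predicates (target_arch, unix, miri) so it compiles on any target; the predicates and the function bodies are placeholders, not the crate's actual cfgs.

```rust
// Branch A: the "fast path" condition.
#[cfg(all(target_arch = "x86_64", not(miri)))]
fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1)
}

// Branch B: its own condition AND the negation of ALL of A's condition.
// Negating only part of A (or flipping one term inside it) lets both
// branches match at the same time.
#[cfg(all(unix, not(all(target_arch = "x86_64", not(miri)))))]
fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1)
}

// Fallback: neither A nor B applies.
#[cfg(not(any(all(target_arch = "x86_64", not(miri)), unix)))]
fn imp(n1: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1)
}
```

In the attributes quoted above, the second one negates all(target_arch = "x86_64", memchr_runtime_simd, miri) rather than all(target_arch = "x86_64", memchr_runtime_simd, not(miri)), so on x86_64 with SIMD and libc enabled and without Miri both conditions hold.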
To reproduce the error:
1. Add a crate that depends on memchr to the Cargo.toml. For me, it would be nom = "7.0" and object = "0.26".
2. Run cargo build -Z build-std=core,compiler_builtins,alloc -Z build-std-features=compiler-builtins-mem --target x86_64-unknown-linux-gnu
Cargo then prints thousands of lines of output about duplicate lang items or so.
My rust version:
nightly-x86_64-unknown-linux-gnu (default)
rustc 1.56.0-nightly (b03ccace5 2021-08-24)
I ran into this error when trying to add the cstr_core crate to my no_std OS project, which depends on memchr and disables its default features.
Building in debug mode:
LLVM ERROR: Do not know how to split this operator's operand!
Building in release mode:
error: Could not compile `memchr`.
Caused by:
process didn't exit successfully: `rustc --crate-name memchr /home/kevin/.cargo/registry/src/github.com-1ecc6299db9ec823/memchr-2.2.1/src/lib.rs --color always --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C codegen-units=1 -C metadata=ebf07e953eacc227 -C extra-filename=-ebf07e953eacc227 --out-dir /home/kevin/my_os/target/x86_64-my_os/release/deps --target x86_64-my_os -L dependency=/home/kevin/my_os/target/x86_64-my_os/release/deps -L dependency=/home/kevin/my_os/target/release/deps --cap-lints allow --emit=obj -C debuginfo=2 -C code-model=large -C relocation-model=static -D unused-must-use -Z merge-functions=disabled -Z share-generics=no --sysroot /home/kevin/.xargo --cfg memchr_runtime_simd --cfg memchr_runtime_sse2 --cfg memchr_runtime_sse42 --cfg memchr_runtime_avx` (signal: 11, SIGSEGV: invalid memory reference)
For reference and in case it matters, my compiler target .json file is:
{
  "llvm-target": "x86_64-unknown-none-gnu",
  "data-layout": "e-m:e-i64:64-f80:128-n8:16:32:64-S128",
  "linker-flavor": "gcc",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "32",
  "arch": "x86_64",
  "os": "none",
  "features": "-mmx,-sse,+soft-float",
  "disable-redzone": true,
  "panic": "abort"
}
My rustc version is rustc 1.38.0-nightly (78ca1bda3 2019-07-08).
I've never encountered an error like this before, so I'm not sure what else to say. If more information is needed, I am happy to provide it.
/// optimized routine that can be up to an order of magnitude master in some
should be
/// optimized routine that can be up to an order of magnitude faster in some
I am trying to make an efficient csv parser using memchr.
I want to search all commas when there is a line like the one below.
a1,b1,c1,d2
In this case, I want to use the code below.
let sep_iter = memchr3_iter(col_sep, row_sep, b''', &buffer[..]);
...
loop {
let next_sep_pos_wrap = self.sep_iter.next();
.....
}
If sep_iter.next() is calculated here, it seems that the already calculated result can be used again, but it seems to recalculate from the beginning.
When sep_iter.next() operates for the first time, it seems that the result value is already stored in another bit. But now it seems to only use trailing_zeros.
It would be nice if there were somewhere that listed changes between different releases. Currently it seems the only way to do that is by comparing tags, but that's not such a great experience. I know it's a lot of work to make a log for 20 releases, but I think it's worth having.
Using my site code I attempted to build memchr using cargo build. The build.rs file fails to run with the following error:
$ cargo build
Compiling proc-macro2 v1.0.24
Compiling libc v0.2.73
Compiling syn v1.0.48
Compiling memchr v2.3.3
Compiling log v0.4.11
Compiling bitflags v1.2.1
Compiling ryu v1.0.5
Compiling serde_derive v1.0.117
error: failed to run custom build command for `memchr v2.3.3`
Caused by:
process didn't exit successfully: `/Users/cadey/Code/site/target/debug/build/memchr-08053740b12295b3/build-script-build` (signal: 9, SIGKILL: kill)
warning: build failed, waiting for other jobs to finish...
error: build failed
I assume this may be a rustc bug as the error in question is identical to the error you get when you attempt to run an unsigned binary on an M1 Mac. I attempted to sign the binary manually with the codesign tool and it failed with this error:
$ codesign -s - /Users/cadey/Code/site/target/debug/build/memchr-08053740b12295b3/build-script-build
/Users/cadey/Code/site/target/debug/build/memchr-08053740b12295b3/build-script-build: replacing existing signature
/Users/cadey/Code/site/target/debug/build/memchr-08053740b12295b3/build-script-build: the codesign_allocate helper tool cannot be found or used
Should I file this as a rustc bug?
Not sure if this is an issue with memchr, nom, elastic-rs/elastic (where I am using nom) or even std/core/compiler, but I thought I would start here...
I am getting undefined symbols ld errors for various symbols in core when I enable the memchr/use_std feature of nom; see elastic-rs/elastic/pull/389 for a little more background + error logs and this or this Travis build.
I have reproduced it on macOS 10.14 & 10.15 and Ubuntu 19.04 (and for the sake of completeness, various Linux via Docker) with rustc 1.38.0 (625451e37 2019-09-23), 1.39.0-beta.6 (224f0bc90 2019-10-15), 1.40.0-nightly (4a8c5b20c 2019-10-23) and a few other nightlies, and when cross compiling to x86_64-unknown-linux-musl from macOS and Linux hosts.
If you think this doesn't belong here, please let me know where you think I should file this. I can also upload the current Cargo.lock if that would help.
PS thanks for all your awesome work. I have been using rg practically daily for years and love it.
What's the purpose of linking against the standard library? Everything that this crate needs from std is provided in core, which in turn is re-exported by std.
The libc crate can also be used as a dependency even in a #![no_std] environment, however this crate only enables it when use_std is enabled.
Crate authors who want to utilize memchr and provide #![no_std] are left with the fallback implementation. This is slower on macOS than using libc::memchr.
Edit: I didn't try testing with enabling the libc feature by itself. Apologies 😅
The reverse search implementations (memrchr) seem illegal under stacked borrows. They all follow the same pattern, so here I'll only annotate one. It retrieves a raw pointer to the end of the haystack from a reference to an empty slice, but then uses that pointer to iterate backwards by offsetting it with negative indices. Under strict rules, however, that pointer would only be valid for accessing the bytes covered by the reference it was cast from, i.e. a zero-length slice at the end.
To my understanding, this is very likely illegal but not yet caught by MIRI since it does not strictly track the source for raw pointers (^source). @RalfJung might be able to provide more definitive insights.
Relevant code (inserted comments marked as // !):
pub fn memrchr(n1: u8, haystack: &[u8]) -> Option<usize> {
    // [...]
    let start_ptr = haystack.as_ptr();
    // ! This pointer only covers the same slice that the reference does.
    // ! Would need to create these manually from offsetting the start pointer
    // ! which covers the whole array.
    let end_ptr = haystack[haystack.len()..].as_ptr();
    let mut ptr = end_ptr;
    unsafe {
        // [...]
        ptr = (end_ptr as usize & !align) as *const u8;
        // [...]
        while loop_size == LOOP_SIZE && ptr >= ptr_add(start_ptr, loop_size) {
            // [...]
            // ! These are outside the bounds of the reference from which ptr was created.
            let a = *(ptr_sub(ptr, 2 * USIZE_BYTES) as *const usize);
            let b = *(ptr_sub(ptr, 1 * USIZE_BYTES) as *const usize);
            // [...]
            ptr = ptr_sub(ptr, loop_size);
        }
        // [...]
    }
}
Library code reduced to that version of memrchr.
The fix is simple: create ptr by manually offsetting haystack.as_ptr(), which is valid for the whole haystack. I also don't expect any miscompilation.
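A minimal sketch of that fix, reduced to a byte-at-a-time reverse scan (the name memrchr_bytewise is made up; the real routine reads usize-sized words): the end pointer is derived from the start pointer, so it keeps provenance over the whole haystack.

```rust
// Hypothetical, simplified reverse search illustrating the pointer derivation.
fn memrchr_bytewise(n1: u8, haystack: &[u8]) -> Option<usize> {
    let start_ptr = haystack.as_ptr();
    // SAFETY: offsetting by `len` yields the one-past-the-end pointer of the
    // same allocation, which is allowed. Because it is derived from
    // `start_ptr`, it may be used to reach any byte of the haystack.
    let end_ptr = unsafe { start_ptr.add(haystack.len()) };
    let mut ptr = end_ptr;
    while ptr > start_ptr {
        // SAFETY: `ptr > start_ptr`, so `ptr - 1` is inside the haystack.
        unsafe {
            ptr = ptr.sub(1);
            if *ptr == n1 {
                return Some(ptr.offset_from(start_ptr) as usize);
            }
        }
    }
    None
}
```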
Currently there are only x86_64-specific implementations using SIMD instructions.
As described in #75 I want to try to port the implementation to aarch64 using the code in /src/x86/sse42.rs as a template and the NEON intrinsics from core::arch::aarch64 (currently nightly only).
I was looking through the code of this crate. I have a need for something like this on a no-std + alloc target, but it seems several features (such as using Cow from alloc) are missing. That should be possible to support.
Hi @BurntSushi I'm the author of Artichoke Ruby. We met on Twitter.
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7f18fe26ae4413e95f1e350d77f86b28
The iter_next! macro hard codes how much to advance the haystack position by.
This means that memchr*_iter functions on more than one byte incorrectly scan. For example, this code outputs 2 when it should output 1:
extern crate memchr; // 2.2.1
fn main() {
    let haystack = b"abcdefghijklmnopqrstuvwxyz";
    println!("{}", memchr::memchr2_iter(b'a', b'b', haystack.as_ref()).count());
}
As const evaluation is slowly becoming powerful enough for wide use, memchr not providing a const version is slowly becoming an issue.
For example, in Amanieu/cstr_core#25 (an embedded version of cstr), it would be convenient to construct a &str out of const array data taken from C in mixed C-and-Rust environments. The workaround there is to have its own simple-stupid-but-const memchr implementation and dispatch through const_eval_select between that and an actually runtime-friendly memchr from this crate.
In the long run, it would be great if this crate would just provide its memchr as const -- obviously that's not gonna fly any time soon (especially considering the MSRV), but there could be steps:
- memchr_const function (feel free to take the one from there if you like, it's under a different license but I wrote it and hereby also license it under this crate's license).
- memchr under a nightly-only feature gate.
The full solution (memchr "just" being const) is likely to be tricky, because while a "regular" memchr can be const on stable in the foreseeable future, I don't expect that to happen for vectorized or even libc-calling versions. Thus, once any of these optimizations are on, dispatch between the runtime-and-optimized and the const version will still need to happen, and the only way to do that currently AFAIK is through const_eval_select, for which there are no stabilization plans. This issue could be a use case to start stabilizing const_eval_select, or explore alternative avenues.
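For reference, a simple-but-const memchr along those lines could look like the sketch below. This is not the cstr_core code; memchr_const is just the name suggested above, and the body is a plain loop with no vectorization.

```rust
// Naive const-evaluable byte search; usable in const contexts on stable.
pub const fn memchr_const(needle: u8, haystack: &[u8]) -> Option<usize> {
    let mut i = 0;
    // `for` loops are not allowed in const fn, so use `while`.
    while i < haystack.len() {
        if haystack[i] == needle {
            return Some(i);
        }
        i += 1;
    }
    None
}
```

It can then be evaluated at compile time, e.g. const NUL_POS: Option<usize> = memchr_const(0, b"abc\0");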
is_x86_feature_detected! resolves to just a return true when -C target-cpu or -C target-feature is set to a value that enables the feature. When using just a simple is_x86_feature_detected! and -C target-cpu=native (or whatever), the compiler can inline the function and completely avoid the machinery of the atomic operations and calling a function pointer. However, when using AtomicPtr, it is impossible for the compiler to inline the function at all.
It would be great if there was some way to automatically disable the runtime feature detection if avx (or whatever the corresponding CPU feature set is) is already enabled at compile-time.
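Roughly what I have in mind, as a sketch only: pick the entry point with cfg on target_feature, so that a build with AVX2 statically enabled calls the routine directly and only other builds pay for runtime detection. The function names are placeholders and find_avx2 is a stand-in, not a real AVX2 implementation.

```rust
fn find_fallback(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

// Stand-in for a real AVX2 routine (hypothetical; it just reuses the fallback).
#[cfg(target_arch = "x86_64")]
fn find_avx2(needle: u8, haystack: &[u8]) -> Option<usize> {
    find_fallback(needle, haystack)
}

// AVX2 enabled at compile time: direct, inlinable call, no runtime dispatch.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub fn find(needle: u8, haystack: &[u8]) -> Option<usize> {
    find_avx2(needle, haystack)
}

// AVX2 not statically known: fall back to runtime detection.
#[cfg(all(target_arch = "x86_64", not(target_feature = "avx2")))]
pub fn find(needle: u8, haystack: &[u8]) -> Option<usize> {
    if is_x86_feature_detected!("avx2") {
        find_avx2(needle, haystack)
    } else {
        find_fallback(needle, haystack)
    }
}

// Other architectures: portable fallback only.
#[cfg(not(target_arch = "x86_64"))]
pub fn find(needle: u8, haystack: &[u8]) -> Option<usize> {
    find_fallback(needle, haystack)
}
```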
Now that SIMD has landed on nightly, is there any plan to add SIMD support?
Just hit this doing various linux-vendor side testing. It's probably not important, but it would be nice to be able to ensure this crate works in this configuration.
cargo +stable test --no-default-features
error[E0433]: failed to resolve: maybe a missing crate `std`?
--> src/tests/mod.rs:1:5
|
1 | use std::iter::repeat;
| ^^^ maybe a missing crate `std`?
error: cannot find macro `eprintln` in this scope
--> src/tests/mod.rs:9:5
|
9 | eprintln!("LITTLE ENDIAN");
| ^^^^^^^^
error[E0433]: failed to resolve: use of undeclared type or module `Vec`
--> src/tests/iter.rs:167:27
|
167 | let mut found_front = Vec::new();
| ^^^ use of undeclared type or module `Vec`
error[E0433]: failed to resolve: use of undeclared type or module `Vec`
--> src/tests/iter.rs:168:26
|
168 | let mut found_back = Vec::new();
| ^^^ use of undeclared type or module `Vec`
error[E0433]: failed to resolve: use of undeclared type or module `Box`
--> src/tests/iter.rs:201:5
|
201 | Box::new(it)
| ^^^ use of undeclared type or module `Box`
error[E0433]: failed to resolve: use of undeclared type or module `Box`
--> src/tests/iter.rs:214:5
|
214 | Box::new(it)
| ^^^ use of undeclared type or module `Box`
error[E0433]: failed to resolve: use of undeclared type or module `Box`
--> src/tests/iter.rs:228:5
|
228 | Box::new(it)
| ^^^ use of undeclared type or module `Box`
error[E0433]: failed to resolve: use of undeclared type or module `Vec`
--> src/tests/mod.rs:20:21
|
20 | let mut tests = Vec::new();
| ^^^ use of undeclared type or module `Vec`
error[E0433]: failed to resolve: use of undeclared type or module `Vec`
--> src/tests/mod.rs:295:24
|
295 | let mut more = Vec::new();
| ^^^ use of undeclared type or module `Vec`
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:52:27
|
52 | needle: u8, data: Vec<u8>, take_side: Vec<bool>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:52:47
|
52 | needle: u8, data: Vec<u8>, take_side: Vec<bool>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:66:41
|
66 | needle1: u8, needle2: u8, data: Vec<u8>, take_side: Vec<bool>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:66:61
|
66 | needle1: u8, needle2: u8, data: Vec<u8>, take_side: Vec<bool>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:81:15
|
81 | data: Vec<u8>, take_side: Vec<bool>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:81:35
|
81 | data: Vec<u8>, take_side: Vec<bool>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:97:30
|
97 | fn qc_memchr1_iter(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:103:34
|
103 | fn qc_memchr1_rev_iter(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:109:30
|
109 | fn qc_memchr2_iter(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:116:34
|
116 | fn qc_memchr2_rev_iter(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:123:30
|
123 | fn qc_memchr3_iter(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:131:34
|
131 | fn qc_memchr3_rev_iter(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:139:40
|
139 | fn qc_memchr1_iter_size_hint(data: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/iter.rs:162:58
|
162 | fn double_ended_take<I, J>(mut iter: I, take_side: J) -> Vec<I::Item>
| ^^^ not found in this scope
error[E0412]: cannot find type `Box` in this scope
--> src/tests/iter.rs:195:6
|
195 | ) -> Box<dyn DoubleEndedIterator<Item = usize> + 'a> {
| ^^^ not found in this scope
error[E0412]: cannot find type `Box` in this scope
--> src/tests/iter.rs:208:6
|
208 | ) -> Box<dyn DoubleEndedIterator<Item = usize> + 'a> {
| ^^^ not found in this scope
error[E0412]: cannot find type `Box` in this scope
--> src/tests/iter.rs:222:6
|
222 | ) -> Box<dyn DoubleEndedIterator<Item = usize> + 'a> {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/memchr.rs:92:49
|
92 | fn qc_memchr1_matches_naive(n1: u8, corpus: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/memchr.rs:98:57
|
98 | fn qc_memchr2_matches_naive(n1: u8, n2: u8, corpus: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/memchr.rs:106:17
|
106 | corpus: Vec<u8>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/memchr.rs:113:50
|
113 | fn qc_memrchr1_matches_naive(n1: u8, corpus: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/memchr.rs:119:58
|
119 | fn qc_memrchr2_matches_naive(n1: u8, n2: u8, corpus: Vec<u8>) -> bool {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/memchr.rs:127:17
|
127 | corpus: Vec<u8>
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:19:22
|
19 | fn memchr_tests() -> Vec<MemchrTest> {
| ^^^ not found in this scope
error[E0412]: cannot find type `String` in this scope
--> src/tests/mod.rs:144:13
|
144 | corpus: String,
| ^^^^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:152:14
|
152 | needles: Vec<u8>,
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:154:16
|
154 | positions: Vec<usize>,
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:276:26
|
276 | it.collect::<Vec<usize>>(),
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:278:63
|
278 | self.needles.iter().map(|&b| b as char).collect::<Vec<char>>(),
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:294:25
|
294 | fn expand(&self) -> Vec<MemchrTest> {
| ^^^ not found in this scope
error[E0412]: cannot find type `String` in this scope
--> src/tests/mod.rs:300:33
|
300 | let mut new_corpus: String = repeat('%').take(i).collect();
| ^^^^^^ not found in this scope
error[E0425]: cannot find function `repeat` in this scope
--> src/tests/mod.rs:300:42
|
300 | let mut new_corpus: String = repeat('%').take(i).collect();
| ^^^^^^ not found in this scope
|
help: possible candidate is found in another module, you can import it into scope
|
1 | use core::iter::repeat;
|
error[E0412]: cannot find type `String` in this scope
--> src/tests/mod.rs:309:26
|
309 | let padding: String = repeat('%').take(i).collect();
| ^^^^^^ not found in this scope
error[E0425]: cannot find function `repeat` in this scope
--> src/tests/mod.rs:309:35
|
309 | let padding: String = repeat('%').take(i).collect();
| ^^^^^^ not found in this scope
|
help: possible candidate is found in another module, you can import it into scope
|
1 | use core::iter::repeat;
|
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:330:47
|
330 | fn needles(&self, count: usize) -> Option<Vec<u8>> {
| ^^^ not found in this scope
error[E0412]: cannot find type `Vec` in this scope
--> src/tests/mod.rs:348:57
|
348 | fn positions(&self, align: usize, reverse: bool) -> Vec<usize> {
| ^^^ not found in this scope
error[E0599]: no method named `to_string` found for type `&'static str` in the current scope
--> src/tests/mod.rs:28:36
|
28 | corpus: statict.corpus.to_string(),
| ^^^^^^^^^ method not found in `&'static str`
|
= help: items from traits can only be used if the trait is in scope
= note: the following trait is implemented but not in scope; perhaps add a `use` for it:
`use alloc::string::ToString;`
error: aborting due to 46 previous errors
Some errors have detailed explanations: E0412, E0425, E0433, E0599.
For more information about an error, try `rustc --explain E0412`.
error: could not compile `memchr`.
To learn more, run the command again with --verbose.
From rust-lang/rust#73139 (comment):
The index is always less than the length. So even if the length is usize::MAX, the index will be at most MAX - 1 and so cannot overflow.
It would be great to tell rustc this is the case by using rustc_layout_scalar_valid_range_end(usize::MAX - 1). That would allow storing Option<index> in usize instead of needing an extra bit, which in some cases could double the size of the struct due to alignment requirements.
Failing that (since rustc_layout_scalar_valid_range_end is unstable and likely will never be stabilized), would it be possible to document that the index is always less than usize::MAX?
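If that guarantee were documented, callers could get the same layout benefit on stable by storing index + 1 in a NonZeroUsize. A sketch of that idea follows; the Index wrapper is hypothetical and not something the crate provides.

```rust
use std::num::NonZeroUsize;

// Hypothetical wrapper: stores `index + 1`, so the value 0 is free to act as
// the niche for `Option<Index>`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Index(NonZeroUsize);

impl Index {
    // Returns None only for usize::MAX, which a valid index can never be.
    pub fn new(index: usize) -> Option<Index> {
        index.checked_add(1).and_then(NonZeroUsize::new).map(Index)
    }

    pub fn get(self) -> usize {
        self.0.get() - 1
    }
}
```

With this, Option<Index> is the same size as usize, since Option<NonZeroUsize> is guaranteed to have that layout.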
It'd be nice to have a fast implementation that finds the first character that isn't the needle (like find_last_not_of in C++). This comes up in path parsing, e.g.:
https://github.com/danielpclark/faster_path/blob/master/src/path_parsing.rs#L21-L32
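To pin down the semantics being asked for, here is a naive reference version using only iterator methods. The function names mirror the C++ ones and are not existing memchr functions; a fast version would presumably use the same word-at-a-time or SIMD tricks as memchr itself.

```rust
// Index of the first byte that differs from `needle`, if any.
fn find_first_not_of(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b != needle)
}

// Index of the last byte that differs from `needle`, if any.
fn find_last_not_of(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().rposition(|&b| b != needle)
}
```

For path parsing, find_last_not_of(b'/', path) gives the end of the path once trailing slashes are ignored.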
I'm writing a routine that escapes HTML special characters. To do that, I have to search for five different characters (&, <, >, ', ") simultaneously. I can do this using two calls to memchr2 or memchr3, but that doesn't seem elegant. It would be nice if there was a function that could do this search in one go.
From the implementation side: the PCMPESTRI instruction in SSE4.2 supports searching for up to 16 different needles in parallel. It would be nice if we could expose this somehow.
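In the meantime, a single pass over the input with a 256-entry lookup table handles all five characters at once. This is only a scalar sketch with made-up names (find_html_special, NEEDS_ESCAPE); it does not use PCMPESTRI or any SIMD.

```rust
// Builds a table marking the five bytes that need HTML escaping.
const fn build_table() -> [bool; 256] {
    let mut table = [false; 256];
    table[b'&' as usize] = true;
    table[b'<' as usize] = true;
    table[b'>' as usize] = true;
    table[b'\'' as usize] = true;
    table[b'"' as usize] = true;
    table
}

static NEEDS_ESCAPE: [bool; 256] = build_table();

// Hypothetical helper: index of the first byte that needs escaping.
fn find_html_special(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| NEEDS_ESCAPE[b as usize])
}
```

An escaping routine would call this repeatedly, copy the clean prefix, emit the entity for the byte found, and continue after it.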
Hi, whenever I try to install or build from source distant the process fails on memchr compilation.
$ cargo build --release
Downloaded anyhow v1.0.44
Downloaded polling v2.1.0
Downloaded tokio-util v0.6.8
Downloaded instant v0.1.11
Downloaded libc v0.2.103
Downloaded openssl-src v111.16.0+1.1.1l
Downloaded blocking v1.0.2
Downloaded proc-macro2 v1.0.29
Downloaded async-process v1.2.0
Downloaded serde_json v1.0.68
Downloaded mio v0.7.13
Downloaded openssl-sys v0.9.67
Downloaded structopt v0.3.23
Downloaded tokio-macros v1.4.1
Downloaded thiserror v1.0.29
Downloaded whoami v1.1.5
Downloaded slab v0.4.4
Downloaded pkg-config v0.3.20
Downloaded thiserror-impl v1.0.29
Downloaded cc v1.0.71
Downloaded tokio v1.12.0
Downloaded syn v1.0.80
Downloaded half v1.7.1
Downloaded structopt-derive v0.4.16
Downloaded zeroize v1.4.2
Downloaded 25 crates (7.1 MB) in 7.25s (largest was `openssl-src` at 5.1 MB)
Compiling winapi v0.3.9
Compiling proc-macro2 v1.0.29
Compiling autocfg v1.0.1
Compiling unicode-xid v0.2.2
Compiling syn v1.0.80
Compiling cfg-if v1.0.0
Compiling memchr v2.4.1
Compiling libc v0.2.103
Compiling futures-core v0.3.17
Compiling cc v1.0.71
Compiling version_check v0.9.3
Compiling pin-project-lite v0.2.7
Compiling log v0.4.14
Compiling futures-io v0.3.17
Compiling once_cell v1.8.0
Compiling vcpkg v0.2.15
Compiling pkg-config v0.3.20
Compiling cache-padded v1.1.1
Compiling typenum v1.14.0
Compiling parking_lot_core v0.8.5
Compiling parking v2.0.0
Compiling fastrand v1.5.0
Compiling slab v0.4.4
Compiling waker-fn v1.1.0
Compiling event-listener v2.5.1
Compiling lazy_static v1.4.0
Compiling scopeguard v1.1.0
Compiling smallvec v1.7.0
Compiling async-task v4.0.3
Compiling proc-macro-hack v0.5.19
Compiling ntapi v0.3.6
Compiling bitflags v1.3.2
Compiling unicode-segmentation v1.8.0
Compiling futures-sink v0.3.17
Compiling atomic-waker v1.0.0
Compiling proc-macro-nested v0.1.7
Compiling serde_derive v1.0.130
Compiling futures-channel v0.3.17
Compiling futures-task v0.3.17
Compiling serde v1.0.130
Compiling anyhow v1.0.44
Compiling bytes v1.1.0
Compiling unicode-width v0.1.9
Compiling regex-syntax v0.6.25
Compiling pin-utils v0.1.0
Compiling ryu v1.0.5
Compiling cpufeatures v0.2.1
Compiling subtle v2.4.1
Compiling serde_json v1.0.68
Compiling zeroize v1.4.2
Compiling camino v1.0.5
Compiling ppv-lite86 v0.2.10
Compiling strsim v0.8.0
Compiling opaque-debug v0.3.0
Compiling regex-automata v0.1.10
Compiling shell-words v1.0.0
Compiling convert_case v0.4.0
Compiling itoa v0.4.8
Compiling half v1.7.1
Compiling base64 v0.13.0
Compiling whoami v1.1.5
Compiling hex v0.4.3
Compiling yansi v0.5.0
Compiling glob v0.3.0
Compiling instant v0.1.11
Compiling getrandom v0.2.3
Compiling futures-macro v0.3.17
error: failed to run custom build command for `memchr v2.4.1`
Caused by:
could not execute process `C:\Users\modzmi01\Documents\projects\2021\distant\target\release\build\memchr-34b704a4017ecdea\build-script-build` (never executed)
Caused by:
Access is denied. (os error 5)
warning: build failed, waiting for other jobs to finish...
error: build failed
$ cargo --version
cargo 1.56.0 (4ed5d137b 2021-10-04)
$ rustc --version
rustc 1.56.1 (59eed8a2a 2021-11-01)
I have tried to run the command in powershell run with admin rights but I get the same error. I don't know any rust, is there something I can do to get more details what command from build.rs is causing the issue?
at the time of building, the antivirus thinks it has found a virus
VHO:Trojan-Banker.Win32.ClipBanker.gen