Comments (19)
- Outside character classes, characters that must be escaped to be used literally are: ^$.?*+()[|
Yes.
- In character classes, characters that must be escaped to be used literally are: ^-]
plus [
- But in some other implementations (such as Atom) that use Oniguruma, [ can be used unescaped in classes but ] can't.
No.
I don't know what syntax Atom uses, but I suppose it is ONIG_SYNTAX_DEFAULT.
Inside a character class, [ starts a nested character class, so you must escape [ to use it as a normal character.
You must also escape ] inside a character class unless it appears at the very start of the class.
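#include <stdio.h>
#include "oniguruma.h"

/* Print Oniguruma's warnings to stderr. */
static void warn_func(const char* s)
{
  fprintf(stderr, "WARN: %s\n", s);
}

extern int main(int argc, char* argv[])
{
  /* Register the callback so that warnings become visible. */
  onig_set_warn_func(warn_func);
  /* exec() is a helper whose code is omitted here. */
  exec(ONIG_ENCODING_UTF8, ONIG_ENCODING_UTF8, ONIG_OPTION_NONE,
       "[]a]", "]");
  return 0;
}
(exec() code omitted)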
result:
WARN: character class has ']' without escape: /[]a]/
match at 0 (UTF-8)
0: (0-1)
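The escaping rules discussed above can be sketched as a small helper. This is only an illustration, not part of Oniguruma: `escape_literal`, `OUTSIDE_CLASS_META` and `INSIDE_CLASS_META` are hypothetical names, the character sets are copied from the lists in this thread, and the backslash is added to both sets as an assumption (it is the escape character itself).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters the thread lists as needing an escape outside a character
   class. The trailing backslash is an assumption added for this sketch. */
static const char OUTSIDE_CLASS_META[] = "^$.?*+()[|\\";

/* Characters the thread lists for inside a character class ("^-]" plus "[").
   The trailing backslash is again an assumption. */
static const char INSIDE_CLASS_META[] = "^-][\\";

/* Hypothetical helper: copy src into dst, putting a backslash in front of
   every character that occurs in meta. Returns the number of bytes written
   (excluding the terminating NUL), or (size_t)-1 if dst is too small. */
static size_t escape_literal(const char *src, const char *meta,
                             char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && meta != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        int needs_escape = strchr(meta, *src) != NULL;
        size_t need = needs_escape ? 2 : 1;

        /* Always keep one byte free for the terminating NUL. */
        if (n + need + 1 > dst_size)
            return (size_t)-1;
        if (needs_escape)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    if (n + 1 > dst_size)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

Under these assumptions, escaping `a.b[c]` with `OUTSIDE_CLASS_META` puts a backslash before `.` and `[` only; the `]` is left alone because the thread's list for use outside a class does not include it.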
from oniguruma.
✨ Thanks very much for the clarification! ✨
https://github.com/kkos/oniguruma/wiki/Characters-That-Must-Be-Escaped-to-Be-Used-Literally Did I miss anything?
And could you keep this open just in case I have further questions? :))
from oniguruma.
Thank you.
The content is correct.
from oniguruma.
Isn't \w equivalent to [0-9A-Z_a-z]? I thought it was, but I found it can match much, much more than that.
from oniguruma.
\w is equivalent to [0-9A-Z_a-z] if you are using the ASCII encoding.
\w matches many more code points in the Unicode encodings (UTF-8, UTF-16, UTF-32).
The Unicode word code point data is defined in CR_Word[] in src/unicode_property_data.c
from oniguruma.
So in UTF-8 encoding, \w can match the characters that are mapped to the code points defined in CR_Word[], right?
from oniguruma.
Yes. And 654 is the number of code ranges.
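A table laid out this way (a leading count followed by from/to pairs) can be searched as in the sketch below. It is not the library's own lookup code: `CodePoint`, `SAMPLE_WORD_RANGES` and `in_code_ranges` are hypothetical names, and the table is a short, made-up excerpt (the four ASCII ranges behind [0-9A-Z_a-z] plus one Latin-1 letter range picked for illustration), not the real contents of CR_Word[].

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int CodePoint;  /* stand-in for the library's code point type */

/* Hypothetical, abbreviated table: the first element is the number of
   ranges, followed by (from, to) pairs sorted in ascending order. */
static const CodePoint SAMPLE_WORD_RANGES[] = {
    5,
    0x0030, 0x0039,  /* 0-9 */
    0x0041, 0x005a,  /* A-Z */
    0x005f, 0x005f,  /* _   */
    0x0061, 0x007a,  /* a-z */
    0x00c0, 0x00d6,  /* a non-ASCII letter range, for illustration */
};

/* Binary search over the (from, to) pairs.
   Returns 1 if cp falls inside any range, otherwise 0. */
static int in_code_ranges(const CodePoint *table, CodePoint cp)
{
    size_t lo = 0, hi;

    assert(table != NULL);
    hi = table[0];  /* number of ranges */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        CodePoint from = table[1 + 2 * mid];
        CodePoint to = table[2 + 2 * mid];

        if (cp < from)
            hi = mid;
        else if (cp > to)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

With only the first four ranges, the table describes exactly [0-9A-Z_a-z]; every additional range widens what a \w-style test accepts, which is why \w matches far more under a Unicode encoding.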
from oniguruma.
Thanks for your replies. They really helped me a lot!
So similarly, in UTF-8 encoding, \d can match stuff specified in CR_Digit[], right? And can I assume [0-9_A-Za-z] is "faster" than \w in UTF-8?
Do you mind if I help edit the doc of RE? It would just be some small refinements to make it easier to understand.
from oniguruma.
I do not know how much the speed differs.
Because single-byte code ranges and multi-byte code ranges are kept separate in the compiled regexp code, the difference may not be large.
I have added you as a collaborator.
Please edit the files in doc.
from oniguruma.
And please edit the develop branch, not the master branch.
from oniguruma.
👌 😉
from oniguruma.
About \G, does it mean "where the current match attempt begins", i.e., either \A or "where the last match left off"?
from oniguruma.
onig_search() takes a search-range argument pair (start, range) and a string argument pair (str, end).
The search range is the range of positions at which a match may start.
\G means the start position of the search range.
\A means the start position of the string.
In most cases the search range and the string start at the same position, so \G == \A.
from oniguruma.
I tried digging around the definition of onig_search, but I could understand only part of it.
I was trying to expand on and clarify some definitions in doc/RE. One of them is the definition of \G:
\G matching start position
\G is useful in a regexp that is applied to the same string more than once. To define it from the user's standpoint: the first time the regexp is applied, it matches at the beginning of the string; later it matches where the last match ended, right? So I want to change the definition to something like
beginning of the current search attempt
Does it look correct to you?
from oniguruma.
Yes. You are right.
from oniguruma.
I was wondering: apart from the fact that [:...:] is only available in a character class, what really is different between Unicode properties and the POSIX notation, and why do we need them both? I mean, Unicode properties are so much more powerful, so why do we still need the POSIX notation?
from oniguruma.
Thank you for improving the document.
POSIX brackets are poorer than Unicode properties.
They are no longer necessary.
But they were included in the GNU regex library.
My first goal was to make a library compatible with GNU regex; after that I introduced the character property feature and others from Perl.
from oniguruma.
👌 No problem, and I can get to know more about how Oniguruma processes regexps along the way. 😌 So are you planning to deprecate POSIX brackets?
New question: doc/RE says "In the back reference by the multiplex definition name, ...".
What is a multiplex definition name?
Mind if I email you with the questions instead?
from oniguruma.
I will remove POSIX brackets in version 7.0 if they are removed in Perl 6.
But I don't know whether they have been removed or not.
A multiplex definition name is one name assigned to two or more groups.
from oniguruma.