
tiny-regex-c's Introduction


tiny-regex-c

A small regex implementation in C

Description

Small and portable Regular Expression (regex) library written in C.

Design is inspired by Rob Pike's regex-code for the book "Beautiful Code" available online here.

Supports a subset of the syntax and semantics of the Python standard library implementation (the re-module).

I will gladly accept patches correcting bugs.

Design goals

The main design goal of this library is to be small, correct, self-contained and use few resources while retaining acceptable performance and feature completeness. Clarity of the code is also highly valued.

Notable features and omissions

  • Small code and binary size: 500 SLOC, ~3kb binary for x86. Statically #define'd memory usage / allocation.
  • No use of dynamic memory allocation (i.e. no calls to malloc / free).
  • To avoid call-stack exhaustion, iterative searching is preferred over recursive by default (can be changed with a pre-processor flag).
  • No support for capturing groups or named capture: (?P<name>group) etc.
  • Thorough testing: exrex is used to randomly generate test cases from regex patterns, which are fed into the regex code for verification. Try make test to generate a few thousand test cases yourself.
  • Verification-harness for KLEE Symbolic Execution Engine, see formal verification.md.
  • Provides character length of matches.
  • Compiled for x86 using GCC 7.2.0 and optimizing for size, the binary takes up ~2-3kb code space and allocates ~0.5kb RAM :
    > gcc -Os -c re.c
    > size re.o
        text     data     bss     dec     hex filename
        2404        0     304    2708     a94 re.o
        
    

API

This is the public / exported API:

/* Typedef'd pointer to hide implementation details. */
typedef struct regex_t* re_t;

/* Compiles regex string pattern to a regex_t-array. */
re_t re_compile(const char* pattern);

/* Finds matches of the compiled pattern inside text. */
int  re_matchp(re_t pattern, const char* text, int* matchlength);

/* Finds matches of pattern inside text (compiles first automatically). */
int  re_match(const char* pattern, const char* text, int* matchlength);

Supported regex-operators

The following features / regex-operators are supported by this library.

NOTE: inverted character classes are buggy - see the test harness for concrete examples.

  • . Dot, matches any character
  • ^ Start anchor, matches beginning of string
  • $ End anchor, matches end of string
  • * Asterisk, match zero or more (greedy)
  • + Plus, match one or more (greedy)
  • ? Question, match zero or one (non-greedy)
  • [abc] Character class, match if one of {'a', 'b', 'c'}
  • [^abc] Inverted class, match if NOT one of {'a', 'b', 'c'}
  • [a-zA-Z] Character ranges, the character set of the ranges { a-z | A-Z }
  • \s Whitespace, \t \f \r \n \v and spaces
  • \S Non-whitespace
  • \w Alphanumeric, [a-zA-Z0-9_]
  • \W Non-alphanumeric
  • \d Digits, [0-9]
  • \D Non-digits

Usage

Compile a regex from ASCII-string (char-array) to a custom pattern structure using re_compile().

Search a text-string for a regex and get an index into the string, using re_match() or re_matchp().

The returned index points to the first place in the string, where the regex pattern matches.

The integer pointer passed will hold the length of the match.

If the regular expression doesn't match, the matching function returns an index of -1 to indicate failure.

Examples

Example of usage:

/* Standard int to hold length of match */
int match_length;

/* Standard null-terminated C-string to search: */
const char* string_to_search = "ahem.. 'hello world !' ..";

/* Compile a simple regular expression using character classes, meta-char and greedy + non-greedy quantifiers: */
re_t pattern = re_compile("[Hh]ello [Ww]orld\\s*[!]?");

/* Check if the regex matches the text: */
int match_idx = re_matchp(pattern, string_to_search, &match_length);
if (match_idx != -1)
{
  printf("match at idx %i, %i chars long.\n", match_idx, match_length);
}

For more usage examples I encourage you to look at the code in the tests-folder.

TODO

  • Fix the implementation of inverted character classes.
  • Fix implementation of branches (|), and see if that can lead us closer to groups as well, e.g. (a|b)+.
  • Add example.c that demonstrates usage.
  • Add tests/test_perf.c for performance and time measurements.
  • Testing: Improve pattern rejection testing.

FAQ

  • Q: What differentiates this library from other C regex implementations?

    A: Well, the small size for one. 500 lines of C-code compiling to 2-3kb ROM, using very little RAM.

License

All material in this repository is in the public domain.

tiny-regex-c's People

Contributors

ahmetkosker, ayangd, cvengler, firasuke, jwerle, kokke, monolifed, pyrmont, roflcopter4, smlavine, termosintez, theo-dep, toriningengames

tiny-regex-c's Issues

Capture group

Hi,

I am looking for C source code for a regular expression library that supports finding the matching substrings for capture groups. It seems that tiny-regex-c doesn't support it... I wonder if you happen to know any other C libraries that support this feature. Thanks for your help.

Compilation using VS2010

Hi,

Thank you for your code; it solved my problem of using regex with gcc on a very large string.
Visual Studio 2010 is stricter about C (it wants declarations at the top of a block), and it reports these errors:

re.c(250) : error C2143: syntax error : missing ';' before 'type'
re.c(251) : error C2143: syntax error : missing ';' before 'type'

To resolve this issue, the declarations int i; and char c; only need to be moved to the top of the function void re_print(regex_t* pattern).

Problem with ".?" matching

matchpattern(pattern, text);

Hello kokke! You have a very good parser, but it could use one improvement.
Checking the string "real_bar" against the regexp ".?ba." gives a match at index 0 (the correct result is index 4).
Checking the string "real_foo" against the regexp ".?ba." gives a match at index 0 (the correct result is -1, i.e. no match).

If necessary, I can give you an example of a solution.

Error on isdigit, isalpha, isspace with char > 127

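The is*** functions used to check whether a char is a digit, a letter or a space misbehave if you pass a char whose value is above 127.

https://en.cppreference.com/w/cpp/string/byte/isalpha

Where plain char is signed, such a char becomes a negative int, and passing a negative value other than EOF to these functions is undefined behavior. Depending on your compiler and C library you might get an error or a crash. You can fix this by casting the value to unsigned char.
Try to match "Çüéâ" as ASCII.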

static int matchdigit(char c)
{
  return isdigit((unsigned char)c);
}
static int matchalpha(char c)
{
  return isalpha((unsigned char)c);
}
static int matchwhitespace(char c)
{
  return isspace((unsigned char)c);
}

Match of `\\s*` on empty string reports `pos == -1`, i.e., no match, while it should report `pos == 0` and `matchlength == 0`

Consider this program, which tries to match zero or more space characters in a string of length zero:

// $ cat empty-str.c
#include "re.h"
#include "assert.h"

int main() {
    int matchlength = 0;
    int pos = re_match("\\s*", "", &matchlength);
    assert(pos != -1);
    return 0;
}

When compiled with re.c and executed, the assertion fails:

$ gcc empty-str.c re.c
$ ./a.out
a.out: empty-str.c:7: main: Assertion `pos != -1' failed.
Aborted (core dumped)

It is an interesting question about what the result of a successful match for an empty string should be. My take on it is that it should not report a failure for matching (i.e. the return value must not be negative), but the length of the match should be zero.
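
To make that expectation concrete, here is a minimal sketch in the style of Rob Pike's matcher, not the code in re.c. It supports only literal characters, '.', '^', '$' and '*', its star tries the shortest match first, and the names sketch_match, sketch_here and sketch_star are made up for this illustration. The point is the do/while loop: the pattern is tried at position 0 even when the text is empty, so a pattern that can match nothing reports index 0 with length 0.

```c
#include <stddef.h>

static const char *sketch_here(const char *re, const char *text);

/* Matches c* followed by the rest of the pattern; tries zero repetitions first. */
static const char *sketch_star(int c, const char *re, const char *text)
{
    do {
        const char *end = sketch_here(re, text);
        if (end != NULL)
            return end;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return NULL;
}

/* Matches re at the start of text; returns a pointer one past the match, or NULL. */
static const char *sketch_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return text;                 /* empty pattern matches and consumes nothing */
    if (re[1] == '*')
        return sketch_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return (*text == '\0') ? text : NULL;
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return sketch_here(re + 1, text + 1);
    return NULL;
}

/* Returns the index of the leftmost match or -1; stores the match length in *matchlength. */
int sketch_match(const char *re, const char *text, int *matchlength)
{
    const char *start = text;
    int anchored = (re[0] == '^');

    if (anchored)
        re++;
    do {
        const char *end = sketch_here(re, text);
        if (end != NULL) {
            *matchlength = (int)(end - text);
            return (int)(text - start);
        }
    } while (!anchored && *text++ != '\0');   /* still tries the empty tail once */
    return -1;
}
```

With this sketch, sketch_match(" *", "", &len) returns 0 and sets len to 0, which is the behavior argued for above. The sketch has no \s class, so a literal space stands in for it.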

Could not search for '.'

re_compile("[A-Z]:-?\\d+.\\d+");
or
re_compile("[A-Z]:-?\\d+[.]\\d+");

Can't distinguish between X:12.34 and X:1234.

Migrate all test scripts to Python v3

Hello,

Currently, the test scripts use syntax and libraries from Python 2, but most GNU/Linux
distributions have removed Python 2 from their repositories, making it harder
to get a Py2 binary without compiling it from source. It'd be nice if you or
someone else migrated them to Python 3. I'm more than happy to help if needed!

Thanks!

Typo bug in re.c

At line 288 of re.c, there is a '==' operator between the tests for c == 'S' and c == 'w' that I think should be a '||'.

need help in finding one of the two strings.

I am having trouble finding the pattern to match one of the two strings from a larger string.
re_t match_pattern = re_compile("I|Me");

The code above doesn't seem to work, what should I do?

Understand metacharacters and escape character

In most languages, the escape character is '\'. Sequences that start with an escape character have a special meaning, e.g. '\n', '\t'.

Metacharacters are also written with the escape character, but often match a group of characters, e.g. '\s' matches any whitespace such as '\t' or '\n'.

So, when we are matching single comment lines in C, we need to use:

\/\/[^\n]*,

instead of

\/\/[^\\n]*

which won't match a line break but a "\n" string.
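
When the pattern is written as a C string literal, the compiler processes backslashes before the regex engine ever sees the pattern. The sketch below only shows which bytes end up in the pattern for the two spellings; how re.c then interprets those bytes inside a character class is not shown here. The identifiers are illustrative and not part of the library.

```c
#include <string.h>

/* In C source, \n inside a literal is one line-feed byte (0x0A). */
static const char CLASS_WITH_NEWLINE_BYTE[] = "[^\n]";

/* In C source, \\n is two bytes: a backslash followed by the letter n. */
static const char CLASS_WITH_BACKSLASH_N[] = "[^\\n]";

/* Counts the backslash bytes that actually reach the regex engine. */
static size_t count_backslashes(const char *pattern)
{
    size_t n = 0;

    for (; *pattern != '\0'; pattern++)
        if (*pattern == '\\')
            n++;
    return n;
}
```

The first array holds four bytes before the terminator and no backslash at all; the second holds five, one of which is a backslash. The same rule explains why a metacharacter such as \d has to be written as "\\d" in C source.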

I ported this to rust

Hi,

Just thought I'd let you know that I ported this to rust for fun after seeing it on lobste.rs. The repo is here. There are a few small changes in internals, but it's a reasonably direct port.

Thanks for making this cool little library :)

Cannot match end-of-pattern match

Hi and thanks for an amazing library - I'm developing a Fortran port of it here.

I've just found an issue/potential bug with the end-pattern command.

text = "table football"
pattern = "l$"

returns index=13, matchlength=3

Here's the sample test program:

#include <stdio.h>
#include <stdlib.h>
#include "re.h"
int main() {

   const char *text = "table football";
   const char *pattern = "l$";
   int index,len;

   index = re_match(pattern, text, &len);

   printf("index=%d len=%d \n",index,len);
}

Tested on Mac with clang 13.1.6

Feature Requests and Minor Code Comments

Hello. First thank you for this library. It has been very helpful. Two small comments/questions on the code. Note I am still learning C.

First here I think ccl_bufidx could be initialized to 1, otherwise the first slot in the array is unused. I might be missing something though.

Second, I don't understand why there is a union in the struct regex_t. I think with padding it doesn't actually save any space for most architectures. However it might be possible to keep an index for the array instead of the pointer to the position in the array, which could be a small data type, possibly even just a char.

Finally two features would help me, and perhaps many others also. First would be case insensitive matching. The second would be to have dynamically allocated patterns, which would allow more than one pattern to be used in a program. I realize either of those may not be in keeping with "tiny", so they may not be appropriate here.

Again thank you very much for this useful code.

regex compilation is making things slower?

After my experiment to change to a "flat memory layout" here (#62), I had a suspicion that the regex compilation may not be improving performance by much. It took a while but I re-implemented everything without compilation in this commit marler8997@8311dc8. Turns out that without regex compilation, the implementation is around 300 SLOC (instead of 500) and about 15% faster on my machine.

But hold on, it probably wasn't a fair race. My commit also has some other changes that help out performance (like #63). If all things were equal, the compiled version still might be faster in some cases, I'm not actually sure. It does seem clear that both the compiled and non-compiled versions perform very similarly, but I think each one will be better in certain environments, on certain machines or with different workloads.

That being said, I feel like the change aligns well with the goals of this project, the primary one in the name "tiny-regex-c" which is now even "tiny-er". The other nice benefit is not needing to allocate anything for the regex objects nor needing static memory for them. (makes #3 a non issue).

I also noticed that while removing compilation, I also happened to fix a few bugs (see some of them below). I also added makefile targets for each test along with a comprehensive test that runs all the targets from a clean repo (because if you don't write a test for it, then it will break).

I'm unsure whether @kokke will want to take this commit as it's such a huge fundamental change, but I figure I'd let you know the implementation is here, feel free to take all or none of the idea into this repository. I'm also happy to submit PR's for individual changes from this idea as I'm sure you'll not want to take everything all at once (I know I wouldn't). However, right now I've got 7 PRs in the queue so I'll hold off on making any more until some of those ones are settled.

Summary of related issues

fixes #3
fixes #40
fixes part (1) of #53
invalidates #59
fixes #66
fixes #69

Memory Corruption Vulnerabilities

This project has memory safety vulnerabilities due to off-by-1 pointer management. For example, running: "tiny-regex-c.exe [" produces the following output:
type: CHAR_CLASS []
type: CHAR '²'
type: CHAR '²'
type: CHAR '²'
type: CHAR '²'
type: CHAR 'á'
type: CHAR 'á'
type: CHAR 'á'
type: CHAR 'á'
type: CHAR 'á'
type: CHAR 'á'
type: CHAR 'á'
type: CHAR 'á'

This test case is reading past the end of the character buffer until the next NULL terminator is found. I identified issues in both re_print and re_compile in the above testcase. Vulnerabilities such as this can allow attackers to exploit systems or projects that embed this code.

Match question mark fails?

If my pattern is X?Y and my text is 'Z' I would expect this to not match, but it appears that it is matching? I think the logic in matchquestion is wrong. Once the two tests in that routine fail, we pass through to a result of 1, which becomes the result of the overall matchpattern. The remainder of the pattern isn't ever consulted.

Also, and this may be related, the pattern indices in the recursive and iterative matchpattern routines don't match. I'm not saying that they should, just noting their differences.
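
As an illustration of the behavior expected here (not the code in re.c), the following sketch handles only literal characters, '.' and '?'. In both the one-occurrence and the zero-occurrence branch, the remainder of the pattern still has to match before success is reported. The names opt_here and opt_search are hypothetical.

```c
/* Matches re at the start of text; supports literals, '.', and a trailing '?'. */
static int opt_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '?') {
        /* One occurrence: only a match if the rest of the pattern matches too. */
        if (*text != '\0' && (re[0] == '.' || re[0] == *text)
            && opt_here(re + 2, text + 1))
            return 1;
        /* Zero occurrences: the rest of the pattern decides. */
        return opt_here(re + 2, text);
    }
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return opt_here(re + 1, text + 1);
    return 0;
}

/* Returns the index of the leftmost match, or -1 if there is none. */
int opt_search(const char *re, const char *text)
{
    int i = 0;

    do {
        if (opt_here(re, text + i))
            return i;
    } while (text[i++] != '\0');
    return -1;
}
```

With this sketch, opt_search("X?Y", "Z") returns -1, and opt_search(".?ba.", "real_bar") returns 4, the result asked for in the ".?" report earlier on this page.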

Software vulnerabilities detected during code analysis with ESBMC-WR tool

Hello,

Code analysis found one potential software vulnerability.
To identify this kind of vulnerability I used the ESBMC-WR tool: https://github.com/thalestas/esbmc-wr
We detected it while analyzing the code of the RUFUS tool.
A bug was reported there at the time, and the RUFUS developer asked that this potential issue be solved directly here in the mainline code.

Bug reported at RUFUS repository code: pbatard/rufus#1856

More about the tool: https://arxiv.org/pdf/2102.02368.pdf

Expected behavior:

Our main objective was to check memory safety properties (e.g., pointer dereferences and memory leaks) while verifying the code.

Please, check the logs of analysis:

Issue
[FILE] src/re.c

State 5 file re.c line 269 function re_print thread 0
Violated property:
file re.c line 269 function re_print
array bounds violated: array `types' upper bound
(signed long int)(pattern + (signed long int)i)->type < 17

Python version shenanigans

This is about the tiniest issue ever, but I figured I'd ask anyway. Many platforms (certainly at least ArchLinux and Gentoo) these days use python3 as the standard python implementation. The scripts included are all definitely python2 code and thus the tests fail immediately. It took approximately 3 seconds to type $ perl -pi -e 's/python/python2/g' Makefile scripts/*.py to fix it completely, but it might be nice if you'd incorporate this into the source itself.

Again, not really much of an issue, but worth pointing out anyway.

matchlen may be wrong

int match_len;
char* s = "aa";
char* p = ".*a.*a";
int match_idx = re_match(p, s, &match_len);

expected: match_len == 2
result: match_len == 3

re_matchp succeeds only for the first string compared to a pattern

re_t myRe = re_compile("^[^=]+=[^=]+$");
int matchLen;
re_matchp(myRe, "operationName=PublishEditMessageClients", &matchLen);   /* returns > -1 */
re_matchp(myRe, "operationName=DeleteMessageClients", &matchLen);        /* returns < 0  */
re_matchp(myRe, "operationName=PublishEditMessageClients", &matchLen);   /* returns > -1 */
re_matchp(myRe, "operationName=PublishDirectMessageClients", &matchLen); /* returns < 0  */

As you can see, only the first string matched against the pattern yields a successful match. Any other string thereafter yields failure. Why is this?

Bug in matchpattern

In the matchpattern function you pass &pattern[2] to matchStar and matchPlus. This should be &pattern[1].

End of match

The re_match() routine returns the first character of the match. Is there an easy way to return the last matching character of the string? I'm trying to copy out only the resulting matching string but can't think of an easy way.

Great library by the way!
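
Since the matching functions return the start index and store the match length through the integer pointer, the matched substring can be copied out with plain pointer arithmetic. A minimal sketch, assuming a hypothetical helper named copy_match that is not part of the library:

```c
#include <string.h>

/* Hypothetical helper: copies text[match_idx .. match_idx + match_length)
 * into out and NUL-terminates it. Returns the number of bytes copied, or -1
 * if there was no match or out is too small. */
int copy_match(const char *text, int match_idx, int match_length,
               char *out, size_t out_size)
{
    if (match_idx < 0 || match_length < 0)
        return -1;                        /* -1 from re_match means no match */
    if ((size_t)match_length + 1 > out_size)
        return -1;                        /* no room for the terminator */
    memcpy(out, text + match_idx, (size_t)match_length);
    out[match_length] = '\0';
    return match_length;
}
```

For a non-empty match, the last matched character sits at index match_idx + match_length - 1.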

Can I set the number of characters for numbers?

Can I set the number of characters for numbers (from 1 to 2), for example for this format "00:00:00"?
The following pattern does not work: "\d\d?:\d\d?:\d\d?". Where might I have made a mistake?

Does not work for GUUID validation

Hello, it seems that it does not work for GUUID validation

const char* pattern = "^([0-9A-Fa-f]{8}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{12})$";
int m = re_match(pattern, "CA761232-ED42-11CE-BACD-00AA0057B223");

This works with online tools https://regex101.com/r/hD8sJ8/3

Thank you

Add Precompiled Regex Code Generator

I decided to use this library for a scripting language I'm working on. Here's a link to the directory that uses it if you're interested: https://github.com/stitchlang/stitch/tree/fedf09a6a522e8e48c963075c84aa44cfc74a951/src

One thing I wanted was to "compile" my regular expressions at compile time rather than runtime. Not only is this more performant, but it means I can have as many as I want (see #3) and I don't need to allocate dynamic memory for them.

To solve this, I first noted that the regex_t data-structure is very simple:

typedef struct regex_t
{
  unsigned char  type;   /* CHAR, STAR, etc.                      */
  union
  {
    unsigned char  ch;   /*      the character itself             */
    unsigned char* ccl;  /*  OR  a pointer to characters in class */
  } u;
} regex_t;

I was able to write a function that takes a regex_t object and generates "C initialization" code to "recreate itself". This function was easy to write because re.c already has a function named re_print that does something very similar. Here's what it looks like:

#include <re.h>
static regex_t INLINE_WHITESPACE[] = {
    { .type = 2 }, // BEGIN
    { .type = 8, { .ccl = (unsigned char*)" \t" } }, // CHAR_CLASS
    { .type = 6 }, // PLUS
    { .type = 0 }, // UNUSED
};
static regex_t USER_ID[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = '$' } }, // CHAR
    { .type = 8, { .ccl = (unsigned char*)"a-zA-Z0-9_\\." } }, // CHAR_CLASS
    { .type = 6 }, // PLUS
    { .type = 7, { .ch = '$' } }, // CHAR
    { .type = 4 }, // QUESTIONMARK
    { .type = 0 }, // UNUSED
};
static regex_t NEWLINE[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = '\n' } }, // CHAR
    { .type = 0 }, // UNUSED
};
static regex_t QUOTED_STRING[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = '"' } }, // CHAR
    { .type = 9, { .ccl = (unsigned char*)"\"" } }, // INV_CHAR_CLASS
    { .type = 5 }, // STAR
    { .type = 7, { .ch = '"' } }, // CHAR
    { .type = 0 }, // UNUSED
};
static regex_t COMMENT[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = '#' } }, // CHAR
    { .type = 9, { .ccl = (unsigned char*)"\n" } }, // INV_CHAR_CLASS
    { .type = 5 }, // STAR
    { .type = 0 }, // UNUSED
};
static regex_t OPEN_PAREN[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = '(' } }, // CHAR
    { .type = 0 }, // UNUSED
};
static regex_t CLOSE_PAREN[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = ')' } }, // CHAR
    { .type = 0 }, // UNUSED
};
static regex_t ESCAPE_SEQUENCE[] = {
    { .type = 2 }, // BEGIN
    { .type = 7, { .ch = '@' } }, // CHAR
    { .type = 8, { .ccl = (unsigned char*)"@#$\")(" } }, // CHAR_CLASS
    { .type = 0 }, // UNUSED
};

With the C initialization code, I can compile my regex objects directly into my final binary. Now I don't need to call re_compile at runtime. I can have as many regular expressions as I want and there's no need for dynamic memory.

Note that one big piece to this puzzle was exposing the regex_t definition to the user by moving it to the public header file re.h. Without this, the user would not be able to initialize the objects at compile-time.

For reference here's the function I wrote to generate this initialization code (https://github.com/stitchlang/stitch/blob/fedf09a6a522e8e48c963075c84aa44cfc74a951/src/tokens/compiler.c#L21). As you can see it's quite trivial. If a similar function were added to the library, even in a separate file like cgenerator.c then others would easily be able to do this as well.
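
The linked generator is not reproduced here. As a rough illustration only, such a generator might look like the sketch below. It assumes the regex_t layout quoted above and takes the numeric type codes (0 UNUSED, 7 CHAR, 8 CHAR_CLASS, 9 INV_CHAR_CLASS) from the sample output in this issue. The names emit_initializer, append and append_escaped are hypothetical and are not part of re.c or of the linked code. It writes into a caller-supplied buffer and omits the trailing type-name comments.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the regex_t layout quoted in this issue. */
typedef struct regex_t {
    unsigned char type;
    union { unsigned char ch; unsigned char *ccl; } u;
} regex_t;

/* Type codes as they appear in the sample output above (assumed, not read from re.c). */
enum { RE_UNUSED = 0, RE_CHAR = 7, RE_CHAR_CLASS = 8, RE_INV_CHAR_CLASS = 9 };

/* Appends s to the NUL-terminated buf of capacity cap; returns 0 if it does not fit. */
static int append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf), need = strlen(s);

    if (used + need + 1 > cap)
        return 0;
    memcpy(buf + used, s, need + 1);
    return 1;
}

/* Appends c in a form that is valid inside a C character or string literal. */
static int append_escaped(char *buf, size_t cap, unsigned char c)
{
    char tmp[8];

    switch (c) {
    case '\n': return append(buf, cap, "\\n");
    case '\t': return append(buf, cap, "\\t");
    case '\\': return append(buf, cap, "\\\\");
    case '\'': return append(buf, cap, "\\'");
    case '"':  return append(buf, cap, "\\\"");
    default:   break;
    }
    if (c >= 0x20 && c < 0x7f)
        snprintf(tmp, sizeof tmp, "%c", c);
    else
        snprintf(tmp, sizeof tmp, "\\%03o", (unsigned)c);   /* 3-digit octal escape */
    return append(buf, cap, tmp);
}

/* Writes a C initializer for the UNUSED-terminated pattern p; returns 1 on success. */
int emit_initializer(char *buf, size_t cap, const char *name, const regex_t *p)
{
    char tmp[32];
    int ok;

    if (cap == 0)
        return 0;
    buf[0] = '\0';
    ok = append(buf, cap, "static regex_t ") && append(buf, cap, name)
      && append(buf, cap, "[] = {\n");
    for (; ok; p++) {
        snprintf(tmp, sizeof tmp, "    { .type = %u", (unsigned)p->type);
        ok = append(buf, cap, tmp);
        if (ok && p->type == RE_CHAR) {
            ok = append(buf, cap, ", { .ch = '")
              && append_escaped(buf, cap, p->u.ch)
              && append(buf, cap, "' }");
        } else if (ok && (p->type == RE_CHAR_CLASS || p->type == RE_INV_CHAR_CLASS)) {
            const unsigned char *s;
            ok = append(buf, cap, ", { .ccl = (unsigned char*)\"");
            for (s = p->u.ccl; ok && *s != '\0'; s++)
                ok = append_escaped(buf, cap, *s);
            ok = ok && append(buf, cap, "\" }");
        }
        ok = ok && append(buf, cap, " },\n");
        if (p->type == RE_UNUSED)
            break;                        /* terminator entry has been written */
    }
    return ok && append(buf, cap, "};\n");
}
```

Writing into a fixed buffer keeps the sketch free of dynamic allocation, in line with the library's stated goals; a real tool would more likely print to a file.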

Two minor overflows in re_compile

I found two minor overflows in re_compile:

First overflow

The first one is on line 121:

while (pattern[i] != '\0' && (j+1 < MAX_REGEXP_OBJECTS))

The bug is triggered with the following buffer input to the function: "\\\x01[^\\\xff][^\x00"

The overflow happens because of the [^ characters in the buffer, in particular if execution enters the condition:

case '[':

Then the following code will execute:

tiny-regex-c/re.c

Lines 179 to 190 in 9d46276

if (pattern[i+1] == '^')
{
re_compiled[j].type = INV_CHAR_CLASS;
i += 1; /* Increment i to avoid including '^' in the char-buffer */
}
else
{
re_compiled[j].type = CHAR_CLASS;
}
/* Copy characters inside [..] to buffer */
while ( (pattern[++i] != ']')

where i is increased on line 181 and 190, and finally before the loop continues i will be increased again:

tiny-regex-c/re.c

Lines 226 to 229 in 9d46276

}
i += 1;
j += 1;
}

Which causes the overflow to happen on

while (pattern[i] != '\0' && (j+1 < MAX_REGEXP_OBJECTS))

since i will now be 1-byte off in the buffer.

Second overflow

The second overflow happens on line 190:

while ( (pattern[++i] != ']')

This bug happens in case the following buffer is given to the function: "\\\x01[^\\\xff][\\\x00"

The bug happens in the code:

tiny-regex-c/re.c

Lines 173 to 192 in 9d46276

case '[':
{
  /* Remember where the char-buffer starts. */
  int buf_begin = ccl_bufidx;
  /* Look-ahead to determine if negated */
  if (pattern[i+1] == '^')
  {
    re_compiled[j].type = INV_CHAR_CLASS;
    i += 1; /* Increment i to avoid including '^' in the char-buffer */
  }
  else
  {
    re_compiled[j].type = CHAR_CLASS;
  }
  /* Copy characters inside [..] to buffer */
  while ( (pattern[++i] != ']')
       && (pattern[i] != '\0')) /* Missing ] */
  {

This happens because of the code:

tiny-regex-c/re.c

Lines 190 to 201 in 9d46276

while ( (pattern[++i] != ']')
     && (pattern[i] != '\0')) /* Missing ] */
{
  if (pattern[i] == '\\')
  {
    if (ccl_bufidx >= MAX_CHAR_CLASS_LEN - 1)
    {
      //fputs("exceeded internal buffer!\n", stderr);
      return 0;
    }
    ccl_buf[ccl_bufidx++] = pattern[i++];
  }

Here i is incremented on line 190 and again on line 200. Because i is incremented on line 200 when a \ character is seen, a \ that is the last character of the pattern moves i onto the terminating NUL, and the ++i on line 190 then steps one past the end of the buffer when the while loop enters its next iteration.
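
One way to avoid both over-reads is to never advance past a byte that has not been checked against the terminator. The sketch below is not a patch for re.c. It is a standalone, hypothetical helper (find_class_end) that only locates the closing ']' of a character class and treats a backslash at the very end of the pattern as an error.

```c
/* Returns the index of the ']' that closes the class opened at open_idx,
 * or -1 if the pattern ends first (missing ']' or dangling backslash).
 * Assumes pattern[open_idx] is '['. */
static int find_class_end(const char *pattern, int open_idx)
{
    int i = open_idx + 1;

    if (pattern[i] == '^')            /* negation marker, not a class member */
        i++;
    while (pattern[i] != '\0') {
        if (pattern[i] == ']')
            return i;
        if (pattern[i] == '\\') {
            if (pattern[i + 1] == '\0')
                return -1;            /* backslash is the last byte: reject */
            i++;                      /* skip the escaped character as well */
        }
        i++;
    }
    return -1;                        /* reached the terminator without ']' */
}
```

Every increment of i is preceded by a check that the byte being skipped is not the terminator, so the index never moves beyond the NUL. A caller could reject the pattern when -1 is returned, before any bytes are copied into the class buffer.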

Python random tests are failing but CI is passing

The regex_test.py and regex_test_neg.py scripts are failing but they don't return a non-zero exit code. This causes the Makefile to continue on like nothing went wrong thinking that everything passed. This can be fixed by adding the following lines to the end of the regex python scripts:

if nfails != 0:
    sys.exit(1)

The problem is that this will cause the CI to fail because the tests currently aren't passing, see all the "FAIL" tests in this run: https://github.com/kokke/tiny-regex-c/runs/1939275804?check_suite_focus=true

[..snip..]
pattern '[1-5-]+[-1-2]-[-]':         FAIL : doesn't match '-5--34522125-3131---' as expected [0x2d, 0x35, 0x2d, 0x2d, 0x33, 0x34, 0x35, 0x32, 0x32, 0x31, 0x32, 0x35, 0x2d, 0x33, 0x31, 0x33, 0x31, 0x2d, 0x2d, 0x2d].
    FAIL : doesn't match '31433--3-33325121--' as expected [0x33, 0x31, 0x34, 0x33, 0x33, 0x2d, 0x2d, 0x33, 0x2d, 0x33, 0x33, 0x33, 0x32, 0x35, 0x31, 0x32, 0x31, 0x2d, 0x2d].
    FAIL : doesn't match '12-32342324113--12--' as expected [0x31, 0x32, 0x2d, 0x33, 0x32, 0x33, 0x34, 0x32, 0x33, 0x32, 0x34, 0x31, 0x31, 0x33, 0x2d, 0x2d, 0x31, 0x32, 0x2d, 0x2d].
    FAIL : doesn't match '455354554121-4515---' as expected [0x34, 0x35, 0x35, 0x33, 0x35, 0x34, 0x35, 0x35, 0x34, 0x31, 0x32, 0x31, 0x2d, 0x34, 0x35, 0x31, 0x35, 0x2d, 0x2d, 0x2d].
    FAIL : doesn't match '---5551121353354--2--' as expected [0x2d, 0x2d, 0x2d, 0x35, 0x35, 0x35, 0x31, 0x31, 0x32, 0x31, 0x33, 0x35, 0x33, 0x33, 0x35, 0x34, 0x2d, 0x2d, 0x32, 0x2d, 0x2d].
    FAIL : doesn't match '215535-35551-442--' as expected [0x32, 0x31, 0x35, 0x35, 0x33, 0x35, 0x2d, 0x33, 0x35, 0x35, 0x35, 0x31, 0x2d, 0x34, 0x34, 0x32, 0x2d, 0x2d].
    FAIL : doesn't match '5-342342452523-52-41--' as expected [0x35, 0x2d, 0x33, 0x34, 0x32, 0x33, 0x34, 0x32, 0x34, 0x35, 0x32, 0x35, 0x32, 0x33, 0x2d, 0x35, 0x32, 0x2d, 0x34, 0x31, 0x2d, 0x2d].
    FAIL : doesn't match '55515455412-1--' as expected [0x35, 0x35, 0x35, 0x31, 0x35, 0x34, 0x35, 0x35, 0x34, 0x31, 0x32, 0x2d, 0x31, 0x2d, 0x2d].
    FAIL : doesn't match '2-3---' as expected [0x32, 0x2d, 0x33, 0x2d, 0x2d, 0x2d].
    FAIL : doesn't match '444-2-3412--45433-1--' as expected [0x34, 0x34, 0x34, 0x2d, 0x32, 0x2d, 0x33, 0x34, 0x31, 0x32, 0x2d, 0x2d, 0x34, 0x35, 0x34, 0x33, 0x33, 0x2d, 0x31, 0x2d, 0x2d].
[..snip..]

Wrong match length?

Hi,
I ran into an issue in the following scenario:
pattern = re_compile("{\\w+}");
match_pos = re_matchp(pattern, "{{FW}_TEST", &match_length);

The result is that match_pos==1 (correct), but match_length==5 (incorrect).
I was expecting to get match_pos==0 and match_length==5 --OR-- match_pos=1 and match_length==4.
Tnx
Yair

Prevent duplicate include errors

In re.h, add an include guard: the opening lines go at the top of the file and the closing line at the bottom.

#ifndef _TINY_REGEX_C
#define _TINY_REGEX_C

......................
...................
....................

#endif
