sgregex's Introduction

SGRegex [SRX] regular expression library v1.2

Simple regular expression library for C (ANSI characters only, limited feature set)

Usage:

  • add sgregex.h and sgregex.c to your project

The library supports:

  • . (dot)
  • *, +, ? (simple quantifiers)
  • *?, +?, ?? (lazy quantifiers)
  • {<num>}, {<num1>,<num2>} (complex quantifiers)
  • [...], [^...] (character classes)
  • (...) (subexpressions/capture ranges)
  • | (the "or" operator)
  • ^, $ (beginning/end matchers)
  • modifier m - multiline
  • modifier i - case insensitive matching
  • modifier s - dot includes newlines

Change log:

  • 1.2 - partial rewrite to fix engine design issues

API documentation:

srx_Create

	const rxChar* regex, // the regular expression
	const rxChar* mods // modifier char list
  • creates a regular expression matcher from the specified expression and modifier list
  • returns the regular expression matcher ("context")

srx_CreateExt

	const rxChar* regex, // the regular expression
	const rxChar* mods, // modifier char list (optional)
	int* errnpos, // pointer to an array of *two* int values: error code and error position (optional)
	srx_MemFunc memfn, // memory allocation function (optional)
	void* memctx // user pointer to pass to the allocation function (optional)
  • creates a regular expression matcher from the specified expression and modifier list
  • allows specifying a custom memory allocation function and retrieving error output
  • returns the regular expression matcher ("context")
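
The memfn/memctx pair can be pictured as a callback that receives the user pointer on every call. The sketch below is only an illustration of that idea: the `example_MemFunc` typedef, its realloc-style contract, its parameter order, and the `AllocStats` struct are all assumptions made for this example and are not taken from sgregex.h. Check the header for the real `srx_MemFunc` signature before adapting it.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical callback shape (an assumption, NOT copied from sgregex.h):
   size == 0 frees ptr, any other size behaves like realloc. */
typedef void* (*example_MemFunc)(void* userdata, void* ptr, size_t size);

/* Hypothetical user data that would travel through a memctx-style pointer. */
typedef struct AllocStats
{
	size_t calls; /* number of allocate/resize requests */
	size_t frees; /* number of free requests */
} AllocStats;

/* Counts requests in the AllocStats passed as userdata, then forwards
   the work to the C standard library. */
void* counting_memfunc(void* userdata, void* ptr, size_t size)
{
	AllocStats* stats = (AllocStats*) userdata;
	if (size == 0)
	{
		if (stats) stats->frees++;
		free(ptr);
		return NULL;
	}
	if (stats) stats->calls++;
	return realloc(ptr, size);
}
```

Keeping the counters in the user pointer rather than in globals means two matchers could track their memory use separately.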

srx_Destroy

	srx_Context* R // the regex matcher context
  • destroys the created matcher object

srx_DumpToFile

	srx_Context* R, // the regex matcher context
	FILE* fp // the file to dump the structure
  • dumps the structure of the context to file

srx_DumpToStdout

	srx_Context* R // the regex matcher context
  • dumps the structure of the context to standard output

srx_Match

	srx_Context* R, // the regex matcher context
	const rxChar* str, // the string to use for matching
	int offset // the starting point for matching
  • searches for a match through the string
  • offset is applied directly: it is not "approached safely" (with a loop that checks for a NUL-byte on the way), so the caller must make sure it lies within the string
  • returns whether a match was found
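
Since the offset is not checked against the end of the string, a caller that receives the offset from untrusted input may want to validate it first. The helper below is a minimal sketch of such a check; `offset_is_within_string` is a hypothetical name that is not part of the library, and it assumes rxChar is plain `char`.

```c
#include <stddef.h>

/* Returns 1 if offset lies within the NUL-terminated string str
   (0 <= offset <= length), 0 otherwise.
   Reads at most offset characters, so it never runs past the terminator. */
int offset_is_within_string(const char* str, int offset)
{
	int i;
	if (str == NULL || offset < 0)
		return 0;
	for (i = 0; i < offset; i++)
	{
		if (str[i] == '\0')
			return 0; /* the string ends before the requested offset */
	}
	return 1;
}
```

An offset equal to the string length is accepted here, because starting a search at the terminator is still a valid position for patterns that can match the empty string.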

srx_MatchExt

	srx_Context* R, // the regex matcher context
	const rxChar* str, // the string to use for matching
	size_t size, // length of the string
	size_t offset // the starting point for matching
  • searches for a match through the string
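  • the string does not need to be null-terminated; its length must be passed in the size argument
  • offset is applied directly: it is not "approached safely" (with a loop that checks for a NUL-byte on the way), so the caller must make sure it is valid for the given string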
  • returns whether a match was found

srx_GetCaptureCount

	srx_Context* R // the regex matcher context
  • returns the number of capture ranges that were found in the expression (there's always at least one - the whole match)
  • the current upper limit is 10 (including the whole match)

srx_GetCaptured

	srx_Context* R, // the regex matcher context
	int which, // the capture range number
	int* pbeg, // pointer to output for start offset (optional)
	int* pend // pointer to output for end offset (optional)
  • retrieves the offsets from the specified capture range
  • returns whether the capture range number is in range and the last match included that capture range (which also means that data was written to the specified pointers)
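
Once the two offsets are available, turning them into a C string is a matter of copying the range. The sketch below shows one way to do that; `copy_capture` is a hypothetical helper, not a library function, and it assumes that the end offset is exclusive (one past the last matched character) and that rxChar is plain `char`.

```c
#include <stddef.h>
#include <string.h>

/* Copies str[beg, end) into out as a NUL-terminated string.
   Assumes end is exclusive (an assumption about the offsets, see above).
   Returns the number of characters copied, or -1 if the arguments are
   invalid or out is too small to hold the range plus the terminator. */
int copy_capture(const char* str, int beg, int end, char* out, size_t outsize)
{
	size_t len;
	if (str == NULL || out == NULL || beg < 0 || end < beg)
		return -1;
	len = (size_t)(end - beg);
	if (len + 1 > outsize)
		return -1;
	memcpy(out, str + beg, len);
	out[len] = '\0';
	return (int) len;
}
```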

srx_GetCapturedPtrs

	srx_Context* R, // the regex matcher context
	int which, // the capture range number
	const rxChar** pbeg, // pointer to output for start offset pointer (optional)
	const rxChar** pend // pointer to output for end offset pointer (optional)
  • retrieves the offset pointers from the specified capture range
  • returns whether the capture range number is in range and the last match included that capture range (which also means that data was written to the specified pointers)

srx_Replace

	srx_Context* R, // the regex matcher context
	const rxChar* str, // the input string
	const rxChar* rep // the replacement string (supports capture ranges in the form of "\1")
  • replaces occurrences of pattern in string str with string rep, returns the replaced string
  • the returned string is allocated with the registered allocator
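
To make the "\1" notation concrete, the sketch below expands backreferences in a replacement string from a table of capture ranges. It is an independent illustration, not the library's implementation: `CaptureRange`, `expand_replacement`, the exclusive end offsets, and the choice to expand unmatched groups to nothing are all assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* One capture range as [beg, end) offsets into the subject string;
   beg < 0 marks a group that did not take part in the match. */
typedef struct CaptureRange { int beg, end; } CaptureRange;

/* Expands "\0".."\9" in rep using caps[0..ncaps-1] and writes the
   NUL-terminated result to out. Returns the length written, or -1 if
   out is too small. Unknown or unmatched groups expand to nothing. */
int expand_replacement(const char* subject, const CaptureRange* caps, int ncaps,
                       const char* rep, char* out, size_t outsize)
{
	size_t n = 0;
	const char* p;
	for (p = rep; *p != '\0'; p++)
	{
		if (p[0] == '\\' && p[1] >= '0' && p[1] <= '9')
		{
			int which = p[1] - '0';
			p++; /* skip the digit as well as the backslash */
			if (which < ncaps && caps[which].beg >= 0)
			{
				size_t len = (size_t)(caps[which].end - caps[which].beg);
				if (n + len >= outsize) return -1;
				memcpy(out + n, subject + caps[which].beg, len);
				n += len;
			}
		}
		else
		{
			if (n + 1 >= outsize) return -1;
			out[n++] = *p; /* ordinary character, copied as-is */
		}
	}
	if (n >= outsize) return -1;
	out[n] = '\0';
	return (int) n;
}
```

With the subject "John Smith" and groups covering "John" and "Smith", the replacement string "\2, \1" swaps the two words. Note that in C source the backslash itself has to be escaped, so that replacement is written as "\\2, \\1".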

srx_ReplaceExt

	srx_Context* R, // the regex matcher context
	const rxChar* str, // the input string
	size_t strsize, // the length of the input string
	const rxChar* rep, // the replacement string (supports capture ranges in the form of "\1")
	size_t repsize, // the length of the replacement string
	size_t* outsize // pointer to output for length of returned string (optional)
  • replaces occurrences of pattern in string str with string rep, returns the replaced string
  • the returned string is allocated with the registered allocator
  • none of the strings involved need to be null-terminated

srx_FreeReplaced

	srx_Context* R,
	RX_Char* repstr
  • frees the string returned by srx_Replace

This library was created by Arvīds Kokins (snake5)

sgregex's People

Contributors

archo5

Forkers

skyformat99 phaag

sgregex's Issues

regex for "US currency" test case

I'm testing a few "real world" regexes from regexlib.com.

This "US currency" regex does not match "$1,234,567.89" but it should.
^\$([0-9]{1,3}(,[0-9]{3})*|([0-9]+))(\.[0-9]{2})?$

It works on these:

Matches
$0.84 | $123458

Non-Matches
$12,3456.01 | 12345 | $1.234
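
One way to confirm the expected results independently of sgregex is to run the same cases through another engine. The sketch below uses the POSIX `regcomp`/`regexec` functions from `<regex.h>` for that purpose. It assumes the pattern was meant to have the leading `$` and the `.` escaped with backslashes (an assumption, since the pattern as shown has none); `is_us_currency` and `US_CURRENCY` are names made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* The pattern from the report, with `$` and `.` escaped
   (assumed to be the intended form). */
const char US_CURRENCY[] =
	"^\\$([0-9]{1,3}(,[0-9]{3})*|([0-9]+))(\\.[0-9]{2})?$";

/* Returns 1 if text matches US_CURRENCY under POSIX extended syntax,
   0 if it does not, -1 if the pattern fails to compile. */
int is_us_currency(const char* text)
{
	regex_t re;
	int rc;
	if (regcomp(&re, US_CURRENCY, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

Under these assumptions the POSIX engine accepts "$1,234,567.89", "$0.84" and "$123458", and rejects the three non-matches listed above, which agrees with the behavior the report expects.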

regex for "dutch phone numbers" test case

I'm testing a few "real world" regexes from regexlib.com.

This "dutch-style phone number" regex does not match "023-5256677" but it should.

(^\+[0-9]{2}|^\+[0-9]{2}\(0\)|^\(\+[0-9]{2}\)\(0\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\-\s]{10}$)

It works on these:

Matches
+31235256677 | +31(0)235256677

Non-Matches
+3123525667788999 | 3123525667788 | 232-2566778
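
The expected results can be cross-checked against a different engine. The sketch below uses POSIX `regcomp`/`regexec` from `<regex.h>` and rests on two assumptions: that the literal `+`, `(` and `)` in the pattern were meant to be backslash-escaped, and that `\s` inside the character class can be written as `[:space:]`, because POSIX bracket expressions do not support `\s`. `is_dutch_phone` and `DUTCH_PHONE` are names made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX extended translation of the reported pattern (see the
   assumptions above): literals escaped, \s replaced by [:space:]. */
const char DUTCH_PHONE[] =
	"(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)"
	"([0-9]{9}$|[0-9[:space:]-]{10}$)";

/* Returns 1 if text matches DUTCH_PHONE, 0 if it does not,
   -1 if the pattern fails to compile. */
int is_dutch_phone(const char* text)
{
	regex_t re;
	int rc;
	if (regcomp(&re, DUTCH_PHONE, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

For "023-5256677" the `^0` alternative consumes the leading zero and the remaining ten characters ("23-5256677") satisfy the second branch of the final group, so a match is the expected outcome.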

Crash without group parentheses

Hi,
I am just evaluating a small C regex library and found your project. Thanks for that project!

However, I have a problem: the matching process crashes with the following message:

Assertion failed: (R->captures[ 0 ][1] != RX_NULL_OFFSET), function srx_MatchExt, file sgregex.c, line 1218.
' like 'GET|POST'+Abort

You can reproduce this by adding the following line to the MATCHTEST section of sgregex_test.c:

MATCHTEST("GET /index.html HTTP/1.1\r\n", "GET|POST", 1);

If you add group parentheses, it works:

MATCHTEST("GET /index.html HTTP/1.1\r\n", "(GET|POST)", 1);

Although "GET|POST" is a valid regex, it crashes.

Thanks for your support.
