regex's Introduction

regex

regex is a regular expression toolkit for regex-base with:

  • a text-replacement toolkit with type-safe text-replacement templates;
  • special datatypes for matches and captures;
  • compile-time checking of RE syntax;
  • a unified means of controlling case-sensitivity and multi-line options;
  • high-level AWK-like tools for building text processing apps;
  • the option of using match operators with reduced polymorphism on the text and result types;
  • regular expression macros including:
    • a number of useful RE macros;
    • a test bench for testing and documenting new macro environments;
  • built-in support for the TDFA and PCRE back ends;
  • comprehensive documentation, tutorials and copious examples.

See the About page for details.

regex and regex-examples

The library and tutorial, tests and examples have been split across two packages:

  • the regex package contains the regex library with the Posix TDFA back end
  • the regex-with-pcre library package contains the extra modules needed for the PCRE back end
  • the regex-examples package contains the tutorial, tests and example programs.

Road Map

See the Roadmap page for details.

The regex blog

Check out the regex blog for news articles and discussion concerning all things regex.

Build Status

Hackage BSD3 License Un*x build Windows build Coverage

See build status page for details.

Installing the Package

The package can be easily installed with cabal or stack on GHC-8.0, 7.10 or 7.8 for the above platforms. See the Installation page for details.

The Tutorial, Tests and Examples

See the Tutorial page and Examples page for details.

Helping Out

If you have any feedback or suggestions then please drop us a line.

The Contact page has more details.

The API

The Haddocks can be found at http://hs.regex.uk.

The Macro Tables

The macro environments are an important part of the package and are documented here.

The regex.uk Directory

A handy overview of the regex.uk domain can be found here.

The Changelog

The changelog is posted here.

The Authors

This library was written and is currently maintained by Chris Dornan, aka @cdornan.

regex's People

Contributors

adinapoli, bergmark, cdornan, dependabot[bot], elland, etherz10, hs-viktor, hvr, josephcsible, saurabhnanda, t-c-k, vaibhavsagar, wizek

regex's Issues

Minimise dependencies on library target

Any tests relying on external packages like tasty and smallcheck should not be placed in the library but in re-tests so the library doesn't pick up false testing dependencies.

Add escaping functions

Each backend should provide a function that will 'escape' strings to produce REs that will match those strings.

I don't think this is provided by regex-base so we will have to add functions in Text.RE.TDFA.RE and Text.RE.PCRE.RE to do this.

(Thanks to @ezyang for the suggestion.)
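
A minimal sketch of what such an escaping function might look like for String REs. The name escapeLiteral and the metacharacter set are assumptions made for illustration; they are not taken from regex-base or from either back end, and a real back end may need a different set.

```haskell
module Main (main) where

-- | Characters assumed to be special in the RE dialect (a conservative
-- guess for illustration; check the back end's own syntax before relying
-- on it).
metaChars :: String
metaChars = "^$.|?*+()[]{}\\"

-- | Hypothetical helper: put a backslash in front of every metacharacter
-- so that the result, read as an RE, matches the input string literally.
escapeLiteral :: String -> String
escapeLiteral = concatMap esc
  where
    esc c
      | c `elem` metaChars = ['\\', c]
      | otherwise          = [c]

main :: IO ()
main = mapM_ (putStrLn . escapeLiteral) ["a.b", "1+1=2", "(x|y)*"]
```

The back-end functions would then wrap a helper of this kind and compile the escaped string into an RE for the text type in question.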

Complete the Tutorial, Tests and Examples

They are basically done but we need:

  • better presentation of the tutorial GHCI tryouts with the expected results shown explicitly;

  • literate-programming commentary for all of the .lhs modules in the library and examples (see #8).

Finish tidying up API

  • Minimise the Text.RE module to include just abstracted Match and Matches;
  • Fill out the API modules with the core toolkit for matching, searching and replacing, compiling REs, etc.;
  • Add the Text.RE.Summa module for collecting together all of the assets that don't belong to the back ends;
  • Add the Text.RE.Types collecting all of the Types modules.

Convert the library scanners to use Alex

Using regex for the scanners is fine on prototyping principles but we should review them with a view to rewriting in Alex.

(The tutorial collects all of these examples together.)

Complete the web site

Most of the pages are presentable but:

  • The About page needs to say more about the rationale;

  • The Contact page needs to say more about contributing to the project.

  • The Tutorial page needs to provide a little more context.

Review Matches/Match/Capture Context

See #3

The Matches, Match and Capture types that are generated by the regex match operators, and passed into the replacement functions, keep the original matched text available so that:

  • Matches, Match, Capture are always understood to have the original search text, containing within them the full context of the match, which is highly convenient in general, but in particular,
  • the regex replacement functions always have access to the full context all the way up to the original search string.

This works very well for scripting applications, which will generally be processing small-scale texts, and for line-oriented applications that are matching (short) lines of text.

It may not work for applications that are processing large text files en bloc.

To fix #3 we should probably be using special-purpose data types that may not be so convenient to use in general applications.
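
One possible shape for such a special-purpose type, as a sketch only: the capture records an offset and a length, and the caller keeps the source text and recovers the context on demand. The names Span, spanText, spanPrefix and spanSuffix are hypothetical and are not part of the regex API.

```haskell
module Main (main) where

-- | Hypothetical compact capture: it stores only where the captured text
-- sits in the source, not the source itself.
data Span = Span
  { spanOffset :: !Int  -- ^ number of characters before the capture
  , spanLength :: !Int  -- ^ number of characters captured
  } deriving (Eq, Show)

-- | The captured text, recovered from a source the caller has retained.
spanText :: String -> Span -> String
spanText src (Span off len) = take len (drop off src)

-- | The text preceding the capture.
spanPrefix :: String -> Span -> String
spanPrefix src (Span off _) = take off src

-- | The text following the capture.
spanSuffix :: String -> Span -> String
spanSuffix src (Span off len) = drop (off + len) src

main :: IO ()
main = do
  let src   = "2016-01-09"
      month = Span 5 2
  print (spanPrefix src month, spanText src month, spanSuffix src month)
```

The trade-off is the one noted above: the replacement functions no longer get the full context for free, so every caller has to thread the source text through explicitly.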

Add Static Website

The default GitHub themes are too horrible. I was hoping to use them as a stop-gap, but they aren't fit even for that (in the better candidates, the fork-me-on-GitHub device is far too loud).

We just need a template and stylesheet -- otherwise, it is plain GitHub pages.

Add version scripts

Unfortunately, the badge that we include in the Cabal tarball will necessarily be using an outdated SVG for the Hackage button. We need to generate our own SVG.

We may as well set up the version in the file while we are at it.

Make `regex` compatible w/ TH-less GHCs

I think regex can be made to avoid relying on TemplateHaskell+QuasiQuotes on recent GHC versions, which provide the TemplateHaskellQuotes extension. The benefit is that GHCs without interpreter support would be able to compile regex. In addition, the TemplateHaskellQuotes extension is considered "safe" under SafeHaskell, whereas TemplateHaskell is "unsafe".

There are currently 3 modules which rely on TemplateHaskell:

  • Text/RE/Options.lhs
  • Text/RE/TDFA/RE.hs
  • Text/RE/Internal/NamedCaptures.lhs

The first two are trivial to make THQ-compatible; the third, however, makes use of heredocs and thereby actually executes TH code:

import           Text.Heredoc

scan :: String -> [Token]
scan = alex' match al oops
  where
    al :: [(Regex,Match String->Maybe Token)]
    al =
      [ mk [here|\$\{([^{}]+)\}\(|] $         ECap . Just . x_1
      , mk [here|\$\(|]             $ const $ ECap Nothing
      , mk [here|\(\?:|]            $ const   PGrp
      , mk [here|\(\?|]             $ const   PCap
      , mk [here|\(|]               $ const   Bra
      , mk [here|\\(.)|]            $         BS    . s2c . x_1
      , mk [here|(.)|]              $         Other . s2c . x_1
      ]

Would it be possible to avoid using heredocs and thus avoid having to execute TH code?
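
One way this might be done, sketched under the assumption that the here quasi quoter yields the quoted text verbatim: write each pattern as an ordinary String literal with every backslash doubled. The list name scannerPatterns is hypothetical; the patterns are the seven from the scanner above.

```haskell
module Main (main) where

-- | The scanner patterns as plain String literals. Each backslash in the
-- heredoc version becomes a doubled backslash here; the trailing comment
-- on each line shows the RE text the literal denotes.
scannerPatterns :: [String]
scannerPatterns =
  [ "\\$\\{([^{}]+)\\}\\("  -- \$\{([^{}]+)\}\(
  , "\\$\\("                -- \$\(
  , "\\(\\?:"               -- \(\?:
  , "\\(\\?"                -- \(\?
  , "\\("                   -- \(
  , "\\\\(.)"               -- \\(.)
  , "(.)"                   -- (.)
  ]

main :: IO ()
main = mapM_ putStrLn scannerPatterns
```

The cost is readability: the doubled backslashes are harder to scan than the heredoc form, which is presumably why heredocs were used in the first place.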

Fix release script

  • re-gen-cabal sdist should commit the Hackage release tar archives
    before generating the tags

  • add re-gen-cabal bump-version (alias for re-prep bump-version)

Separate out PCRE sub-package

ghcjs generally wants native Haskell packages, so we will separate out the PCRE API into a separate regex-with-pcre package.

Text.RE.PCRE.Text[.Lazy]

It would be great if we could add these but it will need some co-ordination with the regex-pcre maintainers, so it is going into the v2.0.0.0 milestone.

Constrain the type of the ed quasi quoters

The ed quasi quoter exported from Text.RE.TDFA.<t> should be of type

SearchReplace RE <t>

not

IsRegex RE s => SearchReplace RE s

as is the case at the moment.

The Text.RE.TDFA and Text.RE.PCRE modules are currently doing the right thing of course (which could require FlexibleContexts but these modules are not recommended for simple usage).

Remove QQ from code coverage stats

This contains TemplateHaskell code that hpc can't measure so it is skewing the coverage stats. This should be noted in a new section on the build-status page.

Split into regex and regex-examples

The contents of the examples directory (i.e., the tutorial/tests/examples) should be split off into regex-examples leaving regex with just the dependencies needed for the library.

Better package organisation

We should:

  • move all regex types modules under Text.RE.Types;

  • move Parsers into TestBench.Parsers;

  • move Edit into Tools.Edit;

  • cut down what we export from Text.RE:

    • do not export Options_, only SimpleOptions;

    • do not export Testbench;

    • do not export the Tools.

Fix coveralls

Coveralls was broken by 751198d, the last line of .travis.yml needing to be updated with the new targets.

Revise the README

This is the home page/README for the web site, GitHub and Hackage -- it needs to be right!

It needs to be concise with links out to the relevant website pages.

Tidy up the API

This is a follow-up to the recent re-organization.

We want to:

  • collect together the exports of the Tools modules into a single Text.RE.Tools module;

  • export the Parsers module from the TestBench module;

  • export the reSource, compileRegex, compileRegexWith and escape functions from the API modules instead of the RE modules.

As these technically break the API we need another minor version bump.

Windows failure on re-gen-modules-test (mega-regex)

re-gen-modules-test is failing on Windows in-place testing (on AppVeyor) with:

re-gen-modules-test.exe: src/Text/RE/TDFA/ByteString/Lazy.hs: openBinaryFile: does not exist (No such file or directory)

Looks like Windows git does not support symbolic links.

Make it fast

Apart from using fast backends very little effort has been applied to making the package efficient on the grounds that:

  1. we want to get it right before making it fast and

  2. the primary motivation is to make RE-based scripting in Haskell more attractive and many of those applications typically aren't performance sensitive (as the filters in the package used to process the literate Haskell programmes and generate the API modules are not performance sensitive).

As the dude says, if you need high-performance filters you should probably be writing them by hand, at least until this issue has been fixed!

Fix template replace ordinals

Two problems:

  • we do not allow ${5} or ${42} to reference numbered captures;

  • we interpret $11 and $123 as captures.

Obvious fix:

  • captures can be referenced ${10}, etc.

  • $11 to be interpreted as ${1}1
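
The proposed rules can be sketched as a small template parser. This is an illustration only, not the library's implementation: the Piece type and parseTemplate are hypothetical names, and named captures such as ${y} are out of scope here and are left as literal text.

```haskell
module Main (main) where

import Data.Char (isDigit)

-- | Hypothetical parsed form of a replacement template.
data Piece
  = Lit String  -- ^ literal text, copied through unchanged
  | Cap Int     -- ^ reference to a numbered capture
  deriving (Eq, Show)

-- | Parse a template under the proposed rules:
--   ${n} with any number of digits references capture n;
--   $d references capture d for a single digit d only.
parseTemplate :: String -> [Piece]
parseTemplate = merge . go
  where
    go "" = []
    go ('$' : '{' : rest)
      | (ds@(_ : _), '}' : rest') <- span isDigit rest
      = Cap (read ds) : go rest'
    go ('$' : d : rest)
      | isDigit d = Cap (read [d]) : go rest
    go (c : rest) = Lit [c] : go rest

    -- Join adjacent literal characters into a single Lit.
    merge (Lit a : Lit b : ps) = merge (Lit (a ++ b) : ps)
    merge (p : ps)             = p : merge ps
    merge []                   = []

main :: IO ()
main = mapM_ (print . parseTemplate) ["$11", "${10}-$2", "$123", "${42}"]
```

With these rules $11 parses as capture 1 followed by the literal "1", while ${10} and ${42} reference captures 10 and 42.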

Rename Options, Context and replace Methods

Some types are probably best renamed to make it clear they belong to regex:

  • Options -> REOptions
  • Context -> REContext

and the Replace methods should have their E suffix replaced with an R.

re-gen-modules-test failing

It is failing with:

Text/RE/TDFA/ByteString/Lazy.hs: openBinaryFile: does not exist (No such file or directory)

To work from a Hackage tarball (as distinct from a cloned repository) the src modules must locate the Haskell source modules under src.

Complete the Haddocks

The Haddocks are lacking introductory material and probably basic documentation in places.

Remove the captures from the TDFA macros

We want to do this in the same way it was done for the PCRE macros -- by using (?: ... ) for grouping. For this regex-tdfa will have to be extended to support pure grouping.

Fix and extend Replace class

The regexSource method of Replace needs to generate the text type of the class to be usable.

The reverse operations for compiling into the RE would also be useful.

Type safe replacement templates

Adam Bergmark asked on Haskell Cafe:

Have you considered doing anything fancy to make capture groups safer to use? If i could get a compile error when i'm using the wrong number/wrongly named groups I'd be very excited.

Inter-operation of =~/=~~ and named captures

This

replaceAll "${d}/${m}/${y}" $ src *=~ [re|${y}([0-9]{4})-${m}([0-9]{2})-${d}([0-9]{2})|]

could be accidentally written as

replaceAll "${d}/${m}/${y}" $ src =~ [re|${y}([0-9]{4})-${m}([0-9]{2})-${d}([0-9]{2})|]

and it would pass the type checker, but behave differently.

The named captures were designed to work with the new operators, which can easily preserve the capture names in the definite result type — not so easy in the case of =~.

I can see three options for the 1.0.0.0 release:

  1. leave everything as it is;

  2. remove the old =~ and =~~ operators from the API;

  3. fix them up so that when they yield Match or Matches results they preserve the capture names.

The question answers itself, I think: we should do 3, of course.

Fix AppVeyor badge

The AppVeyor badge is pointing to the wrong account — I only realise now that the build has started spontaneously failing (see #79).

Sort out repl story

The current instructions in index.md for loading the tutorial into ghci with cabal repl are incorrect.

We need to fix those and add stack instructions.

Generalise sed'

There is no reason why sed' can't be completely generalised — except that we don't have a linesE method for Replace, which is easily fixable.

Any overlap between our named captures and PCRE

Evan Laforge expressed concern on Haskell Cafe about 'any deviation from "standard" PCRE'. Of course anyone can just decline to use the non-standard construct, so that leaves us with:

  • a way of disabling the non-standard extensions to ensure they don't creep
    into a code base (which seems a bit OTT);

  • ensuring that they don't interfere with any PCRE RE notation.

My understanding is that regex named captures will not interfere with any PCRE extensions, but it would be nice to get a second opinion.
