
parse's Introduction

Installation

pip install parse

Usage

Parse strings using a specification based on the Python format() syntax.

parse() is the opposite of format()

The module is set up to only export parse(), search(), findall(), and with_pattern() when import * is used:

>>> from parse import *

From there it's a simple thing to parse a string:

>>> parse("It's {}, I love it!", "It's spam, I love it!")
<Result ('spam',) {}>
>>> _[0]
'spam'

Or to search a string for some pattern:

>>> search('Age: {:d}\n', 'Name: Rufus\nAge: 42\nColor: red\n')
<Result (42,) {}>

Or find all the occurrences of some pattern in a string:

>>> ''.join(r[0] for r in findall(">{}<", "<p>the <b>bold</b> text</p>"))
'the bold text'

If you're going to use the same pattern to match lots of strings you can compile it once:

>>> from parse import compile
>>> p = compile("It's {}, I love it!")
>>> print(p)
<Parser "It's {}, I love it!">
>>> p.parse("It's spam, I love it!")
<Result ('spam',) {}>

("compile" is not exported for import * usage as it would override the built-in compile() function)

The default behaviour is to match strings case insensitively. You may match with case by specifying `case_sensitive=True`:

>>> parse('SPAM', 'spam', case_sensitive=True) is None
True

Format Syntax

A basic version of the Format String Syntax is supported with anonymous (fixed-position), named and formatted fields:

{[field name]:[format spec]}

Field names must be valid Python identifiers, including dotted names; element indexes imply dictionaries (see below for an example).

Numbered fields are also not supported: the result of parsing will include the parsed fields in the order they are parsed.

The conversion of fields to types other than strings is done based on the type in the format specification, which mirrors the format() behaviour. There are no "!" field conversions like format() has.

Some simple parse() format string examples:

>>> parse("Bring me a {}", "Bring me a shrubbery")
<Result ('shrubbery',) {}>
>>> r = parse("The {} who {} {}", "The knights who say Ni!")
>>> print(r)
<Result ('knights', 'say', 'Ni!') {}>
>>> print(r.fixed)
('knights', 'say', 'Ni!')
>>> print(r[0])
knights
>>> print(r[1:])
('say', 'Ni!')
>>> r = parse("Bring out the holy {item}", "Bring out the holy hand grenade")
>>> print(r)
<Result () {'item': 'hand grenade'}>
>>> print(r.named)
{'item': 'hand grenade'}
>>> print(r['item'])
hand grenade
>>> 'item' in r
True

Note that `in` only works if you have named fields.
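
For example (a small illustration; the fixed-position result above has no named fields, so a membership test is always False):

>>> r = parse("The {} who {} {}", "The knights who say Ni!")
>>> 'knights' in r
False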

Dotted names and indexes are possible with some limits. Only word identifiers are supported (i.e. no numeric indexes) and the application must make additional sense of the result:

>>> r = parse("Mmm, {food.type}, I love it!", "Mmm, spam, I love it!")
>>> print(r)
<Result () {'food.type': 'spam'}>
>>> print(r.named)
{'food.type': 'spam'}
>>> print(r['food.type'])
spam
>>> r = parse("My quest is {quest[name]}", "My quest is to seek the holy grail!")
>>> print(r)
<Result () {'quest': {'name': 'to seek the holy grail!'}}>
>>> print(r['quest'])
{'name': 'to seek the holy grail!'}
>>> print(r['quest']['name'])
to seek the holy grail!

If the text you're matching has braces in it you can match those by including a double-brace {{ or }} in your format string, just like format() does.
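
For instance, a brief sketch of the double-brace escaping just described (the "code" field name is arbitrary):

>>> parse("{{{code}}}", "{404}")
<Result () {'code': '404'}>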

Format Specification

Most often a straight format-less {} will suffice where a more complex format specification might have been used.

Most of format()'s Format Specification Mini-Language is supported:

[[fill]align][sign][0][width][.precision][type]

The differences between parse() and format() are:

  • The align operators will cause spaces (or specified fill character) to be stripped from the parsed value. The width is not enforced; it just indicates there may be whitespace or "0"s to strip.
  • Numeric parsing will automatically handle a "0b", "0o" or "0x" prefix. That is, the "#" format character is handled automatically by the d, b, o and x formats. For "d" any prefix will be accepted, but for the others the correct prefix must be present, if one is present at all (see the example after the type table below).
  • Numeric sign is handled automatically. A sign specifier can be given, but has no effect.
  • The thousands separator is handled automatically if the "n" type is used.
  • The types supported are a slightly different mix to the format() types. Some format() types come directly over: "d", "n", "%", "f", "e", "b", "o" and "x". In addition some regular expression character group types "D", "w", "W", "s" and "S" are also available.
  • The "e" and "g" types are case-insensitive, so there is no need for the "E" or "G" types. The "e" type handles Fortran formatted numbers (no leading 0 before the decimal point).
Type  Characters Matched                                       Output
====  =======================================================  ========
l     Letters (ASCII)                                          str
w     Letters, numbers and underscore                          str
W     Not letters, numbers and underscore                      str
s     Whitespace                                               str
S     Non-whitespace                                           str
d     Digits (effectively integer numbers)                     int
D     Non-digit                                                str
n     Numbers with thousands separators (, or .)               int
%     Percentage (converted to value/100.0)                    float
f     Fixed-point numbers                                      float
F     Decimal numbers                                          Decimal
e     Floating-point numbers with exponent,                    float
      e.g. 1.1e-10, NAN (all case insensitive)
g     General number format (either d, f or e)                 float
b     Binary numbers                                           int
o     Octal numbers                                            int
x     Hexadecimal numbers (lower and upper case)               int
ti    ISO 8601 format date/time,                               datetime
      e.g. 1972-01-20T10:21:36Z ("T" and "Z" optional)
te    RFC2822 e-mail format date/time,                         datetime
      e.g. Mon, 20 Jan 1972 10:21:36 +1000
tg    Global (day/month) format date/time,                     datetime
      e.g. 20/1/1972 10:21:36 AM +1:00
ta    US (month/day) format date/time,                         datetime
      e.g. 1/20/1972 10:21:36 PM +10:30
tc    ctime() format date/time,                                datetime
      e.g. Sun Sep 16 01:03:52 1973
th    HTTP log format date/time,                               datetime
      e.g. 21/Nov/2011:00:07:11 +0000
ts    Linux system log format date/time, e.g. Nov 9 03:37:44   datetime
tt    Time, e.g. 10:21:36 PM -5:30                             time
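
As an example of the automatic prefix handling noted in the list above (a small sketch; the outputs assume a recent parse release):

>>> parse('{:x}', '0x1e')
<Result (30,) {}>
>>> parse('{:o}', '0o17')
<Result (15,) {}>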

The type can also be a datetime format string, following the 1989 C standard format codes, e.g. %Y-%m-%d. Depending on the directives contained in the format string, parsed output may be an instance of datetime.datetime, datetime.time, or datetime.date.

>>> parse("{:%Y-%m-%d %H:%M:%S}", "2023-11-23 12:56:47")
<Result (datetime.datetime(2023, 11, 23, 12, 56, 47),) {}>
>>> parse("{:%H:%M}", "10:26")
<Result (datetime.time(10, 26),) {}>
>>> parse("{:%Y/%m/%d}", "2023/11/25")
<Result (datetime.date(2023, 11, 25),) {}>

Some examples of typed parsing with None returned if the typing does not match:

>>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
<Result (3, 'weapons') {}>
>>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
>>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
<Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>

And messing about with alignment:

>>> parse('with {:>} herring', 'with     a herring')
<Result ('a',) {}>
>>> parse('spam {:^} spam', 'spam    lovely     spam')
<Result ('lovely',) {}>

Note that the "center" alignment does not test to make sure the value is centered - it just strips leading and trailing whitespace.

Width and precision may be used to restrict the size of matched text from the input. Width specifies a minimum size and precision specifies a maximum. For example:

>>> parse('{:.2}{:.2}', 'look')           # specifying precision
<Result ('lo', 'ok') {}>
>>> parse('{:4}{:4}', 'look at that')     # specifying width
<Result ('look', 'at that') {}>
>>> parse('{:4}{:.4}', 'look at that')    # specifying both
<Result ('look at ', 'that') {}>
>>> parse('{:2d}{:2d}', '0440')           # parsing two contiguous numbers
<Result (4, 40) {}>

Some notes for the special date and time types:

  • the presence of the time part is optional (including ISO 8601, starting at the "T"). A full datetime object will always be returned; the time will be set to 00:00:00. You may also specify a time without seconds.
  • when a seconds amount is present in the input fractions will be parsed to give microseconds.
  • except in ISO 8601 the day and month digits may be 0-padded.
  • the date separator for the tg and ta formats may be "-" or "/".
  • named months (abbreviations or full names) may be used in the ta and tg formats in place of numeric months.
  • as per RFC 2822 the e-mail format may omit the day (and comma), and the seconds but nothing else.
  • hours greater than 12 will be happily accepted.
  • the AM/PM are optional, and if PM is found then 12 hours will be added to the datetime object's hours amount - even if the hour is greater than 12 (for consistency.)
  • in ISO 8601 the "Z" (UTC) timezone part may be a numeric offset
  • timezones are specified as "+HH:MM" or "-HH:MM". The hour may be one or two digits (0-padded is OK). Also, the ":" is optional (see the example after this list).
  • the timezone is optional in all except the e-mail format (it defaults to UTC.)
  • named timezones are not handled yet.
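
For example, an ISO 8601 value with a numeric offset parses to a timezone-aware datetime (a sketch; isoformat() is used here rather than relying on the tzinfo repr):

>>> r = parse('at {:ti}', 'at 1972-01-20T10:21:36+05:30')
>>> r[0].isoformat()
'1972-01-20T10:21:36+05:30'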

Note: attempting to match too many datetime fields in a single parse() will currently result in a resource allocation issue. A TooManyFields exception will be raised in this instance. The current limit is about 15. It is hoped that this limit will be removed one day.

Result and Match Objects

The result of a parse() or search() operation is either None (no match), a Result instance, or a Match instance (if evaluate_result is False).

The Result instance has three attributes:

fixed

A tuple of the fixed-position, anonymous fields extracted from the input.

named

A dictionary of the named fields extracted from the input.

spans

A dictionary mapping the names and fixed position indices matched to a 2-tuple slice range of where the match occurred in the input. The span does not include any stripped padding (alignment or width).

The Match instance has one method:

evaluate_result()

Generates and returns a Result instance for this Match object.
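
A quick sketch tying these together (the span values simply reflect where "world" sits in the input):

>>> r = parse('hello {name}', 'hello world')
>>> r.spans
{'name': (6, 11)}
>>> m = parse('hello {name}', 'hello world', evaluate_result=False)
>>> m.evaluate_result()['name']
'world'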

Custom Type Conversions

If you wish to have matched fields automatically converted to your own type you may pass in a dictionary of type conversion information to parse() and compile().

The converter will be passed the field string matched. Whatever it returns will be substituted in the Result instance for that field.

Your custom type conversions may override the builtin types if you supply one with the same identifier:

>>> def shouty(string):
...    return string.upper()
...
>>> parse('{:shouty} world', 'hello world', {"shouty": shouty})
<Result ('HELLO',) {}>

If the type converter has the optional pattern attribute, it is used as a regular expression for better pattern matching (instead of the default one):

>>> def parse_number(text):
...    return int(text)
>>> parse_number.pattern = r'\d+'
>>> parse('Answer: {number:Number}', 'Answer: 42', {"Number": parse_number})
<Result () {'number': 42}>
>>> _ = parse('Answer: {:Number}', 'Answer: Alice', {"Number": parse_number})
>>> assert _ is None, "MISMATCH"

You can also use the with_pattern(pattern) decorator to add this information to a type converter function:

>>> from parse import with_pattern
>>> @with_pattern(r'\d+')
... def parse_number(text):
...    return int(text)
>>> parse('Answer: {number:Number}', 'Answer: 42', {"Number": parse_number})
<Result () {'number': 42}>

A more complete example of a custom type might be:

>>> yesno_mapping = {
...     "yes":  True,   "no":    False,
...     "on":   True,   "off":   False,
...     "true": True,   "false": False,
... }
>>> @with_pattern(r"|".join(yesno_mapping))
... def parse_yesno(text):
...     return yesno_mapping[text.lower()]
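
Used with parse() this might look like the following (a hypothetical usage of the converter above; the "YesNo" identifier is arbitrary):

>>> parse('enabled: {:YesNo}', 'enabled: off', {"YesNo": parse_yesno})
<Result (False,) {}>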

If the type converter pattern uses regex grouping (with parentheses), you should indicate this by using the optional regex_group_count parameter in the with_pattern() decorator:

>>> @with_pattern(r'((\d+))', regex_group_count=2)
... def parse_number2(text):
...    return int(text)
>>> parse('Answer: {:Number2} {:Number2}', 'Answer: 42 43', {"Number2": parse_number2})
<Result (42, 43) {}>

Otherwise, this may cause parsing problems with unnamed/fixed parameters.

Potential Gotchas

parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:

>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> sorted(parse(pattern, data).named.items())
[('dir1', 'root'), ('dir2', 'parent/subdir')]

So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.
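
If you need an earlier field to consume as much as possible instead, one possible workaround (a sketch built on the custom-type mechanism described above; the "Greedy" identifier is made up for illustration) is a custom type with a greedy pattern:

>>> @with_pattern(r'.+')
... def greedy(text):
...     return text
>>> sorted(parse('{dir1:Greedy}/{dir2}', 'root/parent/subdir', {"Greedy": greedy}).named.items())
[('dir1', 'root/parent'), ('dir2', 'subdir')]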

Developers

Want to contribute to parse? Fork the repo to your own GitHub account, and create a pull-request.

git clone git@github.com:r1chardj0n3s/parse.git
git remote rename origin upstream
git remote add origin git@github.com:YOURUSERNAME/parse.git
git checkout -b myfeature

To run the tests locally:

python -m venv .venv
source .venv/bin/activate
pip install -r tests/requirements.txt
pip install -e .
pytest

Changelog

  • 1.20.1 The %f directive accepts 1-6 digits, like strptime (thanks @bbertincourt)
  • 1.20.0 Added support for strptime codes (thanks @bendichter)
  • 1.19.1 Added support for sign specifiers in number formats (thanks @anntzer)
  • 1.19.0 Added slice access to fixed results (thanks @jonathangjertsen). Also corrected matching of full string vs. full line (thanks @giladreti). Fixed an issue with using digit field numbering and types.
  • 1.18.0 Correct bug in int parsing introduced in 1.16.0 (thanks @maxxk)
  • 1.17.0 Make left- and center-aligned search consume up to next space
  • 1.16.0 Make compiled parse objects pickleable (thanks @martinResearch)
  • 1.15.0 Several fixes for parsing non-base 10 numbers (thanks @vladikcomper)
  • 1.14.0 More broad acceptance of Fortran number format (thanks @purpleskyfall)
  • 1.13.1 Project metadata correction.
  • 1.13.0 Handle Fortran formatted numbers with no leading 0 before decimal point (thanks @purpleskyfall). Handle comparison of FixedTzOffset with other types of object.
  • 1.12.1 Actually use the case_sensitive arg in compile (thanks @jacquev6)
  • 1.12.0 Do not assume closing brace when an opening one is found (thanks @mattsep)
  • 1.11.1 Revert having unicode char in docstring, it breaks Bamboo builds(?!)
  • 1.11.0 Implement __contains__ for Result instances.
  • 1.10.0 Introduce a "letters" matcher, since "w" matches numbers also.
  • 1.9.1 Fix deprecation warnings around backslashes in regex strings (thanks Mickael Schoentgen). Also fix some documentation formatting issues.
  • 1.9.0 We now honor precision and width specifiers when parsing numbers and strings, allowing parsing of concatenated elements of fixed width (thanks Julia Signell)
  • 1.8.4 Add LICENSE file at request of packagers. Correct handling of AM/PM to follow most common interpretation. Correct parsing of hexadecimal that looks like a binary prefix. Add ability to parse case sensitively. Add parsing of numbers to Decimal with "F" (thanks John Vandenberg)
  • 1.8.3 Add regex_group_count to with_pattern() decorator to support user-defined types that contain brackets/parenthesis (thanks Jens Engel)
  • 1.8.2 add documentation for including braces in format string
  • 1.8.1 ensure bare hexadecimal digits are not matched
  • 1.8.0 support manual control over result evaluation (thanks Timo Furrer)
  • 1.7.0 parse dict fields (thanks Mark Visser) and adapted to allow more than 100 re groups in Python 3.5+ (thanks David King)
  • 1.6.6 parse Linux system log dates (thanks Alex Cowan)
  • 1.6.5 handle precision in float format (thanks Levi Kilcher)
  • 1.6.4 handle pipe "|" characters in parse string (thanks Martijn Pieters)
  • 1.6.3 handle repeated instances of named fields, fix bug in PM time overflow
  • 1.6.2 fix logging to use local, not root logger (thanks Necku)
  • 1.6.1 be more flexible regarding matched ISO datetimes and timezones in general, fix bug in timezones without ":" and improve docs
  • 1.6.0 add support for optional pattern attribute in user-defined types (thanks Jens Engel)
  • 1.5.3 fix handling of question marks
  • 1.5.2 fix type conversion error with dotted names (thanks Sebastian Thiel)
  • 1.5.1 implement handling of named datetime fields
  • 1.5 add handling of dotted field names (thanks Sebastian Thiel)
  • 1.4.1 fix parsing of "0" in int conversion (thanks James Rowe)
  • 1.4 add __getitem__ convenience access on Result.
  • 1.3.3 fix Python 2.5 setup.py issue.
  • 1.3.2 fix Python 3.2 setup.py issue.
  • 1.3.1 fix a couple of Python 3.2 compatibility issues.
  • 1.3 added search() and findall(); removed compile() from import * export as it overwrites builtin.
  • 1.2 added ability for custom and override type conversions to be provided; some cleanup
  • 1.1.9 to keep things simpler number sign is handled automatically; significant robustification in the face of edge-case input.
  • 1.1.8 allow "d" fields to have number base "0x" etc. prefixes; fix up some field type interactions after stress-testing the parser; implement "%" type.
  • 1.1.7 Python 3 compatibility tweaks (2.5 to 2.7 and 3.2 are supported).
  • 1.1.6 add "e" and "g" field types; removed redundant "h" and "X"; removed need for explicit "#".
  • 1.1.5 accept textual dates in more places; Result now holds match span positions.
  • 1.1.4 fixes to some int type conversion; implemented "=" alignment; added date/time parsing with a variety of formats handled.
  • 1.1.3 type conversion is automatic based on specified field types. Also added "f" and "n" types.
  • 1.1.2 refactored, added compile() and limited from parse import *
  • 1.1.1 documentation improvements
  • 1.1.0 implemented more of the Format Specification Mini-Language and removed the restriction on mixing fixed-position and named fields
  • 1.0.0 initial release

This code is copyright 2012-2021 Richard Jones <[email protected]>. See the end of the source file for the license of use.

parse's People

Contributors

adamchainz, amigadave, anntzer, bbertincourt, bendichter, giladreti, hendrikto, imba-tjd, jacquev6, jenisys, jnrowe, jonathangjertsen, jsignell, kyluca, lkilcher, luffbee, martinresearch, maxxk, r1chardj0n3s, richard-reece, ricyteach, scooterman, timofurrer, tomerha, vaibhavsagar, vladikcomper, wimglenn, wrmsr


parse's Issues

LICENSE file in pypi tarball

I know it may sound tiresome after two issues already filed because of the license. Could you please still consider adding a LICENSE file to the PyPI tarball as well?

Add an optional cardinality field for parsing at end of the parse schema

parse uses currently the same field schema like the str.format() function.
But parsing problems are often slightly different (and more complicated) compared to output formatting problems. I stumbled over a use case where it would be rather nice to have an optional cardinality field after the type field.

EXAMPLE:

#!python
schema = "I met {person:Person?} ..."  #< OPTIONAL DATA: Zero or one cardinality
schema = "I am meeting with {persons:Person+}"  #< MANY: One or more cardinality
schema = "I am meeting with {persons:Person*}"  #< MANY: Zero or more cardinality

The "many solution" is basically a comma-separated list of items for this datatype, like:

"I am meeting with Alice, Bob, Charly"

I have a canned, working solution (if my pull-request for the pattern attribute is accepted) that will allow solving the underlying cardinality problem above: generating a regular expression for the cardinality from the regular expression of a user-defined (or built-in) data type.

Allow element indexes in field names

The format field names can have element indexes. See the python documentation.

field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | integer]
attribute_name    ::=  identifier
element_index     ::=  integer | index_string

But parse doesn't support it. Example:

>>> "test: {dict[0]}".format(dict=["red"])
'test: red'
>>> "test: {dict[color0]}".format(dict={"color0":"red"})
'test: red'

>>> parse.parse("test: {dict}", "test: blue")
<Result () {'dict': 'blue'}>
>>> parse.parse("test: {dict[0]}", "test: blue")
None
# return must be <Result () {'dict[0]': 'blue'}>
>>> parse.parse("test: {dict[color0]}", "test: blue")
None
# return must be <Result () {'dict[color0]': 'blue'}>

Pipe symbol not escaped

Literal text with a | symbol in it is not handled correctly:

>>> search('| {:d}', '| 10')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 1041, in search
    return Parser(format, extra_types=extra_types).search(string, pos, endpos)
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 678, in search
    return self._generate_result(m)
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 699, in _generate_result
    fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/parse.py", line 375, in f
    if string[0] == '-':
TypeError: 'NoneType' object has no attribute '__getitem__'

Escaping the | character explicitly makes it work:

>>> search('\\| {:d}', '| 10')
<Result (10,) {}>

Looking over the code base it should be trivial to fix by adding | to the REGEX_SAFETY pattern. However, I do wonder why re.escape() isn't used to escape regular expression metacharacters here instead. Am I missing something, does re.escape() escape too much?

Cannot parse concatenated string items

I don't know if this is a bug or just me but here is my use case. I have two items that are concatenated in a filename: a date (YYYYMMDD) and a 2-digit string (model_run). I can't find a way to specify a pattern that will allow parsing of the two concatenated elements:

>>> filename_pattern = 'Some_string_{wx_variable}_ps2.5km_{YYYYMMDD}{model_run}_P{forecast_hour}-00.extension'
>>> afile = 'Some_string_A_variable_ps2.5km_1999121100_P012-00.extension'
>>> r = parse(filename_pattern, afile)
>>> r
<Result () {'forecast_hour': '012', 'wx_variable': 'A_variable', 'model_run': '999121100', 'YYYYMMDD': '1'}>
>>> r.named['YYYYMMDD']
'1'
>>> r.named['model_run']
'999121100'

How can I get the parsed items I expect, i.e. r.named['YYYYMMDD'] = '19991211' and r.named['model_run'] = '00' ?

I would like to avoid fiddling with the filename_pattern before parsing (e.g. add some format options like width, if that is even possible) because that would defeat the purpose of having a file pattern to begin with IMO.

Thanx !

Unmatched brace can still result in a match

Certain patterns can still result in a match even when there are unmatched braces. For example:

>>> parse("{who.txt", "hello")
<Result () {'who.tx': 'hello'}>

Even though there is no closing }, parse assumes the final character is the closing brace and matches the pattern accordingly. In this case, I'd expect parse to return None since there is no direct match.

add support for datetime.strftime directives?

There are some edge cases that this module does not cover, and rather than recreating the wheel I would like to discuss a method to support datetime.strftime directives.

The basic strategy I am imagining would be to preprocess the given string to replace these directives with appropriate format definitions from a hard-coded table so that they are loaded into the named set. These values can then be used to set a datetime on the named set after everything is parsed.

Walkthrough example:

FMT_STR="string with {stuff}, {}, and strftime directives like %Y, %d, and %b"
parse(FMT_STR, "string with myStuff, also_this, and strftime directives like 2018, 03 and Feb").named
>> {
    "stuff": "myStuff",
    "__Y": 2018,
    "__b": "Feb",
    "__d": 3,
    "__datetime": datetime(2018, 2, 3)
}

In the above example the format string would be pre-parsed into something like:

"string with {stuff}, {}, and strftime directives like {:4d}, {:2d}, and {:3w}"

using a mapping like:

map = {
    "%Y": "{:4d}", 
    "%d": "{:2d}", 
    "%b": "{:3w}"
}
for directive, fmt in map.items():
    string = string.replace(directive, fmt)

Does this seem reasonable? I may try an implementation unless there are potential issues with this I am overlooking.

Can't install v1.8.1/v1.8.2 with python3

Can't install parse to a virtualenv in Debian/testing.
1.8.0 is OK; 1.8.1 and 1.8.2 do not install.

/tmp$ virtualenv -p python3 virtualenv
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /tmp/virtualenv/bin/python3
Also creating executable in /tmp/virtualenv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.

/tmp$  . virtualenv/bin/activate

(virtualenv) /tmp$ pip install parse
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /tmp/virtualenv/bin/python3
Also creating executable in /tmp/virtualenv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
Collecting parse
  Using cached parse-1.8.2.tar.gz
Building wheels for collected packages: parse
  Running setup.py bdist_wheel for parse ... error
  Complete output from command /tmp/virtualenv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-kz7sa7u4/parse/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpdxlborhzpip-wheel- --python-tag cp35:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib
  copying parse.py -> build/lib
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  copying build/lib/parse.py -> build/bdist.linux-x86_64/wheel
  running install_egg_info
  running egg_info
  writing parse.egg-info/PKG-INFO
  writing top-level names to parse.egg-info/top_level.txt
  writing dependency_links to parse.egg-info/dependency_links.txt
  reading manifest file 'parse.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'parse.egg-info/SOURCES.txt'
  Copying parse.egg-info to build/bdist.linux-x86_64/wheel/parse-1.8.2-py3.5.egg-info
  running install_scripts
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-build-kz7sa7u4/parse/setup.py", line 35, in <module>
      'License :: OSI Approved :: BSD License',
    File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/virtualenv/lib/python3.5/site-packages/wheel/bdist_wheel.py", line 215, in run
      self.run_command('install')
    File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/install.py", line 61, in run
      return orig.install.run(self)
    File "/usr/lib/python3.5/distutils/command/install.py", line 595, in run
      self.run_command(cmd_name)
    File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/install_scripts.py", line 17, in run
      import setuptools.command.easy_install as ei
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 49, in <module>
      from setuptools.py27compat import rmtree_safe
    File "/tmp/virtualenv/lib/python3.5/site-packages/setuptools/py27compat.py", line 7, in <module>
      import six
  ImportError: No module named 'six'

Some hex values mistakenly parsed as zeroes

For some reason, parsing certain hex values with leading zeroes produces buggy and unreliable results.
Here's a quick demonstration using the latest version of parse (1.12.0):

>>> import parse
>>> parse.parse('${:x}','$0b67')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B67')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B6')
<Result (0,) {}>
>>> parse.parse('${:x}','$0B')
<Result (11,) {}>
>>> parse.parse('${:x}','$B67')
<Result (2919,) {}>

It appears that parse() recognizes, but fails to parse certain combinations of hex digits, returning zero instead.
Interestingly, whether parsing fails depends on the digit next to zero.
Based on my testing, it always happens with numbers starting with "0Bxxx" (excluding "0B"; I know there was a separate [closed] bug report on that one, but it appears that the underlying issue is still there).

Can I parse a JSON string with any dotted names?

I want to parse a JSON string, but I got an error and cannot parse it.

from parse import *

pattern = '{"name": {data.name}, "age": {data.age}}'
print(parse(pattern, '{"name": "test", "age": 25}'))

result

ValueError: format spec 'name":' not recognised

Is it impossible to parse a JSON character string with any dotted names?

Request: Add a copy of the license in a separate file

Hey, we are vendoring this library over in https://github.com/pypa/pipenv and we are automating our vendoring process. As part of the broader distribution process we are trying to handle our current licensing issues by including explicit license files for each of our vendored dependencies. Would you be receptive to adding an additional LICENSE file with the text of the license of your software (MIT license I believe?)

If so I don't mind tossing a PR in this direction

Unicode support?

Ctrl+F "unicode" on README.rst here and on https://pypi.python.org/pypi/parse doesn't find anything. parse appears to hang indefinitely on unicode strings (as with e.g. from __future__ import unicode_literals). Is that the case? Is there any expectation it'll work in the future?

Parsing a continuous string

I was trying to parse a continuous string of letters and numbers in the easiest way:

parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA5.2M')
<Result ('G', 3.8, 'XA', 5.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA4.2M')
<Result ('G', 3.8, 'XA', 4.2, 'M') {}>

so far so good, but

parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA04.2M')
<Result ('G', 3.8, 'XA0', 4.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA44.2M')
<Result ('G', 3.8, 'XA4', 4.2, 'M') {}>
parse('{:>w}{:g}{:w}{:g}{:w}', ' G3.80XA40M')
<Result ('G', 3.8, 'XA4', 0.0, 'M') {}>

Changing the input string a little bit, I was getting bad results.

Am I missing something?

Igor

hex letters are considered "digits", really?

Just got bitten by this, I think it's a bug...

>>> parse('wat {:d} wat', 'wat 12345 wat')
<Result (12345,) {}>
>>> parse('wat {:d} wat', 'wat 12f45 wat')
<Result (1245,) {}>
>>> parse('wat {:d} wat', 'wat 12g45 wat')
>>> parse('wat {:d} wat', 'wat ff3ff wat')
<Result (3,) {}>
>>> parse('wat {:d} wat', 'wat fffff wat')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-91053c27d470> in <module>()
----> 1 parse('wat {:d} wat', 'wat fffff wat')

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in parse(format, string, extra_types, evaluate_result)
   1113     In the case there is no match parse() will return None.
   1114     '''
-> 1115     return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
   1116 
   1117 

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in parse(self, string, evaluate_result)
    695 
    696         if evaluate_result:
--> 697             return self.evaluate_result(m)
    698         else:
    699             return Match(self, m)

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in evaluate_result(self, m)
    762         for n in self._fixed_fields:
    763             if n in self._type_conversions:
--> 764                 fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
    765         fixed_fields = tuple(fixed_fields[n] for n in self._fixed_fields)
    766 

/home/wglenn/.virtualenvs/scratch/lib/python3.6/site-packages/parse.py in f(string, match, base)
    412         chars = CHARS[:base]
    413         string = re.sub('[^%s]' % chars, '', string.lower())
--> 414         return sign * int(string, base)
    415     return f
    416 

ValueError: invalid literal for int() with base 10: ''

parse of '0B' as an hexadecimal fails

While trying to parse a list of HID usages, I have a field that is represented by '0B' that I'd like to convert to the value 11.

When parsing the values, I am using parse.parse('{value:x}\t{name}', line), so I am specifying that I want the int value to be hex. However, I am hitting https://github.com/r1chardj0n3s/parse/blob/master/parse.py#L441 (in int_convert), and parse decides that my hex value is a base 2 one, and returns 0.

One solution could be to enforce the size to be at least 3 in int_convert if the value starts with a '0' and a known prefix. But I think if the user provides the base for the conversion, the int_convert function should not try to be smart and should simply use the provided base.

Add beginning / end of string indicator

Would be great to be able to indicate a match at beginning or end of string. E.g. if a pattern matches some records at the beginning and other records in the middle but you only want to target those at the beginning of the string, I don’t see an easy way to do that currently.

Cannot use name with ti specifier

This doesn't work:

>>> parse.parse('on {date:ti}', 'on 2012-09-17')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\parse.py", line 849, in parse
  File "build\bdist.win-amd64\egg\parse.py", line 526, in parse
  File "build\bdist.win-amd64\egg\parse.py", line 572, in _generate_result
KeyError: 'date'

This does work:

>>> parse.parse('on {:ti}', 'on 2012-09-17')
<Result (datetime.datetime(2012, 9, 17, 0, 0),) {}>

Doesn't matter what name you use or what else is in the pattern, it always throws a KeyError.

possible to specify "hungry" format specs?

I think this is more of a question, but it may be an issue as well.

If I format the following format spec string with values:

fmat='{}{}'
values=['a','b']

I of course get this result:

>>> fmat.format(*values)
'ab'

And parse handles this as expected.

>>> list(parse(fmat,'ab'))
['a', 'b']

However, I could get the same result by supplying these arguments (the final arg just being an empty string):

>>> values=['ab','']
>>> fmat.format(*values)
'ab'

The default parse behavior becomes a bit more clear when I do this:

>>> list(parse('{}{}', 'abcdef'))
['a', 'bcdef']

So it seems that the format fields "eat" as little as possible. This definitely makes sense as a default behavior.

If I naively supply the optional third argument to parse, in an attempt to signal that a field should be as hungry as possible and "eat" any string (str) that it finds (if it can), I get the same result:

>>> list(parse('{:s}{:s}', 'abcdef', dict(s=str)))
['a', 'bcdef'] # rather than ['abcdef', '']

Any ideas on how to get the last argument to be the empty string using the existing API?

I do understand that an option like this would be tough to implement. For example: how should this be handled?

>>> list(parse('{:s}{:s}{:s}{:d}{:d}', 'abc123', dict(s=str), hungry=True))

Should it return None, e.g.?:

['abc123','','', ERROR, ERROR] # errors because no integers to eat after first string eats everything

Or should it pick out the integers first and leave the leftovers for the other fields (i.e., integers are hungrier than strings)?

['abc','','', 12, 3]

(NOTE in the last example the integers would also be "hungry", but not hungry enough to cause an error; i.e., integers are hungrier than strings, but not more hungry than each other.)

Brackets Break Date Parsing

import parse
def a(a):
    return a
a.pattern = '((3))'
parse.parse('{:a}q{:ti}', '3q2017-12-31', dict(a=a))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-9cc16e0f1e39> in <module>()
      3     return a
      4 a.pattern = '((3))'
----> 5 parse.parse('{:a}q{:ti}', '3q2017-12-31', dict(a=a))

~/miniconda3/lib/python3.6/site-packages/parse.py in parse(format, string, extra_types, evaluate_result)
   1115     In the case there is no match parse() will return None.
   1116     '''
-> 1117     return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
   1118 
   1119 

~/miniconda3/lib/python3.6/site-packages/parse.py in parse(self, string, evaluate_result)
    697 
    698         if evaluate_result:
--> 699             return self.evaluate_result(m)
    700         else:
    701             return Match(self, m)

~/miniconda3/lib/python3.6/site-packages/parse.py in evaluate_result(self, m)
    764         for n in self._fixed_fields:
    765             if n in self._type_conversions:
--> 766                 fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
    767         fixed_fields = tuple(fixed_fields[n] for n in self._fixed_fields)
    768 

~/miniconda3/lib/python3.6/site-packages/parse.py in date_convert(string, match, ymd, mdy, dmy, d_m_y, hms, am, tz, mm, dd)
    482         d=groups[dd]
    483     elif ymd is not None:
--> 484         y, m, d = re.split('[-/\s]', groups[ymd])
    485     elif mdy is not None:
    486         m, d, y = re.split('[-/\s]', groups[mdy])

ValueError: not enough values to unpack (expected 3, got 1)

Result.__contains__ not implemented

Result.__getitem__ is implemented to perform lookup using index or key, but Result doesn't implement __contains__ and doesn't inherit from abc.Mapping

As a result, the following fails:

    if 'foo' in result:
         blah['foo'] = result['foo']

instead the following needs to be used:

    try:
         blah['foo'] = result['foo']
    except KeyError:
        pass

Lookup pattern but do not generate results

Is there a way to search for a string but not generate the results?

This would be especially useful when I have custom type converters but want to evaluate those later. I just want to make sure my string is valid.

Difference in `parse` and `search`

I know that parse.search should be used to match a pattern at any position in the string whereas parse.parse has to match the string exactly.

The following issue came up some days ago in the radish project: radish-bdd/radish#106
Especially this comment might be interesting: radish-bdd/radish#106 (comment)

However, this gives an interesting outcome:

>>> patt = parse.compile('I have a {}')
>>> patt.search('I have a apple')
<Result ('a',) {}>
>>> patt.parse('I have a apple')
<Result ('apple',) {}>

and

>>> patt = parse.compile('I {} a {}')
>>> patt.parse('I have a apple')
<Result ('have', 'apple') {}>
>>> patt.search('I have a apple')
<Result ('have', 'a') {}>

As you can see, search and parse are giving different results. In this example it would indeed be possible to just use parse - but in a lot of the cases we use this library for, it is not.

Is this intended behavior?

Adding @rscrimojr

pip installation

Hello,
Would it be possible to have the package available via pip?
I used pip install git+https://github.com/r1chardj0n3s/parse, which works well, but something like pip install parse would be nice!
Thanks

Bug in {:ta} and {:tg} parsing for AM/PM

When using the :ta parsing format, the hour between Noon and 13:00 (aka 1:00PM) generates a ValueError in datetime because hour must be in the range 0..23.

Example:

parse.__version__
'1.6.2'

parse.parse('Meet at {:tg}', 'Meet at 1/2/2011 1:00 PM')
<Result (datetime.datetime(2011, 2, 1, 13, 0),) {}>

That example is from the documentation. Now, changing this to 12:45 PM doesn't work.

parse.parse('Meet at {:tg}', 'Meet at 1/2/2011 12:45 PM')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 983, in parse
return Parser(format, extra_types=extra_types).parse(string)
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 640, in parse
return self._generate_result(m)
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 678, in _generate_result
fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
File "//anaconda/lib/python2.7/site-packages/parse-1.6.2-py2.7.egg/parse.py", line 518, in date_convert
d = datetime(y, m, d, H, M, S, u, tzinfo=tz)
ValueError: hour must be in 0..23

Add parsing of other things (add requests here)

It'd be neat if it could parse:

  • URLs, producing the same result as urlparse.urlparse()
  • email addresses, producing a (realname, email address) pair like email.utils.parseaddr()
  • IPv4 and IPv6 addresses (producing .. what?)

...?

PM handling

the AM/PM are optional, and if PM is found then 12 hours will be added to the datetime object's hours amount - even if the hour is greater than 12 (for consistency.)

I realize you've already chosen your poison here, but the "add 12 always" rule either needs an explicit exception, or it should just do away with the "add 12 to PM values." Why?

noon is 12:00PM. 15 minutes after noon is 12:15PM. 12 should NOT be added to these values, or they'll register as later than e.g. 9:24PM.

In fact, 12AM (and 12:15AM) should result in a SUBTRACTION of 12 hours, as it occurs prior to 1AM on the given day.

Both issues can be worked around by assuming in the presence of AM/PM indicators, a subtraction of 12 is done. So, the later adding of 12 hours for having "PM" would restore the timeline.

Silly ancient peoples not inventing the concept of "0". Silly us for continuing to stick with a counter-intuitive notation.

Allow a field in the parse format to be optional

The suggested syntax from jenisys is to suffix the type with "?" which I believe is reasonable. Thus:

{honorific:s?} {given:s} {sur:s}

would match both of:

"Mr Richard Jones"
"Jens Engels"

The "honorific" element in the result object would have the value None.

Is lazy matching possible?

Consider the patterns:

a = "{hello} world"
b = "hello world"
parse(a, 'well hello there world')  # matches
parse(b, 'well hello there world')  # fails

Is there a way to get a to fail without specifying custom formats?

Alternatively, is there a way to override the default format type/matching behavior?

Parse chokes on parse("{n} {n}", "x x")

The parser is apparently in an infinite loop.

I think it should at least say that this format is not parsable.
At best, it should do:

parse("{n} {n}", "x x") -> {"n", "x"}
parse("{n} {n}", "x y") -> None

Type converter signature should be changed (or extended)

The type converter function signature should be changed to:

def type_converter(text, match=None, match_start=0):
    # -- NEW: match_start : int = 0, 
    # refers to the first group in the match object for a field where the converter is used.
    pass

This change would allow providing a generic Parser without type knowledge.
In addition, user-defined types should also provide this signature. Currently, only the first parameter is supported there, which prevents using more complex type converter cases.

Due to backward compatibility reasons, the old user-defined signature should be supported, too, at least for some time.

Supported versions problem: python2.6, python2.5 (minor problem)

The "setup.py" file currently still states that python2.5 and python2.6 are supported.
This may be true, but it currently cannot be proven because the test suite contains at least some tests that run only on Python 2.7 and newer.

EXAMPLE:

$ pytest
platform ... -- Python 2.6.9, pytest-3.2.5, ...
...
self = <test_parse.TestParseType testMethod=test_decimal_value>
    def test_decimal_value(self):
        value = Decimal('5.5')
>       str_ = 'test {}'.format(value)
E       ValueError: zero length field name in format

NOTE: string-format without index or named args works only for Python 2.7.x or newer, AFAIK.

POSSIBLE SOLUTIONS:

  • Drop support for older python versions
  • Ensure that tests pass on all supported python version (and fix the test)

Getting UnicodeEncodeError during installation

We are installing parse as a dependency of behave,
and this is the traceback that we get during the installation.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-sme0se0x/parse/setup.py", line 10, in <module>
    f.write(__doc__)
UnicodeEncodeError: 'ascii' codec can't encode character '\xeb' in position 13017: ordinal not in range(128)

Logging is done with root logger.

Currently parse uses logging.debug(); this sends the message to the root logger making it hard to filter. It would be great if parse could use logging.getLogger(__name__).debug() or log.debug() (log is assigned near the start but never used).

Thanks for that great module!

Fixed point should return Decimal

fixed point numbers should return a Decimal

Format f is currently used for fixed-point numbers, and returns a float. Changing that would break many things, so I suggest this new mode be given the upper-case letter F.

Problem with findall in parse module

Reference link : https://pypi.python.org/pypi/parse

data = """ 0 2 PRLI Def 666E00 B A 0 0
1 2 PRLI Def 017000 B A 0 0"""

for x in parse.findall("{Id:^d} {Index:^d} {State:^} {Emulation:^} {ID:^} {NN:^} {PN:^} {ABTS:^d} {SRR:^d}\n", data):
    print("====", x.named)

The above code hangs (gets stuck) after printing the first line.
But if we reduce the number of columns then it works fine.
Also, if there is a difference in the number of columns between the data and the pattern string, findall will hang.

Support text alignment

format() supports text alignment, but the parser does not. It'd be nice if it did :-)

Example:

from parse import parse
fmt = "{:>6}{:>7}"
print(("three", "four") == parse(fmt, fmt.format("three", "four"))) #this should be True

Way to fall back on default parsing behavior for an overridden type spec

Edit

After looking a bit closer at the Custom Type Conversions section of the pypi page, I think I can probably get things working the way I need using these. The page contains this statement:

Your custom type conversions may override the builtin types if you supply one with the same identifier.

Is there an exception I can raise, or a method I can call, in the custom type conversion so as to make it fall back on the default behavior for the type conversion (either the one supplied, or a modified version of the one that was supplied)?

Original Question

Is there a way to force the API to fall back on to the default behavior for formatting types when they have been overridden by extra_types? Below is an example of what I mean.

Desired float parsing behavior:

>>> parse('{: >f}{: >f}', '   1.025      1.033')
<Result (1.025, 1.033) {}> # expected result

Actual behavior (when overridden):

>>> parse('{: >f}{: >f}', '   1.025      1.033', extra_types=dict(f=float))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 1117, in parse
    return Parser(format, extra_types=extra_types).parse(string, evaluate_result=evaluate_result)
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 699, in parse
    return self.evaluate_result(m)
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 766, in evaluate_result
    fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
  File "C:\Users\ricky\Anaconda3\lib\site-packages\parse.py", line 882, in f
    return type_converter(string)
ValueError: could not convert string to float: '.025      1.033'

Use Case

I have a need for a couple of custom numerical types that can handle empty strings and spaces in a manner similar to zeroes. I've implemented them something like this:

class Blank():
    def __new__(cls, value):
        try:
            return super().__new__(cls, value)
        except ValueError:
            if value == '' or value == ' ':
                return super().__new__(cls, 0.0)
            else:
                raise
    def __str__(self):
        return '' if self==0 else super().__str__()
    def __format__(self, spec):
        if (spec.endswith('d') or spec.endswith('f') or spec.endswith('n')) and self==0:
            spec = spec[:-1]+'s'
            return format('',spec)
        else:
            return super().__format__(spec)

class BlankInt(Blank, int):
    '''An int that prints blank when zero.'''
    pass
        
class BlankFloat(Blank, float):
    '''A float that prints blank when zero.'''
    pass

This seems to partially work the way I had in mind:

>>> from parse import parse 
>>> parse('{: >5f}',  '     ', extra_types=dict(f=BlankFloat))
<Result (0.0,) {}>
>>> parse('{: >5f}'*5,  '     ', extra_types=dict(f=BlankFloat))
<Result (0.0, 0.0, 0.0, 0.0, 0.0) {}>

However, this doesn't work (since float doesn't work, either and BlankFloat is a subclass):

>>> parse('{: >f}{: >f}', '   1.025      1.033', extra_types=dict(f=BlankFloat))
ValueError: could not convert string to float: '.025      1.033'

Problem with integer number parsing

Hi there.
I'm having a problem with the 1.6.6 version in both python 2.7.9 and python 3.4.3.

I was able to reproduce it with the following example:

>>> import parse
>>> parse.parse("blablabla {x:d}", "blablabla 12")
<Result () {'x': 12}>
>>> parse.parse("blablabla {x:d}", "blablabla jdhhd")
>>> parse.parse("blablabla {x:d}", "blablabla cdc")                                                                                                                                                                                
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 1044, in parse
    return Parser(format, extra_types=extra_types).parse(string)
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 681, in parse
    return self._generate_result(m)
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 731, in _generate_result
    m)
  File "/home/calsaverini/.envs/default/lib/python3.4/site-packages/parse.py", line 398, in f
    return sign * int(string, base)
ValueError: invalid literal for int() with base 10: ''

Is this a bug, or am I missing some kind of edge case where {:d} might match 'cdc' that I'm not aware of?

Thanks for your help.

Optional "group_count" attribute for user-defined type converters

The current "parse" module has a small deficiency (or bug).
When a user-defined type converter uses regular-expression grouping in its pattern attribute, some of the extracted result parameters are wrong because the group index offset is not taken into account.
NOTE: This problem occurs only for fixed (unnamed) fields, named fields are OK.

# FILE: parse.py
# NECESSARY CHANGES:
# -- Parser._handle_field()
        ...
        if type in self._extra_types:
            type_converter = self._extra_types[type]
            s = getattr(type_converter, 'pattern', r'.+?')
            # -- EXTENSION: group_count attribute
            group_count = getattr(type_converter, 'group_count', 0)
            self._group_index += group_count
            # -- EXTENSION-END

Percent format with decimal limit isn't supported

The following works: {field:%} but adding a decimal limit like {field:.2%} does not and throws an exception:

  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 983, in parse
    return Parser(format, extra_types=extra_types).parse(string)
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 586, in __init__
    self._expression = self._generate_expression()
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 717, in _generate_expression
    e.append(self._handle_field(part))
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 770, in _handle_field
    format = extract_format(format, self._extra_types)
  File "/usr/local/lib/python3.2/dist-packages/parse.py", line 562, in extract_format
    raise ValueError('type %r not recognised' % type)

Unsure whether this is a bug or intended.

Thanks for the extremely quick fix on the logging issue by the way!

Document behaviour when the template is ambiguous

Example:

>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> parse(pattern, data).named
{'dir1': 'root', 'dir2': 'parent/subdir'}

But {'dir1': 'root/parent', 'dir2': 'subdir'} also fits the pattern. Is this behaviour reliable, or should it be considered an implementation detail? I couldn't find it in the docs anywhere.

Is there any way to coerce the result one way or the other?

Cannot use { Character?

So.... you didn't list an escape character... so if my string contains "{", it just doesn't work?

GREAT work!
