
ciso8601's Introduction

ciso8601

ciso8601 converts ISO 8601 or RFC 3339 date time strings into Python datetime objects.

Since it's written as a C module, it is much faster than other Python libraries. Tested with CPython 2.7, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12.

(Interested in working on projects like this? Close is looking for great engineers to join our team)

Quick start

% pip install ciso8601
In [1]: import ciso8601

In [2]: ciso8601.parse_datetime('2014-12-05T12:30:45.123456-05:30')
Out[2]: datetime.datetime(2014, 12, 5, 12, 30, 45, 123456, tzinfo=pytz.FixedOffset(-330))

In [3]: ciso8601.parse_datetime('20141205T123045')
Out[3]: datetime.datetime(2014, 12, 5, 12, 30, 45)

Migration to v2

Version 2.0.0 of ciso8601 changed the core implementation. This was not entirely backwards compatible, and care should be taken when migrating. See CHANGELOG for the Migration Guide.

When should I not use ciso8601?

ciso8601 is not necessarily the best solution for every use case (especially since Python 3.11). See Should I use ciso8601?

Error handling

Starting in v2.0.0, ciso8601 offers strong guarantees when it comes to parsing strings.

parse_datetime(dt: String): datetime is a function that takes a string and either:

  • Returns a properly parsed Python datetime, if and only if the entire string conforms to the supported subset of ISO 8601
  • Raises a ValueError with a description of the reason why the string doesn't conform to the supported subset of ISO 8601

If time zone information is provided, an aware datetime object will be returned. Otherwise, a naive datetime is returned.
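The contract above (a datetime or a ValueError, aware or naive depending on the input) can be wrapped in a small helper. This is a minimal sketch rather than part of ciso8601: `parse_or_none` and `is_aware` are hypothetical names, and the standard library's `datetime.fromisoformat` stands in for the parser so that the snippet runs without the C extension. Assuming ciso8601 is installed, `ciso8601.parse_datetime` could be passed as `parse` instead.

```python
from datetime import datetime


def is_aware(dt):
    """Return True if dt carries usable time zone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


def parse_or_none(text, parse=datetime.fromisoformat):
    """Call parse(text); return None instead of propagating ValueError."""
    try:
        return parse(text)
    except ValueError:
        return None


naive = parse_or_none("2014-01-09T21:48:00")        # no offset in the input
aware = parse_or_none("2014-01-09T21:48:00+05:30")  # offset in the input
invalid = parse_or_none("Completely invalid!")      # parser raised ValueError
print(is_aware(naive), is_aware(aware), invalid)
```

Swallowing the exception is a design choice that only suits callers who treat bad input as missing data; letting the ValueError propagate keeps the description of what was wrong with the string.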

Benchmark

Parsing a timestamp with no time zone information (e.g., 2014-01-09T21:48:00):

| Module | Python 3.12 | Python 3.11 | Python 3.10 | Python 3.9 | Relative slowdown (versus ciso8601, latest Python) | Python 3.8 | Python 3.7 | Python 2.7 |
|---|---|---|---|---|---|---|---|---|
| ciso8601 | 98 nsec | 90 nsec | 122 nsec | 122 nsec | N/A | 118 nsec | 124 nsec | 134 nsec |
| backports.datetime_fromisoformat | N/A | N/A | 112 nsec | 108 nsec | 0.9x | 106 nsec | 118 nsec | N/A |
| datetime (builtin) | 129 nsec | 132 nsec | N/A | N/A | 1.3x | N/A | N/A | N/A |
| pendulum | N/A | 180 nsec | 187 nsec | 186 nsec | 2.0x | 196 nsec | 200 nsec | 8.52 usec |
| udatetime | 695 nsec | 662 nsec | 674 nsec | 692 nsec | 7.1x | 724 nsec | 713 nsec | 586 nsec |
| str2date | 6.86 usec | 5.78 usec | 6.59 usec | 6.4 usec | 70.0x | 6.66 usec | 6.96 usec | |
| iso8601utils | N/A | N/A | N/A | 8.59 usec | 70.5x | 8.6 usec | 9.59 usec | 11.2 usec |
| iso8601 | 10 usec | 8.24 usec | 8.96 usec | 9.21 usec | 102.2x | 9.14 usec | 9.63 usec | 25.7 usec |
| isodate | 11.1 usec | 8.76 usec | 10.2 usec | 9.76 usec | 113.6x | 9.92 usec | 11 usec | 44.1 usec |
| PySO8601 | 17.2 usec | 13.6 usec | 16 usec | 15.8 usec | 175.3x | 16.1 usec | 17.1 usec | 17.7 usec |
| aniso8601 | 22.2 usec | 17.8 usec | 23.2 usec | 23.1 usec | 227.0x | 24.3 usec | 27.2 usec | 30.7 usec |
| zulu | 23.3 usec | 19 usec | 22 usec | 21.3 usec | 237.9x | 21.6 usec | 22.7 usec | N/A |
| maya | N/A | 36.1 usec | 42.5 usec | 42.7 usec | 401.6x | 41.3 usec | 44.2 usec | N/A |
| python-dateutil | 57.6 usec | 51.4 usec | 63.3 usec | 62.6 usec | 587.7x | 63.7 usec | 67.3 usec | 119 usec |
| arrow | 62 usec | 54 usec | 65.5 usec | 65.7 usec | 633.0x | 66.6 usec | 70.2 usec | 78.8 usec |
| metomi-isodatetime | 1.29 msec | 1.33 msec | 1.76 msec | 1.77 msec | 13201.1x | 1.79 msec | 1.91 msec | N/A |
| moment | 1.81 msec | 1.65 msec | 1.75 msec | 1.79 msec | 18474.8x | 1.78 msec | 1.84 msec | N/A |

ciso8601 takes 98 nsec, which is 1.3x faster than datetime (builtin), the next fastest Python 3.12 parser in this comparison.

Parsing a timestamp with time zone information (e.g., 2014-01-09T21:48:00-05:30):

| Module | Python 3.12 | Python 3.11 | Python 3.10 | Python 3.9 | Relative slowdown (versus ciso8601, latest Python) | Python 3.8 | Python 3.7 | Python 2.7 |
|---|---|---|---|---|---|---|---|---|
| ciso8601 | 95 nsec | 96.8 nsec | 128 nsec | 123 nsec | N/A | 125 nsec | 125 nsec | 140 nsec |
| backports.datetime_fromisoformat | N/A | N/A | 147 nsec | 149 nsec | 1.1x | 138 nsec | 149 nsec | N/A |
| datetime (builtin) | 198 nsec | 207 nsec | N/A | N/A | 2.1x | N/A | N/A | N/A |
| pendulum | N/A | 225 nsec | 214 nsec | 211 nsec | 2.3x | 219 nsec | 224 nsec | 13.5 usec |
| udatetime | 799 nsec | 803 nsec | 805 nsec | 830 nsec | 8.4x | 827 nsec | 805 nsec | 768 nsec |
| str2date | 7.73 usec | 6.75 usec | 7.78 usec | 7.8 usec | 81.4x | 7.74 usec | 8.13 usec | |
| iso8601 | 13.7 usec | 11.3 usec | 12.7 usec | 12.5 usec | 143.8x | 12.4 usec | 12.6 usec | 31.1 usec |
| isodate | 13.7 usec | 11.3 usec | 12.9 usec | 12.7 usec | 144.0x | 12.7 usec | 13.9 usec | 46.7 usec |
| iso8601utils | N/A | N/A | N/A | 21.4 usec | 174.9x | 22.1 usec | 23.4 usec | 28.3 usec |
| PySO8601 | 25.1 usec | 20.4 usec | 23.2 usec | 23.8 usec | 263.8x | 23.5 usec | 24.8 usec | 25.3 usec |
| zulu | 26.3 usec | 21.4 usec | 25.7 usec | 24 usec | 277.2x | 24.5 usec | 25.3 usec | N/A |
| aniso8601 | 27.7 usec | 23.7 usec | 30.3 usec | 30 usec | 291.3x | 31.6 usec | 33.8 usec | 39.2 usec |
| maya | N/A | 36 usec | 41.3 usec | 41.8 usec | 372.0x | 42.4 usec | 42.7 usec | N/A |
| python-dateutil | 70.7 usec | 65.1 usec | 77.9 usec | 80.2 usec | 744.0x | 79.4 usec | 83.6 usec | 100 usec |
| arrow | 73 usec | 62.8 usec | 74.5 usec | 73.9 usec | 768.6x | 75.1 usec | 80 usec | 148 usec |
| metomi-isodatetime | 1.22 msec | 1.25 msec | 1.72 msec | 1.72 msec | 12876.3x | 1.76 msec | 1.83 msec | N/A |
| moment | ❌ | ❌ | ❌ | ❌ | 2305822.8x | ❌ | ❌ | N/A |

ciso8601 takes 95 nsec, which is 2.1x faster than datetime (builtin), the next fastest Python 3.12 parser in this comparison.
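The "1.3x" and "2.1x" figures quoted for Python 3.12 follow directly from the table entries. A short check of the arithmetic, where `relative_slowdown` is a hypothetical helper name and not something the benchmark scripts are known to define:

```python
def relative_slowdown(other_nsec, baseline_nsec):
    """How many times slower `other` is than the baseline, rounded to 0.1."""
    return round(other_nsec / baseline_nsec, 1)


# datetime (builtin) versus ciso8601 on Python 3.12, values taken from the tables
no_tz = relative_slowdown(129, 98)     # timestamp without time zone information
with_tz = relative_slowdown(198, 95)   # timestamp with time zone information
print(no_tz, with_tz)  # 1.3 2.1
```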

Tested on Linux 5.15.49-linuxkit using the following modules:

aniso8601==9.0.1
arrow==1.3.0 (on Python 3.8, 3.9, 3.10, 3.11, 3.12), arrow==1.2.3 (on Python 3.7), arrow==0.17.0 (on Python 2.7)
backports.datetime_fromisoformat==2.0.1
ciso8601==2.3.0
iso8601==2.1.0 (on Python 3.8, 3.9, 3.10, 3.11, 3.12), iso8601==0.1.16 (on Python 2.7)
iso8601utils==0.1.2
isodate==0.6.1
maya==0.6.1
metomi-isodatetime==1!3.1.0
moment==0.12.1
pendulum==2.1.2
PySO8601==0.2.0
python-dateutil==2.8.2
str2date==0.905
udatetime==0.0.17
zulu==2.0.0

For full benchmarking details (or to run the benchmark yourself), see benchmarking/README.rst
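For a rough local comparison without the full benchmark harness, the standard library's `timeit` module is enough. The sketch below is not the project's benchmarking procedure: `time_parser` is a hypothetical helper, the lambda wrapper adds a little call overhead to every measurement, and `datetime.fromisoformat` is timed only so the snippet runs without third-party packages. Assuming ciso8601 is installed, `ciso8601.parse_datetime` can be passed in its place.

```python
import timeit
from datetime import datetime


def time_parser(parse, text, number=100_000):
    """Return the best observed per-call time in nanoseconds (5 repeats)."""
    timer = timeit.Timer(lambda: parse(text))
    best_total = min(timer.repeat(repeat=5, number=number))
    return best_total / number * 1e9


nsec = time_parser(datetime.fromisoformat, "2014-01-09T21:48:00", number=10_000)
print(f"{nsec:.0f} nsec per call")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers will differ from the tables above depending on hardware and Python build.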

Supported subset of ISO 8601

ciso8601 only supports a subset of ISO 8601, but supports a superset of what is supported by Python itself (datetime.fromisoformat), and supports the entirety of the RFC 3339 specification.

Date formats

The following date formats are supported:

| Format | Example | Supported |
|---|---|---|
| YYYY-MM-DD (extended) | 2018-04-29 | ✅ |
| YYYY-MM (extended) | 2018-04 | ✅ |
| YYYYMMDD (basic) | 20180429 | ✅ |
| YYYY-Www-D (week date) | 2009-W01-1 | ✅ |
| YYYY-Www (week date) | 2009-W01 | ✅ |
| YYYYWwwD (week date) | 2009W011 | ✅ |
| YYYYWww (week date) | 2009W01 | ✅ |
| YYYY-DDD (ordinal date) | 1981-095 | ✅ |
| YYYYDDD (ordinal date) | 1981095 | ✅ |

Uncommon ISO 8601 date formats are not supported:

| Format | Example | Supported |
|---|---|---|
| --MM-DD (omitted year) | --04-29 | ❌ |
| --MMDD (omitted year) | --0429 | ❌ |
| ±YYYYY-MM (>4 digit year) | +10000-04 | ❌ |
| +YYYY-MM (leading +) | +2018-04 | ❌ |
| -YYYY-MM (negative -) | -2018-04 | ❌ |

Time formats

Times are optional and are separated from the date by the letter T.

Consistent with RFC 3339, ciso8601 also allows either a space character, or a lower-case t, to be used instead of a T.

The following time formats are supported:

| Format | Example | Supported |
|---|---|---|
| hh | 11 | ✅ |
| hhmm | 1130 | ✅ |
| hh:mm | 11:30 | ✅ |
| hhmmss | 113059 | ✅ |
| hh:mm:ss | 11:30:59 | ✅ |
| hhmmss.ssssss | 113059.123456 | ✅ |
| hh:mm:ss.ssssss | 11:30:59.123456 | ✅ |
| hhmmss,ssssss | 113059,123456 | ✅ |
| hh:mm:ss,ssssss | 11:30:59,123456 | ✅ |
| Midnight (special case) | 24:00:00 | ✅ |
| hh.hhh (fractional hours) | 11.5 | ❌ |
| hh:mm.mmm (fractional minutes) | 11:30.5 | ❌ |

Note: Python datetime objects only have microsecond precision (6 digits). Any additional precision will be truncated.
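Truncation here means dropping digits, not rounding. A minimal sketch of that rule, using a hypothetical helper (`fraction_to_microseconds`) that is not taken from ciso8601's C source:

```python
def fraction_to_microseconds(digits):
    """Convert the digits after the decimal mark to whole microseconds.

    Digits beyond the sixth are dropped (truncated, not rounded);
    shorter fractions are padded on the right with zeros.
    """
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("fraction must consist of ASCII digits")
    return int(digits[:6].ljust(6, "0"))


print(fraction_to_microseconds("123456789"))  # 123456
print(fraction_to_microseconds("5"))          # 500000
print(fraction_to_microseconds("9999999"))    # 999999 (not rounded up)
```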

Time zone information

Time zone information may be provided in one of the following formats:

| Format | Example | Supported |
|---|---|---|
| Z | Z | ✅ |
| z | z | ✅ |
| ±hh | +11 | ✅ |
| ±hhmm | +1130 | ✅ |
| ±hh:mm | +11:30 | ✅ |

While the ISO 8601 specification allows the use of MINUS SIGN (U+2212) in the time zone separator, ciso8601 only supports the use of the HYPHEN-MINUS (U+002D) character.

Consistent with RFC 3339, ciso8601 also allows a lower-case z to be used instead of a Z.
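The designators listed above map naturally onto `datetime.timezone` objects. The following is a standard-library sketch of that mapping, under the assumption that only the five listed forms are accepted; `parse_utc_offset` is a hypothetical name and this is not how the C module is implemented.

```python
import re
from datetime import timedelta, timezone

# Sign must be HYPHEN-MINUS or PLUS; MINUS SIGN (U+2212) will not match.
_OFFSET = re.compile(r"([+-])([0-9]{2})(?::?([0-9]{2}))?")


def parse_utc_offset(text):
    """Turn 'Z', 'z', '±hh', '±hhmm' or '±hh:mm' into a datetime.timezone."""
    if text in ("Z", "z"):
        return timezone.utc
    match = _OFFSET.fullmatch(text)
    if match is None:
        raise ValueError("unsupported time zone designator: %r" % text)
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return timezone(-delta if sign == "-" else delta)


print(parse_utc_offset("+1130"))
print(parse_utc_offset("-05:30").utcoffset(None))
```

Because the pattern is applied with `fullmatch`, a designator written with U+2212 raises ValueError instead of being silently ignored.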

Strict RFC 3339 parsing

ciso8601 parses ISO 8601 datetimes, which can be thought of as a superset of RFC 3339 (roughly). In cases where you might want strict RFC 3339 parsing, ciso8601 offers a parse_rfc3339 function, which behaves in a similar manner to parse_datetime:

parse_rfc3339(dt: String): datetime is a function that takes a string and either:

  • Returns a properly parsed Python datetime, if and only if the entire string conforms to RFC 3339.
  • Raises a ValueError with a description of the reason why the string doesn't conform to RFC 3339.
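To make the difference from general ISO 8601 concrete, the following sketch checks only the shape that RFC 3339 requires: a full date, a separator, a full time with seconds, and a mandatory offset. It is a regular-expression approximation with hypothetical names (`RFC3339_SHAPE`, `has_rfc3339_shape`), not the logic of `parse_rfc3339`; it does not validate field ranges, and accepting a space as the separator is an assumption of this sketch.

```python
import re

RFC3339_SHAPE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"       # full-date, extended format only
    r"[Tt ]"                            # date/time separator
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}"       # hours, minutes and seconds are all required
    r"(?:\.[0-9]+)?"                    # optional fractional seconds
    r"(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"   # the offset is mandatory
)


def has_rfc3339_shape(text):
    """Return True if the whole string has the RFC 3339 date-time layout."""
    return RFC3339_SHAPE.fullmatch(text) is not None


for candidate in ("2018-07-12T14:08:12Z", "2018-07-12", "2018-07-12T14:08:12"):
    print(candidate, has_rfc3339_shape(candidate))
```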

Ignoring time zone information while parsing

It takes more time to parse timestamps with time zone information, especially if they're not in UTC. However, there are times when you don't care about time zone information, and wish to produce naive datetimes instead. For example, if you are certain that your program will only parse timestamps from a single time zone, you might want to strip the time zone information and only output naive datetimes.

In these limited cases, there is a second function provided. parse_datetime_as_naive will ignore any time zone information it finds and, as a result, is faster for timestamps containing time zone information.

In [1]: import ciso8601

In [2]: ciso8601.parse_datetime_as_naive('2014-12-05T12:30:45.123456-05:30')
Out[2]: datetime.datetime(2014, 12, 5, 12, 30, 45, 123456)

NOTE: parse_datetime_as_naive is only useful in the case where your timestamps have time zone information, but you want to ignore it. This is somewhat unusual. If your timestamps don't have time zone information (i.e. are naive), simply use parse_datetime. It is just as fast.
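The result shown above (same wall-clock fields, no tzinfo) can be reproduced with the standard library by parsing normally and then discarding the offset. This sketch illustrates the semantics only, using `datetime.fromisoformat` as a stand-in parser; it says nothing about speed.

```python
from datetime import datetime

aware = datetime.fromisoformat("2014-12-05T12:30:45.123456-05:30")
# replace(tzinfo=None) keeps the wall-clock fields as written; it does not
# convert to UTC first.
naive = aware.replace(tzinfo=None)
print(naive)  # 2014-12-05 12:30:45.123456
```

Note that the offset is ignored rather than applied, so two timestamps that denote the same instant in different zones produce different naive datetimes.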

ciso8601's People

Contributors

ajanuary, anemitz, banksy-git, brettcs, caseywebdev, chadwhitacre, explodingcabbage, felixxm, ianhoffman, jnrowe, kinow, kylebarron, movermeyer, nickg123, norpol, peterbe, philfreo, suhailpatel, thomasst, viniciuschiele, wojcikstefan

ciso8601's Issues

Invalid datetime can produce partial parse. Should return None instead

I came across this silent failure when parsing a third party datetime with an unusual separator:

>>> import ciso8601

>>> ciso8601.parse_datetime('2017-05-29 17:36:00Z')
datetime.datetime(2017, 5, 29, 17, 36, tzinfo=<UTC>)

#Now for a completely invalid string
>>> print(ciso8601.parse_datetime('Completely invalid!'))
None

#Now for an out of spec string (invalid separator)
>>> ciso8601.parse_datetime('2017-05-29_17:36:00Z')
datetime.datetime(2017, 5, 29, 0, 0)

It seems that the parser sees a character that it doesn't recognize and stops parsing?

I would have expected that if the entire string cannot be parsed, it would return None as parse_datetime does for other parse failures, not silently return a datetime without time or tzinfo.

This led to a sneaky bug in the code.
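The behaviour reported here is what any prefix-matching parser does when it stops at the first unexpected character. The sketch below reproduces the effect with a regular expression (the `DATETIME` pattern is hypothetical and far simpler than the real grammar) and shows how requiring the whole string to match removes the silent partial parse.

```python
import re

# Date, optionally followed by a "T" or space separator and a time.
DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"(?:[T ]([0-9]{2}):([0-9]{2}):([0-9]{2})Z?)?"
)

text = "2017-05-29_17:36:00Z"
prefix = DATETIME.match(text)      # matches the date, silently ignores the rest
whole = DATETIME.fullmatch(text)   # None, because "_" is not a valid separator
print(prefix.group(0), whole)  # 2017-05-29 None
```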

Incorrect parse for day = 0

In [1]: import ciso8601

In [2]: ciso8601.parse_datetime('20140200')
Out[2]: datetime.datetime(2014, 2, 1)

Expected the parse to fail, since there is no day 0 (in any month, let alone February).
Instead it succeeds because of this line.
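A range check that rejects day 0 could look like the sketch below. It is written in Python with the standard `calendar` module for illustration; `check_date_fields` is a hypothetical helper, not the C code the issue refers to.

```python
import calendar


def check_date_fields(year, month, day):
    """Raise ValueError unless month and day are valid for the given year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if not 1 <= day <= last_day:
        raise ValueError("day must be in 1..%d" % last_day)
    return year, month, day


try:
    check_date_fields(2014, 2, 0)
except ValueError as exc:
    print(exc)  # day must be in 1..28
```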

unicode char in README.rst breaks installation with Python 3 and C locale

When installing ciso8601 as a package dependency, using Python 3 and C locale:

Searching for ciso8601
Reading https://pypi.python.org/simple/ciso8601/
Downloading https://pypi.python.org/packages/06/2e/2d7b09bb667bd7d862838c1ab7d0dd06be1de27ff60a7c1b0fb0db53fc93/ciso8601-1.0.5.tar.gz#md5=831bbf799722d34a5e60d91e1fd63cd6
Best match: ciso8601 1.0.5
Processing ciso8601-1.0.5.tar.gz
Writing /tmp/easy_install-b_gqrezf/ciso8601-1.0.5/setup.cfg
Running ciso8601-1.0.5/setup.py -q bdist_egg --dist-dir /tmp/easy_install-b_gqrezf/ciso8601-1.0.5/egg-dist-tmp-e66bnce5
Traceback (most recent call last):
  File "/var/lib/arvados/test/VENV3DIR/lib/python3.5/site-packages/setuptools/sandbox.py", line 158, in save_modules
    yield saved
  File "/var/lib/arvados/test/VENV3DIR/lib/python3.5/site-packages/setuptools/sandbox.py", line 199, in setup_context
    yield
  File "/var/lib/arvados/test/VENV3DIR/lib/python3.5/site-packages/setuptools/sandbox.py", line 254, in run_setup
    _execfile(setup_script, ns)
  File "/var/lib/arvados/test/VENV3DIR/lib/python3.5/site-packages/setuptools/sandbox.py", line 49, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-b_gqrezf/ciso8601-1.0.5/setup.py", line 4, in <module>
    # SPDX-License-Identifier: Apache-2.0
  File "/var/lib/arvados/test/VENV3DIR/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1561: ordinal not in range(128)

This is caused by the non-ASCII character "µ" (in "µs") in README.rst.
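The failure can be reproduced without installing anything: "µ" encodes to the bytes 0xC2 0xB5 in UTF-8, and decoding them as ASCII (which is what Python 3 falls back to under the C locale) raises the error in the traceback. One common remedy, which may or may not be what the project adopted, is to pass an explicit encoding when reading the file. A self-contained sketch using a temporary file in place of the real README.rst:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.rst")
    with open(path, "w", encoding="utf-8") as f:
        f.write("parses in 1 µs\n")

    # Simulate the C locale, where the default text encoding is ASCII.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # An explicit encoding does not depend on the locale.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(ascii_ok, "µ".encode("utf-8"))
```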

With some googling I found a discussion about this same problem relating to a different package:

arrow-py/arrow#208

Timezone information missing under python 3.5

I've noticed that when I parse a timestamp with time zone information in a Python 3.5 project, I get timezone-unaware datetime objects:

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
[...]
In [1]: import ciso8601

In [2]: t = ciso8601.parse_datetime('2014-01-09T21:48:00.921000+05:30')

In [3]: t.tzinfo
Out[3]: pytz.FixedOffset(330)

Python 3.5.1 (default, Sep 19 2016, 10:16:17) 
[...]
In [1]: import ciso8601

In [2]: t = ciso8601.parse_datetime('2014-01-09T21:48:00.921000+05:30')

In [3]: t.tzinfo

In [4]: type(t.tzinfo)
Out[4]: NoneType

Using ciso8601==1.0.2 in both cases. Tested on Fedora 24 with distro provided python binaries.

Add support for midnight (ie. "24:00:00")

ISO 8601 allows for a special case of midnight: 24:00:00.

For example 2007-04-05T24:00 is the same instant as 2007-04-06T00:00

But ciso8601 currently only supports the latter:

In [1]: import ciso8601

In [2]: ciso8601.parse_datetime('2007-04-06T00:00')
Out[2]: datetime.datetime(2007, 4, 6, 0, 0)

In [3]: ciso8601.parse_datetime('2007-04-05T24:00')
Out[3]: None

Tested on ciso8601 version 1.0.7
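The equivalence this issue relies on (24:00 on one day is 00:00 on the next) is easy to express with the standard library. A minimal sketch with a hypothetical helper name, not the implementation that was eventually adopted:

```python
from datetime import datetime, timedelta


def end_of_day_midnight(year, month, day):
    """Interpret 'YYYY-MM-DDT24:00' as 00:00 on the following day."""
    return datetime(year, month, day) + timedelta(days=1)


print(end_of_day_midnight(2007, 4, 5))  # 2007-04-06 00:00:00
```

Adding a timedelta, instead of incrementing the day field, handles month and year rollover without extra code.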

Parsing out-of-range dates and times

I'm not sure whether this is a performance feature or a bug, but the library currently doesn't throw exceptions when numeric date string parameters are outside the respective field's range. For example, setting a month to a value greater than 12 yields:

import ciso8601
print(ciso8601.parse_datetime('2014-13-01'))

2014-13-01 00:00:00.

Is this the intended behaviour? Constructing a Python datetime object with these values throws a ValueError.

Change default branch name to `main`

The Git ecosystem has moved away from the default branch name of master, towards the name main.

e.g. GitHub, GitLab, and Bitbucket now use main as the default branch name for new repositories.

Change the default branch of ciso8601 to use main.

How is this done?

  1. Rename the default branch in the GitHub UI (Requires Administrator rights on the repo)
  2. Use git grep master and update mentions of the outdated branch name in documentation and URLs.

Developers with local clones will have to perform a one-time update of the local clones by running:

git branch -m master main
git fetch origin
git branch -u origin/main main
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main

These commands are also shown to developers who visit the repo in the GitHub interface, so it doesn't require additional advertising work from our end.

Handle the changes introduced by ISO 8601-1:2019

ISO 8601-1:2019 changed the definition of what is a valid ISO 8601 timestamp.

For example, it explicitly disallowed the special case of 24:00 times.

Figure out how we want to handle the evolution of the specification.

Please provide wheels on PyPI

Hi there, I Googled python+dateutil+parse+slow and found ciso8601. Thanks for developing this amazing library. Please consider providing wheels on PyPI to make it easier to install.

Fix compilation warnings

Variable declaration should be wrapped in #ifdef for Python versions where it's not needed (I believe >= 3.2).

module.c: In function ‘PyInit_ciso8601’:
module.c:414:15: warning: unused variable ‘pytz’ [-Wunused-variable]
     PyObject *pytz;

Also, should we trigger errors on compile warnings to catch these?

Regex version for the lazy...

https://gist.github.com/codehack/6350492822e52b7fa7fe

#!/usr/bin/python

import re

# i think i missed a couple of things in 8601 but this should cover 98% of cases.
iso8601 = re.compile(r'^(?P<full>((?P<year>\d{4})([/-]?(?P<mon>(0[1-9])|(1[012]))([/-]?(?P<mday>(0[1-9])|([12]\d)|(3[01])))?)?(?:T(?P<hour>([01][0-9])|(?:2[0123]))(\:?(?P<min>[0-5][0-9])(\:?(?P<sec>[0-5][0-9]([\,\.]\d{1,10})?))?)?(?:Z|([\-+](?:([01][0-9])|(?:2[0123]))(\:?(?:[0-5][0-9]))?))?)?))$')

# to perform the actual date match
m = iso8601.match('2014-12-05T12:30:45.123456-05:30')

# prints a dict with all matched groups
print(m.groupdict())

# output: {'full': '2014-12-05T12:30:45.123456-05:30', 'mday': '05', 'hour': '12', 'min': '30', 'sec': '45.123456', 'mon': '12', 'year': '2014'}

Make warnings only fail the build during development

In #52 and #54, we made the compilation step fail if there were any warnings.

This is great for the detection of potential issues during development and PR review.

However, it is not great for production releases. ciso8601 is a source distribution, so each user compiles it with each installation. Users with exotic compiler configurations would have the installation of ciso8601 fail, because their compiler might be pickier about what it warns about.

Make ciso8601 only treat warnings as errors during development and PR review, and not during release installations.

FixedOffset.fromutc fails in 2.2.0 under python 3.6

import ciso8601

dt_ciso = ciso8601.parse_datetime('2021-01-01T12:12:01Z')
dt_ciso.tzinfo.fromutc(dt_ciso)

fails with ValueError: fromutc: non-None dst() result required. The previous version of ciso8601 worked (but it used Python's builtin tzinfo objects).

However, the Python builtin and pytz implementations work fine.

from datetime import datetime, timedelta, timezone
import pytz

dt_py = datetime(2021,1,1,12,12,1, tzinfo=timezone(timedelta(0)))
dt_py.tzinfo.fromutc(dt_py)

dt_pytz = datetime(2021,1,1,12,12,1, tzinfo=pytz.utc)
dt_pytz.tzinfo.fromutc(dt_pytz)

This is relevant when calling some_datetime.astimezone(instance_of_FixedOffset).
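The error message comes from the default `tzinfo.fromutc` implementation, which refuses to proceed when `dst()` returns None. The pure-Python sketch below reproduces that mechanism with two hypothetical classes; it is not ciso8601's FixedOffset (which is written in C) and is not presented as the project's fix.

```python
from datetime import datetime, timedelta, tzinfo


class NoDstOffset(tzinfo):
    """Fixed offset whose dst() returns None; the default fromutc() rejects it."""

    def __init__(self, minutes):
        self._offset = timedelta(minutes=minutes)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return None


class ZeroDstOffset(NoDstOffset):
    """Same offset, but dst() reports 'no daylight saving' as timedelta(0)."""

    def dst(self, dt):
        return timedelta(0)


def try_fromutc(tz):
    """Return tz.fromutc(dt) or, on failure, the ValueError message."""
    dt = datetime(2021, 1, 1, 12, 12, 1, tzinfo=tz)
    try:
        return tz.fromutc(dt)
    except ValueError as exc:
        return str(exc)


print(try_fromutc(NoDstOffset(0)))
print(try_fromutc(ZeroDstOffset(330)))
```

An alternative design is to override `fromutc` directly and return `dt + offset`, which avoids relying on the default implementation at all.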

Add __version__ attribute

As part of my work on #55, I needed to get the version of ciso8601 programmatically.

This led me to discover PEP 396, which defines the __version__ attribute for this purpose.
While the PEP is technically "deferred", it seems that the majority of packages implement it.

Implement the __version__ attribute in ciso8601.

Draft a CONTRIBUTING.md

When I joined the project, there were a bunch of process things I had to learn.

I would like to create a CONTRIBUTING.md file, that would document the goals of the project, the tooling, and the process for contributing (see Atom for a complicated example).

Discussion: Change ciso8601 "marketing"/"brand" to incorporate changes to the ecosystem?

When ciso8601 was first created, there were no reasonable alternatives to it. By virtue of being implemented in C, it was simply much faster than Python implementations.

However, times have moved on, and there are viable alternatives, so the choice is more nuanced.

For example, as of Python 3.7, there exists datetime.fromisoformat that provides the inverse of datetime.isoformat. While it is not a complete replacement for ciso8601 (ex. it doesn't support formats like YYYYMMDD), for users that were only ever using ciso8601 to provide that inverse, they might be better served by using fromisoformat.

Further, the Pendulum project has implemented a fast C parser as well. While pendulum is slower, it supports a wider subset of the ISO 8601 spec.

So there is now a spectrum of quality parsers that users might want to use:

datetime.fromisoformat           ciso8601                        pendulum

<----- Performance                                    Completeness ----->

It might be time for ciso8601 to change its marketing to be more nuanced. Originally, when there were no real alternatives, it made sense to say:

"We're the best, no questions. Use ciso8601."

But perhaps now we should be doing something similar to a flow-chart:

  • If you only need to support Python 3.7+, and just need an inverse to isoformat, use datetime.fromisoformat.
  • If you only need to parse datetimes with the most common subset of ISO 8601, use ciso8601.
  • If you need to parse datetimes with the entire ISO 8601 spec, use pendulum.

(Aside: I backported fromisoformat to pre-Python 3.7 versions: )

From a practical standpoint, this would involve mentioning alternatives in ciso8601's README (or perhaps a new document). It would also mean rewriting any other docs to tone down any rhetoric.

(Aside: In general, I'd like to see more collaboration between the various ISO 8601 parse projects, especially Pendulum and udatetime)

Unable to find method when using AWS Lambda

Hello, I used ciso8601 in a Lambda function, but I cannot use it because it returns this error:

{
    "errorMessage": "module 'ciso8601' has no attribute 'parse_datetime'",
    "errorType": "AttributeError",
    "stackTrace": [
        "  File \"/var/task/convertToUnix/function.py\", line 14, in lambda_handler\n    'date': convert(event['date'])\n",
        "  File \"/var/task/convertToUnix/function.py\", line 5, in convert\n    s = ciso8601.parse_datetime(text)\n"
    ]
}

This is the code:

import json
import ciso8601
import time
def convert(text):
    s = ciso8601.parse_datetime(text)
    return time.mktime(s.timetuple())

def lambda_handler(event, context):
    # .get() avoids a KeyError when the event has no 'date' key
    if event.get('date') is not None:
        response = {
            'date': convert(event['date'])
        }
    else:
        response = {
            'message': "error no passed date"
        }
    return response

Bad unicode characters not handled correctly

When parsing unicode strings with non-ascii characters the ValueError is not built correctly. In CPython this manifests itself as an empty ValueError, whereas in PyPy3 it actually causes a segfault.

For example

from ciso8601 import parse_datetime

try:
    parse_datetime("2019🐵01🐵01")
except ValueError as e:
    assert e.args and "Invalid character" in e.args[0]

will fail with either a segfault or an assertion error depending on your interpreter.

In the real world this was seen with non-ascii dashes - e.g. for "2019—01—01"
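The behaviour the test in this issue expects (a ValueError whose message names the offending character) can be sketched in pure Python as follows. `check_ascii` is a hypothetical helper and the message wording is an assumption of this sketch; the real error is built in C.

```python
def check_ascii(text):
    """Raise a descriptive ValueError at the first non-ASCII character."""
    for index, char in enumerate(text):
        if ord(char) > 127:
            raise ValueError(
                "Invalid character U+%04X at index %d" % (ord(char), index)
            )
    return text


try:
    check_ascii("2019\u201401\u201401")  # U+2014 used in place of hyphens
except ValueError as e:
    message = e.args[0]

print(message)  # Invalid character U+2014 at index 4
```

Reporting the code point instead of echoing the character itself keeps the message ASCII-only, which sidesteps encoding problems while the error is being constructed.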

20140203T10::27 shouldn't parse seconds

A missing minutes fragment between two time separators should cause it to stop parsing and consider the rest of the string junk, but instead it treats minutes as zero.

Test:

ciso8601.parse_datetime('20140203T10::27')

Expected:

datetime.datetime(2014, 2, 3, 10, 0)

Actual:

datetime.datetime(2014, 2, 3, 10, 0, 27)

Unable to install on Windows

Hi all,

I'm trying to do a simple pip install on Windows 10, but I get the below error when trying to do so:

C:\Users\marti>pip install ciso8601
Collecting ciso8601
  Using cached ciso8601-2.1.3.tar.gz (15 kB)
Building wheels for collected packages: ciso8601
  Building wheel for ciso8601 (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'c:\users\marti\appdata\local\programs\python\python37\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\marti\\AppData\\Local\\Temp\\pip-install-uret8oh9\\ciso8601\\setup.py'"'"'; __file__='"'"'C:\\Users\\marti\\AppData\\Local\\Temp\\pip-install-uret8oh9\\ciso8601\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\marti\AppData\Local\Temp\pip-wheel-rd64cadi'
       cwd: C:\Users\marti\AppData\Local\Temp\pip-install-uret8oh9\ciso8601\
  Complete output (17 lines):
  running bdist_wheel
  running build
  running build_py
  package init file 'ciso8601\__init__.py' not found (or not a regular file)
  creating build
  creating build\lib.win-amd64-3.7
  creating build\lib.win-amd64-3.7\ciso8601
  copying ciso8601\__init__.pyi -> build\lib.win-amd64-3.7\ciso8601
  copying ciso8601\py.typed -> build\lib.win-amd64-3.7\ciso8601
  running build_ext
  building 'ciso8601' extension
  creating build\temp.win-amd64-3.7
  creating build\temp.win-amd64-3.7\Release
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DCISO8601_VERSION=2.1.3 -Ic:\users\marti\appdata\local\programs\python\python37\include -Ic:\users\marti\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /Tcmodule.c /Fobuild\temp.win-amd64-3.7\Release\module.obj
  module.c
  c:\users\marti\appdata\local\programs\python\python37\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.14.26428\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
  ----------------------------------------
  ERROR: Failed building wheel for ciso8601
  Running setup.py clean for ciso8601
Failed to build ciso8601
Installing collected packages: ciso8601
    Running setup.py install for ciso8601 ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\marti\appdata\local\programs\python\python37\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\marti\\AppData\\Local\\Temp\\pip-install-uret8oh9\\ciso8601\\setup.py'"'"'; __file__='"'"'C:\\Users\\marti\\AppData\\Local\\Temp\\pip-install-uret8oh9\\ciso8601\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\marti\AppData\Local\Temp\pip-record-5qx55zzz\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\marti\appdata\local\programs\python\python37\Include\ciso8601'
         cwd: C:\Users\marti\AppData\Local\Temp\pip-install-uret8oh9\ciso8601\
    Complete output (17 lines):
    running install
    running build
    running build_py
    package init file 'ciso8601\__init__.py' not found (or not a regular file)
    creating build
    creating build\lib.win-amd64-3.7
    creating build\lib.win-amd64-3.7\ciso8601
    copying ciso8601\__init__.pyi -> build\lib.win-amd64-3.7\ciso8601
    copying ciso8601\py.typed -> build\lib.win-amd64-3.7\ciso8601
    running build_ext
    building 'ciso8601' extension
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DCISO8601_VERSION=2.1.3 -Ic:\users\marti\appdata\local\programs\python\python37\include -Ic:\users\marti\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /Tcmodule.c /Fobuild\temp.win-amd64-3.7\Release\module.obj
    module.c
    c:\users\marti\appdata\local\programs\python\python37\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.14.26428\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\marti\appdata\local\programs\python\python37\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\marti\\AppData\\Local\\Temp\\pip-install-uret8oh9\\ciso8601\\setup.py'"'"'; __file__='"'"'C:\\Users\\marti\\AppData\\Local\\Temp\\pip-install-uret8oh9\\ciso8601\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\marti\AppData\Local\Temp\pip-record-5qx55zzz\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\marti\appdata\local\programs\python\python37\Include\ciso8601' Check the logs for full command output.

Errors while building wheel on Windows

module.c(508): error C2440: 'function': cannot convert from 'double' to 'const char *'
module.c(508): warning C4024: 'PyModule_AddStringConstant': different types for formal and actual parameter 3
module.c(508): error C2143: syntax error: missing ')' before 'constant'
module.c(508): error C2059: syntax error: ')'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Professional\\VC\\Tools\\MSVC\\14.15.26726\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
Failed building wheel for ciso8601

Python 3.7.0 x64
VS Professional 2017 (15.8.5)
Windows 7 SP1 x64

New release

Thanks for fixing #85 in f21596e. Would you be able to do a release that includes the fix?

Currently we're downloading this dependency from git, rather than pypi, which is not ideal.

20140203 04:05:.123456 shouldn't parse fractional seconds

A missing seconds fragment between two time separators should cause it to stop parsing and consider the rest of the string junk, but instead it treats seconds as zero.

Test:

ciso8601.parse_datetime('20140203 04:05:.123456')

Expected:

datetime.datetime(2014, 2, 3, 4, 5)

Actual:

datetime.datetime(2014, 2, 3, 4, 5, 0, 123456)

Bad date causes crash

        dt = '2017-0012-27T13:35:19+0200'
        boom = ciso8601.parse_datetime(dt)

causes a segmentation fault.

parse_datetime wrongly accepts datetimes that mix extended and basic format

This isn't legal under ISO 8601 and should throw an error:

>>> ciso8601.parse_datetime('20010203T04:05:06Z')
datetime.datetime(2001, 2, 3, 4, 5, 6, tzinfo=datetime.timezone.utc)

So should this:

>>> ciso8601.parse_datetime('20010203T04:05')
datetime.datetime(2001, 2, 3, 4, 5)

The first is forbidden by the combination of section 4.3.2, which specifies the format for "complete" date and time representations, and section 4.4.4, about date interval representations. Section 4.3.2 says:

The hyphen [-] and the colon [:] shall be used, in accordance with 4.4.4, as separators within the date and time of day expressions, respectively, when required.

And section 4.4.4.1, says:

When the application identifies the need for a complete representation of a time interval, identified by its start and its end, it shall use an expression in accordance with 4.4.2 combining any two complete date and time of day representations as defined in 4.3.2, provided that the resulting expression is either consistently in basic format or consistently in extended format.

It's kind of weird that the section about how to represent complete datetimes basically says "Hey, to know all the rules about how to do this, go forward in the spec and read the unrelated section about datetime intervals, because we actually put some of our rules about what a legal datetime representation is in there for some reason", but as far as I can tell that's exactly what section 4.3.2 says. It would've been more natural to put all the rules about what a legal complete datetime representation is in the section about complete datetime representations and refer back to that from the section about intervals, but that's not what the authors did. Meh.

The second example I gave above, with a "reduced accuracy" datetime, is more straightforward; it's banned by section 4.3.3, which says

For reduced accuracy, decimal or expanded representations of date and time of day, any of the representations in [bla] followed immediately by the time designator [T] may be combined with any of the representations in [bla] provided that

a) [bla];
b) [bla];
c) [bla];
d) the expression shall either be completely in basic format, in which case the minimum number of separators necessary for the required expression is used, or completely in extended format, in which case additional separators shall be used in accordance with 4.1 and 4.2.
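
The rule quoted above boils down to one check: if the date part uses hyphens, the time part must use colons, and vice versa. Here is a minimal Python sketch of that check. It is not ciso8601's implementation; `uses_consistent_format` is a hypothetical helper, and it assumes a complete calendar date, a `T` separator, and an optional trailing `Z` or numeric offset. It does not check whether the offset itself is in basic or extended format.

```python
import re

# Hypothetical helper for illustration; not part of ciso8601.
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}(:?\d{2})?)$")


def uses_consistent_format(value):
    """Return True if the date and time parts are both basic or both extended."""
    date_part, sep, time_part = value.partition("T")
    if not sep:
        return True  # Date only: there is no time part to mix formats with.
    # Drop a trailing time zone designator so it does not affect the check.
    time_part = _TZ_SUFFIX.sub("", time_part)
    # Ignore any fractional part after "." or ",".
    time_fields = re.split(r"[.,]", time_part)[0]
    if len(time_fields) <= 2:
        return True  # Hour only: no separator is needed in either format.
    date_extended = "-" in date_part
    time_extended = ":" in time_fields
    return date_extended == time_extended
```

With this sketch, both examples from this issue ('20010203T04:05:06Z' and '20010203T04:05') are reported as inconsistent, while '2001-02-03T04:05:06Z' and '20010203T040506Z' pass.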

Release a new version

The README mentions RFC 3339 parsing; however, the latest release does not include this feature yet. It would be nice to cut a new release that includes it.

Add an RFC 3339 only mode

When parsing data that I know should be in RFC 3339, I'd like a strict mode that rejects non-RFC3339-compliant strings like 2018-07-12 or 2018-07-12T14Z or 2018-07-12T14:08:12.

(Mostly, I just want to be sure that I don't end up with a naive datetime when I expected a timezone-aware one, but validating that my data is actually RFC 3339 would be even better.)

It'd be nice to have a parse_rfc3339() function that is stricter than parse_datetime() and raises if it gets input that's valid ISO 8601 but not valid RFC 3339.
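
To make the request concrete, here is a rough Python sketch of the kind of strictness I mean. It is only a shape check against the RFC 3339 `date-time` grammar (full date, full time with seconds, mandatory offset); it does not validate field ranges, and `looks_like_rfc3339` is a hypothetical name, not an existing ciso8601 function.

```python
import re

# Hypothetical shape check for illustration; it does not validate field ranges.
_RFC3339_SHAPE = re.compile(
    r"\d{4}-\d{2}-\d{2}"       # full-date
    r"[Tt]"                    # date/time separator (RFC 3339 allows lower case)
    r"\d{2}:\d{2}:\d{2}"       # hours, minutes and seconds are all required
    r"(\.\d+)?"                # optional fractional seconds
    r"([Zz]|[+-]\d{2}:\d{2})"  # the time offset is mandatory
)


def looks_like_rfc3339(value):
    """Return True if `value` has the overall shape of an RFC 3339 date-time."""
    return _RFC3339_SHAPE.fullmatch(value) is not None
```

All three strings above (2018-07-12, 2018-07-12T14Z and 2018-07-12T14:08:12) fail this check, while 2018-07-12T14:08:12Z passes.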

__version__ attribute just reads "CISO8601_VERSION"

Thanks for the great work on this library. I just upgraded some date-parsing code to use it and saw massive performance gains. I also noticed a minor bug -- the ciso8601 module isn't correctly setting the __version__ attribute:

# encoding: utf-8
# module ciso8601
# from /opt/pyre/lib/python3.5/site-packages/ciso8601.cpython-35m-x86_64-linux-gnu.so
# by generator 1.145
# no doc
# no imports

# Variables with simple values

__version__ = 'CISO8601_VERSION'

# functions

def parse_datetime(*args, **kwargs): # real signature unknown
    """ Parse a ISO8601 date time string. """
    pass

def parse_datetime_as_naive(*args, **kwargs): # real signature unknown
    """ Parse a ISO8601 date time string, ignoring the time zone component. """
    pass

def parse_rfc3339(*args, **kwargs): # real signature unknown
    """ Parse an RFC 3339 date time string. """
    pass

# no classes
# variables with complex values

__loader__ = None # (!) real value is ''

__spec__ = None # (!) real value is ''

I expect __version__ should equal 2.1.1, not CISO8601_VERSION. I peeked a bit into the C code and noticed this is set from there, using a macro defined in setup.py... I don't have much experience building Python modules from C, but I'm happy to look into this if y'all are jammed.

Thanks again for the great work.

Failed building wheel for ciso8601 Ubuntu 18.04

Hi Guys,

I'm having a hard time trying to install ciso8601==2.1.3.
I already installed the wheel package (saw it on another issue).
Can you suggest a way to get it right?

Ubuntu 18.04
python 3.6.9
pip 9.0.1

Full Output

(venv) v4l3nt1n@v4l3nt1n-nb:~/_work/_dev/_dw/_dev/process-machine-project$ pip install ciso8601
Collecting ciso8601
Using cached https://files.pythonhosted.org/packages/2c/da/626910cf8aca7ed2d5b34355eee8aeaaeb6ddd4e16f98d00a9e2ddad3a08/ciso8601-2.1.3.tar.gz
Building wheels for collected packages: ciso8601
Running setup.py bdist_wheel for ciso8601 ... error
Complete output from command /home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-j0oq0ixx/ciso8601/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpbfg9puakpip-wheel- --python-tag cp36:
running bdist_wheel
running build
running build_py
package init file 'ciso8601/__init__.py' not found (or not a regular file)
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/ciso8601
copying ciso8601/__init__.pyi -> build/lib.linux-x86_64-3.6/ciso8601
copying ciso8601/py.typed -> build/lib.linux-x86_64-3.6/ciso8601
running build_ext
building 'ciso8601' extension
creating build/temp.linux-x86_64-3.6
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DCISO8601_VERSION=2.1.3 -I/home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/include -I/usr/include/python3.6m -c module.c -o build/temp.linux-x86_64-3.6/module.o
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1


Failed building wheel for ciso8601
Running setup.py clean for ciso8601
Failed to build ciso8601
Installing collected packages: ciso8601
Running setup.py install for ciso8601 ... error
Complete output from command /home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-j0oq0ixx/ciso8601/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-7xs_tlks-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/include/site/python3.6/ciso8601:
running install
running build
running build_py
package init file 'ciso8601/__init__.py' not found (or not a regular file)
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/ciso8601
copying ciso8601/__init__.pyi -> build/lib.linux-x86_64-3.6/ciso8601
copying ciso8601/py.typed -> build/lib.linux-x86_64-3.6/ciso8601
running build_ext
building 'ciso8601' extension
creating build/temp.linux-x86_64-3.6
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DCISO8601_VERSION=2.1.3 -I/home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/include -I/usr/include/python3.6m -c module.c -o build/temp.linux-x86_64-3.6/module.o
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------

Command "/home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-j0oq0ixx/ciso8601/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-7xs_tlks-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/v4l3nt1n/_work/_dev/_dw/_dev/process-machine-project/venv/include/site/python3.6/ciso8601" failed with error code 1 in /tmp/pip-build-j0oq0ixx/ciso8601/

Thank you very much for this lib and work!!

Failed building wheel for ciso8601

Can't install ciso8601

Traceback

  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-u2k0mvob/ciso8601/setup.py'"'"'; __file__='"'"'/tmp/pip-install-u2k0mvob/ciso8601/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-ey7cx4f6
       cwd: /tmp/pip-install-u2k0mvob/ciso8601/
  Complete output (15 lines):
  running bdist_wheel
  running build
  running build_py
  package init file 'ciso8601/__init__.py' not found (or not a regular file)
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/ciso8601
  copying ciso8601/__init__.pyi -> build/lib.linux-x86_64-3.8/ciso8601
  copying ciso8601/py.typed -> build/lib.linux-x86_64-3.8/ciso8601
  running build_ext
  building 'ciso8601' extension
  creating build/temp.linux-x86_64-3.8
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DCISO8601_VERSION=2.1.3 -I/usr/local/include/python3.8 -c module.c -o build/temp.linux-x86_64-3.8/module.o
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for ciso8601

How to reproduce

Try to install it in a clean Docker Python 3.8 image, e.g. python:3.8-slim-buster.

Add datetime.fromisoformat to the benchmarks

Python 3.7 added datetime.fromisoformat to the datetime library, which might be another target to add to the benchmarks without timezones for completeness.

Now a few remarks about the function:

  • It is only meant to parse the result of isoformat, so it isn't super flexible compared to what you provide
  • I looked at the implementation and while it should be faster than using the equivalent strptime, it is still implemented in pure Python, so it's highly unlikely to beat your implementation.
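
For reference, a benchmark entry for it could be timed with something like the sketch below. This is only an illustration using the standard library's timeit, not the project's actual benchmark script; `time_fromisoformat` is a hypothetical helper, and the lambda adds a small call overhead to every measurement. It requires Python 3.7+.

```python
import timeit
from datetime import datetime

TIMESTAMP = "2014-01-09T21:48:00"  # Same naive timestamp as in the README benchmark.


def time_fromisoformat(number=100000):
    """Return the average time per datetime.fromisoformat call, in nanoseconds."""
    total_seconds = timeit.timeit(
        lambda: datetime.fromisoformat(TIMESTAMP), number=number
    )
    return total_seconds / number * 1e9


if __name__ == "__main__":
    print("datetime.fromisoformat: {:.0f} nsec per call".format(time_fromisoformat()))
```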

Backport datetime.timezone to older Python versions?

Starting in Python 3.7, datetime.timezone objects became accessible through the C-API.

ciso8601 v2.0.0 is scheduled to take advantage of this when compiled against Python 3.7+. The performance improvements are dramatic.

ciso8601 should consider creating a C implementation of a tzinfo subclass. This could be a backport of cPython's timezone class, or a new implementation. Having such an implementation would:

  • allow Python < 3.7 to experience the same performance gains
  • allow ciso8601 to sever its dependency on pytz.
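
To show how small such a class is, here is a pure-Python sketch of a fixed-offset tzinfo subclass. The proposal is for a C implementation; this Python version only illustrates the interface that would need to be covered, and `FixedOffset` is a hypothetical name.

```python
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Hypothetical fixed-offset time zone, written in Python for illustration."""

    def __init__(self, offset_minutes=0):
        self._offset = timedelta(minutes=offset_minutes)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)  # A fixed offset has no daylight saving rules.

    def tzname(self, dt):
        total = int(self._offset.total_seconds()) // 60
        sign = "-" if total < 0 else "+"
        hours, minutes = divmod(abs(total), 60)
        return "UTC{}{:02d}:{:02d}".format(sign, hours, minutes)
```

For example, `datetime(2014, 12, 5, 12, 30, 45, tzinfo=FixedOffset(-330))` is an aware datetime with a UTC offset of minus 5 hours 30 minutes, with no pytz involved.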

Parsing bad date strings crashes pypy interpreter

Nice job on the library guys. I'm not sure if this is an issue with pypy or with the library. If you pass in a bad string, this library crashes the interpreter entirely:

$ pypy
Python 2.7.9 (295ee98b69288471b0fcf2e0ede82ce5209eb90b, Jun 02 2015, 18:26:45)
[PyPy 2.6.0 with GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>> import ciso8601
>>>> ciso8601.parse_datetime_unaware('not-a-date')
pypy(36682,0x7fff7b7db300) malloc: *** error for object 0x1016df220: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

FWIW normal python 2.7.10 doesn't have the same problem:

$ python
Python 2.7.10 (default, Jul 22 2015, 21:18:21) 
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ciso8601
>>> ciso8601.parse_datetime_unaware('not-a-date')
>>> 

Thanks

Incorrect parsing "24" hour

Hello!

I stumbled on this strange behavior, which I consider a bug. What do you think?

Python 3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ciso8601
>>> ciso8601.parse_datetime("2020-02-28T23:00:00Z")
datetime.datetime(2020, 2, 28, 23, 0, tzinfo=datetime.timezone.utc)
>>> ciso8601.parse_datetime("2020-02-28T24:00:00Z")
datetime.datetime(2020, 2, 29, 0, 0, tzinfo=datetime.timezone.utc)
>>> ciso8601.parse_datetime("2020-02-28T25:00:00Z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: hour must be in 0..23

It not only accepts hour "24", but also returns a different date. The logic is obvious (next day, hour 00), but the behavior is incorrect IMHO:
"2020-02-28T24:00:00Z" != datetime.datetime(2020, 2, 29, 0, 0, tzinfo=datetime.timezone.utc)

Also, 24 is not in range 00..23
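
For context, ISO 8601:2004 allows 24:00:00 to denote midnight at the end of a calendar day, which would explain the roll-over to the next date. A minimal Python sketch of that interpretation follows. It is an assumption about the intent, not ciso8601's actual C code, and `build_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def build_datetime(year, month, day, hour, minute=0, second=0):
    """Hypothetical helper: accept 24:00:00 as the end-of-day midnight."""
    if hour == 24:
        if minute != 0 or second != 0:
            raise ValueError("hour 24 is only valid as 24:00:00")
        # The end of this calendar day is the start of the next one.
        return datetime(year, month, day) + timedelta(days=1)
    # Any other out-of-range hour is rejected by datetime itself.
    return datetime(year, month, day, hour, minute, second)
```

Under this reading, hour 24 on 2020-02-28 becomes 2020-02-29 00:00, while hour 25 still raises ValueError.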

ciso8601 should specify and enforce a code style

The codebase of ciso8601 is a mix of various C coding/formatting styles.

The classic example in C is where opening { should go. In ciso8601, we have examples of both:

ciso8601 should specify a C formatting style and enforce it.
This could then be put into a .clang-format file, so that developers' IDEs could enforce the style as they code.

While I don't care which format is chosen, I do want ciso8601 to have a consistent style and for my contributions to match that style.

Create a feature comparison chart with other popular ISO 8601 parsers

Potential users should be able to compare the capabilities of ciso8601 with other popular ISO 8601 parsers.

Other parsers might look attractive at first glance, but many have bugs or limitations.

ciso8601 itself has some limitations (I think we already do a good job of documenting that though).

Other libraries have limited support for platforms (ex. udatetime doesn't support Windows).

Make a comparison document or chart that will help people see the differences between the libraries.

Automate the release process

As it stands, there are a number of actions needed in order to release a new version of ciso8601:

  1. Bump the VERSION in setup.py
  2. Add the version to the CHANGELOG.md
  3. Add a git tag with the version number
  4. Create a draft release in GitHub
  5. Build the release artifacts
  6. Upload the release artifacts to the draft release
  7. Upload the release artifacts to PyPI (both test.pypi.org and pypi.org)
  8. Publish the GitHub release

We should be able to automate (nearly?) all of this.

Problems

  • These steps are manual and could be automated
    • Manual steps are error-prone / could be forgotten
  • The actual upload to PyPI can only be done by members of the Close Engineering team
    • This is because they are the only ones with the deployment keys for PyPI.
    • Maintainers (like myself) have to reach out to them in order to cut a release
    • This adds delays to our ability to release

Outline of a solution

We can use GitHub Actions to automate (nearly?) everything. Here's my current thinking of how it would work. I'm testing these configurations out on this test repo.

I imagine it being broken up into three GitHub Actions:

  1. Create the draft GitHub Release
  2. Add the release artifacts to the draft GitHub Release
  3. Publish the release artifacts to PyPI

1. Create the draft GitHub Release

Trigger

A developer pushes a tag that matches our expected version number format: v1.2.3

Results

  1. A new draft release is created in GitHub
  2. A build of the wheels is kicked off in CircleCI (using a POST to the API endpoint)

2. Add the release artifacts to the draft GitHub Release

Trigger

The build of the wheels is complete in CircleCI, and a webhook is sent to GitHub.

This is not trivial.

CircleCI no longer has first-party support for sending webhooks on build completion. Instead, you are supposed to use a "notification orb" to do it. I found an orb that sends a webhook that looks like the first-party webhooks CircleCI used to send.

GitHub expects a specific format for the webhooks that can trigger a repository_dispatch event.

I'll likely fork the orb and publish a new orb that formats the webhook exactly how GitHub expects it to be.

Results

  1. The artifacts from the wheel build are uploaded to the draft release

3. Publish the release artifacts to PyPI

Trigger

The developer publishes the draft release in the GitHub UI.

Results

  1. The artifacts from the release are uploaded to Test PyPI
  2. The artifacts from the release are uploaded to real PyPI

This can be done using the convenient pypa/gh-action-pypi-publish GitHub action, using GitHub repository Secrets. Secrets are encrypted values that GitHub can decrypt within the GitHub Action containers. That way, no one ever sees the decrypted values, and maintainers can be given the ability to publish to PyPI without needing the keys to the kingdom. Further, the secret would be a PyPI token that can be scoped to only work for this specific project.

Resulting Workflow

This is what a developer would have to do after this automation was in place:

  1. Push a git tag with the version number
  2. Wait until the build artifacts are present on the draft GitHub release
  3. Click publish on the GitHub release

Notes and Q&A

GitHub Workflows can't trigger other workflows

GitHub Workflows can't trigger other workflows. But workflows can be triggered by the completion of other workflows (workflow_run).

The important distinction is that there is no way (AFAIK) to pass the parameters from the upstream workflow to the downstream one.

Idempotency and error handling

There are many state changes described by this automation. Since these are not being done within any form of transaction, it will be important that each step is idempotent, so that the process can be simply restarted/retried without worrying about the state of the system after the failure.

Speaking of retries, there need to be mechanisms to manually trigger retries in cases of failure. Neither GitHub nor CircleCI makes this easy. In GitHub, you can use the workflow_dispatch trigger to manually trigger a workflow, though providing the same parameters that the workflow needs does not seem to be trivial. For example, I haven't yet figured out how to do github.ref || github.event.inputs.ref (I haven't put any effort into it yet either).

In CircleCI, you can guard a workflow with a when clause and then the only way to trigger that workflow is via a POST to the API.

Why not switch entirely off of CircleCI for GitHub Actions?

CircleCI allows for arbitrary Docker containers for your builds, while GitHub Actions only provides a few "blessed" runners.

Since we are hoping to use the manylinux docker containers as part of the build, we cannot switch entirely to GitHub Actions.

Cache the timezone objects

Overview

Here is a flame graph of where the time is spent in ciso8601 v2.0.0:

flame_graph

This is 10 million calls to parse_datetime, with a timestamp that includes time zone information, on Python 3.7.

Overall time: 1.14s

| Action | C function | Time spent | Percentage |
|---|---|---|---|
| Converting the input Python string to C string | PyArg_ParseTuple | 0.24s | 21% |
| Creating the timedelta for the timezone object | new_delta_ex | 0.34s | 30% |
| Creating the timezone object for the datetime | new_timezone | 0.15s | 13% |
| Creating the final datetime | new_datetime_ex | 0.25s | 22% |
| Rest of computation | | 0.16s | 14% |

Caching timezones

If we were to cache the timezone objects as we created them, then subsequent parses with the same timezone information would save that time.
This would eliminate the expensive call to new_delta_ex, as well as the less expensive call to new_timezone. Together, these account for 43% of the run time.

I assert that in most use cases, only a handful of offsets are ever used, so the performance would be much better after the first few parses.

There are only ((1440 - 1) * 2) + 1 = 2879 valid offsets in ISO 8601 (that Python supports), so we can also bound the maximum amount of memory used.

Build fails with GCC 10.2 and PyPy 7.3.3

I have versions:

$ pypy3 --version
Python 3.7.9 (7e6e2bb30ac5fbdbd443619cae28c51d5c162a02, Nov 24 2020, 10:03:59)
[PyPy 7.3.3-beta0 with GCC 10.2.0]

$ gcc --version
gcc (GCC) 10.2.0

When I run CFLAGS="$CFLAGS -fcommon" pypy3 -m pip install git+https://github.com/closeio/ciso8601.git, I get:

Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/closeio/ciso8601.git
  Cloning https://github.com/closeio/ciso8601.git to /tmp/pip-req-build-zank7ym7
  Running command git clone -q https://github.com/closeio/ciso8601.git /tmp/pip-req-build-zank7ym7
Building wheels for collected packages: ciso8601
  Building wheel for ciso8601 (setup.py): started
  Building wheel for ciso8601 (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/pypy3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-zank7ym7/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-zank7ym7/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-k0gy_icr
       cwd: /tmp/pip-req-build-zank7ym7/
  Complete output (27 lines):
  running bdist_wheel
  running build
  running build_py
  package init file 'ciso8601/__init__.py' not found (or not a regular file)
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/ciso8601
  copying ciso8601/__init__.pyi -> build/lib.linux-x86_64-3.7/ciso8601
  copying ciso8601/py.typed -> build/lib.linux-x86_64-3.7/ciso8601
  running build_ext
  building 'ciso8601' extension
  creating build/temp.linux-x86_64-3.7
  gcc -pthread -DNDEBUG -O2 -O2 -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite -mtune=broadwell -pipe -fstack-protector-explicit -fno-plt -g -gdwarf -gz -ggdb -Wno-error=deprecated-declarations -fcommon -fPIC -DCISO8601_VERSION=2.1.3 -I/opt/pypy3/include -c module.c -o build/temp.linux-x86_64-3.7/module.o
  module.c: In function ‘_parse’:
  module.c:446:30: warning: implicit declaration of function ‘PyTimeZone_FromOffset’ [-Wimplicit-function-declaration]
    446 |                     tzinfo = PyTimeZone_FromOffset(delta);
        |                              ^~~~~~~~~~~~~~~~~~~~~
  module.c:446:28: warning: assignment to ‘PyObject *’ {aka ‘struct _object *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
    446 |                     tzinfo = PyTimeZone_FromOffset(delta);
        |                            ^
  module.c: In function ‘PyInit_ciso8601’:
  module.c:562:11: error: ‘PyDateTime_TimeZone_UTC’ undeclared (first use in this function); did you mean ‘PyDateTime_Time’?
    562 |     utc = PyDateTime_TimeZone_UTC;
        |           ^~~~~~~~~~~~~~~~~~~~~~~
        |           PyDateTime_Time
  module.c:562:11: note: each undeclared identifier is reported only once for each function it appears in
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for ciso8601
  Running setup.py clean for ciso8601
Failed to build ciso8601
Installing collected packages: ciso8601
    Running setup.py install for ciso8601: started
    Running setup.py install for ciso8601: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/pypy3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-zank7ym7/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-zank7ym7/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-bxkrj8i3/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/jeff/.local/include/python3.7/ciso8601
         cwd: /tmp/pip-req-build-zank7ym7/
    Complete output (27 lines):
    running install
    running build
    running build_py
    package init file 'ciso8601/__init__.py' not found (or not a regular file)
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/ciso8601
    copying ciso8601/__init__.pyi -> build/lib.linux-x86_64-3.7/ciso8601
    copying ciso8601/py.typed -> build/lib.linux-x86_64-3.7/ciso8601
    running build_ext
    building 'ciso8601' extension
    creating build/temp.linux-x86_64-3.7
    gcc -pthread -DNDEBUG -O2 -O2 -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote -mno-ptwrite -mtune=broadwell -pipe -fstack-protector-explicit -fno-plt -g -gdwarf -gz -ggdb -Wno-error=deprecated-declarations -fcommon -fPIC -DCISO8601_VERSION=2.1.3 -I/opt/pypy3/include -c module.c -o build/temp.linux-x86_64-3.7/module.o
    module.c: In function ‘_parse’:
    module.c:446:30: warning: implicit declaration of function ‘PyTimeZone_FromOffset’ [-Wimplicit-function-declaration]
      446 |                     tzinfo = PyTimeZone_FromOffset(delta);
          |                              ^~~~~~~~~~~~~~~~~~~~~
    module.c:446:28: warning: assignment to ‘PyObject *’ {aka ‘struct _object *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
      446 |                     tzinfo = PyTimeZone_FromOffset(delta);
          |                            ^
    module.c: In function ‘PyInit_ciso8601’:
    module.c:562:11: error: ‘PyDateTime_TimeZone_UTC’ undeclared (first use in this function); did you mean ‘PyDateTime_Time’?
      562 |     utc = PyDateTime_TimeZone_UTC;
          |           ^~~~~~~~~~~~~~~~~~~~~~~
          |           PyDateTime_Time
    module.c:562:11: note: each undeclared identifier is reported only once for each function it appears in
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/pypy3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-zank7ym7/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-zank7ym7/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-bxkrj8i3/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/jeff/.local/include/python3.7/ciso8601 Check the logs for full command output.

The primary issue appears to be the missing PyDateTime_TimeZone_UTC, which is new in Python 3.7, even though this version of PyPy claims to implement Python 3.7.

GCC 10 causes these issues sometimes since it switched to the default of -fno-common, but that doesn't seem to matter here since adding -fcommon to the CFLAGS doesn't change anything.

Invalid date can be supplied. Should throw value error instead

>>> datetime.datetime(2016, 11,  31)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: day is out of range for month
>>> ciso8601.parse_datetime_unaware("2016-11-31T12:34:34.521059")
datetime.datetime(2016, 11, 31, 12, 34, 34, 521059)
>>> datetime.datetime(2016, 11, 31, 12, 34, 34, 521059)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: day is out of range for month
>>> 
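
The missing check is small. Here is a Python sketch of the day-of-month validation using the standard library's calendar module. It is an illustration only, not a patch for the C code, and `validate_day` is a hypothetical helper.

```python
import calendar


def validate_day(year, month, day):
    """Raise ValueError unless `day` exists in the given month and year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= day <= days_in_month:
        raise ValueError("day is out of range for month")
```

With this check, 2016-11-31 is rejected with the same message that datetime.datetime gives, and leap days such as 2016-02-29 are still accepted.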

Allow a specific format to be specified

I would like to be able to specify a specific format that a date/time must match to be considered valid, something like:

ciso.parse_datetime('2016-11-16T12:34:56.789', '%Y-%m-%dT%H:%M:%S.%f')
# or
ciso.parse_datetime('2016-11-16T12:34:56.789', ciso.DATE_TIME_WITH_MICROSECONDS)

Since this is a library for parsing ISO8601 date/times I think it would make sense to limit this to a strict set of formats (I do not want to be able to parse any arbitrary format).

This makes it easier to be strict on the format of date accepted for APIs etc.
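
For comparison, this is roughly what the behavior looks like with the standard library's strptime, which is what I would otherwise fall back to. It is a sketch of the requested semantics, not a proposed implementation: `parse_with_format` is a hypothetical wrapper, the constant name is taken from the example above, and strptime is more lenient than I would like (it accepts fields without zero padding, for instance).

```python
from datetime import datetime

# Hypothetical constant, named after the one in the proposal above.
DATE_TIME_WITH_MICROSECONDS = "%Y-%m-%dT%H:%M:%S.%f"


def parse_with_format(value, fmt):
    """Parse `value`, raising ValueError if it does not match `fmt`."""
    return datetime.strptime(value, fmt)
```

Here `parse_with_format('2016-11-16T12:34:56.789', DATE_TIME_WITH_MICROSECONDS)` succeeds, while a date-only string such as '2016-11-16' raises ValueError.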
