jaymon / datatypes

My personal standard library
License: MIT License
Both `Headers` and `Environ` could live in this module since they seem generic enough.
It would be great to pull some of the image-handling code from testdata to create pure-Python image handlers that could give basic information about images, like height and width, or whether a gif is animated, without having to install a more substantial external library.
Something like:

```python
t = HTMLTokenizer(s, "a")
for atag in t:
    pout.v(atag)
```

It would be nice to have some light parsing functionality without having to pull in Beautiful Soup. I think `string.HTMLCleaner` could be modified to do this.
This way, on smaller projects I can just install datatypes and do `from datatypes.compat import *` in my files instead of creating my own compat.py, or use it as a placeholder in other projects' compat.py modules:

```python
# <NEWPROJECT>/compat.py
from datatypes.compat import *
# customize for <NEWPROJECT>
```
Would it be worth making this a method on the String object?

```python
from xml.sax.saxutils import escape

def xmlescape(data):
    # escape &, <, > by default, plus both quote styles via the entities mapping
    return escape(data, entities={
        "'": "&apos;",
        "\"": "&quot;",
    })
```
probably into a `copy.py` module. Change the name also.
https://github.com/Jaymon/prom/blob/master/prom/query.py#L1432

Might be worth importing `classproperty` and converting these methods (ie, `path_class`, `file_class`, etc) into dynamic class properties.
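`classproperty` isn't in the standard library; a minimal read-only descriptor sketch (the example class is hypothetical) could look like:

```python
class classproperty:
    """minimal read-only class-level property descriptor"""
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # bind to the class whether accessed on the class or an instance
        return self.fget(owner)


class Dirpath(object):
    # hypothetical example: resolve the class dynamically instead of
    # hardcoding it in every subclass
    @classproperty
    def path_class(cls):
        return cls
```

Because `__get__` receives the owner class, subclasses automatically get their own value without redefining the method.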
Here are the various path versions I have and some of the stuff I might still want to port over from them: `datatypes.path`. There are some things I didn't port over: `Dirpath.create_files`, and none of the modules stuff (eg, `Modulepath` and the `Dirpath` module methods) was ported over. I didn't bring over `copy_into`/`put_into` because you can just switch target and dest and get the same behavior with the current `copy_to` functionality. Also not ported: `writelines`, `contains`, and `delete_lines`. If I find a need for these methods then they should be moved over; if not, then if/when I update stockton I should just have stockton's classes extend datatypes' classes and layer on the methods that stockton uses. I did move over and expand the `Sentinal` stuff.

So if there is no path then create a tempfile that the CSV can write to?
If the fieldnames have unicode characters it chokes.
Similar to `defaultdict`, you could do something like this:

```python
d = SchemaDict(foo=dict, bar=0, che="", boos=list)
d["foo"]  # {}
d["bar"]  # 0
d["che"]  # ""
d["boos"] # []
```

So it's basically a default dict where you can have multiple keys.
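One minimal way the idea could work (the `__missing__` hook and the callables-as-factories convention are my assumptions, not an existing implementation):

```python
class SchemaDict(dict):
    """a defaultdict-alike where each key gets its own default"""
    def __init__(self, **schema):
        super().__init__()
        self.schema = schema

    def __missing__(self, key):
        default = self.schema[key]  # keys outside the schema still raise KeyError
        # treat callables (dict, list, ...) as factories, anything else as a value
        value = default() if callable(default) else default
        self[key] = value
        return value
```

Caching the value in `__missing__` mirrors `defaultdict`, so mutating `d["boos"]` persists between lookups.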
that basically just wraps the standard library. Python 2:

```python
import cgi
cgi.escape("<STRING>")
```

and python 3:

```python
import html
html.escape("<STRING>")
```
Found this in some old application code, could probably be moved into Request core:

```python
import re

def parse_user_agent(self, user_agent):
    """parses any user agent string to the best of its ability and tries
    not to error out"""
    d = {}
    regex = r"^([^/]+)"     # 1 - get everything to first slash
    regex += "/"            # ignore the slash
    regex += r"(\d[\d.]*)"  # 2 - capture the numeric version or build
    regex += r"\s+\("       # ignore whitespace before parens group
    regex += r"([^\)]+)"    # 3 - capture the full paren body
    regex += r"\)\s*"       # ignore the paren and any space if it is there
    regex += "(.*)$"        # 4 - everything else (most common in browsers)

    m = re.match(regex, user_agent)
    if m:
        application = m.group(1)
        version = m.group(2)
        system = m.group(3)
        system_bits = re.split(r"\s*;\s*", system)
        tail = m.group(4)

        # common
        d['client_application'] = application
        d['client_version'] = version
        d['client_device'] = system_bits[0]

        if application.startswith("Mozilla"):
            for browser in ["Chrome", "Safari", "Firefox"]:
                browser_m = re.search(r"{}/(\d[\d.]*)".format(browser), tail)
                if browser_m:
                    d['client_application'] = browser
                    d['client_version'] = browser_m.group(1)
                    break

    return d
```
and the test:

```python
def test_user_agent(self):
    user_agents = [
        (
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
            {
                'client_application': "Chrome",
                'client_version': "44.0.2403.157",
                'client_device': "Windows NT 6.3"
            }
        ),
        (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
            {
                'client_application': "Chrome",
                'client_version': "44.0.2403.157",
                'client_device': "Macintosh"
            }
        ),
        (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/40.0",
            {
                'client_application': "Firefox",
                'client_version': "40.0",
                'client_device': "Macintosh"
            }
        ),
        (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12", # Safari
            {
                'client_application': "Safari",
                'client_version': "600.7.12",
                'client_device': "Macintosh"
            }
        ),
        (
            "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5",
            {
                'client_application': "curl",
                'client_version': "7.24.0",
                'client_device': "x86_64-apple-darwin12.0"
            }
        )
    ]

    for user_agent in user_agents:
        d = self.user_agent(user_agent[0])
        self.assertDictContainsSubset(user_agent[1], d)
```
I got what looked like some funky behavior; it didn't look like:

```python
String("\nFOO\n").indent(1)
```

was indenting as expected, so I should add some tests to make sure it is doing what is expected. I would expect something like (period represents spaces):

```
....\n
....FOO\n
```

and I think it might be stripping that last `\n` because it is the last character.
in a `reflect.py` module. I would've done it already, but I couldn't decide on what interface I wanted, so I just left it in prom for right now.
If I needed a better check on whether the file is binary, look here: the gist seems to be to open the file and check for the NULL byte (`b'\0'`).

I did something like this and it worked for what I needed, but at some point I might want to make this a `Filepath` method and flesh it out:

```python
import mimetypes

def is_binary(ext):
    # guess_type returns a (type, encoding) tuple, eg ("text/plain", None);
    # the original checked membership in the tuple itself, which never matched
    t = mimetypes.guess_type(ext)[0] or ""
    return not ("plain" in t or "text" in t)
```
a case-insensitive set
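A sketch of what that could look like (the class name and the `casefold` choice are my assumptions), folding values on the way in so membership and dedup ignore case:

```python
class CaseInsensitiveSet(set):
    """set of strings where membership and dedup ignore case"""
    @staticmethod
    def _fold(value):
        # casefold handles more aggressive unicode folding than lower
        return value.casefold() if isinstance(value, str) else value

    def __init__(self, iterable=()):
        super().__init__(self._fold(v) for v in iterable)

    def add(self, value):
        super().add(self._fold(value))

    def __contains__(self, value):
        return super().__contains__(self._fold(value))
```

A fuller version would also override `discard`, `remove`, and the set operators, but this covers the common path.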
Current `datetime.timedelta` doesn't take months or years; it would be nice if it could.
I had this in another library's String class:

```python
if not encoding:
    # ??? use chardet to figure out what encoding val is?
    # https://stackoverflow.com/questions/196345/how-to-check-if-a-string-in-python-is-in-ascii/6988354#6988354
    encoding = sys.getdefaultencoding()
```

I'm just saving this because I switched the library over to use datatypes's String class, so this was going to get deleted.
`Url` takes `*paths` when it should take `*parts` to be consistent with `Path`.

`Url` and `Path` use `.create()` for creating a new instance; all of this should be renamed to `create_instance`, which would free up `.create` to be used by children for whatever. Right now it's just a tad too confusing.

Discussed here:
Escape Sequence: `\N{name}`, meaning "Character named name in the Unicode database".
Move Pout's `pout.utils.String.indent` method into the String class:

```python
def indent(self, indent_count):
    '''add whitespace to the beginning of each line of self

    link -- http://code.activestate.com/recipes/66055-changing-the-indentation-of-a-multi-line-string/

    indent_count -- integer -- how much whitespace we want in front of each line
    return -- string -- self with more whitespace
    '''
    if indent_count < 1:
        return self

    s = ((environ.INDENT_STRING * indent_count) + line for line in self.splitlines(False))
    s = "\n".join(s)
    return type(self)(s)
```
They clobber each other. I just did:

```python
from datatypes import Environ
```

thinking I was importing `environ.Environ`, and instead got `headers.Environ`. I think I should rename `headers.Environ` since I think that will be the less common one. Maybe `HTTPEnviron`?
I was seeing some interesting behavior when python2 had only unicode ucs2 support:

```
$ python
Python 2.7.18 (default, Sep 1 2020, 16:08:16)
>>> s = u'\uD859\uDFCC'
>>> s
u'\U000267cc'
>>> u'\uD859\uDFCC'.encode("UTF-32").decode("UTF-32")
u'\U000267cc'
```

It was taking the utf-16 hex codes (`\uD859` and `\uDFCC`) and converting them to the utf-32 hex code (`\U000267cc`) behind the scenes. I have methods like `repr_string` and `repr_bytes`, and I might want to add some utf-8 (bytes), utf-16 (the `\u` values) and utf-32 (the `\U` values) methods just so you can get more information about the character. To see how all these come together, you can use fileformat.info; these are some pages I had open:
search:
```python
from datetime import date

d = date.today()
dt = Datetime()
d <= dt  # TypeError: can't compare Datetime to datetime.date
```

I think there are methods I can override to make this compare possible.
Would something like this work?

```python
class Foobar(object):
    @alias("bar")
    def foo(self):
        return "foo"

fb = Foobar()
fb.foo() # foo
fb.bar() # foo
```
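It can work, but a plain decorator can't add names to the class by itself; one sketch pairs the decorator with a metaclass that binds the recorded aliases at class-creation time (both `alias` and `AliasedMeta` are assumed names, not existing datatypes API):

```python
def alias(*names):
    """record alias names on the decorated function"""
    def decorator(func):
        func._aliases = names
        return func
    return decorator


class AliasedMeta(type):
    """bind every recorded alias to its function when the class is created"""
    def __new__(mcs, name, bases, ns):
        # snapshot first so we can add to ns while looping
        for attr in list(ns.values()):
            for alias_name in getattr(attr, "_aliases", ()):
                ns[alias_name] = attr
        return super().__new__(mcs, name, bases, ns)


class Foobar(metaclass=AliasedMeta):
    @alias("bar")
    def foo(self):
        return "foo"
```

An `__init_subclass__` hook on a base class would work just as well and avoids the metaclass.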
This was in the caches comments; a version of this would go great here because I was surprised I didn't already have it:
Bang utils has a bunch of utility classes that might be nice:
Looks like query should be called url search params.

You should be able to set a strict value to True; if strict is True, then missing a fieldname in the dict to add, or having extra fieldnames, should throw an error.
We have an `rglob` and `reglob`, and it would be nice to have an `rcontains` and `recontains` also, which would search for matching files and then also match on what the file contains, basically doing something like this:

```python
for p in basedir.rglob("<PATTERN>"):
    if "<STRING>" in p.read_text():
        # do something if it matches
```

Instead, you could do something like:

```python
for p in basedir.rcontains("<STRING>", "<PATTERN>"):
    # do something if it matches
```
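A standalone sketch of the idea over `pathlib` (the real `Dirpath` API will differ, and the `rcontains` signature here is a guess):

```python
import pathlib

def rcontains(basedir, needle, pattern="*"):
    """yield files under basedir matching pattern whose text contains needle"""
    for p in pathlib.Path(basedir).rglob(pattern):
        if not p.is_file():
            continue
        try:
            if needle in p.read_text():
                yield p
        except UnicodeDecodeError:
            # skip binary files rather than erroring out
            continue
```

A `recontains` variant would just swap the `in` check for `re.search`.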
I've been trying to figure out a good interface for SFTP, and these types of classes would keep the same path interface but extend it to SFTP, so the idea would be you do this:

```python
d = SFTPDirpath("/foo/bar")
for f in d:
    f.cp("local/filepath")
```

A big issue with this interface is when it connects and disconnects. It could use the io interface of `.open` and `.close` with a context manager, so something like:

```python
d = SFTPDirpath("/foo/bar")
with d.open(username, password):
    for f in d:
        f.cp("local/filepath")
```
Sometimes you just want to know how many tokens you have; it would be nice to be able to do something like this:

```python
s = String("foo bar che")
s.tokenize().count() # 3
```

This could be done by saving the position, calling `.readall()`, and then resetting the position.
So `create` is just too loaded a name to use raw. I think I should rename `create` to `create_instance`, `create_file` to `create_file_instance`, etc. I could keep `create_as` the same, but I could also rename it to `create_as_instance`.
You can stream files from requests like this:

```python
# https://stackoverflow.com/a/16696317/5006
r = requests.get(url, stream=True)
with open("some/path", 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
```

I'd love for `http.HTTPClient` to have this functionality also, but maybe if I ever needed it, it would be better to just install Requests.
works the same way as `ritems`; it will find the first matching key no matter how far buried in the dict it is
Seems like it would be nice to have it outside of testdata sometimes
It would be nice to have a count method, because sometimes you just want to count how many of something you have:

```python
s = String("foo bar foo")
s.regex(r"foo").count()
```
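Until then, the count is a one-liner over the stdlib; `.regex(...).count()` could be backed by something like this (helper name is mine):

```python
import re

def regex_count(s, pattern):
    """count non-overlapping matches of pattern in s"""
    return sum(1 for _ in re.finditer(pattern, s))
```

Using `re.finditer` avoids materializing a match list the way `re.findall` would.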
It would be great to turn this into a supported feature:

```python
d = Datetime()
# month offset fixed from the original transcription (which assumed python2
# division and could produce month 0); still blows up when d.day doesn't
# exist in the target month
six_months_ago = Datetime(d.year + ((d.month - 7) // 12), ((d.month - 7) % 12) + 1, d.day)
```

via How do I calculate the date six months from the current date using the datetime Python module?
search:
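The year/month arithmetic generalizes to any offset; a hedged sketch of a months helper on plain `datetime.date` (the name `add_months` is mine), clamping the day so e.g. Jan 31 plus one month lands on the last day of February instead of erroring:

```python
import calendar
import datetime

def add_months(d, months):
    """return d shifted by months, clamping the day to the target month's length"""
    # work in 0-based month indices so divmod handles year rollover in both directions
    year, month = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)
```

With this, "six months ago" is just `add_months(d, -6)` with no edge cases to remember.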
Right now, `Datetime` creates a naive datetime with UTC time. It should probably create a UTC-pegged datetime that has the `tzinfo` set to `datetime.timezone.utc`.

My guess is this would be more annoying than I think, because if you've got a tz set then you can't compare against naive datetimes and stuff, but it would be worth looking into making this work.
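For reference, the naive/aware comparison annoyance is easy to reproduce with the stdlib alone:

```python
import datetime

aware = datetime.datetime.now(datetime.timezone.utc)  # tz-aware, pegged to UTC
naive = datetime.datetime.utcnow()                    # naive, but holds UTC time

# mixing them in a comparison raises TypeError
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False

assert not comparable

# pegging the naive value makes the comparison legal again
pegged = naive.replace(tzinfo=datetime.timezone.utc)
assert pegged <= datetime.datetime.now(datetime.timezone.utc)
```

So a tz-aware `Datetime` would probably need to peg incoming naive values the same way before comparing.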
It would be great to add a wrap method that will split on word boundaries, then use that method in Captain (Jaymon/captain#54). Basically, something like:

```python
s = String("foo bar che")
s.wrap(5) # foo\nbar\nche
s.wrap(2) # ValueError: cannot wrap lines with a max of 2 characters per line
```
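The stdlib's `textwrap` gets most of the way there; a sketch of the proposed behavior as a standalone function (the ValueError policy for unwrappable words is the new part):

```python
import textwrap

def wrap(s, width):
    """split s on word boundaries so no line exceeds width characters"""
    # if any single word can't fit, word-boundary wrapping is impossible
    if any(len(word) > width for word in s.split()):
        raise ValueError(
            "cannot wrap lines with a max of {} characters per line".format(width)
        )
    return "\n".join(textwrap.wrap(s, width))
```

Without the guard, `textwrap.wrap` would silently break long words mid-word (or leave them over-width with `break_long_words=False`), which is why the explicit error seems friendlier here.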
You could do this in an `rpop` method, but the idea would be you could do something like this:

```python
d = {
    "foo": {
        "bar": 1,
        "che": 2
    }
}

d[["foo", "bar"]] # 1
d.pop(["foo", "bar"]) # 1
```

The reason why this would work is because, by default:

```
>>> d = {}
>>> d[["foo", "bar"]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
```
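A sketch of how the list-key lookup could be implemented (the class name `PathDict` is my invention; the real datatypes class may differ). Since lists are unhashable, the `isinstance` check can safely claim them as path keys:

```python
class PathDict(dict):
    """dict where a list key drills into nested mappings"""
    def __getitem__(self, key):
        if isinstance(key, list):
            value = self
            for k in key:
                value = value[k]  # each step is a normal lookup
            return value
        return super().__getitem__(key)

    def pop(self, key, *default):
        if isinstance(key, list):
            *parents, last = key
            # drill to the parent container, then pop the leaf from it
            return self[parents].pop(last) if parents else super().pop(last, *default)
        return super().pop(key, *default)
```

`pop` with a path removes only the leaf, leaving sibling keys in place.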
Endpoints' `Url` class would probably be useful outside of endpoints because there is a similar class in `bang.utils`.
```python
import os

def has(name):
    k = environ.key(name)
    return k in os.environ
```
I just need to audit this code and make sure I understand, and document, the codepaths that result in `Path` inferring. I know `.create_as()` is the codepath. I might want to add a `.create_inferred_instance()` or something like that also.
This failed:

```python
fs = OrderedList(key=lambda f: f.integer_value)
for f in foo:
    fs.append(fs) # AttributeError: 'OrderedList' object has no attribute 'integer_value'
```

with this stacktrace:

```
File "filename.py", line N, in method
    fs.append(fs)
File "/.../site-packages/datatypes/collections.py", line 445, in append
    k = self.key(x)
File "filename.py", line N, in <lambda>
    fs = OrderedList(key=lambda f: f.fstat.st_mtime)
AttributeError: 'OrderedList' object has no attribute 'fstat'
```

Interestingly, this failed also:

```python
fs = OrderedList(key=lambda self, f: f.integer_value)
for f in foo:
    fs.append(fs) # TypeError: <lambda>() takes exactly 2 arguments (1 given)
```

So it is inconsistent. I need to add tests and fix this issue so that key can be set or overridden.
I do a lot of things like this:

```python
def foo(self):
    foo = getattr(self, "_foo", None)
    if not foo:
        foo = 5
        self._foo = foo
    return foo
```

It would be great to have a `cachedmethod` similar to `property` so I can do something like:

```python
@cachedmethod(_cached="_foo")
def foo(self):
    return 5
```

There is `functools.cache`, but I'd like a bit of control over what property gets set.
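A sketch of what `cachedmethod` could look like (the `_cached` keyword mirrors the pattern above; only no-argument methods are handled, and `None` is the "not computed yet" sentinel, which is an assumption):

```python
import functools

def cachedmethod(_cached):
    """cache a no-arg method's return value on the instance under _cached"""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self):
            value = getattr(self, _cached, None)
            if value is None:
                value = func(self)
                # the attribute name is explicit, unlike functools.cache
                setattr(self, _cached, value)
            return value
        return wrapper
    return decorator


class Foo(object):
    @cachedmethod(_cached="_foo")
    def foo(self):
        return 5
```

Because the cache lives in a named instance attribute, it can be inspected, preset, or deleted to force a recompute.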
So you could pass in a callback and it will run that for every File/Directory it finds. What should be the name:
Basically returns a vanilla datetime instance:

```python
def datetime(self):
    return datetime(
        self.year,
        self.month,
        self.day,
        self.hour,
        self.minute,
        self.second,
        self.microsecond,
    )
```