I've recently been reading Python code where heterogeneous dictionary objects are used

Type for heterogeneous dictionaries with string keys (python/typing)

Comments (56)

gvanrossum commented on May 27, 2024 6

Riffing on the right syntax, I wonder if the way to spell such types shouldn't look more like NamedTuple than like Dict. How about:

Foo = Struct('Foo', x=int, y=Optional[List[str]])

Using PEP 526 we could write it like this:

class Foo(Struct):
    x: int
    y: Optional[List[str]]

but that syntax remains a pipe dream for most of us (since it only works in Python 3.6).

Also I propose not to bother with initial values and not to express in the type whether a given key is mandatory or can be left out. (Note that Optional means it can be None, not that it can be left out.)
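The distinction between "may be None" and "may be left out" can be made concrete with a small sketch. It assumes the class-based TypedDict that later shipped in the standard typing module (Python 3.8+), not the Struct spelling proposed in this comment; the variable names are illustrative only.

```python
from typing import List, Optional, TypedDict

class Foo(TypedDict):
    x: int
    y: Optional[List[str]]  # the value may be None, but the key is still expected

# 'y' is present and explicitly None: this matches the declared type.
with_none: Foo = {'x': 1, 'y': None}

# 'y' is left out entirely: a different situation, which Optional does not describe.
# A type checker would be expected to complain if this were annotated as Foo;
# at runtime it is just a dict.
without_y = {'x': 1}

print('y' in with_none)   # True
print('y' in without_y)   # False
```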

from typing.

gvanrossum commented on May 27, 2024 4

In regards to subclassing, I don't like the idea of an "extends" keyword or a class decorator, but I think we could rig a metaclass so that at runtime, after

class Point1D(TypedDict):
    x: int
class Point2D(Point1D):
    y: int

both Point1D and Point2D are just aliases for dict, but mypy can still type-check code that uses these. However it should still use structural checking, so that Point2D would be exactly equivalent (both at runtime and in the type checker) to this definition:

class Point2D(TypedDict):
    x: int
    y: int

The type checker should reject isinstance(x, cls) calls using a TypedDict subclass, since at runtime those would all be equivalent to isinstance(x, dict)...
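As a rough illustration of the equivalence described above, the following sketch uses the TypedDict from today's typing module (Python 3.8+), which is an assumption relative to this comment since nothing had been implemented at the time. Point2DFlat is a hypothetical name for the spelled-out definition.

```python
from typing import TypedDict, get_type_hints

class Point1D(TypedDict):
    x: int

class Point2D(Point1D):
    y: int

class Point2DFlat(TypedDict):
    x: int
    y: int

# Both spellings declare the same keys and value types.
same_fields = get_type_hints(Point2D) == get_type_hints(Point2DFlat)

# Instances are plain dicts at runtime.
p = Point2D(x=1, y=2)
is_plain_dict = type(p) is dict

# isinstance() against a TypedDict class is refused at runtime.
try:
    isinstance(p, Point2D)
    isinstance_rejected = False
except TypeError:
    isinstance_rejected = True

print(same_fields, is_plain_dict, isinstance_rejected)
```

In the standard library version the isinstance() call fails at runtime with a TypeError, in addition to whatever the type checker reports.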

from typing.

DrPyser commented on May 27, 2024 2

Hi. Does anybody know what's happening with this (TypedDict/DictStruct)? Is it ever going to be in the standard library? Looking forward to this feature...
Thanks!

from typing.

gvanrossum commented on May 27, 2024 2

@iddan

Is there a plan for some required fields on TypedDict? Is there already a dedicated issue?

We did give this some thought and decided to punt on it. Maybe a volunteer can implement this. The main problem is how to spell it -- we currently have the total flag that can be set to require all fields or cleared to require none. The problem with requiring some but not all fields is that you have to invent a spelling for it. And you can't reuse Optional because that already has a meaning (the field is required but the value may be None).

If you're interested in pursuing this idea I recommend filing a new issue.
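For reference, one way to get "some required, some not" out of the total flag alone is to split the definition across two classes. This is a sketch assuming the TypedDict in the standard typing module and its `__required_keys__` / `__optional_keys__` introspection attributes (Python 3.9+); Movie and MovieRequired are made-up names.

```python
from typing import TypedDict

class MovieRequired(TypedDict):           # total=True is the default
    title: str

class Movie(MovieRequired, total=False):  # keys added here may be left out
    year: int

print(sorted(Movie.__required_keys__))  # ['title']
print(sorted(Movie.__optional_keys__))  # ['year']
```

A dedicated per-key spelling (Required and NotRequired) was added later by PEP 655 in Python 3.11.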

from typing.

gvanrossum commented on May 27, 2024 1

Let's go with TypedDict.

from typing.

ilevkivskyi commented on May 27, 2024 1

@shoyer I agree with Jukka here. This should be done via a plugin to mypy. Note that there is a PR python/mypy#4328 that extends the current plugin system (to allow special casing of decorators/base classes/metaclasses). With this new plugin system, a user will be able to write something like this (very approximately):

class MyTable(pandas.DataFrame):
    id: int
    name: str

table: MyTable
table.id  # OK, inferred type is 'array[int]'
table['name']  # Also OK, inferred type is 'array[str]'

Currently, the author of the mentioned PR is also working on the plugin for attrs, so you can keep an eye on this.

from typing.

davidfstr commented on May 27, 2024 1

Hi @DrPyser, I was originally providing a lot of the organizational energy around the original TypedDict design and implementation but my life has gotten super busy over the past year or so.

I've been out of the loop long enough that I don't know if anyone else has stepped up to lead the charge on polishing TypedDict to a state that's solid enough to standardize. If not, that would be a valuable role for someone to take on that cares and has time.

from typing.

JukkaL commented on May 27, 2024

Even though something like this might be useful, after discussing this with several people it seems that everybody agrees that this should be left out from the PEP, but this is a potential feature to add in a later Python release (after experimenting with an actual prototype implementation, etc.).

from typing.

markshannon commented on May 27, 2024

We (@JukkaL, @vlasovskikh and I) agreed that this is best left until 3.6, or dropped altogether.

from typing.

davidfstr commented on May 27, 2024

I was going to mention a NamedTuple-style syntax myself, since the original semantics I had in mind were nearly equivalent to NamedTuple:

all keys are required
accesses must be performed with a string literal
accesses using a non-literal are treated the same by the typechecker as a getattr() or setattr() applied to a NamedTuple

These semantics are more restrictive than those originally proposed by @JukkaL.

Maybe there should also be a way of deriving subtypes of heterogeneous dictionary types (similar to inheritance) to avoid repetition.

In case it's a useful data point, my current Django codebase wouldn't make use of such subtypes.

from typing.

davidfstr commented on May 27, 2024

Consider:

Point = DictStruct(x=int, y=int)
point = dict(x=42, y=1337)  # type: Point

The type declaration and value definition look similar and in alignment.
There's no need to repeat the type name as the first argument to DictStruct since the type name is not actually needed at runtime.

Sometimes code defines the actual dict value using {}, so perhaps the following might also be useful to support in case an existing codebase uses {} extensively.

Point = DictStruct({'x': int, 'y': int})
point = {'x': 42, 'y': 1337}  # type: Point

Edit: Added secondary syntax that uses {}.

from typing.

gvanrossum commented on May 27, 2024

The same argument could be made for NamedTuple or TypeVar or NewType. But when you print them (e.g. when introspecting) the name is handy.

from typing.

davidfstr commented on May 27, 2024

That's true. If the type object were to remain available at runtime (as NamedTuple, TypeVar, and NewType do) that would make perfect sense. (In a previous iteration of my thinking, the DictStruct would actually erase to a plain dict at runtime. But upon reflection I don't see any good reason for that erasure.)

So I'm onboard with providing the type name as a constructor argument.

from typing.

gvanrossum commented on May 27, 2024

OK, then it's time to play around with the mypy code to see if you can implement it!

from typing.

davidfstr commented on May 27, 2024

Yes indeed. I've spent a good chunk of today on reverse-engineering how mypy works in general and how it handles NamedTuple. Once I have a more detailed implementation proposal, I intend to bring it up in python/mypy#985 since this would be a mypy implementation proposal rather than a typing proposal per-se.

from typing.

gvanrossum commented on May 27, 2024

@JukkaL and I spent some time offline brainstorming about this. Jukka has a lot of additional ideas.

Anonymous Structs, e.g. def f(d: Struct(x=int)) -> None: ...
A way to infer a Struct instead of a regular Dict for a dictionary expression, e.g. x = {'x': 1, 'y': 'foo'} # type: Struct might infer Struct(x=int, y=str) as the type for x instead of Dict[str, object] (that's what it currently does and I think it ought to remain the default).
From these two we infer that Struct type checking should be structural, i.e. two Structs defined with the same keys and value types ought to be considered equivalent.
People will want to subclass their Structs, either to define other structs with additional fields or to add methods (we've seen this for NamedTuple too, the latter is already possible).
What to do about heterogeneous **kwds? Some people will want to use Structs here. It's tricky though, since the current syntax assumes the dict is homogeneous. Also, people really should use "lone star" syntax if they have keyword-only arguments, e.g. def f(self, *, x: int) -> None: ... -- but this syntax only works in Python 3.
Sometimes you have a dict that's mostly homogeneous except for a few keys. Maybe it should be possible to give a Struct a default type for additional keys. (It's tricky though, unless the key type is different.)
Some other rules we may borrow from NamedTuple, e.g. no partially defined Structs. (@JukkaL: I'm not sure what you meant by that? That all keys should be present? I think I've seen code that checks whether a given key is present, but if it is, assumes it has a certain type.)
JSON is probably one of the main use cases.
Various automatically inferred uses of this could be supported by the conditional type binder in mypy (hmm, that's probably too mypy-specific for this tracker).
There's some potential confusion with the struct module, which defines struct.Struct quite differently (of course the etymology is the same, from C structs).

from typing.

ilevkivskyi commented on May 27, 2024

@gvanrossum

but that syntax remains a pipe dream for most of us (since it only works in Python 3.6)

There is no need to chose. Struct could be implemented to support both syntaxes: the 3.6+ one and the backward-compatible one. This could be done the same way it is done for NamedTuple -- the same code is called on instantiation and subclassing.

from typing.

gvanrossum commented on May 27, 2024

There is no need to cho[o]se.

Agreed, but the pre-3.6 syntax needs to be reasonable since that's what almost everybody will see for the foreseeable future.

from typing.

JukkaL commented on May 27, 2024

By partially defined structs, I meant struct with some missing keys. We have at least these options:

1. Make all keys required. If a key is missing, a type checker will have to be shut up explicitly, such as by using a cast. Checking whether a key exists using in is still possible but it won't be enforced by the type checker. For example, you could write x = cast(Struct(x=int), {}), but x = {} # type: Struct(x=int) would be an error.
2. Make all keys optional (this is different from Optional[t]). Thus {} would be compatible with all struct types. Some legitimate errors would be silently ignored by type checkers.
3. Allow optionality to be specified individually for keys. For example, something like Struct(x=int, y=Something[str]), where Something specifies an optional key. We can't use Optional, as it already means something quite different. Not sure what would be a good name.
4. Like (3), but the default would be optional, and required keys would have to be marked explicitly.

from typing.

JukkaL commented on May 27, 2024

Structural subtyping would likely also imply that dict(x=1, y='a') is compatible with Struct(x=int).

Using this to support adding new keys and automatically inferring a new type is problematic:

def f(x: Struct(x=int)) -> None:
    x['y'] = 2  # Is this okay? Would we infer type Struct(x=int, y=int) from this point on?

d = dict(x=1, y='a')  # type: Struct
f(d)  # Probably okay?
d['y'] + 'b'  # Runtime error!

The latter example might be beyond the scope of PEP 484, but it's perhaps worth at least considering here. Maybe struct 'extension' as in f() above would require a cast to make it explicit that it is potentially unsafe. For example:

def f(x: Struct(x=int)) -> None:
    x = cast(Struct(x=int, y=int), x)
    x['y'] = 2  # Now okay
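The hazard in the first example can be reproduced at runtime without any Struct type at all. The sketch below uses a plain Dict[str, Any] annotation as a stand-in for Struct(x=int), which is an assumption made only so the code runs today.

```python
from typing import Any, Dict

def f(x: Dict[str, Any]) -> None:
    # Stands in for a function annotated as accepting Struct(x=int):
    # it only knows about 'x', yet it writes an int under 'y'.
    x['y'] = 2

d = {'x': 1, 'y': 'a'}   # the caller believes d['y'] is a str
f(d)

try:
    d['y'] + 'b'
    failed = False
except TypeError:
    failed = True          # int + str is not defined

print(failed)  # True
```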

from typing.

ilevkivskyi commented on May 27, 2024

@gvanrossum

Agreed, but the pre-3.6 syntax needs to be reasonable since that's what almost everybody will see for the foreseeable future.

In addition to the proposed syntax, I think something even more similar to NamedTuple could be considered:

Foo = Struct('Foo', [('x', Optional[List[str]]), ('y', int)])

Although it is a bit verbose, new features could be easily added, e.g. default values:

Foo = Struct('Foo', [('x', Optional[List[str]]), ('y', int, 42)])

class Foo(Struct):
    x: Optional[List[str]]
    y: int = 42

Then also specifying a default value to something special (ellipsis ...) means that this key could be omitted, etc.

from typing.

gvanrossum commented on May 27, 2024

Do you propose that in addition to or instead of the Struct('A', x=int) syntax?

from typing.

ilevkivskyi commented on May 27, 2024

I am not sure. On one hand the Struct('A', x=int) looks cleaner, on the other hand the NamedTuple-like syntax is more flexible. Having both would cover more situations, but there should be one right way to do it. Taking into account that we already have this syntax for NamedTuple I would probably say "instead of".

from typing.

gvanrossum commented on May 27, 2024

How is the NamedTuple-derived syntax more flexible?

from typing.

ilevkivskyi commented on May 27, 2024

It is more flexible in the sense that we can accept 3-tuples (for example with default values) in the list of fields. Such an option could be added later as a minimal change, while I don't see how this could be done with the dict-like syntax.

from typing.

gvanrossum commented on May 27, 2024

I guess it could be done using an extra wrapper, e.g. Struct('A', x=Required[int], y=WithDefault[str, '']). Although default values seem hard to map to the dict implementation -- you have to explicitly pass the default to get() if it's not None. A "required" flag might be useful; it would decide whether a dict with some keys missing is acceptable or not. (I guess the checker would also have to flag d['x'] as invalid unless you've already checked whether 'x' in d; d.get('x') would always be valid.)
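The access patterns mentioned at the end (guarding d['x'] with an `in` check, and get() always being valid) might look like the following. This is a sketch assuming the TypedDict from the standard typing module with total=False, rather than the hypothetical Required/WithDefault wrappers; read_x is an illustrative helper.

```python
from typing import TypedDict

class A(TypedDict, total=False):
    x: int

def read_x(d: A) -> int:
    # Guarded access: only index after confirming the key is there.
    if 'x' in d:
        return d['x']
    # get() never raises; the default is passed explicitly.
    return d.get('x', 0)

print(read_x({'x': 5}))  # 5
print(read_x({}))        # 0
```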

from typing.

JukkaL commented on May 27, 2024

Default values don't seem to fit in seamlessly with our philosophy, as I don't see how they would be useful without a runtime effect, and elsewhere we try to avoid type annotations having a runtime effect. Removing typing-related stuff from a program generally shouldn't affect behavior.

from typing.

gvanrossum commented on May 27, 2024

So the list-of-tuples form has little to recommend it.

…

--Guido (mobile)

from typing.

ilevkivskyi commented on May 27, 2024

OK, then I agree that we should go with Struct('A', x=int) syntax. Concerning the partially defined Struct I think the 3rd option proposed by @JukkaL is convenient while still safe. One could write:

A = Struct('A', x=int, y=Facultative[int])
d = {'x': 1}  # type: A # OK
if 'y' in d:
    d['y'] += 1  # Also OK

from typing.

JukkaL commented on May 27, 2024

I also feel like required should be the default, not optional, and we should aim at safety, at least unless we find some very common use cases which are somehow inherently unsafe. But even then, we can require users to tweak their code a bit before it can be statically safe.

If static typing is going to be successful, it's possible that the vast majority of statically typed code will eventually have been written with static typing in mind from the beginning. When not working on legacy code, safety guarantees more easily win over convenience.

from typing.

davidfstr commented on May 27, 2024

Anonymous Structs, e.g. def f(d: Struct(x=int)) -> None: ...

It seems like this would get verbose pretty fast. Although I could perhaps see it with simple cases like Struct(x=int, y=int). Even with simple cases I'd probably declare a separate type. In this example a Point2D.

So I'm not completely convinced of the practical utility of anonymous structs, even though I recognize the theoretical utility.

From these two we infer that Struct type checking should be structural, i.e. two Structs defined with the same keys and value types ought to be considered equivalent.

Agreed on this implication if anonymous structs exist.

People will want to subclass their Structs, either to define other structs with additional fields or to add methods (we've seen this for NamedTuple too, the latter is already possible).

Structs as I currently envision them are a zero-cost abstraction. Therefore adding methods doesn't really make sense to me. Struct is a type only, not a runtime class that generates instances.

What to do about heterogeneous **kwds? Some people will want to use Structs here.

This feels like an edge case to me. I don't think a typechecker need check the splat of a Struct (**some_struct) explicitly.

no partially defined Structs.

If by this you mean that all keys must be specified, then I am in agreement.

JSON is probably one of the main use cases.

Strongly agree.

There's some potential confusion with the struct module

I actually don't like the name "Struct". It evokes the same general-purposeness of "object" when the primary attribute of a Struct is that it specifically wraps a dictionary with some typing information. I would be in favor of a name more along the lines of:

TypedDict, KeyTypedDict
NamedDict -- although the similarity to NamedTuple incorrectly implies a non-zero-cost abstraction and runtime key checking
StructDict <--> DictStruct
ObjectDict <--> DictObject

from typing.

gvanrossum commented on May 27, 2024

Agreed 'Struct' is not a perfect name. But it's short, and pronounceable (I could never say StructDict three times fast -- nor can I type it :-). The only other thing I'd like to put in is that regardless of the motivation I think this should use structural type checking, if only because there seem to be several opportunities for implicit definition of compatible types.

from typing.

davidfstr commented on May 27, 2024

The only other thing I'd like to put in is that regardless of the
motivation I think this should use structural type checking, if only
because there seem to be several opportunities for implicit definition
of compatible types.

I think you're probably right that structural type checking would be optimal here, particularly if these types have no runtime presence. No isinstance checks to worry about breaking.

from typing.

JukkaL commented on May 27, 2024

I've seen a lot of uses of **args where this would potentially be useful, depending on the specifics of the design. This use case would likely benefit from more specialized features (for example, see #270). But maybe we shouldn't spend too much time on this yet, at least until we have a basic proposal that works for JSON and other, more straightforward use cases.

from typing.

davidfstr commented on May 27, 2024

Agreed 'Struct' is not a perfect name. But it's short, and pronounceable (I
could never say StructDict three times fast -- nor can I type it :-).

Would TypedDict or NamedDict be easier to pronounce? :-)

In all seriousness, I think it would be most clear if the name included "Dict" somewhere in the name. Preferably in __ADJECTIVE__ Dict form as per standard English. Such a naming emphasizes that it's just a kind of specialized dictionary rather than an entirely new concept.

Prior art that is similar includes (namedtuple + NamedTuple), OrderedDict, and JsonDict (from mypy).

from typing.

JukkaL commented on May 27, 2024

I don't really like TypedDict -- Dict[int, str] already is a typed dictionary, after all. It doesn't communicate the idea that we have separate types for various (named) keys. I think that NamedDict is a little better, though again an anonymous named dict could be confusing.

More ideas (these are just whatever comes to mind right now):

KeyDict
KeyedDict (does this even make sense?)
DictForm
DictPattern
KeywordDict

from typing.

ilevkivskyi commented on May 27, 2024

Maybe just NameDict (single 'd'), or NameSpace?

from typing.

dmoisset commented on May 27, 2024

@JukkaL regarding #28 (comment), another possible application of this is typing the __dict__ attribute of both classes and objects (here's a place where specifying ClassAttr as in PEP-526 helps)

Regarding names, the main property seems to be that the keys are strings from a predefined set of strings, so it evokes more a FixedDict or FixedKeyDict (although that doesn't capture the "string" part).

from typing.

gvanrossum commented on May 27, 2024

I don't like NamedDict, because the analogy with NamedTuple fails on many
levels; NameDict is too subtly different.

I had expected the objection against TypedDict, but I'd like to override
it, since Dict[...] is only typed in the trivial sense that everything is
typed. And the rest will be obvious from context -- if you see

A = TypedDict('A', x=int, y=List[str])

that should be pretty clear to someone who has never seen it before.

from typing.

davidfstr commented on May 27, 2024

Let's talk a bit about how TypedDict fits into the type system.

Given a TypedDict such as the following, what can you do and what can you not do?

Point = TypedDict('Point', {'x': int, 'y': int})
point = Point({'x': 0, 'y': 0})

Here are the semantics I propose:

What can you convert a TypedDict to?

You can convert a TypedDict to itself (or to another TypedDict with the same key set):

def identity(p: Point) -> Point:
    return p

If we do NOT allow subtyping of TypedDicts then you cannot convert a TypedDict to a narrower TypedDict:

Point1D = TypedDict('Point1D', {'x': int})
def narrow(p: Point) -> Point1D:
    return p  # ERROR

If we DO allow subtyping of TypedDicts then you can convert a TypedDict to a narrower TypedDict:

Point1D = TypedDict('Point1D', {'x': int})
def narrow(p: Point) -> Point1D:
    return p  # OK

But you cannot convert a TypedDict to a wider TypedDict:

Point3D = TypedDict('Point3D', {'x': int, 'y': int, 'z': int})
def widen(p: Point) -> Point3D:
    return p  # ERROR

You can also convert a TypedDict to an immutable Mapping[str, V], where V is the supertype of all value types for the typed dict:

def demote(p: Point) -> Mapping[str, int]:
    return p

However you cannot convert a TypedDict to a mutable Dict because then you could del keys from it, breaking the assurance that all keys declared by a TypedDict actually exist:

def invalid_demote(p: Point) -> Dict[str, int]:
    return p  # ERROR

p_dict = invalid_demote(Point({'x': 0, 'y': 0}))
del p_dict['y']  # yikes

You can convert a TypedDict to any supertype of Mapping[str, V], including Sized, Iterable, Container, or object:

def demote(p: Point) -> object:
    return p

You can convert a TypedDict to Any, since anything can be converted to Any:

def unprotect(p: Point) -> Any:
    return p

You cannot convert a TypedDict to any other type that is not mentioned above.
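The Mapping versus Dict rule above can be sketched with the TypedDict that eventually landed in the typing module. Note one assumption that differs from the proposal: the sketch annotates the read-only parameter as Mapping[str, object], which is the form PEP 589 settled on, rather than Mapping[str, int]. The helper names are illustrative.

```python
from typing import Dict, Mapping, TypedDict

class Point(TypedDict):
    x: int
    y: int

def total(m: Mapping[str, object]) -> int:
    # Read-only view: nothing here can delete or retype a key.
    return sum(v for v in m.values() if isinstance(v, int))

def drop_y(d: Dict[str, int]) -> None:
    # A mutable Dict allows deletion, which would break Point's guarantee.
    del d['y']

p: Point = {'x': 3, 'y': 4}
print(total(p))      # 7

# Copying into a fresh dict first keeps `p` intact.
scratch = dict(p)
drop_y(scratch)
print('y' in p)      # True
```

Passing p directly to drop_y is the case a type checker is expected to reject; the copy is the explicit way to opt out of the guarantee.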

What operations are permitted on a TypedDict value?

You can do anything with a TypedDict that you could do with a Mapping[str, V], where V is the supertype of all value types for the typed dict. In particular the following methods are supported:

__getitem__, __iter__, __len__
__contains__, keys, items, values, get, __eq__, and __ne__

However the use of __getitem__ is restricted, as described below.

Additionally the __setitem__ method is supported, with restrictions described below.

No other methods are supported. In particular __delitem__ is not supported.

What about __setitem__?

__setitem__ will accept a string-literal key that matches a declared key of the typed dict, and will check the value type to ensure it is compatible:

def set_x(p: Point, x: int):
    p['x'] = x

def set_y(p: Point, y: object):
    p['y'] = y  # ERROR: Cannot assign object to int

If we do NOT allow subtyping of TypedDicts, __setitem__ will accept ONLY a string-literal key that matches a declared key of the typed dict:

def set_z(p: Point, z: int):
    p['z'] = z  # ERROR: 'z' is not a valid key for Point. Expected one of {'x', 'y'}.

If we DO allow subtyping of TypedDicts, __setitem__ could accept any str key:

def set_z(p: Point, z: int):
    p['z'] = z  # OK: Can assign int to Any.

...but then again we might decide to disallow it anyway since this behavior is unusual and could indicate a misspelled key: (And because there's no way to read any key written in this manner.)

def set_x(p: Point, x: int):
    p['z'] = x  # ERROR: 'z' is not a valid key for Point. Expected one of {'x', 'y'}.

__setitem__ will not accept keys that are not string literals, even if they are otherwise strs:

def set_coordinate(p: Point, key: str, value: int):
    p[key] = value  # ERROR: Cannot prove 'key' is a valid key for Point. Expected one of {'x', 'y'}.

What about __getitem__?

__getitem__ will accept ONLY a string-literal key that matches a declared key of the typed dict, and will check the value type to ensure it is compatible:

def get_x(p: Point) -> int:
    return p['x']

def get_y(p: Point) -> str:
    return p['y']  # ERROR: Cannot assign int to str.

In particular __getitem__ will not accept keys that are not string literals, even if they are otherwise strs:

def get_coordinate(p: Point, key: str) -> int:
    return p[key]  # ERROR: Cannot prove 'key' is a valid key for Point. Expected one of {'x', 'y'}.
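One way to keep non-literal keys checkable is to narrow the key's type instead of using str. The sketch below assumes Literal types (PEP 586, Python 3.8+) and the TypedDict from the typing module, neither of which existed when this was written; Coordinate is a hypothetical alias, and checkers that follow PEP 589 are expected, though not guaranteed here, to accept it.

```python
from typing import Literal, TypedDict

class Point(TypedDict):
    x: int
    y: int

Coordinate = Literal['x', 'y']

def get_coordinate(p: Point, key: Coordinate) -> int:
    # `key` can only be one of the declared names, so the lookup is checkable.
    return p[key]

def set_coordinate(p: Point, key: Coordinate, value: int) -> None:
    p[key] = value

p: Point = {'x': 0, 'y': 0}
set_coordinate(p, 'y', 7)
print(get_coordinate(p, 'y'))  # 7
```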

Open Questions

Should TypedDict support subtyping?

Earlier discussions seemed to be leaning toward yes.

from typing.

ilevkivskyi commented on May 27, 2024

@davidfstr Thanks for a nice summary! Maybe I have overlooked it or it is obvious, but I would like a simple structural subtyping to be added to your summary:

class Point2D(TypedDict):
    x: int
    y: int

def fun(p: Point2D) -> None: ...
fun({'x': 1, 'y': 2})  # This should be allowed

Concerning subclassing, I think that TypedDict should be subclassable, in analogy with NamedTuple. If one takes into account that in Python x.y is almost x.__dict__['y'], then the analogy between these two becomes quite strong:

Point2D = TypedDict('Point2D', {'x': int, 'y': int})
p = Point2D({'x': 1, 'y': 2})
p['x'] + p['y']

Point2D = NamedTuple('Point2D', [('x', int), ('y', int)])
p = Point2D(1, 2)
p.x + p.y

from typing.

davidfstr commented on May 27, 2024

class Point2D(TypedDict):
    x: int
    y: int

That's a nice syntax with variable annotations. One slight difficulty with
making the TypedDict type constructor a superclass is that it would
imply that TypedDict has a runtime presence, when it is intended as a
zero-cost abstraction over dictionaries. This is a difference between it
and NamedTuple.

Here is a similar syntax that could be implemented with zero-cost semantics:

@TypedDict
class Point2D:
    x: int
    y: int

The preceding would be equivalent to Point2D = TypedDict('Point2D', dict(x=int, y=int)) (and ultimately Point2D = dict) at runtime.

Concerning subclassing, I think that TypedDict should be subclassable,
in analogy with NamedTuple.

I could potentially see type-instances of TypedDict (ex: Point2D) being
subclassable in the same way that dict is subclassable. Under those
circumstances the subclass would in fact attain a runtime presence.

However with a runtime presence you lose the advantages of being able to
do zero-cost casts to the type, since that is no longer possible. The
only advantage you retain is that a type checker will recognize certain
keys. With only those advantages you might as well use a NamedTuple or a
subclass thereof.

from typing.

gvanrossum commented on May 27, 2024

From the have-your-cake-and-eat-it department: maybe it's possible to have the class structure as suggested but make instantiations just return plain dicts? That's almost zero runtime cost: if we inherit from dict and don't define the constructor, the only runtime cost would be looking up the constructor in the class hierarchy (a few dict lookups at most).
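A minimal sketch of the "class structure, but plain dict instances" idea follows. It is a hypothetical variant, not the mechanism described in this comment: instead of relying on the inherited dict constructor, it overrides `__call__` on a metaclass so that the result is exactly a dict. Both _PlainDictMeta and SketchTypedDict are invented names.

```python
class _PlainDictMeta(type):
    """Hypothetical metaclass: calling the class builds an ordinary dict."""
    def __call__(cls, *args, **kwargs):
        return dict(*args, **kwargs)

class SketchTypedDict(dict, metaclass=_PlainDictMeta):
    """Hypothetical stand-in for the proposed TypedDict base class."""

class Point1D(SketchTypedDict):
    x: int

class Point2D(Point1D):
    y: int

p = Point2D(x=1, y=2)
print(type(p) is dict)          # True
print(p == {'x': 1, 'y': 2})    # True
```

The class hierarchy stays available for a type checker to read, while the objects that flow through the program carry no trace of it.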

from typing.

davidfstr commented on May 27, 2024

maybe it's possible to have
the class structure as suggested but make instantiations just return plain
dicts?

I could see that.

It occurs to me that my "Should TypedDict support subtyping?" question above was ambiguous.

Originally I was only considering the question of structural subtyping. That is, whether or not the following would be accepted:

Point2D = TypedDict('Point2D', dict(x=int, y=int))
Point3D = TypedDict('Point3D', dict(x=int, y=int, z=int))

def narrow(p: Point3D) -> Point2D:
    return p

Again, prior discussion suggests yes.

But here are a lot of other crazy ideas that also come to mind in the realm of "subtyping":

(1) Extending with an "extends" keyword to TypedDict

Point1D = TypedDict('Point1D', dict(
    x=int
))
Point2D = TypedDict('Point2D', extends=Point1D, fields=dict(
    y=int
))
Point3D = TypedDict('Point3D', extends=Point2D, fields=dict(
    z=int
))

(2) Extending with class-based syntax

class Point1D(TypedDict):
    x: int
class Point2D(Point1D):
    y: int
class Point3D(Point2D):
    z: int

Although the preceding syntaxes have the feel of nominal subtyping, they are just syntactic sugar for fully spelling out all fields. The type system would still use the more general structural subtyping when checking type compatibility.

from typing.

ilevkivskyi commented on May 27, 2024

@davidfstr

it is intended as a zero-cost abstraction over dictionaries

As Guido already mentioned, there are various kinds of manipulations with __new__ all having different speed and possibilities. Probably the fastest one would be something similar to NewType: metaclass's __new__ will return a function that returns its argument:

class TDMeta(type):
    def __new__(cls, name, bases, ns, *, _root=False):
        if _root:
            return super().__new__(cls, name, bases, ns)
        return lambda x: x
class TypedDict(metaclass=TDMeta, _root=True): ...

class Point2D(TypedDict):
    x: int
    y: int

assert Point2D({'x': 1, 'y': 2}) == {'x': 1, 'y': 2}

However, one will not be able to subclass this (as one cannot subclass NewType). I would prefer exactly what Guido proposes: it will be a bit slower, but will support your example (2).

from typing.

gvanrossum commented on May 27, 2024

(Also see the next stage of the implementation, WIP here: python/mypy#2342)

from typing.

davidfstr commented on May 27, 2024

The type checker should reject isinstance(x, cls) calls using a TypedDict subclass, since at runtime those would all be equivalent to isinstance(x, dict)...

from typing.

shoyer commented on May 27, 2024

Would a generic version of TypedDict be feasible? There are some strong use cases for this for scientific computing / data science applications:

NumPy has structured data types, where each array element is a struct (xref https://github.com/numpy/numpy_stubs/issues/7).
pandas.DataFrame is probably the most popular data structure when using Python for data science. Essentially, it is a dictionary where indexing returns a vector of all entries in a column.

In both cases, it is quite common to define ad-hoc "types" in applications analogous to TypedDict, e.g., a "user dataframe" which is defined to have a fixed set of column names with particular data types. Type checking would be quite valuable for checking such code.

from typing.

JukkaL commented on May 27, 2024

@shoyer Can you give a few concrete code examples of where this could be useful?

from typing.

ilevkivskyi commented on May 27, 2024

@shoyer Note that you can already have generic protocols. Together with literal types (that are proposed in another issue) you can just overload __getitem__. For example:

class MyFrame(Protocol[T]):
    @overload
    def __getitem__(self, item: Literal['name']) -> str: ...
    @overload
    def __getitem__(self, item: Literal['value']) -> T: ...

or similar. In principle TypedDict (as well as NamedTuple) can be made generic and there are several requests for NamedTuple, there are however not so many for TypedDict.

from typing.

shoyer commented on May 27, 2024

The use cases for TypedDataFrame (my hypothetical TypedDict/pandas.DataFrame combination) line up very closely with TypedDict itself. Basically, any time you would use an ad-hoc record type, but need performance or interoperability with the Python for data stack.

To build off the example in mypy's docs, you would use TypedDataFrame if you wanted to build an application to analyze a database of movies. Various functions could create Movies (e.g., from a CSV file or relational database) and other functions could transform them (e.g., by filtering entries) or consume them (e.g., to compute statistics or make plots).

As is the case for TypedDict, most of these use cases would also work fine with a dataclass or namedtuple (in this case, where the entries are 1-dimensional arrays), but there are advantages to standardizing on common types and APIs, and using types that can be defined dynamically when desired. In the PyData ecosystem, pandas.DataFrame fills a similar niche to dict: if you want to pass around an ad hoc namespace (of data), it's the idiomatic way to do it.

@ilevkivskyi Yes, I suppose protocols with literals would work for some use cases, but that wouldn't be a very satisfying solution. There is a long list of methods beyond indexing that only take column names found in the DataFrame as valid entries, e.g., to group by a column, plot a column, set an index on a column, data, rename columns, etc.

I only have a vague idea what support for writing custom key-value types would look like, but perhaps it would pay dividends, because in some sense this is a generalized version of typing for TypedDict, NamedTuple and dataclasses. There would need to be some way of defining paired key/value TypeVar instances, and then you could define methods in any desired fashion, e.g., perhaps

K = TypeVar('K')
V = TypeVar('V')

class TypedDict(Enumerated[K, V], Dict[str, Any]):
    def __getitem__(self, key: K) -> V: ...

class NamedTuple(Enumerated[K, V], namedtuple):
    def __getattr__(self, name: K) -> V: ...

(Feel free to declare this out of scope for now or push it to another issue -- I don't want to pull TypedDict off track!)

from typing.

JukkaL commented on May 27, 2024

@shoyer Generalizing TypedDict in the way you are suggesting seems out of scope, unfortunately. TypedDict and NamedTuple are currently very special-purpose constructs, and I don't expect this to change. However, perhaps it's possible to special case pandas.DataFrame in a similar way? Experimenting with that could happen as a mypy plugin, for example. This still wouldn't be trivial, and the current mypy plugin system would have to be extended.

from typing.

ilevkivskyi commented on May 27, 2024

@DrPyser it has been supported by mypy for a year and a half, but it lives in mypy_extensions until there is a PEP. After it is accepted TypedDict can move to typing.
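For readers arriving later: TypedDict did move into typing (Python 3.8, via PEP 589). A sketch of an import that tolerates older interpreters is shown below; the fallback to the typing_extensions backport package is an assumption about the reader's environment, not something discussed in this thread.

```python
try:
    from typing import TypedDict             # Python 3.8 and later
except ImportError:
    from typing_extensions import TypedDict  # backport package

# Functional syntax, usable where class-level annotations are not an option.
Point = TypedDict('Point', {'x': int, 'y': int})

p: Point = {'x': 1, 'y': 2}
print(p['x'] + p['y'])  # 3
```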

from typing.

JukkaL commented on May 27, 2024

Maybe we should close this issue now? We can create follow-up issues about remaining work that isn't covered by other issues.

from typing.

ilevkivskyi commented on May 27, 2024

OK, let us close this. I think we have issues about missing features on mypy tracker and the pandas question is unrelated (btw @shoyer the plugin hook I mentioned was added to mypy and attrs successfully uses it, I think we can write a similar plugin for data frames as I described above in #28 (comment)).

from typing.

iddan commented on May 27, 2024

Is there a plan for some required fields on TypedDict? Is there already a dedicated issue?

from typing.
