python / typing Goto Github PK

View Code? Open in Web Editor NEW

1.5K 59.0 222.0 2.73 MB

Python static typing home. Hosts the documentation and a user help forum.

Home Page: https://typing.readthedocs.io/

License: Other

Python 72.58% HTML 27.42%

python types typing static-typing gradual-typing

typing's Introduction

Static Typing for Python

Documentation and Support

The documentation for Python's static typing can be found at typing.readthedocs.io. You can get help in our support forum.

Improvements to the type system should be discussed on Python's Discourse, and are tracked in the issues in this repository.

For conversations that are more suitable to a chat platform, you can use one of the following:

gitter
discord #type-hinting channel

Repository Content

This GitHub repository is used for several things:

The documentation at typing.readthedocs.io is maintained in the docs directory. This includes the specification for the type system. See especially the update procedure for the spec.
A discussion forum for typing-related user help is hosted here.
Conformance test for Python static type checkers. The latest conformance test results are here.

Historically, this repository also hosted:

The typing_extensions package, which now lives in the typing_extensions repo. It used to be in the typing_extensions directory.
A backport of the typing module for older Python versions. It was removed after all Python versions that lack typing in the standard library reached end of life. The last released version, supporting Python 2.7 and 3.4, is available at PyPI.

typing's People

Contributors

Stargazers

Watchers

Forkers

encukou scoder flying-sheep ceridwen bigmacchia jimjjewett xennygrimmato lucemia ilevkivskyi rutsky o11c vi4m rowillia silky matthiaskramm vladiibine bintoro brettcannon refi64 momandine ddfisher gsmurf kmod michael0x2a vgr255 bakkerthehacker jellezijlstra roganov tonyfast wheerd rjkmurray75 emilyemorehouse naufraghi adetokunbo vlasovskikh gitter-badger ilinum onionka mitar curistarist zauddelig harinivengala wilfred nasim17 chadrik carljm rkr-at-dbx nickatak mariusvniekerk tdb-alcorn samuelsantanam djsutherland suutari erikwright fish2000 akalomia mr-c crobobo felixonmars isgasho jayvdb louisparis dalavancloud igordzreyev cclauss resinten luxianjun dkgi vamsi-v4m pcd1193182 jstasiak mrkmndz gcarq kelledin marco-c jivid crusaderky tapaswenipathak zac-hd lingxiaoyang danielaguirre708 mluscon s2nner ethanhs vemel stjordanis luyidong mabu-dev hamzahamidi bibhuashish hugovk srittau wnoise frenzymadness sailfish009 east825 jwxia2014 alucryd hoodmane noelevans

typing's Issues

Describe how to define generic classes

Currently the PEP doesn't seem to mention how to define generic classes. The PEP should explain it.

Mypy uses this approach:

from typing import TypeVar, Generic, Undefined

T = TypeVar('T')

class MyBucket(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

b = Undefined(MyBucket[int])
b = MyBucket(1)

Interoperability of java.Util.Arraylist with typing.List (in Jython)

This very specific case was brought up by Jim Baker. While in general typing.List is meant to be the parametrizable version of builtins.list, I am fine with having it support the registration mechanism of ABCs and registering java.util.ArrayList to be equivalent, given that Jim says that in practice it is indeed duck-substitutable (is that even a word? :-) for the built-in list type. (I assume that this means it has a sort() method.)

Do we need an explicit cast() function?

In mypy there's a cast(TYPE, EXPR) function. Do we still need this, given that we can do the same thing with # type: comments? (However, see issue #9.)

Add type for variable-length, uniform tuples

Some ideas:

Tuple[int, ...]
Sequence[int]

Explain the Any type in more detail

The PEP draft talks about Any as a union type, but in mypy it's actually very different from a union type. For example:

List[Any] is compatible with List[int] (and vice versa), but this is not true for union types.
A value of type Any can be assigned to a variable with an arbitrary type and vice versa, which is not true for union types.

I think we should be more explicit about at least some of the expected properties of the Any type.

The PEP should clarify the status of typing vs. types

Lukasz wrote:

The remark about removing types from being mentioned actually
makes me think we could solve many issues before they arise by
introducing a short section The place of the typing module in
the standard library which would explain how the authors intend for
it to be used and what is its role compared to builtin types,
types, collections, and collections.abc. The worries
that Guido has about the types module being ill-suited for type
hinted are spot on, we should mention that in the document.

Define conventions for stubs

The PEP should define some conventions for library stubs. Here are some things that might be relevant:

How to declare variable types? Maybe use Undefined(...) exclusively, so that a parser that processes the stubs doesn't need to be able to access comments?
Do we need to use __all__, or should this be implicit? What about visibility of names imported from other modules?
Should we have a convention for recognizing a Python file as a stub file? Maybe they could optionally have a different extension, or maybe any files in directory named stubs are considered stubs automatically?

Should we encourage abstract types?

In principle yes, in practice there are issues (long type names, strings and bytes being iterables and sequences).

Do we need a standard way of silencing checkers?

It may be useful to have a way of saying "don't complain about this type error; trust me, I know what I'm doing". For example, consider this code:

def foo(f: Function[...]) -> None:
    f._attr_ = True   # Error? But what if this is intentional?

A type checker would probably not allow assigning to an attribute of a function. However, sometimes programmers do this, and it works for user-defined functions. Maybe we could have a standard way of asking the type checker to shut up and ignore this particular warning only.

If we don't specify this in the PEP, there could end up being multiple tool-specific ways of doing this.

Apparently pylint supports something like this:

# pylint: disable=fixme

Hack also has something like this (example copied from Hack docs):

<?hh
function unsafe_foo(int $x, int $y): int {
  if ($x > $y) {
    // UNSAFE
    return "I am not checked by the type checker"; // Covered by UNSAFE
  }
  return 34; // NOT covered by UNSAFE
}

Link to relevant Hack docs: http://docs.hhvm.com/manual/en/hack.modes.unsafe.php

Annotations for class and object fields

Right now it doesn't look like there's a way to specify class and object fields to have types. Of course, the analyzer can see method definitions and use them to create class/object types, but what's the right thing to do with a class definition like

class C:
  x = 42

Here there is no annotation to suggest that there would be anything wrong with a statement like C.x='foo'' or C().x='bar', and the programmer might want them to succeed (i.e. C.x has type Any); on the other hand it seems highly desirable to be able to specify that x is an int in some way if that's their expected behavior. The # type : syntax could work here, but it would be nice to have an alternative for analyzers that don't parse comments.

Additionally, having a distinction between class fields and object fields would be nice. This enable analyzers to report that, in a program like

class D:
  def __init__(self):
    self.x = 42

the field lookup D().x is well-typed and D.x might not be. It seems like mypy doesn't have this distinction, though I may be missing a way to do so.

Two ways to do this that come to mind: any assignments to self in __init__ will be treated as specifying object fields (and a # type : or similar sort of annotation indicating that it has a non-Any static type). Or, the classdef could contain calls similar to Undefined that specify its fields; like

class D:
  x = Field(int)
  def __init__(self):
     self.x = 42

The advantage of the latter approach is that it lets programmers explicitly decide what the interface to their object is, rather inferring it.

(We currently use a third approach, specifying class/object fields through decorators, but that doesn't seem to fit with the style proposed in the PEP, and I'm not a big fan of it anymore either after seeing the better stuff you've come up with :)

Annotating namedtuples item types

There should be a way of annotating the item types of named tuples. Here is my suggested syntax:

X = namedtuple('X', ['a', 'b']) # type: (int, str)

The type of a would be int and the type of b would be str.

Make everything abstract and consistent

remove Dict, List, Set, and FrozenSet.

create a new type for abstract Sets without mutability methods (likely similar to what frozenset can do) and rename AbstractSet to MutableSet.

leave Tuple because it has special status as “ordered collection of fixed length with possibly different types per element”

Should we make built-in classes special?

Mypy allows marking some classes as 'built-in' in stubs, since these classes (C classes in CPython) are semantically different from ordinary classes. For example, multiple inheritance is different:

class X(int, str):  # multiple bases have instance lay-out conflict
    pass

Mypy uses the typing.builtinclass decorator for this. Here's a simplified example from mypy stubs for builtins:

from typing import builtinclass

@builtinclass
class str(...):
    ...

Here are some questions:

Should we define a way of annotating a class as a built-in class?
Multiple inheritance is possible from some sets of built-in classes. Do we need to able to define precisely when a built-in class has an instance layout that is different from the base class?
What's the best term for this -- should we call them built-in classes, as 3rd part modules can also define them?

Other uses of annotations

Currently the proposal suggests a dictionary-based syntax for allowing the arbitrary annotations currently used in Python 3 to coexist with type annotations (https://github.com/ambv/typehinting/blob/master/pep-NNNN.txt#L194):

def notify_by_email(employees: {'type': List[Employee], 'min_size': 1, 'max_size': 100}): ...

This meshes well with existing usage if the existing non-type annotation is already a dictionary, but if it's some other kind of value it forces additional refactoring. Maybe instead, typing could provide a TaggedType or similar class, which cleanly separates type information from other information that the programmer wants in their annotations:

def notify_by_email(employees: TaggedType(List[Employee], {'min_size': 1, 'max_size': 100}): ...
or
def notify_by_email(employees: TaggedType(List[Employee], 'list of employees to be notified')): ...

The first argument is always a type, and the second argument is any arbitrary value.

This would also free up dictionaries for possibly representing structural types/protocols (see #11).

TypeVar example in PEP is confusing

[I think Guido pointed this out elsewhere, but maybe this should be addressed separately here so that it won't be forgotten.]

The following type variable constraint example from the PEP is confusing:

from typing import Iterable
X = TypeVar('X')
Y = TypeVar('Y', Iterable[X])
def filter(rule: Callable[[X], bool], input: Y) -> Y:
...

I think this could be written equivalently without having a constraint:

from typing import Iterable
X = TypeVar('X')
def filter(rule: Callable[[X], bool], input: Iterable[X]) -> Iterable[X]:
...

Also, if this is similar to the values keyword argument to mypy's typevar, having a type variable with a single value would be equivalent to having a type alias (since subtypes are not valid), so the only interesting use cases would have 2 or more constraints.

There is actually no non-trivial example of using constraints in the PEP. I suggest adding an AnyStr example, as this seems like the most common use case for constraints (at least of the sort supported by mypy right now). Maybe the discussion should mention how this is different from bounded polymorphism/quantification as in Java, for example, as this is probably the most non-mainstream feature introduced in the PEP.

Convert README.rst into separate issues

(Or update existing issues in case of overlap.)

The PEP examples should be complete and meaningful

Examples are important. Following advice by Josh Bloch, I propose that we use meaningful variable, class and function names, and complete (if simple) examples. Death to foo and bar (or spam and ham). For example, if we're talking about subclass relationships, we can use Employee and Manager as class names; when talking about string functions we could use examples from filename manipulation (e.g. determining the extension).

There currently are some examples using formatting operations (% or .format()); these are not advisable. Other examples use 'Callable' without specifying a signature, which is awkward because in a typical situation you are also going to call a callable with a specific set of arguments.

Convertible-to constraints

Many type constraints in both stdlib and external functions are not of the form "x is an instance of type T (or a subtype thereof)", but of the form "x is something that can be converted to type T (or a subtype thereof)".

In fact, the de-facto dynamic type system in Python is in some ways closer to C++'s static type system, which is built all around conversions, than to a more traditional static type system. But in C++, convertibility is explicitly definable.*

The problem is that there are a variety of kinds of type-conversion systems in use, many of them user-extensible, including:

Dunder-method protocols, like "x.__index__ exists". This one's almost easy—an ABC with a __subclasshook__ takes care of it. Except that the static type checker then has to understand ABCs besides the ones that are built in, down to the level of being able to call their __subclasshook__ methods.
Construction: T(x) would succeed. If there were some way to specify "any type that matches the second argument type of T.__new__ and T.__init__", but that isn't.
Alternate construction: Some function f(x) (or classmethod T.f(x)) would succeed. (For example, I think the only hard part of defining NumPy's "array-like" is that it's a Union of a bunch of things—ndarray, the NumPy scalar types, native Python numbers, and anything for which np.array(x) is allowed.)
Delegation to methods. Besides the obvious, this includes delegation to methods which are explicitly intended to be overridden in subclasses. For example, JSONEncoder.encode takes any type that self.default takes.
Delegation to attributes: T(x._as_parameter_) would succeed. This can also include callables; the return type of a ctypes function is f.restype if it's not callable, but if if it's a Callable[[X], Y] it's Y.
Registries. Consider how various SQL query libraries handle "can be stored in a REAL column" by, e.g., looking in a map to see if register_adapter(type(x), REAL) has been called.

Not all of these cases need to be statically type-checked, of course. (And ctypes seems like it pretty obviously could/should be a special case, at least in a v2 proposal.) But I think things like MySequence.__getitem__ or np.add are the kinds of things people might want to type. And it would be nice to be able to declare that my function can take anything convertible to, say, ip_address, instead of having to manually Union together ip_address and its constructor's argument type. And so on.

_{* It's actually quite a mess. An implicit conversion (like passing an argument to a function) you have both built-in rules (subtype conversion for reference or pointer types, const qualification, traditional C coercions, etc.) and user-defined (either x.operator T() exists and is accessible, or there's an unambiguous overload for T(x) and it's accessible and not explicit) ones, and can have two conversion steps as long as only one of them is user-defined; an explicit conversion (like assigning an initial value to a variable declaration) has different rules. But at least it's a clearly-documented mess of unbreakable rules, and if you want to extend it with, say, a registry of adapters, you have to write templates that do that at compile time in terms of the rules.}

The PEP Abstract should probably say a little more about the shape of the proposal

What if a function has argument types but no return type?

The PEP draft has an example where there are argument type annotations but no return type annotation. What does it mean? Mypy currently treats the return type in a case like this as Any, but this is probably confusing, and it's easy to forget to give a return type. Maybe this should be disallowed?

For example, should this code be valid:

def f(x: int):
    return x + 1

There at least 4 potential approaches:

Treat missing return type as Any.
Treat missing return type as None.
Treat missing return type as an error (if any argument has an annotation).
If return type is missing but an argument has a type, infer return type automatically.

I'm inclined to support 3. I'm against 2 (because of inconsistency for functions which take no arguments, which must have an explicit None return type as otherwise they don't have an annotation at all) and 4 (explicit is better than implicit, implementation complexity).

How to annotate variables?

In mypy there are two ways to annotate a variable: with a # type: comment or a cast expression. The following are equivalent from mypy's POV:

    result = []  # type: List[int]
    result = List[int]()

Each has its disadvantages:

Jython's parser doesn't preserve comments and they don't want to change the parser
The cast expression incurs significant overhead (List[int] creates a new parametrize type object)

There's no clear solution at this point; something has got to give (though the PEP should probably support both and point out the issues).

For some specific cases (e.g. a variable initialized to an empty concrete container that is used directly in a 'return' statement) we may be able to improve the type inferencer, but in general that's a path we would rather not take (there's a good reason we start with function annotations only).

In the future (e.g. Python 3.6 or later) we may be confident enough to introduce variable annotations, e.g.

var result: List[int] = []

or perhaps even

result: List[int] = []

But this remains to be seen, and we definitely don't want to do this for the current PEP or Python 3.5. Plus, it might still incur exactly the same runtime overhead.

Description of Any in PEP does not match philosophy document

The PEP say this about Any:

A special kind of union type is Any, a class that responds
True to issubclass of any class. This lets the user
explicitly state that there are no constraints on the type of a
specific argument or return value.

But Any is actually quite different from a union type (see discussion of gradual typing in Guido's philosophy document), so it should be introduced as a separate concept. The above makes Any sound equivalent to object, which is not the case.

What should bare Union be?

Background: I am trying to implement a prototype of what the typing.py module for Python 3.4 might look like. I am trying to make it so that e.g. typing.Sequence is a subclass of collections.abc.Sequence (without modifying the latter). I think this is working out fine for the most part, e.g. typing.Sequence[int] is a subclass of typing.Sequence is a subclass of collections.abc.Sequence, and builtins.list is a (virtual) subclass of typing.Sequence (it reuses the registry of collections.abc.Sequence). I am implementing isinstance() and issubclass() hooks (__instancecheck__ and __subclasscheck__) so that you can do runtime type checks if you really want them, e.g. isinstance([1,2,3], typing.Sequence). I think I'll make it so that isinstance([1, 2, 3], tying.Sequence[int]) is also true, and isinstance([1, 2, 3.13], typing.Sequence(int)) is false -- yes, this is slow for a large list, caveat emptor.

So now I am getting to unions. Most things are straightforward, e.g. Union[int, float] == Union[float, int], isinstance(42, Union[int, float]), Union[int] is int, Union[int, Employee, Manager] == Union[int, Manager], etc. But now I am looking at edge cases, and I'm not sure where in the type system bare Union fits. It seems clear that Sequence[int] is a subclass of Sequence, but should Union[int, float] be a subclass of Union? Is bare Union even a type? Perhaps it should mean the same as Union[](which isn't valid syntax), i.e. the union of no types, i.e. a type that has no instances? I guess that would mean that bare Union is actually a subclass of all types, not the other way around. I.e. Union == Any???

But then what should an introspection type use if it wants to check whether a given type object (e.g. retrieved from an annotation) is a Union? Maybe isinstance(t, UnionMeta)?

Signal that a decorator implies `@no_type_check`

The idea is simple: some way to signal that a decorator implies no_type_check.

I'm not sure of the ideal syntax. Maybe it's no syntax at all: if no_type_check is automatically transitive through decorators, then if I define a top-level function with @no_type_check and then use that function (or a call of that function) as a decorator, the decorated function is also not type checked? But I don't know if that's too complex to require the type checker to handle.

The motivating example is frameworks that expose Python functions to other environments—wrap them up as CLI functions, or RPC server methods, or IRC bot handlers, or XMPP services, or whatever. Some of these frameworks (optionally) use annotations to express how command-line/service/etc. arguments get mapped to function parameters, and many more of them could. (The author of clize just brought this up on python-ideas as an argument for opt-in instead of opt-out typing, but I don't think it's a very good argument for that; if anything, it's just an argument for making opt-out easier.)

There are two basic patterns for this: either you subclass some framework class and your methods are exposed, or you write regular functions and decorate them with @clify.cli or @xrpc.service('spam.eggs') or similar.

To handle the latter cases, instead of having to additionally decorate each of those functions as @no_type_check or mark the whole block of exposed functions with # typing: OFF or whatever, it would be nice to just decorate the decorator itself, in the framework, and then the users of the framework don't have to do anything at all.

Type comment and multiple assignment

Mypy supports these syntax variants for declaring the types of multiple variables:

a, b = 1, 'x' # type: (int, str)
# or
c, d = 1, 'x' # type: int, str
# or
e, f = Undefined, Undefined # type: int, str
# or
g, h = Undefined(int), Undefined(str)

I don't think that this case is mentioned in the PEP draft.

An alternative would be use Tuple[...] for this, but I think that this is a bit confusing, since no variables actually will have a tuple type:

a, b = 1, 'x' # type: Tuple[int, str]

Find a better place to host the "theory of type hinting" article

Maybe an informational PEP? This is about https://quip.com/r69HA9GhGa7J .

Should we adopt mypy's @overload?

In mypy there's an @overload decorator. However, it was introduced before Union, and the latter is usually sufficient (or using type variables).

Is it still needed? If we need it, should it be part of this PEP or should it be a separate proposal for a runtime feature to be added to functools (like @singledispatch)? But if separate, how would it interact with generic types?

Some words to avoid needless worrying

On python-ideas I wrote this:

At the same time -- let me emphasize this one more time -- I hope there
will never be a time in the future where type hints are mandatory or
otherwise always expected to exist.

It was remarked that this should be stated clearly in the PEP. (Perhaps also other stuff I wrote in the same message.)

Should the PEP retain the list of comparable projects?

I'm thinking that the PEP will be long enough without also having to contain a survey of how other languages or frameworks approach similar issues. So perhaps we can move your (admirable, and useful!) survey somewhere else? Maybe a separate informational PEP that we can reference, so we won't have to worry about where to host it.

How to declare type variables

There are two issues around type variables:

What function to call a type variable
How to specify constraints (and what do the constraints mean)

Currently, mymy uses

T = typevar('T', values=[t1, t2, t3])

while the PEP currently proposes

T = Var('T', t1, t2, t3)

In any case it should be noted in the PEP that this is not the same as

T = Union[t1, t2, t3]

The reason is quite subtle, and the best example is currently the predefined type variable

AnyStr = Var('AnyStr', bytes, str)

Consider a polymorphic function that does something to filenames and works on both bytes and str, e.g.

def extension(file: AnyStr) ->AnyStr:
   return file.rsplit(b'.' if isinstance(file, bytes) else '.', 1)[1]

We really need AnyStr to be a type variable here, because we want to express that if the argument is a bytes, so is the return value, and if the argument is a str, the return value is too.

But that's not all! Such a type variable is constrained to exactly the given types. Consider the case where the argument is an instance of a user-defined subclass of bytes. We don't want the declaration to mean that the return value is then also an instance of that same subclass -- we want it to mean that the return value is a bytes (not a str).

I believe this makes the use of such a type variable equivalent to a collection of overloaded functions; in the above example

def extension(file: bytes) -> bytes:
    ...
def extension(file: str) -> str:
    ...

Open issues:

What should the rules be if the type variable is unconstrained?
Other languages often have a syntax for specifying a different type of constraint on type variables, e.g. the presence of a certain method; and then the implied type usually does vary with the actually specified type (I think). Do we need a way to do this?
Should we name this Var(), typevar(), or something else? (Var() is a PEP 8 violation.)
Should the constraints be additional positional arguments, or a keyword argument? If the latter, what should it be named?

Type for heterogeneous dictionaries with string keys

I've recently been reading Python code where heterogeneous dictionary objects are used a lot. I mean cases like this:

foo({'x': 1, 'y': 'z'})

The value for key 'x' must be an integer and the value for key 'z' must be a string. Currently there is no precise way of specifying the type of the argument to foo in the above example. In general, we'd have to fall back to Dict[str, Any], Mapping[str, Union[int, str]] or similar. This loses a lot of information.

However, we could support such types. Here is a potential syntax:

def foo(arg: Dict[{'x': int, 'y': str}]) -> ...: ...

Of course, we could also allow Dict[dict(x=int, y=str)] as an equivalent. I don't really love either syntax, though.

Alternatively, we could omit Dict[...] as redundant:

def f(arg: dict(x=int, y=str)) -> ...

Using type aliases would often be preferred:

ArgType = Dict[{'x': int, 'y': str}]

def f(arg: ArgType) -> ...

These types would use structural subtyping, and missing keys could plausibly be okay. So Dict[dict(x=int, y=str)] could be a subtype of Dict[dict(x=int)], and vice versa (!).

Maybe there should also be a way of deriving subtypes of heterogeneous dictionary types (similar to inheritance) to avoid repetition.

Maybe we'd also want to support Mapping[...] variants (for read-only access and covariance).

Some existing languages have types resembling these (at least Hack and TypeScript, I think).

Protocols (a.k.a. structural subtyping)

In mypy's typing,py there's a neat feature called Protocol. Perhaps we should add this to the PEP. Or perhaps it's similar to a Python ABC?

Dealing with modules that don't expose type objects

Some modules (for example, re) don't expose some of the underlying type objects, which makes it tricky to annotate code that uses these types. Mypy defines custom type aliases in typing for some of the more common types (including pattern and match object types for re), but this does not easily extend to 3rd party libraries.

Example:

>>> import re
>>> r = re.compile('x')
>>> type(r)
<class '_sre.SRE_Pattern'>
>>> import _sre
>>> _sre.SRE_Pattern
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'SRE_Pattern'

A related issue is types that are implicit, such as protocol types / structural types (in case we support them). Where should a structural type needed for annotating a library module be defined?

Also, generic library types pose a similar problem, since vanilla Python classes don't support indexing, and we'd have to use the string escape syntax to use a library class as a generic type.

Here are alternatives I've came up with:

Fall back to the type Any if the type object is not available. This is clearly suboptimal.
Define the type in typing or a related module as an alias. This does not scale to an arbitrary number of modules, but it would probably work for the most common cases. We could fall back to Any elsewhere.
Define a parallel module hierarchy for additional typing-related definitions, such as typing.re for re. Not sure how this will work if these modules come from multiple sources. Should this hierarchy be managed centrally?
Expose the type objects of standard library modules in Python 3.5 (as was suggested by Guido). This does not help with earlier Python releases. In addition to just exposing the types they should also have meaningful names, unlike _sre.SRE_Pattern in the above example.

Support Any(val)?

I just noticed that mypy's typing.py says that Any can be used to cast a value to type Any. E.g.

x = 42  # type: int
y = Any(x)
y.append(42)  # No error

Do we want/need this? What other notation could be used that we already support?

What to do with IO, TextIO and BytesIO

Mypy's typing has ABCs IO[AnyStr], TextIO and BytesIO for file-like objects. TextIO and BytesIO are subclasses of IO[AnyStr]. They have various issues, including these:

IO[Any] can be used to represent arbitrary file-like objects, but it doesn't support properties and methods only supported by text or binary files, but not by both. Maybe these should be added to the IO ABC, so that IO[Any] could be used whenever we don't statically know whether a file is a text or a binary file (but the programmer may be able to predict this).
If we implement the above change, we may be able to remove TextIO and BytesIO or make them aliases of IO[str] and IO[bytes], respectively.
There is no way to represent readibility or writability of files. Maybe there should be separate types for readable, writable and readable+writable files.
The write method of IO[bytes] only accepts bytes objects, since there is no way in the type system make it more general (e.g., to also accept bytearray objects). BytesIO can have the more general method type, however.

A problem with bool subclassing int and JSON; excluding subclasses using "exactly()" notation?

This came up in python-idea under the subject "Should bool continue to inherit from int?" by Michael Mitchell.

Suppose I have a function that takes an integer argument and then JSON-encodes it and sends it to some server that requires a JSON int. Now suppose you are adding type hints to your code, and you add ": int" to the parameter under discussion. And now suppose you have a faulty caller which calls this function with the argument set to True. This program will type-check correctly, because True is a bool which is a subclass of int, but it will run incorrectly, because (under this assumption) True will be converted to a JSON true value, which the server rejects.

My hunch is that we shouldn't try to address this but instead recommend using a schema-based way of generating JSON. But it's an interesting concern nevertheless. Maybe it should be possible to tell the typechecker that exactly an int is required, and a subclass won't do? I think this came up in some other contexts too.

[TBD: cross-links]

Syntax, semantics and use cases of forward references

The best proposal we have for forward references is:

If a type annotation must reference a class that is not yet defined at the time the annotation is evaluated at run time, the annotation (or a part thereof) can be enclosed in string quotes, and the type checker will resolve this.

This typically occurs when defining a container class where the class itself appears in the signature of some of its methods; the (global) name for the class is not defined until after the body of the class definition has been executed. (In some cases there may be multiple mutually recursive classes, so a shorthand for "this class" isn't really enough.)

A typical example is:

T = TypeVar('T')
class Node(Generic[T]):
    def add_left(self, node: 'Node[T]'):
        self.left = node

Note that the entire annotation expression 'Node[T]' is quoted, not just the class name Node (because 'Node'[T] makes no sense).

The question I'm trying to ask here is whether there is a reasonable limit to the complexity of the syntax that we must support inside string quotes. And if not, whether we may need to invent some other way to specify forward references. For example, something like this has been proposed:

T = TypeVar('T')
Node = ForwardRef('Node')
class Node(Generic[T]):
    def add_left(self, node: Node[T]):
        self.left = node

A related question is whether the __annotations__ dict should be "patched" to reference the intended class, or whether it is up to the code introspecting __annotations__ to interpret forward references (and if the latter, what kind of support typing.py should export to help).

I've got a feeling that the ForwardRef() approach makes things a little easier for the runtime support, at the cost of slight verbosity. The question is how common forward references really are (apart from stubs for builtin/stdlib container).

Should int be compatible with float and mypy's @ducktype decorator

Mypy treats int as compatible with float (and float as compatible with complex) even though the types aren't ordinary subtypes of each other. This is implemented via a @ducktype class decorator that allows specifying that a class should be treated essentially as a subclass of another class, even though there is no concrete subclass relationship and even if the interfaces aren't actually fully compatible. The mypy stub for int looks like this (a bit simplified):

@ducktype(float)
class int(... a bunch of ABCs ...):
    ...

This could also be useful for things like UserString (which is like str but not a subclass) and mock classes in tests.

At the least, we should specify int as compatible with float so that people won't have to write code like this:

Float = Union[int, float]

def myfunc(x: Float) -> float:
    return x * 5.0 - 2.0

Of course, all stubs for built-in modules would have to use unions everywhere for floating point arguments for the above to work in practice, and I'd like to avoid that as it seems like extra complexity with little benefit. The above would not let use some float methods for values with the union type (such as hex() which is only defined for float objects but not int objects).

If using the ducktype approach (or special casing some classes to be compatible and not making ducktype public) all float methods would be available, but they might fail at runtime if the actual object is an int. I think this is reasonable since these methods are pretty rarely used.

As a bonus, the ducktype decorator would effectively make all classes abstract, hopefully helping address the abstract vs. concrete type problem (see #12, for example).

Prefer the mypy restrictions on type variables

the PEP has this example:

X = Var('X')
Y = Var('X')
  def first(l: Sequence[X]) -> Y:   # Generic function
      return l[0]

In mypy, the equivalent thing with typevar() won't work -- Y = typevar('X') is rejected because the variable name is not the same as the name argument, and a type variable imported from another module with the same name still doesn't make the example work.

I like the mypy restrictions better.

Define kinds of conditional definitions that analyzers / type checkers should support

For things like Python 2/3 compatibility we'd often want use conditional type alias definitions. Example:

if PY2:
    text = unicode
else:
    text = str

def f() -> text: ...

A static analyzer or a type checker can't support arbitrary conditional definitions. Maybe the PEP should define a minimal set of conditional definitions that should be supported. Here are some ideas:

Have some way of checking whether we are on Python 2 or 3. Maybe provide typing.PY2 and typing.PY3 constants?
Literal True / False and 1 / 0 should be supported.
Maybe platform-specific definitions?
Maybe arbitrary boolean constants?

There should be a way to tell the type checker not to check (specific) 3rd party packages

Suppose I am writing an app using type hints but I am importing a 3rd party package that uses annotations for some other purpose. And suppose the author of that package isn't motivated to add # type: OFF comments to the top of every module, and I don't want to have to patch the package every time I install or upgrade it. Then there should be a way to tell the type checker not to check that package, or at least to ignore all annotations in it.

This can probably be a configuration option for the type checker, so perhaps this should be a mypy issue, not a PEP issue, but I think this would be a useful thing to mention in the PEP so people don't worry too much about 3rd party code using annotations for other purposes.

(An alternative would be not to type-check modules that don't import [from] typing, but I think that's limiting, because lots of code never needs any of the facilities defined there, it just wants to use annotations that use built-in types or locally-defined classes.)

The use of comments to communicate with the type checker

mypy supports comments of the form # type: <some_type> and there are proposals for other uses of comments (e.g. # typing: off). Comments are a convenient solution for things that are hard or inefficient to express in existing Python syntax (i.e. code that works at runtime). But they have the downside that they are inaccessible to runtime machinery that might want to use type annotations, and the Python AST module doesn't preserve comments, so code that needs access to the comments must implement its own parser (like mypy does).

I ask: should we define such comments in the PEP or not?

Be able to handle str/bytes in Python 2/3 code

In Python 2/3 code, what str and bytes represent can be considered somewhat muddled depending under what interpreter you are executing (and unicode should be simply left out). I'm not sure if the preferred approach is to have tools like mypy assume str means unicode in Python 2 and str in Python 3 and bytes means Python 3 bytes of to have typing.py have Str and Bytes types to make it abundantly clear. Since function annotations are Python 3 syntax I'm assuming the tools will be more-or-less be assuming Python 3 semantics, but it might be good to state upfront that's the expectation when specified in Python 2/3 code. Going with the former approach does mean the usefulness to Python 2-only code is diminished since the concept of native strings becomes hard to express. The latter solution has the annoyance of not using the built-in types.

Declaring type of variable without initializing with a value

Mypy has the Undefined helper that serves two purposes.

First, it can be used as a dummy value when declaring the types of variables:

from typing import Undefined, List
x = Undefined # type: List[int]

This declares x as a list of int, but it doesn't give x a valid value. Any operation on Undefined (other than passing it around and operations that can't be overloaded) will raise an exception.

Second, Undefined can be used to declare the type of a variable without using a comment:

from typing import Undefined, List
x = Undefined(List[int])

The second example is semantically equivalent to the first example.

In my experience, being able to declare the type of a variable without initializing it with a proper value is often useful. Undefined is better than None for this purpose since None is a valid value. If we see Undefined values in tracebacks then we know that somebody forgot to initialize a variable, but if we see a None value things are murkier. Also, None are valid for certain operations such as if statements and equality checks (and dict lookups), and thus an unexpected None value can easily go undetected. This is less likely with an Undefined value.

Also, we'd like to be able to type check uses of None values (for example, by requiring an Optional[...] type), but Undefined is compatible with all types.

Some tools/implementations (jython, for example, according to Jim Baker) can't easily access types declared as comments, and the Undefined(...) syntax could be used as a workaround.

If we decide to support both of the above variants (+ just using a # type: comment without Undefined), we have the problem that there is no longer a single obvious way to declare the type of a variable. I don't think that the inconsistency is a major issue, since all of the variants serve useful purposes.

Equivalence between similar-looking types

Lukasz asks a series of questions about type equivalence.

Set is not Set[Employee]. Correct!
Set != Set[Employee]. (He didn't ask this, but I'm inserting it in the list because it's a useful question to ask.) Correct again.
Set is Set[Any]. No. I'm tempted to make these different objects, even if they have the same meaning, just so that we can have G[t1][t2] == G[t1, t2] for any generic type taking two arguments. For example a dict with string keys but unspecified values could be written as Dict[str] or Dict[str, Any], but only the first form could be further parametrized with a value type, so Dict[str][int] is equivalent to Dict[str, int].
Set == Set[Any]. Again, no, since these behave differently.
Set[Employee] is Set[Employee]. I think not. Given that the is operator means object identity, this would require some kind of caching of all parametrized type objects, and I'd like to avoid that.
Set[Employee] == Set[employee]. Yes. (The cat should be out of the bag by now. I am proposing an eq method on generic type objects to make this true.)
issubclass(Set[Employee], Set). Hm... This reeks of the covariance/contravariance discussion (issue #2). But for the special case of a parametrizable type and a parametrized version of that type I think this is reasonable.
issubclass(Set[Manager], Set[Employee]). See issue #2.

Describe generic classes as base classes

The PEP should probably mention what happens if a generic class is used as a base class.

Examples to consider:

class StrList(List[str]): ...  # StrList is not generic

T = TypeVar('T')

class MyList(List[T]): ...   # Mypy requires Generic[T] as a base class, but it's ugly

x = Undefined(MyList[int])

What's the proper way to spell the type of a callable?

Python supports many types of parameters -- positional, with defaults, keyword only, var-args, var-keyword-args. The intended signature can not always be unambiguously derived from the declaration (the 'def' syntax). Built-in functions provide additional issues (some don't have names for positional parameters).

The first point to make is that we needn't provide ways to spell every signature type. The most common use case for signatures is callback functions (of various kinds), and in practice these almost always have a fixed number of positional argument. (The exception is the callback for built-in functions like filter() and map(); the callback signatures for these correlate in a weird way to the other arguments passed, and many languages deal with these by overloading a number of fixed-argument versions.)

Second, Python tries hard to avoid the word "function" in favor of "callable", to emphasize that not just functions defined with 'def' qualify, but also built-ins, bound methods, class objects, instances of classes that define a call method, and so on. In mypy, function types are defined using Function, which I actually like better than Callable, but I feel that my hands are bound by the general desire to reuse the names of ABCs (the abc and collections.abc modules) -- collections.abc defines Callable.

Given a function with positional arguments of type (int, str) and returning float, how would we write its signature? In mypy this is written as Function[[int, str], float]. In general, mypy's Function is used with two values in square brackets, the first value being a list of argument types, the second being the return type. This is somewhat awkward because even the simplest function signature requires nested square brackets, and this negatively impacts the readability of signature types.

But what else to use? Using round parentheses runs into all sorts of problems; since Callable is a type it looks like creating an instance. We could drop the inner brackets and write Callable[int, str, float](still reusing the same example). My worry here is that in many cases the return value is irrelevant and/or unused, and it's easy to think that the type of a procedure taking two bool arguments is written Callable[bool, bool] -- but actually that's a function of one bool argument that also returns a bool. The mypy approach makes it easier to catch such mistakes early (or even to allow Callable[[bool, bool]] as a shorthand for Callable[[bool, bool], None]).

I think I've seen other languages where the return type comes first; arguably it's harder to forget the first argument than the last. But everywhere else in Python the return type always comes after the argument types.

In the end I think that mypy's notation, with Callable substituted for Function, is the best we can do. (And if you find Callable too awkward, you can define an alias "Function = Callable" or even "Fun = Callable".)

An Intersection type?

Sometimes you want to say that an argument must follow two or more different protocols. For example, you might want to state that an argument must be an Iterable and a Container (for example, it could be a Set, a Mapping or a Sequence). It would be nice if this could be spelled like this:

def assertIn(item: T, thing: Intersection[Iterable[T], Container[T]]) -> None:
    if item not in thing:
        # Debug output
        for it in thing:
            print(it)

Structural typing

I'm not entirely sure what this is (probably a different work for duck typing, where two classes are equivalent if they define the same methods, public attributes etc.). In Jeremy Siek's notes:

Retroactive Conformance

Jeremy wants to allow concrete types to be usable with
abstract types for which they did not inherit from.
Structural types allow this, and there is research on how to
do this with nominal types.
Jukka voiced the concern that some of those approaches give
up modular type checking. Jeremy agreed that modular type checking
is important, but that perhaps there's a way to have our cake
and eat it too.

This issue is related to the use of the register method on ABC's to
get the right behavior with isinstance.

Structural vs. Nominal Types

We briefly discussed the possibility of having structual
object types in addition to the nominal types currently in MyPy.
Reticulated currently suppose structural object types and
not nominal types.

Covariance or contravariance

As Lukasz mentions in NOTES-ambv.txt: "The covariance/contravariance discussion needs to happen at some point."

Reference: http://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)

I haven't fully digested all the viewpoints yet. It seems that mypy stays on the safe side and treats e.g. Set[Employee] and Set[Manager] as distinct types. However, Set[Employee] is acceptable where a Set (without parameter) is expected.