quora / pyanalyze — A Python type checker

License: Apache License 2.0
If pyanalyze sees an access to an attribute that it doesn't know about on a typed value, it defers the check to the end of its run, then compares the set of attribute reads to the set of attribute writes and emits an error if it thinks the attribute doesn't exist. This is a weird mechanism that I'd like to move away from, but it might not always be possible.
However, there are a few categories of classes for which we can safely assume we know all attributes (unless they override `__getattr__` or `__getattribute__`, but then all bets are off):
- classes that define `__slots__`
- classes without a `__dict__`
For such classes, we can skip calls to the ClassAttributeChecker and just emit an error immediately if we see an access to an attribute that doesn't exist.
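As a runtime illustration of why such classes are safe to check eagerly (this is a demonstration, not pyanalyze's actual check): a class that defines `__slots__` and has no `__dict__` rejects writes to undeclared attributes, so its full attribute set is known up front.

```python
# A class with __slots__ has a fixed attribute set: writes outside the
# declared slots fail at runtime, so a checker can report an unknown
# attribute immediately instead of deferring to the end of the run.
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

p = Point(1, 2)
assert not hasattr(p, "__dict__")  # no per-instance dict to hide attributes
try:
    p.z = 3  # not declared in __slots__
except AttributeError:
    pass
else:
    raise AssertionError("expected AttributeError")
```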
Currently pyanalyze just gives up on typechecking any call that passes *args or **kwargs. Let's fix that. #51 could perhaps help here. The code doing this is at:
pyanalyze/pyanalyze/name_check_visitor.py, line 3287 in 7366253
Is there a way to access the parsed AST, with the inferred types? I want to write a code translation tool, and knowing the types would be very useful.
I mean, having a regular AST (similar to `ast.AST`) but with added attributes, such as `declared_type` and `inferred_types`.
I tried to look through the code, but couldn't figure it out.
For compatibility, pyanalyze should support `# type: ignore` in addition to `# static analysis: ignore`. Error-code-specific ignores should still work the same way. The functionality for using a custom ignore comment should not accept `# type: ignore`.
This can be implemented mostly in `node_visitor.py`.
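A minimal sketch of recognizing both comment styles, including error-code-specific ignores (the regex and helper name are assumptions, not pyanalyze's actual implementation):

```python
import re

# Accept both "# static analysis: ignore" and "# type: ignore",
# optionally with specific codes like "ignore[undefined_name]".
IGNORE_RE = re.compile(
    r"#\s*(?:static analysis|type):\s*ignore(?:\[(?P<codes>[\w,\s]+)\])?"
)

def ignored_codes(line: str):
    """Return None (no ignore), [] (ignore all codes), or a list of codes."""
    m = IGNORE_RE.search(line)
    if m is None:
        return None
    if m.group("codes") is None:
        return []
    return [code.strip() for code in m.group("codes").split(",")]

assert ignored_codes("x = f()  # type: ignore") == []
assert ignored_codes("x = f()  # static analysis: ignore[undefined_name]") == ["undefined_name"]
assert ignored_codes("x = f()") is None
```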
This could potentially make `pyanalyze.extensions.AsynqCallable` redundant.
pyanalyze currently crashes on `Tuple[()]`.
In files with a file-level `# static analysis: ignore`, we should not report unused objects either.
Currently, when there is a type mismatch, we just print out the two full types, but that produces errors like this in code with complicated Unions:
Incompatible argument type for column: expected Union[str, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder.As, qclient.db.builder._UnaryFunc, qcore.helpers.MarkerObject] but got Union[Any, str, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder.As, qclient.db.builder._UnaryFunc, qcore.helpers.MarkerObject, tuple[str, Union[int, float, str, bytes, enum.Enum, qtype.ThriftEnum, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder._UnaryFunc, qclient.db.builder.SubSelect, collections.abc.Collection[Union[int, float, str, bytes, enum.Enum, qtype.ThriftEnum, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder._UnaryFunc, qclient.db.builder.SubSelect]]]]] (code: incompatible_argument)
Instead of dumping out the whole type, the error message should tell us which part of the union on RHS isn't assignable to the LHS. Similarly, if the RHS is a DictIncompleteValue and some specific value in the dict isn't assignable to the expected type, tell us which key-value pair it is.
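The pinpointing logic could look roughly like this, with plain classes standing in for pyanalyze's Value objects (the helper is hypothetical):

```python
# Instead of echoing both full unions, report only the RHS members that
# are not assignable to any LHS member.
def incompatible_members(expected, actual):
    return [
        act for act in actual
        if not any(issubclass(act, exp) for exp in expected)
    ]

# Only bytes is reported; bool is assignable because it subclasses int.
assert incompatible_members([str, int], [str, bytes, bool]) == [bytes]
```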
`"x" in Union[None, str]` doesn't show an error even with #85 enabled.
Pyanalyze currently produces no errors on this code:
def f(cond: bool) -> int:
    if cond:
        return 0
pyanalyze currently produces no errors on this code:
from asynq import asynq

@asynq()
def f() -> int:
    yield f.asynq()
For example, `reveal_type(formatter.NullFormatter)` shows:
Revealed type is 'Literal[<class 'formatter.NullFormatter'>]', signature is (writer=Composite(value=KnownValue(val=None), varname=None, node=None)) -> formatter.NullFormatter (code: inference_failure)
But it should pick up the annotation in https://github.com/python/typeshed/blob/master/stdlib/formatter.pyi#L9
The unused object finder frequently flags module objects as unused. It looks like this affects modules that are only imported through `from a.b import c`—this marks `a.b.c` as used, but not `a.b`.
A few possible solutions:
- Make `from a.b import c` also mark `a.b` and `a` as used.
For now we're using the second solution internally, but the first one would be better.
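The first solution could be sketched like this (the helper name is made up; the real unused-object finder is structured differently):

```python
import ast

def modules_marked_used(node: ast.ImportFrom) -> set:
    """For `from a.b import c`, mark a, a.b, and a.b.c as used."""
    used = set()
    module = node.module or ""
    parts = module.split(".")
    for i in range(1, len(parts) + 1):
        used.add(".".join(parts[:i]))  # a, then a.b
    for alias in node.names:
        used.add(f"{module}.{alias.name}")  # a.b.c
    return used

tree = ast.parse("from a.b import c")
assert modules_marked_used(tree.body[0]) == {"a", "a.b", "a.b.c"}
```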
In signature.py, we use `follow_wrapped=False` when getting the signature of a callable:
pyanalyze/pyanalyze/arg_spec.py, line 449 in 2ae6915
It would be nice if we didn't, because that would allow us to get accurate signatures for more decorated functions. When I tried this, it caused problems with some decorators that modify the function's signature but use `@functools.wraps`, notably `@mock.patch` (which passes additional arguments to the decorated function). I've also run into issues with `@contextlib.contextmanager`, which changes the return type of the decorated function.
Possible solutions:
- Set a `.__signature__` attribute on the decorated function (but that's not an option for third-party decorators).
- Functions decorated with `@mock.patch` have a `.patchings` attribute; we could internally ignore the parameter corresponding to any entries in `.patchings` that have their `.new` attribute set to `mock.DEFAULT`. But there doesn't seem to be a similar way to identify a function that was decorated with `@contextmanager`.

Pyanalyze currently doesn't support overloaded functions at all. It should. Some thoughts in no particular order:
- Add a `pyanalyze.extensions.overload` decorator that somehow does register the overloads at runtime. It could be an alias of `typing.overload` under `if TYPE_CHECKING` for compatibility with other type checkers. We could add an `@overload_implementation` decorator that collects the other overloads and stores them in the function dict or something.
- `implementation.py`. It might require changes to typeshed-client though.
- `pyanalyze.signature.Signature`. It would hold all of the overloaded signatures in an attribute.

Consider:

@overload
def f(x: int) -> Literal[1]: ...
@overload
def f(x: str) -> Literal[2]: ...

y: Union[int, str]
f(y)
I would want this to return `Literal[1, 2]`, but naively looping over the overloads would not work for that; it would require special handling for Union. Interestingly, though, TypeScript doesn't let you do this. You have to write a separate overload for the Union instead.
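The Union handling could look roughly like this (a toy resolver, with plain types standing in for pyanalyze's values):

```python
# Distribute a Union over the overloads: resolve each member separately
# and union the matching return types. A naive loop would instead try to
# match the whole Union against a single overload and fail.
OVERLOADS = [(int, "Literal[1]"), (str, "Literal[2]")]

def resolve(union_members):
    results = []
    for member in union_members:
        for param, ret in OVERLOADS:
            if issubclass(member, param):
                results.append(ret)
                break
        else:
            return None  # some member matches no overload
    return " | ".join(dict.fromkeys(results))

# f(y) with y: Union[int, str] hits both overloads.
assert resolve([int, str]) == "Literal[1] | Literal[2]"
```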
- Overloads for builtins (`list.append` etc.), for which there are currently complicated impl functions. These overloads would require being able to dispatch on whether a value is annotated with WeakExtension.

Value: Literal[<function NewType.<locals>.new_type at 0x7f92f8d885e0>], signature: Signature(signature=<Signature (x) -> UnresolvedValue()>, implementation=None, callable=<function NewType.<locals>.new_type at 0x7f92f8d885e0>, has_return_annotation=False) (code: inference_failure)
In newt.py at line 6:
3:
4: NT = NewType("NT", int)
5:
6: dump_value(NT)
^
The inferred signature should be `(x: int, /) -> NT`.
We should support PEP 647's TypeGuard. The `pyanalyze.extensions.ParameterTypeGuard` primitive added in #115 is very similar, so it shouldn't be much additional work. Implementation outline:
- Add `TypeGuard` to `pyanalyze.extensions` (unless it makes it into `typing_extensions` first)
- Add `TypeGuardExtension` to `value.py`
- `annotations.py` turns `TypeGuard[T]` into `AnnotatedValue(bool, [TypeGuardExtension(T)])`
- The code in `signature.py` that consumes `ParameterTypeGuardExtension` is changed to also handle `TypeGuardExtension` in a similar way. The main tricky part is going to be to make it match the second instead of the first parameter for methods.

Currently, codes can be additively enabled and disabled on top of the default config with the `-e`
and `-d` CLI options. However, when `--enable-all` is passed, `-d` is ignored. It would be very useful to be able to enable all codes except for the specified ones, without the tedious and fragile alternative of manually specifying every default-disabled code with `-e`. This is easy to accomplish by simply moving lines 338-342 up a level in `node_visitor.py`.
Equally, it would be very useful to support enabling only specific listed codes, e.g. with a `--disable-all` argument, in concert with the change above. Again, this would be simple to add: just add another branch to the if above, plus the argument itself. It could either be made exclusive with `--enable-all`, or could override it (as duplicate codes with `-e` currently do with `-d`).
Given it's simple to do, I'd be happy to submit a PR to this effect. Thanks!
Currently, pyanalyze doesn't support Protocols natively. (`@runtime_checkable` protocols are somewhat supported because we delegate to `isinstance()`.) Proper support for Callable (#52) is probably a prerequisite for doing this.
This is not good:
Value: list (code: inference_failure)
In pyan.py at line 3:

1: from pyanalyze import dump_value
2: def f():
3:     dump_value([*(1, 2), *(3, 4)])
   ^
$ python -m pyanalyze tests
Traceback (most recent call last):
File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/pyanalyze/name_check_visitor.py", line 632, in _load_module
return self.load_module(self.filename)
File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/pyanalyze/name_check_visitor.py", line 653, in load_module
filename, self.config.PATHS_EXCLUDED_FROM_IMPORT
File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/pyanalyze/importer.py", line 56, in load_module_from_file
module = _importer.importFromPath(abspath, module_path)
File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/nose/importer.py", line 79, in importFromDir
fh, filename, desc = find_module(part, path)
File "/Users/tekumara/.pyenv/versions/3.6.10/lib/python3.6/imp.py", line 297, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'Users'
Classes annotated with `@final` should not allow subclasses, and methods decorated with `@final` should not allow overrides.
The return value from boolean ops (e.g. `x and y`) is currently always inferred as Any. It should be a union of the operands, with some constraints on the left operands.
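This mirrors the runtime behavior: `x and y` evaluates to `x` when `x` is falsy and to `y` otherwise, so the result type is a union of the (falsiness-narrowed) left operand and the right operand.

```python
# `x and y` returns one of its operands, never a new value, so the static
# type should be Union[<falsy part of type(x)>, type(y)] rather than Any.
def and_result(x, y):
    return x and y

assert and_result(0, "s") == 0    # falsy left operand is returned
assert and_result(1, "s") == "s"  # truthy left yields the right operand
assert and_result(None, 3) is None
```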
When I introduced AnnotatedValue I didn't really think through the implications of how it interacts with MultiValuedValue (Union). I now think the right thing to do is to introduce an invariant that the inner type of an AnnotatedValue is never a MultiValuedValue. We should automatically transform `Annotated[Union[A, B], C]` into `Union[Annotated[A, C], Annotated[B, C]]`.
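The transformation can be demonstrated directly with `typing` objects (a sketch of the invariant, not pyanalyze's implementation):

```python
from typing import Annotated, Union, get_args, get_origin

def distribute(tp):
    """Push Annotated metadata into each member of an inner Union."""
    args = get_args(tp)
    if not args:
        return tp
    inner, *metadata = args
    if get_origin(inner) is Union:
        return Union[tuple(Annotated[(m, *metadata)] for m in get_args(inner))]
    return tp

result = distribute(Annotated[Union[int, str], "C"])
assert result == Union[Annotated[int, "C"], Annotated[str, "C"]]
```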
>>> iv # our variable
MultiValuedValue(vals=(SequenceIncompleteValue(typ=<class 'list'>, args=(UnresolvedValue(),), members=(UnresolvedValue(),)), KnownValue(val=[])))
>>> iv.get_type()
None
>>> iv.get_type_value()
MultiValuedValue(vals=(KnownValue(val=<class 'list'>), KnownValue(val=<class 'list'>)))
>>> iv.get_type_value().get_type()
None
I would expect `get_type()` to return `list`, since that is clearly the type of this node.
#164 added a new errors framework that supports multiple errors per test case and is more precise about where errors appear. We should migrate all test cases to this system and remove the `@assert_fails` decorator.
Implementation functions are callbacks for specific functions that are handled specially. Currently, they are passed three arguments: a dictionary with the arguments passed to the function, the AST node for the call (mostly useful for showing errors), and the visitor object.
This design makes it hard to change what information we send to the implementation functions. If instead we passed a special object (let's call it a `CallContext`), we could more easily adjust what data is passed to the implementations.
Here's what this context could look like:
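The original sketch isn't preserved in this excerpt; here's a hedged guess at the shape, based on the three arguments implementations currently receive (all names here are assumptions):

```python
import ast
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class CallContext:
    vars: Dict[str, Any]  # the arguments passed to the call
    node: ast.AST         # AST node for the call, for error reporting
    visitor: Any          # the NameCheckVisitor instance

    def show_error(self, message: str) -> None:
        # Helpers and new fields can be added here without touching
        # every implementation function's signature.
        self.visitor.show_error(self.node, message)
```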
Value: Union[int, None] (code: inference_failure)
In getitemconst.py at line 6
3:
4: def capybara(dct: Dict[int, Optional[int]]) -> None:
5:     if dct[0] is not None:
6:         dump_value(dct[0])
^
7:
This should produce `int` instead. We already support this for attributes.
Consider this code:
from collections.abc import AsyncIterable

async def agen_things() -> AsyncIterable[str]:
    yield "xx"

`mypy` is happy with it, but `pyanalyze` reports a traceback: `TypeError: 'ABCMeta' object is not subscriptable`.
To allow users to use the runtime tools pyanalyze provides without installing the whole checker at runtime, we should provide a separate `pyanalyze_extensions` package that contains only the few functions that are commonly useful at runtime.
It should contain:
- the contents of the `extensions.py` file
- the `@used` and `@test_helper` decorators from the unused object finder
- the `dump_value` function, although adding support for `reveal_type` might be better

Variables annotated with Final should not allow reassignment.
There is some limited logic for handling unpacking (`a, b = c`) in the `_visit_display` method, but it should be better:
- `MultiValuedValue`
The `pyanalyze.argspec.ExtendedArgSpec` type is used for type checking function calls. It's a bit of a pile of hacks: it uses a piece of autogenerated `exec`ed code to properly handle keyword arguments; it goes through many complicated layers of unwrapping to get to the underlying function; and it was built mostly with Python 2's `inspect.ArgSpec` in mind.
But Python 3 provides `inspect.Signature`, which is much more powerful and handles some of what we need natively. For example, the `Signature.bind` method could probably provide a replacement for the `exec` dance we currently do. Similarly, `Signature` would provide support for positional-only arguments in versions of Python that don't have native support for them.
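For example, `Signature.bind` already performs the positional/keyword matching that the generated code handles today:

```python
import inspect

def f(a, b=2, *args, c, **kwargs):
    pass

sig = inspect.signature(f)
# bind() raises TypeError on an invalid call and otherwise maps every
# argument to its parameter, which is what the exec'ed code does by hand.
bound = sig.bind(1, 10, 20, c=3, d=4)
bound.apply_defaults()
assert dict(bound.arguments) == {
    "a": 1, "b": 10, "args": (20,), "c": 3, "kwargs": {"d": 4}
}
```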
We don't need the Python 2 support internally. It could be useful for others who are doing a Python 2/3 migration, but I haven't heard of anyone doing that. I'd like to drop Python 2 support so pyanalyze can use type annotations internally and I have to worry less about syntax in test cases.
And while we're at it, probably also the experimental Required/NotRequired syntax.
Thanks for your work on this! Running pyanalyze produced a `DeprecationWarning` due to using the standard library `imp` module in pyanalyze/name_check_visitor.py, which has been deprecated since Python 3.4 in favor of importlib. Can it be replaced with the modern `importlib` (especially now that Python 2 support has been dropped)? Thanks!
If you inherit explicitly from Generic, the order of the type parameters passed to Generic is what matters:
>>> class Coroutine(Awaitable[_V_co], Generic[_T_co, _T_contra, _V_co]): pass
...
>>> Coroutine.__parameters__
(+_T_co, -_T_contra, +_V_co)
But I think pyanalyze (at least when reading types from typeshed) will just put them in the order it encounters them in the bases, which would put _V_co first.
The following code demonstrates the bug:
a = [1, 2, 3]
if a[:0]:
    print("// bad")

import pyanalyze

def main():
    tree = pyanalyze.ast_annotator.annotate_file(__file__)
    print(repr(tree.body[1].test.inferred_value))        # UnresolvedValue()
    print(repr(tree.body[1].test.value.inferred_value))  # KnownValue(val=[1, 2, 3])

main()
The issue, from my standpoint, is that when I ask for the type of the if's condition, I get `None` instead of `list`, even though there should be no doubt that it is a list.
I'd be willing to attempt to fix it myself, if you could give me a few pointers.
Btw,
Your library has been very useful to me so far, to establish call graphs and discern types, so thank you!
The current runtime implementation of `AsynqCallable` makes it so that `Optional[AsynqCallable]` throws a runtime error because of runtime typechecks in typing.py.
We have some support for typevars now, but currently pyanalyze ignores most non-standard options:
Consider these two signatures:

(1) (*args: tuple[int]) -> Any
(2) (*args: tuple[int, int]) -> Any

These two signatures are incompatible because a call `f(1)` would succeed on (1) but not on (2), but currently `Signature.can_assign` would allow them. This is low priority to fix because it isn't even possible to express these types in annotations, only by manually creating a signature.
There is an analogous issue with TypedDict and `**kwargs`. There, fixing it should probably go hand in hand with improving support for required/non-required keys.
I promised in python/typing#213 that I'd add support for Intersection, so I'd better do it. I'd like to get Protocol support (#163) in first, though, because most realistic use cases for Intersection involve protocols. Here's a sketch of how the implementation could work:
- Add `pyanalyze.extensions.Intersection`, which would work similarly to `Union` at runtime. `Intersection[int, str]` means a value which is both an int and a str. We should also support `int & str` in string annotations, similar to the existing support for `int | str`.
- Add `IntersectionValue` to `value.py` and handle it in lots of places. For example:
  - `IntersectionValue.can_assign(val)` would check that val is assignable to all members of the intersection
  - `val.can_assign(intersection_value)` would succeed if any member of the intersection is assignable to val

`type[int]` should be legal in recent versions of Python but pyanalyze still rejects it.
See https://bugs.python.org/issue42195 for some context
Currently pyanalyze basically doesn't support `typing.Callable`; the arguments and return type are simply ignored. A proper implementation of the Callable type would share a lot of characteristics with the `ExtendedArgSpec` type that pyanalyze uses for type checking function calls, so #51 could also help make this issue easier.
I wrote much of pyanalyze with a focus on avoiding false positives at all costs. That's nice, but it means that often pyanalyze is a bit too eager to just fall back to Any/UNRESOLVED_VALUE types and give up on finding errors. There are probably a bunch more lurking issues like #63 where we can fairly easily do better. Let's find more of them.
Some strategies:
- Flag values inferred as Any (e.g. in the `NameCheckVisitor.visit` method), then run pyanalyze on a fully typed codebase and see what happens.

If I install `pyanalyze`
via `pip`, it will install the latest `typeshed-client` (1.0.x) together with it. However, `pyanalyze` tries to pass a `version` argument to the `Resolver` constructor, and that argument existed in 0.4.1 but doesn't exist anymore in the latest release.
In #110 the code of `pyanalyze` was updated to deal with this issue, but that fix is from a month ago, while the last release of `pyanalyze` on PyPI is almost a year old. I think it would be good to either release more often, or change the README to suggest installing `pyanalyze` from Git instead of PyPI.
Traceback (most recent call last):
File "/v/site-packages/pyanalyze/name_check_visitor.py", line 705, in visit
ret = method(node)
File "/v/site-packages/pyanalyze/name_check_visitor.py", line 2843, in visit_Subscript
return_value, _ = self._get_argspec_and_check_call(
File "/v/site-packages/pyanalyze/name_check_visitor.py", line 3372, in _get_argspec_and_check_call
extended_argspec = self._get_argspec_from_value(callee_wrapped, node)
File "/v/site-packages/pyanalyze/name_check_visitor.py", line 3465, in _get_argspec_from_value
return self._get_argspec(callee_wrapped.val, node, name=name)
File "/v/site-packages/pyanalyze/name_check_visitor.py", line 3495, in _get_argspec
return self.arg_spec_cache.get_argspec(obj, name=name, logger=self.log)
File "/v/site-packages/pyanalyze/arg_spec.py", line 1124, in get_argspec
argspec = self._cached_get_argspec(obj, kwargs)
File "/v/site-packages/pyanalyze/arg_spec.py", line 1136, in _cached_get_argspec
extended = self._uncached_get_argspec(obj, kwargs)
File "v/site-packages/pyanalyze/arg_spec.py", line 1161, in _uncached_get_argspec
return BoundMethodArgSpecWrapper(argspec, KnownValue(obj.__self__))
File "/v/site-packages/pyanalyze/arg_spec.py", line 142, in __init__
assert isinstance(argspec, ExtendedArgSpec), "invalid argspec %r" % (argspec,)
AssertionError: invalid argspec BoundMethodArgSpecWrapper(
argspec=ExtendedArgSpec(
_has_return_value=False,
arguments=[Parameter(default_value=<object object at 0x7fa6a833f990>, name='self', typ=None)],
implementation=None,
kwargs='kwargs',
kwonly_args=None,
name='GenericAlias',
params_of_names={
'self': Parameter(default_value=<object object at 0x7fa6a833f990>, name='self', typ=None),
'args': Parameter(default_value=<object object at 0x7fa6a833f990>, name='args', typ=TypedValue(typ=<class 'tuple'>)),
'kwargs': Parameter(default_value=<object object at 0x7fa6a833f990>, name='kwargs', typ=TypedValue(typ=<class 'dict'>))},
return_value=UnresolvedValue(), starargs='args'),
self_value=TypedValue(typ=<class 'types.GenericAlias'>))
Internal error: AssertionError(-"-) (code: internal_error)
I'm sorry the code is closed-source; I could try to track it down somehow. I get this error using Python 3.9.0b1 but not 3.8.
During some profiling I noticed a lot of time being spent in `MultiValuedValue.can_assign` where the value contained a ton of string literal KnownValues. It would be more efficient if we put them all in a set. This could perhaps be implemented as a subclass of MultiValuedValue that would only hold KnownValues of hashable objects. Then it can internally keep a set of all values, and do fast checking for some common cases.
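A sketch of the idea (class and method names here are made up):

```python
# A MultiValuedValue variant that only holds hashable literals can answer
# membership checks with one set lookup instead of a linear scan over
# thousands of KnownValue comparisons.
class HashableLiteralUnion:
    def __init__(self, values):
        self.values = frozenset(values)  # every member must be hashable

    def can_assign_literal(self, value) -> bool:
        # Fast path: O(1) instead of O(len(union)).
        return value in self.values

union = HashableLiteralUnion(["red", "green", "blue"])
assert union.can_assign_literal("green")
assert not union.can_assign_literal("purple")
```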