quora / pyanalyze Goto Github PK

View Code? Open in Web Editor NEW

302.0 9.0 34.0 2.5 MB

A Python type checker

License: Apache License 2.0

Python 100.00%

linter python static-analysis typechecker types typing

pyanalyze's People

Contributors

Stargazers

Watchers

pyanalyze's Issues

Don't use ClassAttributeChecker for classes we know all attrs for

If pyanalyze sees an access to an attribute that it doesn't know about on a typed value, it saves it to the end of its run, then compares the set of attribute reads to the set of attribute writes and emits an error if it thinks the attribute doesn't exist. This is a weird mechanism that I'd like to move away from, but it might not always be possible.

However, there are a few categories of classes for which we can safely assume we know all attributes (unless they override __getattr__ or __getattribute__, but then all bets are off):

Dataclasses or attrs classes (if all their parent classes are also dataclasses)
namedtuples
Classes implemented in C (usually)
Classes with __slots__
More generally, classes for which instances will not have a __dict__

For such classes, we can skip calls to the ClassAttributeChecker and just emit an error immediately if we see an access to an attribute that doesn't exist.

Type check calls with *args/**kwargs

Currently pyanalyze just gives up on typechecking any call that passes *args or **kwargs. Let's fix that. #51 could perhaps help here. The code doing this is at:

pyanalyze/pyanalyze/name_check_visitor.py

Line 3287 in 7366253

elif has_args_kwargs:

Annotated AST

Is there a way to access the parsed AST, with the inferred types? I want to write a code translation tool, and knowing the types would be very useful.
I mean, having a regular AST (similar to ast.AST) but with added attributes, such as: declared_type, and inferred_types.

I tried to look through the code, but couldn't figure it out.

Support `# type: ignore`

For compatibility, pyanalyze should support # type: ignore in addition to # static analysis: ignore. Error code-specific ignores should still work the same way. The functionality for using a custom ignore comment should not accept # type: ignore.

This can be implemented mostly in node_visitor.py.

Support ParamSpec and Concatenate (PEP 612)

This could potentially make pyanalyze.extensions.AsynqCallable redundant.

Support empty tuple type

pyanalyze currently crashes on Tuple[()]

Unused finder should ignore ignored files

In files with a file-level # static analysis: ignore, we should not report unused objects either.

Improve error messages for complicated types

Currently, when there is a type mismatch, we just print out the two full types, but that produces errors like this in code with complicated Unions:

Incompatible argument type for column: expected Union[str, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder.As, qclient.db.builder._UnaryFunc, qcore.helpers.MarkerObject] but got Union[Any, str, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder.As, qclient.db.builder._UnaryFunc, qcore.helpers.MarkerObject, tuple[str, Union[int, float, str, bytes, enum.Enum, qtype.ThriftEnum, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder._UnaryFunc, qclient.db.builder.SubSelect, collections.abc.Collection[Union[int, float, str, bytes, enum.Enum, qtype.ThriftEnum, qclient.db.builder.Column, qclient.db.builder.BinOp, qclient.db.builder.Function, qclient.db.builder._UnaryFunc, qclient.db.builder.SubSelect]]]]] (code: incompatible_argument)

Instead of dumping out the whole type, the error message should tell us which part of the union on RHS isn't assignable to the LHS. Similarly, if the RHS is a DictIncompleteValue and some specific value in the dict isn't assignable to the expected type, tell us which key-value pair it is.

Better check for `contains`

"x" in Union[None, str] doesn't show an error even with #85 enabled.

Catch conditional missing return

Pyanalyze currently produces no errors on this code:

def f(cond: bool) -> int:
    if cond:
        return 0

Catch missing return in asynq function

pyanalyze currently produces no errors on this code:

from asynq import asynq

@asynq()
def f() -> int:
    yield f.asynq()

`init` signatures are not picked up from typeshed

For example, reveal_type(formatter.NullFormatter) shows:

Revealed type is 'Literal[<class 'formatter.NullFormatter'>]', signature is (writer=Composite(value=KnownValue(val=None), varname=None, node=None)) -> formatter.NullFormatter (code: inference_failure)

But it should pick up the annotation in https://github.com/python/typeshed/blob/master/stdlib/formatter.pyi#L9

Unused finder has lots of false positives for modules

The unused object finder frequently flags module objects as unused. It looks like this affects modules that are only imported through from a.b import c—this marks a.b.c as used, but not a.b.

A few possible solutions:

Make from a.b import c also mark a.b and a as used.
Exclude modules completely.

For now we're using the second solution internally but the first one would be better.

Remove follow_wrapped=False when getting signatures

In signature.py, we use follow_wrapped=False when getting the signature of a callable:

pyanalyze/pyanalyze/arg_spec.py

Line 449 in 2ae6915

return inspect.signature(obj, follow_wrapped=False)

It would be nice if we don't, because that will allow us to get accurate signatures for more decorated functions. When I tried this, it caused problems with some decorators that modify the function's signature but use @functools.wrap, notably @mock.patch (which passes additional arguments to the decorated function). I've also run into issues with @contextlib.contextmanager, which changes the return type of the decorated function.

Possible solutions:

Modify these decorators to set an accurate .__signature__ attribute on the decorated function (but that's not an option for third-party decorators)
Add special casing for some of these. For example, patched functions have a .patchings attribute; we could internally ignore the parameter corresponding to any entries in .patchings that have their .new attribute set to mock.DEFAULT. But there doesn't seem to be a similar way to identify a function that was decorated with @contextmanager.

Support overloaded functions

Pyanalyze currently doesn't support overloaded functions at all. It should. Some thoughts in no particular order:

Supporting overloads for runtime code is hard because the overloads aren't stored in the module dict. The best solution I can think of is to add a pyanalyze.extensions.overload decorator that somehow does register the overloads at runtime. It could be an alias of from typing import overload under if TYPE_CHECKING for compatibility with other type checkers. We could add an @overload_implementation decorator that collects the other overloads and stores them in the function dict or something.
But it's much easier to support them in stubs and implementation.py. It might require changes to typeshed-client though.
Internally an overloaded function could be represented as a new sister class of pyanalyze.signature.Signature. It would hold all of the overloaded signatures in an attribute.
Checking a call to an overload should try them one by one and succeed if any overload succeeds. But consider this example:

@overload
def f(x: int) -> Literal[1]: ...
@overload
def f(x: str) -> Literal[2]: ...
y: Union[int, str]
f(y)

I would want this to return Literal[1, 2], but naively looping over the overloads would not work for that; it would require special handling for Union. Interestingly, though, TypeScript doesn't let you do this. You have to write a separate overload for the Union instead.

Overloads could be useful internally for representing the generic mutators (list.append etc.), for which there are currently complicated impl functions. These overloads would require being able to dispatch on whether a value is annotated with WeakExtension.

Infer correct signature for NewType constructors

Value: Literal[<function NewType.<locals>.new_type at 0x7f92f8d885e0>], signature: Signature(signature=<Signature (x) -> UnresolvedValue()>, implementation=None, callable=<function NewType.<locals>.new_type at 0x7f92f8d885e0>, has_return_annotation=False) (code: inference_failure)
In newt.py at line 6:
   3: 
   4: NT = NewType("NT", int)
   5: 
   6: dump_value(NT)
      ^

The inferred signature should be (x: int, /) -> NT.

Support for TypeGuard

We should support PEP 647's TypeGuard. The pyanalyze.extensions.ParameterTypeGuard primitive added in #115 is very similar, so it shouldn't be much additional work. Implementation outline:

Add TypeGuard to pyanalyze.extensions (unless it makes it into typing_extensions first)
Add TypeGuardExtension to value.py
annotations.py turns TypeGuard[T] into AnnotatedValue(bool, [TypeGuardExtension(T)])
The code in signature.py that consumes ParameterTypeGuardExtension is changed to also handle TypeGuardExtension in a similar way. The main tricky part is going to be to make it match the second instead of the first parameter for methods.

Allow disabling specific checks with --enable-all and add corresponding --disable-all flag

Currently, codes can be additively enabled and disabled on top of the default config with the -e and -d CLI options. However, when --enable-all is passed, -d is ignored. It would be very useful to be able enable all codes except for the specified ones, without the tedious and fragile alternative of manually specifying every default-disabled code with -e. This is easy to accomplish by simply moving lines 338-342 up a level in node_visitor.py.

Equally, it would be useful to very useful to support being able to only enable specific specific listed codes, e.g. with a --disable-all argument, in concert with the change. Again, this would be simple to add, just add another branch to the if above plus the arg itself. It could either be made exclusive with --enable-all, or could override it (as duplicate codes with -e currently does with -d).

Given its simple to do, I'd be happy to submit a PR to this effect. Thanks!

Support Protocols

Currently, pyanalyze doesn't support Protocols natively. (@runtime_checkable protocols are somewhat supported because we delegate to isinstance().) Proper support for Callable (#52) is probably a prerequisite for doing this.

Infer better types with iterable unpacking

This is not good:

Value: list (code: inference_failure)
In pyan.py at line 3:
   1: from pyanalyze import dump_value
   2: def f():
   3:     dump_value([*(1, 2), *(3, 4)])
          ^

ImportError: No module named 'Users'

$  python -m pyanalyze tests
Traceback (most recent call last):
  File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/pyanalyze/name_check_visitor.py", line 632, in _load_module
    return self.load_module(self.filename)
  File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/pyanalyze/name_check_visitor.py", line 653, in load_module
    filename, self.config.PATHS_EXCLUDED_FROM_IMPORT
  File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/pyanalyze/importer.py", line 56, in load_module_from_file
    module = _importer.importFromPath(abspath, module_path)
  File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/tekumara/.virtualenvs/brp/lib/python3.6/site-packages/nose/importer.py", line 79, in importFromDir
    fh, filename, desc = find_module(part, path)
  File "/Users/tekumara/.pyenv/versions/3.6.10/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'Users'

Support @final (PEP 591)

Classes annotated with @final should not allow subclasses, and methods decorated with @final should not allow overrides.

Return value from boolean ops is not inferred

The return value from boolean ops (e.g. x and y) is currently always inferred as Any. It should be a union of the operands, with some constraints on the left operands.

Make AnnotatedValue never annotate a Union

When I introduced AnnotatedValue I didn't really think through the implications of how it interacts with MultiValuedValue (Union). I now think the right thing to do is to introduce an invariant that the inner type of an AnnotatedValue is never a MultiValuedValue. We should automatically transform Annotated[Union[A, B], C] into Union[Annotated[A, C], Annotated[B, C]].

Incorrect Behavior in `MultiValuedValue.get_type()`

>>> iv   # our variable
MultiValuedValue(vals=(SequenceIncompleteValue(typ=<class 'list'>, args=(UnresolvedValue(),), members=(UnresolvedValue(),)), KnownValue(val=[])))
>>> iv.get_type()
None
>>> iv.get_type_value()
MultiValuedValue(vals=(KnownValue(val=<class 'list'>), KnownValue(val=<class 'list'>)))
>>> iv.get_type_value().get_type()
None

I would expect get_type() to return list, since that is clearly the type of this node.

Migrate to new errors framework

#164 added a new errors framework that supports multiple errors per test case and is more precise about where errors appear. We should migrate all test cases to this system and remove the @assert_fails decorator.

Redesign what we pass to implementation functions

Implementation functions are callbacks for specific functions that are handled specially. Currently, they are passed three arguments: a dictionary with the arguments passed to the function, the AST node for the call (mostly useful for showing errors), and the visitor object.

This design makes it hard to change what information we send to the implementation functions. If instead we passed a special object (let's call it a CallContext), we could more easily adjust what data is passed to the visitors.

Here's what this context could look like:

variables dict {str: Value}
composites dict {argument name: Varname}, to make it easier to set constraints on arguments
AST node
visitor object (so you can still do fancy things directly with it if you want)
Convenience methods for showing an error on the Call node or on any argument

Support constraints on container elements

Value: Union[int, None] (code: inference_failure)
In getitemconst.py at line 6
   3: 
   4: def capybara(dct: Dict[int, Optional[int]]) -> None:
   5:     if dct[0] is not None:
   6:         dump_value(dct[0])
              ^
   7:

This should produce int instead. We already support this for attributes.

Question on reusing real types in annotations

Consider this code:

from collections.abc import AsyncIterable


async def agen_things() -> AsyncIterable[str]:
    yield "xx"

mypy is happy with it, but pyanalyze reports a traceback TypeError: 'ABCMeta' object is not subscriptable

Create a separately installable `pyanalyze_extensions`

To allow users to use the runtime tools pyanalyze provides without installing the whole checker at runtime, we should provide a separate pyanalyze_extensions package that contains only the few functions that are commonly useful at runtime.

It should contain:

The extensions.py file
The @used and @test_helper decorators from the unused object finder
Maybe the dump_value function, although adding support for reveal_type might be better

Support Final (PEP 591)

Variables annotated with Final should not allow reassignment

Improve unpacking logic in _visit_display

There is some limited logic for handling unpacking (a, b = c) in the _visit_display method, but it should be better:

Handle MultiValuedValue
Handle arbitrary generics
Error if the type is not iterable

Does not support NamedTuple arguments.

Make ExtendedArgSpec be more based on inspect.Signature

The pyanalyze.argspec.ExtendedArgSpec type is used for type checking function calls. It's a bit of a pile of hacks: it uses a piece of autogenerated execed code to properly handle keyword arguments; it goes through many complicated layers of unwrapping to get to the underlying function; and it was built mostly with Python 2's inspect.ArgSpec in mind.

But Python 3 provides inspect.Signature, which is much more powerful and handles some of what we need natively. For example, the Signature.bind method could probably provide a replacement for the exec dance we currently do. Similarly, Signature would provide support for positional-only arguments in versions of Python that don't have native support for them.

Drop support for Python 2

We don't need the Python 2 support internally. It could be useful for others who are doing a Python 2/3 migration, but I haven't heard of anyone doing that. I'd like to drop Python 2 support so pyanalyze can use type annotations internally and I have to worry less about syntax in test cases.

Support TypedDict with total=False

And while we're at it, probably also the experimental Required/NotRequired syntax.

Consider replacing long-deprecated `imp` module with modern `importlib`

Thanks for your work on this! Running pyanalyze produced a DeprecationWarning due to using the standard library imp module in pyanalyze/name_check_visitor.py, which has bee deprecated since Python 3.4 in favor of importlib. Can it be replaced with the modern importlib (especially now that Python 2 support has been dropped)? Thanks!

Order of type parameters

If you inherit explicitly from Generic, the order of the type parameters passed to Generic is what matters:

>>> class Coroutine(Awaitable[_V_co], Generic[_T_co, _T_contra, _V_co]): pass
... 
>>> Coroutine.__parameters__
(+_T_co, -_T_contra, +_V_co)

But I think pyanalyze (at least when reading types from typeshed) will just put them in the order it encounters them in the bases, which would put _V_co first.

Bug Report: inferred_value doesn't propagate to the subscript of a list

The following code demonstrates the bug:

a = [1,2,3]

if a[:0]:
	print("// bad")

import pyanalyze
def main():
	tree = pyanalyze.ast_annotator.annotate_file(__file__)
	print(repr(tree.body[1].test.inferred_value))			# UnresolvedValue()
	print(repr(tree.body[1].test.value.inferred_value))		# KnownValue(val=[1, 2, 3])

main()

The issue from my standpoint, is that when I ask for the type of the if's condition, I get None instead of list, even though there should be no doubt that it is a list.

I'd be willing to attempt to fix it myself, if you could give me a few pointers.

Btw,

Your library has been very useful to me so far, to establish call graphs and discern types, so thank you!

Support `Optional[AsynqCallable]`

The current runtime implementation of AsynqCallable makes it so that Optional[AsynqCallable] throws a runtime error because of runtime typechecks in typing.py.

Support more TypeVar features

We have some support for typevars now, but currently pyanalyzes ignores most non-standard options:

Variance is ignored. I'm inclined to keep treating invariant typevars as covariant (the current behavior), because in my experience, the way the Python type system defaults to invariance is more likely to lead to annoying false positives than to find genuine issues. And this is likely to be even worse with pyanalyze than with other type checkers, because pyanalyze is more likely to infer precise Literal types.
Contravariant typevars should be supported though. We'll need some concept of contravariance for Callable types (#52) anyway.
Typevars with value restriction
Typevars with bounds

Signature.can_assign is wrong for some exotic cases

Consider these two signatures:

(1) (*args: tuple[int]) -> Any
(2) (*args: tuple[int, int]) -> Any
Where *args is the type of the whole tuple, not the type of individual elements (different from the convention in code).

These two signatures are incompatible because a call f(1) would succeed on (1) but not on (2), but currently Signature.can_assign would allow them. This is low priority to fix because it isn't even possible to express these types in annotations, only by manually creating a signature.

There is an analogous issue with TypedDict and **kwargs. There, fixing it should probably go hand in hand with improving support for required/non-required keys.

Add an Intersection type

I promised in python/typing#213 that I'd add support for Intersection, so I'd better do it. I'd like to get Protocol support (#163) in first, though, because most realistic use cases for Intersection involve protocols. Here's a sketch of how the implementation could work:

Add pyanalyze.extensions.Intersection, which would work similar to Union at runtime. Intersection[int, str] means a value which is both an int and a str. We should also support int & str in string annotations, similar to the existing support for int | str.
Add a new IntersectionValue to value.py and handle it in lots of places. For example:
- IntersectionValue.can_assign(val) would check that val is assignable to all members of the intersection
- val.can_assign(intersectionValue) would succeed if any member of the intersection is assignable to val
- Getting an attribute on an IntersectionValue should return something if any member of the intersection has the attribute
  - What if they have conflicting values? I guess we can return another Intersection
Find ways to use Intersection internally. For example, various narrowing operations could return Intersection objects.

Support "type[int]"

"type[int]" should be legal in recent versions of Python but pyanalyze still rejects it.

Support collections.abc.Callable in Python 3.9

See https://bugs.python.org/issue42195 for some context

Support Callable types

Currently pyanalyze basically doesn't support typing.Callable; the arguments and return type are simply ignored. A proper implementation of Callable type would share a lot of characteristics with the ExtendedArgSpec type that pyanalyze uses for type checking function calls, so #51 could also help make this issue easier.

Audit for places with suboptimal type inference

I wrote much of pyanalyze with a focus on avoiding false positives at all costs. That's nice, but it means that often pyanalyze is a bit too eager to just fall back to Any/UNRESOLVED_VALUE types and give up on finding errors. There's probably a bunch more lurking issues like #63 where we can fairly easily do better. Let's find more of them.

Some strategies:

Just grep for UNRESOLVED_VALUE in the pyanalyze codebase
Add a log that gets triggered whenever we infer UNRESOLVED_VALUE as the value for an AST node (by editing the NameCheckVisitor.visit method), then run pyanalyze on a fully typed codebase and see what happens.

'pip install pyanalyze' currently broken due to typeshed-client version

If I install pyanalyze via pip, it will install the latest typeshed-client (1.0.x) together with it. However, pyanalyze tries to pass a version argument to the Resolver constructor and that argument existed in 0.4.1 but doesn't exist anymore in the latest release.

In #110 the code of pyanalyze was updated to deal with this issue, but that fix is from a month ago while the last release of pyanalyze on PyPI is almost a year old. I think it would be good to either release more often, or change the README to suggest installing pyanalyze from Git instead of PyPI.

AssertionError: invalid argspec BoundMethodArgSpecWrapper

Traceback (most recent call last):
  File "/v/site-packages/pyanalyze/name_check_visitor.py", line 705, in visit
    ret = method(node)
  File "/v/site-packages/pyanalyze/name_check_visitor.py", line 2843, in visit_Subscript
    return_value, _ = self._get_argspec_and_check_call(
  File "/v/site-packages/pyanalyze/name_check_visitor.py", line 3372, in _get_argspec_and_check_call
    extended_argspec = self._get_argspec_from_value(callee_wrapped, node)
  File "/v/site-packages/pyanalyze/name_check_visitor.py", line 3465, in _get_argspec_from_value
    return self._get_argspec(callee_wrapped.val, node, name=name)
  File "/v/site-packages/pyanalyze/name_check_visitor.py", line 3495, in _get_argspec
    return self.arg_spec_cache.get_argspec(obj, name=name, logger=self.log)
  File "/v/site-packages/pyanalyze/arg_spec.py", line 1124, in get_argspec
    argspec = self._cached_get_argspec(obj, kwargs)
  File "/v/site-packages/pyanalyze/arg_spec.py", line 1136, in _cached_get_argspec
    extended = self._uncached_get_argspec(obj, kwargs)
  File "v/site-packages/pyanalyze/arg_spec.py", line 1161, in _uncached_get_argspec
    return BoundMethodArgSpecWrapper(argspec, KnownValue(obj.__self__))
  File "/v/site-packages/pyanalyze/arg_spec.py", line 142, in __init__
    assert isinstance(argspec, ExtendedArgSpec), "invalid argspec %r" % (argspec,)
AssertionError: invalid argspec BoundMethodArgSpecWrapper(
  argspec=ExtendedArgSpec(
    _has_return_value=False,
    arguments=[Parameter(default_value=<object object at 0x7fa6a833f990>, name='self', typ=None)],
    implementation=None,
    kwargs='kwargs',
    kwonly_args=None,
    name='GenericAlias',
    params_of_names={
      'self': Parameter(default_value=<object object at 0x7fa6a833f990>, name='self', typ=None),
      'args': Parameter(default_value=<object object at 0x7fa6a833f990>, name='args', typ=TypedValue(typ=<class 'tuple'>)),
      'kwargs': Parameter(default_value=<object object at 0x7fa6a833f990>, name='kwargs', typ=TypedValue(typ=<class 'dict'>))},
    return_value=UnresolvedValue(), starargs='args'),
    self_value=TypedValue(typ=<class 'types.GenericAlias'>))

Internal error: AssertionError(-"-) (code: internal_error)

I'm sorry the code is closed-source, I could try to track it down somehow...
I get this error using Python 3.9.0b1 but not 3.8

Optimize Union of Literals

During some profiling I noticed a lot of time being spent in MultiValuedValue.can_assign where the value contained a ton of string literal KnownValues. It would be more efficient if we put them all in a set. This could perhaps be implemented as a subclass of MultiValuedValue that would only hold KnownValues of hashable objects. Then it can internally keep a set of all values, and do fast checking for some common cases.

quora / pyanalyze Goto Github PK

pyanalyze's People

Contributors

Stargazers

Watchers

Forkers

pyanalyze's Issues

Recommend Projects

Recommend Topics

Recommend Org