A collection of Apache Spark stub files. These files were generated by stubgen and manually edited to include accurate type hints.
The tests and configuration files were originally contributed to the Typeshed project. Please refer to its contributors list and license for details.
- Static error detection (see SPARK-20631).
- Improved completion for chained method calls.
Please note that the guidelines for distributing type information are still a work in progress (PEP 561 - Distributing and Packaging Type Information). Currently, the installation script overlays existing Spark installations (pyi stub files are copied next to their py counterparts in the PySpark installation directory). If this approach is not acceptable, you can add the stub files to the search path manually.
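To see what the overlay looks like on disk, the following minimal sketch lists stub files that sit next to a module of the same name. The helpers `overlaid_stubs` and `installed_pyspark_dir` are hypothetical names used for illustration only; they are not part of pyspark-stubs or PySpark.

```python
import importlib.util
from pathlib import Path
from typing import List, Optional


def overlaid_stubs(package_dir: Path) -> List[Path]:
    """Return .pyi files under package_dir that have a .py file of the same name beside them."""
    return sorted(
        stub
        for stub in package_dir.rglob("*.pyi")
        if stub.with_suffix(".py").exists()
    )


def installed_pyspark_dir() -> Optional[Path]:
    """Locate the PySpark package directory, or return None if PySpark is not installed."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points at its __init__.py.
    return Path(spec.origin).parent
```

Calling `overlaid_stubs(installed_pyspark_dir())` on an environment where the stubs were installed should return a non-empty list; an empty list suggests the stubs live elsewhere or were not installed.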
According to PEP 484:
Third-party stub packages can use any location for stub storage. Type checkers should search for them using PYTHONPATH.
Moreover:
A default fallback directory that is always checked is shared/typehints/python3.5/ (or 3.6, etc.)
Please check usage before proceeding.
The package is available on PyPI:
pip install pyspark-stubs
and conda-forge:
conda install -c conda-forge pyspark-stubs
Depending on your environment you might also need a type checker, like Mypy or Pytype.
- Atom - Requires atom-mypy or equivalent.
- Jupyter Notebooks - It is possible to use magics to type check directly in the notebook.
- PyCharm - Works out-of-the-box, with excellent code completion, though as of today (PyCharm 2018.2.4) the built-in type checker is somewhat limited compared to Mypy, and mypy-PyCharm-plugin can be an interesting alternative.
- PyDev - Works out-of-the-box via the built-in Mypy code analyzer (7.0.3+).
- VIM / Neovim - Using vim-mypy, syntastic or Neomake.
- Visual Studio Code - With Mypy linter.
- Environment independent - Just use your favorite checker directly, optionally combined with a tool like entr.
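For the environment-independent route, a checker can also be driven from a small script. The sketch below assumes Mypy is the chosen checker and that it is installed in the current interpreter; `mypy_command` and `run_type_check` are hypothetical helper names, not part of any package mentioned here.

```python
import importlib.util
import subprocess
import sys
from typing import List, Optional


def mypy_command(targets: List[str]) -> List[str]:
    """Build a command line that runs mypy as a module of the current interpreter."""
    return [sys.executable, "-m", "mypy", *targets]


def run_type_check(targets: List[str]) -> Optional[int]:
    """Run mypy on the given files; return its exit code, or None if mypy is not installed."""
    if importlib.util.find_spec("mypy") is None:
        return None
    # mypy exits with 0 when it finds no problems and with a nonzero code otherwise.
    return subprocess.run(mypy_command(targets)).returncode
```

For example, `run_type_check(["job.py"])` (with `job.py` standing in for your own PySpark script) could be called from a pre-commit hook or a file watcher.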
This package is tested against the Mypy development branch and, in rare cases (primarily important upstream bugfixes), is not compatible with the preceding Mypy release.
Package versions follow PySpark versions, with the exception of maintenance releases: for example, pyspark-stubs==2.3.0 should be compatible with pyspark>=2.3.0,<2.4.0. Maintenance releases (post1, post2, ..., postN) are reserved for internal annotation updates.
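The versioning rule can be illustrated with a short sketch that maps a stub package version to the PySpark requirement it targets. The function `compatible_pyspark_range` is a hypothetical helper written for this illustration, and it assumes versions of the form `major.minor.patch` with an optional `.postN` suffix.

```python
def compatible_pyspark_range(stubs_version: str) -> str:
    """Map a pyspark-stubs version to the PySpark requirement it is expected to match."""
    # Only the major and minor components matter; patch and postN parts are ignored.
    major, minor = (int(part) for part in stubs_version.split(".")[:2])
    return f"pyspark>={major}.{minor}.0,<{major}.{minor + 1}.0"


print(compatible_pyspark_range("2.3.0"))        # pyspark>=2.3.0,<2.4.0
print(compatible_pyspark_range("2.3.0.post2"))  # pyspark>=2.3.0,<2.4.0
```

Both calls give the same range because post releases only update annotations and do not change the targeted PySpark line.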
Module | Dynamically typed | Statically typed | Notes |
---|---|---|---|
pyspark | ✔ | ✘ | |
pyspark.accumulators | ✘ | ✔ | |
pyspark.broadcast | ✔ | ✔ | Mixed |
pyspark.cloudpickle | ✘ | ✘ | Internal |
pyspark.conf | ✘ | ✔ | |
pyspark.context | ✘ | ✔ | |
pyspark.daemon | ✘ | ✘ | Internal |
pyspark.files | ✘ | ✔ | |
pyspark.find_spark_home | ✘ | ✘ | Internal |
pyspark.heapq3 | ✘ | ✘ | Internal |
pyspark.java_gateway | ✘ | ✘ | Internal |
pyspark.join | ✘ | ✔ | |
pyspark.ml | ✔ | ✘ | |
pyspark.ml.base | ✘ | ✔ | |
pyspark.ml.classification | ✘ | ✔ | |
pyspark.ml.clustering | ✘ | ✔ | |
pyspark.ml.common | ✔ | ✔ | Mixed |
pyspark.ml.evaluation | ✘ | ✔ | |
pyspark.ml.feature | ✘ | ✔ | |
pyspark.ml.fpm | ✘ | ✔ | |
pyspark.ml.image | ✘ | ✔ | |
pyspark.ml.linalg | ✘ | ✔ | |
pyspark.ml.param | ✘ | ✔ | |
pyspark.ml.param._shared_params_code_gen | ✘ | ✘ | Internal |
pyspark.ml.param.shared | ✘ | ✔ | |
pyspark.ml.pipeline | ✘ | ✔ | |
pyspark.ml.recommendation | ✘ | ✔ | |
pyspark.ml.regression | ✘ | ✔ | |
pyspark.ml.stat | ✘ | ✔ | |
pyspark.ml.tests | ✘ | ✘ | Tests |
pyspark.ml.tuning | ✘ | ✔ | |
pyspark.ml.util | ✘ | ✔ | |
pyspark.ml.wrapper | ✔ | ✔ | Mixed |
pyspark.mllib | ✔ | ✘ | |
pyspark.mllib.classification | ✘ | ✔ | |
pyspark.mllib.clustering | ✘ | ✔ | |
pyspark.mllib.common | ✔ | ✘ | |
pyspark.mllib.evaluation | ✘ | ✔ | |
pyspark.mllib.feature | ✘ | ✔ | |
pyspark.mllib.fpm | ✘ | ✔ | |
pyspark.mllib.linalg | ✘ | ✔ | |
pyspark.mllib.linalg.distributed | ✘ | ✔ | |
pyspark.mllib.random | ✘ | ✔ | |
pyspark.mllib.recommendation | ✘ | ✔ | |
pyspark.mllib.regression | ✘ | ✔ | |
pyspark.mllib.stat | ✘ | ✔ | |
pyspark.mllib.stat.KernelDensity | ✘ | ✔ | |
pyspark.mllib.stat._statistics | ✘ | ✔ | |
pyspark.mllib.stat.distribution | ✘ | ✔ | |
pyspark.mllib.stat.test | ✘ | ✔ | |
pyspark.mllib.tests | ✘ | ✘ | Tests |
pyspark.mllib.tree | ✘ | ✔ | |
pyspark.mllib.util | ✘ | ✔ | |
pyspark.profiler | ✘ | ✔ | |
pyspark.resourceinformation | ✘ | ✔ | |
pyspark.rdd | ✘ | ✔ | |
pyspark.rddsampler | ✘ | ✔ | |
pyspark.resultiterable | ✘ | ✔ | |
pyspark.serializers | ✔ | ✘ | |
pyspark.shell | ✘ | ✘ | Internal |
pyspark.shuffle | ✘ | ✘ | Internal |
pyspark.sql | ✔ | ✘ | |
pyspark.sql.catalog | ✘ | ✔ | |
pyspark.sql.cogroup | ✘ | ✔ | |
pyspark.sql.column | ✘ | ✔ | |
pyspark.sql.conf | ✘ | ✔ | |
pyspark.sql.context | ✘ | ✔ | |
pyspark.sql.dataframe | ✘ | ✔ | |
pyspark.sql.functions | ✘ | ✔ | |
pyspark.sql.group | ✘ | ✔ | |
pyspark.sql.readwriter | ✘ | ✔ | |
pyspark.sql.session | ✘ | ✔ | |
pyspark.sql.streaming | ✘ | ✔ | |
pyspark.sql.tests | ✘ | ✘ | Tests |
pyspark.sql.types | ✘ | ✔ | |
pyspark.sql.udf | ✘ | ✔ | |
pyspark.sql.utils | ✔ | ✘ | |
pyspark.sql.window | ✘ | ✔ | |
pyspark.statcounter | ✘ | ✔ | |
pyspark.status | ✘ | ✔ | |
pyspark.storagelevel | ✘ | ✔ | |
pyspark.streaming | ✔ | ✘ | |
pyspark.streaming.context | ✘ | ✔ | |
pyspark.streaming.dstream | ✘ | ✔ | |
pyspark.streaming.kinesis | ✔ | ✘ | |
pyspark.streaming.listener | ✔ | ✘ | |
pyspark.streaming.tests | ✘ | ✘ | Tests |
pyspark.streaming.util | ✔ | ✘ | |
pyspark.taskcontext | ✘ | ✔ | |
pyspark.tests | ✘ | ✘ | Tests |
pyspark.traceback_utils | ✘ | ✘ | Internal |
pyspark.util | ✔ | ✘ | |
pyspark.version | ✘ | ✔ | |
pyspark.worker | ✘ | ✘ | Internal |
Apache Spark, Spark, PySpark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This project is not owned, endorsed, or sponsored by The Apache Software Foundation.