PySpark Stubs

A collection of stub files for Apache Spark (PySpark). These files were generated by stubgen and manually edited to include accurate type hints.

Tests and configuration files were originally contributed to the Typeshed project. Please refer to its contributors list and license for details.

Motivation

Static error detection (see SPARK-20631)
Improved completion for chained method calls

Installation and usage

Please note that the guidelines for distributing type information are still a work in progress (PEP 561 - Distributing and Packaging Type Information). Currently the installation script overlays existing Spark installations (pyi stub files are copied next to their py counterparts in the PySpark installation directory). If this approach is not acceptable, you can add the stub files to the search path manually.

According to PEP 484:

Third-party stub packages can use any location for stub storage. Type checkers should search for them using PYTHONPATH.

Moreover:

A default fallback directory that is always checked is shared/typehints/python3.5/ (or 3.6, etc.)

Please check usage before proceeding.
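
For Mypy specifically, one way to add stub files to the search path manually is the MYPYPATH environment variable. The sketch below is only an illustration under stated assumptions: the helper names env_with_stub_path and run_mypy are hypothetical and not part of this package, the stub directory is a placeholder, and Mypy is assumed to be installed in the current interpreter.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def env_with_stub_path(
    stub_dir: str, base_env: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with stub_dir prepended to MYPYPATH.

    Hypothetical helper: stub_dir should be the directory that contains
    the pyspark stub package (the .pyi files), wherever you unpacked it.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("MYPYPATH", "")
    # Keep any directories that were already configured, after ours.
    env["MYPYPATH"] = stub_dir + (os.pathsep + existing if existing else "")
    return env


def run_mypy(target: str, stub_dir: str) -> int:
    """Type check target with Mypy and return Mypy's exit code."""
    cmd: List[str] = [sys.executable, "-m", "mypy", target]
    return subprocess.call(cmd, env=env_with_stub_path(stub_dir))
```

A call such as run_mypy("my_job.py", "/path/to/stubs") would then check my_job.py against the stubs without touching the PySpark installation directory. Both arguments here are placeholders.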

The package is available on PyPI:

pip install pyspark-stubs

and conda-forge:

conda install -c conda-forge pyspark-stubs

Depending on your environment you might also need a type checker, like Mypy or Pytype.

Atom - Requires atom-mypy or equivalent.
Jupyter Notebooks - It is possible to use magics to type check directly in the notebook.
PyCharm - Works out-of-the-box, with excellent code completion. As of PyCharm 2018.2.4, however, the built-in type checker is somewhat limited compared to Mypy, so mypy-PyCharm-plugin can be an interesting alternative.
PyDev - Works out-of-the-box via the built-in Mypy code analyzer (7.0.3+).
VIM / Neovim - Using vim-mypy, syntastic or Neomake.
Visual Studio Code - With the Mypy linter.
Environment independent - Just use your favorite checker directly, optionally combined with a tool like entr.

This package is tested against the Mypy development branch and in rare cases (primarily important upstream bugfixes) is not compatible with the preceding Mypy release.

PySpark Version Compatibility

Package versions follow PySpark versions, with the exception of maintenance releases - i.e. pyspark-stubs==2.3.0 should be compatible with pyspark>=2.3.0,<2.4.0. Maintenance releases (post1, post2, ..., postN) are reserved for internal annotation updates.
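
The versioning rule above can be sketched as a small function that maps a stubs version to the PySpark range it targets. This is only an illustration: compatible_pyspark_range is a hypothetical helper, not something shipped with the package, and it assumes versions are written as X.Y.Z with an optional .postN suffix.

```python
import re


def compatible_pyspark_range(stubs_version: str) -> str:
    """Map a pyspark-stubs version to the PySpark requirement it targets.

    Assumes the version looks like "X.Y.Z" or "X.Y.Z.postN" (an assumption
    made for this sketch). The post-release suffix is ignored because it
    only marks annotation updates.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:\.post\d+)?$", stubs_version)
    if match is None:
        raise ValueError(f"unrecognized pyspark-stubs version: {stubs_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Same major.minor line: from X.Y.0 up to, but excluding, X.(Y+1).0.
    return f"pyspark>={major}.{minor}.0,<{major}.{minor + 1}.0"


print(compatible_pyspark_range("2.3.0"))        # pyspark>=2.3.0,<2.4.0
print(compatible_pyspark_range("2.3.0.post2"))  # pyspark>=2.3.0,<2.4.0
```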

API Coverage

Module	Dynamically typed	Statically typed	Notes
pyspark	✔	✘
pyspark.accumulators	✘	✔
pyspark.broadcast	✔	✔	Mixed
pyspark.cloudpickle	✘	✘	Internal
pyspark.conf	✘	✔
pyspark.context	✘	✔
pyspark.daemon	✘	✘	Internal
pyspark.files	✘	✔
pyspark.find_spark_home	✘	✘	Internal
pyspark.heapq3	✘	✘	Internal
pyspark.java_gateway	✘	✘	Internal
pyspark.join	✘	✔
pyspark.ml	✔	✘
pyspark.ml.base	✘	✔
pyspark.ml.classification	✘	✔
pyspark.ml.clustering	✘	✔
pyspark.ml.common	✔	✔	Mixed
pyspark.ml.evaluation	✘	✔
pyspark.ml.feature	✘	✔
pyspark.ml.fpm	✘	✔
pyspark.ml.image	✘	✔
pyspark.ml.linalg	✘	✔
pyspark.ml.param	✘	✔
pyspark.ml.param._shared_params_code_gen	✘	✘	Internal
pyspark.ml.param.shared	✘	✔
pyspark.ml.pipeline	✘	✔
pyspark.ml.recommendation	✘	✔
pyspark.ml.regression	✘	✔
pyspark.ml.stat	✘	✔
pyspark.ml.tests	✘	✘	Tests
pyspark.ml.tuning	✘	✔
pyspark.ml.util	✘	✔
pyspark.ml.wrapper	✔	✔	Mixed
pyspark.mllib	✔	✘
pyspark.mllib.classification	✘	✔
pyspark.mllib.clustering	✘	✔
pyspark.mllib.common	✔	✘
pyspark.mllib.evaluation	✘	✔
pyspark.mllib.feature	✘	✔
pyspark.mllib.fpm	✘	✔
pyspark.mllib.linalg	✘	✔
pyspark.mllib.linalg.distributed	✘	✔
pyspark.mllib.random	✘	✔
pyspark.mllib.recommendation	✘	✔
pyspark.mllib.regression	✘	✔
pyspark.mllib.stat	✘	✔
pyspark.mllib.stat.KernelDensity	✘	✔
pyspark.mllib.stat._statistics	✘	✔
pyspark.mllib.stat.distribution	✘	✔
pyspark.mllib.stat.test	✘	✔
pyspark.mllib.tests	✘	✘	Tests
pyspark.mllib.tree	✘	✔
pyspark.mllib.util	✘	✔
pyspark.profiler	✘	✔
pyspark.resourceinformation	✘	✔
pyspark.rdd	✘	✔
pyspark.rddsampler	✘	✔
pyspark.resultiterable	✘	✔
pyspark.serializers	✔	✘
pyspark.shell	✘	✘	Internal
pyspark.shuffle	✘	✘	Internal
pyspark.sql	✔	✘
pyspark.sql.catalog	✘	✔
pyspark.sql.cogroup	✘	✔
pyspark.sql.column	✘	✔
pyspark.sql.conf	✘	✔
pyspark.sql.context	✘	✔
pyspark.sql.dataframe	✘	✔
pyspark.sql.functions	✘	✔
pyspark.sql.group	✘	✔
pyspark.sql.readwriter	✘	✔
pyspark.sql.session	✘	✔
pyspark.sql.streaming	✘	✔
pyspark.sql.tests	✘	✘	Tests
pyspark.sql.types	✘	✔
pyspark.sql.udf	✘	✔
pyspark.sql.utils	✔	✘
pyspark.sql.window	✘	✔
pyspark.statcounter	✘	✔
pyspark.status	✘	✔
pyspark.storagelevel	✘	✔
pyspark.streaming	✔	✘
pyspark.streaming.context	✘	✔
pyspark.streaming.dstream	✘	✔
pyspark.streaming.kinesis	✔	✘
pyspark.streaming.listener	✔	✘
pyspark.streaming.tests	✘	✘	Tests
pyspark.streaming.util	✔	✘
pyspark.taskcontext	✘	✔
pyspark.tests	✘	✘	Tests
pyspark.traceback_utils	✘	✘	Internal
pyspark.util	✔	✘
pyspark.version	✘	✔
pyspark.worker	✘	✘	Internal

Disclaimer

Apache Spark, Spark, PySpark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This project is not owned, endorsed, or sponsored by The Apache Software Foundation.
