Git Product home page Git Product logo

pyspark-examples's People

Contributors

haikaruna avatar nnkumar13 avatar sparkcodegeeks avatar sparkcodegeeks1 avatar wtysos11 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pyspark-examples's Issues

sparkbyexample website not working

Hi,

Thank you for creating such awesome tutorials. I just want to give you kind notice that sparkbyexample website is not working for some reason.

image

Please help to resolve this.

AttributeError: 'NoneType' object has no attribute 'rdd'

Line # 39
keysDF = df.select(explode(map_keys(df.properties))).distinct().show()
throws below error:

AttributeError Traceback (most recent call last)
in
----> 1 keysList = keysDF.rdd.map(lambda x:x[0]).collect()

AttributeError: 'NoneType' object has no attribute 'rdd'

error of running pyspark using jupyter notebook on Windows, Exception: Java gateway process exited before sending its port number

Hello, I am trying to run pyspark examples on local windows machine, with Jupyter notebook using Anaconda. I followed this tutorial. and did not find any issue during the installation. However, I still got the following error messages when running the following example

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.functions import to_timestamp, current_timestamp
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()





Exception Traceback (most recent call last)
in
5 from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType
6
----> 7 spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
226 sparkConf.set(key, value)
227 # This SparkContext may be an existing one.
--> 228 sc = SparkContext.getOrCreate(sparkConf)
229 # Do not update SparkConf for existing SparkContext, as it's shared
230 # by all sessions.

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
382 with SparkContext._lock:
383 if SparkContext._active_spark_context is None:
--> 384 SparkContext(conf=conf or SparkConf())
385 return SparkContext._active_spark_context
386

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
142 " is not allowed as it is a security risk.")
143
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145 try:
146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
329 with SparkContext._lock:
330 if not SparkContext._gateway:
--> 331 SparkContext._gateway = gateway or launch_gateway(conf)
332 SparkContext._jvm = SparkContext._gateway.jvm
333

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number



exponential smoothing in Pyspark

Hello, I have a pandas code for exponential smoothening. But I am not able to do the same in pyspark.
def exponential_smoothing(x, alpha):
result = []
for value in x:
if result:
smoothed_value = alpha * value + (1 - alpha) * result[-1]
else:
smoothed_value = value
result.append(smoothed_value)
return result
def apply_exponential_smoothing(df, alpha):
df['product_area_sales_value_N_mean_T'] = df.groupby(['area_id', 'product_id'])['product_area_sales_value_N_mean'].transform(lambda x: exponential_smoothing(x, alpha))
df['product_area_sales_unit_N_mean_T'] = df.groupby(['area_id', 'product_id'])['product_area_sales_unit_N_mean'].transform(lambda x: exponential_smoothing(x, alpha))
return df

tmp3 = apply_exponential_smoothing(tmp3, alpha=0.8)
this is the code. here in pyspark, I am not able to fetch previous row smoothen value. there is no such functionality in pyspark. Please suggest solution in spark

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.