Comments (5)
Hi @raviranak ,
What would you do with vanilla Spark? You can try the same thing here. I think the vanilla Spark conf directory does not contain hive-site.xml either.
Are you using pyspark installed via pip, or a binary Spark install? Have you tried putting hive-site.xml into the conf dir?
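For reference, a minimal sketch of generating a hive-site.xml from Python. The helper name `write_hive_site` is hypothetical and the property values are placeholders. The demo writes to a temporary directory; in practice `conf_dir` would be whatever directory your Spark install reads its configuration from (for example `$SPARK_HOME/conf`, or the directory named by `SPARK_CONF_DIR`), which is an assumption about your setup.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_hive_site(conf_dir, properties):
    """Write a Hadoop-style hive-site.xml containing `properties` into conf_dir."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    os.makedirs(conf_dir, exist_ok=True)
    path = os.path.join(conf_dir, "hive-site.xml")
    # short_empty_elements=False keeps empty values as <value></value>.
    ET.ElementTree(root).write(
        path, encoding="utf-8", xml_declaration=True, short_empty_elements=False
    )
    return path


# Placeholder metastore settings; replace with your own.
example_properties = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "test",
    "javax.jdo.option.ConnectionPassword": "",
}

# Demo only: write into a throwaway directory instead of a real conf dir.
demo_dir = tempfile.mkdtemp()
hive_site_path = write_hive_site(demo_dir, example_properties)
print(hive_site_path)
```

Note that the `spark.hadoop.` prefix is dropped here: that prefix is only needed when the same properties are passed through Spark configuration instead of a Hadoop-style XML file.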
from raydp.
I have figured out a way to configure the Hive metastore with MySQL, but I am getting an error.
import raydp

default_spark_conf = {
    "spark.jars.packages": "mysql:mysql-connector-java:8.0.32",
    "spark.jars": "/home/ray/.ivy2/jars/com.mysql_mysql-connector-j-8.0.32.jar",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "test",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "",
    "spark.sql.catalog.spark_catalog.type": "hive",
    "spark.sql.catalogImplementation": "hive",
}
spark = raydp.init_spark(
    app_name="Darwin_SPARK",
    num_executors=1,
    executor_cores=1,
    executor_memory="4G",
    enable_hive=True,
    configs=default_spark_conf,
)
I get the error when trying to create a table, for example:
`df = spark.createDataFrame([
    (1, "Smith"),
    (2, "Rose"),
    (3, "Williams"),
], ("id", "name"))
df.write.mode("overwrite").saveAsTable("employees12")`
Stack Trace
2023-05-22 10:45:09,515 WARN HiveMetaStore [Thread-5]: Retrying creating default database after error: Unexpected exception caught.
javax.jdo.JDOFatalInternalException: Unexpected exception caught.
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1203) ~[javax.jdo-3.2.0-m3.jar:?]
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:814) ~[javax.jdo-3.2.0-m3.jar:?]
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:702) ~[javax.jdo-3.2.0-m3.jar:?]
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:139) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431) ~[hive-metastore-2.3.9.jar:2.3.9]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_362]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_362]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:162) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70) ~[hive-exec-2.3.9-core.jar:2.3.9]
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.datanucleus.util.NucleusLogger
    at org.datanucleus.plugin.PluginRegistryFactory.newPluginRegistry(PluginRegistryFactory.java:58)
    at org.datanucleus.plugin.PluginManager.<init>(PluginManager.java:60)
    at org.datanucleus.plugin.PluginManager.createPluginManager(PluginManager.java:430)
    at org.datanucleus.AbstractNucleusContext.<init>(AbstractNucleusContext.java:85)
    at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:167)
    at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:156)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.<init>(JDOPersistenceManagerFactory.java:415)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:304)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:213)
    ... 80 more
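A `NoClassDefFoundError: Could not initialize class ...` generally means the class was found but its static initializer failed earlier in the same JVM. One possible cause (an assumption, not confirmed in this thread) is that the DataNucleus or logging jars visible to the JVM are missing or conflicting. The sketch below only lists jar files by keyword so they can be inspected; the helper name `find_jars` and the keyword list are hypothetical, and the pyspark `jars` location is what a pip install normally uses.

```python
import os


def find_jars(jars_dir, keywords):
    """Return {keyword: sorted jar filenames in jars_dir whose name contains keyword}."""
    try:
        names = [n for n in os.listdir(jars_dir) if n.endswith(".jar")]
    except FileNotFoundError:
        names = []
    return {kw: sorted(n for n in names if kw in n.lower()) for kw in keywords}


# Keywords chosen for illustration; adjust to whatever you want to inspect.
KEYWORDS = ("datanucleus", "log4j", "slf4j", "jdo")

try:
    import pyspark

    # A pip-installed pyspark normally ships its jars next to the package.
    jars_dir = os.path.join(os.path.dirname(pyspark.__file__), "jars")
except ImportError:
    jars_dir = None  # pyspark not installed in this interpreter

if jars_dir is not None:
    for keyword, jars in find_jars(jars_dir, KEYWORDS).items():
        print(keyword, jars)
```

Comparing this listing between the environment where plain SparkSession works and the one RayDP uses may show whether the two JVMs see different jars.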
from raydp.
Could you please help, @kira-lin?
from raydp.
I did the same with a plain SparkSession and it works:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Spark Examples")
    .config("spark.hadoop.javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "com.mysql.cj.jdbc.Driver")
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "test")
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .config("spark.jars", "/home/ray/.ivy2/jars/com.mysql_mysql-connector-j-8.0.32.jar")
    .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.32")
    .enableHiveSupport()
    .getOrCreate()
)
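To see exactly how the two setups differ, the configuration keys from both snippets in this thread can be compared side by side. This is a small sketch; the helper name `diff_configs` is hypothetical, and the dictionaries are copied from the RayDP and SparkSession examples above.

```python
def diff_configs(left, right):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


# Settings shared by both snippets in the thread.
common = {
    "spark.jars.packages": "mysql:mysql-connector-java:8.0.32",
    "spark.jars": "/home/ray/.ivy2/jars/com.mysql_mysql-connector-j-8.0.32.jar",
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "test",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "",
    "spark.sql.catalog.spark_catalog.type": "hive",
    "spark.sql.catalogImplementation": "hive",
}

# Config passed to raydp.init_spark (fails).
raydp_conf = dict(common)
raydp_conf["spark.hadoop.javax.jdo.option.ConnectionDriverName"] = "com.mysql.jdbc.Driver"

# Config passed to SparkSession.builder (works).
session_conf = dict(common)
session_conf["spark.hadoop.javax.jdo.option.ConnectionDriverName"] = "com.mysql.cj.jdbc.Driver"

differences = diff_configs(raydp_conf, session_conf)
print(differences)
# {'spark.hadoop.javax.jdo.option.ConnectionDriverName': ('com.mysql.jdbc.Driver', 'com.mysql.cj.jdbc.Driver')}
```

The only key that differs is the driver class name: `com.mysql.jdbc.Driver` is the legacy name that Connector/J 8 deprecates in favor of `com.mysql.cj.jdbc.Driver`. That difference alone probably does not explain the `NucleusLogger` error (an assumption), but aligning the two configs removes one variable when reproducing the issue.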
from raydp.
This seems to be a bug. Could you please look into it?
from raydp.