Hello, I hope you are doing well. At my company, we are in the process of adopting

Issue With Direct Migration from Hive Metastore to Glue Data Catalog about aws-glue-samples HOT 6 CLOSED

aws-samples commented on July 3, 2024

Issue With Direct Migration from Hive Metastore to Glue Data Catalog

from aws-glue-samples.

Comments (6)

kjudahlookout commented on July 3, 2024

Please let me know what more info I can provide to debug this issue. Thanks

dichenli commented on July 3, 2024

Hi Kshitij,

Thank you for using the Glue service. Here are some of my suggestions:

Try running the Hive Metastore -> S3 migration as well, and see whether any data is generated in the S3 bucket. This helps narrow down the search space for the error.
The script hive_metastore_migration.py can migrate your Hive metastore to S3 or a local file system. It is fully open-source Spark 2.1 code, so you can run it on your own Spark cluster or on a local Spark installation on your laptop, as long as it can connect to your Hive metastore. Debugging locally with this script might help.
If you are familiar with Spark programming or Python, you can manually add "print()" debugging statements to the script and then run it on Glue. The output goes to the job output logs in CloudWatch.
Double-check whether the data shows up in the Glue console. If it shows up in the Glue console but not in the Athena console, you may need to migrate your Athena catalog to Glue: http://docs.aws.amazon.com/athena/latest/ug/glue-upgrade.html
In your use case, if you are looking for a way to sync from the Hive metastore to the Glue DataCatalog in near real time, you may consider calling the Glue DataCatalog APIs directly from a stream processing workflow. You can find the Glue DataCatalog API documentation here: http://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html.
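
As an illustration of the last suggestion, here is a minimal sketch of calling the Glue Data Catalog API directly to create databases that are missing from the catalog. It is not taken from the migration scripts. The function names and the shape of `hive_databases` (dicts with `name` and an optional `description`) are assumptions made for this example; only the boto3 Glue calls `get_databases` and `create_database` are real API operations. The client is passed in so the logic can be exercised without AWS access.

```python
def existing_database_names(glue):
    """Collect all database names in the catalog, following NextToken pages."""
    names = set()
    kwargs = {}
    while True:
        page = glue.get_databases(**kwargs)
        names.update(db["Name"] for db in page.get("DatabaseList", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def sync_databases(glue, hive_databases):
    """Create every database from hive_databases that the catalog lacks.

    hive_databases is a hypothetical input: an iterable of dicts with a
    'name' key and an optional 'description' key.
    Returns the list of names that were created.
    """
    present = existing_database_names(glue)
    created = []
    for db in hive_databases:
        name = db["name"]
        if name in present:
            continue  # already in the catalog, nothing to do
        database_input = {"Name": name}
        if db.get("description"):
            database_input["Description"] = db["description"]
        glue.create_database(DatabaseInput=database_input)
        created.append(name)
    return created


# Example use against a real account (the region is an assumption):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   sync_databases(glue, [{"name": "sales", "description": "from Hive"}])
```

Tables could be handled along the same lines with the table operations of the same API, but the required table input is more involved and is left out of this sketch.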

kjudahlookout commented on July 3, 2024

Hello,
Thank you very much for your reply, and sorry for the late response. I was able to add some print statements to these Python scripts, run them in Glue, and see the dataframes being processed by the scripts. As far as I can see, there is no issue with the dataframes themselves: I was able to print the count and first 10 rows of the dataframes for databases and tables, and I can see all the data even after the step in which the Spark dataframes are converted to Glue's dynamic frames.

However, the last step, loading the dynamic frames into the Glue catalog, seems to be failing (silently) for me. Despite all the data printing successfully in the logs, it is not getting loaded into Glue by the Glue APIs.

I am wondering whether this has something to do with my old version of the Hive metastore. My Hive metastore version is 1.1.0, which is almost 3 years old. Is it possible that these scripts were developed and tested with newer versions of the Hive metastore and do not support older versions such as 1.1.0? Please let me know.

Thanks,
Kshitij
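
For readers following along, the kind of print debugging described in this comment (row count plus the first 10 rows) might look like the sketch below. `describe_frame` is a hypothetical helper, not part of the migration scripts. It relies only on `count()` and `take(n)`, which exist on a Spark DataFrame; for a Glue DynamicFrame, converting with `toDF()` first is the assumed approach.

```python
def describe_frame(name, df, n=10):
    """Print the row count and the first n rows of a Spark DataFrame.

    Hypothetical debugging helper. On Glue, the printed lines end up in
    the job output logs.
    """
    total = df.count()
    rows = df.take(n)
    print("[debug] %s: %d rows, showing %d" % (name, total, len(rows)))
    for row in rows:
        print("[debug]   %r" % (row,))
    return total


# Example use inside a job (variable names are assumptions):
#   describe_frame("databases", databases_df)
#   describe_frame("tables", tables_dynamic_frame.toDF())
```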

dichenli commented on July 3, 2024

Hmmm... this is really weird. Have you tried to migrate Hive metastore data into S3? Could you send some sample data in S3 to my email address? Make sure to mask the confidential fields in the data. I can try to reproduce the issue, and debug from there.

dichenli commented on July 3, 2024

Also, you may send me your AWS account ID and job run ID through email if you feel comfortable. I can use it to investigate our service log and see if anything went wrong.

kjudahlookout commented on July 3, 2024

This issue seems to have been resolved for me by using the --region option in the migration scripts. Thanks, Dichen Li, for the help.
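
Since the fix involved the region, one way to check where migrated databases actually landed is to list the catalog contents per region. The thread does not say why the region mattered; a write to a different region than the one being inspected is only an assumption here. `databases_by_region` and `make_client` are hypothetical names for this sketch; `get_paginator("get_databases")` is the real boto3 Glue paginator.

```python
def databases_by_region(make_client, regions):
    """Return {region: sorted database names} for each region given.

    make_client is a callable that builds a Glue client for a region,
    which keeps this function testable without AWS access.
    """
    found = {}
    for region in regions:
        glue = make_client(region)
        names = []
        for page in glue.get_paginator("get_databases").paginate():
            names.extend(db["Name"] for db in page.get("DatabaseList", []))
        found[region] = sorted(names)
    return found


# Example use against a real account (the regions are assumptions):
#   import boto3
#   result = databases_by_region(
#       lambda region: boto3.client("glue", region_name=region),
#       ["us-east-1", "us-west-2"],
#   )
#   for region, names in result.items():
#       print(region, names)
```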
