connorcallison / hadoop-internship-2017
Over the course of my internship, I built a three-node Hadoop cluster and tested ETL with Hive, Spark SQL, and PySpark. My goals were to document the installation, test and review these technologies, and compare them to the current data warehousing solution.