This Spark job is designed to support 8 analytics query patterns against 6 given tables:
- CHARGES
- DAMAGES
- ENDORSE
- PRIMARYPERSON
- RESTRICT
- UNITS
Project layout:
- MODEL (6 files, one per table, each responsible for reading data from one file type)
- RESOURCES (6 sample input CSV files)
- OUTPUT (results of the 8 analytics queries, saved as output data)
- IMAGES (screenshots of the 8 query outputs)
- main.py (driver process that executes the Spark job)
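Each file under MODEL presumably follows the same reader shape: build a path from the configured source directory and load one table's CSV file. A minimal sketch of that shape, using the standard csv module as a stand-in for Spark's CSV reader (the function name and file-naming convention are illustrative assumptions, not taken from the repository):

```python
import csv
import os

def read_table(source_dir, table_name):
    """Read one table's CSV file into a list of dict rows.

    Stand-in for the Spark equivalent, e.g.
    spark.read.csv(path, header=True) in the real job.
    """
    path = os.path.join(source_dir, f"{table_name}.csv")
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

In the actual job, each MODEL file would return a Spark DataFrame rather than a list of dicts, but the path-building and per-table separation are the same.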
The application lets you update the input and output file paths through a config file:
- Load data
  - Update source_data_path in config.json (default: resources)
- Export data
  - The data is exported in CSV format
  - Update output_data_path in config.json (default: output)
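The two settings above would typically be loaded by the driver before any reads or writes happen. A minimal sketch, assuming config.json sits alongside main.py and uses the keys and defaults listed above (the helper name load_paths is illustrative):

```python
import json

def load_paths(config_file="config.json"):
    """Return (source_data_path, output_data_path) from the config file,
    falling back to the documented defaults when a key is missing."""
    with open(config_file) as f:
        cfg = json.load(f)
    source = cfg.get("source_data_path", "resources")
    output = cfg.get("output_data_path", "output")
    return source, output
```

The driver can then pass these paths to the per-table readers and to the CSV export step.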
- All dependencies are included as zip files
- Mark data_reader as the sources root
- Run: python3 main.py