One simulated dataset and one real-world dataset will be used for this assignment.
Task 1
Build and test the program with a small simulated CSV file provided.
Find the frequent combinations (itemsets) of businesses and of users, that is, the combinations whose count meets a given support threshold.
Create baskets for each user containing the business ids reviewed by the user, and for each business containing the user ids that commented on the business.
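The basket construction described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column names user_id and business_id, the sample rows, and the helper names build_baskets and read_pairs are hypothetical, and the mapping of case 1 to user baskets and case 2 to business baskets is assumed from the order in which the two basket types are described.

```python
import csv
import io
from collections import defaultdict


def build_baskets(rows, case):
    """Group (user_id, business_id) pairs into baskets.

    case 1: one basket per user, holding business ids (assumed mapping).
    case 2: one basket per business, holding user ids (assumed mapping).
    """
    if case not in (1, 2):
        raise ValueError("case must be 1 or 2")
    baskets = defaultdict(set)
    for user_id, business_id in rows:
        if case == 1:
            baskets[user_id].add(business_id)
        else:
            baskets[business_id].add(user_id)
    return dict(baskets)


def read_pairs(text):
    """Parse CSV text with a header row into (user_id, business_id) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["user_id"], row["business_id"]) for row in reader]


# Made-up sample data; the real file layout may differ.
SAMPLE_CSV = "user_id,business_id\nu1,b1\nu1,b2\nu2,b1\nu2,b1\n"

pairs = read_pairs(SAMPLE_CSV)
user_baskets = build_baskets(pairs, case=1)
business_baskets = build_baskets(pairs, case=2)
```

Sets are used for the baskets so that a repeated (user, business) pair, such as the duplicated last row of the sample, is counted only once.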
Task 2
Generate a subset using the Ta Feng dataset with a structure similar to the simulated data.
Algorithm
Implement the SON Algorithm on top of the Spark Framework.
Find all frequent itemsets, of every size, in any given input file within the required time.
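To make the two passes of SON concrete, here is a minimal sketch in plain Python rather than Spark, so that it runs without a cluster. It assumes the baskets are already split into chunks (standing in for Spark partitions) and uses a simple A-Priori as the in-chunk algorithm; the function names apriori and son are hypothetical. In a Spark version, the per-chunk work would typically run inside mapPartitions on an RDD, but that wiring is not shown here.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, threshold):
    """Return all itemsets (sorted tuples) found in at least `threshold` baskets."""
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(item for b in baskets for item in b)
    current = {(item,) for item, n in counts.items() if n >= threshold}
    frequent = set(current)
    k = 2
    while current:
        items = sorted({i for itemset in current for i in itemset})
        # Keep a size-k candidate only if every (k-1)-subset is frequent.
        candidates = [
            c for c in combinations(items, k)
            if all(sub in current for sub in combinations(c, k - 1))
        ]
        counts = Counter()
        for b in baskets:
            for c in candidates:
                if b.issuperset(c):
                    counts[c] += 1
        current = {c for c, n in counts.items() if n >= threshold}
        frequent |= current
        k += 1
    return frequent


def son(chunks, support):
    """Two-pass SON over a list of chunks, each chunk a list of baskets."""
    total = sum(len(chunk) for chunk in chunks)
    # Pass 1: local frequent itemsets with a proportionally scaled threshold.
    candidates = set()
    for chunk in chunks:
        if not chunk:
            continue
        # Integer ceiling of support * len(chunk) / total.
        local_threshold = (support * len(chunk) + total - 1) // total
        candidates |= apriori(chunk, local_threshold)
    # Pass 2: count every candidate over all baskets, keep the truly frequent.
    counts = Counter()
    for chunk in chunks:
        for basket in chunk:
            items = set(basket)
            for c in candidates:
                if items.issuperset(c):
                    counts[c] += 1
    kept = (c for c in candidates if counts[c] >= support)
    return sorted(kept, key=lambda c: (len(c), c))


# Made-up example: four baskets in two chunks, support of 2.
chunks = [[{"a", "b", "c"}, {"a", "b"}], [{"a", "c"}, {"b", "c"}]]
result = son(chunks, support=2)
```

Scaling the threshold by each chunk's share of the baskets means an itemset that is frequent overall must be locally frequent in at least one chunk, so the first pass produces no false negatives. The second pass removes false positives such as ("a", "b", "c") in this example, which appears in only one basket.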
Input Format
Case number: Integer specifying the case (1 for Case 1, 2 for Case 2).
Support: Integer defining the minimum count to qualify as a frequent itemset.
Input file path: Full path to the input file, including directory, file name, and extension.
Output file path: Full path to the output file, including directory, file name, and extension.
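The four inputs above could be read and validated along the following lines. This sketch assumes they arrive as positional command-line arguments in the order listed; the Args type, the parse_args helper, and the example paths are hypothetical.

```python
import sys
from typing import NamedTuple


class Args(NamedTuple):
    case: int
    support: int
    input_path: str
    output_path: str


def parse_args(argv):
    """Validate four positional arguments (argv excludes the script name)."""
    if len(argv) != 4:
        raise SystemExit(
            "usage: <script> <case> <support> <input_path> <output_path>"
        )
    try:
        case, support = int(argv[0]), int(argv[1])
    except ValueError:
        raise SystemExit("case and support must be integers")
    if case not in (1, 2):
        raise SystemExit("case must be 1 or 2")
    if support < 1:
        raise SystemExit("support must be a positive integer")
    return Args(case, support, argv[2], argv[3])


# Example with made-up values; a real script would pass sys.argv[1:].
args = parse_args(["1", "4", "data/small.csv", "out/result.txt"])
```

Rejecting a bad case number or a non-positive support up front avoids starting a long job that can only fail later.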