noyo25 / clusteringtableheaders
This project creates an RDF schema from a list of column headers of a tabular dataset. It first transforms the header list into meaningful vectors, then applies a distance-based clustering algorithm that maximizes the similarity among the headers within each cluster. The user can move items from one cluster to another and merge clusters. The system suggests a name for each cluster based on the words its members have in common; if no common word is found, it produces "Unknown". The user can then rename the automatically generated names. Finally, the resulting clusters can be exported in an RDF format.
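The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the description does not say which vectorization or clustering algorithm is used, so the sketch assumes word-set vectors, Jaccard distance, and single-linkage agglomerative clustering as stand-ins. The function names (`tokenize`, `cluster_headers`, `suggest_name`, `to_turtle`), the `threshold` value, the `http://example.org/schema#` namespace, and the choice of `rdfs:Class` and `rdf:Property` for the output are all hypothetical.

```python
import re
from itertools import combinations


def tokenize(header):
    """Split a header such as 'customerId' or 'order_date' into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", header)  # break camelCase
    return {w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w}


def jaccard_distance(a, b):
    """Distance between two word sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_headers(headers, threshold=0.7):
    """Single-linkage clustering (assumed): merge the closest pair of clusters
    until the closest remaining pair is farther apart than `threshold`."""
    tokens = {h: tokenize(h) for h in headers}
    clusters = [[h] for h in headers]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(jaccard_distance(tokens[a], tokens[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def suggest_name(cluster):
    """Name a cluster after a word shared by all members, else 'Unknown'."""
    common = set.intersection(*(tokenize(h) for h in cluster))
    return sorted(common)[0] if common else "Unknown"


def to_turtle(clusters, base="http://example.org/schema#"):
    """Serialize clusters as Turtle: one class per cluster, one property per header."""
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for n, cluster in enumerate(clusters, start=1):
        lines.append(
            f'ex:Cluster{n} a rdfs:Class ; rdfs:label "{suggest_name(cluster)}" .'
        )
        for header in cluster:
            local = re.sub(r"[^A-Za-z0-9_]", "_", header)  # safe local name
            lines.append(f"ex:{local} a rdf:Property ; rdfs:domain ex:Cluster{n} .")
    return "\n".join(lines)


if __name__ == "__main__":
    headers = ["customer_name", "customerId", "order_date", "orderTotal", "price"]
    print(to_turtle(cluster_headers(headers)))
```

In this sketch, moving an item between clusters or merging clusters by hand amounts to editing the list of lists returned by `cluster_headers` before calling `to_turtle`, and renaming a cluster would mean overriding the label that `suggest_name` proposes.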