interstat / statistics-contextualized
Models for the dissemination of contextualized statistical data
http://rdf.interstat.eng.it/graphs/sep/metadata
http://rdf.interstat.eng.it/graphs/sep
sep-test
& sep-staging
Description of the SEP (Support for Environment Policies) data is given here. Some of the fields correspond to artifacts defined in the Semantic Sensor Network Ontology. It would be useful to document these correspondences.
Implement the workflow specified by #14 in Prefect
The Data Cube model is presented in the specification.
Specify the data location, formats, ETL, etc.
An interesting exercise would be to take the JSON-LD produced from the Data Cube/Turtle, convert it back to RDF with the standard JSON-LD -> RDF transformation, and compare the result with the original graph.
Data extraction
Step 1: Open the data source website.
Step 2: Select the DATA panel. Data are organized in a set of tables.
Step 3: Scroll to the requested table, named “Tabella 1 – PM10. Stazioni di monitoraggio: dati e parametri statistici per la valutazione della qualità dell'aria (2019)”.
Step 4: Use the download link at the bottom left, at the end of the table. The downloaded data are in XLS format.
The downloaded file is not compliant with the required Data Structure.
Data transformation
The downloaded file has the following Data Structure:
"Regione","Provincia","Comune","Nome della stazione Tipo di zona","Tipo di stazione","Giorni di superamento di 50 µg/m3","Valore medio annuo³ [µg/m³]","Rendimento [%]","Rispetta copertura minima","sufficiente distribuzione temporale nell'anno","numero_dati_validi","TIPO DI DATI 4","Codice zona","Nome zona"
Data Load
The transformed file has been uploaded to the INTERSTAT GraphDB repository sep-test.
GraphDB allows direct links to resources through permalinks, but the raw data needs some reworking before it can be accessed directly.
Further data files available
The same procedure can be used to import other data from the data source website:
AMBIENT AIR QUALITY: NITROGEN DIOXIDE NO2
AMBIENT AIR QUALITY: TROPOSPHERIC OZONE O3
AMBIENT AIR QUALITY: PARTICULATE PM2.5
These files have not yet been uploaded to the GraphDB repository.
Transformation script in R language
processing_ETL_AIR.R.txt
Design and implement data workflow.
Implement the transformation task that formats the SEP Census QB DSD and CSV file as NGSI-LD
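As a rough illustration of this task, here is a minimal sketch that turns CSV rows into NGSI-LD entities. It is not the project's converter. It assumes a flat CSV with one identifier column, and it ignores the QB DSD entirely. The function names, the entity type `CensusObservation`, and the column names are hypothetical. The `urn:ngsi-ld:` identifier pattern and the core context URL follow common NGSI-LD usage.

```python
import csv
import io

# Core NGSI-LD JSON-LD context published by ETSI.
NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def row_to_entity(row, entity_type, id_column):
    """Build one NGSI-LD entity (as a dict) from one CSV row.

    Hypothetical helper: every non-identifier column becomes a Property.
    """
    entity = {
        "id": f"urn:ngsi-ld:{entity_type}:{row[id_column]}",
        "type": entity_type,
        "@context": [NGSI_LD_CORE_CONTEXT],
    }
    for column, value in row.items():
        if column == id_column or value == "":
            continue  # skip the identifier itself and empty cells
        # Values are kept as strings; typing them would need the DSD.
        entity[column] = {"type": "Property", "value": value}
    return entity


def csv_to_entities(csv_text, entity_type, id_column):
    """Convert CSV text with a header row into a list of NGSI-LD entities."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_entity(row, entity_type, id_column) for row in reader]
```

A real implementation would presumably use the DSD to decide which columns are dimensions, measures, or attributes, and to give each value a proper type.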
Finish implementation of the GF data workflow
The converter will soon be available as a Python module, to be integrated in the different pipelines.
This is a proposal to model air quality using existing vocabularies: SOSA for sensor description and the AQD model for air pollution.
interstat.pdf
Yellow is related to SOSA concepts and green is related to AQD model.
Bear in mind that this is the ontological description of the domain of interest, namely air pollution.
This model can be exported in OWL format with eddy.
Actual data can be mapped with tools like monolith or juma, but some adjustments are needed to match the suggested smart data model structures.
The link contains a list of properties and concepts that have been analyzed to solve the compatibility issue
Items highlighted in green have been added in the graphical representation while the yellow ones are proposed for revision.
Items that are not highlighted have not been analyzed yet. Such concepts pertain to the administrative elements of the sensor or to its physical environment, and could also be added to the conceptual model.
lista di concetti interstat.docx
Apart from the missing areas regarding the sensor's physical environment, the main mismatch between the data models concerns the pollutant structure:
Sensor data exhibits a vector of pollutant measurements that can be mapped to a given set of columns in a tabular representation. Each column holds concentration data and is therefore formatted as a float.
Our model represents a single measurement as a key/value pair, so multiple measurements translate to multiple rows pertaining to the same observation.
It is possible to translate between the two models with a simple pivot/unpivot function.
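The pivot/unpivot translation between the two layouts can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the field names `observation`, `pollutant` and `value`, and the pollutant columns `PM10` and `NO2`, are placeholders chosen for the example.

```python
def unpivot(wide_rows, id_field, pollutant_fields):
    """Wide -> long: emit one key/value row per (observation, pollutant)."""
    long_rows = []
    for row in wide_rows:
        for name in pollutant_fields:
            if row.get(name) is None:
                continue  # no measurement for this pollutant
            long_rows.append({
                id_field: row[id_field],
                "pollutant": name,
                "value": float(row[name]),
            })
    return long_rows


def pivot(long_rows, id_field):
    """Long -> wide: gather all rows of an observation into one row."""
    grouped = {}
    for row in long_rows:
        wide = grouped.setdefault(row[id_field], {id_field: row[id_field]})
        wide[row["pollutant"]] = row["value"]
    return list(grouped.values())


# Usage with made-up measurements
wide = [
    {"observation": "obs-1", "PM10": 21.5, "NO2": 30.0},
    {"observation": "obs-2", "PM10": 18.0, "NO2": None},
]
long_rows = unpivot(wide, "observation", ["PM10", "NO2"])
round_trip = pivot(long_rows, "observation")
```

In this sketch a missing measurement is dropped on the way to the long form, so the round trip restores the wide rows without the empty columns.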
Once we reach consensus on a common data model, the next step is to map this model to actual data sources and produce the triples for each source, but I'd like to discuss the available datasets and the common model first.
Italian datasets that already comply with the AQD and SOSA models are available for reference.
Italian census data is currently produced manually. Explore possibilities of automation.
There is now a new link for creating a data model:
https://smartdatamodels.org/index.php/draft-a-data-model/
It is simpler and more powerful.