
statistics-contextualized's People

Contributors

flo7894, francescadag, franckco, nicolaval, pafrance, romaintailhurat, thomaspo


statistics-contextualized's Issues

SEP data update

  • Move metadata into dedicated graph: http://rdf.interstat.eng.it/graphs/sep/metadata
  • Rename data graph: http://rdf.interstat.eng.it/graphs/sep
  • Reorganize GraphDB repositories: sep-test and sep-staging
  • Fix observation filtering in the data pipeline

Loop back Data Cube -> NGSI-LD -> RDF

An interesting exercise would be to take the JSON-LD produced from the Data Cube/Turtle, convert it back to RDF with the standard JSON-LD -> RDF transformation, and compare the result with the original graph.
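
The comparison step of this exercise could be sketched as follows. This is a minimal sketch under strong assumptions: both the original graph and the round-tripped graph have already been serialized to canonical N-Triples by an RDF toolkit, and neither contains blank nodes (with blank nodes, a proper graph isomorphism check is needed instead). The function names and the example.org triples are made up for illustration.

```python
def parse_ntriples(text):
    """Return the set of triple lines in an N-Triples document."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            triples.add(line)
    return triples


def diff_graphs(original_nt, roundtrip_nt):
    """Report triples lost or gained by the round trip."""
    original = parse_ntriples(original_nt)
    roundtrip = parse_ntriples(roundtrip_nt)
    return {"lost": original - roundtrip, "gained": roundtrip - original}


# Made-up triples under example.org, for illustration only.
original_nt = """
<http://example.org/obs1> <http://example.org/value> "21.5" .
<http://example.org/obs1> <http://example.org/area> <http://example.org/area/A1> .
"""
roundtrip_nt = """
<http://example.org/obs1> <http://example.org/value> "21.5" .
"""

report = diff_graphs(original_nt, roundtrip_nt)
print(len(report["lost"]), "lost,", len(report["gained"]), "gained")
```

An empty "lost" set and an empty "gained" set would mean the round trip preserved the graph, under the assumptions above.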

SEP data workflow: Italian Air Pollution datasets

Data extraction
Step 1: Open the data source website.
Step 2: Select the DATA panel. The data are organized as a set of tables.
Step 3: Scroll to the requested table, named “Tabella 1 – PM10. Stazioni di monitoraggio: dati e parametri statistici per la valutazione della qualità dell'aria (2019)” (Table 1 – PM10. Monitoring stations: data and statistical parameters for air quality assessment (2019)).
Step 4: Use the download link at the bottom left, below the table. The data are downloaded in XLS format.
The downloaded file is not compliant with the required Data Structure.

Data transformation
The downloaded file has the following Data Structure:
"Regione","Provincia","Comune","Nome della stazione Tipo di zona","Tipo di stazione","Giorni di superamento di 50 µg/m3","Valore medio annuo³ [µg/m³]","Rendimento [%]","Rispetta copertura minima","sufficiente distribuzione temporale nell'anno","numero_dati_validi","TIPO DI DATI 4","Codice zona","Nome zona"

  1. The data need to be filtered to comply with the requested Data Structure.
  2. A NUTS3 variable has been added, derived from the municipality_id variable using data from the ISTAT LAU archive.
  3. The metadata needed for the NUTS3 transformation have to be downloaded and merged.
    These metadata are organized as a time series; the script uses the variable for year 2019.
  4. Metadata on pollutant type, data reference time and aggregation type have been added to the data file.
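
The four steps above could be sketched roughly as follows. This is not the R script attached to this issue; it is an illustrative Python sketch. The target column names in `KEEP`, the lookup table and all sample values are hypothetical, since the requested Data Structure and the ISTAT LAU file layout are not shown here.

```python
# Hypothetical target columns; the actual requested Data Structure is not given here.
KEEP = ["municipality_id", "station_name", "annual_mean"]


def transform(rows, lau_to_nuts3, pollutant, year, aggregation):
    """Filter columns, derive NUTS3 from the municipality, attach metadata."""
    out = []
    for row in rows:
        # Step 1: keep only the columns of the target structure.
        record = {key: row[key] for key in KEEP}
        # Steps 2-3: look up NUTS3 in the merged LAU table (None if unknown).
        record["nuts3"] = lau_to_nuts3.get(row["municipality_id"])
        # Step 4: attach pollutant, reference time and aggregation type.
        record["pollutant"] = pollutant
        record["reference_year"] = year
        record["aggregation_type"] = aggregation
        out.append(record)
    return out


# Made-up sample data, for illustration only.
rows = [
    {"municipality_id": "M001", "station_name": "S1",
     "annual_mean": "28", "zone_code": "Z9"},
]
lau_to_nuts3 = {"M001": "N3_A"}  # hypothetical LAU -> NUTS3 lookup for 2019

result = transform(rows, lau_to_nuts3, "PM10", 2019, "annual mean")
print(result)
```

Rows whose municipality is missing from the lookup keep `nuts3` set to `None` rather than being dropped, so that gaps in the LAU table stay visible; that choice is an assumption of this sketch.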

Data Load
The transformed file has been uploaded to the sep-test repository of the INTERSTAT GraphDB instance.
GraphDB provides permalinks to resources, but the raw data need some reworking before they can be accessed directly.

Further data files available
The same procedure can be used to import other data from the data source website:
AMBIENT AIR QUALITY: NITROGEN DIOXIDE NO2
AMBIENT AIR QUALITY: TROPOSPHERIC OZONE O3
AMBIENT AIR QUALITY: PARTICULATE PM2.5
These files have not been uploaded to the GraphDB repository yet.

Transformation script in R language
processing_ETL_AIR.R.txt

Air Quality ontology and data models

This is a proposal to model air quality using existing vocabularies: SOSA for sensor description and the AQD model for air pollution.
interstat.pdf
In the diagram, yellow marks SOSA concepts and green marks AQD model concepts.
Bear in mind that this is the ontological description of the domain of interest, namely air pollution.
This model can be exported in OWL format with eddy.

Actual data can be mapped with tools like monolith or juma, but some adjustments are needed to match the suggested smart model data structures.
The link contains a list of properties and concepts that have been analyzed to resolve the compatibility issue.
Items highlighted in green have been added to the graphical representation, while those in yellow are proposed for revision.
Items that are not highlighted have not been analyzed yet. Concepts of this kind pertain to the administrative elements of the sensor or to its physical environment, and could also be added to the conceptual model.
lista di concetti interstat.docx

Apart from the missing areas regarding the sensor's physical environment, the main mismatch between the data models concerns the pollutant structure:
sensor data expose a vector of pollutant measurements that maps to a set of columns in a tabular representation, each column holding concentration data formatted as a float.
Our model represents a single observation as key/value pairs, so multiple measurements translate to multiple rows pertaining to the same observation.
It is possible to translate between the two models with a simple pivot/unpivot function.
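
A minimal sketch of such a pivot/unpivot pair, assuming the wide (sensor) form has one float column per pollutant and the long (observation) form has one row per pollutant/value pair. The key names `observation`, `pollutant` and `value` and the sample figures are made up for illustration.

```python
def unpivot(wide_rows, id_key, pollutants):
    """Wide form (one column per pollutant) -> long form (one row per measurement)."""
    long_rows = []
    for row in wide_rows:
        for pollutant in pollutants:
            if row.get(pollutant) is not None:
                long_rows.append({
                    id_key: row[id_key],
                    "pollutant": pollutant,
                    "value": float(row[pollutant]),
                })
    return long_rows


def pivot(long_rows, id_key):
    """Long form -> wide form, grouping rows that share the same observation id."""
    wide = {}
    for row in long_rows:
        entry = wide.setdefault(row[id_key], {id_key: row[id_key]})
        entry[row["pollutant"]] = row["value"]
    return list(wide.values())


# Made-up sample values, for illustration only.
wide_rows = [{"observation": "obs1", "PM10": 28.0, "NO2": 31.5}]

long_rows = unpivot(wide_rows, "observation", ["PM10", "NO2"])
print(long_rows)
print(pivot(long_rows, "observation"))
```

Pivoting the unpivoted rows gives back the original wide rows, provided every pollutant column is listed and no value is missing.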

Once we reach consensus on a common data model, the next step is to map this model to actual data sources and produce the triples for each source, but I'd like to discuss the available datasets and the common model first.
Italian datasets that are already compliant with the AQD and SOSA models are available for reference.
