Awesome official statistics software

An awesome list of open source software for official statistics.

An item on this list is awesome because it is

  1. free, open source, and available for download and
  2. used in the production of official statistics by at least one institute or provides access to official statistics.

We prefer software that is easy to install and use, has at least one stable version, and is actively maintained. Contributions welcome.


Design frame and sample (GSBPM 2.1)


  • R package SamplingStrata. Optimal Stratification of Sampling Frames for Multipurpose Sampling Surveys.

  • R package R2BEAT. Multistage Sampling Allocation and PSU Selection.

Design variable descriptions (GSBPM 2.2)


  • Excel SDMX_Matrix_Generator. Excel-based visual SDMX artefact authoring tool which generates SDMX-ML for upload into an SDMX repository such as a registry. By OECD.

Sampling (GSBPM 4.1)


  • R package sampling. Several algorithms for drawing survey samples, including a variety of unequal probability sampling designs (high entropy, systematic, Rao-Sampford, etc.), and calibrating design weights.

  • R package surveyplanning. Tools for sample survey planning, including sample size calculation, estimation of expected precision for the estimates of totals, and calculation of optimal sample size allocation.

  • R package PracTools. Functions and datasets related to Valliant, Dever, and Kreuter (2018 2nd ed), Practical Tools for Designing and Weighting Survey Samples.

  • R package prnsamplr. Coordinated stratified sampling using permanent random numbers (PRN's). Supports simple random sampling and probability-proportional-to-size sampling and includes a function for transforming PRN's to control the sample overlap.
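The permanent random number (PRN) idea mentioned above can be illustrated with a minimal Python sketch. This is not the prnsamplr API: the function names, the frame layout and the seed are assumptions made for illustration only. Each unit keeps one fixed random number across survey occasions, a sample takes the n units whose PRNs come first after a chosen starting point, and moving the starting point changes the overlap between samples.

```python
import random

def assign_prns(unit_ids, seed=1):
    """Give every unit a permanent random number in [0, 1)."""
    rng = random.Random(seed)
    return {u: rng.random() for u in unit_ids}

def prn_sample(prns, n, start=0.0):
    """Select the n units whose PRNs come first after `start`, wrapping around at 1."""
    ordered = sorted(prns, key=lambda u: (prns[u] - start) % 1.0)
    return set(ordered[:n])

# Hypothetical frame of 100 units
frame = [f"unit{i}" for i in range(100)]
prns = assign_prns(frame)

s1 = prn_sample(prns, 10, start=0.0)
s2 = prn_sample(prns, 10, start=0.0)  # same start: identical sample
s3 = prn_sample(prns, 10, start=0.5)  # shifted start: typically little or no overlap with s1
```

Because the PRNs are stored rather than redrawn, repeating a draw with the same starting point reproduces the sample, which is what makes coordination between surveys possible.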

Scraping for Statistics (GSBPM 4.3)


  • Java application GUrlSearcher. An application for searching URLs via Google. Can be used to find websites of enterprises. By ISTAT.

  • Java application Url_scorer. Gives a rule based score to scraped documents in a Solr database. By ISTAT.

  • Node.js tool RobotTool. A tool for checking (price) changes on the web. By Statistics Netherlands.

  • Python Social-Media-Presence. A script for detecting social media presence on enterprise websites. By Statistics Poland.

  • Python Sustainability_Reporting. A script for measuring sustainability reporting from enterprise websites. By ONS.

  • Python urlfinding. Software for finding websites of enterprises using a search engine and machine learning. By Statistics Netherlands.

Process (GSBPM 5)


  • R package blaise. Reading and writing files in the Blaise format from R. By Statistics Netherlands.

  • Java application Java-VTL. A partial implementation of the Validation Transformation Language, based on the VTL 1.1 draft specification. By Statistics Norway.

  • Java application ADaMSoft. Implements procedures for data analysis and for data, web and text mining. Also contains procedures for data validation and imputation, based on the principle of Fellegi and Holt.

  • R package dcmodify. Derive new variables or modify data using externally defined data modification rules.

Data integration and record linkage (GSBPM 5.1)


  • R package reclin2. Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, em-algorithm for estimating m- and u-probabilities, forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage.

  • R package RecordLinkage. Implementation of the Fellegi-Sunter method for record linkage.

  • R package StatMatch. Statistical Matching or Data Fusion

  • R package fastLink. Implements a Fellegi-Sunter probabilistic record linkage model that allows for missing data and the inclusion of auxiliary information. Documentation.

  • R package stringdist. Approximate string matching. Supports various string distances (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-gram based distances (q-gram, cosine, Jaccard) and heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well.

  • R package fuzzyjoin. Join tables based on exact or similar matches. Allows for matching records based on inaccurate keys.

  • R Java MySQL RELAIS. A toolkit providing techniques for dealing with record linkage. The purpose is to identify the same real-world entity that can be differently represented in data sources. By ISTAT.

  • R package XBRL. Extraction of Business Financial Information from XBRL.
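Several packages in this section build on the Fellegi-Sunter model, in which each compared field contributes a log-likelihood ratio based on m-probabilities (agreement given a true match) and u-probabilities (agreement given a non-match). The sketch below shows that weight calculation in plain Python. It does not reproduce the API of any listed package, and the field names and probabilities are made up for illustration.

```python
from math import log2

def fs_weight(agreements, m, u):
    """Sum the log2 likelihood ratios over all compared fields.

    agreements: dict mapping field name -> True if the two records agree.
    m[field]:   P(fields agree | records are a match).
    u[field]:   P(fields agree | records are not a match).
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += log2(m[field] / u[field])
        else:
            total += log2((1 - m[field]) / (1 - u[field]))
    return total

# Illustrative, made-up probabilities
m = {"surname": 0.95, "birth_year": 0.90}
u = {"surname": 0.01, "birth_year": 0.05}

w_both = fs_weight({"surname": True, "birth_year": True}, m, u)
w_none = fs_weight({"surname": False, "birth_year": False}, m, u)
```

Pairs with a high total weight are candidate matches and pairs with a low weight are candidate non-matches. In practice the m- and u-probabilities are estimated from the data, for example with the EM algorithm mentioned for reclin2.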

Statistical data editing and imputation (GSBPM 5.3 | 5.4)


  • R package validate. Data validation checks such as on length, format, range, missingness, availability, uniqueness, multivariate checks, statistical checks and checks on SDMX codelist. See Cookbook. By Statistics Netherlands.

  • R package validatedb. validate on a SQL database, providing validation on bigger data.

  • R package validatetools. Checking validation rules on redundancies and contradictions. Useful if your validation rule set grows in complexity.

  • R package errorlocate. Error localisation based on Fellegi and Holt, supporting categorical and/or numeric data, linear equalities, inequalities and conditional rules and MIP-based error localization.

  • R package VIM. Visualisation and imputation of missing data. Imputation using (robust) linear regression methods or donor-based methods (kNN, hot-deck).

  • R package simputation. Front-end to (combinations of) advanced imputation methods following the tidy tools manifesto. Supports regression (standard, M-estimation, ridge/lasso/elasticnet), hot-deck methods (powered by VIM), randomForest, EM-based, and iterative randomForest imputation. Reuse of fitted models and definition of simple user-defined methods are supported as well.

  • R package SeleMix. Detection of outliers and influential errors using a latent variable model for selective editing.

  • R package univOutl. Various methods for detecting univariate outliers.

  • R package extremevalues. Detection of univariate outliers based on modeling the bulk distribution.

  • R package deductive. Deductive correction and imputation using edit rules and (partially) complete data.

  • R package rspa. Adapt Numerical Records to Fit (in)Equality Restrictions.

  • R package mice. Multiple imputation by chained equations, aka fully conditional specification, accompanied by van Buuren (2018) Flexible Imputation of Missing Data.
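The rule-based validation that several packages in this section provide can be pictured as a set of named predicates confronted with each record. The following Python sketch illustrates that idea only. The rule names, record fields and the `confront` helper are hypothetical and are not the interface of validate or any other listed package.

```python
# Each rule is a named predicate that a valid record should satisfy.
rules = {
    "staff_nonnegative": lambda r: r["staff"] >= 0,
    "turnover_balance": lambda r: r["turnover"] == r["costs"] + r["profit"],
}

def confront(records, rules):
    """Return, for each record, the names of the rules it violates."""
    return [
        [name for name, rule in rules.items() if not rule(rec)]
        for rec in records
    ]

# Made-up example records
records = [
    {"staff": 12, "turnover": 100, "costs": 80, "profit": 20},
    {"staff": -1, "turnover": 100, "costs": 80, "profit": 30},
]

violations = confront(records, rules)
```

Keeping the rules as data, separate from the code that applies them, is what allows rule sets to be inspected for redundancy or contradiction and reused for error localisation.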

Estimation and weighting (GSBPM 5.6 | 5.7)


  • R package survey. Weighting and estimation for complex survey designs, possibly under nonresponse. Also computes estimator variance. See also R package srvyr for integration with tidy tools.

  • R package hbsae. Small area estimation based on hierarchical Bayesian models.

  • R package mcmcsae. Small area estimation based on Markov Chain Monte Carlo simulation.

  • R package rsae. Small area estimation based on (robust) maximum likelihood estimation.

  • R package CalibrateSSB. Calculate weights and estimates for panel data with non-response.

  • R package PriceIndices. Calculating Bilateral and Multilateral Price Indexes.

  • R package ReGenesees. Like survey, but with specific features (e.g. partitioned calibration) that make it fit for processing large-scale surveys. Implements different estimators with sampling errors, and ships with a dedicated GUI (package ReGenesees.GUI).

  • R package vardpoor. Linearization of non-linear statistics and variance estimation.

  • R package convey. Variance estimation on indicators of income concentration and poverty using complex sample survey designs. Wrapper around the survey package.

  • R package icarus. Provides detailed tools for performing calibration and several of its variations, in a familiar setting for Calmar users in SAS.

  • R package gustave. Provides a toolkit for analytical variance estimation in survey sampling.

  • R package rtrim. Trends and Indices for Monitoring data. Provides tools for estimating animal/plant populations based on site counts, including occurrence of missing data.

  • R package surveysd. Calibration, bootstrap and error estimation for complex surveys.

  • R package inca. Calibration weighting with integer weights.

  • Fortran X-13ARIMA-SEATS. Seasonal adjustment software produced, maintained and distributed by the US Census Bureau.

  • R package seasonal. Interface to the `X-13ARIMA-SEATS` program from R with a very nice shiny GUI.

  • R package x12. Alternative interface to the `X-13ARIMA-SEATS` program from R with a focus on batch processing time series.

  • Java application JDemetra+. The seasonal adjustment software officially recommended for the European Statistical System.

  • R package RJDemetra. R interface to JDemetra+.

  • R package tempdisagg. Methods for temporal disaggregation and interpolation of time series.
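The weighting and calibration packages listed in this section share a basic idea: design weights are adjusted so that weighted sample totals reproduce known population totals. Post-stratification is the simplest case, and the Python sketch below shows it under stated assumptions. The function name, strata and totals are invented for illustration and do not correspond to the API of any listed package.

```python
def poststratify(sample, pop_totals):
    """Scale design weights so each stratum's weights sum to its known total.

    sample:     list of (stratum, design_weight) pairs.
    pop_totals: dict mapping stratum -> known population count.
    """
    weight_sums = {}
    for stratum, w in sample:
        weight_sums[stratum] = weight_sums.get(stratum, 0.0) + w
    return [w * pop_totals[s] / weight_sums[s] for s, w in sample]

# Made-up sample with design weights and known population counts
sample = [("young", 10.0), ("young", 10.0), ("old", 20.0), ("old", 30.0)]
pop_totals = {"young": 30.0, "old": 40.0}

calibrated = poststratify(sample, pop_totals)
```

More general calibration methods, such as raking or linear calibration, match several sets of margins at once, but they follow the same principle of reweighting towards known totals.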

Output validation (GSBPM 6.2)


  • R package validate. Rule management and data validation.

Statistical disclosure control (GSBPM 6.4)


  • Java and C++ application Mu-ARGUS. Tool to create safe micro-data files. See also the casc page.

  • Java C++ Fortran and Delphi application T-ARGUS. Tool to protect statistical tables. See also the casc page.

  • R package sdcMicro. Disclosure control for statistical microdata.

  • R package sdcTable. Disclosure control for tabulated data.

  • R package easySdcTable. Provides an interface to the package sdcTable.

  • R package GaussSuppression. Tabular data suppression using the Gaussian elimination secondary suppression algorithm.

  • R package sdcHierarchies. Generate, modify and export nested hierarchies.

  • R package SmallCountRounding. Can be used to protect frequency tables by rounding necessary inner cells so that cross-classifications to be published are safe.

  • R package simPop. Simulation of synthetic populations from census/survey data considering auxiliary information.

  • R package sdcSpatial. Create privacy protected density maps from location data. Includes visual sensitivity assessment and several protection methods.

  • R package synthpop. Produce synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis.
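As a rough illustration of what protecting a frequency table involves, the Python sketch below applies a simple threshold rule for primary suppression: cells with a small positive count are withheld. The function name, the threshold of 3 and the table are assumptions chosen for the example, not taken from any listed package.

```python
def primary_suppress(table, threshold=3):
    """Replace small positive counts (0 < n < threshold) with None."""
    return {
        cell: (None if 0 < n < threshold else n)
        for cell, n in table.items()
    }

# Made-up frequency table keyed by (region, category)
table = {
    ("north", "a"): 12,
    ("north", "b"): 2,
    ("south", "a"): 0,
    ("south", "b"): 7,
}

protected = primary_suppress(table)
```

Primary suppression alone is usually not enough, because a suppressed cell can often be recovered from published row and column totals. Secondary suppression, as addressed by packages such as sdcTable and GaussSuppression, deals with that problem.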

Statistical Dissemination (GSBPM 7.2)


  • Java application SDMX_Converter. Converts between SDMX versions and formats like CSV, FLR etc. By Eurostat.

  • Java application SDMX-RI. Framework for disseminating data in SDMX webservices. By Eurostat.

  • C# HTML5 JavaScript PxStat. Data Dissemination Management System for creating and publishing Statistics over the Web with focus on Accessibility and LOD. By CSO.

  • C# VB.NET ASP.NET PxWeb_. Web application for dissemination of statistical tables in Px format or SQL data in the Nordic Data Model.

  • Node.js and other .Stat_Suite. An SDMX-based platform to build tailored data portals, topical or regional data explorers, or lightweight reporting platforms. Documentation. By SIS-CC.

  • JSON SDMX-JSON. JSON variant of SDMX. Works together with the SDMX-REST API.

  • JSON JSON-stat. Simple lightweight JSON format for statistical dissemination. Based on a Cube model with dimensions organised in categories.

  • R package cols4all. Color palettes generation and analysis with support for color-blind-friendliness and fairness. Supports categorical, sequential, diverging and bivariate color palettes and colors for missing values.

  • R package tabplot. Compare up to about 10-20 variables simultaneously using a tableplot. See also tabplotd3 for a web-based GUI. Note: 2022-03-03: Temporarily not on CRAN but expected to be back in 2022.

  • R package tmap. Thematic geographic maps, including bubble charts, choropleths, and more.

  • R package oceanis. Create maps for statistical analysis such as proportional circles, choropleth, typology and flows. By INSEE.

  • GeoJSON/TopoJSON cartomap. A (growing) list of simplified maps useful for web cartography for World, Europe and countries.

  • GeoJSON/TopoJSON Nuts2json. Simplified geometries for web maps of European NUTS regions. By Eurostat.

  • R package treemap. Space-filling visualisation of hierarchical data.

  • R package btb. Conservative kernel smoothing method for spatial analysis.

  • Node.js StatMiner. Experimental visualization framework from Statistics Netherlands.

  • JavaScript Visual. JavaScript library for data visualization that encapsulates complexity, supporting chart types such as bar, rank, pie, time series bar/line, population pyramid, scatterplots and choropleth maps. By Idescat.

  • R package PantaRhei. Sankey plots suited for (circular) economic systems such as energy systems, material flow accounts and water accounts. Supports loops.
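The cube model behind JSON-stat, listed above, can be made concrete with a small example. The Python sketch below parses a minimal JSON-stat 2.0 style dataset and looks up one cell. The data are made up, and the sketch assumes that category indexes are given as arrays and values as a flat array in row-major order (the format also allows object forms, which are not handled here). The `cell` helper is hypothetical.

```python
import json

# Minimal made-up dataset: 2 sexes x 2 years, values stored row-major
doc = json.loads("""
{
  "version": "2.0",
  "class": "dataset",
  "id": ["sex", "year"],
  "size": [2, 2],
  "dimension": {
    "sex":  {"category": {"index": ["M", "F"]}},
    "year": {"category": {"index": ["2020", "2021"]}}
  },
  "value": [10, 11, 20, 21]
}
""")

def cell(ds, **coords):
    """Return the value at the given category ids, e.g. sex="F", year="2021"."""
    pos = 0
    for dim, size in zip(ds["id"], ds["size"]):
        index = ds["dimension"][dim]["category"]["index"]
        pos = pos * size + index.index(coords[dim])
    return ds["value"][pos]

value = cell(doc, sex="F", year="2021")
```

The position of a cell follows from the order of the dimensions in `id` and their sizes, so the values can be shipped as one compact array without repeating the dimension labels for every observation.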

Access to official statistics (GSBPM 7.4)


  • R package rsdmx. Access to data or metadata from statistical organisations that support SDMX webservices. The package contains a list of SDMX access points of various national and international statistical institutes.

  • R package readsdmx. Read SDMX into dataframes from local SDMX-ML file or web-service. Parts in C++. By OECD.

  • Python sdmx. Python library that implements SDMX 2.1 to explore data from SDMX data providers, parse data and metadata and convert it into Pandas objects.

  • R package rjstat. Read and write data sets in the JSON-stat format.

  • Python package pyjstat. Read and write JSON-stat.

  • Java application json-stat.java. Read and write JSON-stat. By Statistics Norway.

  • R package oecd. Search and Extract Data from the OECD

  • R package sorvi. Finnish Open Government Data Toolkit

  • R package eurostat. Tools to download data from the Eurostat database together with search and manipulation utilities.

  • R package restatapi. Search and retrieve data from Eurostat database, by Eurostat.

  • R package acs. Download, Manipulate, and Present American Community Survey and Decennial Data from the US Census.

  • R package inegiR. Access to data published by INEGI, Mexico's official statistics agency.

  • R package cbsodataR. Access to Statistics Netherlands' (CBS) open data API from R.

  • R package cbsodata4. Access to OData4 interface of Statistics Netherlands' (CBS) open data.

  • Node.js package cbsodata.js. Access to Statistics Netherlands' (CBS) open data API from js.

  • Python package cbsodata.py. Access to Statistics Netherlands' (CBS) open data API from Python.

  • R package censusapi. A wrapper for the U.S. Census Bureau APIs that returns data frames of Census data and metadata.

  • R package nsoApi. Builds on other packages to access data from official statistics and tries to harmonize the API.

  • R package CANSIM2R. Extract CANSIM (Statistics Canada) tables and transform them into readily usable data.

  • Python package pyscbwrapper. Access to the open data API of the Swedish Statistical Institute

  • R package pxweb. Generic interface for the PX-Web/PC-Axis API used by many National Statistical Agencies.

  • R package PxWebApiData. Easy API access to e.g. Statistics Norway, Statistics Sweden and Statistics Finland.

  • R package pxR. Functions for reading and writing PC-Axis files.

  • R package rdbnomics. Access to the DB.nomics database, which provides macroeconomic data from 38 official providers such as INSEE, Eurostat, World Bank, etc.

  • R package readabs. Download data from the Australian Bureau of Statistics.

  • R package statcanR. An R connection to Statistics Canada's Web Data Service. Open economic data (formerly CANSIM tables) are accessible as a data frame in the R environment.

  • R package cdlTools. Downloads USDA National Agricultural Statistics Service (NASS) cropscape data for a specified state.

  • Java application SDMX_Connectors. Browse SDMX data providers, build your queries and get data directly in your favourite tool (R, SAS, Matlab, Stata and Excel). By Banca d'Italia.

  • Node.js package sdmx-rest. This library makes it easy to create and execute SDMX REST queries from a JavaScript client application.

  • R package csodata. Download data from Central Statistics Office (CSO) of Ireland.

  • R package iriR. Client for the European Commission’s Industrial R&D Investment Scoreboard (IRI)

  • R package czso. Access open data from the Czech Statistical Office.

  • R package ipumsr. Access to the Integrated Public Use Microdata Series archive ipums.org (international censuses, harmonized U.S. data).

  • R package eph. Tools to download and manipulate the EPH-INDEC from Argentina (EPH is the Spanish acronym for Permanent Household Survey)

  • R package blsR. Make Requests to the BLS (Bureau of Labor Statistics) API

  • R package danstat. R Client for the Statistics Denmark Databank API
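Many of the clients in this section wrap SDMX REST web services. As a rough illustration of what such a client does under the hood, the Python sketch below builds a data query URL following the general SDMX 2.1 REST pattern `data/{flow}/{key}`, where dimension values in the key are separated by dots, alternatives are joined with `+`, and an empty position acts as a wildcard. The base URL, dataflow id and dimension values are placeholders, and the helper is hypothetical rather than part of any listed package.

```python
from urllib.parse import urlencode

def data_url(base, flow, key_parts, **params):
    """Build an SDMX REST data query URL.

    key_parts: one list of codes per dimension; an empty list means "all values".
    params:    optional query parameters such as startPeriod or endPeriod.
    """
    key = ".".join("+".join(values) for values in key_parts)
    url = f"{base.rstrip('/')}/data/{flow}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url

# Placeholder endpoint and codes, for illustration only
url = data_url(
    "https://example.org/sdmx/rest",
    "EXAMPLE_FLOW",
    [["A"], ["NL", "BE"], []],
    startPeriod="2020",
)
```

The sketch only constructs the URL and does not send a request. Real services differ in the dataflows, dimension order and parameters they accept, so the provider's own documentation remains the reference.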

Contributions

Awesome contributions are welcome. Here are ways to contribute:

  • The GitHub way: send us a pull request on data/software.yaml.
  • Add an item to the issue tracker (you need a GitHub account).
  • Send an e-mail to mark dot vanderloo at gmail dot com or olav dot tenbosch at gmail dot com or tweet @olavtenbosch or @markvdloo

Wear the badge. Authors of software that is mentioned on this list gain the right to wear the mentioned in awesome badge on their website or GH repository. Please use the following code (or equivalent) to do so for your project.

[![Mentioned in Awesome Official Statistics ](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
