This project forked from gerhard1050/data-quality-for-data-science-using-sas

Data-Quality-for-Data-Science-Using-SAS

Considerations and best practices for data quality in the context of data science, analytics, and machine learning, and how to implement them with SAS software. Companion and download site for the SAS Press book "Data Quality for Analytics Using SAS" by Gerhard Svolba (available on amazon.com).

Main focus of this book --> Analytics and Data Science

My intention was not to write another book on data quality in general, but to write the first book that deals with data quality from the viewpoint of a data scientist, statistician, data miner, engineer, operations researcher, or other analytically minded problem-solver. Data quality is getting a lot of attention in the market. However, most initiatives, publications, and papers on data quality focus on classical data quality topics, such as elimination of duplicates, standardization of data, lists of values, value ranges, and plausibility checks. This is not to say that these topics are unimportant for analytics; on the contrary, they form the foundation of the data for analysis. However, many aspects of data are specific to analytics, and these aspects determine whether data are suitable for analysis or not.

The following points are the main focus of this book:

  • Usability and availability of data for analysis
  • Selection of the right data sources
  • Explaining why analytical methods need historic data and also historic snapshots of the data
  • Ensuring sufficient data quantity in terms of number of observations, number of event cases, and length of data history
  • Typical analyses: predictive analytics and statistics, including time series analysis and time series forecasting
  • Types of research: mainly observational studies, which use the existing data of the company or organization itself (and, in part, controlled studies, where the required data are collected in a controlled way)
  • Data completeness: profiling the number and structure of missing values, finding special types of missing values, replacing missing values, and advanced imputation methods
  • Consideration of the operational, data management, and analysis process point of view
  • Data relevancy with respect to the definition of the data and the requirements of the analysis
  • Data correction with focus on analytic profiling of outliers and complex data validation
  • SAS methods for advanced profiling of the data quality status, and what SAS offers for improving data quality
  • Simulation studies and consequences of poor data quality for predictive modeling and time series forecasting
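
To make the data completeness point above more concrete, here is a minimal sketch of profiling both the number and the structure of missing values. It is not taken from the book, whose implementations are written in SAS; Python is used only as a compact illustration. The toy records, the variable names, and the helper functions `missing_counts` and `missing_patterns` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "region": "N"},
    {"age": None, "income": 48000, "region": "S"},
    {"age": 51, "income": None, "region": None},
    {"age": None, "income": None, "region": "N"},
]


def missing_counts(rows):
    """Return the number of missing values per variable."""
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            counts[name] += value is None  # True counts as 1, False as 0
    return dict(counts)


def missing_patterns(rows, variables):
    """Return how often each missing-value pattern occurs.

    A pattern is a string with one character per variable:
    "1" if the value is missing, "0" if it is present.
    """
    return Counter(
        "".join("1" if row.get(v) is None else "0" for v in variables)
        for row in rows
    )


if __name__ == "__main__":
    print(missing_counts(records))
    print(missing_patterns(records, ["age", "income", "region"]))
```

The per-variable counts describe how much is missing, while the pattern frequencies describe which variables tend to be missing together, which is the "structure" aspect of the profile.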

Main parts of this book

This book is divided into three main parts. The naming and ordering of these parts and their chapters follow a process, but also divide the content into well-defined parts with a readable sequence of topics and chapters. The three parts of this book are:

  • Data Quality Defined
  • Data Quality—Profiling and Improvement
  • Consequences of Poor Data Quality—Simulation Studies

The data quality criteria that are defined in this book in chapters 3–9 are the following.

  • Chapter 3, “Data Availability,” starts with the question as to whether data are available in general.
  • Chapter 4, “Data Quantity,” examines whether the amount of data is sufficient for the analysis.
  • Chapter 5, “Data Completeness,” deals with the fact that available data fields may contain missing values.
  • Chapter 6, “Data Correctness,” checks whether the available data are correct with respect to their definition.
  • Chapter 7, “Predictive Modeling,” discusses special requirements of predictive modeling methods.
  • Chapter 8, “Analytics for Data Quality,” shows additional requirements that arise from the interdependence of analytical methods and the data.
  • Chapter 9, “Process Considerations for Data Quality,” finally shows the process aspect of data quality and also discusses aspects like data relevancy and possible alternatives.
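
As an illustration of the data quantity criterion listed above, the following sketch summarizes the number of observations, the number of event cases, and the length of the data history for a small table. It is an assumption-laden example rather than the book's code (the book uses SAS): the field names `snapshot` and `event`, the sample rows, and the function `quantity_summary` are hypothetical.

```python
from datetime import date

# Hypothetical observations: one row per customer snapshot,
# with a binary event indicator (1 = event occurred).
observations = [
    {"snapshot": date(2023, 1, 31), "event": 0},
    {"snapshot": date(2023, 2, 28), "event": 1},
    {"snapshot": date(2023, 2, 28), "event": 0},
    {"snapshot": date(2023, 6, 30), "event": 1},
]


def quantity_summary(rows):
    """Summarize observations, event cases, and history length in months."""
    if not rows:
        raise ValueError("no observations to summarize")
    n_obs = len(rows)
    n_events = sum(1 for r in rows if r["event"] == 1)
    # Distinct (year, month) pairs that actually occur in the data.
    months = sorted({(r["snapshot"].year, r["snapshot"].month) for r in rows})
    first, last = months[0], months[-1]
    # Calendar months from the first to the last snapshot, inclusive.
    spanned = (last[0] - first[0]) * 12 + (last[1] - first[1]) + 1
    return {
        "n_obs": n_obs,
        "n_events": n_events,
        "event_rate": n_events / n_obs,
        "months_spanned": spanned,
        "months_observed": len(months),
    }


if __name__ == "__main__":
    print(quantity_summary(observations))
```

Comparing `months_spanned` with `months_observed` gives a quick hint as to whether the history has gaps, which matters for time series methods that expect a continuous history.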
