Data pre-processing is an important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to data mining and machine learning projects. Data-gathering methods are often loosely controlled, resulting in out-of-range values, impossible data combinations (e.g., Sex: Male, Pregnant: Yes), and missing values. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data must be addressed before running an analysis.
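As an illustration of the screening described above, the following is a minimal sketch that flags out-of-range values, impossible combinations, and missing values. It uses only the Python standard library and is not part of the data-preprocessing package; the field names (`age`, `sex`, `pregnant`), the valid age range, and the `screen` helper are assumptions made for the example.

```python
# Hypothetical records; field names and valid ranges are assumptions.
records = [
    {"age": 34, "sex": "Male", "pregnant": "No"},
    {"age": -5, "sex": "Female", "pregnant": "No"},     # out-of-range value
    {"age": 29, "sex": "Male", "pregnant": "Yes"},      # impossible combination
    {"age": None, "sex": "Female", "pregnant": "Yes"},  # missing value
]


def screen(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("missing age")
    elif not 0 <= age <= 120:
        problems.append("age out of range")
    if record.get("sex") == "Male" and record.get("pregnant") == "Yes":
        problems.append("impossible combination: male and pregnant")
    return problems


# Map each record index to its problems, then keep only the clean records.
report = {i: screen(r) for i, r in enumerate(records)}
clean = [r for i, r in enumerate(records) if not report[i]]
```

Whether a flagged record should be dropped, corrected, or imputed depends on the project; the sketch only separates clean records from suspect ones.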
If the data contain much irrelevant and redundant information, or are noisy and unreliable, then knowledge discovery during the training phase is more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes cleaning, normalization, transformation, feature extraction, and selection. The product of data pre-processing is the final training set.
The data pre-processing routines include standardization (stndze), graphical summary (gs), skewness, kurtosis, creating dummy (indicator) variables, Box-Cox transformation, etc.
Standardization - Standardize the raw feature vectors from the training data.
Deviations - Calculate the deviation of a particular value from the average.
Indicator Variables - Create indicator variables representing the training data.
Skewness - Compute the skewness of a sample within a training set.
Kurtosis - Compute the kurtosis of a sample within a training set.
Box-Cox Transformation - Transform the training vectors using Box-Cox (https://github.com/serendio-labs/data-preprocessing-python/wiki/Box-Cox-Transformations).
Poisson Transformation - Transform the training vectors using Poisson.
Proportional Transformation - Transform the training vectors with Proportional transformation.
Graphical Summary - Get a pictorial representation of the training data.
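To make the standardization, deviation, skewness, and kurtosis entries above concrete, here is a minimal sketch using only the Python standard library. It does not use the package's own functions, whose names and signatures are documented on the wiki; the function names below are hypothetical, and the choice of population (rather than sample) moments is an assumption.

```python
import statistics


def deviations(values):
    """Deviation of each value from the mean of the data."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def standardize(values):
    """Z-scores: subtract the mean, divide by the population std. dev.

    Assumes the values are not all identical (std. dev. must be non-zero).
    """
    sd = statistics.pstdev(values)
    return [d / sd for d in deviations(values)]


def _central_moment(values, k):
    """k-th central moment, dividing by n (population convention)."""
    return sum(d ** k for d in deviations(values)) / len(values)


def skewness(values):
    """Third central moment divided by the variance to the power 3/2."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Excess kurtosis: fourth central moment over variance squared, minus 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(sample)
```

For this sample the mean is 5 and the population standard deviation is 2, so the first z-score is -1.5; the positive skewness reflects the longer right tail.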
Download or pull the data-preprocessing package from https://github.com/serendio-labs/data-preprocessing-python.git into the appropriate location, then refer to the wiki page for each utility listed above to work with it.
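Before working with the package itself, it may help to see what two of the listed utilities compute. The sketch below shows the Box-Cox transformation for a fixed parameter and the construction of indicator variables, using only the Python standard library. The function names `box_cox` and `indicator_variables` are hypothetical and are not the package's API, and passing the parameter `lam` explicitly is a simplification.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda.

    y = log(x)                  if lambda == 0
    y = (x**lambda - 1)/lambda  otherwise
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def indicator_variables(categories):
    """Return the sorted distinct levels and one 0/1 row per observation."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


transformed = box_cox([1.0, 4.0, 9.0], 0.5)
levels, rows = indicator_variables(["red", "green", "red"])
```

In practice the Box-Cox parameter is usually estimated from the data, for example by maximum likelihood, rather than chosen by hand.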