AMSTAT Consulting is dedicated to detecting and correcting (or removing) corrupt or inaccurate records in a record set, table, or database. The process of data cleaning includes data auditing, workflow specification, workflow execution, post-processing, and controlling.
We use popular methods, including parsing, data transformation, duplicate elimination, and statistical methods. By analyzing the data with summary statistics such as the mean, standard deviation, and range, and with clustering algorithms, we can find values that are unexpected and therefore likely to be erroneous.
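As a minimal illustration of a few of these steps, the Python sketch below parses a small CSV extract, sets aside rows that cannot be parsed, removes exact duplicate rows, and then flags values with a mean/standard-deviation (z-score) rule and a plausible-range check. The data, the column names, the z threshold of 2.5, and the valid range of 0 to 50 are hypothetical assumptions chosen for the example; clustering-based detection is not shown.

```python
import csv
import io
import statistics

# Hypothetical raw extract: one duplicated row, one missing value, one suspicious reading.
RAW = """id,reading
1,10.1
2,9.8
2,9.8
3,10.3
4,9.9
5,10.0
6,10.2
7,9.7
8,10.1
9,55.0
10,
"""

def parse_rows(text):
    """Parse CSV text into (id, value) tuples; unparseable rows are returned separately for review."""
    rows, rejected = [], []
    for rec in csv.DictReader(io.StringIO(text)):
        try:
            rows.append((int(rec["id"]), float(rec["reading"])))
        except ValueError:
            rejected.append(rec)  # e.g. an empty or non-numeric reading
    return rows, rejected

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def zscore_flags(rows, threshold=2.5):
    """Flag rows whose value lies more than `threshold` sample SDs from the mean."""
    values = [v for _, v in rows]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(i, v) for i, v in rows if abs(v - mean) / sd > threshold]

def range_flags(rows, low, high):
    """Flag rows whose value falls outside an assumed valid range [low, high]."""
    return [(i, v) for i, v in rows if not low <= v <= high]

rows, rejected = parse_rows(RAW)
unique = drop_duplicates(rows)
print(len(rows), len(unique), len(rejected))  # prints 10 9 1
print(zscore_flags(unique))                   # prints [(9, 55.0)]
print(range_flags(unique, 0.0, 50.0))         # prints [(9, 55.0)]
```

A z threshold of 3 is a common default, but with the sample standard deviation the largest attainable |z| is (n - 1)/sqrt(n), about 2.67 for the nine unique rows here, so a single extreme value in a small sample can never reach 3. That is why this sketch assumes 2.5.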
We can examine any standardized residual greater than about 3 in absolute value, any hat (leverage) value greater than 3p/n (where p = k + 1 and k is the number of predictors), any Cook’s distance greater than 1, and the Mahalanobis distance of each case. We also run graphical outlier analyses, such as run-sequence plots, scatter plots, histograms, and box plots.
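The numeric rules above can be sketched for the simplest case, a straight-line regression with one predictor (k = 1, so p = 2), using only the Python standard library. The data are hypothetical: nine points close to y = 2x + 1 plus one case at x = 20 whose y value lies far below that line. With a single predictor the hat value has the closed form h_ii = 1/n + (x_i - x_bar)^2 / Sxx, and the squared Mahalanobis distance of a case reduces to (x_i - x_bar)^2 / s_x^2; a model with several predictors would need matrix algebra instead of these formulas.

```python
import math

def regression_diagnostics(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return per-case diagnostics."""
    n = len(x)
    p = 2  # number of parameters: intercept + one slope (p = k + 1 with k = 1)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square
    var_x = sxx / (n - 1)  # sample variance of the predictor

    cases = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx      # hat (leverage) value h_ii
        r = e / math.sqrt(mse * (1 - h))         # internally standardized residual
        cook = (r ** 2 / p) * (h / (1 - h))      # Cook's distance
        md2 = (xi - x_bar) ** 2 / var_x          # squared Mahalanobis distance (one predictor)
        cases.append({"h": h, "r": r, "cook": cook, "md2": md2})
    return cases, p

def flag_cases(cases, p):
    """Apply the rules of thumb from the text; returns 0-based case indices."""
    n = len(cases)
    return {
        "residual": [i for i, c in enumerate(cases) if abs(c["r"]) > 3],
        "leverage": [i for i, c in enumerate(cases) if c["h"] > 3 * p / n],
        "cook": [i for i, c in enumerate(cases) if c["cook"] > 1],
    }

# Hypothetical data: nine points near y = 2x + 1, plus one far-out case at x = 20.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.1, 21.0]

cases, p = regression_diagnostics(x, y)
print(flag_cases(cases, p))  # prints {'residual': [], 'leverage': [9], 'cook': [9]}
for i, c in enumerate(cases):
    print(i, round(c["h"], 3), round(c["r"], 2), round(c["cook"], 2), round(c["md2"], 2))
```

The last case (index 9) is flagged by its leverage (about 0.79, above 3p/n = 0.6) and by a Cook's distance well above 1, yet not by its standardized residual, because a high-leverage point pulls the fitted line toward itself. Note also that an internally standardized residual can never exceed sqrt(n - p) in absolute value (about 2.83 here), so the |r| > 3 rule only becomes informative in larger samples.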