AMSTAT Consulting detects and corrects corrupt or inaccurate records in a recordset, table, or database. The data cleaning process includes data auditing, workflow specification, workflow execution, post-processing, and controlling. We use established methods, including parsing, data transformation, duplicate elimination, and statistical methods.
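As a rough illustration of parsing, data transformation, and duplicate elimination, the sketch below cleans a small list of records. The field names (`name`, `age`), the function name `clean_records`, and the rule for what counts as a duplicate are assumptions made for this example, not a description of any particular client workflow.

```python
def clean_records(raw_rows):
    """Parse, normalize, and de-duplicate records (hypothetical fields: name, age)."""
    cleaned, rejected, seen = [], [], set()
    for row in raw_rows:
        # Transformation: collapse repeated whitespace and normalize capitalization.
        name = " ".join(row["name"].split()).title()
        # Parsing: the age must be readable as an integer, otherwise set the row aside.
        try:
            age = int(str(row["age"]).strip())
        except ValueError:
            rejected.append(row)
            continue
        # Duplicate elimination: keep only the first occurrence of each (name, age) pair.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned, rejected


raw = [
    {"name": "  alice   smith", "age": "34"},
    {"name": "Alice Smith", "age": 34},    # duplicate once normalized
    {"name": "Bob Lee", "age": "n/a"},     # unparseable age, set aside for review
]
cleaned, rejected = clean_records(raw)
```

Rejected rows are returned rather than silently dropped so that they can be reviewed during the auditing step.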
By summarizing the data with the mean, standard deviation, and range, and by applying clustering algorithms, we can find values that are unexpected and therefore possibly erroneous. In regression diagnostics, we examine any observation with a standardized residual greater than about 3 in absolute value, a hat (leverage) element greater than 3p/n (where p = k + 1 and k is the number of predictors, that is, the regression degrees of freedom), a Cook’s distance greater than 1, or a large Mahalanobis distance. We can also inspect outliers graphically using a run-sequence plot, a scatter plot, a histogram, or a box plot.
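The regression cutoffs above can be sketched for the simplest case, a straight-line fit with one predictor (k = 1, so p = 2). The function name `regression_diagnostics` and the sample data are made up for illustration; with several predictors, a statistics package would normally compute the same quantities.

```python
import math


def regression_diagnostics(x, y):
    """Leverage, standardized residual, and Cook's distance for a one-predictor fit."""
    n = len(x)
    p = 2  # intercept plus one slope, i.e. p = k + 1 with k = 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square

    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx        # hat (leverage) element
        r = e / math.sqrt(mse * (1 - h))           # standardized residual
        cook = (r ** 2 / p) * (h / (1 - h))        # Cook's distance
        flagged = abs(r) > 3 or h > 3 * p / n or cook > 1
        rows.append({"leverage": h, "std_resid": r, "cooks_d": cook, "flag": flagged})
    return rows


# Illustrative data: ten points near y = 2x, plus one distant point far off that line.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2, 17.9, 20.1, 20.0]
diagnostics = regression_diagnostics(x, y)
```

A flagged observation is a candidate for review, not automatically an error; the cutoffs are rules of thumb.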
We can test reliability (such as Cronbach’s alpha, test-retest reliability, split-half reliability, and inter-rater reliability) and validity (such as content validity, construct validity, criterion validity, internal validity, and external validity).
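As one example of a reliability measure, Cronbach’s alpha can be computed from the item variances and the variance of the total scores. This is a minimal sketch using sample variances; the function name `cronbach_alpha` and the survey scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per respondent, each row holding k item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five respondents, four items on a 1-5 scale.
survey = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(survey)

# Sanity check: items that agree perfectly give an alpha of 1.
identical_items = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha_identical = cronbach_alpha(identical_items)
```

Values closer to 1 indicate that the items vary together; what counts as acceptable depends on the field and the purpose of the instrument.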