Our Expertise
AMSTAT Consulting uses machine learning, an artificial intelligence technology that gives systems the ability to learn without being explicitly programmed. Our clients cite these reasons for choosing to work with us:
 Everyone on our team is a PhD-trained scientist with experience deploying machine learning tools across many different problems and industries.
 We hold PhDs in statistics from leading universities including Harvard, Stanford, and Columbia.
 We use modern data science and machine learning tools that make predictive models, classification, segmentation, and natural language processing easier and more powerful than ever before.
 Our data wrangling expertise makes quick work of any dataset to deliver finished projects fast.
 Our experience enables us to pick the right tool for the job, whether it’s a deep convolutional neural network or a linear model.
 Our application development experience allows us to tightly integrate predictive models with your app, dashboard, reporting, API, and other components of your infrastructure.
PhDs in Statistics from Leading Universities including Harvard, Stanford, and Columbia
All of our principals hold PhDs in statistics from leading universities including Harvard, Stanford, and Columbia.
Nationally Renowned Machine Learning Experts
They include nationally renowned machine learning experts.
Extensive Background in Statistics
They have extensive backgrounds in statistics and over 100 years of practical experience in quantitative methods.
Deep Knowledge of Advanced Machine Learning Algorithms
We utilize deep knowledge of advanced machine learning algorithms.
AMSTAT Consulting’s Services
Machine learning algorithms can be organized into a taxonomy of four types, based on the desired outcome of the algorithm or the type of input available for training the machine. We can use each of them:
Supervised Learning
 Most machine learning is supervised learning.
 Supervised learning algorithms are “trained” using labeled examples where the desired output is known.
 It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
 Because the correct answers are known, the algorithm iteratively makes predictions on the training data and is corrected by the teacher.
 Learning stops when the algorithm achieves an acceptable level of performance.
 Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.
 The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.
 Supervised learning problems can be further grouped into regression and classification problems.
 Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
 Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
We can:
 Determine the input feature representation of the learned function.
 The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality; but should contain enough information to accurately predict the output.
 Determine the structure of the learned function and corresponding learning algorithm
 Complete the design
 Run the learning algorithm
 Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation.
 Evaluate the accuracy of the learned function
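The steps above can be sketched in Python. This is an illustration only, not our production tooling. It assumes a k-nearest-neighbors classifier, synthetic two-class data from a hypothetical make_data helper, and five-fold cross-validation to choose the control parameter k. Accuracy is then measured on data held back from training.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Share of test examples whose predicted label matches the known label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def cross_validated_accuracy(data, k, folds=5):
    """Average validation accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        validation = data[i::folds]
        training = [item for j, item in enumerate(data) if j % folds != i]
        scores.append(accuracy(training, validation, k))
    return sum(scores) / folds

def make_data(n, seed):
    """Hypothetical labeled data: two Gaussian blobs, one per class."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice(["red", "blue"])
        cx, cy = (0.0, 0.0) if label == "red" else (4.0, 4.0)
        # Each example is a (feature vector, label) pair.
        data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data

data = make_data(200, seed=0)
train, test = data[:150], data[150:]

# Choose the control parameter k by cross-validation on the training set only.
best_k = max([1, 3, 5, 7], key=lambda k: cross_validated_accuracy(train, k))

# Evaluate the learned function on examples it has not seen.
test_accuracy = accuracy(train, test, best_k)
print(best_k, round(test_accuracy, 3))
```

The test set is touched only once, after k has been chosen, so the reported accuracy is not biased by the tuning step.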
We can:
 Use supervised learning in applications that use historical data to predict likely future events
 Use supervised learning techniques to make best guess predictions for the unlabeled data
 Feed the data back into the supervised learning algorithm as training data and use the model to make predictions on new unseen data
 Use supervised machine learning algorithms
 Regression – Linear Regression, LASSO Regression, Logistic Regression, Ridge Regression
 Decision Trees – Gradient Boosting, Random Forests
 Naïve Bayes
 Nearest Neighbors
 Gaussian processes
 Neural Networks
 Support Vector Machine (SVM)
Unsupervised Learning
 About 10 to 20 percent of machine learning is unsupervised learning.
 Unsupervised learning is a type of machine learning where the system operates on unlabeled examples. In this case, the system is not told the “right answer.”
 The algorithm tries to find a hidden structure or manifold in unlabeled data.
 Unsupervised learning is where you only have input data (X) and no corresponding output variables.
 The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
 This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
 Unsupervised learning problems can be further grouped into clustering and association problems.
 Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
 Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
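As a small illustration of the association idea, the sketch below scores a single rule of the form "people that buy X also tend to buy Y" by its support and confidence, two standard association-rule measures. The baskets and item names are invented for the example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

support, confidence = rule_metrics(baskets, "bread", "butter")
print(support, round(confidence, 3))
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain butter.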
We can:
 Use unsupervised learning techniques to discover and learn the structure in the input variables
 Use unsupervised clustering – a statistical and data science technique to detect clusters and cluster structures without any a priori knowledge or training set to help the classification algorithm.
 Use approaches to unsupervised learning:
 Clustering
 k-means
 mixture models
 Hierarchical clustering
 Anomaly detection
 Neural Networks
 Hebbian Learning
 Generative Adversarial Networks
 Approaches for learning latent variable models such as
 Expectation–maximization algorithm (EM)
 Method of moments
 Blind signal separation techniques
 Principal component analysis
 Independent component analysis
 Nonnegative matrix factorization
 Singular value decomposition
Semi-supervised Learning
 Problems where you have a large amount of input data (X) and only some of the data is labeled (Y) are called semi-supervised learning problems.
 These problems sit between supervised and unsupervised learning.
 Methods
 Generative models
 Generative approaches to statistical learning first seek to estimate p(x|y), the distribution of data points belonging to each class.
 Generative models assume that the distributions take some particular form p(x|y, θ) parameterized by the vector θ. If these assumptions are incorrect, the unlabeled data may actually decrease the accuracy of the solution relative to what would have been obtained from labeled data alone. However, if the assumptions are correct, then the unlabeled data necessarily improve performance.
 Low-density separation
 This approach attempts to place boundaries in regions where there are few data points (labeled or unlabeled). One of the most commonly used algorithms is the transductive support vector machine, or TSVM (which, despite its name, may be used for inductive learning as well).
 Graph-based methods
 Graph-based methods for semi-supervised learning use a graph representation of the data.
 Heuristic approaches
 These approaches are not intrinsically geared to learning from both unlabeled and labeled data, but instead make use of unlabeled data within a supervised learning framework.
We can:
 Use semi-supervised learning for the same applications as supervised learning. This technique uses both labeled and unlabeled data for training – typically, a small amount of labeled data with a large amount of unlabeled data.
 Use this type of learning with methods such as classification, regression, and prediction
 Clustering
 Autoencoders – Multilayer Perceptron, Restricted Boltzmann machines
 EM
 TSVM
 Prediction and Classification
 Manifold regularization
 Use semi-supervised learning when the cost associated with labeling data is too high to allow for a fully labeled training process
 Interpret semi-supervised learning in at least two different ways.
 We can use unlabeled data to inform a computer algorithm of the structural information of the data that is relevant to supervised learning.
 The primary goal is unsupervised learning, and labels are viewed as side information to help the algorithm find the right intrinsic data structure.
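One heuristic way to use unlabeled data inside a supervised framework is self-training. The sketch below is a deliberately small illustration and makes several assumptions: one-dimensional data, a 1-nearest-neighbor rule as the supervised learner, and invented values and labels. At each round, the unlabeled point closest to an already labeled point receives that point's label and joins the labeled set.

```python
def self_train(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbor rule on 1-D values.

    labeled:   dict mapping a value to its known label
    unlabeled: list of values without labels
    Returns a dict that contains a label for every value.
    """
    labeled = dict(labeled)
    remaining = list(unlabeled)
    while remaining:
        # The pair with the smallest distance is the most confident prediction.
        point, neighbor = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: abs(pair[0] - pair[1]),
        )
        labeled[point] = labeled[neighbor]
        remaining.remove(point)
    return labeled

# A small amount of labeled data and a larger amount of unlabeled data.
known = {0.0: "a", 10.0: "b"}
unknown = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]

result = self_train(known, unknown)
print(result)
```

Labels spread outward from the two labeled points, so each group of nearby values ends up sharing the label of its labeled member.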
Reinforcement Learning
AMSTAT Consulting uses reinforcement learning, in which an agent discovers for itself, through trial and error, which actions yield the greatest rewards. Reinforcement learning has three primary components:
 The agent – the learner or decision maker
 The environment – everything the agent interacts with
 Actions – what the agent can do
Algorithms for reinforcement learning include:
 Criterion of optimality
 Policy
 The agent’s action selection is modeled as a map called policy
 The policy map gives the probability of taking action “a” when in state “s.”
 State-value function
 Brute Force
 The brute force approach entails two steps:
 For each possible policy, sample returns while following it.
 Choose the policy with the largest expected return
 Value Function
 Value function approaches attempt to find a policy that maximizes the return by maintaining a set of estimates of expected returns for some policy (usually either the “current” [on-policy] or the optimal [off-policy] one).
 These methods rely on the theory of Markov decision processes (MDPs), where optimality is defined in a sense that is stronger than the one above: A policy is called optimal if it achieves the best expected return from any initial state (i.e., initial distributions play no role in this definition). Again, an optimal policy can always be found amongst stationary policies.
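The two brute force steps can be sketched on a toy problem. Everything about the environment below is hypothetical: two states, two actions, and rewards chosen only for illustration. Policies are restricted to deterministic ones, a special case of the probability map described above, so that all of them can be enumerated.

```python
import itertools
import random

STATES = ["start", "mid"]
ACTIONS = ["left", "right"]

def step(state, action, rng):
    """Hypothetical environment. Returns (next_state, reward); None ends the episode."""
    if action == "left":
        return None, 1.0                      # small, certain reward
    if state == "start":
        return "mid", 0.0                     # move on, no reward yet
    return None, (5.0 if rng.random() < 0.8 else 0.0)  # large, uncertain reward

def sample_return(policy, rng):
    """Follow the policy from the start state and add up the rewards."""
    state, total = "start", 0.0
    while state is not None:
        state, reward = step(state, policy[state], rng)
        total += reward
    return total

def brute_force_search(episodes=2000, seed=0):
    rng = random.Random(seed)
    estimates = {}
    # Step 1: for each possible policy, sample returns while following it.
    for actions in itertools.product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, actions))
        returns = [sample_return(policy, rng) for _ in range(episodes)]
        estimates[actions] = sum(returns) / episodes
    # Step 2: choose the policy with the largest estimated expected return.
    best = max(estimates, key=estimates.get)
    return dict(zip(STATES, best)), estimates

best_policy, estimates = brute_force_search()
print(best_policy, estimates)
```

The number of policies grows exponentially with the number of states, which is one reason value function approaches are usually preferred beyond toy problems.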
We can:
 Use Markov decision processes (MDPs) in reinforcement learning. MDPs assume the state of the environment is perfectly observed by the agent
 When this is not the case, use a more general model called a partially observable MDP (POMDP) to find the policy that resolves the state uncertainty while maximizing the long-term reward
 Use reinforcement learning so that an agent discovers for itself, through trial and error, which actions yield the greatest rewards
Generalization, Evaluation and Model Selection
We can:
 Use all types of machine learning that develop models that enable the learning machine to perform accurately on new, unseen examples or tasks
 Improve these models by using the machine
 Tune the fit so that it is not too much (overfitting), not too little (underfitting), but just right
 Look at data of any complexity and size and build a model that scales well to that data
 Look at all the data or a subset to create an accurate model
 One of the more powerful machine learning algorithms is a random forest. A random forest takes individual decision trees and combines them. When a new input is entered into the system, it runs down all of the trees. The result is either an average or a weighted average of all the terminal nodes that are reached.
 Validate a model to determine whether it can make effective predictions
 Use a training data set to develop the model
 Use known out-of-sample data to test it
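The sketch below illustrates the averaging and validation ideas above under simplifying assumptions. It is not a full random forest: each tree has a single split (a "stump"), there is one input feature so no random feature selection takes place, and the data are synthetic. The trees are fitted on bootstrap samples of the training data, their predictions are averaged, and the result is tested on out-of-sample data.

```python
import random

def fit_stump(data):
    """Fit a one-split regression tree on (x, y) pairs by least squares."""
    xs = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(y for _, y in data) / len(data)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_bagged_stumps(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(stump(x) for stump in stumps) / n_trees

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Synthetic data: a noisy step function of a single feature.
rng = random.Random(1)
points = []
for _ in range(150):
    x = rng.random()
    points.append((x, (1.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.1)))

train, test = points[:100], points[100:]    # hold out data for testing
model = fit_bagged_stumps(train)
train_mean = sum(y for _, y in train) / len(train)

test_mse = mse(model, test)                      # out-of-sample error of the ensemble
baseline_mse = mse(lambda x: train_mean, test)   # error of always predicting the mean
print(round(test_mse, 4), round(baseline_mse, 4))
```

Comparing against a trivial baseline on held-out data is a quick check that the model has learned something that carries over to new examples.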
Data Analytics
Today, the vast majority of enterprises have needs for descriptive analytics, which are necessary for effective management, but not sufficient to accelerate business performance. In order to scale to a higher level of responsiveness, enterprise organizations need to move beyond descriptive analytics and climb up the intelligence capability pyramid. We can use machine learning to help you with:
 Descriptive Analytics
 Diagnostic Analytics
 Predictive Analytics
 Predictive Modeling
 Automated Modeling
 Geospatial Analysis
 Text Analytics
 Social Network Analysis
 Entity Analytics
 Prescriptive Analytics
 Cognitive Analytics
 Operational Analytics
 Supply Chain Analytics
 Complexity Management
 End-to-End Optimization
 Supply Chain Risk Management
 Advanced Supplier Management
We can:
 Build a portfolio
 Deliver solutions
1. Build a Portfolio
We can demonstrate our ability to deliver by building a portfolio of completed machine learning projects.
Our Portfolio
We can:
 Pick a theme. This is the type of projects that we want to work on.
 Complete projects. We can apply our process to the dataset in order to deliver a result.
 Write up our findings. We can document the results of each completed project.
2. Deliver Solutions
We can deliver solutions.
ML (Machine Learning) is at the heart of Data Science. It powers predictive technology. We can apply it to serve the following business objectives:
 Reduce user/customer attrition with churn prediction
 Acquire new customers through lead scoring and marketing campaign optimization
 Cross-sell products with targeted campaigns and personalized recommendations
 Optimize products and pricing by finding patterns in commerce data
 Increase customer engagement by predicting their needs and interests
 Improve operations by predicting demand (or improve resource management by predicting usage)
 Save time by automating tasks
 Make your team more productive, with predictive enterprise apps
We emphasize the use of machine learning to create predictive models:
 Customer Satisfaction Prediction
 Drug Selection for Treating Heart Problems
 Predicting Financial Performance of a Company
 Forecast Profits for Clothing Sales
We can:

Drive data scientist productivity. We focus on speeding up analysis by using big data platforms such as Apache Spark, automating portions of the data science life cycle, and improving the usability of the data science workbench.

Include multiple model deployment methods. Production models must be embedded in applications and business processes to provide business value. AMSTAT Consulting can deploy models in multiple ways, including as code embedded directly into applications, exposed as a service callable by applications, or injected into other platforms such as databases. Some of the more mature PAML (predictive analytics and machine learning) vendors include or are integrated with decision management platforms that allow AD&D pros and business users to use a visual metaphor or express decision logic as a set of business rules that can also include model

Provide sophisticated model management. The very nature of predictive models is that they may lose accuracy over time. More mature PAML solutions include features to monitor the ongoing efficacy of models in production by comparing model output with established key performance indicators and testing new models using a champion/challenger or A/B testing scheme.

Allow polyglot programming. AMSTAT Consulting uses more than one programming language to take advantage of open-source add-on libraries, such as the CRAN packages for R and scikit-learn for Python.

Expand to Apache Spark. Apache Spark is an open-source, primarily in-memory cluster computing platform that also includes Spark ML, a set of machine learning libraries that data scientists are increasingly interested in using. In addition to Spark ML, other machine learning libraries such as H2O.ai’s Sparkling Water and IBM’s SystemML run on Spark.

Build the foundation for AI and invest in deep learning. Machine learning models are a key building block of AI applications. We use any of the PAML solutions to build models for use in AI applications. Deep learning is a branch of machine learning that we use to build models based on artificial neural networks. This method is particularly good at creating models for image recognition (including facial recognition), but it is applicable to more traditional use cases as well. We incorporate numerous open-source libraries, such as Caffe, MXNet, and TensorFlow, into PAML solutions, or build our own deep-learning algorithms into the platform.
We can use Principal Component Analysis in countless machine learning applications:
 Fraud Detection
 Word and Character Recognition
 Speech Recognition
 Email Spam Detection
 Texture Classification
 Face Recognition
Principal Component Analysis converts a set of possibly correlated features into a set of linearly uncorrelated features called principal components.
We can reduce the dimensionality of your data to:
 Reduce the memory and disk space needed to store the data
 Reveal hidden, simplified structures
 Solve issues of multicollinearity
 Visualize higher-dimensional data
 Detect outliers
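To make the idea of uncorrelated principal components concrete, here is a minimal sketch for the two-feature case only, where the principal axes can be found in closed form by rotating the centered data. The function names and the sample points are invented for illustration. Real projects with many features would typically rely on a linear algebra library instead.

```python
import math

def pca_2d(points):
    """Principal component analysis for two features.

    Returns the component scores of each point and the unit vector of the
    first principal axis (the direction of greatest variance).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Rotation angle that diagonalizes the covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Project each centered point onto the two principal axes.
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return scores, (c, s)

def covariance(pairs):
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in pairs) / (n - 1)

# Two strongly correlated features: the second is roughly twice the first.
points = [(float(x), 2.0 * x + (0.5 if x % 2 else -0.5)) for x in range(10)]

scores, first_axis = pca_2d(points)
print(round(covariance(points), 3), round(abs(covariance(scores)), 9))
```

The original features have a large covariance, while the covariance between the two component scores is zero up to rounding. Keeping only the first score reduces the data from two dimensions to one.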
We can visualize data after dimension reduction:
 Which Medicare providers are similar to each other?
 Which Medicare providers are outliers?
 Further Exploratory Analysis
 External Knowledge for Deeper Understanding of The Groups
Clustering is unsupervised learning.
 No predefined classes
 No examples demonstrating how the data should be grouped
Clustering is a method of data exploration.
 A way of looking for patterns or structure in the data that are of interest
 As a standalone tool to get insight into data distribution
 As a processing step for other algorithms
Grouping
We can:
 Group customers based on what they do
 Group customers based on where they live
 Use multiple variables and do cluster analysis with a similarity/dissimilarity measure
 Cluster customers based on their shopping behavior
 Discover distinct groups in customer data sets, and then use this knowledge to develop targeted marketing programs (e.g., fresh food lovers, junk food lovers)
Major Clustering Approaches
 Partitioning algorithms
 We can construct various partitions and then evaluate them by some criterion
 Hierarchical algorithms
 We can create a hierarchical decomposition of the set of data using some criterion
 Hard clustering: Each observation belongs to exactly one cluster
 Soft clustering: An observation can belong to more than one cluster to a certain degree (e.g., likelihood of belonging to the cluster)
How to Choose a Clustering Algorithm
Depending on your problem, we can ask these questions:
 Is the algorithm scalable?
 Does it handle different types of attributes?
 Do you have to specify the number of clusters?
 How much control do you have on the parameters and on the output?
 How does it handle noise and outliers?
 Is it sensitive to order of observations?
 Can it handle high dimensional data?
 Are the results interpretable?
K-means Clustering Summary
 Advantages
 Simple, understandable, efficient
 Items automatically assigned to clusters
 Can be used as a pre-clustering step
 Other clustering algorithms can be applied on smaller subspaces.
 Disadvantages
 Must pick number of clusters k
 All items forced into a cluster
 Too sensitive to outliers and noise
 Does not work well with non-spherical cluster shapes
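For reference, the standard k-means iteration (Lloyd's algorithm) can be sketched in a few lines. Two simplifications are assumed here: the initial centroids are just the first k points, chosen for reproducibility rather than quality, and the sample points are invented. Note that k must be supplied and that every point is forced into a cluster, as listed above.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    # Simplifying assumption: start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:   # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:            # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Two visibly separate groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because each centroid is a mean, a single extreme point can pull it far from the rest of its cluster, which is the source of the sensitivity to outliers noted above.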
Similarity vs Dissimilarity
 Depends on what we want to find or emphasize in the data
 Depends on the type of attributes in your data
 Measures the relationship between 2 observations
 Weighting the attributes might be necessary.
 Some of the clustering algorithms use distance matrices as input.
Similarity
 Cosine similarity
 Inverse of a distance measure
Dissimilarity
 Euclidean distance
 Manhattan distance
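The measures listed above can be written out directly. The sketch below uses plain Python tuples as observations. The particular inverse-distance form, 1 / (1 + distance), is one common convention and is an assumption here, not the only option.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences across attributes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def inverse_distance_similarity(a, b):
    """Turn a distance into a similarity in (0, 1]; identical points score 1."""
    return 1.0 / (1.0 + euclidean(a, b))

print(euclidean((0, 0), (3, 4)))          # 5.0
print(manhattan((0, 0), (3, 4)))          # 7
print(cosine_similarity((1, 0), (0, 1)))  # 0.0
```

Cosine similarity ignores vector length, so (1, 2) and (2, 4) count as identical in direction even though their Euclidean distance is not zero. That difference is one reason the choice of measure depends on what we want to emphasize in the data.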
Internal vs External
We can tell which clustering you need.
Internal criterion
 Good clustering will produce high-quality clusters in which:
 The intra-cluster similarity is high.
 The inter-cluster similarity is low.
External criterion
 Quality measured by its ability to discover some or all of the hidden patterns or latent classes in gold standard data
 Assess a clustering with respect to ground truth
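Both kinds of criterion can be illustrated with simple measures. The sketch below assumes one-dimensional values and invented labels. For the internal criterion it compares the mean distance within clusters to the mean distance between clusters (low distance corresponds to high similarity). For the external criterion it uses purity, one of several possible measures against ground truth.

```python
from collections import Counter
from itertools import combinations

def mean_pair_distances(values, labels):
    """Internal criterion: (mean intra-cluster, mean inter-cluster) distance."""
    intra, inter = [], []
    for (v1, l1), (v2, l2) in combinations(zip(values, labels), 2):
        (intra if l1 == l2 else inter).append(abs(v1 - v2))
    return sum(intra) / len(intra), sum(inter) / len(inter)

def purity(labels, truth):
    """External criterion: share of observations that belong to the
    most common true class of their cluster."""
    total = 0
    for cluster in set(labels):
        members = [t for label, t in zip(labels, truth) if label == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)

values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
labels = [0, 0, 0, 1, 1, 1]                           # clustering result
truth = ["low", "low", "low", "high", "high", "low"]  # hypothetical gold standard

intra, inter = mean_pair_distances(values, labels)
score = purity(labels, truth)
print(round(intra, 3), inter, round(score, 3))
```

The clustering looks good by the internal measure, since points are much closer within clusters than between them, yet the purity is below 1 because one observation disagrees with the gold standard. The two criteria answer different questions.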
Estimating K: Reference Distribution
We can use the following methods to compare a clustering solution in the training data to a clustering solution in a reference distribution:
 Aligned box criterion (ABC)
 Gap statistic
 Cubic clustering criterion (CCC)
When to Use Clustering
 Segmentation
 Customer, product, store
 Anomaly detection
 Outliers typically belong to clusters with 1 observation.
 Identify fraudulent transactions
 Prepare for other techniques
 Summarize documents as clusters and use centroids
 Predictive modeling on segments
 Logistic regression results can be improved by performing it on smaller clusters
 Missing value imputation
 Decrease dependence between attributes
 Preprocessing step
MANUFACTURING
 Determining propensity to buy
 Estimating warranty reserves
 Forecasting demand
 Optimizing process and predicting maintenance
 Orchestrating telematics
RETAIL
 Providing predictive inventory planning
 Driving recommendation engines, upsell, and cross-sell opportunities
 Automating intelligent market segmentation and targeting
HEALTHCARE AND LIFE SCIENCES
 Providing real-time alerts and patient diagnostics
 Identifying diseases and risk stratification
 Optimizing patient triage
 Driving proactive health management
 Analyzing healthcare
ENERGY, UTILITIES, & FEEDSTOCK
 Analyzing power usage
 Processing seismic data
 Optimizing energy demand and supply
 Automating intelligent grid management
 Recommending customer pricing
FINANCIAL SERVICES
 Providing risk analysis and regulation
 Evaluating credit
 Segmenting customers
 Recommending cross-sell & upsell opportunities
 Automating sales & marketing campaigns
TRAVEL & HOSPITALITY
 Analyzing traffic patterns and congestion management
 Scheduling aircraft
 Creating dynamic prices
 Automating social feedback & interaction
Dr. Zamir S. Brelvi, MD, PhD, CEO & Co-Founder, EndoLogic
“We have been very pleased with working with AMSTAT Consulting. The service was custom tailored and on time completion. The statistical report was detailed with excellent graphics. The cost of the services was affordable for a startup company such as EndoLogic! Dr. Ann is very detail oriented and likes to know the project thoroughly that is being analyzed.”
Dr. Raj Singhal, MD., Director, Pediatric Anesthesiology, Phoenix Children’s Hospital
“Dr. Ann has been instrumental in helping with our statistical needs. In addition to her professionalism, she has been prompt and thorough with all of our requests. Dr. Ann’s work is impeccable, and I would recommend her services to anyone in need of assistance with statistical methods or interpretation. We plan on using Dr. Ann for all of our future needs, and I am thrilled to have been introduced to her.”
Dr. Haritha Boppana, MD, DHA, GHS Greenville Memorial Hospital
“I am a physician and was in need of statistical analysis of research data. I found AMSTAT Consulting on online search. Dr. Ann called me and explained the process involved in data analysis. Dr. Ann was always very prompt, helpful, intelligent and took time explaining the various tests used in conducting data analysis. Thank you so much!! I look forward to working with you in the future.”
Dr. Vincent Salyers, Dean, Faculty of Nursing, MacEwan University
“I have worked closely with AMSTAT Consulting on the data analysis/results of two research projects so feel as though I am knowledgeable about their expertise. On all accounts, the company provided me with reliable statistical analysis and results that I could translate into publishable format. They are conscientious experts who provide keen insights into appropriate statistical analysis given various data sets. I highly recommend them for your statistical support needs!”
Dr. Nancy Allen, Ph.D., Curriculum and Technology Consultant
“My project required the analysis of a complex survey that required a great deal of help in organizing the data and analyses. In addition, the project required a quick turnaround. AMSTAT Consulting asked all the right questions, made realistic and helpful suggestions, and completed the project in a timely manner. They were professional and helpful throughout the process. I highly recommend them.”