Our Expertise
AMSTAT Consulting uses machine learning, an artificial intelligence technology that gives systems the ability to learn without being explicitly programmed. Our clients cite these reasons for choosing to work with us:
- Everyone on our team is a PhD-trained scientist with experience deploying machine learning tools for many different problems and industries.
- Our principals hold PhDs in statistics from leading universities, including Harvard, Stanford, and Columbia.
- We use modern data science and machine learning tools that make predictive models, classification, segmentation, and natural language processing easier and more powerful than ever before.
- Our data wrangling expertise makes quick work of any dataset to deliver finished projects fast.
- Our experience enables us to pick the right tool for the job, whether it’s a deep convolutional neural network or a linear model.
- Our application development experience allows us to tightly integrate predictive models with your app, dashboard, reporting, API, and other components of your infrastructure.
PhD in Statistics at Leading Universities including Harvard, Stanford, and Columbia
All of our principals hold PhDs in statistics from leading universities, including Harvard, Stanford, and Columbia.
Nationally Renowned Machine Learning Experts
They include nationally renowned machine learning experts.
Extensive Background in Statistics
They have extensive backgrounds in statistics and over 100 years of practical experience in quantitative methods.
Deep Knowledge of Advanced Machine Learning Algorithms
We utilize deep knowledge of advanced machine learning algorithms.
AMSTAT Consulting’s Services
Machine learning algorithms can be organized into a taxonomy of four types, based on the desired outcome of the algorithm or the type of input available for training the machine. We work with all four:
Supervised Learning
- Most machine learning is supervised learning.
- Supervised learning algorithms are “trained” using labeled examples where the desired output is known.
- It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
- Because the correct answers are known, the algorithm iteratively makes predictions on the training data and is corrected by the teacher.
- Learning stops when the algorithm achieves an acceptable level of performance.
- In supervised learning, you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.
- The goal is to approximate the mapping function so well that, given new input data (X), you can predict the output variable (Y) for that data.
- Supervised learning problems can be further grouped into regression and classification problems.
- Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
- Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
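As an illustration of learning a mapping from inputs (X) to an output (Y), here is a minimal sketch of the regression case using ordinary least squares on a single input variable. The data values and the function names `fit_line` and `predict` are hypothetical and chosen only for this example; real projects would typically rely on an established library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled training data: the outputs follow y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
new_prediction = predict(model, 5.0)  # prediction for an unseen input
```

The labeled pairs play the role of the "teacher": the fitted slope and intercept are the learned mapping, and `predict` applies it to inputs that were not in the training data.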
We can:
- Determine the input feature representation of the learned function.
- The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality; but should contain enough information to accurately predict the output.
- Determine the structure of the learned function and corresponding learning algorithm
- Complete the design
- Run the learning algorithm
- Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation.
- Evaluate the accuracy of the learned function
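The step of adjusting control parameters via cross-validation can be sketched as follows. This is a simplified illustration, not a production workflow: it assumes a one-dimensional k-nearest-neighbors regressor, and the data, the candidate values of `k`, and the function names are all hypothetical.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_mse(data, k, n_folds=4):
    """Mean squared error of k-NN, averaged over n_folds held-out folds."""
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point (starting at `fold`) is held out for validation.
        held_out = data[fold::n_folds]
        training = [p for i, p in enumerate(data) if i % n_folds != fold]
        errors = [(knn_predict(training, x, k) - y) ** 2 for x, y in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds


# Hypothetical data: a linear trend with an alternating disturbance.
data = [(float(x), 2.0 * x + (1.0 if x % 2 else -1.0)) for x in range(12)]

candidates = [1, 2, 3, 5]  # candidate values of the control parameter k
scores = {k: cross_val_mse(data, k) for k in candidates}
best_k = min(scores, key=scores.get)  # value with the lowest validation error
```

Each candidate value is scored only on data the model did not see during fitting, which is what makes the comparison between candidates meaningful.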
We can:
- Use supervised learning in applications that use historical data to predict likely future events
- Use supervised learning techniques to make best guess predictions for the unlabeled data
- Feed the data back into the supervised learning algorithm as training data and use the model to make predictions on new unseen data
- Use supervised machine learning algorithms
- Regression – Linear Regression, LASSO Regression, Logistic Regression, Ridge Regression
- Decision Trees – Gradient Boosting, Random Forests
- Naïve Bayes
- Nearest Neighbors
- Gaussian processes
- Neural Networks
- Support Vector Machine (SVM)
Unsupervised Learning
- About 10 to 20 percent of machine learning is unsupervised learning.
- Unsupervised learning is a type of machine learning where the system operates on unlabeled examples. In this case, the system is not told the “right answer.”
- The algorithm tries to find a hidden structure or manifold in unlabeled data.
- Unsupervised learning is where you only have input data (X) and no corresponding output variables.
- The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
- It is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
- Unsupervised learning problems can be further grouped into clustering and association problems.
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
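To make the association case concrete, the sketch below computes the two standard measures for a rule such as "people who buy X also tend to buy Y": support (how often the items occur together) and confidence (how often Y appears given X). The transactions and item names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Rule under evaluation: bread -> butter
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

Here bread and butter appear together in two of four baskets, and butter appears in two of the three baskets that contain bread.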
We can:
- Use unsupervised learning techniques to discover and learn the structure in the input variables
- Use unsupervised clustering – a statistical and data science technique to detect clusters and cluster structures without any a priori knowledge or training set to help the classification algorithm.
- Use approaches to unsupervised learning:
- Clustering
- k-means
- mixture models
- Hierarchical clustering
- Anomaly detection
- Neural Networks
- Hebbian Learning
- Generative Adversarial Networks
- Approaches for learning latent variable models such as
- Expectation–maximization algorithm (EM)
- Method of moments
- Blind signal separation techniques
- Principal component analysis
- Independent component analysis
- Non-negative matrix factorization
- Singular value decomposition
Semisupervised Learning
- Problems where you have a large amount of input data (X) and only some of the data is labeled (Y) are called semi-supervised learning problems.
- These problems sit in between both supervised and unsupervised learning.
- Methods
- Generative models
- Generative approaches to statistical learning first seek to estimate p(x|y), the distribution of data points belonging to each class.
- Generative models assume that the distributions take some particular form p(x|y, θ) parameterized by the vector θ. If these assumptions are incorrect, the unlabeled data may actually decrease the accuracy of the solution relative to what would have been obtained from labeled data alone. However, if the assumptions are correct, then the unlabeled data necessarily improve performance.
- Low-density separation
- It attempts to place boundaries in regions where there are few data points (labeled or unlabeled). One of the most commonly used algorithms is the transductive support vector machine, or TSVM (which, despite its name, may be used for inductive learning as well).
- Graph-based methods
- Graph-based methods for semi-supervised learning use a graph representation of the data
- Heuristic approaches
- Heuristic approaches are not intrinsically geared to learning from both unlabeled and labeled data; instead, they make use of unlabeled data within a supervised learning framework.
We can:
- Use semisupervised learning for the same applications as supervised learning. But this technique uses both labeled and unlabeled data for training – typically, a small amount of labeled data with a large amount of unlabeled data
- Use this type of learning with methods such as classification, regression, and prediction
- Clustering
- Autoencoders – Multilayer Perceptron, Restricted Boltzmann machines
- EM
- TSVM
- Prediction and Classification
- Manifold regularization
- Use semisupervised learning when the cost associated with labeling data is too high to allow for a fully labeled training process
- Interpret semisupervised learning in at least two different ways.
- We can use unlabeled data to inform a computer algorithm of the structural information of the data that is relevant to supervised learning.
- The primary goal is unsupervised learning, and labels are viewed as side information to help the algorithm find the right intrinsic data structure.
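One heuristic way to use unlabeled data within a supervised framework is self-training: fit a classifier on the labeled points, pseudo-label the unlabeled point the classifier is most confident about, and refit. The sketch below assumes a one-dimensional nearest-centroid classifier and uses distance to the nearest centroid as a stand-in for confidence; the data and all names are hypothetical, and this is only one of several possible schemes.

```python
def centroids(labeled):
    """Mean feature value per class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(cents, x):
    """Assign x to the class whose centroid is nearest."""
    return min(cents, key=lambda label: abs(cents[label] - x))


def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the most confident unlabeled point and refit."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        cents = centroids(labeled)
        # The point closest to any centroid is treated as the most confident.
        best = min(remaining, key=lambda x: min(abs(c - x) for c in cents.values()))
        labeled.append((best, classify(cents, best)))
        remaining.remove(best)
    return centroids(labeled)


# Hypothetical data: two labeled points and five unlabeled ones.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [2.0, 1.5, 8.0, 8.5, 3.0]

final_centroids = self_train(labeled, unlabeled)
label_for_4 = classify(final_centroids, 4.0)
label_for_6 = classify(final_centroids, 6.0)
```

With only two labeled points the class centroids start at 1.0 and 9.0; the unlabeled points shift them toward where the data actually lies, which is the sense in which unlabeled data informs the supervised model.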
Reinforcement Learning
AMSTAT Consulting uses reinforcement learning, in which an agent discovers for itself, through trial and error, which actions yield the greatest rewards. Reinforcement learning has three primary components:
- The agent – the learner or decision maker
- The environment – everything the agent interacts with
- Actions – what the agent can do
Algorithms for reinforcement learning include:
- Criterion of optimality
- Policy
- The agent’s action selection is modeled as a map called policy
- The policy map gives the probability of taking action “a” when in state “s.”
- State-value function
- Brute Force
- The brute force approach entails two steps:
- For each possible policy, sample returns while following it.
- Choose the policy with the largest expected return
- Value Function
- Value function approaches attempt to find a policy that maximizes the return by maintaining a set of estimates of expected returns for some policy (usually either the “current” [on-policy] or the optimal [off-policy] one).
- These methods rely on the theory of MDPs, where optimality is defined in a stronger sense: a policy is called optimal if it achieves the best expected return from any initial state (i.e., initial distributions play no role in this definition). An optimal policy can always be found among stationary policies.
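A value function approach can be sketched with value iteration, one classical method for MDPs whose transitions and rewards are known. The three-state chain, the reward of 1 for reaching the terminal state, and the discount factor of 0.9 are all assumptions made for this illustration.

```python
GAMMA = 0.9                 # discount factor (assumed)
STATES = [0, 1, 2]          # state 2 is terminal
ACTIONS = ["left", "right"]


def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    if state == 2:
        return 2, 0.0
    next_state = state + 1 if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward


def value_iteration(tol=1e-9):
    """Apply Bellman optimality updates until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == 2:
                continue  # the terminal state keeps value 0
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[ns] for ns, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best one-step lookahead."""
    policy = {}
    for s in STATES:
        if s == 2:
            continue
        policy[s] = max(
            ACTIONS,
            key=lambda a: step(s, a)[1] + GAMMA * values[step(s, a)[0]],
        )
    return policy


values = value_iteration()
policy = greedy_policy(values)
```

The resulting policy is optimal from every starting state, not just one, which matches the stronger notion of optimality described above.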
We can:
- Use Markov decision processes (MDPs) in reinforcement learning. MDPs assume the state of the environment is perfectly observed by the agent
- Use a more general model called partially observable MDPs (or POMDPs) to find the policy that resolves the state uncertainty while maximizing the long-term reward when this is not the case
- Use reinforcement learning so that an agent discovers, through trial and error, which actions yield the greatest rewards
Generalization, Evaluation and Model Selection
We can:
- Use all types of machine learning that develop models that enable the learning machine to perform accurately on new, unseen examples or tasks
- Improve these models iteratively as the machine learns from more data
- Tune the fit so that the model neither overfits nor underfits the data
- Look at data of any complexity and size and build a model that scales well to that data
- Look at all the data or a subset to create an accurate model
- One of the more powerful machine learning algorithms is a random forest. A random forest takes individual decision trees and combines them. When a new input is entered into the system, it runs down all of the trees. The result is either an average or a weighted average of all the terminal nodes that are reached.
- Validate a model to determine whether it can make effective predictions
- Use a training data set to develop the model
- Use known out-of-sample data to test it
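The idea of combining individual trees and then testing on out-of-sample data can be sketched as below. This is a deliberately simplified stand-in for a random forest: each "tree" is a single-split regression stump fitted to a bootstrap sample, and the forest prediction is the plain average of the stumps. A full random forest also grows deeper trees and samples features at random. The data, the seed, and all names are hypothetical.

```python
import random


def fit_stump(points):
    """Regression stump: the single threshold split with the lowest squared error."""
    xs = sorted({x for x, _ in points})
    overall_mean = sum(y for _, y in points) / len(points)
    best = (float("inf"), None, overall_mean, overall_mean)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    """Return the mean of the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    if threshold is None or x <= threshold:
        return left_mean
    return right_mean


def fit_forest(points, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points]) for _ in range(n_trees)]


def forest_predict(forest, x):
    """Average the predictions of all stumps."""
    return sum(stump_predict(s, x) for s in forest) / len(forest)


def mean_squared_error(forest, points):
    return sum((forest_predict(forest, x) - y) ** 2 for x, y in points) / len(points)


# Hypothetical training data (a step function) and held-out test data.
train = [(float(x), 0.0 if x < 5 else 10.0) for x in range(10)]
held_out = [(2.5, 0.0), (7.5, 10.0)]

forest = fit_forest(train)
held_out_mse = mean_squared_error(forest, held_out)
```

The held-out points are never used for fitting, so `held_out_mse` estimates how the model would perform on new, unseen inputs.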
Data Analytics
Today, the vast majority of enterprises have needs for descriptive analytics, which are necessary for effective management, but not sufficient to accelerate business performance. In order to scale to a higher level of responsiveness, enterprise organizations need to move beyond descriptive analytics and climb up the intelligence capability pyramid. We can use machine learning to help you with:
- Descriptive Analytics
- Diagnostic Analytics
- Predictive Analytics
- Predictive Modeling
- Automated Modeling
- Geospatial Analysis
- Text Analytics
- Social Network Analysis
- Entity Analytics
- Prescriptive Analytics
- Cognitive Analytics
- Operational Analytics
- Supply Chain Analytics
- Complexity Management
- End-to-End Optimization
- Supply Chain Risk Management
- Advanced Supplier Management
We can:
- Build a portfolio
- Deliver solutions
1. Build a Portfolio
We can demonstrate our ability to deliver by building a portfolio of completed machine learning projects.
Our Portfolio
We can:
- Pick a theme. This is the type of projects that we want to work on.
- Complete projects. We can apply our process to the dataset in order to deliver a result.
- Write up our findings. We can present the results of each project in a written report.
2. Deliver Solutions
We can deliver solutions.
ML (Machine Learning) is at the heart of Data Science. It powers predictive technology. We can apply it to serve the following business objectives:
- Reduce user/customer attrition with churn prediction
- Acquire new customers through lead scoring and marketing campaigns optimization
- Cross-sell products with targeted campaigns and personalized recommendations
- Optimize products and pricing by finding patterns in commerce data
- Increase customer engagement by predicting their needs and interests
- Improve operations by predicting demand (or improve resource management by predicting usage)
- Save time by automating tasks
- Make your team more productive, with predictive enterprise apps
We emphasize the use of machine learning to create predictive models:
- Customer Satisfaction Prediction
- Drug Selection for Treating Heart Problems
- Predicting Financial Performance of a Company
- Forecast Profits for Clothing Sales
We can:
- Drive data scientist productivity. We focus on speeding up analysis by using big data platforms such as Apache Spark, automating portions of the data science life cycle, and improving the usability of the data science workbench.
- Include multiple model deployment methods. Production models must be embedded in applications and business processes to provide business value. AMSTAT Consulting can deploy models in multiple ways, including as code embedded directly into applications, exposed as a service callable by applications, or injected into other platforms such as databases. Some of the more mature PAML (predictive analytics and machine learning) vendors include or are integrated with decision management platforms that allow AD&D pros and business users to use a visual metaphor or express decision logic as a set of business rules that can also include model
- Provide sophisticated model management. The very nature of predictive models is that they may lose accuracy over time. More mature PAML solutions include features to monitor the ongoing efficacy of models in production by comparing model output with established key performance indicators and testing new models using a champion/challenger or A/B testing scheme.
- Allow polyglot programming. AMSTAT Consulting uses more than one programming language in order to draw on open-source add-on libraries, such as the CRAN packages for R and scikit-learn for Python.
- Expand to Apache Spark. Apache Spark is an open-source, primarily in-memory cluster computing platform that also includes Spark ML, a set of machine learning libraries that data scientists are increasingly interested in using. In addition to Spark ML, other machine learning libraries such as H2O.ai’s Sparkling Water and IBM’s SystemML run on Spark.
- Build the foundation for AI and invest in deep learning. Machine learning models are a key building block of AI applications. We use any of the PAML solutions to build models for use in AI applications. Deep learning is a branch of machine learning that we use to build models based on artificial neural networks. This method is particularly good at creating models for image recognition (including facial recognition), but it is applicable to more traditional use cases as well. We incorporate numerous open-source libraries, such as Caffe, MXNet, and TensorFlow, into PAML solutions, or we build our own deep-learning algorithms into the platform.
We can use Principal Component Analysis in countless machine learning applications:
- Fraud Detection
- Word and Character Recognition
- Speech Recognition
- Email Spam Detection
- Texture Classification
- Face Recognition
Principal Component Analysis converts a set of possibly correlated features into a set of linearly uncorrelated features called principal components.
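For two features, this conversion can be worked out by hand: center the data, compute the 2x2 covariance matrix, and rotate the axes by the angle that makes the covariance between the new coordinates zero. The sketch below does exactly that; the data points and function names are hypothetical, and higher-dimensional data would normally be handled with a linear algebra library instead.

```python
import math


def pca_2d(points):
    """Rotate centered 2-D data onto its principal axes; returns the component scores."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the first principal axis for a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # First coordinate: along the principal axis; second: perpendicular to it.
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]


def covariance(us, vs):
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (n - 1)


# Hypothetical data with two strongly correlated features.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]

scores = pca_2d(points)
pc1 = [u for u, _ in scores]
pc2 = [v for _, v in scores]
```

After the rotation, `covariance(pc1, pc2)` is zero up to rounding error, and the first component carries most of the variance, which is why the second can often be dropped to reduce dimensionality.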
We can reduce the dimensionality:
- Reduce the memory and disk space needed to store the data
- Reveal hidden, simplified structures
- Solve issues of multicollinearity
- Visualize higher-dimensional data
- Detect outliers
We can visualize data after dimension reduction:
- Which Medicare providers are similar to each other?
- Which Medicare providers are outliers?
- Further Exploratory Analysis
- External Knowledge for Deeper Understanding of The Groups
Clustering is unsupervised learning.
- No predefined classes
- No examples demonstrating how the data should be grouped
Clustering is a method of data exploration.
- A way of looking for patterns or structure in the data that are of interest
- As a stand-alone tool to get insight into data distribution
- As a processing step for other algorithms
Grouping
We can:
- Group customers based on what they do
- Group customers based on where they live
- Use multiple variables and do cluster analysis with a similarity/dissimilarity measure
- Cluster customers based on their shopping behavior
- Discover distinct groups in your customer data sets, and then use this knowledge to develop targeted marketing programs (e.g., fresh food lovers, junk food lovers)
Major Clustering Approaches
- Partitioning algorithm
- We can construct various partitions and then evaluate them by some criterion
- Hierarchical algorithms
- We can create a hierarchical decomposition of the set of data using some criterion
- Hard clustering: Each observation belongs to exactly one cluster
- Soft clustering: An observation can belong to more than one cluster to a certain degree (e.g., likelihood of belonging to the cluster)
How to Choose a Clustering Algorithm
Depending on your problem, we can ask these questions:
- Is the algorithm scalable?
- Does it handle different types of attributes?
- Do you have to specify the number of clusters?
- How much control do you have on the parameters and on the output?
- How does it handle noise and outliers?
- Is it sensitive to order of observations?
- Can it handle high dimensional data?
- Are the results interpretable?
K-means Clustering Summary
- Advantages
- Simple, understandable, efficient
- Items automatically assigned to clusters
- Can be used as a pre-clustering step
- Other clustering algorithms can be applied on smaller sub-spaces.
- Disadvantages
- Must pick number of clusters k
- All items forced into a cluster
- Too sensitive to outliers and noise
- Does not work well with non-spherical cluster shapes
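The points in this summary can be seen in a minimal sketch of k-means (Lloyd's algorithm) on two-dimensional data. The number of clusters is fixed by the caller through the initial centroids, and every point is assigned to some cluster. The data and the choice of starting centroids are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids until stable."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, assignments = kmeans(points, [points[0], points[3]])
```

Because each centroid is a mean, a single extreme point can pull it far from the rest of its cluster, which is the sensitivity to outliers noted above.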
Similarity vs Dissimilarity
- Depends on what we want to find or emphasize in the data
- Depends on the type of attributes in your data
- Measures the relationship between 2 observations
- Weighting the attributes might be necessary.
- Some of the clustering algorithms use distance matrices as input.
Similarity
- Cosine similarity
- Inverse of a distance measure
Dissimilarity
- Euclidean distance
- Manhattan distance
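The measures listed above can be written in a few lines each. This sketch assumes observations are numeric vectors of equal length; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute differences across attributes."""
    return sum(abs(x - y) for x, y in zip(a, b))


origin, corner = (0.0, 0.0), (3.0, 4.0)
straight = euclidean_distance(origin, corner)   # 5.0
city_block = manhattan_distance(origin, corner)  # 7.0
same_direction = cosine_similarity((1.0, 2.0), (2.0, 4.0))
orthogonal = cosine_similarity((1.0, 0.0), (0.0, 1.0))
```

Cosine similarity ignores vector length and compares direction only, while the two distances depend on the scale of each attribute, which is one reason weighting or standardizing attributes may be necessary.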
Internal vs External
We can tell which clustering you need.
Internal criterion
- Good clustering will produce high-quality clusters in which:
- The intra-cluster similarity is high.
- The inter-cluster similarity is low.
External criterion
- Quality measured by its ability to discover some or all of the hidden patterns or latent classes in gold standard data
- Assess a clustering with respect to ground truth
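Both kinds of criterion can be computed directly. In the sketch below, the internal check compares the average distance between points in the same cluster with the average distance between points in different clusters, and the external check uses purity against ground-truth labels. These are only two of many possible measures, chosen here for simplicity; the data and labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mean_pairwise_distance(points, labels, same_cluster):
    """Average distance over pairs that are (or are not) in the same cluster."""
    dists = [
        math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if (labels[i] == labels[j]) == same_cluster
    ]
    return sum(dists) / len(dists)


def purity(labels, truth):
    """Fraction of points that belong to the majority true class of their cluster."""
    total = 0
    for cluster in set(labels):
        members = [t for label, t in zip(labels, truth) if label == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)


# Hypothetical clustering result and gold-standard classes.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "a", "b", "b", "b"]

# Internal criterion: points are close within clusters and far apart across them.
intra = mean_pairwise_distance(points, labels, same_cluster=True)
inter = mean_pairwise_distance(points, labels, same_cluster=False)

# External criterion: agreement with the ground truth.
cluster_purity = purity(labels, truth)
```

A small `intra` relative to `inter` corresponds to high intra-cluster similarity and low inter-cluster similarity; purity can only be computed when gold-standard labels exist.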
Estimating K: Reference Distribution
We can use the following methods to compare a clustering solution in the training data to a clustering solution in a reference distribution:
- Aligned box criterion (ABC)
- Gap statistic
- Cubic clustering criterion (CCC)
When to Use Clustering
- Segmentation
- Customer, product, store
- Anomaly detection
- Outliers typically belong to clusters with 1 observation.
- Identify fraudulent transactions
- Prepare for other techniques
- Summarize documents as clusters and use the centroids
- Predictive modeling on segments
- Logistic regression results can be improved by performing it on smaller clusters
- Missing value imputation
- Decrease dependence between attributes
- Pre-processing step
MANUFACTURING
- Determining propensity to buy
- Estimating warranty reserves
- Forecasting demand
- Optimizing process and predicting maintenance
- Orchestrating telematics
RETAIL
- Providing predictive inventory planning
- Driving recommendation engines, upsell, and cross-sell opportunities
- Automating intelligent market segmentation and targeting
HEALTHCARE AND LIFE SCIENCES
- Providing real-time alerts and patient diagnostics
- Identifying diseases and risk stratification
- Optimizing patient triage
- Driving proactive health management
- Analyzing healthcare
ENERGY, UTILITIES, & FEEDSTOCK
- Analyzing power usage
- Processing seismic data
- Optimizing energy demand and supply
- Automating intelligent grid management
- Recommending customer pricing
FINANCIAL SERVICES
- Providing risk analysis and regulation
- Evaluating credit
- Segmenting customers
- Recommending cross-sell & upsell opportunities
- Automating sales & marketing campaigns
TRAVEL & HOSPITALITY
- Analyzing traffic patterns and congestion management
- Scheduling aircraft
- Creating dynamic prices
- Automating social feedback & interaction
Dr. Zamir S. Brelvi MD, PhD., CEO & Co-Founder, EndoLogic
“We have been very pleased with working with AMSTAT Consulting. The service was custom tailored and on time completion. The statistical report was detailed with excellent graphics. The cost of the services was affordable for a start-up company such as EndoLogic! Dr. Ann is very detail oriented and likes to know the project thoroughly that is being analyzed.”
Dr. Raj Singhal, MD., Director, Pediatric Anesthesiology, Phoenix Children’s Hospital
“Dr. Ann has been instrumental in helping with our statistical needs. In addition to her professionalism, she has been prompt and thorough with all of our requests. Dr. Ann’s work is impeccable, and I would recommend her services to anyone in need of assistance with statistical methods or interpretation. We plan on using Dr. Ann for all of our future needs, and I am thrilled to have been introduced to her.”
Dr. Haritha Boppana, MD, DHA, GHS Greenville Memorial Hospital
“I am a physician and was in need of statistical analysis of research data. I found AMSTAT Consulting on online search. Dr. Ann called me and explained the process involved in data analysis. Dr. Ann was always very prompt, helpful, intelligent and took time explaining the various tests used in conducting data analysis. Thank you so much!! I look forward to working with you in the future.”
Dr. Vincent Salyers, Dean, Faculty of Nursing, MacEwan University
“I have worked closely with AMSTAT Consulting on the data analysis/results of two research projects so feel as though I am knowledgeable about their expertise. On all accounts, the company provided me with reliable statistical analysis and results that I could translate into publishable format. They are conscientious experts who provide keen insights into appropriate statistical analysis given various data sets. I highly recommend them for your statistical support needs!”
Dr. Nancy Allen, Ph.D., Curriculum and Technology Consultant
“My project required the analysis of a complex survey that required a great deal of help in organizing the data and analyses. In addition, the project required a quick turn-around. AMSTAT Consulting asked all the right questions, made realistic and helpful suggestions, and completed the project in a timely manner. They were professional and helpful throughout the process. I highly recommend them.”