Issue #6: MLOps Resources. Feature Stores. Interpretability. Predictive Uncertainty.
Welcome to the 6th issue of the ML Ops Newsletter. In this issue, we share MLOps resources (online courses, lectures, communities, books) that we’ve come across recently, discuss what we learned about Feature Stores from Tecton and about model interpretability from The Gradient, and finally dive into some research about dealing with predictive uncertainty.
We are always excited to hear from you, our readers. Thank you for subscribing, and if you find this newsletter interesting, forward this to your friends and support this project ❤️
MLOps Resources and Community
We believe the present moment is just Day 1 for MLOps, ML transparency and explainability. Over the next 3-5 years, we expect a lot of great products to be built in this area, and the community of ML practitioners working in this area to grow. With that in mind, we’d like to share some resources that we’ve come across recently that we found especially helpful:
Awesome MLOps: This GitHub repo is a comprehensive set of resources (research papers, Slack communities, helpful open-source tools, newsletters) for everything MLOps. If you are interested in learning about the topic, or even just keeping up to date with what’s new, we highly recommend starting here. The content in this repo is curated by Larysa Visengeriyeva.
A list of online courses & lectures related to machine learning in production:
Stanford MLSys Seminar - A lecture series focused on the challenges commonly observed in industry when deploying machine learning systems in the real world, and the role that academic research can play in solving some of these challenges
MLOps Fundamentals with GCP: A course focused on sharing MLOps tools and best practices for deploying, evaluating, monitoring and operating production ML systems on Google Cloud.
Applied ML by MadeWithML: We have shared MadeWithML in a previous newsletter, and here we wanted to highlight a new course they are currently starting. The course is focused on all aspects of “ML in production” (data transformation, modeling & training, deployment, monitoring). One thing we especially like about it is the hands-on nature of the course, using entirely open-source tools.
Trustworthy ML Seminar series: A seminar lecture series focused on aspects of trustworthy machine learning, which the facilitators describe as machine learning that is explainable, fair, privacy-preserving, causal, and robust. You can follow updates about the seminar at @trustworthy_ml
MLOps.community: A community for ML practitioners to come together and share their experiences and best practices around operationalizing machine learning in production. If you’re interested, you can check out the Slack community or follow @mlopscommunity on Twitter.
Introducing MLOps: A book from O’Reilly which we can’t wait to sink our teeth into!
This is a great introduction to Feature Stores, written by the team from Tecton.ai, which is one of the companies providing a Feature Store solution today.
What is a Feature Store?
A feature store is an ML-specific data system that:
Runs data pipelines that transform raw data into feature values
Stores and manages the feature data itself, and
Serves feature data consistently for training and inference purposes
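The three responsibilities above can be sketched in a few lines. This is a hypothetical, minimal illustration (all names are ours, not Tecton’s API): the key point is that a single `transform` function backs both the offline path used for training data and the online key-value path used at inference time, which is how a feature store keeps the two consistent.

```python
from datetime import datetime

def transform(raw_event: dict) -> dict:
    """Turn a raw event into feature values (responsibility 1)."""
    return {
        "user_id": raw_event["user_id"],
        "amount_digits": len(str(int(raw_event["amount"]))),  # crude magnitude feature
        "hour_of_day": datetime.fromisoformat(raw_event["ts"]).hour,
    }

class FeatureStore:
    def __init__(self):
        self.offline = []   # append-only log, used to build training sets
        self.online = {}    # key-value store for low-latency serving

    def ingest(self, raw_event: dict) -> None:
        features = transform(raw_event)               # run the pipeline
        self.offline.append(features)                 # store for training (responsibility 2)
        self.online[features["user_id"]] = features   # and for serving

    def get_training_data(self) -> list:
        return list(self.offline)

    def get_online_features(self, user_id: str) -> dict:
        return self.online[user_id]                   # serve at inference time (responsibility 3)

store = FeatureStore()
store.ingest({"user_id": "u1", "amount": 1234.5, "ts": "2021-01-15T09:30:00"})
print(store.get_online_features("u1"))  # same feature values a training set would see
```

Because both paths share one transformation, the classic training/serving skew (features computed one way in a batch job and a subtly different way in the serving code) cannot arise in this sketch.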
They state that a Feature Store acts as a central hub for all ML-related data and metadata across the entire ML lifecycle. If you want to explore the data, train models, create new features, serve models in production or monitor feature data, you should be interfacing with your Feature Store.
What are the components of a Feature Store?
The blog talks about five main components of a Feature Store:
Serving: Providing an easy and consistent way to serve all kinds of feature data at model serving time.
Storage: Feature data needs to be persisted so it can be served later (either in data warehouses like Snowflake or key-value stores like Cassandra).
Transformations: Incoming data needs to be transformed and processed so it can be stored for serving models.
Monitoring: Feature stores can provide metrics on features as data changes so that data drifts can be caught.
Registry: A centralized store for all feature definitions and metadata, which an entire team can use to search for and catalogue features, and to which new features can be published.
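To make the Monitoring component above concrete, here is a toy sketch (our own, hypothetical code, not any vendor’s API) of one common pattern: record a feature’s distribution at training time as a baseline, then flag drift when the live mean moves more than a few baseline standard deviations away.

```python
import statistics

def drift_alert(baseline: list, live: list, threshold: float = 3.0) -> bool:
    """Flag drift when the live mean moves > `threshold` baseline stdevs."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(live) - mu) > threshold * sigma

# Baseline captured when the model was trained
baseline = [10.0, 11.0, 9.5, 10.5, 10.0, 9.8, 10.2]

print(drift_alert(baseline, [10.1, 9.9, 10.3]))   # stable feed -> False
print(drift_alert(baseline, [15.0, 16.2, 15.5]))  # shifted feed -> True
```

Real feature stores use richer statistics (quantiles, population stability index, per-slice checks), but the shape is the same: the store already sees every feature value, so it is a natural place to compute these metrics.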
Feature Stores are an exciting development in MLOps. They are one of the clearest examples of bringing an important best practice that is typically only seen at established companies to smaller and less mature (from an ML standpoint) organizations. The Tecton founding team was also behind the Michelangelo platform at Uber.
That being said, it is much more common to see Feature Stores being built in-house by ML teams - watch this talk by Spencer Barton on how ML infra develops at an early-stage company. Similarly, read this post by the Doordash team on how they built and scaled their Feature Store. The reasons for this are two-fold:
Often, the specific needs develop slowly and it’s simpler to spend a week building an internal tool. If an external vendor offers too many “features” in their Feature Store offering, it might feel too heavy-handed.
A feature store needs to be very tightly integrated with the existing data infrastructure, so it might be an operational challenge to integrate with a vendor.
We expect that a Feature Store product built to be easy to integrate and easy to manage, one that lets users grow their usage gradually as their needs develop, will gain traction faster.
This article is a good overview of machine learning interpretability, which itself summarizes a paper by Zachary Lipton.
At present, interpretability has no formal technical meaning...The demand for interpretability arises when there is a mismatch between the formal objectives of supervised learning (test set predictive performance) and the real world costs in a deployment setting.
Why is Interpretability important?
Trust: interpretability in machine learning is useful because it can aid in trust. As humans, we may be reluctant to rely on machine learning models for certain critical tasks, e.g., medical diagnosis, unless we know "how they work."
Safety: When there are the inevitable shifts in distribution, interpretability approaches can help diagnose errors and provide insights into potential remedies.
Contestability: As we delegate more decision-making to ML models, it becomes important for people to be able to appeal these decisions. Black-box models provide no such recourse because they don't decompose the decision into anything contestable.
Properties of Interpretable Models
There are two categories of desirable properties. First, those that help with transparency:
Simulatability: Can a human walk through each of the model’s steps?
Decomposability: Does each part of a model - input, parameter, calculation - have an intuitive explanation?
Algorithmic Transparency: Does the algorithm itself have any guarantees that make it easier to understand?
Next, properties that provide post-hoc interpretability:
Text Explanations: Can the model explain its decision in natural language, after the fact?
Visualization/Local Explanations: Can the model identify what is/was important to its decision-making?
Explanation by Example: Can the model show what else in the training data it thinks are related to this input/output?
Interpretability is going to be important in the future as ML applications take off in domains such as healthcare, finance, etc. Most techniques still feel very research-y, and the trade-offs with ML model performance (accuracy, precision, recall) aren’t clearly established yet. In the meantime, using techniques like LIME for NLP models or Saliency Maps for CV models might be enough to get started. For some gorgeous-looking feature visualization research, head over to the work from Olah et al. on Distill.pub.
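To give a feel for the idea behind LIME-style local explanations (this is a toy sketch, not the actual LIME library): perturb the input and watch how the black-box model’s score moves. Here we drop one word at a time from a piece of text; the `score` function below is a hypothetical stand-in for any text classifier.

```python
def score(text: str) -> float:
    """A stand-in black-box sentiment scorer (not a real model)."""
    positive = {"great", "love", "excellent"}
    negative = {"awful", "hate", "broken"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def explain(text: str) -> list:
    """Rank words by how much removing each one changes the model's score."""
    words = text.split()
    base = score(text)
    importance = []
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        importance.append((w, base - score(perturbed)))
    return sorted(importance, key=lambda t: abs(t[1]), reverse=True)

print(explain("the battery is great but the screen is broken"))
# "great" and "broken" surface as the most influential words
```

The real LIME algorithm samples many perturbations and fits a small linear model to them, but the principle is the same: explain one prediction locally by probing how the black box responds to nearby inputs.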
Usually, we think of ML models as producing single-point estimates, given an input: Y = f(X), where “f” is the learned function (i.e. model parameters). Predictive uncertainty is a measure of how “confident” a model is about its predictions -- i.e. a probability distribution over predictions given the input.
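One simple, widely used way to get such a distribution is an ensemble: train several models and use the spread of their predictions as the confidence signal. Below is a minimal, hypothetical sketch (our own toy data and code), where tiny linear models fit by gradient descent on different leave-one-out subsets of the data stand in for ensemble members.

```python
import random
import statistics

def fit_member(xs, ys, leave_out, seed, epochs=500, lr=0.01):
    """Fit y ~ a*x + b by gradient descent, leaving out one training point."""
    sub = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i != leave_out]
    rng = random.Random(seed)
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialization
    for _ in range(epochs):
        for x, y in sub:
            err = (a * x + b) - y
            a -= lr * err * x
            b -= lr * err
    return lambda x: a * x + b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]  # roughly y = x, with a little noise
ensemble = [fit_member(xs, ys, leave_out=i, seed=i) for i in range(len(xs))]

x_new = 2.5
preds = [model(x_new) for model in ensemble]
mean = statistics.mean(preds)
spread = statistics.stdev(preds)  # the predictive uncertainty estimate
print(f"prediction {mean:.2f} +/- {spread:.2f}")
```

For inputs far from the training data, the members disagree more and the spread grows, which is exactly the signal a downstream user (or product) can act on.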
Research presented in this paper
ML-assisted decision making is becoming increasingly common as a way to improve human decision making using AI (e.g., pricing houses for sale, making lending decisions, or detecting fraud and abuse online).
This paper by researchers at Harvard and MIT studies the impact of communicating predictive uncertainty (in addition to the model’s prediction) to humans as they’re trying to make a prediction. They conduct a user study where each participant is asked to predict monthly rental prices of apartments in Boston, MA using a model’s output as helpful context. Within this setting, the study assesses how people respond to different types of predictive uncertainty (i.e. posterior distributions with different shapes and variance). Additionally, the study assesses this response among different populations of users, with some users having domain expertise for the problem and some having general machine learning expertise. The study found that communicating predictive uncertainty can help increase agreement of a user’s final decision with model predictions (compared to when no uncertainty estimates are shown).
As far as we know, this paper is one of the first attempts to study the impact of predictive uncertainty on ML-assisted decision making. We imagine that research like this has the potential to become an integral part of a product manager’s toolbox - how do you design your ML products to make it easier for users to use them for decision making? From a user’s standpoint, we might all benefit from thinking harder about how we can use model outputs and uncertainties to make better decisions.
From our Readers
Rohith Desikan shares a nifty library called kedro from the team at QuantumBlack Labs. The library combines a set of features to help bring software engineering practices to ML pipelines. We would love to hear if you have considered or worked with this library!
Underspecification in ML
Ali Chaudhry shares a wonderful article from the MIT Technology Review that discusses a problem called “underspecification” in ML training. This problem is seen when the same model is trained multiple times on the same data but leads to very different performance in production. The authors claim that it might be a more important problem than data shifts - we plan to dive into this paper in a later issue.
Thanks for making it to the end of the newsletter! This has been curated by Nihit Desai and Rishabh Bhargava. We would love to hear your thoughts and feedback. If you have suggestions for what we should be covering in this newsletter, tweet us @mlopsroundup (open to DMs as well) or email us at firstname.lastname@example.org.