Welcome to the 21st issue of the MLOps newsletter. In this issue, we cover some tips from Google Cloud on choosing the right MLOps capabilities, share what continuous delivery for ML systems looks like, deep dive into the performance of language models over time, discuss the implications of AWS terms of service, and much more.
A comment on the "Pitfalls" paper on large, static model degradation...
You mention that retraining frequently would solve the problem, but that it's too expensive. You are half right. The expense is not inherent in the problem - it's an issue with the chosen ML technology (deep learning). At Textician, we use very fast regression algorithms. Our customers can retrain entire models overnight on vanilla hardware, with the side benefit that convergence is built in.
This solves problems of both time and space. As the paper discusses, models degrade over time, but in our application we also face the issue that each installation must deal with different jargon and/or documentation customs. (We work with medical records text, which is highly variable from doctor to doctor and from facility to facility.) Rapid (re)training is the solution here and a competitive advantage for us versus competitors that use one-size-fits-all static ML models or GOFAI rules-based systems.*
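To make the speed argument concrete, here is a minimal sketch of the kind of linear text pipeline that refits from scratch in seconds on CPU. This is not Textician's actual system; it assumes scikit-learn and uses a hypothetical stand-in for one site's local corpus, but it illustrates why a nightly full retrain on commodity hardware is practical with regression-style models.

```python
# Illustrative sketch only (assumes scikit-learn is installed).
# A TF-IDF + logistic regression pipeline is convex, so a full
# from-scratch refit converges quickly on vanilla hardware.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for one installation's medical-record
# snippets; in practice each site retrains on its own corpus,
# capturing its local jargon and documentation customs.
docs = [
    "pt presents with acute MI, troponin elevated",
    "patient denies chest pain, EKG unremarkable",
    "acute myocardial infarction, admit to CCU",
    "routine follow-up, no cardiac complaints",
]
labels = [1, 0, 1, 0]  # 1 = cardiac event coded, 0 = not

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)  # full retrain, not fine-tuning
```

Because the whole model is rebuilt each night, yesterday's jargon drift is simply absorbed at the next refit, with no fine-tuning schedule or drift-detection machinery required.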
So - yes - large, static deep-learning models degrade over time and space, but the solution is not more data or regular retraining of those expensive models. It's picking a technology that makes retraining cheap!
* Rule-based systems in this application have 500K+ rules! It's a nontrivial task to tune them over time, let alone space.