Welcome to the 21st issue of the MLOps newsletter. In this issue, we cover some tips from Google Cloud on choosing the right MLOps capabilities, share what continuous delivery for ML systems looks like, deep dive into the performance of language models over time, discuss the implications of AWS terms of service, and much more.
A comment on the "Pitfalls" paper on large, static model degradation...
You mention that retraining frequently would solve the problem, but it's too expensive. You are half right. The reason it's too expensive is not inherent in the problem - it's an issue with the chosen ML technology (deep learning). At Textician, we use very fast regression algorithms. Our customers can retrain entire models overnight on vanilla hardware, with a side benefit that convergence is built in.
This solves problems of both time and space. As the paper discusses, models degrade over time, but in our application, we often face the issue that each installation must deal with different jargon and/or documentation customs. (We work with medical records text, which is highly variable doctor to doctor and facility to facility.) Rapid (re)training is the solution here and a competitive advantage for us versus competitors that use one-size-fits-all static ML models or GOFAI rules-based systems.*
So - yes - large, static deep-learning models degrade over time and space, but the solution is not more data or regular retraining of those large models. It's picking a more appropriate technology!
* Rule-based systems in this application have 500K+ rules! It's a nontrivial task to tune them over time, let alone space.
Thanks for the comment, Dan.
In the past, I worked on a problem where we had to rapidly train hundreds of text classifiers on a very specific kind of data. A rules-based approach + very simple classifiers (think logistic regression) worked perfectly well. We were able to churn out many classifiers at a fairly rapid clip, although looking back on it, I think the effort expended in training each additional classifier was almost the same as for the first one.
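To make the "very simple classifiers" point concrete, here is a minimal sketch of that kind of per-task pipeline, assuming scikit-learn's TfidfVectorizer and LogisticRegression; the task names and toy data are hypothetical stand-ins, not the original project's setup.

```python
# A minimal sketch of churning out many small text classifiers.
# Task names and toy data are hypothetical, not the original project's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_classifier(texts, labels):
    """Fit one TF-IDF + logistic regression classifier."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(texts, labels)
    return clf

# One classifier per task; each task is just a (texts, labels) pair,
# e.g. loaded from a per-task file.
tasks = {
    "is_urgent": (["reply asap please", "no rush on this one"], [1, 0]),
    "is_complaint": (["this is unacceptable", "thanks, all good"], [1, 0]),
}
classifiers = {name: train_classifier(x, y) for name, (x, y) in tasks.items()}
print(classifiers["is_urgent"].predict(["please respond asap"]))
```

Because every classifier shares the same tiny pipeline, the marginal cost of adding one more is mostly in collecting and labeling its data rather than in training.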
In a different project, I was working on a much more "difficult" problem -- that of summarizing text. Now, depending on the dataset, summarization can also be done with some rules. However, it's much harder to get good results that way. To me, large language models are the perfect technology for that use case. These models already know a lot about the syntax and semantics of a language, and fine-tuning on your specific dataset can lead to reasonably good results pretty quickly.
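For the summarization side, here is a rough sketch of what fine-tuning a small pretrained seq2seq model can look like, using Hugging Face transformers with t5-small; the documents, summaries, and hyperparameters are placeholder assumptions, not a specific project's recipe.

```python
# A rough sketch of fine-tuning a small pretrained seq2seq model for
# summarization; the documents and summaries below are placeholders
# for a real dataset, and the hyperparameters are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy stand-ins for (document, reference summary) pairs.
documents = ["summarize: The patient was admitted with chest pain ..."]
summaries = ["Admission for chest pain."]

model.train()
for epoch in range(3):
    inputs = tokenizer(documents, return_tensors="pt",
                       padding=True, truncation=True)
    targets = tokenizer(summaries, return_tensors="pt",
                        padding=True, truncation=True)
    # The model returns a cross-entropy loss when labels are provided.
    # (In a real run, pad token ids in the labels would be masked with -100.)
    outputs = model(input_ids=inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    labels=targets.input_ids)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Generate a summary for a new document with the fine-tuned model.
model.eval()
new_doc = tokenizer("summarize: Follow-up visit, symptoms resolved ...",
                    return_tensors="pt")
summary_ids = model.generate(new_doc.input_ids, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```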
I definitely hear you on choosing the right technology for the right problem though! Your use case of working with medical records text reminds me of this post we linked in the latest issue: https://www.oreilly.com/content/lessons-learned-building-natural-language-processing-systems-in-health-care/
Thanks once again for the note!
Interesting article. A whole lot of the issues he discusses are solved by our implementations:
1) We run in a container on premises. Data need not be de-identified because it stays behind the firewall.
2) We run on vanilla hardware, easily training a model for thousands of ICD-10 codes in a day or two. So, as discussed above, regular retraining obviates model degradation over time and increases robustness (see the sketch after this list).
3) We are immune to the vagaries of jargon. As mentioned above, these vary even among doctors at the same facility, let alone across facilities. Further, medical coding is somewhat of an art, so coding standards and preferences vary too. Attempting to come up with a single/translatable solution to this is... uh... difficult.
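As a rough illustration of point 2 above (and emphatically not Textician's actual system), training linear classifiers over many ICD-10 codes can be sketched as a multi-label problem with scikit-learn; the notes and code assignments below are invented examples.

```python
# A minimal multi-label sketch of training linear classifiers over
# ICD-10 codes on commodity hardware; the notes and code assignments
# are invented, and this is not Textician's actual algorithm or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

notes = [
    "patient presents with type 2 diabetes and hypertension",
    "acute upper respiratory infection, prescribed rest and fluids",
    "follow-up for essential hypertension, well controlled",
]
codes = [["E11.9", "I10"], ["J06.9"], ["I10"]]

# One binary linear classifier per code; even with thousands of codes
# this stays cheap because each classifier is a sparse linear model.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(codes)
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000), n_jobs=-1),
)
model.fit(notes, y)

predicted = model.predict(["visit for hypertension management"])
print(binarizer.inverse_transform(predicted))
```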
Let us know if you'd like to give it a try.