How do you serve and manage thousands of ML models?
Serving one ML model is easy. Serving a couple is fairly easy. But once you need to serve and manage hundreds or thousands, you need some sort of system to industrialise the process.
A recent post by Ernest Chan looked at various approaches to solving this problem. A common pattern is to use a model store or registry alongside config files that describe how to load and serve each model. These are then consumed by a generic model-serving layer, typically hosted on Kubernetes. Many big tech firms have built their own tools for this, tailored to their specific needs, and this guide gives some detail on how they work. An important thing to remember if you are NOT a big tech firm is that these solutions might not be right for you (you are not Google). That said, reading about them offers useful insights for organisations of any size.
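The registry-plus-config pattern described above can be sketched in a few lines. This is a minimal illustration, not any real tool's API: the names (`ModelRegistry`, `GenericServer`, `ModelConfig`) and the callable "models" are all hypothetical stand-ins for a real model store and serving layer.

```python
# Minimal sketch of the registry + config + generic server pattern.
# All class and method names here are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A "model" is just a callable here; in practice it would be a
# deserialized artifact (pickle, ONNX, SavedModel, ...).
Model = Callable[[float], float]


@dataclass
class ModelConfig:
    """Per-model config: which artifact to pull and where to route it."""
    name: str
    version: str
    route: str  # path the generic server exposes for this model


class ModelRegistry:
    """Stands in for a real model store (e.g. MLflow, an artifact bucket)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Model] = {}

    def push(self, name: str, version: str, model: Model) -> None:
        self._store[(name, version)] = model

    def pull(self, name: str, version: str) -> Model:
        return self._store[(name, version)]


class GenericServer:
    """One serving layer for many models: loads whatever the configs list."""

    def __init__(self, registry: ModelRegistry, configs: List[ModelConfig]) -> None:
        # Build a routing table once at startup from the config files.
        self._routes: Dict[str, Model] = {
            cfg.route: registry.pull(cfg.name, cfg.version) for cfg in configs
        }

    def predict(self, route: str, x: float) -> float:
        return self._routes[route](x)


# Usage: two "models" served by the same generic process.
registry = ModelRegistry()
registry.push("churn", "v2", lambda x: x * 0.5)
registry.push("fraud", "v1", lambda x: x + 1.0)

configs = [
    ModelConfig("churn", "v2", "/models/churn"),
    ModelConfig("fraud", "v1", "/models/fraud"),
]
server = GenericServer(registry, configs)
print(server.predict("/models/churn", 10.0))  # -> 5.0
print(server.predict("/models/fraud", 10.0))  # -> 11.0
```

The point of the pattern is that adding a model means pushing an artifact and a config entry, not writing a new service; the serving layer itself stays generic.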