Deploying ML models doesn't always need GPUs or Kubernetes clusters. Sometimes a simple, single CPU machine is plenty.
In the rush to 'scale', it is easy to overlook simple solutions: a single VM is cheap to build, deploy and maintain. Jacques Verré, product manager at Comet ML, writes that you can build a fast, simple model serving system with a single virtual machine and a bit of code optimisation. Verré benchmarked and tuned a FastAPI endpoint serving a BERT NLP model, and with some really simple modifications improved throughput from 6 requests per second to 100. That is roughly 8.6 million requests per day!
Some of the improvements he made were:
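The article's exact modifications aren't reproduced in this summary. As an illustration only, one widely used low-effort optimisation for CPU model serving is micro-batching: collect requests for a few milliseconds and run the model once per batch, amortising per-call overhead. The sketch below is hypothetical (the `fake_model` and `MicroBatcher` names are invented, not Verré's code) and uses only the standard library:

```python
# Hypothetical sketch of micro-batching for model serving.
# Not the article's actual code; names are illustrative.
import queue
import threading
import time


def fake_model(batch):
    """Stand-in for an expensive model call whose cost is dominated
    by fixed per-call overhead, so batching amortises it."""
    time.sleep(0.01)  # simulated fixed overhead per call
    return [len(text) for text in batch]


class MicroBatcher:
    def __init__(self, max_batch=8, max_wait=0.005):
        self.q = queue.Queue()
        self.max_batch = max_batch
        self.max_wait = max_wait
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, text):
        """Blocking call used by each request handler."""
        done = threading.Event()
        slot = {"done": done}
        self.q.put((text, slot))
        done.wait()
        return slot["result"]

    def _loop(self):
        while True:
            # Block until at least one request arrives.
            batch = [self.q.get()]
            deadline = time.monotonic() + self.max_wait
            # Top up the batch until it is full or the wait expires.
            while len(batch) < self.max_batch:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=timeout))
                except queue.Empty:
                    break
            # One model call serves the whole batch.
            results = fake_model([text for text, _ in batch])
            for (_, slot), res in zip(batch, results):
                slot["result"] = res
                slot["done"].set()


batcher = MicroBatcher()
print(batcher.predict("hello"))  # → 5
```

Under concurrent load, eight requests now cost roughly one model call's overhead instead of eight, which is the kind of cheap, code-level change that can move throughput without touching infrastructure.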
🛎️ Why this matters: If you can get away with it, keep it simple!