11.05.21

Keep it simple! Deploy a model on a single machine

Deploying ML models doesn't always need GPUs or Kubernetes clusters. Sometimes a simple, single CPU machine is plenty.

In the rush to 'scale', it is easy to overlook simple solutions. A single VM is easy to build, deploy and maintain, and with a bit of code optimisation it can power a fast, simple model-serving system, writes Jacques Verré, product manager at Comet ML. Verré describes benchmarking and improving an API that served a BERT NLP model using FastAPI: with some straightforward modifications he improved throughput from 6 requests per second to 100. That is 8.6 million requests per day!
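That daily figure is just the per-second rate scaled up; a quick sanity check of the arithmetic (no serving code involved):

```python
def requests_per_day(rps: float) -> int:
    """Convert a sustained requests-per-second rate to requests per day."""
    return int(rps * 60 * 60 * 24)

# Before tuning: 6 req/s -> ~518,400 requests/day
print(requests_per_day(6))
# After tuning: 100 req/s -> 8,640,000 requests/day, i.e. the ~8.6M quoted
print(requests_per_day(100))
```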

Some of the improvements he made were:

  • Turning off gradient computation in PyTorch
  • Tuning FastAPI by adding more Gunicorn workers and turning off asynchronous processing
  • Using model distillation to decrease model size
  • Choosing the right cloud instances (30 vCPUs)
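The Gunicorn tuning step might look like the sketch below, assuming a worker count matched to the 30 vCPUs mentioned; `app:app` is a hypothetical module:variable path for the FastAPI instance, not one from the original write-up:

```shell
# Run the FastAPI app under Gunicorn with one Uvicorn worker process per vCPU.
# "app:app" is a placeholder for your_module:fastapi_instance.
gunicorn app:app \
  --workers 30 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```

Defining the endpoint with a plain `def` rather than `async def` is one way to get the "turning off asynchronous processing" effect, since FastAPI then runs it in a threadpool instead of the event loop.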

🛎️ Why this matters: If you can get away with it, keep it simple!
