From Engineer to Researcher: Thinking in Machine Learning Systems

My professional background is rooted in building production systems: backend services, mobile applications, and distributed architectures. This experience shapes how I now think about machine learning.

In real-world settings, ML models do not live in isolation:

  • They run on constrained hardware.
  • They compete for resources with other services.
  • They must meet latency and reliability requirements.
  • They interact with constantly changing data distributions.

During my MSc, as I delved deeper into optimisation, deep learning, and probabilistic modelling, I began to see ML as only one part of a larger system. Questions that now interest me include:

  • How do we design training and inference pipelines that are both efficient and reliable?
  • How does distributed training behave under communication constraints?
  • What trade-offs exist between model size, latency, and accuracy in deployment?
  • How can we make ML systems more robust to distribution shift and failures?
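The third question, on size versus latency, can already be made concrete with back-of-the-envelope arithmetic. The sketch below (a toy illustration, not tied to any particular model) counts parameters and multiply–accumulate operations for one forward pass of a small fully connected network, and shows how shrinking the hidden width scales the per-inference cost:

```python
def mlp_cost(layer_sizes):
    """Parameters and multiply-accumulates (MACs) for one forward pass
    of a fully connected network with the given layer widths."""
    params = 0
    macs = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        params += fan_in * fan_out + fan_out  # weight matrix + bias vector
        macs += fan_in * fan_out              # one MAC per weight
    return params, macs

# Toy comparison: halving the hidden width roughly quarters the cost,
# because each hidden weight matrix shrinks in both dimensions.
big = mlp_cost([784, 512, 512, 10])    # (669706, 668672)
small = mlp_cost([784, 256, 256, 10])  # (269322, 268800)
```

Of course, fewer MACs do not translate linearly into lower latency on real hardware (memory bandwidth, batching, and kernel launch overheads all intervene), which is precisely why the deployment trade-off is a systems question and not just a model question.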

This blog marks a turning point in my journey: I am no longer satisfied with models that perform well in isolation. I am increasingly drawn to Machine Learning Systems as a research area, the place where algorithmic design and system constraints meet.