From Engineer to Researcher: Thinking in Machine Learning Systems

My professional background is rooted in building production systems: backend services, mobile applications, and distributed architectures. This experience shapes how I now think about machine learning.

In real-world settings, ML models do not live in isolation:

  • They run on constrained hardware.
  • They compete for resources with other services.
  • They must meet latency and reliability requirements.
  • They interact with constantly changing data distributions.

During my MSc, as I delved deeper into optimisation, deep learning, and probabilistic modelling, I began to see ML as only one part of a larger system. Questions that now interest me include:

  • How do we design training and inference pipelines that are both efficient and reliable?
  • How does distributed training behave under communication constraints?
  • What trade-offs exist between model size, latency, and accuracy in deployment?
  • How can we make ML systems more robust to distribution shift and failures?
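The third question, on size versus latency, can already be made concrete with back-of-the-envelope arithmetic. The sketch below (a toy illustration, not tied to any particular model) counts parameters and multiply–accumulate operations for one forward pass of a small fully connected network, and shows how shrinking the hidden width scales the per-inference cost:

```python
def mlp_cost(layer_sizes):
    """Parameters and multiply-accumulates (MACs) for one forward pass
    of a fully connected network with the given layer widths."""
    params = 0
    macs = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        params += fan_in * fan_out + fan_out  # weight matrix + bias vector
        macs += fan_in * fan_out              # one MAC per weight
    return params, macs

# Toy comparison: halving the hidden width roughly quarters the cost,
# because each hidden weight matrix shrinks in both dimensions.
big = mlp_cost([784, 512, 512, 10])    # (669706, 668672)
small = mlp_cost([784, 256, 256, 10])  # (269322, 268800)
```

Of course, fewer MACs do not translate linearly into lower latency on real hardware (memory bandwidth, batching, and kernel launch overheads all intervene), which is precisely why the deployment trade-off is a systems question and not just a model question.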

This blog marks a turning point in my journey: I am no longer satisfied with models that perform well in isolation. I am increasingly drawn to Machine Learning Systems as a research area, the place where algorithmic design and system constraints meet.