Inference is just another distributed system

We're a team of experienced researchers and software engineers: we've published academic papers, maintained open-source projects, and held the pager for critical services at hyperscalers. We first met at Twitter, where the infrastructure we built saved the company more than $100M over five years.

Operating inference has much in common with operating traditional services: balancing latency against load, understanding the limits of the hardware, and monitoring the system for unexpected changes. We help you run the best inference stack in your own cloud.

Yao Yue

Mihir Nanavati

Yuri Vishnevsky

Xi Yang

Brian Martin

Sean Lynch

Backed by