Morph builds the fastest LLM code-editing inference engine in the world: we hit 10,500 tok/sec PER REQUEST, all on NVIDIA hardware.
Our stack powers high-throughput AI workflows for vibe coding apps, devtools, PR bots, and IDEs.
We're hiring a founding engineer to push the limits of performance, safety, and scalability across our inference, retrieval, and diffing pipelines.
What You’ll Do
- Work hands-on with ML frameworks like PyTorch, TensorFlow, or JAX
- Work across low-latency inference, containerized deployment, and CI/CD tooling
- Work with CUDA kernels and bleeding-edge inference optimization research
- Turn the latest ML research into production-quality systems
You’re a Fit If You
- Have a strong understanding of PyTorch/TF/JAX
- Know your way around real infra: Docker, Kubernetes, Linux, observability
- Have experience with LLM apps, devtools, compilers, building games, or code intelligence
- Have prior experience with low-level inference optimizations (e.g., kernels)
- Prefer ownership and agency > bureaucracy
Why Morph
- Zero fluff: work directly with the founder. Everyone on the team is an ML engineer
- We will never make you do story points
- Work on the fastest model and inference engine in the world: 4x the speed of the fastest model on Cerebras
Apply:
- Describe the machine learning project you're most proud of. Please go into extreme technical detail. We're familiar with all the libraries.
- Describe something you are, or have been, deeply obsessed with (anything)