About the role

Overview

Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. We have a breakthrough approach to document understanding: it combines intelligent schema mapping with fine-tuned extraction models and succeeds where legacy OCR and other parsing tools consistently fail.

We are a small, fast-growing team of engineers in San Francisco, backed by tier 1 investors. Our technology powers Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies.

What makes our tech special is our multi-stage architecture:

Layout understanding with specialized component detection models
Low-latency OCR models for targeted extraction
Advanced reading-order algorithms for complex structures
Proprietary table structure recognition and parsing
Fine-tuned vision-language models for charts, tables, and figures

If you are passionate about the intersection of computer vision, NLP, and data infrastructure, Pulse is a place where your work directly impacts customers and shapes the future of document intelligence.

What we are looking for

In-office 5 days a week in San Francisco
Eager to learn and adapt quickly
Prior startup or founding experience is a plus

About the Role
Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own profiling, batching, and autoscaling across single-tenant and multi-tenant environments.

Responsibilities

Build inference services with smart batching and caching
Optimize kernels, tokenization, and model graphs
Evaluate tradeoffs among vLLM, TensorRT-LLM, and Triton
Implement autoscaling and admission control with clear SLOs
Own performance dashboards and capacity planning

Requirements

3+ years in performance engineering or ML systems
Strong Python, plus C++ or CUDA exposure
Experience with GPU profiling and model serving

Nice to have

Experience reducing p95 latency and cost in production ML systems

Sponsorship
Sponsorship available.

Compensation and benefits
Competitive base salary plus equity, a performance-based bonus, relocation assistance for Bay Area moves, a daily meal stipend, and comprehensive medical, vision, and dental coverage.

Pulse

Software Engineer, Inference
