Core Infrastructure Engineer
The San Francisco Compute Company is hiring a core infrastructure engineer. This role works on the core product. On any given day, you might be optimizing the scheduler, debugging InfiniBand drivers, or writing a lot of Rust.
About working with us
As the name implies, we mostly work in San Francisco. Our office is in Hayes Valley. Many of our customers are nearby: you can literally walk down the street to meet them.
We probably don't need to say it, but the problems are interesting. We make supercomputers, not SaaS.
Working at San Francisco Compute means helping hundreds of university labs, focused-research organizations, startups, and tinkerers get access to ultra-powerful machines. Access to compute is their biggest bottleneck, and we think removing it is the highest-leverage way for a software engineer to materially push scientific progress forward.
About you
You like to check all the boxes and do it right the first time. Customers may spend $100k or more on a single training run, so all the bolts need to be tightened.
You are not afraid of debugging, profiling, and patching hypervisors or fixing BIOS settings. Experience in a systems programming language such as Rust, C++, or C is nice. Experience with Ethernet and InfiniBand internals is a plus.
You have a strong distributed systems background.
Machine learning experience is preferred. It's great if you're familiar with PyTorch. It's especially great if you're familiar with multi-node training, the InfiniBand networking stack, CUDA kernels, and low-level optimization. If you're not familiar with these, you might be expected to learn a bit on the job.