Remoteville

Remote Senior Kubernetes Operations Engineer Job in UK Lambda

Senior Kubernetes Operations Engineer Lambda
£135,200 - £194,400
Data Science, High Performance Computing, Incident Response, Ops, PXE, RAID, Root Cause, System Administration, Troubleshooting
Senior (5-8 years) - Expert (9+ years)
UK


The GPU Cloud for AI
405+ employees
Machine Learning, Cloud Computing, Research

Open for applications

Role


Who you are

  • Experienced operations engineer with deep knowledge of running Linux clusters and systems
  • Familiarity with running on bare metal, including knowledge of BMCs, kernel drivers, PXE, RAID, VLANs, and hypervisors
  • Understanding of containers, virtualization, and their mechanisms
  • Good understanding of daily operation, bug-fixing, and maintenance of Kubernetes
  • Experience in on-call environments and incident response
  • Excellent problem-solving skills and ability to learn quickly
  • Ability to work independently or as part of a team
  • Experience working with customers during incidents

Desirables

  • Deep Kubernetes experience
  • Network engineering
  • Familiarity with HPC clusters
  • Exposure to AI/ML training clusters
  • Familiarity with machine learning/AI frameworks



What the job involves

  • Remotely installing, upgrading, operating, and maintaining bare-metal Kubernetes clusters
  • Handling cluster degradation, recovery, and resizing using fleet management tools
  • Performing out-of-hours on-call response for critical incidents
  • Working on improving tooling, automation, and processes for daily operations and incident response
  • Assisting customers with high-level Kubernetes questions and integration
  • Mentoring and assisting less-experienced team members
  • Having a voice in product direction to minimize operational costs

Company


Company mission

Lambda provides computation to accelerate human progress. We're a team of Deep Learning engineers building the world's best GPU cloud, clusters, servers, and workstations. Our products power engineers and researchers at the forefront of human knowledge. Customers include Intel, Microsoft, Google, Amazon Research, Tencent, Kaiser Permanente, MIT, Stanford, Harvard, Caltech, Los Alamos National Lab, Disney, and the Department of Defense.




Company benefits

  • Generous cash & equity compensation
  • High demand for systems
  • Talented team of 300+
  • Health, dental, and vision coverage
  • 401(k) plan with 2% company match
  • Flexible Paid Time Off Plan



Company values

  • Computation
  • Accelerate Progress
  • Innovation
  • Excellence



Company HQ

San Jose