Customer Reliability Engineering is what you get when you treat operations and data as if they were a software problem. Our mission is to architect, build, secure, and support the software and systems behind all of our customers’ services, with an ever-watchful eye on their availability, latency, performance, and capacity.

This is an unusual job, unlike others in the industry. Like traditional operations groups, we design, build, and help keep important, revenue-critical systems up and running despite outages, traffic disruptions, and configuration problems.

Unlike traditional operations groups, we often have the ability and authority to fix, extend, and scale the code to keep it working and harden it against all the vagaries of the internet. We hire people from both systems and software backgrounds. Strong candidates will have experience with both.

As a Customer Reliability Engineer on our team, you will have the opportunity to tackle complex problems of scale, some of them unique to our customers, while applying your expertise in problem resolution, coding, algorithms, complexity analysis, and large-scale system design.

This is a hands-on technical expert role with high potential for learning new things and creating new experiences. If you are a positive-thinking, versatile technical leader with an I-want-to-know-everything drive, and you thrive in a fast-paced, startup-like environment, we want you on board with our all-star team.

CRE's culture of diversity, intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction in choosing meaningful projects, and we strive to provide the support and mentorship needed to learn and grow.

Requirements:

  • BS degree in Computer Science or related technical field involving systems engineering (e.g., physics or mathematics), or equivalent practical experience.
  • Experience in one or more of the following: C, C++, Java, Python, Go, Perl, Ruby or shell scripting.
  • Experience with Unix/Linux operating system internals and administration (e.g., filesystems, inodes, system calls) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Ability to debug and optimize code and automate routine tasks.