Site Reliability Engineer
The Web3 Foundation nurtures and stewards technologies and applications for the decentralized web. We collaborate with cutting-edge developer teams, researchers and community leaders to strengthen and extend the uses of decentralized technologies, building the future of identity, privacy, financial markets, commerce, and more.
Our core project at the moment is Polkadot, a protocol designed to connect blockchains and allow them to interoperate, share security and more. In addition, we support the development of a number of technologies that form the Web 3.0 tech stack, such as decentralized messaging, distributed storage and secret management.
Do you like automation? Do you know Linux like the back of your hand? Do you want to work on creating the Web 3.0 infrastructure? Are you excited about joining a startup? We are looking for a team member whose mastery of Linux, GitHub, Docker and Kubernetes is second to none.
Then come join the Web3 Foundation team in Zug as a
Site Reliability Engineer
This position is based in Zug, Switzerland or Berlin, Germany, but for exceptional candidates, we may consider remote work in Europe.
- Participate in on-call rotation, failure resolution, post-mortem analysis and prevention through automation.
- Maintain the 1000 validator program of Polkadot and Kusama.
- Assist teams in making the platform components production-ready and provide support on IT-related issues.
- Take ownership of the different infrastructure-as-code components that build up our platform, adapting them to the evolution of the given requirements.
- Define automated tools, including CI/CD pipelines, that help products adapt promptly to end-user requirements.
- Continuously improve the observability of the system and the feedback loops for getting information about potential problems before they happen.
- Design automated disaster-recovery mechanisms.
- Experience designing and maintaining scalable, resilient, performant and observable systems.
- On-call experience: participation in ops-duty rotations, incident response and post-mortem analysis and prevention.
- A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Solid background in software development and experience with one or more of the following: Node.js, TypeScript, Rust, Python and Bash scripting.
- Experience with, or willingness to learn about, cloud-native technologies: Kubernetes as platform, Helm as package manager, Prometheus and related technologies as the monitoring stack, all working on different providers such as Google Cloud Platform, AWS, DigitalOcean and Azure.
- Continuous Delivery experience.
- Familiarity with database administration and a good understanding of networking.
- Prometheus, Alertmanager and Grafana: create service monitors, write alert rules, create dashboards and panels.
- Experience using Terraform and Ansible with a test-driven infrastructure approach.
- Interest and background in decentralized technologies, especially blockchain.
- Competitive compensation and employee benefits.
- Regular company retreats at unique locations around Europe.
- Opportunity to work in a multinational, high-performance team with diverse backgrounds (e.g. physics, computer science, machine-learning algorithm design, legal, financial products, management consulting, marketing and advertising).
To apply for this position, please answer a few questions in the application form and submit your CV and a cover letter telling us a bit about yourself and your motivation to join us.
For more information about us, visit us on