We are looking for a Senior Site Reliability Engineer to join our Reliability Tooling team, where you'll play a pivotal role in designing, building, and improving scalable solutions that enhance system reliability.

As a respected expert within the team, you will contribute to technical strategy, mentor engineers, and advocate for SRE best practices to optimize service delivery and operational efficiency.

Responsibilities

Build tools to enable your team to identify and resolve infrastructure, platform, and application issues
Use Chaos Engineering methodologies to test the reliability of systems under real-world conditions
Deploy and manage modern cloud technologies leveraging Infrastructure as Code and self-healing patterns
Develop effective telemetry, alerts, and automated responses to minimize Mean Time to Recovery (MTTR)
Provide technical guidance and expertise in cross-team collaborations
Develop frameworks and practices for sustainable incident response via blameless postmortems and SRE methods
Identify reliability and operational inefficiencies to promote continuous improvement
Write code that enhances scalability, security, and maintainability of critical systems
Foster team involvement in delivering thoughtful and high-quality software solutions
Mentor team members in core SRE principles to support professional development

Requirements

3+ years of hands-on experience in SRE, DevOps, systems engineering, or software engineering
Strong communication skills, both written and verbal
Enthusiasm for learning and exploring new technologies
Expertise in Cloud/PaaS/SaaS tools and platforms (e.g. AWS, Azure, GCP)
Proficiency in container technologies within enterprise environments (e.g. Docker, Kubernetes, AWS ECS and EKS)
Programming experience in Python, Go, Rust, or a similar language

Nice to have

Familiarity with DevOps methodologies and SRE principles
Background in monitoring and observability solutions
Ability to work with Infrastructure as Code automation tools such as Terraform, CloudFormation, or Ansible
Understanding of Service Level Objectives and error budgets
Experience with scalable software development in languages such as Java or Scala

We offer

We gather like-minded people:
- Engineering community of industry professionals
- Friendly team and enjoyable working environment
- Flexible schedule and opportunity to work remotely within Poland
- Chance to work abroad for up to 60 days annually
- Business-driven relocation opportunities
We provide growth opportunities:
- Outstanding career roadmap
- Leadership development, career advising, soft skills, and well-being programs
- Certification (GCP, Azure, AWS)
- Unlimited access to LinkedIn Learning, getAbstract, A Cloud Guru
- English classes
We cover it all:
- Stable income (Employment Contract or B2B)
- Participation in the Employee Stock Purchase Plan
- Benefits package (health insurance, multisport, shopping vouchers)
- Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more
- Referral bonuses
- Corporate, social and well-being events
Please note:
- The set of bonuses may vary depending on the role you apply for; our recruiter will discuss the specifics during the general interview.
- We will contact selected candidates only.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative and cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
