Senior Site Reliability Engineer

EPAM Systems
Ruda Śląska, Silesian Voivodeship
4 days ago

We are searching for a Senior Site Reliability Engineer to join our Reliability Tooling team, where you'll play a pivotal role in designing, improving, and building scalable solutions to enhance system reliability.

As a respected expert within the team, you will contribute to technical strategy, mentor engineers, and advocate for SRE best practices to optimize service delivery and operational efficiency.

Responsibilities

  • Build tools to enable your team to identify and resolve infrastructure, platform, and application issues
  • Use Chaos Engineering methodologies to test reliability of systems under real-world conditions
  • Deploy and manage modern cloud technologies leveraging Infrastructure as Code and self-healing patterns
  • Develop effective telemetry, alerts, and automated responses to minimize Mean Time to Recovery (MTTR)
  • Provide technical guidance and expertise across teams
  • Develop frameworks and practices for sustainable incident response via blameless postmortems and SRE methods
  • Identify reliability and operational inefficiencies to promote continuous improvement
  • Write code that enhances scalability, security, and maintainability of critical systems
  • Foster team involvement in delivering thoughtful and high-quality software solutions
  • Mentor team members in core SRE principles to support professional development

Requirements

  • 3+ years of hands-on experience in SRE, DevOps, systems engineering, or software engineering
  • Strong communication skills, both written and verbal
  • Enthusiasm for learning and exploring new technologies
  • Expertise in Cloud/PaaS/SaaS tools and platforms (e.g. AWS, Azure, GCP)
  • Proficiency in container technologies within enterprise environments (e.g. Docker, Kubernetes, AWS ECS and EKS)
  • Background in programming languages (Python, Go, Rust, or similar)

Nice to have

  • Familiarity with DevOps methodologies and SRE principles
  • Background in monitoring and observability solutions
  • Experience with Infrastructure as Code and automation tools such as Terraform, CloudFormation, or Ansible
  • Understanding of Service Level Objectives and error budgets
  • Experience with scalable software development in languages such as Java or Scala

We offer

  • We gather like-minded people:
    • Engineering community of industry professionals
    • Friendly team and enjoyable working environment
    • Flexible schedule and opportunity to work remotely within Poland
    • Chance to work abroad for up to 60 days annually
    • Business-driven relocation opportunities
  • We provide growth opportunities:
    • Outstanding career roadmap
    • Leadership development, career advising, soft skills, and well-being programs
    • Certification (GCP, Azure, AWS)
    • Unlimited access to LinkedIn Learning, getAbstract, A Cloud Guru
    • English classes
  • We cover it all:
    • Stable income (Employment Contract or B2B)
    • Participation in the Employee Stock Purchase Plan
    • Benefits package (health insurance, multisport, shopping vouchers)
    • Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more
    • Referral bonuses
    • Corporate, social and well-being events
  • Please note:
    • The set of bonuses might vary based on the role you apply for – specifics will be discussed with our recruiter during the general interview.
    • We will contact only selected candidates.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
