Site Reliability Engineer/Architect (SRE)

EPAM Systems
Ruda Śląska, Silesian Voivodeship
1 week ago

We are seeking an experienced and accomplished Site Reliability Engineer/Architect (SRE) to join our dynamic, fast-paced team.

In this pivotal leadership role, you will be entrusted with architecting and implementing advanced SRE practices to ensure the reliability, scalability, and efficiency of our Generative AI (GenAI) enablement platform for enterprise use cases. The position offers a unique opportunity to work with cutting-edge technologies, collaborate with peers to drive technical excellence, and shape the operational strategy for an enterprise-grade, multi-cloud platform.

Responsibilities

  • Define and implement SRE principles, frameworks, and methodologies to ensure platform reliability and stability
  • Establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to create measurable reliability goals aligned with business objectives
  • Collaborate effectively with stakeholders, including senior leadership, to align the SRE vision with the overall technical and organizational strategy
  • Architect resilient systems by adopting innovative practices such as canary deployments, shadow traffic, and testing in production environments
  • Ensure uninterrupted operational reliability for a multi-cloud, multi-tenant enterprise platform
  • Optimize incident response practices and tools to ensure efficiency and effectiveness, implementing automated solutions where appropriate
  • Implement robust logging, tracing, and monitoring systems to provide real-time insight, detect faults, and optimize performance proactively
  • Collaborate with engineering teams to integrate observability frameworks into platform components, improving deployment and runtime confidence
  • Spearhead automation initiatives to reduce manual operational tasks and improve system scalability
  • Foster a strong culture of operational excellence through thought leadership and mentorship, promoting an SRE-first mindset within all teams
  • Collaborate with engineering and product teams to craft scalable designs with reliability embedded throughout the software development lifecycle
  • Build partnerships with Director-level leadership to conceptualize, prioritize, and deliver on long-term SRE goals

Requirements

  • A minimum of 7 years of professional experience in site reliability engineering, software engineering, or DevOps roles
  • Strong coding skills in languages such as Python, Go, or Java, with the ability to implement solutions to algorithmic challenges
  • Proven expertise in designing and managing multi-cloud environments (e.g., AWS, Azure, GCP), distributed systems, and multi-tenant architectures
  • Knowledge of CI/CD pipelines, microservices, and containerization technologies like Kubernetes, Docker, and Helm
  • Background in monitoring and observability tools like Prometheus, Grafana, OpenTelemetry, or Dynatrace
  • Competency in incident management and production troubleshooting using tools like PagerDuty or similar
  • Solid understanding of modern SRE concepts, including SLIs, SLOs, fault injection, and canary releases
  • Familiarity with security best practices for cloud-native architectures and multi-tenant platforms

Nice to have

  • Knowledge of cloud platforms such as AWS, Azure, or GCP, with experience applying multi-cloud strategies
  • Background in the fundamentals of Generative AI technologies and related workflows

We offer

  • We gather like-minded people:
    • Engineering community of industry professionals
    • Friendly team and enjoyable working environment
    • Flexible schedule and opportunity to work remotely within Poland
    • Chance to work abroad for up to 60 days annually
    • Business-driven relocation opportunities
  • We provide growth opportunities:
    • Outstanding career roadmap
    • Leadership development, career advising, soft skills, and well-being programs
    • Certification (GCP, Azure, AWS)
    • Unlimited access to LinkedIn Learning, getAbstract, and A Cloud Guru
    • English classes
  • We cover it all:
    • Stable income (Employment Contract or B2B)
    • Participation in the Employee Stock Purchase Plan
    • Benefits package (health insurance, multisport, shopping vouchers)
    • Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more
    • Referral bonuses
    • Corporate, social and well-being events
  • Please note:
    • The set of bonuses might vary based on the role you apply for – specifics will be discussed with our recruiter during the general interview.
    • We will contact only selected candidates.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to learn and grow continuously. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you reach your full potential.
