AI Site Reliability Engineer

Procter & Gamble
Warsaw, Masovian Voivodeship
Full time
3 weeks ago

AI Site Reliability Engineers (SREs) are responsible for keeping all production systems running smoothly, which includes some bug fixing. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation, including AI, to our operating environments and the P&G codebase.

SREs specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability, and scalability, with varied interests in algorithms and distributed systems.

In this role, you'll be constantly learning, staying up to date with industry trends and emerging technologies in data solutions. You'll have the chance to work with a variety of tools and technologies, including big data platforms, AI and machine learning frameworks, and data visualization tools, to build innovative and effective solutions.

So, if you're excited about the possibilities of data and eager to make a real impact in the world of business, a career on the SRE team might be just what you're looking for. Join us and become a part of the future of digital transformation.

Key Responsibilities:

As a Site Reliability Engineer (SRE) at P&G, you will play a crucial role in ensuring the reliability, availability, and performance of our production systems. Your role will blend software engineering principles with operational discipline to create scalable and highly available systems. You will collaborate with development and operations teams to implement automation, optimize costs, and troubleshoot issues as they arise.

  • Oversee and maintain the smooth operation of production systems, ensuring high availability and reliability.

  • Design and implement automation solutions, including AI agents, for routine operational tasks to improve efficiency and reduce manual intervention.

  • Develop monitoring and observability dashboards and alerts to provide actionable insights into system health.

  • Develop and maintain automated tests to ensure the quality and reliability of production systems.

  • Analyze system performance and resource utilization to identify opportunities for cost optimization.

  • Work with teams to implement best practices for resource allocation and cost-effective architecture.

  • Lead post-incident reviews to identify improvements in processes and systems.

  • Participate in the change management process to facilitate seamless production deployments.

  • Plan, execute, and monitor production deployments to ensure minimal downtime and service disruption.

  • Collaborate with other teams to ensure proper deployment strategies and rollback mechanisms are in place.
