About PubNub

Join PubNub, a pioneer in weaving an interconnected world with compelling real-time experiences. Founded in 2010 and used by more than 2,000 leading companies, including Verizon, Autodesk, Monsanto, Zillow, and Dropbox, PubNub is on a mission to revolutionize real-time online interactions by providing an unparalleled platform that enables product and development teams to build, manage, and optimize real-time solutions in their applications. With over $130M raised from notable investors and a global influence, we are at the forefront of shaping how the world connects and interacts digitally.

At PubNub, you'll enjoy the freedom of a remote-first environment paired with a collaborative culture designed to support your unique work style—so you can do your best work, wherever you are.

About the Job

Join PubNub's Site Reliability Engineering team and grow your career while helping operate the real-time data streaming network that powers chat, IoT, live updates, and interactive experiences for companies worldwide. As an SRE, you'll transition from executing operational tasks to contributing meaningfully to system design and reliability improvements. Working closely with senior engineers and cross-functional teams, you'll gain hands-on experience maintaining systems that deliver millisecond-level performance while processing nearly a billion requests per minute during peak events. You'll develop both technical expertise and operational maturity while contributing to new functionality and performance optimizations that keep us at the forefront of real-time technology.

This role offers the opportunity to build foundational SRE skills across the full technology stack, from infrastructure and automation to incident leadership and business impact analysis. You'll learn to balance immediate operational needs with long-term system improvements, developing the broad system thinking that defines successful Site Reliability Engineers.

Responsibilities

Support the design, maintenance, and continuous improvement of highly available systems that can scale to support up to 100 million concurrent connections globally
Operate global infrastructure across 15+ data centers using Infrastructure as Code tools including Terraform, Kubernetes, and ArgoCD for automated operations
Work with observability tools including VictoriaMetrics, Grafana, and Loki to monitor system health and identify performance issues and optimization opportunities
Create and maintain technical documentation and runbooks, and contribute to knowledge sharing across the team
Collaborate with service architects and developers to help evaluate, implement, and adopt new technologies that improve stability and performance
Participate in incident response efforts under mentorship, conducting post-incident reviews and root cause analysis to enhance system reliability and prevent future occurrences
Create practical automation scripts to reduce operational toil and improve infrastructure management efficiency

About You

1-4 years of Site Reliability Engineering experience preferred, or equivalent production operations experience in DevOps, Infrastructure Engineering, Platform Engineering, or related roles

Required experience:

Container technology proficiency with Docker fundamentals and basic Kubernetes operations (pods, services, deployments) at a minimum
Cloud platform experience with working knowledge of at least one major platform (AWS preferred; GCP or Azure also accepted)
Basic understanding of monitoring and observability concepts with some practical experience using tools like VictoriaMetrics, Grafana, and Loki
Demonstrated commitment to continuous learning and professional growth with eagerness to expand technical skills beyond your current level

Recommended:

Experience with Infrastructure as Code practices and CI/CD concepts; familiarity with Terraform, ArgoCD, and GitOps workflows preferred
Some experience with incident response and troubleshooting, with the ability to create and maintain technical documentation and runbooks
Practical automation experience using Python and/or Bash for operational tasks, system administration, and infrastructure management
Solid problem-solving skills and attention to detail
Collaborative mindset with an understanding of business impact
Ability to work independently while knowing when to escalate

Why PubNub?

At PubNub, you'll help power real-time experiences used by millions around the world. What sets us apart is our people-first culture and strong sense of collaboration. Our employees value the opportunity to contribute meaningfully, grow their skills, and help build innovative, high-impact technology.

We offer competitive compensation of PLN 14,000 to PLN 20,300 per month on a B2B contract.

We're deeply committed to your personal growth and to recognizing your contributions. As an Equal Employment Opportunity (EEO) employer, we take pride in fostering a diverse and inclusive workplace where everyone can thrive.

Join us at PubNub to help revolutionize real-time communication and contribute to a more connected future. Here, your role is more than just a job—it's a chance to create meaningful, extraordinary experiences.
