About PubNub
Join PubNub, a pioneer in weaving an interconnected world with compelling real-time experiences. PubNub was founded in 2010 and is used by more than 2,000 leading companies, including Verizon, Autodesk, Monsanto, Zillow, and Dropbox. Our mission is to revolutionize real-time online interactions by providing an unparalleled platform that enables product and development teams to build, manage, and optimize real-time solutions in their applications. With over $130M raised from notable investors and global influence, we are at the forefront of shaping how the world connects and interacts digitally.
At PubNub, you'll enjoy the freedom of a remote-first environment paired with a collaborative culture designed to support your unique work style—so you can do your best work, wherever you are.
About the Job
Join PubNub's Site Reliability Engineering team and grow your career while helping operate the real-time data streaming network that powers chat, IoT, live updates, and interactive experiences for companies worldwide. As an SRE, you'll transition from executing operational tasks to contributing meaningfully to system design and reliability improvements. Working closely with senior engineers and cross-functional teams, you'll gain hands-on experience maintaining systems that deliver millisecond-level performance while processing nearly a billion requests per minute during peak events. You'll develop both technical expertise and operational maturity while contributing to new functionality and performance optimizations that keep us at the forefront of real-time technology.
This role offers the opportunity to build foundational SRE skills across the full technology stack - from infrastructure and automation to incident leadership and business impact analysis. You'll learn to balance immediate operational needs with long-term system improvements, developing the broad system thinking that defines successful Site Reliability Engineers.
Responsibilities
- Support the design, maintenance, and continuous improvement of highly available systems that can scale to support up to 100 million concurrent connections globally
- Operate global infrastructure across 15+ data centers using Infrastructure as Code and GitOps tooling, including Terraform, Kubernetes, and ArgoCD, for automated operations
- Work with observability tools including VictoriaMetrics, Grafana, and Loki to monitor system health and identify performance issues and optimization opportunities
- Create and maintain technical documentation and runbooks, and contribute to knowledge sharing across the team
- Collaborate with service architects and developers to help evaluate, implement, and adopt new technologies that improve stability and performance
- Participate in incident response under mentorship, and conduct post-incident reviews and root cause analysis to improve system reliability and prevent recurrence
- Create practical automation scripts to reduce operational toil and improve infrastructure management efficiency
About You
- 1-4 years of Site Reliability Engineering experience preferred, or equivalent production operations experience in DevOps, Infrastructure Engineering, Platform Engineering, or related roles
Required experience:
- Proficiency with container technology: at a minimum, Docker fundamentals and basic Kubernetes operations (pods, services, deployments)
- Working knowledge of at least one major cloud platform (AWS preferred; GCP or Azure also accepted)
- Basic understanding of monitoring and observability concepts, with some practical experience using tools like VictoriaMetrics, Grafana, and Loki
- Demonstrated commitment to continuous learning and professional growth, and eagerness to expand technical skills beyond your current level
Recommended:
- Experience with Infrastructure as Code practices and CI/CD concepts; familiarity with Terraform, ArgoCD, and GitOps workflows preferred
- Some experience with incident response and troubleshooting, and the ability to create and maintain technical documentation and runbooks
- Practical automation experience using Python and/or Bash for operational tasks, system administration, and infrastructure management
- Solid problem-solving skills and attention to detail
- Collaborative mindset with an understanding of business impact
- Ability to work independently while knowing when to escalate
Why PubNub?
At PubNub, you'll help power real-time experiences used by millions around the world. What sets us apart is our people-first culture and strong sense of collaboration. Our employees value the opportunity to contribute meaningfully, grow their skills, and help build innovative, high-impact technology.
We offer competitive compensation of PLN 14,000 to PLN 20,300 per month on a B2B contract.
We're deeply committed to your personal growth and to recognizing your contributions. As an Equal Employment Opportunity (EEO) employer, we take pride in fostering a diverse and inclusive workplace where everyone can thrive.
Join us at PubNub to help revolutionize real-time communication and contribute to a more connected future. Here, your role is more than just a job—it's a chance to create meaningful, extraordinary experiences.