Job Information
Publicis Groupe Site Reliability Engineer in Bengaluru, India
Overview
About Business Unit:
At the core of all that Epsilon does is a team that sets the foundation of our IT infrastructure. The team drives innovation and efficiency through pioneering technology across Epsilon's platforms and business verticals. From being the first point of contact for infrastructure needs to final deployment, the team provides end-to-end solutions for our client-facing platforms. ETS supports all aspects of revenue-generating platforms for Epsilon and sets the architectural direction for our enterprise deployments. By adopting the newest technologies, such as Cloud, Automation, and Artificial Intelligence, the team is at the forefront of redefining our digital business and capturing new opportunities.
We are looking for a highly experienced and forward-thinking Site Reliability Engineer (SRE) to lead and evolve our infrastructure platforms, which span more than 15,000 on-premises servers and a growing multi-cloud environment.
The ideal candidate should embody an automation-first mindset, a strong grasp of cloud engineering, and a passion for building AI-driven, agentic systems that can self-heal, self-optimize, and provide deep observability.
Watch how Epsilon transforms marketing with 1 View, 1 Vision and 1 Voice: https://www.youtube.com/watch?v=xpjtfpntuv8&t=1s
Responsibilities
Lead SRE initiatives across a hybrid infrastructure (on-prem + AWS, Azure, GCP)
Automate operations tasks across Linux servers.
Work morning and afternoon shifts during IST hours.
Collaborate and coordinate with the Onsite team on projects and initiatives.
Drive projects and proactively identify opportunities for improvement and automation.
Create n8n workflows and build integrations across our tech stack to automate operations
Build a self-service platform using Backstage and write integrations across different products
Architect and support scalable, resilient AWS infrastructure (EKS, EC2, S3, RDS, Lambda, etc.)
Administer Kubernetes clusters at scale; ensure health, upgrades, and secure deployments
Drive infrastructure automation using Python, Shell, and Infrastructure as Code (Terraform, Ansible)
Design and implement AI agents for observability, RCA, and incident triage using modern MLOps/DevOps paradigms
Build robust monitoring/alerting pipelines using Grafana, Prometheus, ELK, PagerDuty, or similar tools
Participate in and improve on-call rotations, while building out self-healing systems
Lead root cause analysis (RCA) exercises and post-incident reviews
Participate in the on-call rotation and work morning/evening shifts.
Qualifications
3-5 years of experience in Platform/Cloud Engineering, SRE, or DevOps
Strong hands-on coding experience in Python, Shell, Terraform
Strong expertise in Cloud, Kubernetes, Linux Administration
Hands-on experience with AWS services and Kubernetes
Proficiency in IaC tools such as Terraform and Ansible
Experience delivering an efficient developer experience
Knowledge of building CI/CD pipelines
Familiarity with monitoring tools (Zabbix, PagerDuty, Grafana, ELK).
Additional Information
Epsilon is a global data, technology and services company that powers the marketing and advertising ecosystem. For decades, we’ve provided marketers from the world’s leading brands the data, technology and services they need to engage consumers with 1 View, 1 Vision and 1 Voice. 1 View of their universe of potential buyers. 1 Vision for engaging each individual. And 1 Voice to harmonize engagement across paid, owned and earned channels.
Epsilon’s comprehensive portfolio of capabilities across our suite of digital media, messaging and loyalty solutions bridges the divide between marketing and advertising technology. We process 400+ billion consumer actions each day using advanced AI and hold many patents of proprietary technology, including real-time modeling languages and consumer privacy advancements. Thanks to the work of every employee, Epsilon has been consistently recognized as industry-leading by Forrester, Adweek and the MRC. Epsilon is a global company with more than 9,000 employees around the world.
Our pillars aren't just words. They're how we show up every day.
People centricity: We focus on employee well-being in an environment where colleagues truly care about each other.
Collaboration: We work together, support one another, and collectively achieve goals.
Growth: There are endless opportunities for growth through learning, development and career advancement.
Innovation: We drive progress through cutting-edge solutions and forward-thinking approaches.
Flexibility: We’ve created a balance between work and personal life, and we encourage adaptability to solve problems creatively.
Our values guide us to create value for our clients, our people and consumers.
Act with integrity
Work together to win together
Innovate with purpose
Respect all voices
Empower with accountability
These pillars and values are our foundation—shaping our culture, guiding our decisions, and uniting us in common purpose.
Epsilon is an Equal Opportunity Employer.
Epsilon is committed to promoting diversity, inclusion, and equal employment opportunities by using reasonable efforts to attract, recruit, engage and retain qualified individuals of all ethnicities and backgrounds, including, but not limited to, women, people of color, LGBTQ individuals, people with disabilities and any other underrepresented groups, traits or characteristics.