Job Information

Cognizant: Site Reliability Engineer, AIOps, Toronto, Ontario (199 Bay Street)

About the role

As a Site Reliability Engineer, AIOps, you will make an impact by transforming how our production environments detect, respond to, and learn from operational issues. You will own the design and implementation of AI-driven observability pipelines, self-healing automation, and intelligent incident workflows that measurably improve system reliability across the organization. This is a hands-on engineering role for someone who thrives at the intersection of site reliability engineering, automation, and AI-powered operations. You will be a valued member of the Cloud Infrastructure and Security team and work collaboratively with the Manager.

In this role, you will:

• Implement and optimize monitoring solutions using Dynatrace, Splunk, and Moogsoft, leveraging AI/ML capabilities such as Davis AI, Splunk ITSI, and Moogsoft AIOps to detect anomalies, predict incidents, and reduce alert noise across distributed systems

• Design and build AI-powered operational workflows that automate incident detection, root cause analysis, remediation actions, and post-incident insights

• Configure and manage PagerDuty for intelligent alerting, escalation policies, and automated incident response

• Build self-healing automation and remediation playbooks using Ansible, Python, and GitHub Actions, triggered by AI-driven observability events

• Apply SRE principles including SLOs, SLIs, and error budgets to improve system reliability and eliminate operational toil

• Build and maintain CI/CD pipelines using Git and GitHub Actions that incorporate observability signals, AI-driven quality gates, and automated rollback workflows

• Develop Python-based tooling and integrations that connect monitoring platforms, ticketing systems, and automation engines

• Document runbooks, processes, and workflows for knowledge sharing and operational continuity

Required skills

• 8+ years of hands-on experience with Dynatrace (including Davis AI), Splunk, Moogsoft AIOps, PagerDuty, Ansible, Git and GitHub Actions, and Python scripting

• Proven experience leveraging AI/ML features within observability and incident management platforms for event correlation, predictive alerting, and automated remediation

• Strong understanding of distributed systems, cloud infrastructure, and reliability engineering

• Experience with SLO/SLI design, error budgets, and performance optimization

• Strong communication skills and ability to collaborate effectively across engineering teams

Preferred skills

• Experience with Red Hat OpenShift, Kubernetes, or Docker

• Exposure to LLM-based automation or generative AI for operational workflows

• Background in ChatOps frameworks or event-driven architecture

• Experience mentoring junior engineers or leading technical workstreams

• Background in IT operations or managed services environments

We're eager to meet people who share our mission and can make an impact in various ways. Don't hesitate to apply, even if you only meet the required skills listed. Your transferable skills and experiences matter. Help us see how you are the right person for this role.

Total compensation

We regularly assess market data to ensure we offer a competitive compensation package for our associates. The base salary for this position ranges from CAD $94,000 to $110,000 per year. Where the successful candidate falls within the range depends on relevant education, work and/or management experience, and other business-related and job-necessary qualifications. This position is also eligible for Cognizant’s discretionary annual performance-based bonus, as well as benefits that support your physical, mental and financial wellbeing.

Working arrangements

We believe hybrid work is the way forward as we strive to provide flexibility wherever possible. Based on this role’s business requirements, this is a hybrid position requiring 4 days a week in a client or Cognizant office in Toronto, Ontario. Regardless of your working arrangement, we are here to support a healthy work-life balance through our various wellbeing programs.

The working arrangements for this role are accurate as of the date of posting. They may change based on the project you’re engaged in, as well as business and client requirements. Rest assured, we will always be clear about role expectations.

Cognizant will only consider applicants for this position who are legally authorized to work in Canada without requiring employer sponsorship, now or at any time in the future.

Applications for this position are reviewed by our recruitment team without the use of artificial intelligence screening tools.

Post closing date

Applications will be accepted until April 24, 2026.

Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.