OneMain Financial Jobs

Job Information

CAI Data Engineer in REMOTE, India

Data Engineer

Req number:

R7299

Employment type:

Full time

Worksite flexibility:

Remote

Who we are

CAI is a global services firm with over 9,000 associates worldwide and annual revenue of more than $1.3 billion. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right, whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.

Job Summary

We are seeking a highly skilled Data Engineer to join our dynamic team. In this role, you will play a crucial part in data quality and data governance. You will ensure the team has the right data, at the right quality, with the right controls, so that model outcomes are dependable. You will own the end-to-end AI data lifecycle, from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring, using AWS and Databricks as the production backbone. This is a full-time, hybrid position.

Job Description

We are seeking a Data Engineer who will own the end-to-end AI data lifecycle, from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring, using AWS and Databricks. This is a full-time, hybrid position.

What You’ll Do

AI Data Strategy & Ownership (Operating Model)

Translate AI use cases into data requirements

  • Identify features, labels, context documents, metadata, refresh cadence, and retention rules

  • Define the “AI data products” needed for each solution (training set, evaluation set, inference inputs, reference corpora)

  • Develop and maintain an AI data roadmap for the TE Sensors BU, aligned with the data product roadmap

Develop a data strategy to transform the organization from a dashboard-oriented model to an AI-first model

  • Collaborate with our DIA Dashboard organization (Philippines spoke team)

  • Develop a data strategy for our TE Sensors internal databases (e.g., SBI)

Data Ingestion & Curation on AWS + Databricks

  • Build and operate robust ingestion pipelines from enterprise sources into AWS + Databricks

  • Ensure data pipelines are:

  • Incremental (cost-aware)

  • Observed (metrics & logs)

  • Reliable (SLAs for freshness and completeness)

Establish BU-oriented AI Data Governance (Unity Catalog + AWS controls)

  • Leverage Databricks Unity Catalog for table-, column-, and row-level access controls

  • Implement classification and handling standards:

  • PII/PCI/Confidential tagging

  • Retention and deletion rules (e.g., right-to-delete)

  • Audit trails and access logging

  • Define and maintain data contracts with source owners for schema, semantics, quality SLAs, and change processes

Data Quality Engineering (Hard Gates for AI Readiness)

  • Define data quality dimensions and SLAs (AI-specific):

  • Completeness, consistency, timeliness, uniqueness

  • Distribution stability (for drift-sensitive features)

  • Implement automated quality checks:

  • Schema validation (breaking changes)

  • Null/missingness thresholds

  • Referential integrity

  • Distribution checks (mean/variance, quantiles, KL divergence where appropriate)

  • Consider data quality dashboards and alerting for:

  • Pipeline failures and/or data freshness breaches

  • Quality test failures (e.g., block training or deployment when critical checks fail)

Performance & Cost Optimization (AWS + Databricks economics)

  • Optimize data storage and compute:

  • Partitioning strategies and file sizing

  • Delta optimization/compaction strategy

  • Cluster sizing, autoscaling, job scheduling

  • Ensure cost transparency

Production Operations & Support Readiness (Run Phase)

  • Provide operational artifacts and support:

  • Runbooks (pipeline recovery, backfills, reprocessing)

  • On-call / escalation participation for data incidents

  • Root cause analysis for quality issues

  • Ensure observability via SLAs/health checks for critical pipelines

What You'll Need

Required

EDUCATION/KNOWLEDGE

Bachelor's degree in Computer Science, Software Engineering, Data Science, Artificial Intelligence / Machine Learning, Applied Mathematics, or Engineering (with strong CS content)

QUALIFICATIONS & EXPERIENCE

  • Data Engineering & Data Management

  • AI / ML Data Foundations

  • Data Quality Engineering

  • Cloud & Platform Fundamentals

  • Platform-Specific Qualifications (Databricks + AWS)

  • Certifications (optional but highly valuable):

  • Databricks:

  • Databricks Data Engineer Professional

  • Databricks Machine Learning Professional

  • AWS:

  • AWS Certified Data Analytics – Specialty

  • AWS Solutions Architect (Associate/Professional)

5+ years of overall experience

Reasonable accommodation statement

If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to application.accommodations@cai.io or (888) 824-8111.
