Job Information
CAI Data Engineer in REMOTE, India
Data Engineer
Req number: R7299
Employment type: Full time
Worksite flexibility: Remote
Who we are
CAI is a global services firm with over 9,000 associates worldwide and a yearly revenue of $1.3 billion+. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right—whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.
Job Summary
We are seeking a highly skilled Data Engineer to join our dynamic team. As a Data Engineer, you will play a crucial role in data quality and data governance, ensuring the team has the right data, with the right quality and the right controls, so that model outcomes are dependable. You will own the end-to-end AI data lifecycle - from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring - using AWS + Databricks as the production backbone. This is a full-time, remote position.
Job Description
We are seeking a Data Engineer who will own the end-to-end AI data lifecycle - from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring - using AWS + Databricks. This is a full-time, remote position.
What You’ll Do
AI Data Strategy & Ownership (Operating Model)
Translate AI use cases into data requirements: features, labels, context documents, metadata, refresh cadence, retention rules
Define the “AI data products” needed for each solution (training set, evaluation set, inference inputs, reference corpora)
Develop and maintain an AI data roadmap aligned to the data product roadmap, specific to the TE Sensors BU
Develop a data strategy to transform from a dashboard-oriented organization to an AI-first model
Collaborate with our DIA Dashboard organization (Philippine spoke team)
Develop a data strategy for our TE Sensors internal databases (e.g., SBI)
Data Ingestion & Curation on AWS + Databricks
Build and operate robust ingestion pipelines from enterprise sources into AWS + Databricks
Ensure data pipelines are:
Incremental (cost-aware)
Observable (metrics and logs)
Reliable (SLAs for freshness and completeness)
Establish BU-oriented AI Data Governance (Unity Catalog + AWS controls)
Leverage Databricks Unity Catalog for table, column, and row-level controls
Implement classification & handling standards
PII/PCI/Confidential tagging
Retention and deletion rules (e.g., right-to-delete)
Audit trails and access logging
Define and maintain data contracts with source owners for schema, semantics, quality SLAs, and change processes
Data Quality Engineering (Hard Gates for AI Readiness)
Define data quality dimensions and SLAs (AI-specific):
Completeness, consistency, timeliness, uniqueness
Distribution stability (for drift-sensitive features)
Implement automated quality checks:
Schema validation (breaking changes)
Null/missingness thresholds
Referential integrity
Distribution checks (mean/variance, quantiles, KL divergence where appropriate)
Consider data quality dashboards & alerting:
Pipeline failures and/or data freshness breaches
Quality test failures (e.g., block training or deployment when critical checks fail)
Performance & Cost Optimization (AWS + Databricks economics)
Optimize data storage and compute:
Partitioning strategies and file sizing
Delta optimization/compaction strategy
Cluster sizing, autoscaling, job scheduling
Ensure cost transparency
Production Operations & Support Readiness (Run Phase)
Provide operational artifacts and support
Runbooks (pipeline recovery, backfills, reprocessing)
On-call / escalation participation for data incidents
Root cause analysis for quality issues
Ensure observability via SLAs/health checks for critical pipelines
What You'll Need
Required
EDUCATION/KNOWLEDGE
Bachelor's degree in Computer Science, Software Engineering, Data Science, Artificial Intelligence / Machine Learning, Applied Mathematics, or Engineering (with strong CS content)
QUALIFICATIONS & EXPERIENCE
Data Engineering & Data Management
AI / ML Data Foundations
Data Quality Engineering
Cloud & Platform Fundamentals
Platform-Specific Qualifications (Databricks + AWS)
Certifications (Optional but highly valuable)
Databricks
Databricks Data Engineer Professional
Databricks Machine Learning Professional
AWS
AWS Certified Data Analytics – Specialty
AWS Solutions Architect (Associate/Professional)
5+ years of overall experience
Reasonable accommodation statement
If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to application.accommodations@cai.io or (888) 824-8111.