Job Information
UnitedHealth Group Principal Data Scientist in Gurugram, India
Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
The Principal Data Scientist ML/DL is a primary driver in the design and development of state-of-the-art Artificial Intelligence solutions for medical applications. The Principal Data Scientist ML/DL works closely with senior data scientists, machine learning engineers, software engineers and subject matter experts on current company technologies and forward-looking projects. They are the drivers of new research and solution implementation in the creation of novel artificial intelligence approaches.
Primary responsibilities include the enhancement of existing company NLP technologies and the extension of those systems into new cloud-based applications. Emphasis is on the development of novel machine/deep learning techniques for information extraction and synthesis. The role translates research code, spanning statistical methods, deep learning, and large language model technologies, into clinical NLP solutions deployed at scale in production environments. Work will involve all aspects of methods development, from initial PoC implementation to performance characterization and production launch of new methods.
The successful candidate will have a solid history of publication in Machine/Deep Learning with an emphasis on Natural Language Processing, Information Retrieval and/or Information Extraction. Exposure to recent research literature and the ability to effectively implement new technologies is key. The successful candidate will have proven success in taking machine/deep learning solutions to production environments. Solid technical skills are required.
Primary Responsibilities:
Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, LLaMA, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
Architect and implement GraphRAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
Design, train, and optimize semantic and dense vector embeddings for document understanding, search, and retrieval.
Develop semantic retrieval systems with advanced document segmentation and indexing strategies.
Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training.
Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
Collaborate with cross-functional teams to translate business needs into AI-driven solutions and deploy them in production environments.
Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.
Required Qualifications:
Deep knowledge and extensive experience with Machine/Deep Learning frameworks including transformer architectures, state space models, large language models, and agentic approaches
Knowledge of algorithms and techniques within a computational domain with emphasis on text processing
Demonstrated publication record in the AI domain, especially relating to text extraction and summarization
Experience with hybrid NLP solutions that combine symbolic and machine learning approaches
Preferred Qualifications:
PhD or master's degree in Computer Science, Machine Learning, or a related field
12+ years of experience in applied AI/ML with statistics, with a strong track record of delivering production-grade models
Deep expertise in NLP, fundamental machine learning, deep learning, and transformer and state space-based architectures
Azure ML and/or AWS
Exploratory Data Analysis (EDA)
Experience with PyTorch
Experience with LLM training and fine-tuning (e.g., GPT, LLaMA, Mistral, Qwen)
Experience with graph-based retrieval systems (GraphRAG, knowledge graphs)
Experience with embedding models (e.g., BGE, E5, SimCSE)
Experience with semantic search and vector databases (e.g., FAISS, Weaviate, Milvus)
Experience with document segmentation and preprocessing (OCR, layout parsing)
Experience with distributed training frameworks (NCCL, Horovod, DeepSpeed)
Experience with high-performance networking (InfiniBand, RDMA)
Experience with model fusion and ensemble techniques (stacking, boosting, gating)
Experience with optimization algorithms (Bayesian, Particle Swarm, Genetic Algorithms)
Experience with Symbolic AI and rule-based systems
Experience with meta-learning and Mixture of Experts architectures
Experience with reinforcement learning (e.g., RLHF, PPO, DPO, GRPO), supervised fine-tuning (SFT), LoRA, QLoRA, and Axolotl
Experience with prompt optimization frameworks (e.g., AutoPrompt, GreaterPrompt, DSPy, GEPA)
Proven proficiency in Python coding, SQL and database queries, data preparation, and analysis
Bonus Skills:
Experience with healthcare data and medical coding systems (e.g., CPT, CM, PCS)
Familiarity with regulatory and compliance frameworks in AI deployment
Contributions to open-source AI projects or published research, and/or the ability to take research papers from PoC to production
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.