Job Information
Smartsheet Senior Software Engineer II - Applied AI and Evaluations in Bellevue, Washington
You will work closely with our Agent Engineering and AI Platform teams, embedded in a team that has already shipped evaluation infrastructure on Databricks/MLflow and is building toward a mature Agent Development Lifecycle (ADLC).
You Will:
- Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
- Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
- Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
- Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
- Close the feedback loop: ensure that every change has a measurable, attributable quality signal
- Collaborate with our Agent Architecture lead to distinguish quality problems that require prompt/context solutions from those that require structural fixes
- Establish repeatable methodology that scales beyond any single agent or subagent
You Have:
Required
- 8+ years of software engineering experience, with at least 2 years working directly with LLMs in production
- Deep, hands-on experience with prompt engineering and context engineering; you understand how model behavior changes with framing, structure, and input design
- Strong working knowledge of RAG architectures: chunking strategies, embedding models, retrieval evaluation, and failure diagnosis
- Experience building or extending LLM evaluation frameworks; you have designed scorers, worked with golden datasets, and thought carefully about what good looks like
- Fluency in agent system design; you don't need to own the architecture, but you can engage as a peer on architectural tradeoffs that affect quality
- Strong Python skills; comfortable working in data-heavy environments (Databricks, Delta tables, or equivalent)
- Ability to communicate complex quality findings (written and verbal) to both technical and non-technical stakeholders; you can explain what's broken, why it matters, and what needs to happen next without losing the room
- Strong cross-functional judgment; you know when to escalate, when to resolve independently, and how to build credibility across engineering, product, and AI platform teams
- A bias for clarity in ambiguous situations; when failure modes are murky and trade-offs are real, you bring structure and a clear point of view rather than waiting for consensus
- Legally eligible to work in the U.S. on an ongoing basis
- BS or MS in Computer Science, a related field, or equivalent industry experience
Strong Plus
- Experience with MLflow or similar experiment tracking platforms
- Familiarity with CI-integrated evaluation pipelines
- Experience with multi-agent orchestration frameworks
- Prior work in an Applied AI or LLMOps function within a product company
What Success Looks Like:
In your first 6 months, you will have:
- Delivered measurable, validated quality improvement on at least one SmartAssist agent across our defined quality dimensions
- Expanded evaluation coverage to close the most significant blind spots in our current framework
- Established a repeatable quality improvement methodology that the broader team can apply
Why This Role:
- You will have real ownership, not a supporting role on someone else's roadmap
- SmartAssist is shipping to real users; this work has immediate, visible impact
- You will be part of a fast-moving team that deeply values engineering rigor and actively seeks diverse perspectives
Current US Perks & Benefits: