Job Information
Amazon Systems Administrator, Global Operations Support Engineering in Herndon, Virginia
Description
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we're looking for talented people who want to help.
You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
The AWS Global Operations Support Engineering (GOSE) team is seeking a Systems Engineer to build and maintain business automation infrastructure and support the development of AI-driven operational intelligence platforms. This role will implement production systems that transform manual operational processes into automated, intelligent workflows that improve efficiency and reliability across AWS's global data center portfolio.
As a Systems Engineer, you will build AWS infrastructure for automation tools, implement integrations with internal systems, and support the deployment of AI-driven operational capabilities. You will work hands-on with Lambda functions, AgentCore, Bedrock, API integrations, and infrastructure-as-code to create scalable solutions that enable thousands of data center engineers to work more efficiently.
Key job responsibilities
Build and maintain AWS infrastructure for business automation solutions, including AgentCore deployments, API integrations, MCP server deployments, IAM roles, CloudWatch monitoring, and dedicated AWS accounts with appropriate security controls
Implement infrastructure-as-code using CDK/CloudFormation to enable repeatable, version-controlled deployments and establish CI/CD pipelines for automation script testing and production deployment
Develop usage logging, database tracking, authentication systems, and API integrations with internal systems to automate ticket creation, data retrieval, and workflow orchestration
Support the productionization of AI proofs of concept by implementing infrastructure components, deployment pipelines, operational monitoring, and integration layers between AI agents and internal systems
Create monitoring and alerting solutions to ensure high availability of automation infrastructure, and troubleshoot system issues across AWS services and integration points
Collaborate with other Systems Engineers, Business Intelligence Engineers, TPMs, and Data Engineers to implement technical solutions while participating in code reviews and design reviews
Implement automated testing for infrastructure changes, establish logging and observability for automation systems, and contribute to team documentation and best practice guides
Continuously learn new AWS services, automation techniques, and AI/ML capabilities to improve technical skills and identify improvements to system reliability and performance
About the team
The Global Operations Support Engineering (GOSE) team is focused on maximizing AWS data center infrastructure availability and operational excellence. We achieve this by optimizing labor utilization, performing deep-dive event and incident analysis, developing data engineering and business intelligence solutions, deploying business automation, and managing global operational improvement initiatives.
We transform critical infrastructure data into actionable intelligence that enables the Data Center Community (DCC) organization to prevent customer impact, reduce operational burden, focus on highest-impact activities, and continuously improve fleet-wide reliability and productivity. Through our comprehensive monitoring, analysis, reporting, and program/project management, we serve as the analytical backbone that drives continuous improvement in operational excellence across the global data center portfolio.
The team operates at the intersection of infrastructure operations, data engineering, and artificial intelligence—building systems that fundamentally change how AWS manages its global infrastructure at scale.
Basic Qualifications
- 2+ years of site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration experience
Preferred Qualifications
2+ years of experience building scripts, tooling, and automation for large-scale computing environments
Knowledge of configuration management systems, such as Puppet, Chef, Ansible, or related systems
Experience in network capture and systems troubleshooting
Experience working in a 24/7 production environment
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance with the option of Supplemental Life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, VA, Herndon - 82,400.00 - 144,100.00 USD annually