Job Information
ICONMA, LLC Senior Infrastructure Engineer in United States
Our client, an EV manufacturing company, is looking for a Senior Infrastructure Engineer for its Palo Alto, CA location.
Responsibilities:
Manage AWS environment using Control Tower, EKS, EC2, S3 and related services.
Triage and resolve ServiceNow tickets (OS-level and cloud-level troubleshooting, vulnerability remediation) and meet SLA requirements.
Commission / decommission cloud resources and manage lifecycle activities.
Plan and execute DR activities with application teams; respond to RCAs and implement corrective actions.
Perform backups, patching, and maintenance of instances and cloud resources.
Carry out cloud migration and site externalization tasks.
Perform cleanup, cost-optimization analysis, and implement cost-saving measures.
Build small automations and PoCs to improve operational efficiency.
Create and implement change controls; maintain thorough documentation and standards.
Collaborate with Windows and Linux platform engineers for OS-level troubleshooting.
Design, implement, and manage network infrastructure (LAN, WAN, WLAN, VPN, Firewalls, Routers, Switches).
Architect, deploy, and manage cloud infrastructure across multiple providers (AWS, Azure), including IaaS, PaaS, and SaaS offerings.
Integrate and maintain Windows-based systems as part of hybrid environments.
Design and implement disaster recovery and business continuity plans for cloud and hybrid systems.
Configure and secure network and cloud environments, including firewalls, routers, switches, and VPNs.
Monitor infrastructure performance, address issues, and optimize for efficiency.
Collaborate on and enhance existing network and system monitoring tools.
Scale infrastructure to meet growing research and product demands while maintaining reliability and performance.
Implement and maintain serverless architectures and container orchestration systems.
Collaborate with research teams to understand requirements and translate them into robust infrastructure solutions.
Develop monitoring, alerting, and observability systems to ensure operational excellence.
Participate in on-call rotations and incident response to maintain system reliability.
Contribute to infrastructure automation and tooling that improves developer productivity.
Partner with security teams to ensure production infrastructure maintains appropriate SLAs and SLOs.
Design, deploy, and manage cloud infrastructure on AWS (EC2, VPC, IAM, S3, RDS, Lambda, ECS/EKS).
Build and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation.
Implement and optimize CI/CD pipelines and automated deployment workflows.
Manage containerized workloads using Docker and Kubernetes.
Ensure cloud environment security, compliance, and cost optimization.
Build monitoring and observability dashboards (CloudWatch, Grafana, Prometheus).
Troubleshoot cloud performance issues and support production environments.
Collaborate with development, security, and operations teams for smooth delivery.
Manage the day-to-day support, policy, and engineering of the Endpoint Internet Access Control tools, including the Zscaler cloud proxy, for production and test environments.
This includes responding to and resolving incident, request, change control, and problem tickets within SLA; Zscaler support tickets; block and unblock policies; testing and deployment of policy rulesets; Proxy Access Control file management; Zscaler cloud platform cleanup and maintenance; and Splunk logging, querying, and dashboarding.
Work one-on-one with teammates, teams, the help desk, problem management, and others to resolve issues or implement new policy or cloud platform scenarios. Provide team assistance and brainstorming.
Certificate management, IdP and access control, traffic flow, network and firewall control integration for the proxies, cloud backup and disaster recovery and on-prem device management.
Zscaler Agent deployment, support and management, with Group Policy Controls, authentication and SCIM/SAML/SSO.
API support.
Sanctioned SaaS and other application integration.
F5 GSLB support.
Create and revise process and procedural documentation, including PPM storage, help desk procedures, and runbooks.
Daily integrations with other teams including Cyber Incident Response, Antivirus, NAC, Firewall and Network.
Requirements:
Experience: 0 - 10 years
AWS Control Tower (operational governance)
EKS (Kubernetes on AWS) administration and troubleshooting
EC2 instance lifecycle, patching, performance troubleshooting
Terraform infrastructure as code for provisioning and change management
Python scripting/automation for operational tasks
S3 lifecycle, permissions, and security best practices
Strong Windows and Linux troubleshooting skills (OS-level diagnostics)
Experience working with ServiceNow (ticketing, SLAs, change management)
Familiarity with vulnerability remediation workflows and security patching
Experience with cost-optimization tools and AWS billing insights
Familiarity with additional AWS services (RDS, CloudWatch, IAM, GuardDuty)
Prior experience running DR exercises and post-mortem RCA work
Experience building operational runbooks and playbooks
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
7+ years of experience in network and cloud engineering with a focus on Windows integration.
Strong understanding of networking protocols (TCP/IP, BGP, OSPF).
Proficiency in configuring and administering Windows-based systems in hybrid environments, including Hyper-V, Clustering, and Active Directory Services.
Hands-on experience with major cloud platforms (AWS, Azure).
Expertise in network security principles and practices.
Skilled in using PowerShell scripting for automation.
Strong troubleshooting abilities in complex, hybrid network and system setups.
Excellent communication, collaboration, and time management skills.
Experience designing or implementing end-to-end automation pipelines and internal operational tools
Prior experience in security-conscious or compliance-heavy environments (financial services, healthcare, SaaS, etc.)
Expertise in creating comprehensive monitoring solutions, custom dashboards, and automated reporting mechanisms
Track record of success in fast-paced, high-growth environments with constantly evolving operational needs
Strong documentation habits and demonstrated commitment to continuous improvement and knowledge management
Experience operating production systems on AWS using Kubernetes, Terraform, and observability tooling (e.g., Datadog, Prometheus, SumoLogic). Strong background in Postgres or other relational databases. Bonus for Python or Go scripting.
Familiarity with compliance frameworks such as FedRAMP, PCI, and SOC2.
Excellent communication, interpersonal, and problem-solving skills.
Ability to work effectively in a fast-paced, dynamic environment.
Strong proficiency in Python for automation, data handling, and tool development
Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, CloudWatch, or Splunk
Demonstrated expertise in ITSM practices, including incident, problem, and process improvement
Ability to implement secure and compliant offboarding procedures and manage access-related tasks
Bachelor’s degree in Computer Science, Information Security, or related field (or equivalent experience)
3+ years of experience in cloud security, with a focus on AWS
Strong understanding of AWS services (EC2, S3, VPC, Lambda, RDS, etc.)
Proficiency in scripting languages (Python, Bash, etc.) and infrastructure-as-code tools (Terraform, CloudFormation)
Experience with security tools such as AWS WAF, KMS, Inspector, and Macie
Familiarity with SIEM tools and log analysis
Knowledge of network security, encryption, and identity management
Why Should You Apply?
Health Benefits
Referral Program
Excellent growth and advancement opportunities
As an equal opportunity employer, ICONMA provides an employment environment that supports and encourages the abilities of all persons without regard to race, color, religion, gender, sexual orientation, gender identity or expression, ethnicity, national origin, age, disability status, political affiliation, genetics, marital status, protected veteran status, or any other characteristic protected by federal, state, or local laws.