Job Information
Citigroup Network SRE Full Stack Observability Engineer in Heredia, Costa Rica
Network SRE Full Stack Observability Engineer
Job Summary
We are seeking a highly skilled and motivated Network SRE Full Stack Observability Engineer to join our team. In this role, you will focus on building and maintaining observability solutions for our network infrastructure, ensuring reliability, scalability, and performance. You will work closely with network engineers and the SRE team to design and implement monitoring, alerting, and visualization tools that provide actionable insights into our network systems.
This is a unique opportunity to combine your expertise in full-stack development, network engineering, and site reliability engineering to enhance the observability and reliability of our critical infrastructure.
Key Responsibilities
Design and Implement Observability Solutions: Develop and maintain monitoring, logging, and tracing systems for network infrastructure using tools like Prometheus, Grafana, Splunk, ELK Stack, or similar platforms.
Full-Stack Development: Build custom dashboards, APIs, and tools to provide real-time insights into network performance and reliability.
Network Monitoring: Collaborate with network engineers to implement telemetry solutions for routers, switches, firewalls, and cloud networking components.
Incident Management: Create automated alerting systems to detect and respond to network anomalies, ensuring minimal downtime and fast recovery.
Automation and Scripting: Develop scripts and automation workflows using Python, Go, or similar languages to streamline observability and troubleshooting processes.
Data Analysis: Analyze network telemetry data to identify trends, bottlenecks, and areas for optimization.
Collaboration: Work closely with SRE, DevOps, and Network Engineering teams to ensure observability solutions are aligned with organizational goals.
Reliability Engineering: Apply SRE principles to improve the reliability and scalability of network systems, including implementing SLIs, SLOs, and error budgets.
Documentation: Create and maintain detailed documentation for observability tools, workflows, and best practices.
Required Qualifications
Education: Bachelor’s degree in Computer Science, Information Technology, Network Engineering, or a related field (or equivalent experience).
Experience:
5+ years of experience in network engineering, site reliability engineering, or full-stack development.
Strong background in network observability and monitoring tools.
Technical Skills:
Proficiency in programming languages such as Python, Go, or JavaScript.
Experience with observability tools like Prometheus, Grafana, Splunk, ELK Stack, or Datadog.
Strong understanding of network protocols (TCP/IP, BGP, OSPF, DNS, etc.).
Knowledge of Infrastructure as Code (IaC) tools like Terraform or Ansible.
Understanding of fundamental concepts in cloud networking (AWS, Azure, GCP) and hybrid environments.
Basic familiarity with container orchestration tools (e.g., Kubernetes) and service meshes.
Soft Skills:
Strong problem-solving and analytical skills.
Excellent communication and collaboration abilities.
Ability to work in a fast-paced, dynamic environment.
Preferred Qualifications
Strong proficiency in programming languages such as Python, Go, or JavaScript, with the ability to develop scripts and tools for automation and observability.
Extensive experience with observability tools like Prometheus, Grafana, Splunk, ELK Stack, or Datadog, including setting up monitoring, alerting, and visualization workflows.
Experience with AI/ML-based network monitoring tools.
Certifications such as CCNA, CCNP, AWS Advanced Networking, or Kubernetes certifications (CKA/CKAD).
Familiarity with container orchestration tools (e.g., Kubernetes) and service meshes.
Familiarity with chaos engineering practices to test network resilience.
This role offers the opportunity to work in a dynamic, global environment, driving innovation and operational excellence in critical network domains. If you are passionate about network reliability, observability, and full-stack development, we encourage you to apply!
Job Family Group:
Technology
Job Family:
Systems & Engineering
Time Type:
Full time
Most Relevant Skills
Please see the requirements listed above.
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm).
View Citi’s EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy) and the Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf) poster.
Citi is an equal opportunity and affirmative action employer.
Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.