
Job Overview
Location: Remote
Job Type: Full-time
Category: Software Engineering
Date Posted: May 16, 2026
Full Job Description
📋 Description
- Serve as a Senior Site Reliability Engineer for one of the world’s largest genealogy and family history platforms, ensuring the reliability, availability, and performance of infrastructure supporting billions of historical records and a global user base.
- Take full ownership of Dynatrace-based observability systems, designing and implementing automated configurations to monitor and maintain system health at massive scale.
- Develop and maintain TypeScript-based tooling to automate routine operational tasks, improve incident response workflows, and enhance platform reliability across distributed systems.
- Consume and integrate Dynatrace REST APIs to extract metrics, trigger alerts, and automate remediation processes in alignment with SRE principles.
- Collaborate with a mature engineering team to drive automation initiatives that reduce toil, increase system resilience, and support continuous delivery under high-load conditions.
- Participate in on-call rotations and incident response protocols, applying deep understanding of SRE practices to minimize downtime and improve mean time to resolution (MTTR).
- Design and implement observability strategies that provide actionable insights across microservices, databases, and distributed architectures serving millions of concurrent users.
- Build and maintain automated configuration management systems using Dynatrace to enforce consistent monitoring standards across environments.
- Work closely with development teams to embed reliability practices into the software development lifecycle, including service level objectives (SLOs), error budgets, and alerting thresholds.
- Maintain documentation of automation scripts, observability dashboards, and incident runbooks to ensure knowledge sharing and operational continuity.
- Support infrastructure scalability initiatives by identifying performance bottlenecks and recommending automation-driven solutions to handle increasing data ingestion and user traffic.
- Act as a technical advocate for reliability best practices across engineering teams, promoting proactive monitoring, automated testing, and preventive maintenance.
- Translate complex system behaviors into clear metrics and reports for cross-functional stakeholders, enabling data-driven decisions on system investments and risk mitigation.
- Contribute to the evolution of the platform’s SRE roadmap by evaluating new tools, automating manual processes, and proposing improvements to incident management frameworks.
- Remain agile in a fast-paced engineering environment with high ownership expectations, where system failures have direct impact on user trust and historical data integrity.
- Engage in continuous learning and knowledge transfer to strengthen team capabilities in observability, automation, and large-scale system reliability.
🎯 Requirements
- Solid experience automating Dynatrace or Datadog configuration at scale
- Strong hands-on experience consuming REST APIs, particularly Dynatrace APIs
- Proficiency in TypeScript for tooling and automation development
- Strong understanding of SRE principles: reliability, observability, incident response, and automation
- Experience working in fast-paced engineering environments with high ownership expectations
- Experience with AWS Lambda for serverless automation workflows
🏖️ Benefits
- 100% remote work
- Payment in USD
- Paid Time Off (PTO)
- Work-from-home and training reimbursement
- English lessons
- Technical training
About SouthGeek S.A.
SouthGeek is an Argentine software development company specializing in scalable web and mobile applications for startups and enterprises. Founded in 2014, the firm offers full-stack engineering, cloud architecture, UX/UI design, and dedicated agile teams. It focuses on fintech, healthcare, and logistics projects across Latin America and the United States, emphasizing clean code, automated testing, and continuous delivery. The company operates remotely from Córdoba, Buenos Aires, and Montevideo, integrating regional talent with global clients to accelerate digital transformation and reduce time-to-market for complex products.



