
Job Overview
Location: Remote - US
Job Type: Full-time
Category: Software Engineer
Date Posted: May 17, 2026
Full Job Description
Description
- Design and implement automated functional health checks for disaster recovery (DR) and production environments using synthetic transactions and API validation to ensure system availability and correctness.
- Build continuous validation pipelines that verify end-to-end business workflows including authentication, transaction processing, and third-party system integrations under real-world conditions.
- Develop intelligent alerting mechanisms that prioritize functional failures and customer-impacting behavior over generic infrastructure metrics to reduce noise and improve incident response accuracy.
- Integrate observability signals (logs, metrics, and distributed traces) with automated test frameworks to enhance system visibility and accelerate root cause analysis during outages.
- Create AI/ML-driven systems to detect recurring failure patterns, correlate anomalies across microservices, and identify probable root causes of system degradation or downtime.
- Engineer automated remediation systems capable of recommending or initiating self-healing actions to reduce mean time to recovery and improve system resilience.
- Partner cross-functionally with QA, SRE, and Engineering teams to align on service reliability goals, refine incident response playbooks, and enhance disaster recovery readiness.
- Define, measure, and report on functional SLAs, service health indicators, and quality metrics that reflect customer-facing system performance rather than internal technical benchmarks.
- Contribute to regular disaster recovery drills and automated validation exercises to test and improve the robustness of Socure’s production systems under failure conditions.
- Champion the shift from traditional QA practices toward autonomous, data-driven quality systems that proactively validate system integrity without manual intervention.
- Maintain and evolve automated test frameworks using tools such as Playwright, Jest, and SuperTest to ensure consistent, scalable, and maintainable validation across complex distributed architectures.
- Translate production incident data into actionable test cases and validation criteria that prevent recurrence and improve long-term system reliability.
- Evaluate and implement cloud-based monitoring platforms including Datadog, New Relic, CloudWatch, and Splunk to enrich test outcomes with real-time operational insights.
- Apply systems thinking to understand failure modes in microservices environments and design validation strategies that account for network latency, service dependencies, and partial outages.
- Contribute to CI/CD pipelines by embedding automated validation gates that prevent unstable code from advancing to production environments.
- Document validation architecture, test coverage, and reliability metrics to enable transparency and knowledge sharing across engineering teams.
About Socure Inc.
Socure Inc. provides digital identity verification and fraud prevention software for financial services, fintech, e-commerce and government clients. The platform applies machine learning and graph-based analytics to link and validate identity elements in real time, detecting synthetic identities, account takeover and document fraud. It integrates via APIs and SDKs for onboarding, KYC/AML compliance and transaction monitoring, aiming to reduce false positives and manual reviews while improving approval rates for legitimate users worldwide.