
Staff Software Engineer, Infrastructure (USA Only - 100% Remote)

Job Overview

Location

Remote

Job Type

Full-time

Category

Software Engineering

Date Posted

June 26, 2026

Full Job Description

📋 Description

  • Set the reliability strategy for Wisdom’s platform, defining SLOs, error budgets, and operating standards for systems that handle real dental billing transactions with zero tolerance for downtime.
  • Own end-to-end observability using Datadog: implement tracing, metrics, logging, and alerting that proactively surface issues before users are impacted, so that any engineer can lead an incident without relying on the original code authors.
  • Define operational patterns for LLM-driven agentic workflows (using Anthropic and Mastra), including retries, backpressure, idempotency, graceful degradation, and capacity controls, to prevent batch blowups, stream drops, runaway costs, and model misbehavior in production.
  • Harden integrations with dental insurance carriers and practice management systems (Dentrix, Eaglesoft), which are poorly documented, inconsistent, and prone to failure under load.
  • Own the deploy and release engineering pipeline: implement fast, safe, reversible deploys using infrastructure as code (Terraform), so the team can ship multiple times a day without compromising stability.
  • Build and institutionalize the incident response practice: establish on-call rotations, detailed runbooks, a blameless post-incident culture, and the follow-up discipline to turn outages into permanent, team-owned fixes.
  • Raise the reliability bar across the engineering team through code reviews, architectural guidance, and actionable documentation that is consistently referenced and adopted.
  • Drive resolution of ambiguous, company-level reliability problems without waiting for formal briefs or permission, taking ownership of undefined challenges that affect production stability.
  • Operate and debug distributed services on AWS with first-principles reasoning, ensuring systems remain resilient under pressure and scale with business growth.
  • Implement and maintain infrastructure as code (Terraform), container orchestration (ECS/Kubernetes), and CI/CD pipelines to make deployments predictable, repeatable, and low-risk.
  • Apply deep production experience with at least one major LLM API (OpenAI, Anthropic, or Google Vertex AI), managing operational realities such as rate limits, latency, cost control, and failure modes in live systems.
  • Write and debug application-level code in TypeScript/JavaScript, not just infrastructure scripts, with a full-stack understanding of the systems that power billing workflows.
  • Manage relational database performance (Postgres), with expertise in connection pooling, query optimization, and data integrity under high load.
  • Lead by example: default to ownership, respond to pages proactively, surface bad news early, change position based on evidence, and write postmortems that improve team-wide practices.
  • Mentor engineers and establish technical standards that outlive individual contributions, enabling the entire team to operate at a higher reliability level without constant oversight.
  • Ensure all infrastructure and processes comply with HIPAA requirements for handling protected health information in a healthcare technology environment.

🎯 Requirements

  • 8+ years running production systems, with staff/principal-level ownership of reliability in high-stakes environments
  • Deep AWS experience deploying, operating, and debugging distributed services in production
  • Hands-on production experience operating LLM APIs (Anthropic, OpenAI, or Google Vertex AI), with a focus on rate limits, cost, latency, and failure modes
  • Strong command of TypeScript/JavaScript and relational databases (Postgres)
  • Proven expertise in infrastructure as code (Terraform), containers (ECS/Kubernetes), and CI/CD pipelines
  • Experience building incident response practices, on-call rotations, and blameless postmortem cultures

🏖️ Benefits

  • Fully remote role with no geographic restrictions within the US
  • Reporting directly to the Head of Engineering
  • Opportunity to build reliability practices from scratch at a Series A startup
  • Work with cutting-edge LLM-driven agentic systems in a regulated healthcare environment
  • Join a high-trust, small engineering team shaping the future of dental billing technology
  • Equal opportunity employer with inclusive policies covering all protected statuses

Skills & Technologies

Python
JavaScript
TypeScript
Go
React
DevOps
Senior
Remote

Wisdom Health Inc.

About Wisdom Health Inc.

Wisdom Health offers at-home DNA testing for dogs and cats, enabling pet owners and veterinarians to identify breed ancestry, genetic health risks, and traits. The company processes samples via cheek swabs and delivers online reports with actionable insights for personalized care. Its products include Wisdom Panel and Optimal Selection, supported by a CLIA-certified laboratory, an extensive breed database, and ongoing research collaborations with academic and veterinary institutions.
