This job has expired

This position was posted on October 15, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Lead Observability Engineer (Remote, North America)

Job Overview

Location

Los Angeles, California, USA

Job Type

Full-time

Category

Data Science

Date Posted

October 15, 2025

Full Job Description

📋 Description

  • Own the end-to-end observability strategy for Ava, Vivun’s AI Sales Teammate, defining the standards, tools, and patterns that ensure reliable visibility across infrastructure and agentic components. You will architect a unified observability layer that spans traditional SaaS services, LLM-driven agents, and the orchestration glue that binds them.
  • Design and implement correlation models that link agent behavior, LLM interactions, and SaaS telemetry into cohesive, actionable insights. Your dashboards will tell a story: how a spike in token latency cascades into slower call wrap-ups, or how an OpenAI rate-limit warning correlates with a drop in rep productivity.
  • Unify observability tooling across teams, ensuring metrics, logs, and traces flow into a central platform (e.g., Observe, Datadog, or equivalent). You will evaluate vendors, negotiate contracts, and migrate legacy data so that every engineer, from backend to prompt-tuning, has a single pane of glass.
  • Collaborate with engineering and QA to embed observability best practices into development workflows, CI/CD, and quality gates. Pull-request templates will include “observability checklist” items; feature flags will auto-inject trace IDs; load tests will fail if p95 latency regresses beyond SLO thresholds.
  • Establish enablement frameworks (documentation, dashboards, and templates) that make observability self-serve for all engineering teams. You will run internal workshops, record Loom demos, and maintain a living runbook library so that on-call engineers can diagnose issues without paging you.
  • Partner with teammates to ensure observability aligns with infrastructure reliability, alerting, and incident response patterns. You will co-define error budgets, participate in blameless post-mortems, and refine PagerDuty escalation policies so that alerts are actionable and fatigue is minimal.
  • Contribute to performance and reliability strategy, helping define how we measure agent quality, responsiveness, and system scalability. You will instrument token-usage cost tracking, model-drift detection, and user-perceived latency SLIs that directly map to revenue impact.
  • Champion a culture of curiosity and ownership. When the CEO asks “Why did Ava hallucinate on yesterday’s demo call?”, you will already have the trace that shows the prompt injection, the vector-db recall score, and the downstream service timeout, all in one view.
  • Iterate rapidly in a fully remote, high-growth startup. You will ship weekly, pivot quarterly, and scale yearly, turning observability from a cost center into a product differentiator that lets Vivun sell “AI you can trust.”
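
For candidates wondering what a quality gate such as "load tests will fail if p95 latency regresses beyond SLO thresholds" might look like in practice, here is a minimal sketch. It is an illustration only, not Vivun's actual tooling: the function names `p95` and `check_latency_slo`, the nearest-rank percentile method, and the sample numbers are all assumptions made for the example.

```python
def p95(samples):
    """Return the nearest-rank 95th percentile of a non-empty list."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Nearest-rank method: rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_latency_slo(samples_ms, slo_ms):
    """Hypothetical gate: pass only if observed p95 is within the SLO.

    Returns (passed, observed_p95_ms).
    """
    observed = p95(samples_ms)
    return observed <= slo_ms, observed


if __name__ == "__main__":
    # Made-up load-test latencies in milliseconds: 18 fast, 2 slow.
    latencies = [10] * 18 + [500, 500]
    passed, observed = check_latency_slo(latencies, slo_ms=200)
    print(f"p95={observed}ms passed={passed}")
```

In a CI pipeline, a check like this would typically exit non-zero when `passed` is false so that the load-test stage fails the build; how the samples are collected and where the threshold lives are left open here.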

🎯 Requirements

  • 6+ years in SRE, DevOps, or Observability Engineering roles, with at least 2 years leading or designing observability initiatives
  • Deep knowledge of observability tooling (OpenTelemetry, Prometheus, Grafana, Datadog, Honeycomb, Observe, etc.) and distributed tracing practices
  • Experience with agentic/LLM-based systems (LangChain, Celery, OpenAI APIs, or similar orchestration frameworks)
  • Proven ability to define cross-team standards, influence engineering culture, and establish scalable monitoring patterns
  • Strong collaboration and communication skills: you enable, not dictate

🏖️ Benefits

  • Competitive salary and full health benefits
  • Stock options at a well-funded, pre-IPO company on a fast-growth track
  • Flexible work schedules and work from anywhere in North America
  • Unlimited PTO with two weeks designated as “quiet period” each year

Skills & Technologies

Python
JavaScript
Node.js
Prometheus
Grafana
Senior
Remote

About Vivun Inc.

Vivun provides a SaaS platform that aligns pre-sales teams with product, sales, and customer success. The software captures technical field insights, manages product gap requests, and offers analytics to prioritize roadmap decisions. Features include demo environments, opportunity scoring, and AI-driven recommendations. Founded in 2020, the company serves B2B technology vendors seeking to shorten sales cycles and improve product-market fit.
