This job has expired
This position was posted on October 12, 2025 and is likely no longer accepting applications. We've kept it here for historical reference.

Job Overview
Location: Remote
Job Type: Full-time
Category: DevOps
Date Posted: October 12, 2025
Full Job Description
📋 Description
- Own the reliability, scalability, and performance of PDL’s multi-tenant data platform, which ingests and serves billions of people and company records daily to thousands of customers worldwide.
- Architect and continuously refine our AWS-centric infrastructure (spanning EC2, EKS, RDS, S3, Lambda, and EMR) while also extending our on-prem data-center footprint to balance cost, latency, and compliance requirements.
- Design, implement, and harden a fully automated CI/CD pipeline (GitHub Actions, ArgoCD, Terraform, Helm) that enables every engineering squad to ship code to production multiple times per day with zero-downtime deployments and instant rollback capabilities.
- Establish and monitor SLIs/SLOs that directly tie system health to customer experience; build error budgets, chaos-engineering experiments, and blameless post-mortems that turn outages into durable learning loops.
- Optimize petabyte-scale Spark and Flink pipelines for throughput, memory efficiency, and cost; tune JVM flags, shuffle partitions, and checkpoint strategies to cut runtime by double-digit percentages without sacrificing data fidelity.
- Instrument end-to-end observability using Prometheus, Grafana, Loki, and distributed tracing (OpenTelemetry) so anomalies are detected and triaged before customers feel them.
- Partner with Security & Compliance to embed guardrails into every layer (IAM policies, network segmentation, secrets management, and SOC 2 evidence collection), ensuring we stay ahead of evolving privacy regulations.
- Champion Infrastructure-as-Code best practices; every pull request to our Terraform or CloudFormation repos triggers automated policy checks (OPA, Checkov) and cost estimates so engineers see the blast radius of their changes in real time.
- Act as the on-call commander for critical incidents, orchestrating cross-functional war rooms, writing concise status updates, and driving long-term fixes that eliminate entire classes of failure.
- Mentor junior SREs and backend engineers through pair programming, design reviews, and weekly "lunch-and-learn" sessions that raise the technical bar across the organization.
- Experiment fearlessly with emerging technologies (Graviton instances, serverless Spark on Kubernetes, or data-tier caching with Redis 7) to shave milliseconds off API latency and dollars off the AWS bill.
- Contribute to open-source projects that we rely on, giving back to the community while showcasing PDL’s engineering culture at conferences and meetups.
- Translate complex technical trade-offs into clear narratives for product managers and executives, ensuring infrastructure investments are aligned with revenue-generating features.
- Build self-service tooling that empowers data scientists and ML engineers to spin up isolated, reproducible environments for model training without ever opening a support ticket.
🎯 Requirements
- 5+ years of production-grade experience designing and operating AWS infrastructure at scale (multi-region, multi-account) with deep expertise in EKS, RDS, S3, IAM, and CloudWatch.
- Expert-level proficiency in Infrastructure-as-Code (Terraform or CloudFormation) and container orchestration (Kubernetes, Helm, ArgoCD) in a data-heavy environment.
- Demonstrated success defining and meeting strict SLIs/SLOs for latency, availability, and durability in a SaaS or DaaS product serving external customers.
- Strong coding skills in Python or Go for automation, plus hands-on experience with CI/CD tooling such as GitHub Actions, Jenkins, or GitLab CI.
- Nice-to-have: hands-on tuning of Spark/Flink clusters, familiarity with SOC 2 or GDPR compliance workflows, and contributions to open-source observability projects.
🏖️ Benefits
- Fully remote-first culture with quarterly off-sites in exciting global locations and a generous home-office stipend.
- Competitive salary plus equity in a fast-growing, profitable data company backed by Tier-1 investors.
- Flexible PTO policy, 12 company holidays, and a mandatory "recharge week" each year when the entire company powers down.
- Annual learning & development budget ($3,000) plus paid attendance at conferences, certifications, and online courses.
About People Data Labs, Inc.
People Data Labs provides an API platform that aggregates publicly available information on over three billion individuals. Developers and enterprises use it to enrich profiles with contact, work history, education, and demographic data for sales, marketing, fraud prevention, and analytics use cases, with compliance and privacy safeguards in place.