
Job Overview
Location
Remote
Job Type
Full-time
Category
Data Engineer
Date Posted
December 31, 2025
Full Job Description
📋 Description
• Own the end-to-end architecture of Rockerbox’s next-generation data platform, designing pipelines and storage layers that feed real-time AI agents, predictive models, and automated decision engines used by hundreds of brands to optimize multi-million-dollar marketing budgets.
• Research bleeding-edge techniques for feature engineering, retrieval-augmented generation (RAG), and fine-tuning of large language models so that marketing insights surface faster, cost less, and are more accurate than anything on the market today.
• Translate abstract research papers into production-grade code: implement vector databases, embedding services, prompt-caching layers, and low-latency inference endpoints that scale to petabytes of click-stream, impression, and conversion data (a minimal retrieval sketch follows this list).
• Partner with product managers, data scientists, and MLOps engineers to define the roadmap for AI-driven experimentation—prioritizing use cases such as budget reallocation, creative fatigue detection, and audience expansion—then ship MVPs within weeks, not quarters.
• Build self-healing data workflows that detect schema drift, model drift, and data-quality anomalies automatically; your systems will page the right owner, roll back bad commits, and keep downstream AI agents humming 24/7 (see the drift-check sketch below the list).
• Champion a culture of reproducibility: containerize training jobs with deterministic seeds (see the seeding sketch below the list), version every dataset with DVC or LakeFS, and publish internal “cookbooks” so any engineer can retrain or audit your models in minutes.
• Instrument everything—from GPU utilization to token-latency histograms (see the instrumentation sketch below the list)—then surface actionable dashboards that help leadership decide when to scale clusters, switch foundation models, or negotiate cloud-commit discounts.
• Mentor senior and junior engineers through architecture reviews, pair programming, and guild talks; your feedback will shape coding standards, testing strategies, and the hiring bar for the next wave of AI data talent.
• Collaborate with privacy and compliance teams to embed differential privacy, federated learning, and PII-masking techniques directly into the data layer, ensuring that every AI insight we deliver is both powerful and compliant.
• Contribute to open-source communities (e.g., Ray, LangChain, DuckDB) and represent Rockerbox at conferences, turning our internal innovations into public artifacts that attract top-tier candidates and strategic partners.
• Run weekly “failure parties” where the team dissects outages, near-misses, and experiment flops; you’ll turn these stories into resilient design patterns that make the platform antifragile over time.
• Design cost-aware architectures: leverage spot instances, tiered storage, and quantization tricks (see the quantization sketch below the list) to cut inference spend by 30% while improving model accuracy, directly impacting EBITDA and customer NPS.
• Experiment with emerging modalities—image, video, audio ad creatives—and build multimodal embeddings that let marketers ask questions like “Which TikTok clip will resonate with Gen-Z gamers next week?” and get answers grounded in live performance data.
• Create synthetic data generators that simulate counterfactual campaign outcomes, enabling safe A/B tests on synthetic populations before brands risk real dollars (a simulation sketch follows this list).
• Establish SLAs for data freshness (sub-minute), model refresh cadence (hourly), and insight delivery (interactive queries under 2 s); your dashboards will become the single source of truth for CMOs deciding whether to double down or pull spend.
• Influence the long-term technical vision: whiteboard how edge inference, on-device learning, and decentralized data markets could redefine marketing analytics in the next five years, then prototype the riskiest ideas in quarterly “moonshot sprints.”
• Foster an inclusive, remote-first culture where time-zone differences become 24-hour development cycles; you’ll rotate on-call duties fairly and document tribal knowledge so no brilliant idea is lost to Slack scrollback.
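The retrieval bullet above mentions vector databases, embedding services, and RAG-style lookups. Below is a minimal sketch of that retrieval pattern, with an invented `embed` helper and an in-memory index standing in for a real embedding service and a managed vector database (Pinecone, Weaviate, Milvus); none of this reflects Rockerbox's actual stack.

```python
# Toy retrieval sketch: hash-based pseudo-embeddings plus cosine-similarity search.
# A production system would call an embedding model and a managed vector database.
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in embedding: deterministic pseudo-random vector keyed on the text."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class InMemoryVectorIndex:
    """Toy replacement for a vector database: upsert documents, query by cosine similarity."""
    def __init__(self):
        self._ids, self._vecs = [], []

    def upsert(self, doc_id: str, text: str) -> None:
        self._ids.append(doc_id)
        self._vecs.append(embed(text))

    def query(self, text: str, top_k: int = 3) -> list[tuple[str, float]]:
        q = embed(text)
        scores = np.array(self._vecs) @ q          # vectors are unit-norm, so dot = cosine
        best = np.argsort(scores)[::-1][:top_k]
        return [(self._ids[i], float(scores[i])) for i in best]

index = InMemoryVectorIndex()
index.upsert("campaign-42", "TikTok creative, Gen-Z gamers, high CTR last week")
index.upsert("campaign-7", "CTV spot, household reach, low conversion rate")
print(index.query("which short-form video resonates with younger gamers?"))
```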
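For the self-healing workflow bullet, one possible shape of a schema-drift check is shown below. The expected schema, column names, and alerting behavior are illustrative assumptions; a production version would hook into the team's orchestration and paging tools.

```python
# Schema-drift check sketch: compare an incoming batch against an expected
# column -> dtype contract and report any differences.
import pandas as pd

EXPECTED_SCHEMA = {"impression_id": "object", "ts": "datetime64[ns]", "spend_usd": "float64"}

def detect_schema_drift(batch: pd.DataFrame) -> list[str]:
    issues = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in batch.columns:
            issues.append(f"missing column: {col}")
        elif str(batch[col].dtype) != dtype:
            issues.append(f"dtype drift on {col}: expected {dtype}, got {batch[col].dtype}")
    for col in batch.columns:
        if col not in EXPECTED_SCHEMA:
            issues.append(f"unexpected column: {col}")
    return issues

batch = pd.DataFrame({
    "impression_id": ["a1"],
    "ts": pd.to_datetime(["2025-12-31"]),
    "spend_usd": ["12.5"],   # string instead of float -> flagged as drift
})
for issue in detect_schema_drift(batch):
    print("ALERT:", issue)   # a real pipeline would page the owner or halt the DAG here
```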
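The reproducibility bullet calls for deterministic seeds. Assuming PyTorch-based training (which the requirements section suggests), a common pattern is a single seeding helper called at the top of every training job; DVC/LakeFS dataset versioning happens outside the script and is not shown.

```python
# Deterministic-seed sketch: pin every source of randomness before training.
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 1234) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed for deterministic CUBLAS on GPU
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.use_deterministic_algorithms(True)  # fail loudly on nondeterministic ops

seed_everything(1234)
```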
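For the instrumentation bullet, a token-latency histogram can be as simple as a prometheus_client Histogram wrapped around the token stream. The metric name, bucket boundaries, and the fake token generator below are illustrative, not an existing Rockerbox metric.

```python
# Instrumentation sketch: record per-token generation latency and expose it for scraping.
import time
from prometheus_client import Histogram, start_http_server

TOKEN_LATENCY = Histogram(
    "llm_token_latency_seconds",
    "Per-token generation latency",
    buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5),
)

def generate_tokens(prompt: str):
    for token in prompt.split():          # stand-in for a real model's token stream
        start = time.perf_counter()
        time.sleep(0.01)                  # pretend inference work
        TOKEN_LATENCY.observe(time.perf_counter() - start)
        yield token

start_http_server(9100)                   # serves /metrics for Prometheus to scrape
list(generate_tokens("which creative should we scale next week"))
```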
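The cost-aware architecture bullet mentions quantization tricks. One concrete, low-effort example is PyTorch post-training dynamic quantization of Linear layers to int8, sketched below with a placeholder model; the 30% savings figure in the listing is a target, not something this snippet demonstrates.

```python
# Dynamic quantization sketch: convert Linear layers to int8 for cheaper CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1))  # stand-in model
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)   # same interface, smaller weights, lower inference cost
```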
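For the synthetic-data bullet, a minimal counterfactual simulator might model conversions as a binomial draw over simulated impressions. The CPM and conversion-rate parameters here are invented for illustration; in practice they would be fit to historical campaign data.

```python
# Counterfactual campaign simulation sketch: compare two budget scenarios on synthetic outcomes.
import numpy as np

rng = np.random.default_rng(42)

def simulate_conversions(budget_usd: float, cpm: float = 8.0, cvr: float = 0.012) -> int:
    """Impressions bought at a given CPM, each converting independently at rate cvr."""
    impressions = int(budget_usd / cpm * 1000)
    return int(rng.binomial(impressions, cvr))

baseline = simulate_conversions(budget_usd=50_000)                   # current allocation
counterfactual = simulate_conversions(budget_usd=65_000, cvr=0.010)  # broader, lower-converting audience
print(f"baseline={baseline}, counterfactual={counterfactual}")
```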
🎯 Requirements
• 7+ years designing and operating large-scale data pipelines (10+ TB/day) on Spark, Flink, or Snowflake, with proven experience optimizing for both throughput and cost.
• Deep expertise in Python or Scala, plus hands-on fluency with PyTorch, TensorFlow, or JAX for fine-tuning LLMs and building custom embeddings.
• Production experience with vector databases (Pinecone, Weaviate, Milvus) and retrieval-augmented generation patterns at sub-second latency.
• Nice-to-have: PhD or MS in Computer Science, Statistics, or a related field with published research in NLP, recommendation systems, or causal inference.
🏖️ Benefits
• Fully remote culture with quarterly in-person “summits” in destinations like Lisbon, Mexico City, or Tokyo.
• $4,000 annual learning stipend plus 10% time for open-source contributions and personal research.
• Equity compensation that turns every optimization you ship into direct ownership in DoubleVerify’s continued growth.
• Premium health, dental, vision, and mental-health coverage for you and dependents—100% company-paid.
About DoubleVerify Holdings, Inc.
DoubleVerify Holdings, Inc. provides digital media measurement and analytics software. The company offers fraud protection, brand safety, viewability, and performance verification for online, mobile, and connected-TV advertising, helping advertisers ensure campaigns are seen by real people in brand-safe environments.