Software Engineer II, Data

Iambic Therapeutics, Inc.

Job Overview

Location: Remote

Job Type: Full-time

Full Job Description

📋 Description

• Architect and continuously refine high-throughput, fault-tolerant data pipelines that ingest, validate, and transform petabyte-scale, multi-modal data from wet-lab instruments, public repositories, and third-party partners into pristine training sets for cutting-edge AI drug-discovery models.
• Own the end-to-end evolution of our cloud-native data storage layer—designing schema evolution strategies, implementing reproducible data snapshots, and optimizing for both millisecond-level analytics queries and bulk model-training reads.
• Partner shoulder-to-shoulder with ML researchers and computational chemists to profile, debug, and accelerate Python-based data processing workflows, shaving hours off model-training iterations and enabling rapid experimentation on novel therapeutic hypotheses.
• Select and deploy modern orchestration engines (Prefect, Airflow, Argo, or Databricks) to schedule, monitor, and auto-retry complex ETL graphs that span on-demand GPU clusters, spot-instance fleets, and Kubernetes namespaces.
• Implement rigorous data-quality gates, automated regression tests, and CI/CD pipelines so every dataset version that reaches production is traceable, auditable, and compliant with FDA 21 CFR Part 11 and GxP expectations.
• Leverage AWS services (S3, Glue, Athena, EMR, Lambda) and container orchestration (Kubernetes/EKS) to achieve elastic, cost-efficient scaling—from nightly batch jobs crunching genomic sequences to real-time feature extraction for active-learning loops.
• Contribute clean, well-documented Python modules to our internal data toolkit, establishing patterns that junior engineers can adopt and that ML scientists can extend without deep infrastructure knowledge.
• Evaluate emerging data-lake technologies (Iceberg, Delta Lake, Hudi) and storage formats (Parquet, Zarr, HDF5) to ensure our platform remains two steps ahead of exploding data volumes and increasingly sophisticated AI architectures.
• Translate biological and chemical questions into scalable data models—whether harmonizing assay readouts across 50+ cell lines or aligning 3-D molecular conformers—so researchers spend their time on science, not data wrangling.
• Champion a culture of reproducibility: containerize environments, version datasets with DVC or lakeFS, and publish internal “data cookbooks” that let any team member recreate a training set from raw sources in a single command.
• Provide on-call support for critical data pipelines, responding to alerts, performing root-cause analyses, and turning incidents into post-mortems that prevent future regressions.
• Mentor early-career engineers through code reviews, pair programming, and brown-bag sessions, raising the technical bar of the entire data organization.
• Collaborate with security and compliance teams to implement encryption, access controls, and audit logging that safeguard proprietary assay data and patient-level information.
• Present quarterly tech talks to the broader company, showcasing pipeline optimizations that reduced training costs by double-digit percentages or unlocked entirely new model capabilities.

🎯 Requirements

• 8+ years (BS), 6+ years (MS), or 3+ years (PhD) of hands-on experience building production-grade ETL or data-engineering systems.
• Demonstrated expertise with workflow orchestration and distributed data-processing tools such as Prefect, Airflow, Argo, Databricks, or Spark in multi-tenant cloud environments.
• Proven track record processing multi-terabyte to petabyte-scale datasets, including performance tuning of distributed compute jobs.
• Strong Python software-engineering skills with deep knowledge of pandas, NumPy, PyArrow, and modern packaging/testing practices.
• Familiarity with AWS core services (S3, Glue, Athena, EMR) and container orchestration (Kubernetes/EKS); bonus points for infrastructure-as-code (Terraform, CDK).

🏖️ Benefits

• Competitive salary with company-paid healthcare (medical, dental, vision) for you and your dependents.
• Uncapped vacation policy, flexible spending accounts, voluntary life insurance, and 401(k) with generous matching.
• Brand-new, state-of-the-art San Diego headquarters featuring an onsite gym, gourmet dining, and panoramic views—plus fully remote flexibility with East-Coast-friendly hours.

About Iambic Therapeutics, Inc.

Iambic Therapeutics is a biotechnology company that uses an AI-driven platform to discover and develop new medicines. Combining physics-based AI algorithms with high-throughput experimentation, Iambic tackles challenging design problems to generate optimized drug candidates and explore novel mechanisms of action. Its platform-driven pipeline focuses on first-in-class and best-in-class programs, including a HER2 program already in Phase 1 clinical studies, with the aim of unlocking the potential of known targets and turning undruggable targets into treatments for patients with unmet medical needs. This approach lets the company deliver differentiated clinical candidates at an accelerated pace, supported by over $100 million raised in an oversubscribed financing round to advance its portfolio.