
Job Overview
Location
India
Job Type
Full-time
Category
Machine Learning Engineer
Date Posted
September 18, 2025
Full Job Description
Description
- Architect and own the end-to-end design of production-grade AI systems that serve millions of requests daily. You will translate cutting-edge research into fault-tolerant, horizontally-scalable architectures, selecting the right mix of micro-services, message queues, and data stores to guarantee sub-second latency and 99.9% uptime.
- Build modular, version-controlled data pipelines that ingest terabytes of multimodal data (images, text, logs, sensor feeds) from on-prem, cloud, and edge sources. You will implement idempotent ETL jobs, schema validation, and data-quality gates so every downstream model is trained on clean, auditable datasets.
- Develop high-performance APIs and streaming services using FastAPI, gRPC, and WebSockets to expose model predictions in real time. You will design authentication, rate-limiting, and graceful-degradation strategies so external and internal consumers, from mobile apps to BI dashboards, receive consistent, secure access.
- Operationalize AI at scale by writing Kubernetes manifests, Helm charts, and Terraform modules that deploy training clusters, model-serving endpoints, and monitoring stacks across AWS, GCP, or hybrid clouds. You will champion GitOps and immutable infrastructure so rollbacks take minutes, not hours.
- Implement continuous training and CI/CD workflows that automatically retrain, validate, and promote models when data or code drifts. You will integrate MLflow, DVC, and Weights & Biases for experiment tracking, artifact storage, and lineage, ensuring every model is reproducible and explainable.
- Optimize training and inference for cost and speed: profile GPU/CPU utilization, apply mixed-precision training, TensorRT compilation, and dynamic batching to cut cloud spend while doubling throughput. You will benchmark new hardware (A100s, Jetson Orin, Inferentia) and quantify ROI for leadership.
- Instrument comprehensive observability (Prometheus metrics, OpenTelemetry traces, structured logs, and custom dashboards) so anomalies are caught before users notice. You will define SLOs, error budgets, and on-call runbooks, turning chaos into predictable operations.
- Collaborate cross-functionally with research scientists, product managers, and UX designers to translate vague requirements into concrete engineering tasks. You will run design reviews, threat-modeling sessions, and post-mortems that raise the technical bar across the organization.
- Mentor and upskill junior engineers through pair programming, code reviews, and lunch-and-learn sessions on topics like vector databases, retrieval-augmented generation, or zero-downtime migrations. Your guidance will create a culture of craftsmanship and psychological safety.
- Contribute back to the community: publish blog posts, speak at meetups, and release open-source tooling that showcases Weekday Technologies’ thought leadership in AI systems engineering. Your reputation will help us attract top-tier talent and strategic partners.
Skills & Technologies
Python
JavaScript
Rust
Node.js
Flask
Senior
Remote
Degree Required
About Weekday Technologies Inc.
Weekday Technologies operates a hiring platform that connects tech companies with pre-vetted software engineers through community referrals. The product crowdsources candidate recommendations from existing engineering teams, verifies skills, and offers employers a searchable talent pool for contract and full-time roles. Founded in 2021 and headquartered in San Francisco, the company focuses on reducing time-to-hire for startups and scale-ups by leveraging trusted peer networks rather than traditional recruiting pipelines.



