
DevOps Site Reliability Engineer (Global-Remote-Non.US)
Job Overview
Location: Athens
Job Type: Full-time
Category: Cybersecurity
Date Posted: September 18, 2025
Full Job Description
Description
- Own the heartbeat of Token Metrics’ global infrastructure: you will be the primary cloud systems administrator for our AWS-centric, multi-cloud environment (AWS + Google Cloud) that serves thousands of crypto investors in 50+ countries. Your uptime, security, and performance decisions directly influence whether a trader in Tokyo or a fund manager in São Paulo can access AI-driven indices in real time.
- Design, build, and harden the next generation of monitoring and alerting. From Prometheus and Grafana dashboards to custom CloudWatch and Stackdriver integrations, you will create automated pipelines that surface anomalies before customers notice them. Every alert you tune reduces noise and saves engineering hours.
- Reduce mean-time-to-recovery (MTTR) to minutes, not hours. You will craft incident-response runbooks, automate root-cause analysis, and build self-healing infrastructure with Terraform, Ansible, and Kubernetes. When an outage occurs, your tooling will spin up replacement nodes, drain traffic gracefully, and notify stakeholders with concise post-mortems.
- Champion security by design. Implement IAM least-privilege policies, enforce CIS benchmarks, rotate secrets automatically with Vault or AWS Secrets Manager, and run continuous compliance scans. Your security posture will protect millions of dollars in digital assets and safeguard customer trust.
- Accelerate software delivery through CI/CD excellence. You will own GitHub Actions, Argo CD, or similar pipelines that push AI models, APIs, and front-end code to production multiple times per day without downtime. Blue-green and canary deployments are second nature to you.
- Optimize cost and performance at scale. By rightsizing EC2 instances, leveraging Spot and Reserved capacity, tuning RDS and BigQuery workloads, and implementing auto-scaling policies, you will cut cloud spend while doubling throughput during peak trading hours.
- Provide white-glove support and mentorship. Triage and resolve escalated tickets from internal teams, create self-service portals, and coach junior engineers on SRE best practices. Your documentation becomes the single source of truth for runbooks, architecture decisions, and disaster-recovery playbooks.
- Drive data durability and disaster recovery. Architect multi-region backups, test restore procedures quarterly, and ensure RPO/RTO targets of <15 minutes. When a region fails, your automation brings services back online before the next candle closes.
- Stay ahead of the curve. Evaluate emerging tools (e.g., eBPF for kernel-level observability, chaos engineering with Litmus, or policy-as-code with OPA) and run proof-of-concepts that keep Token Metrics on the cutting edge of reliability engineering.
- Collaborate across time zones. Work closely with data scientists, backend engineers, and product managers in an async, remote-first culture. Your clear, concise Slack updates and Loom videos ensure alignment without 3 a.m. calls.
About Token Metrics Ventures LLC
Token Metrics Ventures LLC is a Delaware-registered research and analytics firm that uses artificial intelligence to rate and forecast crypto-assets. Founded in 2017 by Ian Balina, the company combines machine-learning models, on-chain data, and sentiment analysis to generate trading signals, portfolio strategies, and weekly newsletters for retail and institutional investors. The platform covers over 6,000 coins and tokens, assigning grades for technology, adoption, and investment merit. Revenue comes from tiered subscriptions, API access, and custom research. Headquartered in Austin, Texas, the firm also operates a media channel and hosts global investor summits.