
Job Overview
Location
Remote
Job Type
Full-time
Category
Software Engineering
Date Posted
February 16, 2026
Full Job Description
📋 Description
- As a Site Reliability Engineer at Fanvue, you will help shape the future of our rapidly growing creator economy platform. Fanvue empowers creators to connect, engage, and earn directly from their audiences at scale, powered by AI. We recently secured Series A funding, have reached over $100M in annual recurring revenue, and continue to see triple-digit year-on-year growth. Our mission is to provide a robust, scalable, and highly available platform that supports hundreds of thousands of creators and millions of fans globally. Your primary focus will be keeping our systems predictable, scalable, and resilient, so that product teams can ship new features rapidly without compromising uptime, performance, or the trust our creators place in us.
- You will be a key member of the Platform and Product Engineering teams, contributing directly to the design, operation, and evolution of the core systems that underpin Fanvue's speed, availability, and security. This is a hands-on position in which you will take ownership of our production systems, focusing on infrastructure, observability, and automation, and fostering a culture of operational excellence. Your work will directly affect the experience of both creators and fans by keeping the service seamless and reliable.
- Key responsibilities include designing, building, and operating reliable infrastructure across Fanvue's cloud environment, with attention to long-term stability and efficiency rather than initial setup alone. You will own and continuously improve our observability, monitoring, and alerting systems for all critical services, giving us deep insight into system health and performance. A significant part of your role will be reducing operational toil through automation, tooling, and infrastructure as code practices, which frees up engineering time and reduces the likelihood of human error.
- You will collaborate closely with engineering teams to improve the reliability, scalability, and safety of our deployments, so that reliability is a shared responsibility built into the development lifecycle from the outset. You will lead incident response for infrastructure-related issues, acting swiftly and decisively, and then drive high-quality post-incident reviews that identify root causes and put preventative measures in place. You will also define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets, giving us a data-driven way to balance shipping new features quickly against the reliability our users expect.
- You will also contribute to our disaster recovery, backup, and resilience planning, ensuring Fanvue can withstand and recover from unexpected events. Improving the reliability of our CI/CD pipelines and rollout practices, to minimize risk during deployments, will be another key focus. This role offers the opportunity to work on complex, real-world scaling challenges in a fast-paced, high-growth environment, with the autonomy to make impactful decisions and the support to grow your skills and career within a company that values innovation and reliability.
🎯 Requirements
- Proven experience as a Site Reliability Engineer, Infrastructure Engineer, or Platform Engineer, with a strong track record of operating production systems at scale.
- Demonstrated comfort and expertise in operating and managing distributed systems within a cloud environment (e.g., AWS, GCP, Azure).
- Solid background in observability, including monitoring, logging, tracing, and incident management, with practical experience in defining and tracking SLOs/SLIs.
- Proficiency in writing automation scripts and infrastructure as code (e.g., Terraform, Ansible, CloudFormation), coupled with experience in CI/CD pipeline development and optimization.
- Excellent communication skills, with the ability to remain calm, clear, and effective during high-pressure incident response scenarios and escalations.
- A strong sense of ownership and a proactive mindset focused on long-term system reliability and continuous improvement.
🏖️ Benefits
- Competitive salary and comprehensive benefits package.
- Unlimited holiday policy to promote work-life balance and employee wellbeing.
- Fully remote working environment with the flexibility to work according to your peak performance hours.
- Dedicated budget for professional growth, learning, and personal wellbeing initiatives.
- Opportunity to own and drive reliability for a rapidly scaling, $100M ARR platform.
- Work on challenging, real-world scaling problems with a talented and supportive team.
About Fanvue Ltd
Fanvue is a London-based subscription social platform that lets creators monetize content through monthly fees, tips, and pay-per-view media. It targets adult entertainers, fitness instructors, chefs, and other influencers by providing payout tools, analytics, and messaging features comparable to OnlyFans. The company was founded in 2020 by William Monange and YouTuber Joel Morris, secured seed funding in 2021, and emphasizes discoverability and faster customer support to attract creators seeking alternative revenue streams.
