
Job Overview
Location: United States
Job Type: Full-time
Category: Software Engineering
Date Posted: May 22, 2026
Full Job Description
📋 Description
- Design, build, and optimize real-time speech-to-text pipelines including streaming ASR, VAD, and audio processing for ClickUp’s voice platform.
- Improve transcription accuracy by injecting contextual data such as user names, team names, custom vocabulary, and language detection into the ASR system.
- Develop and maintain LLM-powered post-processing systems to correct grammar, remove filler words, resolve speaker mentions, and reformat transcribed text for clarity and usability.
- Build voice-to-action pipelines that interpret natural language voice commands and convert them into structured workspace actions within ClickUp’s productivity environment.
- Evaluate, benchmark, and integrate third-party ASR models (including Whisper, AssemblyAI, Fireworks) based on criteria such as cost, latency, and transcription accuracy.
- Collaborate with product and platform engineering teams to ship voice features across ClickUp’s MAX Desktop, Mobile, Web, and Browser Extension platforms.
- Explore and prototype multimodal AI capabilities that combine voice input with screen activity and text input to enable next-generation assistant experiences.
- Own the end-to-end lifecycle of AI systems powering ClickUp’s voice interface, from research and prototyping to production deployment and performance monitoring.
- Continuously iterate on voice recognition and command interpretation systems based on user feedback, usage analytics, and emerging AI advancements.
- Ensure voice features meet scalability, reliability, and privacy standards for a global user base of millions.
- Work cross-functionally with UX, QA, and infrastructure teams to align voice capabilities with product vision and technical constraints.
- Stay current with advancements in voice AI, natural language processing, and real-time streaming technologies to inform system design and innovation.
- Contribute to documentation, model versioning, and monitoring systems to ensure transparency and maintainability of voice AI components.
- Participate in code reviews, technical design discussions, and sprint planning to ensure high-quality delivery of voice platform features.
- Advocate for user-centric voice interactions by prioritizing accuracy, speed, and intuitive command structures in all system improvements.
- Maintain compliance with data privacy and security standards when processing audio and voice data across global regions.
- Support on-call and incident response for critical voice platform outages or degradation in transcription quality.
About Mango Technologies, Inc.
ClickUp is a San Diego-based productivity platform that unifies tasks, documents, goals, chat, and whiteboards in one cloud workspace. Founded in 2017, the company serves individuals, startups, and large enterprises seeking to replace scattered tools with a single, customizable hub. Its feature set includes hierarchical task management, real-time collaboration, time tracking, reporting dashboards, and hundreds of pre-built templates and integrations. ClickUp targets teams in software development, marketing, and operations that need to plan, execute, and monitor work without switching applications.



