
Job Overview
Location: Remote - Spain
Job Type: Full-time
Category: Data Engineer
Date Posted: March 3, 2026
Full Job Description
📋 Description
- As a Content Developer Engineer at Clio, you will play a pivotal role in shaping the future of legal technology by building and maintaining robust data-collection pipelines. Your primary focus will be gathering critical information from the public web and various partner sources, ensuring the accuracy, scalability, and resilience of our acquisition systems.
- This role is instrumental in expanding Clio's comprehensive legal content database, which serves as a foundational element for our AI-driven legal solutions. You will design, develop, and operate sophisticated web crawlers and scrapers capable of handling a wide array of content formats, including HTML, JSON, XML, PDFs, and images.
- Your expertise will be crucial in transforming this extracted data into structured, usable formats that power downstream systems and research features, directly contributing to Clio's mission of transforming the legal experience and increasing access to justice.
- Build, maintain, and continuously improve web crawlers and scrapers, using modern Node.js tooling such as Puppeteer and Playwright for efficient and effective data extraction.
- Implement advanced scraping strategies for both static and dynamic websites, employing browser automation where necessary to navigate complex web structures and retrieve data accurately.
- Demonstrate a strong command of the HTTP and FTP protocols: managing requests and responses, handling various authentication methods, optimizing headers, implementing caching strategies, and respecting rate limits to ensure smooth and compliant data acquisition.
- Ensure the resilience and reliability of content pipelines by integrating comprehensive error handling, implementing intelligent retry logic, and performing rigorous schema validation to maintain data integrity.
- Parse and transform input data from a diverse range of formats, including but not limited to HTML/XHTML, XML, JSON, PDFs, Word documents, and images, ensuring compatibility with our data infrastructure.
- Convert extracted content into pre-defined schemas with consistent validation, guaranteeing the accuracy and usability of data for all downstream systems and analytical processes.
- Work with various data storage systems, both relational and NoSQL databases, to manage and organize the large datasets acquired through your pipelines.
- Uphold high quality standards by writing thorough unit and integration tests for all crawlers and parsers, ensuring deterministic execution and predictable outcomes.
- Produce clear, concise, and comprehensive technical documentation, including detailed setup instructions, operational runbooks, and troubleshooting guides for identified edge cases.
- Actively participate in peer code reviews, providing constructive feedback and maintaining test fixtures to enhance the operability and extensibility of our systems.
- Foster strong collaborative relationships with engineering and product leads, proactively clarifying requirements, sharing project progress, and identifying and escalating potential risks early in the development lifecycle.
- Use project management tools such as DevOps or Linear to plan, track, and communicate your work, ensuring alignment with team objectives and project timelines.
- Work autonomously, translating high-level requirements into actionable tasks and delivering high-quality results with minimal supervision.
- Contribute to a culture of innovation and continuous improvement within the engineering team, sharing knowledge and best practices related to web scraping, data engineering, and pipeline development.
- This role offers a unique opportunity to work at the intersection of cutting-edge AI technology and the legal industry, making a tangible impact on how legal professionals operate and how justice is accessed globally.
- You will be part of a dynamic, fast-paced environment where your contributions are valued and directly influence the success of Clio's product offerings and strategic goals.
- Embrace the challenge of working with complex, real-world data, developing creative solutions to overcome extraction hurdles and ensure data quality.
- Collaborate with a talented and diverse team of engineers, product managers, and legal experts in a rich learning environment.
- Take ownership of critical data pipelines, ensuring their ongoing performance, reliability, and evolution to meet the growing demands of the legal market.
- This position is ideal for a proactive and detail-oriented individual with a passion for data, technology, and problem-solving who is eager to make a significant impact in a rapidly growing tech company.
- You will be empowered to experiment with new technologies and approaches to data collection and processing, contributing to Clio's technological leadership in the legal AI space.
- The role requires strong technical skills in JavaScript development combined with a keen understanding of data structures and web technologies, enabling you to build efficient and effective data solutions.
- Your work will directly support Clio's mission to improve the lives of legal professionals and enhance access to justice, providing a sense of purpose and impact beyond typical software development roles.
Skills & Technologies
JavaScript
TypeScript
Node.js
Remote
About Themis Solutions Inc.
Themis Solutions Inc. operates as Clio, providing cloud-based legal practice management software for law firms. Its platform integrates case management, time tracking, billing, document management, and client communication tools. Serving solo practitioners to large firms globally, Clio emphasizes data security, mobile access, and workflow automation, enabling legal professionals to manage practices efficiently while maintaining compliance with industry standards.


