
Job Overview
Location
Indiana, USA
Job Type
Full-time
Category
Data Engineer
Date Posted
February 25, 2026
Full Job Description
📋 Description
Are you a seasoned Data Engineer with a passion for leveraging Artificial Intelligence to solve complex real-world problems? SouthGeek S.A. is partnering with an innovative real estate technology startup that is revolutionizing the commercial real estate industry through AI-driven intelligence. We are seeking a Senior Data Engineer (AI) to join their dynamic, early-stage team and play a pivotal role in building and scaling their core data infrastructure.
This is a unique opportunity to work at the forefront of AI application in a rapidly evolving sector. Our client's platform is designed to transform how commercial real estate teams negotiate and manage leases. By combining cutting-edge AI, robust structured data pipelines, and intuitive user-centered design, they are automating intricate lease workflows, extracting valuable market-aligned insights, and streamlining the proposal generation process. The ultimate vision is to infuse speed, clarity, and data-backed confidence into every stage of the deal lifecycle.
As a Senior Data Engineer specializing in AI, you will be instrumental in this mission. This is a hands-on, high-ownership role where you will be responsible for the entire lifecycle of data systems. Your primary focus will be on designing, building, and operating sophisticated systems that meticulously extract, transform, and validate structured data from complex leasing documents. You will own the complete ELT (Extract, Load, Transform) loop, taking raw, often messy, real-world documents and converting them into clean, reliable JSON data that fuels their web applications and other critical downstream systems. The ability to turn unstructured text into actionable, structured data is at the heart of this role.
In this fast-paced, early-stage startup environment, agility, adaptability, and a proactive approach are paramount. You will be expected to scope ambiguous problems, experiment with novel AI-driven extraction techniques, and continuously iterate on and refine your data pipelines to enhance accuracy, efficiency, and scalability. Your contributions will directly impact the core functionality and growth of the platform.
Key responsibilities will include:
- Designing and iterating on robust data extraction and transformation pipelines. This involves developing sophisticated processes to convert unstructured leasing documents, such as contracts and agreements, into well-defined, structured JSON formats suitable for programmatic use.
- Writing and optimizing Large Language Model (LLM) API calls and crafting precise prompts. You will leverage your expertise to effectively query LLMs and guide them to extract and interpret specific textual data at scale, ensuring high accuracy and relevance.
- Orchestrating complex AI-driven workflows. This includes integrating multiple LLMs and other AI components to intelligently handle a wide variety of document types, formats, and challenging edge cases, ensuring the system's resilience and adaptability.
- Building and maintaining efficient ELT workflows primarily using Python. You will manage the seamless flow of data from initial ingestion through cloud storage, transformation, and loading into relational databases, ensuring data integrity at every step.
- Developing comprehensive data quality and validation frameworks. Your focus will be on implementing rigorous checks to guarantee that all structured outputs are accurate, consistent, and ready for production deployment, minimizing errors and ensuring data reliability.
- Implementing robust monitoring, alerting, and automated quality checks across all extraction pipelines. This proactive approach will ensure the health and performance of the data systems, enabling rapid identification and resolution of any issues.
- Collaborating closely with product managers and fellow engineering teams. Together you will define, refine, and evolve data schemas so that they keep pace with product and business intelligence requirements.
- Owning the data pipeline end-to-end. From the moment raw data is ingested to the final validated structured output, you will have full responsibility and ownership, driving the success of these critical data flows.
This role offers a significant opportunity to make a tangible impact on a product that is reshaping a major industry. You will work with a talented team dedicated to innovation and growth, in a supportive environment that values your contributions and professional development. If you are a proactive, skilled Data Engineer eager to apply your AI and data engineering expertise in a challenging and rewarding setting, we encourage you to apply.
Skills & Technologies
Python
PostgreSQL
AWS
Senior
Remote
Degree Required
About SouthGeek S.A.
SouthGeek is an Argentine software development company specializing in scalable web and mobile applications for startups and enterprises. Founded in 2014, the firm offers full-stack engineering, cloud architecture, UX/UI design, and dedicated agile teams. It focuses on fintech, healthcare, and logistics projects across Latin America and the United States, emphasizing clean code, automated testing, and continuous delivery. The company operates remotely from Córdoba, Buenos Aires, and Montevideo, integrating regional talent with global clients to accelerate digital transformation and reduce time-to-market for complex products.



