Web Scraping Engineer — European Public Procurement

Job Overview

Location

Indonesia

Job Type

Contract

Category

Software Engineering

Date Posted

May 8, 2026

Full Job Description

📋 Description

• Build and maintain async scrapers using Python and Playwright to extract tender data from European public procurement portals, starting with Italian platforms like Maggioli PortaleAppalti, ANAC, and MePA, with expansion across Europe.
• Handle complex anti-bot protections including FriendlyCaptcha, Mosparo, Cloudflare WAF, and session management (JSESSIONID), implementing IP rotation, rate limit backoff, and retry logic to ensure resilient data collection.
• Parse diverse Italian data formats such as monetary values (€ 1.234.567,89), dates (DD/MM/YYYY, textual), and identifiers (CIG/CUP), including detection of placeholders and validation logic.
• Extract and process documents in multiple formats, including PDF, .p7m (PKCS#7 signed), and ZIP/7Z, applying OCR fallback when text extraction fails.
• Integrate scrapers into a Prefect orchestration pipeline with monitoring, alerting, and anomaly detection to ensure data quality and pipeline reliability.
• Store data using dual-sink architecture with PostgreSQL, Supabase, Clickhouse, and AWS S3, implementing upsert and idempotency patterns for consistency.
• Collaborate in a mission-driven environment focused on creating the data backbone for European public procurement, enabling transparency and access to tender opportunities across 100+ e-procurement systems.
• Continuously adapt scraper strategies to handle varying HTML layouts, SPAs, and dynamic content across portals that serve different structures across pages or regions.

🎯 Requirements

• Strong proficiency in async Python (asyncio), with ability to write non-blocking, efficient scraper logic without reliance on time.sleep().
• Hands-on experience with Playwright or Selenium, including interception of XHR requests, handling of SPAs, and debugging timing and rendering issues.
• Expertise in handling real-world anti-bot measures such as CAPTCHAs (FriendlyCaptcha, Mosparo), session cookies, IP rotation, and rate limiting with exponential backoff.
• Skill in parsing messy, inconsistent HTML using multi-strategy approaches to extract data from /,
/
,
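
To give a feel for the Italian format parsing mentioned in the description, here is a minimal sketch that handles amounts like € 1.234.567,89 and dates in DD/MM/YYYY or textual form. The function names (parse_amount, parse_date) and the choice to return None for unparseable input are illustrative assumptions, not part of the posting or of any Nessolabs codebase.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

# Italian month names for textual dates such as "8 maggio 2026".
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Either dot-grouped thousands (1.234.567) or plain digits, optional ",decimals".
_AMOUNT_RE = re.compile(r"(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d+)?")


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse an Italian-formatted amount; return None if it does not match."""
    cleaned = text.replace("€", "").replace("\u00a0", " ").strip()
    if not _AMOUNT_RE.fullmatch(cleaned):
        return None  # placeholder or malformed value (hypothetical policy)
    # Drop thousands separators, then turn the decimal comma into a point.
    return Decimal(cleaned.replace(".", "").replace(",", "."))


def parse_date(text: str) -> Optional[date]:
    """Parse DD/MM/YYYY or 'D <month name> YYYY'; return None otherwise."""
    value = text.strip().lower()
    numeric = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
    textual = re.fullmatch(r"(\d{1,2})\s+([a-z]+)\s+(\d{4})", value)
    if numeric:
        day, month, year = (int(g) for g in numeric.groups())
    elif textual and textual.group(2) in MONTHS:
        day, year = int(textual.group(1)), int(textual.group(3))
        month = MONTHS[textual.group(2)]
    else:
        return None
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. 31/02/2026 is well-formed but not a real date
```

Using Decimal rather than float keeps monetary values exact, and returning None instead of raising lets a caller count and report unparseable fields as a data quality signal.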

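The requirement to retry with exponential backoff without relying on time.sleep() can be sketched roughly as follows. The helper name retry_with_backoff, its parameters, and the use of random jitter are assumptions made for illustration; a real scraper would likely catch narrower exception types and honor any retry hints the portal sends.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await op(), retrying on failure with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, capped at max_delay, with random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # asyncio.sleep yields to the event loop, so other scrapers keep running.
            await asyncio.sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")
```

A call might look like `await retry_with_backoff(lambda: fetch_page(url))`, where fetch_page is a hypothetical coroutine that loads one portal page.
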
Skills & Technologies

Python
PostgreSQL
AWS
Selenium
Backend
Onsite

Nessolabs Ltd.

About Nessolabs Ltd.

Nessolabs Ltd., operating as Nessodigitale.it, builds bespoke software for businesses and provides on-demand developers. It serves clients that want to improve their digital operations with custom-built technology, delivering tools fitted to each client's operational needs and strategic goals.
