Cross-portal project grouping
Entity resolution collapses roughly 8,000 tenders a year across 17 portals into approximately 1,500 coherent procurement projects. Instead of tracking duplicate notices, re-advertised lots, and sub-packages separately, follow the project from planning through prior information notice (PIN), design brief, and construction tender to subcontracts.
The deduplication problem
Pacific-aid infrastructure procurement is fragmented by design. A single power grid upgrade project may produce: a preliminary engineering notice on ADB CMS, a procurement plan update on ADB Documents, an invitation for expressions of interest on AusTender, three lots on Tenderlink (PNG Power, Solomon Power, EFL), a request for qualifications on ADB eTendering, and a subcontract notice on two Tenderlink utility portals. Six portals, eight records, one project.
Without project grouping, capture teams track eight separate records, receive eight separate alerts, and miss the connections between them. With cross-portal projects, all eight records collapse into a single project view, with a timeline showing every stage as it was posted and a unified set of tender documents.
How entity resolution works
- 01
Scope-text embedding similarity
Tender descriptions are embedded and compared across portals. Record pairs whose cosine similarity exceeds a threshold become candidates for merging, and the similarity score is then weighed together with buyer name, country, sector, and indicative budget band.
- 02
Structured metadata matching
Project numbers (ADB project IDs, World Bank P-numbers, DFAT program IDs) are extracted from tender texts and documents. Exact matches are merged automatically with high confidence.
- 03
Buyer and donor chain graph
The system maintains a graph of buyer organisations and their donor relationships. A PNG Power tender and an AusTender entry both referencing AIFFP financing are structurally related — this narrows the similarity search space.
- 04
Curator queue for ambiguous cases
Candidate pairs that fall below the confidence threshold are queued for human review. A curator sees the two candidates side by side with the similarity signals highlighted, and makes a merge / split / flag decision. Curator decisions feed back into the model as a training signal.
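The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production pipeline: the `TenderRecord` fields, the two ID patterns, the threshold values, the metadata adjustments, and the reduction of the donor graph to a per-record set of donor tags are all hypothetical, and embeddings are assumed to be computed upstream by whatever text-embedding model is in use.

```python
import math
import re
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, Sequence, Set, Tuple


@dataclass
class TenderRecord:
    # Hypothetical record shape, for illustration only.
    portal: str
    text: str
    embedding: Sequence[float]  # produced upstream by an embedding model
    buyer: str
    country: str
    donors: FrozenSet[str] = frozenset()


# Assumed ID formats: "P" plus six digits (World Bank style) and
# five digits, a hyphen, three digits (ADB style). Real feeds may differ.
PROJECT_ID_PATTERNS = [
    re.compile(r"\bP\d{6}\b"),
    re.compile(r"\b\d{5}-\d{3}\b"),
]


def extract_project_ids(text: str) -> Set[str]:
    """Step 02: pull structured project numbers out of free text."""
    ids: Set[str] = set()
    for pattern in PROJECT_ID_PATTERNS:
        ids.update(pattern.findall(text))
    return ids


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Step 01: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_pairs(
    records: Iterable[TenderRecord],
) -> Iterator[Tuple[TenderRecord, TenderRecord]]:
    """Step 03 (simplified): only compare records that share a donor tag."""
    for a, b in combinations(list(records), 2):
        if a.donors & b.donors:
            yield a, b


def resolve_pair(
    a: TenderRecord,
    b: TenderRecord,
    merge_threshold: float = 0.90,   # assumed value
    review_threshold: float = 0.70,  # assumed value
) -> str:
    """Return 'merge', 'curator_queue', or 'separate' for one pair."""
    # Step 02: an exact shared project ID merges automatically.
    if extract_project_ids(a.text) & extract_project_ids(b.text):
        return "merge"
    # Step 01: embedding similarity, nudged by metadata agreement.
    score = cosine_similarity(a.embedding, b.embedding)
    if a.buyer == b.buyer:
        score += 0.05  # hypothetical bonus
    if a.country != b.country:
        score -= 0.20  # hypothetical penalty
    if score >= merge_threshold:
        return "merge"
    # Step 04: ambiguous pairs go to a human curator.
    if score >= review_threshold:
        return "curator_queue"
    return "separate"
```

In this sketch the structured ID check runs first because it is the cheapest and most reliable signal; the embedding score only decides pairs that have no shared ID, and anything in the band between the two assumed thresholds is routed to the curator queue rather than merged.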