
Data Quality & Dedup Agent
Purpose
This agent automatically analyzes structured files and databases to identify duplicates, anomalies, missing values, and inconsistencies. It helps teams assess and improve data quality before reporting, data migration, master data management (MDM) initiatives, or operational use.
Primary users
The primary users are data teams, consultants, business analysts, MDM teams, data owners, IT teams, and transformation teams responsible for data quality and reliability.
Where it fits (process/stage/trigger)
It is used during data audits, migration preparation, MDM projects, reporting reliability checks, database cleanup, and digital transformation initiatives. It is triggered whenever Excel, CSV, or database extracts need to be checked, deduplicated, enriched, or corrected.
Key capabilities / workflow
The agent ingests structured data files and profiles quality issues field by field. It detects duplicates, missing values, abnormal formats, and inconsistent records, then applies business rules and, where available, external reference checks. For company data, it can query sources such as the SIRENE API (the French register of companies and establishments maintained by INSEE) to verify the SIREN number, company name, and address. If the resulting quality score is still insufficient, the workflow loops back to refine the rules and validations and runs the checks again.
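The profiling and duplicate-detection steps can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the agent's actual implementation: the function names, the sample columns, the choice of key fields, and the normalization (accent stripping, lowercasing, whitespace collapsing) are all hypothetical, and the sketch only catches records that match exactly after normalization, not fuzzy near-duplicates.

```python
import csv
import io
import unicodedata
from collections import defaultdict


def normalize(value):
    """Build a comparison key: drop accents, lowercase, and collapse whitespace."""
    if value is None:
        return ""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def count_missing(rows, fields):
    """Count empty or whitespace-only values for each field."""
    missing = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            if not (row.get(field) or "").strip():
                missing[field] += 1
    return missing


def find_duplicates(rows, key_fields):
    """Return groups of row indices whose normalized key fields are identical."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(normalize(row.get(field)) for field in key_fields)
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Synthetic sample extract (hypothetical columns); real input would be an Excel/CSV/database export.
SAMPLE_CSV = """siren,name,city
123456782,ACME SA,Paris
123456782,Acme  SA,paris
,Beta SARL,Lyon
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(count_missing(rows, ["siren", "name", "city"]))
print(find_duplicates(rows, ["name", "city"]))
```

On the sample, the first two rows collapse to the same key and are reported as one duplicate group, and the empty SIREN on the third row is counted as missing. Because blank keys also compare equal, a production rule set would probably exclude records whose key fields are all empty before grouping.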
Inputs
Inputs include Excel files, CSV files, database extracts, business rules, domain constraints, expected data repositories, external reference data, and optional SIRENE API access for company data verification.
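Business rules and domain constraints can be expressed as per-field predicates applied to every record. In this hypothetical sketch, `RULES`, the rule names, and the postal-code constraint are illustrative assumptions. The SIREN rule checks only the public format (nine digits that pass the Luhn checksum); it does not confirm that the company exists, which is what a lookup against the SIRENE API would add.

```python
import re


def luhn_valid(digits):
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second one.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_valid_siren(value):
    """Format check only: nine digits (spaces allowed) with a valid Luhn checksum."""
    compact = (value or "").replace(" ", "")
    return re.fullmatch(r"[0-9]{9}", compact) is not None and luhn_valid(compact)


def is_fr_postal_code(value):
    """Hypothetical domain constraint: French postal codes have five digits."""
    return re.fullmatch(r"[0-9]{5}", value or "") is not None


# Hypothetical rule set: field name -> list of (rule name, predicate on the raw value).
# Missing or empty values fail both rules.
RULES = {
    "siren": [("siren_format", is_valid_siren)],
    "postal_code": [("fr_postal_code", is_fr_postal_code)],
}


def check_rules(rows, rules=RULES):
    """Return (row index, field, rule name) for every rule a record fails."""
    violations = []
    for index, row in enumerate(rows):
        for field, field_rules in rules.items():
            for rule_name, predicate in field_rules:
                if not predicate(row.get(field)):
                    violations.append((index, field, rule_name))
    return violations


records = [
    {"siren": "123 456 782", "postal_code": "75001"},  # synthetic, passes both rules
    {"siren": "123456789", "postal_code": "7500"},     # bad check digit, short postal code
]
print(check_rules(records))
```

Keeping rules as data rather than hard-coded branches makes it straightforward to add, tighten, or relax constraints between passes without touching the checking loop.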
Outputs / Deliverables
Outputs include a detailed data quality report, duplicate and anomaly counts, missing value analysis, inconsistency mapping by field, correction recommendations, a global data quality score, and an optional cleaned dataset.
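One possible way to compute the global data quality score is a weighted average of per-dimension ratios. The dimensions (completeness, uniqueness, validity), the weights, the 0-100 scale, and the 90-point target below are all assumptions chosen for illustration; with a score defined this way, the loop-back decision described in the workflow reduces to comparing the score with a target.

```python
from dataclasses import dataclass


@dataclass
class QualityCounts:
    """Raw counts gathered during profiling (hypothetical structure)."""
    rows: int
    fields: int
    missing_cells: int
    duplicate_rows: int  # surplus rows: group size minus one, summed over duplicate groups
    invalid_rows: int    # rows that fail at least one business rule


# Hypothetical weights and target; a real project would agree on these with data owners.
WEIGHTS = {"completeness": 0.4, "uniqueness": 0.3, "validity": 0.3}
TARGET_SCORE = 90.0


def dimension_scores(counts):
    """Express each quality dimension as a ratio between 0 and 1."""
    if counts.rows == 0 or counts.fields == 0:
        raise ValueError("cannot score an empty dataset")
    return {
        "completeness": 1 - counts.missing_cells / (counts.rows * counts.fields),
        "uniqueness": 1 - counts.duplicate_rows / counts.rows,
        "validity": 1 - counts.invalid_rows / counts.rows,
    }


def global_score(counts, weights=WEIGHTS):
    """Weighted average of the dimension ratios, scaled to 0-100."""
    scores = dimension_scores(counts)
    total_weight = sum(weights.values())
    return 100 * sum(weights[name] * scores[name] for name in weights) / total_weight


def needs_another_pass(score, target=TARGET_SCORE):
    """Loop-back condition: refine rules and re-run while the score is below target."""
    return score < target


counts = QualityCounts(rows=100, fields=10, missing_cells=50, duplicate_rows=5, invalid_rows=10)
score = global_score(counts)
print(round(score, 1), needs_another_pass(score))
```

With the sample counts, completeness and uniqueness are 0.95 and validity is 0.90, so the score is 93.5 and no further pass is needed under the assumed target. Counting only the surplus rows of each duplicate group means a pair of duplicates costs one row, not two.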
Value
The agent accelerates data quality diagnostics, reduces manual cleaning effort, improves data reliability, supports MDM and transformation programs, and helps teams identify the most important correction actions before downstream use.