Data Cleaning, Deduplication, Enrichment & CRM Migration (Accounting Database ~200k Records)

Upwork

Remote

12 hours ago

📝 Job Title: Data Cleaning, Deduplication, Enrichment & CRM Migration (Accounting Database ~200k Records)

📋 Project Overview
We are building a clean, relational accounting-sector database to support Salesforce (or a similar CRM). You will take two master spreadsheets (one for candidates, one for client contacts), apply fuzzy matching and deduplication, enrich and normalise the data, create company/account records, and import the final dataset. The starting point is up to 200,000 raw records (approx. 100,000 unique contacts after deduplication). Accuracy, data integrity, and careful record-linking are essential.

🏗️ Scope of Work & Deliverables

1️⃣ Fuzzy Match & Deduplicate
- Fuzzy-match candidates and client contacts across both sheets (name, email, phone, company).
- Merge duplicates into single parent records.
- Collate and preserve all contact details (e.g., secondary emails/phones).
- Assign and write back unique IDs for:
  - Candidate records
  - Client contact records

2️⃣ Create Company / Account Data
- From the deduplicated contacts, build a clean Account/Company table.
- Standardise company names so that variants resolve to one canonical name (e.g., “PwC” and “PricewaterhouseCoopers”).
- Assign unique company IDs and link all relevant contacts and candidates to their account.

3️⃣ Light Enrichment (Clay + OpenAI)
Use Clay tables and minimal OpenAI credits to:
- add zip/post codes where missing;
- normalise job titles and skills into predefined picklist values.
Keep enrichment low-token and cost-efficient.

4️⃣ Prepare & Perform CRM Import
- Structure final CSVs for Accounts, Contacts, and Candidates with all required relationships and unique IDs.
- Run test imports, then perform the full migration into Salesforce (or the final system we specify).
- Deliver a final audit/QC report confirming record counts and linkages.

🧩 Technical & Process Requirements
- Strong experience with fuzzy matching & deduplication (Python/SQL/Excel/Power Query or equivalent).
- Comfort with unique ID generation and writing IDs back to related rows.
- Familiarity with Clay tables and the OpenAI API for low-cost enrichment.
- Knowledge of Salesforce or similar CRM imports.

💰 Payment Structure (Milestone-Based)
1. Fuzzy Match & Deduplication Complete (20% of total): two clean master lists (candidates & client contacts) with unique IDs.
2. Company/Account Table Built & Linked (15% of total): final company table with unique company IDs; contacts/candidates linked.
3. Light Enrichment Complete (15% of total): enriched dataset with picklist job titles, skills, and zip/post codes.
4. Final Import & QC Report (50% of total): successful CRM import and full audit of record counts and links.

50% of the payment is released only after full migration and successful QC sign-off.
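A minimal Python sketch of the fuzzy match in step 1️⃣, using only the standard library. The row layout (`source`, `name`, `email`, `phone`, `company` keys), the match rule (exact normalised email or phone, otherwise a `difflib.SequenceMatcher` name ratio of at least 0.88 at the same company) and the blocking keys are all assumptions for illustration, not requirements from the posting. Blocking means only rows sharing an email, phone or surname prefix are compared, which keeps the work far below an all-pairs scan of ~200,000 rows, and union-find merges transitive matches (A matches B, B matches C) into one cluster.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalise_email(value):
    # Lower-case and trim; empty values become None so they never "match".
    value = (value or "").strip().lower()
    return value or None


def normalise_phone(value):
    # Digits only, compared on the last 10 so "+44 20 ..." and "020 ..." line up (assumption).
    digits = re.sub(r"\D", "", value or "")
    return digits[-10:] if len(digits) >= 7 else None


def normalise_name(value):
    # Lower-case, turn punctuation into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", (value or "").lower()).split())


def is_duplicate(a, b, name_threshold=0.88):
    # Hypothetical rule: exact email or phone, else a similar name at the same company.
    email_a = normalise_email(a.get("email"))
    if email_a and email_a == normalise_email(b.get("email")):
        return True
    phone_a = normalise_phone(a.get("phone"))
    if phone_a and phone_a == normalise_phone(b.get("phone")):
        return True
    company_a = normalise_name(a.get("company"))
    if not company_a or company_a != normalise_name(b.get("company")):
        return False
    score = SequenceMatcher(None, normalise_name(a.get("name")), normalise_name(b.get("name"))).ratio()
    return score >= name_threshold


def blocking_keys(row):
    # Only rows sharing at least one key are ever compared.
    keys = set()
    email = normalise_email(row.get("email"))
    if email:
        keys.add(("email", email))
    phone = normalise_phone(row.get("phone"))
    if phone:
        keys.add(("phone", phone))
    name = normalise_name(row.get("name"))
    if name:
        keys.add(("surname", name.split()[-1][:4]))
    return keys


def cluster(rows):
    parent = list(range(len(rows)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        for key in blocking_keys(row):
            blocks[key].append(idx)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            root_i, root_j = find(i), find(j)
            if root_i != root_j and is_duplicate(rows[i], rows[j]):
                parent[root_j] = root_i
    groups = defaultdict(list)
    for idx in range(len(rows)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


rows = [
    {"source": "candidates", "name": "Jane Smith", "email": "Jane.Smith@example.com", "phone": "+44 20 7946 0000", "company": "PwC"},
    {"source": "clients", "name": "Jane  Smith", "email": "jane.smith@example.com ", "phone": "", "company": "PricewaterhouseCoopers"},
    {"source": "clients", "name": "J. Smyth", "email": "", "phone": "020 7946 0000", "company": "PwC"},
    {"source": "candidates", "name": "Tom Brown", "email": "tom@example.org", "phone": "", "company": "KPMG"},
]
print(cluster(rows))  # [[0, 1, 2], [3]]
```

Rows 0 and 1 join on email and rows 0 and 2 on phone, so all three end up in one cluster even though rows 1 and 2 share nothing directly. Very common surname prefixes would still produce large blocks, so a real run would likely need tighter keys.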
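For the merge and ID write-back in step 1️⃣, each cluster of duplicate rows can be collapsed into one parent record that keeps every distinct email and phone, with the new ID written onto each source row. This sketch assumes a hypothetical survivorship rule (the row with the most non-empty fields supplies the primary values) and a hypothetical ID format (`CAN-000001` when the surviving row comes from the candidate sheet, `CON-000001` for client contacts). The cluster lists are given literally so the block runs on its own.

```python
import re

# Hypothetical ID scheme: prefix chosen by the source sheet of the surviving row.
ID_PREFIX = {"candidates": "CAN", "clients": "CON"}


def completeness(row):
    # Survivorship score (assumption): number of non-empty fields.
    return sum(1 for value in row.values() if value)


def merge_clusters(rows, clusters):
    parents = []
    counters = {prefix: 0 for prefix in ID_PREFIX.values()}
    for members in clusters:
        group = [rows[i] for i in members]
        primary = max(group, key=completeness)  # first most-complete row wins ties
        prefix = ID_PREFIX[primary["source"]]
        counters[prefix] += 1
        parent_id = f"{prefix}-{counters[prefix]:06d}"
        emails, phones = [], []
        seen_emails, seen_phones = set(), set()
        # Visit the primary row first so its values become the primary contact details.
        for row in [primary] + [r for r in group if r is not primary]:
            email = (row.get("email") or "").strip().lower()
            if email and email not in seen_emails:
                seen_emails.add(email)
                emails.append(email)
            phone = (row.get("phone") or "").strip()
            digits = re.sub(r"\D", "", phone)
            if digits and digits not in seen_phones:
                seen_phones.add(digits)
                phones.append(phone)
            row["parent_id"] = parent_id  # write the parent ID back to every source row
        parents.append({
            "id": parent_id,
            "name": primary.get("name", ""),
            "company": primary.get("company", ""),
            "primary_email": emails[0] if emails else "",
            "secondary_emails": "; ".join(emails[1:]),
            "primary_phone": phones[0] if phones else "",
            "secondary_phones": "; ".join(phones[1:]),
            "sources": sorted({row["source"] for row in group}),
        })
    return parents


rows = [
    {"source": "candidates", "name": "Jane Smith", "email": "Jane.Smith@example.com", "phone": "+44 20 7946 0000", "company": "PwC"},
    {"source": "clients", "name": "Jane Smith", "email": "jane.smith@example.com", "phone": "07700 900123", "company": "PwC"},
    {"source": "clients", "name": "J. Smith", "email": "j.smith@pwc.example", "phone": "", "company": "PwC"},
    {"source": "clients", "name": "Tom Brown", "email": "tom@example.org", "phone": "", "company": "KPMG"},
]
clusters = [[0, 1, 2], [3]]
parents = merge_clusters(rows, clusters)
for parent in parents:
    print(parent)
```

Keeping `sources` on the parent records whether a person appeared as a candidate, a client contact, or both, which may matter when the records are later split into Contacts and Candidates.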
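For step 2️⃣, one common approach is to reduce each company string to a comparison key, map known aliases such as “PwC” to a canonical name, and hand out one account ID per key while writing that ID back to each person. The alias table, the list of ignored trailing tokens and the `ACC-` ID format below are hypothetical; in practice the alias list would be curated from the actual data.

```python
import re

# Hypothetical alias table mapping comparison keys to canonical account names.
ALIASES = {
    "pwc": "PricewaterhouseCoopers",
    "pricewaterhousecoopers": "PricewaterhouseCoopers",
    "kpmg": "KPMG",
}
# Trailing tokens ignored when comparing names (assumed list).
TRAILING_TOKENS = {"llp", "ltd", "limited", "plc", "inc", "llc", "uk"}


def company_key(raw):
    # Lower-case, drop punctuation, strip trailing legal/country tokens, remove spaces.
    tokens = re.sub(r"[^a-z0-9 ]", " ", (raw or "").lower()).split()
    while tokens and tokens[-1] in TRAILING_TOKENS:
        tokens.pop()
    return "".join(tokens)


def canonical_name(raw):
    return ALIASES.get(company_key(raw), (raw or "").strip())


def build_accounts(people):
    # One account per canonical key; each person gets the matching account_id written back.
    accounts = {}
    for person in people:
        name = canonical_name(person.get("company"))
        if not name:
            person["account_id"] = ""
            continue
        key = company_key(name)
        if key not in accounts:
            accounts[key] = {"account_id": f"ACC-{len(accounts) + 1:06d}", "name": name}
        person["account_id"] = accounts[key]["account_id"]
    return list(accounts.values())


people = [
    {"id": "CAN-000001", "company": "PwC LLP"},
    {"id": "CON-000001", "company": "PricewaterhouseCoopers UK LLP"},
    {"id": "CON-000002", "company": "KPMG"},
    {"id": "CAN-000002", "company": "Grant Thornton"},
    {"id": "CON-000003", "company": ""},
]
accounts = build_accounts(people)
for account in accounts:
    print(account)
```

Names not in the alias table still deduplicate on their key (“Grant Thornton LLP” and “Grant Thornton” would share an account), and people with no company are left unlinked rather than attached to a placeholder account.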
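For the postcode part of step 3️⃣, enrichment cost can be kept down by queuing only records whose postcode is missing or malformed, and by sending each distinct lookup once. The regular expressions are simplified format checks for UK postcodes and US ZIP codes (an assumption about which countries matter), the `(name, city)` lookup key is hypothetical, and the Clay/OpenAI step that would consume the queue is deliberately not shown.

```python
import re

# Simplified format checks (assumption): not a full UK postcode or US ZIP validator.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$")
US_ZIP = re.compile(r"^\d{5}(?:-\d{4})?$")


def has_valid_postcode(value):
    text = (value or "").strip().upper()
    return bool(UK_POSTCODE.match(text) or US_ZIP.match(text))


def enrichment_queue(records):
    # Unique (name, city) lookups for records whose postcode is missing or malformed.
    queue, seen = [], set()
    for record in records:
        if has_valid_postcode(record.get("postcode")):
            continue
        key = (record["name"].strip().lower(), (record.get("city") or "").strip().lower())
        if key not in seen:
            seen.add(key)
            queue.append(key)
    return queue


accounts = [
    {"name": "Northgate Accountants", "city": "London", "postcode": "SW1A 1AA"},
    {"name": "Grant Thornton", "city": "Leeds", "postcode": ""},
    {"name": "Grant Thornton", "city": "Leeds", "postcode": "n/a"},
    {"name": "Acme Audit Inc", "city": "Boston", "postcode": "02110"},
    {"name": "Acme Audit Inc", "city": "Chicago", "postcode": None},
]
print(enrichment_queue(accounts))
```

Five records produce two lookups here; on a deduplicated 100,000-contact base the saving from sending each distinct lookup once is likely to be larger.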
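The job-title half of step 3️⃣ can follow the same low-token pattern: resolve as many distinct titles as possible with free keyword rules and `difflib.get_close_matches`, then batch only the leftovers into one compact prompt. The picklist values, keyword rules (checked in order) and prompt wording are hypothetical, and the code only builds the prompt text; it does not call the OpenAI API.

```python
import re
from difflib import get_close_matches

# Hypothetical picklist; the real values would come from the CRM configuration.
JOB_TITLE_PICKLIST = ["Partner", "Director", "Manager", "Senior Accountant",
                      "Accountant", "Bookkeeper", "Tax Advisor", "Auditor"]

# Assumed keyword rules, checked in order before any fuzzy or LLM step.
KEYWORD_RULES = [
    (r"\bpartner\b", "Partner"),
    (r"\bdirector\b", "Director"),
    (r"\btax\b", "Tax Advisor"),
    (r"\baudit", "Auditor"),
    (r"\bbook-?keep", "Bookkeeper"),
    (r"\bsenior\b.*\baccountant\b", "Senior Accountant"),
    (r"\baccountant\b", "Accountant"),
    (r"\bmanager\b", "Manager"),
]
PICKLIST_BY_LOWER = {value.lower(): value for value in JOB_TITLE_PICKLIST}


def map_title(raw):
    text = (raw or "").strip().lower()
    if not text:
        return None
    for pattern, value in KEYWORD_RULES:
        if re.search(pattern, text):
            return value
    # Catch typos such as "Acountant" with a close string match.
    match = get_close_matches(text, list(PICKLIST_BY_LOWER), n=1, cutoff=0.8)
    return PICKLIST_BY_LOWER[match[0]] if match else None


def normalise_titles(titles):
    # Work on distinct titles only, so each spelling is resolved (or billed) once.
    mapping, unresolved = {}, []
    for title in sorted({t.strip() for t in titles if t and t.strip()}):
        value = map_title(title)
        if value:
            mapping[title] = value
        else:
            unresolved.append(title)
    return mapping, unresolved


def build_llm_prompt(unresolved):
    # One batched, compact prompt for all leftovers; sending it is not shown here.
    options = ", ".join(JOB_TITLE_PICKLIST)
    numbered = "\n".join(f"{i}. {title}" for i, title in enumerate(unresolved, 1))
    return (f"Map each job title to exactly one of: {options}, or NONE.\n"
            f"Reply one per line as 'number: value'.\n{numbered}")


titles = ["Audit Senior", "Senior Tax Manager", "Acountant", "Acountant",
          "Payroll Clerk", "Chief Financial Officer"]
mapping, unresolved = normalise_titles(titles)
print(mapping)
print(build_llm_prompt(unresolved))
```

Skills could be handled the same way against their own picklist. The rule order is a design choice with visible effects: “Senior Tax Manager” lands on Tax Advisor because the tax rule is checked before the manager rule.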
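For step 4️⃣, a sketch of writing the three import files with Python's `csv.DictWriter`. Each row carries its own ID plus the ID of its account, which is what lets the CRM rebuild the relationships on import. The column names, the `type` field separating contacts from candidates and the file names are assumptions; real headers would have to match the target CRM's field names.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column layouts; real headers depend on the target CRM's field names.
ACCOUNT_FIELDS = ["external_id", "name"]
PERSON_FIELDS = ["external_id", "name", "email", "phone", "account_external_id"]


def write_csv(path, rows, fieldnames):
    # The csv module documentation asks for newline="" when opening files for csv writing.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def person_row(person):
    return {
        "external_id": person["id"],
        "name": person["name"],
        "email": person.get("primary_email", ""),
        "phone": person.get("primary_phone", ""),
        "account_external_id": person.get("account_id", ""),
    }


def export_for_import(accounts, people, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    account_rows = [{"external_id": a["account_id"], "name": a["name"]} for a in accounts]
    contacts = [person_row(p) for p in people if p["type"] == "contact"]
    candidates = [person_row(p) for p in people if p["type"] == "candidate"]
    # Accounts are written (and would be imported) first so child rows can reference them.
    write_csv(out / "accounts.csv", account_rows, ACCOUNT_FIELDS)
    write_csv(out / "contacts.csv", contacts, PERSON_FIELDS)
    write_csv(out / "candidates.csv", candidates, PERSON_FIELDS)
    return {"accounts": len(account_rows), "contacts": len(contacts), "candidates": len(candidates)}


accounts = [{"account_id": "ACC-000001", "name": "PricewaterhouseCoopers"},
            {"account_id": "ACC-000002", "name": "KPMG"}]
people = [
    {"id": "CON-000001", "type": "contact", "name": "Jane Smith",
     "primary_email": "jane.smith@example.com", "account_id": "ACC-000001"},
    {"id": "CAN-000001", "type": "candidate", "name": "Tom Brown",
     "primary_email": "tom@example.org", "account_id": "ACC-000002"},
]
with tempfile.TemporaryDirectory() as tmp:
    print(export_for_import(accounts, people, tmp))
```

The returned counts can be compared against what the CRM reports after a test import, which feeds directly into the QC report.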
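The audit/QC report in step 4️⃣ mostly reduces to record counts plus referential-integrity checks. This sketch assumes hypothetical record shapes (`parent_id` on raw source rows, `id` and `account_id` on parent records, `account_id` on accounts) and treats duplicate IDs, raw rows without a valid parent, and links to missing accounts as failures; people with no company are reported but do not fail the check.

```python
from collections import Counter


def qc_report(source_rows, parents, accounts):
    parent_ids = [p["id"] for p in parents]
    account_ids = [a["account_id"] for a in accounts]
    known_parents, known_accounts = set(parent_ids), set(account_ids)
    report = {
        "source_rows": len(source_rows),
        "parent_records": len(parents),
        "accounts": len(accounts),
        # IDs must be unique within each table.
        "duplicate_parent_ids": sorted(i for i, n in Counter(parent_ids).items() if n > 1),
        "duplicate_account_ids": sorted(i for i, n in Counter(account_ids).items() if n > 1),
        # Every raw row should carry a written-back ID that exists among the parents.
        "source_rows_without_parent": sum(
            1 for r in source_rows if r.get("parent_id") not in known_parents),
        # People with no company are listed separately from broken links.
        "people_without_account": sorted(p["id"] for p in parents if not p.get("account_id")),
        "links_to_missing_accounts": sorted(
            p["id"] for p in parents
            if p.get("account_id") and p["account_id"] not in known_accounts),
    }
    report["passed"] = not (report["duplicate_parent_ids"] or report["duplicate_account_ids"]
                            or report["source_rows_without_parent"]
                            or report["links_to_missing_accounts"])
    return report


source_rows = [
    {"row": 1, "parent_id": "CAN-000001"},
    {"row": 2, "parent_id": "CAN-000001"},
    {"row": 3, "parent_id": "CON-000001"},
    {"row": 4, "parent_id": None},
]
parents = [
    {"id": "CAN-000001", "account_id": "ACC-000001"},
    {"id": "CON-000001", "account_id": "ACC-000009"},
    {"id": "CON-000002", "account_id": ""},
]
accounts = [{"account_id": "ACC-000001"}]
report = qc_report(source_rows, parents, accounts)
for key, value in report.items():
    print(f"{key}: {value}")
```

The same checks could be rerun on an export taken from the CRM after the test and full imports, so the sign-off compares what actually landed rather than what was sent.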