Contact Database Restructure
Cleaned and restructured a contact database that had accumulated duplicates and inconsistencies over eight years, enabling more effective stakeholder communication.
Client
Regional Arts Council
Role
Database Administrator
Duration
6 weeks
Tools
PostgreSQL, Python, Mailchimp, Google Sheets
Overview
The Regional Arts Council had been building their contact database for eight years across multiple staff members and systems. The result was over 15,000 records with significant duplication, inconsistent formatting, and unclear categorisation. Email campaigns were bouncing at 18%, and staff couldn't trust the data for funding reports.
Constraints
Time
Six weeks to complete before the autumn funding cycle
Budget
Non-profit budget requiring cost-effective solutions
Team
Two staff members available for validation work
Platform
Needed to remain compatible with existing Mailchimp integration
Approach
Exported all data and ran an initial analysis to understand the scale of the problem—duplicate detection, email validation, and field consistency checks.
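The first-pass analysis can be sketched as a small script over the exported records. This is a minimal illustration, not the actual scripts: the field name `email` and the simple regex are assumptions, and the real export had many more columns.

```python
# Hypothetical first-pass audit: count duplicate, invalid, and missing emails.
import re
from collections import Counter

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def analyse(records):
    """Summarise duplicate emails, invalid emails, and blank email fields."""
    emails = [r.get("email", "").strip().lower() for r in records]
    counts = Counter(e for e in emails if e)
    return {
        "total": len(records),
        # Each group of n identical emails contributes n-1 probable duplicates.
        "duplicate_emails": sum(c - 1 for c in counts.values() if c > 1),
        "invalid_emails": sum(1 for e in emails if e and not EMAIL_RE.match(e)),
        "missing_emails": sum(1 for e in emails if not e),
    }

sample = [
    {"email": "a@example.org"},
    {"email": "A@example.org "},  # duplicate after trim + lowercase
    {"email": "not-an-email"},
    {"email": ""},
]
print(analyse(sample))
```

Running a summary like this before touching any records is what made the 18% bounce rate traceable to concrete, countable problems.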
Built Python scripts to standardise formatting (phone numbers, postcodes, organisation names) and flag probable duplicates using fuzzy matching on names and email addresses.
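The standardisation and duplicate-flagging step, sketched below using only the standard library's `difflib` (the real scripts may well have used a dedicated fuzzy-matching package). The UK postcode formatting, the field names, and the 0.85 threshold are all illustrative assumptions.

```python
# Sketch: normalise phone/postcode fields and flag probable duplicate contacts.
import re
from difflib import SequenceMatcher

def normalise_phone(raw):
    """Strip everything except digits and '+'."""
    return re.sub(r"[^\d+]", "", raw)

def normalise_postcode(raw):
    """Uppercase and insert the single space before the final three characters."""
    compact = re.sub(r"\s+", "", raw).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact

def likely_duplicates(contacts, threshold=0.85):
    """Flag pairs with matching emails, or names above a similarity threshold."""
    flagged = []
    for i, a in enumerate(contacts):
        for b in contacts[i + 1:]:
            if a["email"] and a["email"].lower() == b["email"].lower():
                flagged.append((a["name"], b["name"], 1.0))
                continue
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                flagged.append((a["name"], b["name"], round(score, 2)))
    return flagged

print(normalise_postcode("sw1a1aa"))
contacts = [
    {"name": "Jane Smith", "email": "jane@arts.org"},
    {"name": "Jane Smyth", "email": ""},
]
print(likely_duplicates(contacts))
```

Note the pairwise loop is O(n²); at 15,000 records a production run would block on a cheap key first (e.g. exact email or postcode match) before fuzzy-comparing names.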
Created a validation workflow where staff reviewed flagged records in batches, making merge decisions for duplicates and confirming categorisation updates.
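The review workflow amounted to feeding flagged records to staff in manageable chunks and recording a decision per pair. A stripped-down sketch, with the batch size and decision values purely illustrative:

```python
# Hypothetical batch-review loop: group flagged pairs into fixed-size batches
# so two reviewers can work through them in sittings.
def batches(flagged_pairs, size=50):
    """Yield review batches of at most `size` flagged pairs."""
    for start in range(0, len(flagged_pairs), size):
        yield flagged_pairs[start:start + size]

def record_decisions(batch, decide):
    """Apply a reviewer's decision function to every pair in a batch."""
    return [{"pair": pair, "decision": decide(pair)} for pair in batch]

flagged = [(f"contact-{i}", f"contact-{i}-dup") for i in range(120)]
review_queue = list(batches(flagged, size=50))
print([len(b) for b in review_queue])
```

Keeping the decision a per-pair record, rather than mutating the database directly, meant merges could be audited and reversed if a reviewer got one wrong.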
Redesigned the database schema with proper constraints and validation rules to prevent the same problems recurring.
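The production schema was PostgreSQL; the sketch below uses SQLite purely so it is self-contained, and the column names and category list are made up. The point it illustrates is the same: uniqueness and format rules enforced by the database itself, not by staff convention.

```python
# Constraints reject bad data at insert time instead of letting it accumulate.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT UNIQUE
            CHECK (email LIKE '%_@_%._%'),
        category TEXT NOT NULL
            CHECK (category IN ('artist', 'funder', 'venue', 'press'))
    )
""")
conn.execute(
    "INSERT INTO contacts (name, email, category) VALUES (?, ?, ?)",
    ("Jane Smith", "jane@arts.org", "artist"),
)

# A malformed email is now rejected rather than silently stored.
try:
    conn.execute(
        "INSERT INTO contacts (name, email, category) VALUES (?, ?, ?)",
        ("Bad Row", "not-an-email", "artist"),
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

In PostgreSQL the equivalent would be `CHECK` and `UNIQUE` constraints plus, where warranted, a trigger to normalise formatting on write.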
Outcome
- Record count reduced from over 15,000 to 9,200 after deduplication
- Email bounce rate dropped from 18% to under 2%
- Staff confidence in data accuracy restored
- Funding report generation time reduced significantly
The restructured database gave the council reliable data for the first time in years. The validation rules I implemented mean it should stay clean going forward.
Reflection
What worked
- Automated detection combined with human validation caught edge cases that pure automation would have missed
- Involving staff in the validation process built trust in the final result
- Adding constraints to the schema prevents future data quality issues
What didn't
- Initial duplicate thresholds were too aggressive and flagged legitimate separate contacts
- Some historical categorisation decisions were impossible to understand without context that no longer existed
What I would do differently
I would have built the validation interface earlier in the process. Staff feedback on the first batch would have informed better automation rules for the rest.