Contact Database Restructure
Cleaned and restructured a contact database that had accumulated duplicates and inconsistencies over eight years, enabling more effective stakeholder communication.
Client
Regional Arts Council
Role
Database Administrator
Duration
6 weeks
Tools
PostgreSQL, Python, Mailchimp, Google Sheets
Overview
The Regional Arts Council had been building their contact database for eight years across multiple staff members and systems. The result was over 15,000 records with significant duplication, inconsistent formatting, and unclear categorisation. Email campaigns were bouncing at 18%, and staff couldn't trust the data for funding reports.
Constraints
Time
Six weeks to complete before the autumn funding cycle
Budget
Non-profit budget requiring cost-effective solutions
Team
Two staff members available for validation work
Platform
Needed to remain compatible with existing Mailchimp integration
Approach
Exported all data and ran an initial analysis to understand the scale of the problem—duplicate detection, email validation, and field consistency checks.
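The first-pass analysis can be sketched as a small script over the exported records. This is a minimal illustration, not the actual scripts: the field name `email` and the simple regex are assumptions, and the real export had many more columns.

```python
# Hypothetical first-pass audit: count duplicate, invalid, and missing emails.
import re
from collections import Counter

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def analyse(records):
    """Summarise duplicate emails, invalid emails, and blank email fields."""
    emails = [r.get("email", "").strip().lower() for r in records]
    counts = Counter(e for e in emails if e)
    return {
        "total": len(records),
        # Each group of n identical emails contributes n-1 probable duplicates.
        "duplicate_emails": sum(c - 1 for c in counts.values() if c > 1),
        "invalid_emails": sum(1 for e in emails if e and not EMAIL_RE.match(e)),
        "missing_emails": sum(1 for e in emails if not e),
    }

sample = [
    {"email": "a@example.org"},
    {"email": "A@example.org "},  # duplicate after trim + lowercase
    {"email": "not-an-email"},
    {"email": ""},
]
print(analyse(sample))
```

Running a summary like this before touching any records is what made the 18% bounce rate traceable to concrete, countable problems.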
Built Python scripts to standardise formatting (phone numbers, postcodes, organisation names) and flag probable duplicates using fuzzy matching on names and email addresses.
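The standardisation and duplicate-flagging step, sketched below using only the standard library's `difflib` (the real scripts may well have used a dedicated fuzzy-matching package). The UK postcode formatting, the field names, and the 0.85 threshold are all illustrative assumptions.

```python
# Sketch: normalise phone/postcode fields and flag probable duplicate contacts.
import re
from difflib import SequenceMatcher

def normalise_phone(raw):
    """Strip everything except digits and '+'."""
    return re.sub(r"[^\d+]", "", raw)

def normalise_postcode(raw):
    """Uppercase and insert the single space before the final three characters."""
    compact = re.sub(r"\s+", "", raw).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact

def likely_duplicates(contacts, threshold=0.85):
    """Flag pairs with matching emails, or names above a similarity threshold."""
    flagged = []
    for i, a in enumerate(contacts):
        for b in contacts[i + 1:]:
            if a["email"] and a["email"].lower() == b["email"].lower():
                flagged.append((a["name"], b["name"], 1.0))
                continue
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                flagged.append((a["name"], b["name"], round(score, 2)))
    return flagged

print(normalise_postcode("sw1a1aa"))
contacts = [
    {"name": "Jane Smith", "email": "jane@arts.org"},
    {"name": "Jane Smyth", "email": ""},
]
print(likely_duplicates(contacts))
```

Note the pairwise loop is O(n²); at 15,000 records a production run would block on a cheap key first (e.g. exact email or postcode match) before fuzzy-comparing names.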
Created a validation workflow where staff reviewed flagged records in batches, making merge decisions for duplicates and confirming categorisation updates.
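The review workflow amounted to feeding flagged records to staff in manageable chunks and recording a decision per pair. A stripped-down sketch, with the batch size and decision values purely illustrative:

```python
# Hypothetical batch-review loop: group flagged pairs into fixed-size batches
# so two reviewers can work through them in sittings.
def batches(flagged_pairs, size=50):
    """Yield review batches of at most `size` flagged pairs."""
    for start in range(0, len(flagged_pairs), size):
        yield flagged_pairs[start:start + size]

def record_decisions(batch, decide):
    """Apply a reviewer's decision function to every pair in a batch."""
    return [{"pair": pair, "decision": decide(pair)} for pair in batch]

flagged = [(f"contact-{i}", f"contact-{i}-dup") for i in range(120)]
review_queue = list(batches(flagged, size=50))
print([len(b) for b in review_queue])
```

Keeping the decision a per-pair record, rather than mutating the database directly, meant merges could be audited and reversed if a reviewer got one wrong.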
Redesigned the database schema with proper constraints and validation rules to prevent the same problems recurring.
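The production schema was PostgreSQL; the sketch below uses SQLite purely so it is self-contained, and the column names and category list are made up. The point it illustrates is the same: uniqueness and format rules enforced by the database itself, not by staff convention.

```python
# Constraints reject bad data at insert time instead of letting it accumulate.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT UNIQUE
            CHECK (email LIKE '%_@_%._%'),
        category TEXT NOT NULL
            CHECK (category IN ('artist', 'funder', 'venue', 'press'))
    )
""")
conn.execute(
    "INSERT INTO contacts (name, email, category) VALUES (?, ?, ?)",
    ("Jane Smith", "jane@arts.org", "artist"),
)

# A malformed email is now rejected rather than silently stored.
try:
    conn.execute(
        "INSERT INTO contacts (name, email, category) VALUES (?, ?, ?)",
        ("Bad Row", "not-an-email", "artist"),
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

In PostgreSQL the equivalent would be `CHECK` and `UNIQUE` constraints plus, where warranted, a trigger to normalise formatting on write.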
Outcome
- Record count reduced from over 15,000 to 9,200 after deduplication
- Email bounce rate dropped from 18% to under 2%
- Staff confidence in data accuracy restored
- Funding report generation time reduced significantly
The restructured database gave the council reliable data for the first time in years. The validation rules I implemented mean it should stay clean going forward.
Reflection
What worked
- Automated detection combined with human validation caught edge cases that pure automation would have missed
- Involving staff in the validation process built trust in the final result
- Adding constraints to the schema prevents future data quality issues
What didn't
- Initial duplicate thresholds were too aggressive and flagged legitimate separate contacts
- Some historical categorisation decisions were impossible to understand without context that no longer existed
What I would do differently
I would have built the validation interface earlier in the process. Staff feedback on the first batch would have informed better automation rules for the rest.