Specializing in Enterprise Data Warehousing
and Business Intelligence since 1991.
Current State of Data Quality
What is data quality?
Dirty data - How did it happen?
Major cause for data deficiencies
Problems with current development approach
Symptoms of poor-quality data
Impact of poor-quality data
Cost of poor-quality data
BI proliferation of data quality problems
Data Quality Best Practices
Data uniqueness
Data relationships
Generalization and sub-typing
Data domains
Data dependencies
Data completeness
Data accuracy and precision
Data consistency
Naming standards
Common Data Quality Violations
Dirty data categories
Dummy (default) values
"Intelligent" dummy values
Missing values
Multi-purpose fields
Cryptic values
Free-form address lines
Contradicting values
Violation of business rules
Reused primary key
Non-unique primary key
Missing data relationships
Inappropriate data relationships
Data Quality Improvement Practices
To cleanse or not to cleanse…
Source data profiling
Categorization by data significance
Data cleansing triage (prioritization)
Operational source data repairs
Data defect prevention
Data quality training
Continuous data quality improvement
Management support and sponsorship
Enterprise-wide Data Quality Disciplines
DQ maturity levels
Data quality improvement steps
Data stewardship
Data ownership
Standards and procedures
Enterprise architecture
Enterprise data model
Logical versus physical data models
Cross-organizational development approach
Coordinated ETL staging
ETL reconciliation
Responsibility for Data Quality
Data owners
Data stewards
Business representatives
Data administrator
Data quality analyst
Meta data administrator
Database administrator
Enterprise information architect
Developers
Project manager
Support personnel
Auditor
Organizational Changes and Organizational Impact
Inevitable culture shift
Increased user role
Accountability for data quality
New charge-back structure
New incentives
New leadership