data quality
scroll ↓ to Resources
Note
- Dimensions
  - accuracy
  - completeness
  - consistency
  - timeliness
  - validity
  - Databricks expectations platform
- Issues
  - missing data
  - duplication
  - freshness
  - personal data
  - data labeling mistakes
- Key topics
  - metadata
  - real-time processing
  - lineage
  - unstructured data
  - leakage
  - CI/CD
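A minimal sketch of how some of the dimensions and issues above (completeness, duplication, freshness, validity) could be scored as simple ratios. It uses plain Python rather than the Databricks expectations API; the sample records, field names, the 24-hour freshness window, and the email pattern are all illustrative assumptions, not part of these notes.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical sample records; field names are illustrative only.
NOW = datetime(2024, 1, 10, tzinfo=timezone.utc)
records = [
    {"id": 1, "email": "a@example.com", "updated_at": NOW - timedelta(hours=2)},
    {"id": 2, "email": None, "updated_at": NOW - timedelta(hours=30)},
    {"id": 2, "email": "b@example.com", "updated_at": NOW - timedelta(hours=1)},
    {"id": 3, "email": "not-an-email", "updated_at": NOW - timedelta(hours=5)},
]

# Deliberately loose email pattern (an assumption, not a full validator).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def duplicate_rate(rows, key):
    """Share of rows whose `key` value already appeared in an earlier row."""
    seen, dupes = set(), 0
    for r in rows:
        if r[key] in seen:
            dupes += 1
        seen.add(r[key])
    return dupes / len(rows)


def freshness(rows, field, now, max_age):
    """Share of rows whose timestamp is within `max_age` of `now`."""
    return sum(now - r[field] <= max_age for r in rows) / len(rows)


def validity(rows, field, pattern):
    """Share of non-null values that match `pattern` (1.0 if none exist)."""
    values = [r[field] for r in rows if r[field] is not None]
    if not values:
        return 1.0
    return sum(bool(pattern.match(v)) for v in values) / len(values)


report = {
    "completeness(email)": completeness(records, "email"),
    "duplicate_rate(id)": duplicate_rate(records, "id"),
    "freshness(24h)": freshness(records, "updated_at", NOW, timedelta(hours=24)),
    "validity(email)": validity(records, "email", EMAIL_RE),
}

for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Keeping each dimension as a ratio between 0 and 1 makes it easy to attach a threshold per check later; accuracy and consistency are left out because they need a reference dataset or cross-table rules to compare against.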
Data Quality management lifecycle
- Discovery
  - Talk to stakeholders: business users, DS/DE, executives
  - Requirement gathering: surveys, workshops, interviews, etc.
- Rule setting
  - business rules
  - semantic rules
  - industry
  - compliance
- Data Profiling
  - trend identification and outlier detection
  - metadata management
- Cleansing & Standardization
- Monitoring & Reporting
  - data lakehouse monitoring in Databricks
  - data drift
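A rough sketch of two steps from the lifecycle above: outlier detection during profiling and data drift during monitoring. The choice of Tukey's IQR fences and the Population Stability Index (PSI) is an assumption made for illustration; these notes do not name a method, and this is not how Databricks lakehouse monitoring is implemented. The function names, bin count, and sample values are hypothetical.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def psi(baseline, current, bins=5, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are taken from the baseline's range; values outside that
    range are clipped into the first or last bin. `eps` replaces empty
    bin shares so the logarithm stays defined.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(sample), eps) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Profiling: one reading is far away from the rest.
readings = [10, 11, 12, 11, 10, 12, 11, 95]
outliers = iqr_outliers(readings)

# Monitoring: compare a new batch against a stored baseline.
baseline = list(range(100))
shifted = [v + 40 for v in baseline]
same_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)

print("outliers:", outliers)
print("PSI, identical batch:", same_score)
print("PSI, shifted batch:", round(drift_score, 2))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the cutoff should be tuned per dataset rather than taken as fixed.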
Resources
- Monitoring Data Quality at Scale with Statistical Modeling | Uber Blog
- Learn Practical Techniques for Applying Data Quality in the Lakehouse with Databricks