I did lots of work with metadata, especially in the corporate laboratory space. I had never heard the term 'Lineage Metadata', though the idea was often considered in our work. I now think it is more important than ever if we are to create useful predictions and make future maintenance of any automated systems that emerge more accurate. Lineage predicts changing context. At the very least, lineage can determine what errors might exist in the data, but in reality it should provide much more in AI, and it should always be considered. Lineage can also be considered part of data asset value, predicting the stability of that value in context. It should also be included in any semantic representation of the data involved.
I like that this is presented here. An excerpt is below, with much more at the link. - FAD
Lineage Metadata: The Fuel for Data Governance, in InformationWeek
Moshe Kranc is the chief technology officer at Ness Digital Engineering
The best way to achieve data quality is by combining or blending these three techniques: decoded lineage, data similarity lineage and manual lineage mapping.
Enterprises aspire to derive insights from data that can provide a competitive advantage. The most common impediment to achieving this goal is poor data quality. If the data that is being input to a predictive algorithm is “dirty” (with missing or invalid values), then any insights produced by that algorithm cannot be trusted.
To achieve data quality, it’s not enough to clean up the existing historical data. You also need to ensure that all newly generated data is clean by instituting a set of capabilities and processes known collectively as data governance. In a governed data environment, each type of data has a data steward who is responsible for defining and enforcing criteria for data cleanliness. And, each data value has a clearly defined lineage: We know where it came from, what transformations it underwent along the way, and what other data items are derived from this data value.
Data lineage provides an enterprise with many benefits:
The ability to perform impact analysis and root-cause analysis, by tracing lineage backwards (to find all data that influenced the current data) or forwards (to identify all other data that is impacted by the current data) from a given data item;
Standardization of the business vocabulary and terminology, which facilitates clear communication across business units;
Ownership, responsibility and traceability for any changes made to data, thanks to the lineage’s comprehensive record of who made what changes and when.
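The backward and forward tracing described in the first benefit can be sketched as a walk over a directed graph of derivations. This is only an illustration under stated assumptions: the `LineageGraph` class, its method names, and the dataset names are all hypothetical and do not come from the article.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage graph: an edge means 'target is derived from source'."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> items derived from it
        self._upstream = defaultdict(set)    # target -> items it was derived from

    def add_derivation(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _walk(self, start, edges):
        # Breadth-first traversal collecting every item reachable from start.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def root_causes(self, item):
        """Trace backwards: all data that influenced this item."""
        return self._walk(item, self._upstream)

    def impacted(self, item):
        """Trace forwards: all data that this item feeds into."""
        return self._walk(item, self._downstream)


# Illustrative dataset names (assumptions, not from the article).
graph = LineageGraph()
graph.add_derivation("crm.orders", "staging.orders_clean")
graph.add_derivation("staging.orders_clean", "report.monthly_revenue")
graph.add_derivation("erp.prices", "report.monthly_revenue")

upstream_of_report = graph.root_causes("report.monthly_revenue")
downstream_of_orders = graph.impacted("crm.orders")
```

Here `root_causes` answers "what could have corrupted this report?" and `impacted` answers "what breaks if this source changes?". A real governance tool would persist the graph and attach transformations to each edge.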
It sounds great, but where does data lineage information come from? Looking at a specific data value in the database tells us its current value, but it will not provide information about how the data evolved into its current value. What is missing is data about the data (lineage metadata) that automatically remembers the time and source of every change made to every data item, whether the change was made by software or by a human database administrator.
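As a rough sketch of what "remembering the time and source of every change" might look like, the snippet below wraps a key-value store so that each write appends a change record. The `ChangeRecord` and `TrackedStore` names, the field layout, and the sample item and source labels are assumptions made for illustration, not anything specified in the article.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One lineage metadata entry (hypothetical layout)."""
    item: str
    old_value: object
    new_value: object
    changed_by: str       # software job or human administrator
    changed_at: datetime  # UTC timestamp of the change


class TrackedStore:
    """Key-value store that records who changed what, and when."""

    def __init__(self):
        self._values = {}
        self._history = []

    def set(self, item, value, changed_by):
        # Record the change before applying it, so the old value is captured.
        record = ChangeRecord(
            item=item,
            old_value=self._values.get(item),
            new_value=value,
            changed_by=changed_by,
            changed_at=datetime.now(timezone.utc),
        )
        self._history.append(record)
        self._values[item] = value

    def get(self, item):
        return self._values[item]

    def history(self, item):
        """Return every recorded change for one data item, oldest first."""
        return [r for r in self._history if r.item == item]


# Illustrative usage: one change by software, one by a human.
store = TrackedStore()
store.set("customer.42.email", "a@example.com", changed_by="etl_job:nightly_load")
store.set("customer.42.email", "b@example.com", changed_by="dba:jsmith")

email_history = store.history("customer.42.email")
```

The current value alone (`store.get(...)`) says nothing about how it got there; `email_history` is the lineage metadata that fills that gap.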
There are three competing techniques for collecting lineage metadata, each of which has its strengths and weaknesses: ....