Friday, January 08, 2021

Data Lineage Tools

This topic was brought to my attention. I don't buy that these are the best, but they are reasonable examples that I will examine, especially with regard to impact and potential risk to the outcome of business processes. We built our own back then; the ones here are prebuilt. Lineage is a key part of the intelligence behind data and its use.

 " .... Data lineage tools document the flow of data in and out of organization systems. They capture end-to-end lineage and ensure impact and risk analysis can be performed in the event of problems or changes to data assets as they move across pipelines.  ... "
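To make the impact-analysis idea concrete, here is a minimal sketch of my own, not taken from any of the tools below. It treats lineage as a directed graph of data assets and walks downstream from a changed asset to list everything that could be affected. The asset names and the `impacted_assets` helper are hypothetical, invented purely for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "warehouse.sales_fact"),
    ("staging.orders", "warehouse.sales_fact"),
    ("warehouse.sales_fact", "reports.weekly_revenue"),
]


def build_graph(edges):
    """Map each asset to the set of assets that read directly from it."""
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
    return downstream


def impacted_assets(edges, changed):
    """Return every asset downstream of `changed`, found breadth-first."""
    graph = build_graph(edges)
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # The changed asset itself is not reported as "impacted".
    return sorted(seen - {changed})


if __name__ == "__main__":
    print(impacted_assets(EDGES, "erp.orders"))
```

A real lineage tool would harvest these edges automatically from pipelines and query logs rather than from a hand-written list, but the downstream walk is the core of what "impact analysis" means in the quote above.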

The 8 Best Open-Source Data Lineage Tools to Consider   by 7wData

Searching for data integration and data management software can be a daunting (and expensive) process, one that requires long hours of research and deep pockets. The most popular enterprise data lineage tools often provide more than what’s necessary for non-enterprise organizations, with advanced functionality relevant to only the most technically savvy users. Thankfully, there is a distinct group of open-source data lineage tools out there. Some of these solutions are offered by vendors looking to eventually sell you on their enterprise product, and others are maintained and operated by a community of developers looking to democratize the process.

In this article, we will examine the best open-source data lineage tools, first by providing a brief overview of what to expect and also with short blurbs about each of the currently available options in the space. This is the most complete and up-to-date directory on the web.

Apatar

Apatar is a free and open-source data integration software package designed to help business users and developers move data in and out of a variety of data sources and formats. The tool requires no programming or design to accomplish even complex integration with joins across several data sources. Apatar provides a visual interface to minimize the impact of system changes. The tool comes with a pre-built set of integration tools and enables users to re-use previously built mapping schemas as well.

CloverETL

CloverETL (now CloverDX) was one of the first open-source ETL tools. The Java-based data integration framework was designed to transform, map, and manipulate data in various formats. CloverETL can be used standalone or embedded and connects to RDBMS, JMS, SOAP, LDAP, S3, HTTP, FTP, ZIP, and TAR. Though the product is no longer offered by the provider, it can still be downloaded from SourceForge. CloverDX also continues to support CloverETL in line with its standard support agreement.

Dremio

Dremio offers a product called Data Lake Engine that provides fast queries and a self-service semantic layer that operates directly against data lake storage. The solution connects to S3, ADLS, Hadoop or wherever enterprise data resides. Apache Arrow, Data Reflections and other Dremio technologies work together to speed up queries, and the semantic layer enables IT to apply security and business meaning. Users do not have to send data to Dremio or have it stored in proprietary formats to access it. .... (more below) ....
