
Thursday, July 11, 2019

Integrating Metadata

The latest Communications of the ACM has an interesting piece on transferring data between systems. It is notable for mentioning the metadata ultimately involved ....

Extract, Shoehorn, and Load  By Pat Helland
Communications of the ACM, July 2019, Vol. 62 No. 7, Pages 32-33, 10.1145/333113

"A lot of data is moved from system to system in an important and increasing part of the computing landscape. This is traditionally known as ETL (extract, transform, and load). While many systems are extremely good at this process, the source for the extraction and the destination for the load frequently have different representations for their data. It is common for this transformation to squeeze, truncate, or pad the data to make it fit into the target. This is really like using a shoehorn to fit into a shoe that is too small. Sometimes it's a needed step. Frequently it's a real pain!

Two major parts of ETL are the extraction and the load. These processes are where the rubber meets the participating data stores.

Extraction pulls data out of a source system. This may be relational data kept in a database. If so, it may be converted to an object relational format where each object transforms the join of multiple relational rows into a cohesive thing. Data is frequently organized as messages when it is sucked out. It's also common for data to be extracted from key-value stores where it is kept in a semi-structured representation.

Load happens when the data is placed into the target system. The target will have its own metadata describing the shape and form of the data in its belly. If the target is an analytics system, then its data will likely be loaded into a relational form.

While it may be counterintuitive, it is frequently useful to take relational data out of a system as objects; convert, massage, and shoehorn the data from one object representation to another; and load it into the target system in relational form.  .... "
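To make the excerpt's flow concrete, here is a minimal sketch of my own, not code from Helland's article. It pulls relational rows out of a source as objects, shoehorns each object to fit the target, and loads the result back into relational form. The table names (customer, purchase, customer_summary), the NAME_WIDTH limit, and the function names are all hypothetical, chosen only for illustration, and both stores are in-memory SQLite databases.

```python
import sqlite3

# Assumed fixed width of the target's name column (hypothetical target metadata).
NAME_WIDTH = 8


def extract(conn):
    """Join relational rows into one cohesive object (a dict) per customer."""
    rows = conn.execute(
        "SELECT c.id, c.name, p.item FROM customer c "
        "LEFT JOIN purchase p ON p.customer_id = c.id "
        "ORDER BY c.id, p.id"
    )
    objects = {}
    for cid, name, item in rows:
        obj = objects.setdefault(cid, {"id": cid, "name": name, "items": []})
        if item is not None:
            obj["items"].append(item)
    return list(objects.values())


def shoehorn(obj):
    """Truncate or pad the name to the target width and squeeze items to a count."""
    name = obj["name"][:NAME_WIDTH].ljust(NAME_WIDTH)
    return {"customer_id": obj["id"], "name": name, "item_count": len(obj["items"])}


def load(conn, records):
    """Place the reshaped records into the target in relational form."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, item_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customer_summary VALUES (:customer_id, :name, :item_count)",
        records,
    )
    conn.commit()


def demo():
    """Run the whole pipeline on a tiny made-up data set."""
    source = sqlite3.connect(":memory:")
    source.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
        INSERT INTO customer VALUES (1, 'Al'), (2, 'Bartholomew');
        INSERT INTO purchase VALUES (1, 1, 'boots'), (2, 1, 'laces'), (3, 2, 'sandals');
        """
    )
    target = sqlite3.connect(":memory:")
    load(target, [shoehorn(obj) for obj in extract(source)])
    return target.execute(
        "SELECT customer_id, name, item_count FROM customer_summary "
        "ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

In this sketch the shoehorn step is lossy on purpose: 'Bartholomew' is cut to fit the assumed eight-character column and the individual purchases collapse into a count, which is the kind of squeeze the excerpt describes.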
