
Saturday, October 01, 2022

Teradata's Data Lakehouse

Do I need a Lakehouse?

Data Infrastructure by Teradata

Last week Teradata offered its long-awaited response to the emergence of the data lakehouse. As VentureBeat’s George Lawton reported, Teradata has always differentiated itself by stretching the capabilities of analytics, first with massively parallel processing on its own specialized machines, and more recently with software-defined appliances tuned for variations in workloads, from compute-intensive to IOPS (input/output operations per second)-intensive. And since the acquisition of Aster Data Systems over a decade ago, Teradata has morphed from solving big analytics problems to solving any analytics problem, with a diverse portfolio of analytic libraries stretching SQL to new areas such as path or graph analytics.

With the cloud, we’ve been waiting to see when Teradata would fully exploit cloud object storage, the de facto data lake. So last week’s dual announcements of VantageCloud Lake Edition and ClearScape Analytics were logical next steps on Teradata’s journey to the data lakehouse. Teradata is finally making cloud object storage a first-class citizen and opening it up to its wide analytics portfolio.


But unlike Teradata’s previous moves to parallelized and polyglot analytics, where it led the field, this time, with the lakehouse, it has company. The announcement might not have used the word “lakehouse,” but that’s what it was all about. As we noted several months back, almost everyone in the data world, including Oracle, Teradata, Cloudera, Talend, Google, HPE, Fivetran, AWS, Dremio and even Snowflake, has felt compelled to respond to Databricks, which introduced the data lakehouse.

Teradata’s path to the data lakehouse

Nonetheless, Teradata approaches the data lakehouse with some unique twists, and its approach is all about optimization. Teradata’s secret sauce has always been highly optimized compute, interconnects, storage and query engines, along with workload management designed to run compute resources at up to 95% utilization. When commodity hardware got good enough, Teradata introduced IntelliFlex, where performance and optimizations could be configured through software. The capability to optimize for hardware not invented here opened the door to Teradata optimizing for AWS and, down the road, the other hyperscalers.


Teradata introduced VantageCloud a year ago, and late last year ran a 1,000+ node benchmark that no other cloud analytics provider has so far matched. But this was for a more conventional data warehouse using customary block storage.

The complication in making the lakehouse happen was developing a table format for data sitting in cloud object storage. A table format provides the niceties associated with data warehouses: ACID transactions, which are key to ensuring data consistency; more granular security and access controls; and raw performance. Databricks fired the first shot with Delta Lake, and more recently other providers, from Snowflake to Cloudera, have embraced Apache Iceberg; the common thread is that both formats are open source. For Lake Edition, Teradata went its own way with its own data lake table format, which the company claims delivers superior performance compared to Delta and Iceberg.
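To make the ACID point concrete, here is a minimal sketch in Python with PySpark of what an open table format such as Delta Lake adds on top of raw object storage. This is generic Delta Lake usage, not Teradata’s proprietary format; the bucket path, table and column names are hypothetical, and the delta-spark package is assumed to be on the classpath.

# A minimal sketch (not Teradata's format) of what a table format adds
# on top of raw object storage, using Delta Lake with PySpark.
# The bucket path, table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("lakehouse-table-format-sketch")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# A transactional table whose underlying files live in cheap object storage.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id BIGINT, amount DOUBLE, updated TIMESTAMP
    ) USING DELTA LOCATION 's3a://example-bucket/lake/sales'
""")

# Stage some incoming changes as a temporary view.
updates = (spark.createDataFrame([(1, 42.0)], ["order_id", "amount"])
           .withColumn("updated", F.current_timestamp()))
updates.createOrReplaceTempView("updates")

# MERGE commits atomically against the table's transaction log: readers
# see either the old snapshot or the new one, never a half-applied
# update -- the ACID guarantee that plain files on object storage lack.
spark.sql("""
    MERGE INTO sales t
    USING updates u
    ON t.order_id = u.order_id
    WHEN MATCHED THEN UPDATE SET amount = u.amount, updated = u.updated
    WHEN NOT MATCHED THEN INSERT *
""")

The atomic MERGE is the essence of the lakehouse pitch: warehouse-style consistency over files in a lake, whichever table format sits underneath.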

The other side of the lakehouse coin is software. Aside from its SQL engine, which has been designed to handle large, complex queries that can join hundreds of tables, Teradata has a large portfolio of analytic libraries that run in-database. This has been one of Teradata’s best-kept secrets. Largely the legacy of the Aster Data acquisition over a decade ago, these analytics were specially tuned to exploit the underlying parallelism, and they went well beyond SQL, encompassing functions such as nPath, graph and time series analysis, and machine learning, all accessed through SQL extensions. ...
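As an illustration of how those Aster-heritage functions are reached through SQL extensions, here is a hedged sketch in Python using the teradatasql driver to run an nPath query. The host, credentials, clickstream table, its columns, and the pattern are all hypothetical, and the nPath clause follows the general shape of published Vantage examples rather than any specific release.

# Hypothetical sketch of invoking an in-database path analytic via SQL.
# Connection details and the clickstream schema are invented for
# illustration; only the teradatasql driver and nPath function are real.
import teradatasql

with teradatasql.connect(host="vantage.example.com",
                         user="demo_user", password="demo_pass") as con:
    with con.cursor() as cur:
        # Find sessions that go home -> one or more searches -> checkout.
        cur.execute("""
            SELECT * FROM nPath (
                ON clickstream PARTITION BY user_id ORDER BY event_ts
                USING
                    Mode (NONOVERLAPPING)
                    Pattern ('H.S+.C')
                    Symbols (page = 'home'     AS H,
                             page = 'search'   AS S,
                             page = 'checkout' AS C)
                    Result (FIRST (user_id OF H) AS user_id,
                            COUNT (* OF ANY (H, S, C)) AS path_length)
            ) AS dt
        """)
        for row in cur.fetchall():
            print(row)

The point of the pattern-matching syntax is that the path analysis runs in-database, parallelized across the same nodes that hold the data, rather than being pulled out into a separate tool.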
