Showing posts with label BenchMarking. Show all posts

Monday, May 08, 2023

DIAMETRICS: Benchmarking Query Engines at Scale

Benchmarking and Performance Monitoring

DIAMETRICS: Benchmarking Query Engines at Scale

By Shaleen Deep, Anja Gruenheid, Kruthi Nagaraj, Hiro Naito, Jeff Naughton, Stratis Viglas

Communications of the ACM, December 2022, Vol. 65 No. 12, Pages 105-112

DOI: 10.1145/3567464

This paper introduces DIAMETRICS: a novel framework for end-to-end benchmarking and performance monitoring of query engines. DIAMETRICS consists of a number of components supporting tasks such as automated workload summarization, data anonymization, benchmark execution, monitoring, regression identification, and alerting. The architecture of DIAMETRICS is highly modular and supports multiple systems by abstracting their implementation details and relying on common canonical formats and pluggable software drivers. The end result is a powerful unified framework that is capable of supporting every aspect of benchmarking production systems and workloads. DIAMETRICS has been developed in Google and is being used to benchmark various internal query engines. In this paper, we give an overview of DIAMETRICS and discuss its design and implementation. Furthermore, we provide details about its deployment and example use cases. Given the variety of supported systems and use cases within Google, we argue that its core concepts can be used more widely to enable comparative end-to-end benchmarking in other industrial environments.
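
The paper itself contains no code, but the pluggable-driver idea at the heart of DIAMETRICS can be sketched in a few lines: a common canonical workload format plus one software driver per engine, so the benchmark harness never touches engine-specific details. Everything below (the class names, the fake drivers, the toy workload) is illustrative, not from the paper:

```python
import abc

class EngineDriver(abc.ABC):
    """Pluggable driver: hides engine-specific details behind one interface."""

    @abc.abstractmethod
    def execute(self, query: str) -> float:
        """Run a query in the canonical format, return elapsed seconds."""

class FakeEngineDriver(EngineDriver):
    """Stand-in driver for illustration; a real one would call the engine."""

    def __init__(self, latency: float):
        self.latency = latency

    def execute(self, query: str) -> float:
        return self.latency  # a real driver would time the actual execution

def run_benchmark(drivers, workload):
    """Execute a shared workload on every registered engine."""
    results = {}
    for name, driver in drivers.items():
        results[name] = [driver.execute(q) for q in workload]
    return results

workload = ["SELECT 1", "SELECT 2"]
drivers = {"engine_a": FakeEngineDriver(0.10), "engine_b": FakeEngineDriver(0.25)}
print(run_benchmark(drivers, workload))
```

Because every engine sits behind the same `execute` interface, adding a new system to the comparison is just registering another driver.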

Saturday, April 23, 2022

Supply Chain Benchmarking

It is important to compare against competitors' practices, as well as changes over time and across the industry.

Why is Supply Chain Benchmarking Important?   By Marisa Brown in APQC

Supply chain benchmarking is important because managers need to:  understand how their supply chains compare to competitors’, evaluate “as is” conditions before they can determine what to fix,  encourage innovation, compare performance across business units, and leverage data for restructuring and change.

APQC defines benchmarking as the process of comparing and measuring your organization against others, anywhere in the world, to gain information on philosophies, practices, and measures that will help your organization take action to improve its performance. Benchmarking gathers the tacit knowledge—the know-how, judgments, and enablers—that explicit knowledge often misses.

The benefits of supply chain benchmarking start with the task of collecting the necessary data and converting it into industry-standard practices and metrics. The simple—and not so simple—act of gathering key operational measures on one scorecard provides a broad snapshot of current performance. This data-gathering process can be a useful cross-functional, team-building exercise when it helps managers who are ultimately responsible for making changes better understand what needs to be done.

Benchmarking can be seen as the systematic process of searching for best practices, innovative ideas, and more productive operating methods. Strategic benchmarking helps make sure that improvement efforts and resources are directed at activities that will move the organization forward.

Comparative information that shows key performance gaps can spark change within underperforming areas and focus limited resources. Of course, any benchmarking activity is pointless if it doesn’t prompt an organization into action. Download APQC's free infographic Ensuring Supply Chain Success with Measures to begin your Supply Chain Benchmarking journey. 
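
APQC does not prescribe code, but the gap-analysis step the article describes, gathering key operational measures on one scorecard and comparing them to peers, can be sketched as follows. The metric names and benchmark values here are hypothetical:

```python
# Hypothetical metric names and benchmark values, for illustration only.
own = {"perfect_order_rate": 0.88, "inventory_turns": 6.0, "cash_to_cash_days": 55.0}
peer_median = {"perfect_order_rate": 0.92, "inventory_turns": 8.0, "cash_to_cash_days": 45.0}
higher_is_better = {"perfect_order_rate": True, "inventory_turns": True, "cash_to_cash_days": False}

def performance_gaps(own, peer, better):
    """Relative gap to the peer median; negative means underperforming."""
    gaps = {}
    for metric, value in own.items():
        rel = (value - peer[metric]) / peer[metric]
        gaps[metric] = rel if better[metric] else -rel
    return gaps

gaps = performance_gaps(own, peer_median, higher_is_better)
# Sort so the biggest shortfalls surface first, to focus limited resources.
for metric, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{metric}: {gap:+.1%}")
```

Sorting by the signed gap is the "focus limited resources" step: the worst-performing areas rise to the top of the list.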

How is Benchmarking Done in Supply Chain?  .... "

Tuesday, September 15, 2020

Inconsistent Benchmarking Found

An important finding. Further classifying the forms of inconsistency would also be useful for pre-checking new papers later.

Researchers find ‘inconsistent’ benchmarking across 3,867 AI research papers   By Kyle Wiggers in VentureBeat

The metrics used to benchmark AI and machine learning models often inadequately reflect those models’ true performances. That’s according to a preprint study from researchers at the Institute for Artificial Intelligence and Decision Support in Vienna, which analyzed data in over 3,000 model performance results from the open source web-based platform Papers with Code. They claim that alternative, more appropriate metrics are rarely used in benchmarking and that the reporting of metrics is inconsistent and unspecific, leading to ambiguities.

Benchmarking is an important driver of progress in AI research. A task (or tasks) and the metrics associated with it (or them) can be perceived as an abstraction of a problem the scientific community aims to solve. Benchmark data sets are conceptualized as fixed representative samples for tasks to be solved by a model. But while benchmarks covering a range of tasks including machine translation, object detection, or question-answering have been established, the coauthors of the paper claim some — like accuracy (i.e., the ratio of correctly predicted samples to the total number of samples) — emphasize certain aspects of performance at the expense of others. ... "
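
The coauthors' point about accuracy is easy to demonstrate: on imbalanced data, accuracy can look excellent while the model has learned nothing. A minimal illustration (my own, not from the study), comparing plain accuracy with balanced accuracy, the mean of per-class recalls:

```python
def accuracy(y_true, y_pred):
    """Ratio of correctly predicted samples to the total number of samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust to class imbalance."""
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 95 negatives, 5 positives; a "classifier" that always predicts negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95 -- looks excellent
print(balanced_accuracy(y_true, y_pred))  # 0.50 -- no better than chance
```

The degenerate classifier scores 95% accuracy yet misses every positive case, which is exactly the kind of ambiguity the study argues inconsistent metric reporting conceals.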

Friday, January 03, 2020

Baidu and GLUE China Benchmarks

China and language meaning.  A key part of ultimately delivering useful conversation.

Baidu has a new trick for teaching AI the meaning of language in 7Wdata

Earlier this month, a Chinese tech giant quietly dethroned Microsoft and Google in an ongoing competition in AI. The company was Baidu, China’s closest equivalent to Google, and the competition was the General Language Understanding Evaluation, otherwise known as GLUE.

GLUE is a widely accepted benchmark for how well an AI system understands human language. It consists of nine different tests for things like picking out the names of people and organizations in a sentence and figuring out what a pronoun like “it” refers to when there are multiple potential antecedents. A language model that scores highly on GLUE, therefore, can handle diverse reading comprehension tasks. Out of a full score of 100, the average person scores around 87 points. Baidu is now the first team to surpass 90 with its model, ERNIE. .... "
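
The overall GLUE score is, roughly, an unweighted average of the per-task scores, with tasks that report two metrics averaged first. A sketch with illustrative numbers only, not Baidu's actual per-task results:

```python
def glue_overall(task_scores):
    """GLUE-style overall score: unweighted mean of per-task scores.

    Tasks that report two metrics (e.g. F1 and accuracy) are averaged
    first, then the nine task scores are averaged together.
    """
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)

# Illustrative scores only; single- and dual-metric tasks mixed as on GLUE.
scores = {
    "CoLA": [65.0], "SST-2": [96.0], "MRPC": [91.0, 88.0],
    "STS-B": [91.0, 90.5], "QQP": [74.0, 90.0], "MNLI": [89.0],
    "QNLI": [94.0], "RTE": [85.0], "WNLI": [93.0],
}
print(round(glue_overall(scores), 1))  # → 87.1
```

The averaging is why a high overall score implies broad competence: a model cannot reach 90+ by excelling at one task alone.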

Friday, November 29, 2019

Benchmarking a Big Quantum Computer

This article ultimately gets very technical, but it attracted me because the qubit count is starting to get interesting for real problems.   The abstract and introduction are enough to give you a feeling for the advances and their implications.  I post it here ahead of a look I will take at the use of such systems for supply chain optimization.

Benchmarking an 11-qubit quantum computer

K. Wright, K. M. Beck, S. Debnath, J. M. Amini, Y. Nam, N. Grzesiak, J.-S. Chen, N. C. Pisenti, M. Chmielewski, C. Collins, K. M. Hudek, J. Mizrahi, J. D. Wong-Campos, S. Allen, J. Apisdorf, P. Solomon, M. Williams, A. M. Ducore, A. Blinov, S. M. Kreikemeier, V. Chaplin, M. Keesan, C. Monroe & J. Kim 

Nature Communications volume 10, Article number: 5464 (2019)  in Nature.com
  
Abstract
The field of quantum computing has grown from concept to demonstration devices over the past 20 years. Universal quantum computing offers efficiency in approaching problems of scientific and commercial interest, such as factoring large numbers, searching databases, simulating intractable models from quantum physics, and optimizing complex cost functions. Here, we present an 11-qubit fully-connected, programmable quantum computer in a trapped ion system composed of 13 171Yb+ ions. We demonstrate average single-qubit gate fidelities of 99.5%, average two-qubit-gate fidelities of 97.5%, and SPAM errors of 0.7%. To illustrate the capabilities of this universal platform and provide a basis for comparison with similarly-sized devices, we compile the Bernstein-Vazirani and Hidden Shift algorithms into our native gates and execute them on the hardware with average success rates of 78% and 35%, respectively. These algorithms serve as excellent benchmarks for any type of quantum hardware, and show that our system outperforms all other currently available hardware.  .... "
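
The Bernstein-Vazirani algorithm the authors use as a benchmark is simple enough to simulate classically: given an oracle computing f(x) = s·x (mod 2), a single quantum query reveals the hidden string s. A small statevector sketch of the ideal, noise-free circuit (my own illustration, nothing like the trapped-ion implementation in the paper):

```python
def bernstein_vazirani(secret: str) -> str:
    """Statevector simulation: one oracle query recovers the secret string."""
    n = len(secret)
    s = int(secret, 2)
    dim = 1 << n
    # Uniform superposition after the initial Hadamard layer.
    amp = [1 / dim**0.5] * dim
    # Phase oracle: multiply the amplitude of |x> by (-1)^(s.x).
    amp = [a * (-1) ** bin(x & s).count("1") for x, a in enumerate(amp)]
    # Final Hadamard layer: out[y] = (1/sqrt(dim)) * sum_x (-1)^(x.y) amp[x].
    out = [sum(amp[x] * (-1) ** bin(x & y).count("1") for x in range(dim)) / dim**0.5
           for y in range(dim)]
    # In the ideal circuit all amplitude lands on |s>, so measuring
    # the most probable basis state returns the secret deterministically.
    probs = [abs(a) ** 2 for a in out]
    return format(max(range(dim), key=lambda y: probs[y]), f"0{n}b")

print(bernstein_vazirani("1011"))  # → 1011
```

On real hardware the success rate falls below 100% (78% on average in the paper), because gate errors and SPAM errors smear amplitude onto the other basis states; that gap is exactly what the benchmark measures.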

Monday, September 03, 2018

Requirements for an Enterprise AI Benchmark

(Update to a recent talk of note given on 8/23/2018)

Slides
Talk recording.

ISSIP Cognitive Systems Institute Group Webinar

full series here http://cognitive-science.info/community/weekly-update/

Talk Title: Requirements for an Enterprise AI Benchmark

Speakers: Cedric Bourrasset, Atos Bull; Rajesh Bordawekar, IBM

Talk Description:  

At present, AI benchmarks either focus on evaluating deep learning approaches or infrastructure capabilities. These approaches don’t capture end-to-end performance behavior of enterprise AI workloads. It is also clear that there is not one reference metric that will be suitable for all AI applications nor all existing platforms. Cedric and Rajesh first present the state of the art regarding the current basic and most popular AI benchmarks. They then present the main characteristics of AI workloads from various industrial domains. Finally, they focus on the needs for ongoing and future industry AI benchmarks and conclude on the gaps to improve AI benchmarks for enterprise workloads.  
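
The talk's argument is that benchmarks should capture end-to-end workload behavior, not just the model step. A minimal sketch of what such an end-to-end harness might look like; the stage names and toy stage functions are my own illustration, not anything from the talk:

```python
import time

def benchmark_pipeline(stages, payload, repeats=3):
    """Time each stage of an end-to-end pipeline, not just the model step.

    `stages` maps stage name -> callable; each callable takes the previous
    stage's output. Returns median wall-clock seconds per stage.
    """
    timings = {name: [] for name in stages}
    for _ in range(repeats):
        data = payload
        for name, fn in stages.items():
            start = time.perf_counter()
            data = fn(data)
            timings[name].append(time.perf_counter() - start)
    return {name: sorted(ts)[len(ts) // 2] for name, ts in timings.items()}

# Toy stages standing in for ingest -> preprocess -> inference -> postprocess.
stages = {
    "ingest": lambda d: list(d),
    "preprocess": lambda d: [x / 255.0 for x in d],
    "inference": lambda d: [x * 2 for x in d],       # stand-in for the model
    "postprocess": lambda d: sum(d) / len(d),
}
report = benchmark_pipeline(stages, range(100_000))
for name, seconds in report.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

A per-stage breakdown like this is what a pure deep-learning or pure infrastructure benchmark misses: the model step is often a minority of total enterprise workload time.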

Cedric Bourrasset: Cedric received a Ph.D. in electronics and computer vision in 2016 from the Blaise Pascal University of Clermont-Ferrand, with a thesis on the dataflow model of computation for FPGA high-level synthesis in embedded machine learning applications. He now works as AI Product Manager at Atos Bull, with the mission of developing the Atos AI product line. One product is a software solution for developing enterprise AI solutions; the other is a computer vision solution for people detection, tracking, and re-identification in multi-camera environments.

Rajesh Bordawekar:  Rajesh is a member of the Systems Acceleration department at the IBM T. J. Watson Research Center. Prior to joining IBM Research in September 1998, he was a post-doctoral fellow at the Center for Advanced Computing Research, California Institute of Technology. 
He received his PhD in Computer Engineering from Syracuse University. 
Rajesh studies interactions between applications, programming languages/runtime systems, and computer architectures.  He is interested in understanding how modern hardware, multi-core processors, GPUs, and SSDs impact the design of optimal algorithms for main-memory and out-of-core problems. .... 

Join LinkedIn Group https://www.linkedin.com/groups/6729452

Monday, July 02, 2018

Benchmark Suite for Machine Learning

Worthwhile direction.

A new benchmark suite for machine learning
MLPerf is a new set of benchmarks compiled by a growing list of industry and academic contributors.     By Ben Lorica in O'Reilly

We are in an empirical era for machine learning, and it’s important to be able to identify tools that enable efficient experimentation with end-to-end machine learning pipelines. Organizations that are using and deploying machine learning are confronted with a plethora of options for training models and model inference, at the edge and on cloud services. To that end, MLPerf, a new set of benchmarks compiled by a growing list of industry and academic contributors, was recently announced at the Artificial Intelligence conference in NYC.  .... "  

The list of problems includes: image classification, object detection, speech to text, translation, recommendation, sentiment analysis, reinforcement learning ... 

See MLPerf.org

Sunday, July 01, 2018

AI Performance Benchmarks

Metrics can help get us beyond purely hype-based impressions.

The challenge of finding reliable AI performance benchmarks
By James Kobielus in SiliconAngle

 Artificial intelligence can be extremely resource-intensive. Generally, AI practitioners seek out the fastest, most scalable, most power-efficient and lowest-cost hardware, software and cloud platforms to run their workloads.

As the AI arena shifts toward workload-optimized architectures, there’s a growing need for standard benchmarking tools to help machine learning developers and enterprise information technology professionals assess which target environments are best suited for any specific training or inferencing job. Historically, the AI industry has lacked reliable, transparent, standard and vendor-neutral benchmarks for flagging performance differences between different hardware, software, algorithms and cloud configurations that might be used to handle a given workload.

In a key AI industry milestone, the newly formed MLPerf open-source benchmark group last week announced the launch of a standard suite for benchmarking the performance of ML software frameworks, hardware accelerators and cloud platforms. The group — which includes Google, Baidu, Intel, AMD and other commercial vendors, as well as research universities such as Harvard and Stanford — is attempting to create an ML performance-comparison tool that is open, fair, reliable, comprehensive, flexible and affordable. ... "

Friday, October 30, 2015

Example Use of Watson for Social Benchmarking

Always looking for good, simple examples of the use of cognitive methods, and thus also Watson. I recently connected with Eric Santos of Benchmark Intelligence, and he writes about how they use Watson for their social intelligence benchmarking and trending. A good example of what can be done.

" ... Benchmark is a product suite that helps retail chains understand why certain locations perform better than others. Benchmark discovers the factors (customer service, product quality, cleanliness, etc) that affect unit performance. Benchmark Intelligence is a proud IBM Watson ecosystem partner. 

Currently Benchmark collects its data through various ways which includes social media listening, SMS comments, surveys and field audits. A good portion of this data is qualitative and unstructured. We needed a way to run analysis on this data and identify trends, that’s why we turned to IBM Watson. 

Benchmark is leveraging Watson’s Alchemy languages, specifically their sentiment analysis and keyword extraction technologies. We are using these cognitive technologies to analyze this unstructured data and discover the variables (customer service, cleanliness, etc.) that affect performance at each location. 

Watson looks at thousands of open-ended data points (social media reviews, SMS comments, etc.) on our platform for any given chain. For each data point Watson defines whether the statement as a whole is positive, negative or neutral. Watson also identifies the key words that make up the statement. That way as locations gather more data points, we can identify the trends that are going on at each location in the chain. 

Example use cases of this include knowing that customers complained about cockroaches at a specific location 5 times in one week and customers at another location in the same chain complained about a cashier named Bryan 4 times in one week.

Once Benchmark understands what these trends are, we can surface actionable insights that retail chains can use to improve performance across their portfolio of locations.   .... " 
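
The counting-and-alerting step described above (e.g., cockroaches mentioned five times at one location in a week) can be sketched independently of Watson. The records below are hypothetical; in Benchmark's pipeline, the sentiment and keyword fields would come from Watson's sentiment-analysis and keyword-extraction services rather than being hand-written:

```python
from collections import Counter

# Hypothetical per-comment records; sentiment and keywords would come
# from Watson's sentiment-analysis and keyword-extraction services.
comments = [
    {"location": "store_12", "week": "2015-W40", "sentiment": "negative", "keywords": ["cockroaches"]},
    {"location": "store_12", "week": "2015-W40", "sentiment": "negative", "keywords": ["cockroaches"]},
    {"location": "store_12", "week": "2015-W40", "sentiment": "negative", "keywords": ["cockroaches"]},
    {"location": "store_07", "week": "2015-W40", "sentiment": "negative", "keywords": ["cashier", "slow"]},
    {"location": "store_07", "week": "2015-W40", "sentiment": "positive", "keywords": ["clean"]},
]

def complaint_trends(comments, threshold=3):
    """Flag (location, week, keyword) triples with repeated negative mentions."""
    counts = Counter()
    for c in comments:
        if c["sentiment"] == "negative":
            for kw in c["keywords"]:
                counts[(c["location"], c["week"], kw)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}

print(complaint_trends(comments))
# → {('store_12', '2015-W40', 'cockroaches'): 3}
```

The cognitive services do the hard part (turning free text into sentiment and keywords); the trend detection itself is just counting over those structured outputs.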

Monday, October 12, 2015

Benchmark Intelligence

Brought to my attention by the Brandery.  Benchmark Intelligence. " ... The New way to Manage a Chain ... Operational Business Intelligence for Restaurant Chains (and more)  ... Streamlined field auditing, text based customer feedback with innovative reporting and more. ... "

Thursday, June 25, 2015

Global IT Trends Research

Excellent report on IT trends, via Jerry Luftman of GIIM. " ...

GLOBAL IT TRENDS RESEARCH
A significant set of IT benchmarking insights that provide organizations with a point of reference (domestic & global) for important business, technology, sourcing, social, and spending transformations. ... "

Friday, February 06, 2015

Whats Next for Data Security in Retail?

In Retailwire:  " ... According to Boston Retail Partners' 2015 POS/Customer Engagement Benchmarking Survey, payment security ranked among the top three priorities by retailers for 2015 for the first time in 16 years. More than 63 percent of the respondents indicated payment security, and protecting the confidentiality of sensitive information is among their top-three priorities. ... " 

Friday, December 13, 2013

IT as a Growth Engine for CPG

New from the GMA:

" ... BCG and GMA study: CPG companies can benefit from viewing IT as a growth engine  A large proportion of consumer packaged goods companies are following a less than optimal approach in their information technology strategies, according to a new report by the Grocery Manufacturers Association and The Boston Consulting Group. The report, GMA Information Technology Benchmarking 2013: The New Mission for IT in CPG, is available for download.... " 

Monday, December 02, 2013

Real time Cloud Retail Data Analysis

This site was recently brought to my attention.  I am currently helping a retail marketing operation better understand how well it is doing when running promotions under varying conditions.  IBM Digital Analytics Benchmark Hub: Your source for real-time cloud-based online retail data and analysis.  As they describe it further:

IBM Digital Analytics Benchmark
View performance benchmarks for your peers and competitors to help you uncover opportunities to grow your digital marketing and web properties. ... 

In general, small to medium-sized businesses have a more difficult time getting the data to understand how their business is operating.   In the enterprise we had many sources of data and expertise. In a small to medium-sized firm you track your own sales, but suppose you are trying to understand how well your promotional activities are working.  What promotions should you use in the future, what are your competitors doing, what are the demographics of your buyers? All key questions to ask.  New analytics exist to address each of these questions, and the site includes tools and articles about them.   They point me to a study:

Read the study: How Marketing is Taking Charge: Leading the Customer Experience
Learn what leading marketers are doing to differentiate themselves in a perpetually shifting omnichannel world. .... 

This highlights the fact that we are heading toward a very multichannel world: a world where it is not about which channel we use, but how many channels we use to support a purchase, and even what order we use them in.

This is all about leveraging data and analytics under rapidly changing conditions: benchmarking the results against your business sector and helping you plan for the future.
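
As a toy illustration of benchmarking against your sector, here is a percentile-rank computation; the conversion rates below are made up, not from the IBM hub:

```python
def percentile_rank(value, peer_values):
    """Share of peers at or below `value` -- where you sit in the sector."""
    below = sum(1 for v in peer_values if v <= value)
    return below / len(peer_values)

# Hypothetical sector conversion rates (fraction of visits that convert).
sector = [0.018, 0.022, 0.025, 0.027, 0.031, 0.034, 0.040, 0.045, 0.052, 0.060]
ours = 0.033

rank = percentile_rank(ours, sector)
print(f"Our conversion rate is at or above {rank:.0%} of the sector sample.")
```

The same calculation applies to any metric the hub benchmarks: substitute bounce rate, average order value, or promotion lift for conversion rate.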

An interesting place to start.  I plan to give it a try.

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don't necessarily represent IBM's positions, strategies or opinions.  #MidsizeIBM

Tuesday, May 15, 2012

International Institute for Analytics

Bill Franks has a number of interesting posts at the International Institute for Analytics blog.  Like a recent post on in-memory analytics.  " ...  At IIA, we study how leading enterprises use analytics to dominate their industries. This focus allows us to go deeper than other multi-disciplined research firms. We are the market leader in benchmarking analytical capabilities and we specialize in supporting executive leadership by industry and function.... "

Wednesday, May 26, 2010

Tribalization of Business Study

I am passing along the SNCR Tribalization of Business Study invitation. I used to be a Fellow there and they produce some interesting work. Pass it on!

" ... We're pleased to announce the launch of the 3rd annual Tribalization of Business Survey (http:// 2010tribalizationofbusiness.com).

We hope that you will once again join us in taking the survey and perhaps also participate in the upcoming qualitative interviews that make up the second part of the annual study.

As you may recall, the Tribalization of Business Study is sponsored by Beeline Labs, Deloitte and the Society for New Communications Research. The yearly study has come to be known as a valuable resource for companies that plan on leveraging social media and communities as part of their business, as well as a benchmarking tool for those already engaged.

In return for your time and your valuable input, we will send you preliminary results of the study. In addition, you will receive a special invitation to take part in a webinar with the study's authors, Francois Gossieaux, Beeline Labs partner and SNCR Senior Fellow, and Ed Moran of Deloitte, and a special discount to attend the 5th Annual SNCR Research Symposium & Awards Gala, where the study's findings will be shared. ... "


Take this survey.