Showing posts with label Data Assets. Show all posts

Friday, October 01, 2021

The Value of Phone Location Data

Another example of how valuable personal data can be.

ACM NEWS

There's a Multibillion-Dollar Market for Your Phone's Location Data

By The Markup, September 30, 2021

Companies that you likely have never heard of are hawking access to the location history on your mobile phone. An estimated $12 billion market, the location data industry has many players: collectors, aggregators, marketplaces, and location intelligence firms, all of which boast about the scale and precision of the data that they've amassed.

Location firm Near describes itself as "The World's Largest Dataset of People's Behavior in the Real-World," with data representing "1.6B people across 44 countries." Mobilewalla boasts "40+ Countries, 1.9B+ Devices, 50B Mobile Signals Daily, 5+ Years of Data." X-Mode's website claims its data covers "25%+ of the Adult U.S. population monthly."

In an effort to shed light on this little-monitored industry, The Markup has identified 47 companies that harvest, sell, or trade in mobile phone location data. While hardly comprehensive, the list begins to paint a picture of the interconnected players that do everything from providing code to app developers to monetize user data to offering analytics from "1.9 billion devices" and access to datasets on hundreds of millions of people. Six companies claimed more than a billion devices in their data, and at least four claimed their data was the "most accurate" in the industry.

Full article

Thursday, April 29, 2021

Trust and Scientific Data Sharing

A key point. It's about trust in various contexts as well, like misuse. And data is an asset, a concept we also experimented with. We also discovered that data value often emerged much later. So what then is the basis of the sharing?

Trustworthy Scientific Computing   By Sean Peisert

Communications of the ACM, May 2021, Vol. 64 No. 5, Pages 18-21, DOI: 10.1145/3457191

Data useful to science is not shared as much as it should or could be, particularly when that data contains sensitivities of some kind. In this column, I advocate the use of hardware trusted execution environments (TEEs) as a means to significantly change approaches to and trust relationships involved in secure, scientific data management. There are many reasons why data may not be shared, including laws and regulations related to personal privacy or national security, or because data is considered a proprietary trade secret. Examples of this include electronic health records, containing protected health information (PHI); IP addresses or data representing the locations or movements of individuals, containing personally identifiable information (PII); the properties of chemicals or materials, and more. Two drivers for this reluctance to share, which are duals of each other, are concerns of data owners about the risks of sharing sensitive data, and concerns of providers of computing systems about the risks of hosting such data. As barriers to data sharing are imposed, data-driven results are hindered, because data is not made available and used in ways that maximize its value.

Hardware trusted execution environments can form the basis for platforms that provide strong security benefits while maintaining computational performance.

And yet, as emphasized widely in scientific communities [3, 5], by the National Academies, and via the U.S. government's initiatives for "responsible liberation of Federal data," finding ways to make sensitive data available is vital for advancing scientific discovery and public policy. When data is not shared, certain research may be prevented entirely, be significantly more costly, take much longer, or might simply not be as accurate because it is based on smaller, potentially more biased datasets.

Scientific computing refers to the computing elements used in scientific discovery. Historically, this has emphasized modeling and simulation, but with the proliferation of instruments that produce and collect data, now significantly also includes data analysis. Computing systems used in science include desktop systems and clusters run by individual investigators, institutional computing resources, commercial clouds, and supercomputers such as those present in high-performance computing (HPC) centers sponsored by U.S. Department of Energy's Office of Science and the U.S. National Science Foundation. Not all scientific computing is large, but at the largest scale, scientific computing is characterized by massive datasets and distributed, international collaborations. However, when sensitive data is used, computing options available are much more limited in computing scale and access. .... ' 

Tuesday, April 27, 2021

Data Monetization / Data as an Asset

A long-time area of study for us: measuring the value of data in context.

Greasing the Wheels for Data Monetization   by 7wData

“Data is a critical asset to business success.” That’s probably the closest thing to a self-evident truth you’re likely to find in today’s ultra-competitive business landscape. We all know how important data is. Those who have more of it, and know how to wield AI and advanced analytics upon it, have a substantial advantage. And yet, businesses face headwinds when trying to monetize their data, particularly outside their company. That’s why, when it comes to data monetization, we are still in the early stages of the game.

Doug Laney has dedicated a portion of his career to finding methods to break down the value behind data. His data valuation journey started after the Twin Towers came tumbling down on 9/11. The loss of life was tragic, but companies also lost enormous amounts of data. Insurance companies claimed that data had no value, hence Laney’s study of “Infonomics” was born.

Twenty years later, after a stint as a Gartner analyst, Laney continues to study the nature of data at the business consulting firm West Monroe. He helps clients come up with strategies for managing data like the real asset that it is, and finding ways to use it for competitive advantage.  ... ' 

Saturday, February 27, 2021

Data and its Useful Nature as an Asset

Recall this echoes some of our own examination of 'data as an asset'. Thoughtful ideas here, but I think it is still useful to consider data as an asset that can be combined with methods and other data to drive further value. It does not have to include exclusive 'ownership' as an element.

From Schneier:

Excellent Brookings paper: “Why data ownership is the wrong approach to protecting privacy.” 

From the introduction:

Treating data like it is property fails to recognize either the value that varieties of personal information serve or the abiding interest that individuals have in their personal information even if they choose to “sell” it. Data is not a commodity. It is information. Any system of information rights­ — whether patents, copyrights, and other intellectual property, or privacy rights — ­presents some tension with strong interest in the free flow of information that is reflected by the First Amendment. Our personal information is in demand precisely because it has value to others and to society across a myriad of uses  ... '

Tuesday, February 16, 2021

Data Lineage Platform

First I had heard of this kind of platform; we had used semantic web models to create models of enterprise data. Can this be used in conjunction with semantic models? It also addresses the data-as-an-asset measure.

Solidatus raises $19.5 million to expand its enterprise data lineage platform

Paul Sawers @psawers  in Venturebeat,   February 15, 2021  

Solidatus, a U.K.-based data management and modeling platform for enterprises, has raised £14 million ($19.5 million) in a series A round of funding led by AlbionVC, with participation from HSBC Ventures and Citi — both Solidatus clients.

Founded out of London in 2011, Solidatus helps businesses monetize their data by charting the data journey from its origin while noting any transformations and presenting anything relevant in a visual format. This can be particularly pertinent for highly regulated industries, such as banking, where businesses may need to provide detailed accounts of all their data. Solidatus helps ensure that all that data is “cataloged and owned,” as the company puts it. ... ' 

Thursday, February 11, 2021

India Analyzing Data Goals and Gaps with AI

Notable use of AI and other pattern recognition approaches: what data do we have, and what is its ability to achieve certain levels of understanding of our economy? Then what data, quality of data, and metadata is needed, at what cost, to further tune that understanding? Note our past looks at 'data as an asset' for more explorations of this. Gaps analysis to achieve goals, with risks considered along the way. Note the mention of real-time data.

AI Is India's Solution to Fix Data Gaps   By Bloomberg Quint, February 11, 2021  in   ACM

India is turning from man to machines to improve the quality and speed of its economic data.

India's Ministry of Statistics is accelerating artificial intelligence (AI) usage for collecting, analyzing, and disclosing data to better monitor the economy, including an initiative with the World Bank employing an information portal that collates real-time data.

The Ministry's Kshatrapati Shivaji said, "There's a growing need for more and more data, faster data, and also more refined data products," and end-to-end computerization "will enhance the quality, credibility, and timeliness of data."

He added that AI will see extensive use, and help to overcome staffing limitations.

Shivaji said, "Because of automation and technology-intensive applications, the capability and productivity of staff is getting enhanced substantially. Wherever there is a component where we're able to squeeze the time with the help of technology, we're trying to do that."

From Bloomberg Quint in ACM

Thursday, August 27, 2020

Valuing Spatio-Temporal Information

The valuation of data is a long-time interest; I have consulted with large companies on the topic, lately about automotive connections. Technical detail on the topic is at the link.

Computing Value of Spatio-temporal Information
By Heba Aly, John Krumm, Gireeja Ranade, Eric Horvitz
Communications of the ACM, September 2020, Vol. 63 No. 9, Pages 85-92, DOI: 10.1145/3410387

Location data from mobile devices is a sensitive yet valuable commodity for location-based services and advertising. We investigate the intrinsic value of location data in the context of strong privacy, where location information is only available from end users via purchase. We present an algorithm to compute the expected value of location data from a user, without access to the specific coordinates of the location data point. We use decision-theoretic techniques to provide a principled way for a potential buyer to make purchasing decisions about private user location data. We illustrate our approach in three scenarios: the delivery of targeted ads specific to a user's home location, the estimation of traffic speed, and the prediction of location. In all three cases, the methodology leads to quantifiably better purchasing decisions than competing approaches.

1. Introduction
As people carry and interact with their connected devices, they create spatiotemporal data that can be harnessed by them and others to generate a variety of insights. Proposals have been made for creating markets for personal data [1] rather than for people either to provide their behavioral data freely or to refuse sharing. Some of these proposals are specific to location data [6]. Several studies have explored the price that people would seek for sharing their GPS data [5, 9, 13]. However, little has been published on determining the value of location data from a buyer's point of view. For instance, a Wall Street Journal blog says [10]:

"What groceries you buy, what Facebook posts you 'like' and how you use GPS in your car:
Companies are building their entire businesses around the collection and sale of such data. The problem is that no one really knows what all that information is worth. Data isn't a physical asset like a factory or cash, and there aren't any official guidelines for assessing its value."
We present a principled method for computing the value of spatiotemporal data from the perspective of a buyer. Knowledge of this value could guide pursuit of the most informative data and would provide insights about potential markets for location data.

We consider situations where a buyer is presented with a set of location data points for sale, and we provide estimates of the value of information (VOI) for these points. Because the coordinates of the location data points are unknown, we compute the VOI based on the prior knowledge that is available to the buyer and on side information that a user may provide (e.g., the time of day or location granularity). The VOI computation is customized to the specific goals of the buyer, such as targeting ad delivery for home services, offering efficient driving routes, or predicting a person's location in advance. We account for the fact that location data and user state are both uncertain. Additional data purchases can help reduce this uncertainty, and we quantify this reduction as well.

In the next section, we introduce a decision-making framework with a detailed analysis of geo-targeted advertising. We focus on the buyer's goal of delivering ads to people living within a certain region. We show that our method performs better than alternate approaches in terms of inferential accuracy, data efficiency, and cost. In Section 3, we apply the methodology to a traffic estimation scenario using real and simulated spatiotemporal data. We present our last scenario in Section 4, where we show how to make good data-buying decisions for predicting a person's future location. ... 
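The decision-theoretic framing above can be sketched for the geo-targeted advertising case: VOI is the buyer's best expected utility after seeing an observation, minus the best expected utility acting on the prior alone. This is an illustrative sketch under assumed toy numbers, not the authors' code; all function names and parameters are hypothetical.

```python
# Sketch of a value-of-information (VOI) computation for a location-data
# buyer deciding whether to advertise to a user who may live in a target
# region. Toy prior and utilities are illustrative assumptions.

def expected_utility(p_in_region, u_hit, u_miss):
    """Best expected utility of the binary decision (advertise or not).

    Not advertising is assumed to have utility 0.
    """
    advertise = p_in_region * u_hit + (1 - p_in_region) * u_miss
    return max(advertise, 0.0)

def value_of_information(prior, likelihood_in, likelihood_out, u_hit, u_miss):
    """VOI = expected utility with the observation minus utility without it.

    likelihood_in / likelihood_out: P(observe 'inside region' | user truly
    lives in / outside the region), i.e. a noisy location data point.
    """
    base = expected_utility(prior, u_hit, u_miss)

    # Marginal probability of each possible observation under the prior.
    p_obs_inside = prior * likelihood_in + (1 - prior) * likelihood_out
    p_obs_outside = 1 - p_obs_inside

    voi = 0.0
    for p_obs, l_in in [(p_obs_inside, likelihood_in),
                        (p_obs_outside, 1 - likelihood_in)]:
        if p_obs == 0:
            continue
        posterior = prior * l_in / p_obs  # Bayes update on the observation
        voi += p_obs * expected_utility(posterior, u_hit, u_miss)
    return voi - base

# A rational buyer purchases the data point only if VOI exceeds its price.
voi = value_of_information(prior=0.3, likelihood_in=0.9,
                           likelihood_out=0.2, u_hit=1.0, u_miss=-0.5)
```

Because VOI folds in the buyer's own utilities, the same data point can be worth different amounts to different buyers, which is the paper's point about customizing the computation to the buyer's goal.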


Thursday, July 23, 2020

P&G Works with Consumer Data via AI

Some useful details here.  Note different forms of data being used as an asset.

P&G Gets Personal with Consumers through Data, AI Tech Collaboration
By Alarice Rajagopal in Consumergoods

The Procter & Gamble Company (P&G) has selected data analytics and AI technology from Google Cloud, to enable more personalized experiences for consumers. Through this new collaboration, P&G will now be able to leverage consumer and media data to innovate product experiences and enrich the shopping journey for existing and new consumers.

"We're always looking to ensure a great consumer experience across all our categories, from healthcare to beauty products and much more," says Vittorio Cretella, CIO, Procter & Gamble. "As a leader in analytics and AI, Google Cloud is a strategic partner helping us offer our consumers superior products and services that provide value in a secure and transparent way."

In its more than 180 years, P&G has been at the forefront of innovation. P&G is now modernizing and integrating consumer, brand, and media data using cloud technology to deliver the next generation of consumer goods. Examples of how P&G is leveraging data for better consumer experiences include:   ... " 

Saturday, July 18, 2020

Data Dividend Plan

We worked on a number of projects that linked to 'data as an asset' ideas, and I was reminded of this project in Wired. Its value, limitations, and issues are worth re-examining.

Andrew Yang’s Plan to Pay You for Your Data Doesn't Add Up
He wants social media companies to pay you for the data you produce. But loopholes abound, it's too expensive, and other plans like it have failed.

LAST MONTH, Former presidential hopeful Andrew Yang launched a new initiative called the Data Dividend Project  that would force social media companies to compensate users for the use of their data. As Yang told the Verge, “That first day that people get paid their dividend through DDP for All is going to be such a great day.” But the second day, as platform users and advertisers adjust to the new costs, is sure to be a mess. ... ' 

Thursday, June 25, 2020

Getting Pay for Data

Another project aimed at paying users for their data. It links to our long-term 'data as an asset' view.

Andrew Yang Is Pushing Big Tech to Pay Users for Data
By The Verge
June 22, 2020

Andrew Yang wants people to get paid for the data they create on big tech platforms like Facebook and Google, and with a new project launching on Monday, he believes he can make it happen. ...

Yang's Data Dividend Project is a new program tasked with establishing data-as-property rights under privacy laws like the California Consumer Privacy Act (CCPA) all across the country. The program hopes to mobilize over 1 million people by the end of the year, focusing primarily on Californians, and "pave the way for a future in which all Americans can claim their data as a property right and receive payment" if they choose to share their data with platforms.

At the beginning of the year, the CCPA went into effect, granting consumers new control over their data online like the right to delete and opt out of the sale of their personal information. There's nothing in the law about tech companies paying for data (or, more specifically, paying them not to opt out), but Yang's new project is looking to show that the idea is popular with voters. The Data Dividend Project is betting on collective action as a means of changing the law and extending data property rights to users across the country. If this idea becomes law, Yang's team says it will work on behalf of users to help them get paid.

"We are completely outgunned by tech companies," Yang told The Verge. "We're just presented with these terms and conditions. No one ever reads them. You just click on them and hope for the best. And unfortunately, the best has not happened."  ... ' 


Wednesday, June 24, 2020

Price of Personal Data

Looking for the full report mentioned here; will post when I get a reference. This returns to our long-examined question of what the price of private data should be, and how people should be made to understand the implications.

Brits will sell their personal data for pennies  

Surprising findings from an Okta report on digital identity suggest Brits would be willing to part with valuable personal data for a surprisingly low amount  .... 
By  Alex Scroxton, Security Editor  in ComputerWeekly  ... 

Sunday, March 29, 2020

Data Resources: Our World in Data

As part of a larger project that is looking at Data Sources, Open Source Data, Data as an Asset, Data Quality, Data for Machine Learning, Semantic Data, Knowledge Mapping, Metadata, and related topics. This looks to be a great resource; just examining it now.

Specific Data of the Coronavirus/COVID-19  (Updated frequently)

And via the Centers for Disease Control:  https://www.cdc.gov/coronavirus/2019-ncov/index.html

Our World in Data:  (Used widely for teaching, research etc) 

About:

Research and data to make progress against the world’s largest problems

Poverty, disease, hunger, climate change, war, existential risks, and inequality: The world faces many great and terrifying problems. It is these large problems that our work at Our World in Data focuses on.

Thanks to the work of thousands of researchers around the world who dedicate their lives to it, we often have a good understanding of how it is possible to make progress against the large problems we are facing. The world has the resources to do much better and reduce the suffering in the world.

We believe that a key reason why we fail to achieve the progress we are capable of is that we do not make enough use of this existing research and data: the important knowledge is often stored in inaccessible databases, locked away behind paywalls and buried under jargon in academic papers. 

The goal of our work is to make the knowledge on the big problems accessible and understandable. As we say on our homepage, Our World in Data is about Research and data to make progress against the world’s largest problems.  ... " 

Wednesday, March 04, 2020

Pay for Data You Use—Not Data You Store

Towards using/measuring the value of data you use.

Cloud Services Tool Lets You Pay for Data You Use—Not Data You Store
IEEE Spectrum
Charles Q. Choi

Computer scientists at George Mason University have developed a new caching technique that can support pay-per-use cloud storage service. The team tested the service, InfiniCache, on Amazon Web Services' (AWS) Lambda computing service, and found that the technique achieved at least a 100-fold improvement in latency compared to the Amazon S3 service in about 60% of requests for objects larger than 10 megabytes. InfiniCache performed comparably with the AWS ElastiCache cloud caching service, but when it worked with large objects, InfiniCache cost users about one-thirtieth to one-ninetieth as much as ElastiCache. InfiniCache utilizes a data backup mechanism in which cached objects synchronize with clones of themselves to minimize the chances that reclaiming memory causes data loss.  ... "
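The backup mechanism described in the excerpt can be sketched in miniature: keep each cached object in several short-lived "function" slots and periodically re-clone it, so that reclaiming any one slot's memory rarely loses the object. This is an illustrative toy, not InfiniCache's actual implementation (which is more sophisticated); the class and method names are hypothetical.

```python
# Toy sketch of caching an object across ephemeral slots with periodic
# re-cloning, in the spirit of the backup mechanism described above.

class EphemeralSlot:
    """Stands in for a serverless function whose memory may be reclaimed."""
    def __init__(self):
        self.value = None

    def reclaim(self):
        # The platform reclaims this function's memory at any time.
        self.value = None

class ClonedCache:
    def __init__(self, num_clones=3):
        self.slots = [EphemeralSlot() for _ in range(num_clones)]

    def put(self, value):
        for slot in self.slots:
            slot.value = value

    def get(self):
        # The object survives as long as any one clone survives.
        for slot in self.slots:
            if slot.value is not None:
                return slot.value
        return None  # cache miss: every clone was reclaimed

    def sync(self):
        # Periodic synchronization: a surviving clone re-seeds reclaimed ones.
        survivor = self.get()
        if survivor is not None:
            self.put(survivor)

cache = ClonedCache(num_clones=3)
cache.put(b"large object")
cache.slots[0].reclaim()   # one function's memory is reclaimed...
cache.sync()               # ...but a sync restores the lost clone
```

The design trade-off is the usual one for replication: more clones and more frequent syncs lower the chance of loss but raise the (pay-per-use) cost of keeping the object cached.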

Thursday, January 30, 2020

Avast Shutters Jumpshot

Followed this because of recent revelations and our look at data as an asset. At one time I had been using Avast.

Avast shutters data-selling subsidiary amid user outrage
Users were not happy to learn "security" software sold their browsing habits.

One of the world's largest antivirus providers is ending a program that collected and sold users' Web browsing data a few days after media reports exposed the platform.

Avast CEO Ondrej Vlcek announced late Thursday the end of the data-selling subsidiary, known as Jumpshot. Writing in an open letter, he said that he and the company's board "have decided to terminate the Jumpshot data collection and wind down Jumpshot's operations, with immediate effect."

The pervasive operations of Jumpshot came to light earlier this week following reporting by Vice Motherboard and PCMag. Jumpshot described itself as "the only company that unlocks walled garden data... to provide marketers with unparalleled visibility, analytical insights and a more comprehensive understanding of the online customer journey that delivers a highly competitive advantage."  ... " 

Monday, January 27, 2020

Avast Data to be Sold

Currently looking at the process of how data is gathered, enhanced, and sold ... here is a good example.

Avast packaged detailed user data to be sold for millions of dollars
The data doesn't include personal information, but experts fear it could be 'de-anonymized.'
By Christine Fisher, @cfisherwrites in Engadget

Thursday, January 09, 2020

Data Valuation Perspectives

Have had several journeys into the valuation of data and its treatment as an asset (or liability).  Another article on the topic has come up in The Berkeley Artificial Intelligence Research Blog that is worth looking at, in particular from a machine learning perspective.   Technical but of value to look at.    What is My Data Worth? –  in Berkeley AI Blog   By Ruoxi Jia 

Thursday, December 26, 2019

IOTA, FiWare and more

Brought to my attention regarding IOTA, and also the connection to FIWARE, which was new to me. Regarding identification and secure use of knowledge/data. Mentioning this here for later examination.

Video from the IOTA meetup in Karlsruhe gives a great overview of identities and the semantic layer of the decentralized IOTA marketplace using eClass…    https://youtu.be/nbQbLeKLUkQ

... IOTA is also collaborating with FIWARE  https://www.fiware.org/

FIWARE: THE OPEN SOURCE PLATFORM FOR OUR SMART DIGITAL FUTURE
-Driving key standards for breaking the information silos
-Making IoT simpler
-Transforming Big Data into knowledge
-Unleashing the potential of right-time Open Data
-Enabling the Data Economy
-Ensuring sovereignty on your data  .... 

https://iota-news.com/fiware-tangle-poc-what-is-fiware-friend-or-foe/

Tuesday, December 24, 2019

Being Paid for Data

Considering the design of methods of payment and ensuring their security.   An example.

AI Needs your Data and you should get paid for it.  By Gregory Barber in Wired

ROBERT CHANG, A Stanford ophthalmologist, normally stays busy prescribing drops and performing eye surgery. But a few years ago, he decided to jump on a hot new trend in his field: artificial intelligence. Doctors like Chang often rely on eye imaging to track the development of conditions like glaucoma. With enough scans, he reasoned, he might find patterns that could help him better interpret test results.

That is, if he could get his hands on enough data. Chang embarked on a journey that’s familiar to many medical researchers looking to dabble in machine learning. He started with his own patients, but that wasn’t nearly enough, since training AI algorithms can require thousands or even millions of data points. He filled out grants and appealed to collaborators at other universities. He went to donor registries, where people voluntarily bring their data for researchers to use. But pretty soon he hit a wall. The data he needed was tied up in complicated rules for sharing data. “I was basically begging for data,” Chang says.

Chang thinks he might soon have a workaround to the data problem: patients. He’s working with Dawn Song, a professor at the University of California-Berkeley, to create a secure way for patients to share their data with researchers. It relies on a cloud computing network from Oasis Labs, founded by Song, and is designed so that researchers never see the data, even when it’s used to train AI. To encourage patients to participate, they’ll get paid when their data is used.  ... "

Thursday, December 19, 2019

An AI Transparency Risk Paradox

Risk is not thought of formally enough. The article points out that to use current AI methods you need more data, but more data creates higher risk. In our own work in the area we looked at valuation of data ... but also the cost risk of storing it, moving it around, and sharing it with suppliers and tech vendors. Assets can have negative value.

The AI Transparency Paradox
By Andrew Burt  HBR 

In recent years, academics and practitioners alike have called for greater transparency into the inner workings of artificial intelligence models, and for many good reasons. Transparency can help mitigate issues of fairness, discrimination, and trust — all of which have received increased attention. Apple’s new credit card business has been accused of sexist lending models, for example, while Amazon scrapped an AI tool for hiring after discovering it discriminated against women.

At the same time, however, it is becoming clear that disclosures about AI pose their own risks: Explanations can be hacked, releasing additional information may make AI more vulnerable to attacks, and disclosures can make companies more susceptible to lawsuits or regulatory action.

Call it AI’s “transparency paradox” — while generating more information about AI might create real benefits, it may also create new risks. To navigate this paradox, organizations will need to think carefully about how they’re managing the risks of AI, the information they’re generating about these risks, and how that information is shared and protected. .... " 

Friday, November 15, 2019

Should Customers be Paid for their Data?

From a retail perspective, below the outline, more at the link

Should customers just be paid for their data?   by Guest contributor
Wise Marketer Staff

In light of rising security failures, more calls are being heard for businesses to tangibly pay customers for their data.

“California’s consumers should also be able to share in the wealth that is created from their data,” California governor Gavin Newsom declared in February in proposing such a “data dividend” concept. “[Tech companies] make billions of dollars collecting, curating and monetizing our personal data [and so should] have a duty to protect it.”

But is the direct transfer of wealth from corporations to customers the panacea? Let’s play devil’s advocate and explore some of the claims supporting the “data dividend” concept:  ... "