
Thursday, April 29, 2021

Trust and Scientific Data Sharing

A key point. It's also about trust in various contexts, including the risk of misuse. And data is an asset, a concept we also experimented with; we found that data's value often emerged much later. So what, then, is the basis for sharing?

Trustworthy Scientific Computing   By Sean Peisert

Communications of the ACM, May 2021, Vol. 64 No. 5, Pages 18-21  10.1145/3457191

Data useful to science is not shared as much as it should or could be, particularly when that data contains sensitivities of some kind. In this column, I advocate the use of hardware trusted execution environments (TEEs) as a means to significantly change approaches to and trust relationships involved in secure, scientific data management. There are many reasons why data may not be shared, including laws and regulations related to personal privacy or national security, or because data is considered a proprietary trade secret. Examples of this include electronic health records, containing protected health information (PHI); IP addresses or data representing the locations or movements of individuals, containing personally identifiable information (PII); the properties of chemicals or materials, and more. Two drivers for this reluctance to share, which are duals of each other, are concerns of data owners about the risks of sharing sensitive data, and concerns of providers of computing systems about the risks of hosting such data. As barriers to data sharing are imposed, data-driven results are hindered, because data is not made available and used in ways that maximize its value.

Hardware trusted execution environments can form the basis for platforms that provide strong security benefits while maintaining computational performance.
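To make the trust relationship concrete, here is a minimal sketch (not the column's design, and not any real TEE vendor's API) of the pattern TEEs enable: a data owner releases a decryption key only after verifying an attestation "measurement" of the analysis code that will run inside the enclave. All class and function names are hypothetical illustrations; real platforms such as Intel SGX or AMD SEV use hardware-signed attestation quotes rather than the plain hashes used here.

```python
# Sketch, assuming a simplified attestation model: the data owner approves a
# specific analysis code, and releases the data key only to an environment
# whose measurement matches that approval.

import hashlib
import hmac
import os


def measure(code: bytes) -> str:
    """Stand-in for a TEE's code measurement (hash of the enclave contents)."""
    return hashlib.sha256(code).hexdigest()


class DataOwner:
    """Holds sensitive data and the measurement of analysis code it trusts."""

    def __init__(self, sensitive_data: bytes, approved_code: bytes):
        self.key = os.urandom(32)  # symmetric key protecting the data
        self.approved_measurement = measure(approved_code)
        # In practice the data would be stored encrypted under self.key; elided here.
        self.sensitive_data = sensitive_data

    def release_key(self, attestation_measurement: str):
        """Release the key only if the attested code matches what was approved."""
        if hmac.compare_digest(attestation_measurement, self.approved_measurement):
            return self.key
        return None


# Usage: the approved analysis attests successfully; modified code is rejected.
analysis_code = b"def summarize(rows): return len(rows)"
owner = DataOwner(sensitive_data=b"protected health records", approved_code=analysis_code)

assert owner.release_key(measure(analysis_code)) is not None   # trusted enclave
assert owner.release_key(measure(b"exfiltrate(rows)")) is None  # rejected
```

The point of the sketch is the shift in trust: the data owner no longer has to trust the computing provider's administrators or the researcher's machine, only the attested code and the hardware enclave it runs in.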

And yet, as emphasized widely in scientific communities,3,5 by the National Academies, and via the U.S. government's initiatives for "responsible liberation of Federal data," finding ways to make sensitive data available is vital for advancing scientific discovery and public policy. When data is not shared, certain research may be prevented entirely, be significantly more costly, take much longer, or might simply not be as accurate because it is based on smaller, potentially more biased datasets.

Scientific computing refers to the computing elements used in scientific discovery. Historically, this has emphasized modeling and simulation, but with the proliferation of instruments that produce and collect data, it now also significantly includes data analysis. Computing systems used in science include desktop systems and clusters run by individual investigators, institutional computing resources, commercial clouds, and supercomputers such as those present in high-performance computing (HPC) centers sponsored by the U.S. Department of Energy's Office of Science and the U.S. National Science Foundation. Not all scientific computing is large, but at the largest scale, scientific computing is characterized by massive datasets and distributed, international collaborations. However, when sensitive data is used, computing options available are much more limited in computing scale and access. ...
