Self-correcting data is a good idea, but I want to see some examples. It's similar to removing outliers: whether a correction is valid depends on the context and often on the metadata involved. I have seen it used improperly.
Researchers Develop AI Tool Better Able to Identify Bad Data
University of Waterloo News
An international team of researchers led by Alireza Heidari and Ihab Ilyas at the University of Waterloo in Canada has developed an artificial intelligence-powered system to manage data quality. The HoloClean tool sifts out bad data and corrects errors prior to processing. The new system also can automatically generate bad examples without tainting source data, so the system can learn to identify and correct errors on its own. Once HoloClean is trained, it can independently differentiate between errors and correct data, and determine the most likely value for missing data if an error exists. Ilyas said the work “deviates from the old way of manually trying to clean the data, which was expensive, didn’t scale, and does not meet the current needs for cleaning the data.” ...
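To make the idea of "generating bad examples without tainting source data" a little more concrete, here is a minimal Python sketch. It is not HoloClean's code, API, or algorithm: the function names, the toy zip/city table, and the majority-vote repair are all assumptions made up for illustration. It only shows the general pattern of corrupting a copy of the data to get labeled errors, and of guessing a likely value from rows that share the same context.

```python
import copy
import random
from collections import Counter

# Toy "source" table (hypothetical data, not from the article).
source = [
    {"zip": "N2L", "city": "Waterloo"},
    {"zip": "N2L", "city": "Waterloo"},
    {"zip": "N2L", "city": "Waterloo"},
    {"zip": "M5V", "city": "Toronto"},
    {"zip": "M5V", "city": "Toronto"},
]


def inject_errors(rows, column, rate, rng):
    """Return (corrupted_rows, labels) built from a deep copy of rows.

    labels[i] is True when row i was deliberately corrupted. The input
    rows are never modified, so the source data stays untainted.
    """
    corrupted = copy.deepcopy(rows)
    domain = sorted({row[column] for row in rows})
    labels = []
    for row in corrupted:
        alternatives = [value for value in domain if value != row[column]]
        if alternatives and rng.random() < rate:
            # Swap in a different value that also occurs in the column.
            row[column] = rng.choice(alternatives)
            labels.append(True)
        else:
            labels.append(False)
    return corrupted, labels


def most_likely_value(rows, key_column, target_column, key):
    """Guess a value by majority vote among rows sharing the same key.

    A simple stand-in for a learned model; returns None when no row
    with that key has a non-missing target value.
    """
    counts = Counter(
        row[target_column]
        for row in rows
        if row[key_column] == key and row[target_column] is not None
    )
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    corrupted, labels = inject_errors(source, "city", 0.4, rng)
    for row, is_error in zip(corrupted, labels):
        print(row, "synthetic error" if is_error else "clean")
    print(most_likely_value(source, "zip", "city", "N2L"))
```

The labeled pairs from `inject_errors` could feed whatever error detector you like. The majority vote also shows why context matters: it is only as good as the assumption that rows with the same zip should share a city.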
Saturday, June 01, 2019