Privacy's Random Answer
CNet (11/24/04); Kanellos, Michael
Michael Kanellos is impressed by IBM's work with data randomization as a
possible solution to the increasingly contentious issue of consumer
privacy, which is fueled by consumers' growing outrage over how
companies and organizations collect, exchange, and distribute their
personal information. The idea is that data randomization would use
indecipherable mathematical calculations to effectively scramble
consumer data such as age, income, past purchases, or medical
information while allowing back-end systems to discern patterns within
the customer base. The randomization system uses Bayesian probability
to determine the relationship between different values, so that
consumers do not have to falsify their data, which is randomized before
being transmitted to the corporate server. The back-end computer
attempts to ascertain the randomizing calculations employed to conceal
the original values, so that accurate customer base simulations can be
extrapolated. "I think the key insight was that you don't have to have
access to precise information to build good models," explains IBM
senior fellow Rakesh Agrawal, who is directing the data randomization
project. In several trials, there was a mere 2 percent to 3 percent
difference between the curve plotted by the original data and the
reconstructed curve. Among the areas Agrawal believes could benefit
from data randomization technology are hospital services, which could
provide records about disease epidemics without fear of litigation. In
addition, large businesses could share their data without revealing
customer lists, while network security could be bolstered.
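
The randomize-then-reconstruct idea described above can be illustrated with a minimal sketch. It is not IBM's implementation: it assumes each value is perturbed with uniform noise of a publicly known spread, and that the server estimates the distribution of original values with an iterative Bayesian update over histogram bins. The function names (randomize, reconstruct) and all parameters are hypothetical and chosen only for illustration.

```python
import random


def randomize(values, spread, rng):
    """Client side (assumed scheme): add uniform noise in [-spread, +spread]
    to each true value before it leaves the user's machine."""
    return [v + rng.uniform(-spread, spread) for v in values]


def reconstruct(perturbed, spread, lo, hi, bins=20, iterations=50):
    """Server side (assumed scheme): estimate the histogram of the original
    values from the perturbed ones, knowing only the noise spread.

    Returns (bin midpoints, estimated probability per bin)."""
    width = (hi - lo) / bins
    mids = [lo + (i + 0.5) * width for i in range(bins)]
    est = [1.0 / bins] * bins  # start from a uniform guess

    for _ in range(iterations):
        new = [0.0] * bins
        for w in perturbed:
            # A bin can explain observation w only if w lies within the
            # noise spread of that bin (half a bin width of slack is added
            # because originals are not exactly at the midpoint).
            weights = [
                est[i] if abs(w - mids[i]) <= spread + width / 2 else 0.0
                for i in range(bins)
            ]
            total = sum(weights)
            if total == 0.0:
                continue  # observation incompatible with every bin; skip it
            # Bayes' rule: share this observation among the compatible bins
            # in proportion to the current estimate.
            for i in range(bins):
                new[i] += weights[i] / total
        norm = sum(new)
        if norm == 0.0:
            break  # nothing usable; keep the previous estimate
        est = [x / norm for x in new]
    return mids, est


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical customer ages, clipped to the range [20, 80].
    ages = [min(80.0, max(20.0, rng.gauss(40.0, 8.0))) for _ in range(1000)]
    noisy = randomize(ages, spread=20.0, rng=rng)
    mids, est = reconstruct(noisy, spread=20.0, lo=20.0, hi=80.0)
    for m, p in zip(mids, est):
        print(f"{m:5.1f}  {p:.3f}")
```

In this sketch the server never sees an individual's true age, only the noisy value, yet it can still recover an approximate shape for the population as a whole. How closely the estimate tracks the original depends on the sample size, the noise spread, and the number of bins.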