Computing on Masked Data: a High Performance Method for Improving Big Data Veracity
The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity and variety. Along with these standard three V’s of big data, an emerging fourth “V” is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support for sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.
💡 Research Summary
This paper introduces Computing on Masked Data (CMD), a technique for improving big data veracity, which the authors define as the confidentiality, integrity, and availability of data. Traditional cryptographic methods can impose overheads too large for big data workloads. CMD instead performs computations directly on masked data, so that only authorized recipients can unmask the results. Because it builds on the sparse linear algebra of associative arrays, CMD incurs significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support for sparse operations, such as SciDB or Apache Accumulo, are well suited to the technique. The paper demonstrates CMD on two examples: a complex DNA matching algorithm and database operations over social media data.
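The idea of running a linear algebraic operation on masked labels can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the associative array is a plain Python dictionary keyed by (row, column) labels, the mask is a deterministic keyed hash (HMAC-SHA256) standing in for whatever masking scheme CMD actually uses, the values are left unmasked, and the helper names `mask`, `mask_array`, `gram`, and `unmask` are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def mask(key: bytes, label: str) -> str:
    """Deterministically mask a label (assumed stand-in for a CMD mask)."""
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_array(key: bytes, array: dict) -> dict:
    """Mask the row and column labels of a sparse associative array."""
    return {(mask(key, r), mask(key, c)): v for (r, c), v in array.items()}


def gram(array: dict) -> dict:
    """Sparse A^T A: out[(c1, c2)] = sum over rows r of A[r, c1] * A[r, c2]."""
    by_row = defaultdict(dict)
    for (r, c), v in array.items():
        by_row[r][c] = v
    out = defaultdict(int)
    for cols in by_row.values():
        for c1, v1 in cols.items():
            for c2, v2 in cols.items():
                out[(c1, c2)] += v1 * v2
    return dict(out)


def unmask(key: bytes, result: dict, labels: set) -> dict:
    """Recover plaintext labels; only a holder of the key can build the lookup."""
    lookup = {mask(key, label): label for label in labels}
    return {(lookup[r], lookup[c]): v for (r, c), v in result.items()}


# Toy social media data (made up): rows are posts, columns are users or hashtags.
plain = {
    ("post1", "alice"): 1, ("post1", "#data"): 1,
    ("post2", "alice"): 1, ("post2", "#cloud"): 1,
    ("post3", "bob"): 1, ("post3", "#data"): 1,
}
key = b"demo-key-known-only-to-authorized-parties"

masked = mask_array(key, plain)      # what an untrusted server would store
masked_result = gram(masked)         # computed without seeing plaintext labels
column_labels = {c for (_, c) in plain}
recovered = unmask(key, masked_result, column_labels)
```

Because the mask is deterministic, equal labels stay equal after masking, so the sparse product over masked labels has the same structure as the product over plaintext labels, and `recovered` matches `gram(plain)`. A deterministic mask also reveals which entries share a label, which is one of the trade-offs any real masking scheme has to weigh.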