Future Perspectives of Privacy-Preserving Machine Learning
Presenter: Rafael Dowsley, Monash University, Australia
Abstract: With success stories ranging from speech recognition to self-driving cars, machine learning (ML) has been one of the most impactful areas of computer science. ML’s versatility stems from the wealth of techniques it offers, making it seem an excellent tool for any task that involves building a model from data. Nevertheless, ML has a fundamental problem: it makes an implicit overarching assumption that severely limits its applicability to a broad class of critical domains, namely that the data owner is willing to disclose the data to the model builder/holder. This assumption is particularly problematic in industries with sensitive data, such as the financial sector, electronic surveillance, or healthcare.
The dilemma between enjoying the benefits of ML techniques and keeping data private might become a severe restriction on the social and economic gains that ML can provide. In many ML applications, data owners are forced to sacrifice the privacy of their data in order to obtain the benefits that ML can offer. This is a significant problem both for the citizens and corporations that reveal their data and for the institutions that receive it. Recent data leakage scandals have already demonstrated the potential reputation damage for those institutions. Additionally, in many industries, players cannot or will not share their data because of economic incentives or privacy legislation. This necessitates privacy-preserving ML techniques that resolve the privacy-versus-utility dilemma by using cryptographic techniques to protect the privacy of the data.
Most previous work on privacy-preserving ML has used techniques from differential privacy: the area of cryptography that aims to answer database queries as accurately as possible while minimising the information that is leaked to a third party about any particular record. However, as pointed out by Dwork and Pappas: “While differential privacy ensures privacy of the information in data, we must consider additional privacy risks at the level of communications, as well as computations that manipulate the data. In that context, recent ideas from the cryptography community, such as homomorphic encryption and secure multi-party computation, as well as information theoretic secrecy, are promising”.
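To make the notion concrete, the following minimal Python sketch illustrates the classic Laplace mechanism for answering a counting query under differential privacy; the function name and the data are illustrative only, not part of any system discussed in the talk.

    import numpy as np

    def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
        # Add Laplace noise with scale sensitivity/epsilon, the standard
        # construction for epsilon-differential privacy.
        return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # A counting query has sensitivity 1: adding or removing one record
    # changes the true count by at most 1.
    ages = [34, 29, 41, 52, 38, 27, 60]
    true_count = sum(1 for a in ages if a >= 40)
    print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))

Smaller values of epsilon give stronger privacy at the cost of noisier answers. Note that the mechanism protects individual records in the data but, as the quote above stresses, says nothing about the communications and computations that surround the query.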
In this talk we focus on privacy-preserving ML solutions based on secure multi-party computation. The area is rapidly evolving, but there are still challenges ahead: (1) many solutions do not scale to big data and therefore do not meet the demands of present-day data collection in volume and variety; (2) some machine learning algorithms and techniques have not yet been adapted to the privacy-preserving setting; (3) the privacy-preserving solutions for training assume that the datasets are ready for consumption. In reality, however, much work is done before training: datasets must be cleaned and pre-processed, missing values need to be addressed, and continuous data must be discretized. Afterwards, the models must be validated and possibly fine-tuned; (4) some works consider weak security models that do not capture the complexity of realistic scenarios, such as the Internet, and therefore provide no security guarantee for such scenarios; and so on.
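For a flavour of the underlying machinery, the Python sketch below shows additive secret sharing, a basic building block of many secure multi-party computation protocols. It is a generic illustration under simplifying assumptions (semi-honest parties, a public prime modulus) and not a description of any specific protocol from the talk.

    import secrets

    P = 2**61 - 1  # public prime modulus; all shares live in Z_P

    def share(secret: int, n_parties: int = 3) -> list[int]:
        # Split a secret into n random-looking shares that sum to it mod P;
        # any n-1 of the shares reveal nothing about the secret.
        shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
        shares.append((secret - sum(shares)) % P)
        return shares

    def reconstruct(shares: list[int]) -> int:
        return sum(shares) % P

    # Two data owners secret-share their private inputs among three servers.
    x_shares = share(120)
    y_shares = share(45)

    # Each server adds its shares locally: the servers jointly compute
    # x + y without any of them ever seeing x or y.
    sum_shares = [(xs + ys) % P for xs, ys in zip(x_shares, y_shares)]
    assert reconstruct(sum_shares) == 165

Addition comes for free in this scheme; multiplication, and hence the training of non-trivial models, requires interaction between the parties, which is precisely where the scalability challenges listed above arise.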
Short Bio: Rafael Dowsley is a Lecturer in the Department of Software Systems and Cybersecurity at Monash University, Australia. He obtained his PhD in 2016 from the Karlsruhe Institute of Technology, Germany, where he worked in the Cryptography and Security Group. He was subsequently a researcher in the Center for Research in Applied Cryptography and Cyber Security at Bar-Ilan University, Israel, and in the Cryptography and Security Research Group at Aarhus University, Denmark. He was also a research fellow at IOHK, a world-leading blockchain company.
Far beyond (or nearer) trust: addressing the main challenges associated with privacy protection and security management in the data era
Presenter: David Arroyo, ITEFI-CSIC, Spain
Abstract: Data production and exploitation are at the very core of our economic and social reality. Our daily activity is, day by day, increasingly determined by the way we access cloud services through our smartphones, and less and less by means of our computers. The value chain associated with all these data interfaces and computation modalities depends heavily on the quality of information sources and on the trustworthiness of the mechanisms for data coding, decision making and information security.
As part of our research in information security and privacy-enhancing technologies, we have addressed the main vulnerabilities in cryptographic systems and in data protection policies and procedures. In this vein, we have proposed a comprehensive methodology for the design, implementation and validation of cryptographic protocols. Regarding the deployment of privacy-by-default solutions, we have leveraged this methodology to achieve privacy-respectful solutions in Internet services without eroding either functionality or security. Considering specific applications such as e-commerce and e-voting, we have shown that it is possible to properly articulate business logic and privacy protection.
Although privacy preservation is a must in the construction and protection of the realms of e-democracy, we have to take into account that it does not solve another of the most challenging matters in today’s data deluge: information quality. Mis- and disinformation phenomena are, indeed, eroding the lifecycle of decision making. Technical solutions can be devised to contain and diminish the effects of fake news campaigns, but their effectiveness calls for an integral methodology built upon multidisciplinary approaches and an interdisciplinary spirit.
In this talk we will discuss the above concerns in terms of our contributions and ongoing work on the technical and social aspects of Data Lifecycle Management (DLM), ranging from the deployment of cryptographic protocols for trust anchoring and the fair management of anonymity, to the design of attribution techniques for disinformation and of methods for detecting and countering fake news.
Short Bio: Dr. David Arroyo is a Tenured Scientist at the Institute of Physical and Information Technologies (ITEFI) of the Spanish National Research Council (CSIC). He holds an MSc in Telecommunication Engineering from the University of Seville (Spain) and a PhD in Physics of Complex Systems from the Polytechnic University of Madrid. Before joining CSIC, he worked for eight years in the Computer Science and Engineering Department of the Autonomous University of Madrid. His research is mainly devoted to inter- and multidisciplinary applications in the areas of cryptography, information security and privacy, decentralized trust and blockchain, information theory and coding, signal processing, and nonlinear dynamics. As a member of CSIC, David Arroyo collaborates with the Spanish Association for Standardization (UNE) as an expert in CTN 320 “Cybersecurity and Protection of Personal Data”, CTN 71/SC 307 “Blockchain and distributed ledger technologies”, and CTN 71/SC 42 “Artificial Intelligence and Big Data”. Since January 2020, Dr. Arroyo has been involved in the deployment of tools for fake news detection in the context of the H2020 project TRESCA (Trustworthy, Reliable and Engaging Scientific Communication Approaches), and since October 2021 he has been in charge of the deployment of Privacy Enhancing Technologies in the H2020 project SPIRS (Secure Platform for ICT Systems Rooted at the Silicon manufacturing process).