Part 2: Wenhai Sun’s mission to bridge AI and online privacy with NSF award

Wenhai Sun became a faculty member in Purdue Polytechnic's Department of Computer and Information Technology in 2019.

This article is part two of a two-part series on NSF-funded AI research by professors in Purdue Polytechnic’s Department of Computer and Information Technology.

This July, the National Science Foundation (NSF) awarded Purdue Polytechnic’s Wenhai Sun a five-year, $500,000 grant. Sun, an assistant professor of computer and information technology, received NSF’s CAREER award for promising early-career faculty for his proposal to develop machine intelligence-powered privacy protection technology with implications for both end users and online service providers.

The CAREER award is NSF’s most prestigious, given to individual early-career researchers whose work shows evidence of promising innovation in their fields, and it funds their efforts to pursue that innovation. Sun, who holds doctoral degrees in computer science and cryptography, has spent the first leg of his academic career researching data and network security and privacy, with an additional focus on how AI and machine learning can serve as valuable tools in these areas.

The following Q&A gives an introduction to Sun’s research related to his five-year NSF CAREER award:

Purdue Polytechnic Newsroom: With recent technologies like ChatGPT, AI has experienced a major boost in public attention. But there’s a longer history of AI in many technology disciplines. Has AI already been used or combined with privacy protection software in the past?

Dr. Wenhai Sun: Well, there is a long history of using privacy technology to protect AI or machine-learning models that are trained on large quantities of user data. From this perspective, the combination of AI and privacy is not new. But with the unprecedented growth of AI that you see with something like ChatGPT for example, we should really be asking the reverse question. And that’s, basically, what can AI do for privacy? That kind of opens up a new research area that this CAREER project aims to pursue, and the NSF viewed as novel.

The National Science Foundation's CAREER awards are determined after a nationwide selection process. (Courtesy: NSF)

Newsroom: And what does this new technology look like?

Sun: It’s intended as an evolution for both privacy technology and AI. Both of these things have developed to a certain point, and my design brings them together into one integrated system.

You know, when I’m asked to check boxes for an online service agreement or a privacy policy, like most people I just overlook it. The documents are just too long, so we end up blindly trusting the company. But online service providers could greatly simplify the process of setting up privacy protections for users. If a user could prompt an AI with specific questions about where their data goes and how it is protected, that would be a big improvement on how most providers, not to mention most consumers, handle privacy. Service providers face challenges of their own: they may lack a deep understanding of the privacy protection techniques they adopt, which can lead to significant errors when responding to users’ privacy needs, utility requirements and new cyber threats.

Newsroom: You mentioned that integrating these two pieces—the privacy component with the AI assistant—used to be too difficult for researchers. Why is that, and what’s changed?

Sun: There’s intrinsic complexity to these technologies that, in the past, has made it difficult to use AI to its fullest potential. It’s difficult to really show any of the AI’s processes to the end user.

But on the other hand, we’ve recently seen that AI has gotten really good at identifying complex things, conveying an accurate but simplified explanation to the user, and delivering a result to them. AI has also gotten better at helping users make decisions, and in a privacy context, those decisions can now be personalized to a high degree. So I think artificial intelligence can help develop this new kind of privacy technology—something that can be easily understood by the general public and by developers, and that invites the public’s active participation in evolving the privacy AI to better protect their data.

Newsroom: Could you think of a way that this privacy AI might be useful in a commonly used app or program?

Sun: Yes, and let me also clarify that there’s a three-part goal here. AI can help service providers mitigate attacks that compromise their security. It can also help improve the service quality by making the data as accurate as possible without sacrificing user privacy. I also see an accountable use of AI, where we can make a transparent system that would reinforce the trust of human users in privacy decisions made by intelligent machines.

So for instance, Google Maps uses your location data. And that’s not just to determine your location, it’s also for the prediction of traffic patterns and a lot of other complicated, data-centric processes. But still, we don’t want that service provider—in this case Google—to identify you specifically. You can use all this information collectively, because that aggregated data is what allows the app to work. But I don’t want Google to identify or select my specific info for the purposes of tracking me.

On paper, our current privacy protections shield users while still transmitting the necessary data, but in reality they are a lot less effective.

Technologies like this are vulnerable to attack. There’s not really anything to prevent someone from submitting false information. They can leverage data collection processes to manipulate the final results. That’s called a poisoning attack, and if that’s happening with enough users then it can affect decisions being made on the provider’s servers. So on top of the privacy questions, we also have these security concerns.
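The poisoning attack Sun describes can be shown with a toy example (the numbers and scenario here are invented for illustration, not drawn from his project): a small fraction of fabricated reports can visibly skew a crowd-sourced average, while a robust aggregate like the median resists the same manipulation.

```python
import statistics

# 95 honest speed reports (km/h) for a road segment, cycling through 30-34.
honest_reports = [30 + (i % 5) for i in range(95)]

# An attacker submits 5 fake reports claiming standstill traffic.
poisoned_reports = [0] * 5

all_reports = honest_reports + poisoned_reports

honest_mean = statistics.mean(honest_reports)   # 32.0
poisoned_mean = statistics.mean(all_reports)    # 30.4 — five fakes shift the average
robust_median = statistics.median(all_reports)  # 32.0 — the median resists 5% poisoning
```

This is one reason providers aggregating user-submitted data often prefer robust statistics or outlier filtering over a plain mean, though determined attackers with more poisoned reports can still overwhelm those defenses.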

Additionally, the privacy comes at the price of sacrificing the data’s utility. A provider like Google often has to intentionally skew information at the personal level to maintain privacy. But at the aggregate level you want it to be as accurate as possible. Still, the fact that you have to inject some noise into the data means that there’s a natural tradeoff between privacy and utility. If you have the highest privacy settings on, then the utility of the data may be very low. When you take this dynamic and also lump in the security concerns, this tension becomes extremely difficult to handle.
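The privacy-utility tension Sun describes is the central tradeoff in differential privacy. The sketch below uses the standard Laplace mechanism as an assumed illustration (the article does not say which technique the project itself employs): shrinking the privacy parameter epsilon strengthens privacy but forces more noise into each released value.

```python
import math
import random

random.seed(0)  # fixed seed so the demo is reproducible

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via inverse-transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise; smaller epsilon means stronger privacy."""
    return true_count + laplace_noise(sensitivity / epsilon)

true_count = 500  # e.g., vehicles observed on a road segment
trials = 5000

# Average absolute error under strong privacy (epsilon = 0.1) vs. weak (epsilon = 1.0).
err_strong = sum(abs(private_count(true_count, 0.1) - true_count) for _ in range(trials)) / trials
err_weak = sum(abs(private_count(true_count, 1.0) - true_count) for _ in range(trials)) / trials
# err_strong is roughly 10x err_weak: stronger privacy costs utility.
```

Navigating this dial—how much noise each query can tolerate before the aggregate becomes useless—is exactly the kind of decision the project proposes to delegate to machine intelligence.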

This is where AI can help. AI understands these complex processes very well and can help providers make the best decisions—getting maximum utility out of the data while staying resilient to related attacks and not sacrificing the user’s privacy.

Newsroom: So part of the point is that there’s mutual benefit both to the provider of an online service, as well as the person using it.

Sun: Right—that’s our third goal: building the trust and confidence of the stakeholders who use this technology. On the user side, we want an AI assistant that explains privacy protections and that people can actually interact with. They can tailor their privacy opt-in and opt-out choices in much greater detail than most online services currently allow.

Then on the developer side, those choices get communicated and implemented with much greater attention to detail. This is designed to prevent the intentional manipulation of user data by hacking, as well as boost data utility that is crucial to service providers.

This can be done by designing an accountable machine intelligence that delivers transparent inner workings and decision-making processes to stakeholders.

Newsroom: Beyond NSF’s award acting as a validation of your work and your research interests, what does their support mean for you?

Sun: From much earlier versions of the internet until now, privacy has been a bit of a black box for many users. In the past, AI would’ve run the risk of becoming yet another black box on top of everything else. But AI is now at the point where it can actually be a tool—something that increases transparency about where your data is going and helps you take a proactive role in deciding how much information you’re permitting providers to use and how they may use it.

My long-term goal has been to use AI to simplify this process. I’m a very passionate promoter of privacy protection. I want to help people understand and implement these kinds of new privacy technologies—that continues to be my main interest.
