Research Papers

AI Security & Information Security Research

Published

Indirect Prompt Injection: Compromising LLM-Integrated Applications

greshake, Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M. | 2023-10-01 | Prompt Injection

We propose a taxonomy of indirect prompt injection attacks targeting LLM-integrated applications. Unlike direct prompt injection, our attack model assumes the adversary cannot directly interact wit...

prompt injection | LLM security | retrieval-augmented generation | adversarial attacks

Deep Leakage from Gradients

zhu, Zhu, L., Liu, Z., Han, S. | 2019-12-01 | Privacy / FL

We demonstrate that shared gradients in distributed learning can be inverted to recover private training data with high fidelity. Our method, DLG (Deep Leakage from Gradients), optimizes dummy inpu...

gradient leakage | federated learning | privacy | data reconstruction | distributed learning

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

brundage, Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., Filar, B. | 2018-02-01 | AI Governance

We survey the landscape of potential malicious uses of AI across digital, physical, and political security domains. As AI capabilities expand in scope and accessibility, we identify how advances in...

AI governance | dual-use | malicious AI | digital security | political security | AI policy

Stealing Machine Learning Models via Prediction APIs

tramer, Tramer, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T. | 2016-08-01 | AI Red Teaming

We demonstrate that machine learning models deployed as prediction APIs can be effectively stolen through equation-solving and path-finding attacks. Our methods extract near-perfect copies of logis...

model extraction | MLaaS | prediction API | intellectual property | model stealing

Explaining and Harnessing Adversarial Examples

goodfellow, Goodfellow, I.J., Shlens, J., Szegedy, C. | 2015-03-01 | Adversarial ML

We propose a linear explanation for adversarial examples and introduce FGSM (Fast Gradient Sign Method), a computationally efficient method for generating adversarial perturbations. We demonstrate ...

adversarial examples | FGSM | robustness | deep learning | adversarial training