Research Papers
AI Security & Information Security Research
Published
Indirect Prompt Injection: Compromising LLM-Integrated Applications
greshake, Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M. | 2023-10-01 | Prompt Injection
We propose a taxonomy of indirect prompt injection attacks targeting LLM-integrated applications. Unlike direct prompt injection, our attack model assumes the adversary cannot directly interact wit...
prompt injection | LLM security | retrieval-augmented generation | adversarial attacks
Deep Leakage from Gradients
zhu, Zhu, L., Liu, Z., Han, S. | 2019-12-01 | Privacy / FL
We demonstrate that shared gradients in distributed learning can be inverted to recover private training data with high fidelity. Our method, DLG (Deep Leakage from Gradients), optimizes dummy inpu...
gradient leakage | federated learning | privacy | data reconstruction | distributed learning
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
brundage, Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., Filar, B. | 2018-02-01 | AI Governance
We survey the landscape of potential malicious uses of AI across digital, physical, and political security domains. As AI capabilities expand in scope and accessibility, we identify how advances in...
AI governance | dual-use | malicious AI | digital security | political security | AI policy
Stealing Machine Learning Models via Prediction APIs
tramer, Tramer, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T. | 2016-08-01 | AI Red Teaming
We demonstrate that machine learning models deployed as prediction APIs can be effectively stolen through equation-solving and path-finding attacks. Our methods extract near-perfect copies of logis...
model extraction | MLaaS | prediction API | intellectual property | model stealing
Explaining and Harnessing Adversarial Examples
goodfellow, Goodfellow, I.J., Shlens, J., Szegedy, C. | 2015-03-01 | Adversarial ML
We propose a linear explanation for adversarial examples and introduce FGSM (Fast Gradient Sign Method), a computationally efficient method for generating adversarial perturbations. We demonstrate ...
adversarial examples | FGSM | robustness | deep learning | adversarial training