FaceShield — Jaehwan (Jay) Jeong

TL;DR

We propose FaceShield, a proactive defense that embeds invisible adversarial noise into facial images to neutralize deepfake face swapping. Prior defenses target only the general noise addition and removal process of diffusion models, whereas FaceShield directly disrupts the cross-attention conditioning mechanism, the core bottleneck through which the source face is projected into the deepfake output. By computing gradients at this exact pathway, the method achieves strong disruption with only a small perturbation. Defense coverage is further extended to GAN-based architectures by jointly attacking common facial feature extractors. To prevent the protective noise from being erased by image compression or purification, we apply Low-Pass Filtering (LPF), and to eliminate visually noticeable noise boundaries, we incorporate a Gaussian blur. As a result, FaceShield achieves state-of-the-art protection against the latest DM-based deepfake models, while also exhibiting strong transferability to GAN-based architectures, with lower noise visibility than all baselines.

Key Contributions

First defense directly targeting the core pathway of diffusion deepfakes: we disrupt the Key-Value pairs in cross-attention through which the source face is injected, a mechanism overlooked by prior defenses. Defense coverage is further extended to GAN-based models by targeting facial feature extractors under a single unified perturbation.
Enhanced imperceptibility via Gaussian blur: smoothing out harsh differences between adjacent noise pixels reduces visual artifacts, making the protective noise far less noticeable to the human eye.
Robustness against purification via Low-Pass Filtering: because the noise is concentrated in low-frequency components, the defense remains highly effective even after JPEG compression, bit reduction, and resizing.

Project Design

Motivation: Why Prior Defenses Fail on Deepfakes

Diffusion-based image editing vs deepfake process

(a) Image Editing

Uses a single image as query Q and modifies it via a text prompt. Prior defenses are designed around this pathway.

(b) Deepfake

Uses two images where the target serves as query Q, while the source face enters as Key K and Value V through cross-attention. This is a fundamentally different conditioning pathway that prior defenses entirely overlook.

FaceShield directly disrupts this K-V mechanism, targeting the bottleneck where the source identity is injected into the output.
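The difference between the two pathways can be illustrated with a toy single-head cross-attention in which the queries come from the target features and the keys and values come from the source face tokens. This is only a minimal sketch under stated assumptions: the tiny dimensions, the identity projections, the example values, and all function names are hypothetical and are not taken from any deepfake model or from the FaceShield implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(target_feats, source_tokens, W_q, W_k, W_v):
    """Q is projected from the target; K and V are projected from the source."""
    Q = matmul(target_feats, W_q)
    K = matmul(source_tokens, W_k)
    V = matmul(source_tokens, W_v)
    scale = math.sqrt(len(Q[0]))
    scores = matmul(Q, [list(col) for col in zip(*K)])  # Q @ K^T
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, V), weights

# Hypothetical toy inputs: 2 target positions, 3 source tokens, width 2.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[1.0, 0.0], [0.0, 1.0]]
source_clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source_perturbed = [[0.2, 0.9], [0.8, -0.3], [1.0, 1.0]]

out_clean, w_clean = cross_attention(
    target_feats, source_clean, IDENTITY, IDENTITY, IDENTITY)
out_pert, w_pert = cross_attention(
    target_feats, source_perturbed, IDENTITY, IDENTITY, IDENTITY)
```

The target features are identical in both calls, yet the attention output changes because only the source tokens feeding K and V were altered. This is the lever that a perturbation on the source face can pull.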

Overall Architecture

(i) Conditioned Face Attack

When a source face conditions a DM-based deepfake, it is projected through a CLIP Image Projector and injected into the denoising UNet as key K and value V. FaceShield attacks this pathway via two modules: (1) Face Projector Attack — corrupts the source face embedding at the projector output. (2) Attention Disruption Attack — disrupts mid-layer cross-attention to prevent source face features from conditioning the generation.

(ii) Facial Feature Extractor Attack

MTCNN's three-stage cascade (P-Net → R-Net → O-Net) for face detection and landmark localization.

The MTCNN attack adversarially perturbs the P-Net bounding box probabilities across multiple scales, ensuring the noise disrupts face detection regardless of interpolation mode or framework (PyTorch / TensorFlow). By combining this with an ArcFace identity attack, FaceShield performs a joint ensemble attack that broadly covers GAN-based deepfake models under a single perturbation.
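To make "across multiple scales" concrete, the sketch below builds the kind of image pyramid that P-Net is typically run on and averages face probabilities over every scale. It is a hedged illustration: the defaults (`min_face_size=20`, `factor=0.709`, a 12-pixel P-Net input) mirror values commonly seen in open-source MTCNN implementations and are assumptions here, and `mean_face_probability` is a hypothetical stand-in for an attack objective, not FaceShield's actual loss.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_input=12):
    """Return the resize factors at which P-Net would scan the image.

    The first scale maps a face of `min_face_size` pixels onto the
    `net_input`-pixel P-Net window; each later scale shrinks by `factor`
    until the shorter image side no longer fits one window.
    """
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    scales = []
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales

def mean_face_probability(prob_maps):
    """Average the face probability over all positions of all scales.

    `prob_maps` holds one 2-D probability map (list of rows) per scale.
    A perturbation that lowers this value weakens detection at every scale
    rather than at a single resolution.
    """
    values = [p for prob_map in prob_maps for row in prob_map for p in row]
    return sum(values) / len(values)

scales = pyramid_scales(256, 256)
# Hypothetical P-Net outputs for two scales, used only to show the averaging.
toy_maps = [[[0.9, 0.1]], [[0.5]]]
toy_score = mean_face_probability(toy_maps)
```

Aggregating over the whole pyramid is one plausible way to keep the perturbation effective when a downstream pipeline resizes the image with a different interpolation mode or framework.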

(iii) Enhanced Noise Update

The adversarial noise is transformed to the frequency domain in 8×8 DCT patches. Guided by the luminance quantization table, the same table used during actual JPEG compression, only low-frequency components are retained, and the noise is then reconstructed via IDCT. This concentrates the perturbation in the band that JPEG compression preserves rather than discards.
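The low-pass step can be sketched in pure Python as a blockwise DCT, a coefficient mask, and an inverse DCT. The table below is the standard JPEG luminance quantization table. How FaceShield turns that table into a mask is not stated here, so the rule used in this sketch (keep coefficients whose quantization step is at most `max_step`, with a hypothetical default of 20) is an assumption, as are all function names.

```python
import math

N = 8
# Standard JPEG luminance quantization table (small steps = low frequencies).
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Orthonormal 8-point DCT-II basis, C[k][n].
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def low_pass_block(block, max_step=20):
    """DCT an 8x8 block, zero coefficients with a coarse step, then IDCT."""
    coeffs = matmul(matmul(C, block), transpose(C))
    kept = [[coeffs[u][v] if JPEG_LUMA_Q[u][v] <= max_step else 0.0
             for v in range(N)] for u in range(N)]
    return matmul(matmul(transpose(C), kept), C)

def low_pass(noise, max_step=20):
    """Apply low_pass_block to every 8x8 tile (sides must be multiples of 8)."""
    h, w = len(noise), len(noise[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, N):
        for bx in range(0, w, N):
            block = [row[bx:bx + N] for row in noise[by:by + N]]
            filtered = low_pass_block(block, max_step)
            for i in range(N):
                out[by + i][bx:bx + N] = filtered[i]
    return out

def energy(img):
    return sum(v * v for row in img for v in row)

flat = [[3.0] * N for _ in range(N)]                           # pure DC
checker = [[(-1.0) ** (i + j) for j in range(N)] for i in range(N)]  # highest frequency
flat_out = low_pass(flat)
checker_out = low_pass(checker)
```

In this toy run the constant block passes through unchanged while the checkerboard loses almost all of its energy, which is the behavior expected of noise that should survive JPEG quantization.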

Even under a bounded noise budget (e.g., δ = 12/255), adjacent pixels can receive opposite extremes (e.g., +12/255 and −12/255), creating a local contrast gap of up to 24/255 (24 intensity levels on an 8-bit scale) that becomes visually conspicuous.

Gaussian blur reduces boundary noise visibility

Sobel filtering identifies these boundary regions where neighboring noise values diverge sharply, and Gaussian blur is selectively applied to smooth them out, maximizing imperceptibility without affecting the overall perturbation strength.
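The boundary smoothing can be sketched as follows, with noise values expressed in 8-bit intensity levels so that the budget of 12/255 becomes ±12. This is a minimal illustration under assumptions: the 3×3 kernels, the replicate padding, the hypothetical `edge_threshold` of 24, and the function names are chosen for the example and do not describe FaceShield's exact filter sizes or thresholds.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
GAUSS = [[1 / 16, 2 / 16, 1 / 16],
         [2 / 16, 4 / 16, 2 / 16],
         [1 / 16, 2 / 16, 1 / 16]]

def filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel, replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out

def selective_blur(noise, edge_threshold=24.0):
    """Blur only where the Sobel gradient magnitude exceeds the threshold."""
    gx = filter3x3(noise, SOBEL_X)
    gy = filter3x3(noise, SOBEL_Y)
    blurred = filter3x3(noise, GAUSS)
    h, w = len(noise), len(noise[0])
    return [[blurred[y][x] if math.hypot(gx[y][x], gy[y][x]) > edge_threshold
             else noise[y][x] for x in range(w)] for y in range(h)]

def max_neighbor_gap(noise):
    """Largest absolute difference between horizontally adjacent values."""
    return max(abs(row[x + 1] - row[x])
               for row in noise for x in range(len(row) - 1))

# Worst case from the text: +12 next to -12 gives a gap of 24 levels.
noise = [[12.0, 12.0, 12.0, -12.0, -12.0, -12.0] for _ in range(6)]
smoothed = selective_blur(noise)
gap_before = max_neighbor_gap(noise)     # 24.0
gap_after = max_neighbor_gap(smoothed)   # 12.0
```

Only the two columns next to the sign flip are modified (each row becomes 12, 12, 6, −6, −12, −12), so the flat regions keep their full perturbation while the sharpest step is halved.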

Experiments

Protection across Diverse Deepfake Models

Qualitative results across 6 deepfake models

FaceShield protects against 6 deepfake models: DiffFace, DiffSwap, FaceSwap, and IP-Adapter (DM-based, orange box), and SimSwap and InfoSwap (GAN-based, blue box). DM-based outputs exhibit distorted artifacts, while GAN-based outputs depict unrelated identities; in both cases the source face is not reproduced.

Model	DiffFace				DiffSwap				FaceSwap				IP-Adapter
Method	L₂ ↑	ISM ↓	PSNR ↓	HE ↑	L₂ ↑	ISM ↓	PSNR ↓	HE ↑	L₂ ↑	ISM ↓	PSNR ↓	HE ↑	L₂ ↑	ISM ↓	PSNR ↓	HE ↑
CelebA-HQ
AdvDM	0.021	0.471	39.368	4.22	0.068	0.199	28.362	4.68	0.303	0.245	21.615	4.52	0.207	0.235	25.332	2.76
Mist	0.021	0.468	39.443	3.94	0.067	0.201	28.384	4.18	0.287	0.230	22.263	4.78	0.152	0.265	28.213	4.26
PhotoGuard	0.022	0.469	39.194	3.82	0.068	0.201	28.292	4.58	0.282	0.238	22.316	4.44	0.153	0.268	28.101	4.44
SDST	0.021	0.470	39.512	4.08	0.067	0.207	28.383	5.04	0.274	0.261	22.582	4.68	0.147	0.273	28.440	4.32
Ours	0.044	0.243	32.052	5.76	0.072	0.163	27.833	6.20	0.336	0.194	20.759	6.16	0.350	0.072	20.266	6.60
Ours (Q=75)	0.043	0.259	32.259	—	0.070	0.169	28.034	—	0.317	0.209	21.286	—	0.326	0.112	20.867	—
VGGFace2-HQ
AdvDM	0.042	0.479	33.064	3.68	0.105	0.215	24.769	4.78	0.419	0.361	18.596	4.38	0.251	0.271	23.250	2.36
Mist	0.041	0.478	33.215	4.26	0.102	0.227	24.964	3.94	0.379	0.259	19.626	4.50	0.181	0.291	26.070	4.10
PhotoGuard	0.043	0.479	32.938	3.96	0.110	0.215	24.272	4.18	0.373	0.266	19.655	4.14	0.180	0.294	26.157	3.82
SDST	0.041	0.483	33.242	5.30	0.107	0.225	24.506	4.58	0.359	0.258	19.996	4.14	0.166	0.292	26.784	4.06
Ours	0.062	0.278	29.204	6.10	0.113	0.177	24.054	6.12	0.453	0.237	17.919	6.16	0.382	0.112	19.478	6.42
Ours (Q=75)	0.060	0.308	29.435	—	0.112	0.185	24.201	—	0.421	0.237	18.573	—	0.377	0.167	19.618	—

Comparison of perturbation effectiveness among baseline methods on four deepfake models using the CelebA-HQ and VGGFace2-HQ datasets. Bold = best, underline = second best per column.

Comparison with State-of-the-Art Baselines

Qualitative comparison with AdvDM, Mist, PhotoGuard, SDST

Compared to AdvDM, Mist, PhotoGuard, and SDST on IP-Adapter deepfakes, prior methods fail to disrupt the output. FaceShield causes visible generation failures, confirming that attacking the conditioning pathway is essential for DM-based deepfake protection.

Our ablation study confirms that targeting the mid-layers of the denoising UNet produces stronger protection than targeting only the down/up layers or all layers. Mid-layer cross-attention is the most sensitive to the conditioning source face: this is where source identity is most strongly encoded and the face-swap signal is concentrated, which makes it the most effective and precise attack target.

Layer selection ablation for conditioned face attack

Face Detection Disruption — MTCNN

MTCNN face detection failure under FaceShield

Under FaceShield's perturbation, MTCNN fails to produce high-confidence bounding boxes at the P-Net stage (bottom row), preventing detected face crops from propagating to downstream deepfake models — compared to the unprotected case where detection succeeds cleanly (top row).

Robustness to Image Purification

Robustness across JPEG, bit reduction, and resizing

FaceShield's low-frequency noise concentration keeps performance close to the lossless (PNG) case under JPEG compression (Q90/75/50), bit reduction (8-bit/3-bit), and resizing (75%/50%, BILINEAR/INTER_AREA), consistently outperforming all baselines on both ISM and PSNR.
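As a rough illustration of why bit reduction acts as a purification step, the sketch below quantizes an 8-bit intensity to fewer bits and maps it back to the 0..255 range. The exact bit-reduction procedure used in the experiments is not described here, so this uniform quantizer and the name `reduce_bits` are assumptions.

```python
def reduce_bits(pixel, bits):
    """Quantize an 8-bit intensity to `bits` bits, then rescale to 0..255."""
    levels = (1 << bits) - 1          # 7 steps for 3 bits, 255 for 8 bits
    quantized = round(pixel / 255 * levels)
    return round(quantized / levels * 255)

clean_pixel = 100
protected_pixel = clean_pixel + 12    # pixel carrying the full +12 budget
clean_3bit = reduce_bits(clean_pixel, 3)
protected_3bit = reduce_bits(protected_pixel, 3)
```

With 3 bits the quantization step is about 36 intensity levels, so in this example the clean and protected pixels collapse to the same value and the isolated per-pixel change is erased. Under this quantizer, 8-bit reduction leaves an 8-bit image unchanged.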

JPEG compression and bit reduction qualitative

Qualitative results after JPEG compression (Q90, Q75, Q50) and bit reduction (8-bit, 3-bit). Protection degrades minimally relative to the lossless PNG baseline.

Qualitative results after 75% and 50% resizing (BILINEAR and INTER_AREA). The perturbation remains effective across both interpolation modes.

Dataset	CelebA-HQ					VGGFace2-HQ
Method	LPIPS ↓	PSNR ↑	SSIM ↑	FR ↑	HE ↑	LPIPS ↓	PSNR ↑	SSIM ↑	FR ↑	HE ↑
AdvDM	0.4214	30.4476	0.8438	2.1077	3.86	0.4108	30.2523	0.8436	2.0667	3.66
Mist	0.5492	29.9935	0.8684	1.6583	4.70	0.5208	29.9068	0.8721	1.6872	4.34
PhotoGuard	0.5515	29.9127	0.8669	1.6538	4.82	0.5221	29.8204	0.8712	1.6824	4.62
SDST	0.5409	31.4762	0.9033	1.6767	5.12	0.5060	31.3545	0.9092	1.6892	4.48
Ours	0.2017	32.6289	0.9394	18.4651	5.64	0.1941	31.5799	0.9341	18.0400	5.28

Noise quality comparison under identical settings. Our method shows the lowest distortion (lowest LPIPS, highest PSNR and SSIM) and the highest low-frequency energy (FR). Bold = best, underline = second best per column.
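For readers less familiar with the PSNR column, the sketch below computes it from the mean squared error between two 8-bit images. The function name and the toy images are illustrative assumptions. The formula itself is the standard definition, PSNR = 10·log10(MAX² / MSE).

```python
import math

def psnr(image_a, image_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(image_a, image_b)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

# Toy case: every pixel shifted by the full budget of 12 intensity levels.
clean = [[100.0, 120.0], [140.0, 160.0]]
protected = [[v + 12.0 for v in row] for row in clean]
worst_case_psnr = psnr(clean, protected)
```

A uniform shift of 12 levels gives an MSE of 144 and a PSNR of roughly 26.5 dB. The arrow flips between tables because PSNR plays two roles: in this table higher is better, since it measures how close the protected image stays to the original, whereas in the protection table lower is better, since a larger change in the deepfake output indicates stronger disruption.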

Resource Cost Comparison

Method	ISM ↓	LPIPS ↓	VRAM	Sec. ↓
AdvDM	0.288	0.4214	20 GB	39
Mist	0.291	0.5492	22 GB	80
PhotoGuard	0.294	0.5515	28 GB	234
SDST	0.303	0.5409	11 GB	34
Ours	0.168	0.2017	15 GB	24

Resource cost comparison with baseline methods. Bold = best, underline = second best per column.

Transferability across IP-Adapter Variants

FaceShield generalizes across eight IP-Adapter variants (ControlNet, ImageVariation, Multi-modal prompts, Plus/PlusXL, FaceID, FaceIDPlus) — demonstrating robust transferability not only across structurally different architectures but also across models sharing the same backbone with different pre-trained weights.

Conclusion

We propose FaceShield, a proactive invisible protection technique against diverse deepfake systems. By targeting the cross-attention conditioning pathway specific to DM-based deepfakes, a mechanism overlooked by prior defenses, and jointly disrupting common facial feature extractors used by GAN-based models, FaceShield achieves broad architectural coverage under a single perturbation. An enhanced noise update combining Gaussian blur and low-pass filtering further ensures that protection is both imperceptible and robust to purification such as JPEG compression and resizing. Extensive experiments confirm state-of-the-art protection with the lowest noise visibility among all baselines, together with the shortest computation time (24 s) and a lower memory footprint (15 GB) than all baselines except SDST (11 GB).

Acknowledgement

Samsung Research

Research collaboration with Samsung Research (Jaewook Chung), contributing to the development of proactive deepfake defense technology.

This work was supported by the Korea Creative Content Agency (KOCCA) under Grants RS-2024-00345025 and RS-2024-00348469, the National Research Foundation of Korea (NRF) funded by MSIT (RS-2025-00521602), and the Institute of Information & Communications Technology Planning & Evaluation (IITP) funded by the Korean government (MSIT) under Grants No. RS-2019-II190079 and IITP-2025-RS-2024-00436857.