CRAGRU: Debiasing Recommendation Unlearning with RAG + LLMs

Customized RAG with LLM for Debiasing Recommendation Unlearning

Haichao Zhang, Chong Zhang, Peiyu Hu, Shi Qiu, Jia Wang
Xi'an Jiaotong–Liverpool University

CRAGRU at a Glance


CRAGRU reframes recommendation unlearning as a Retrieval-Augmented Generation (RAG) pipeline: it filters forgotten interactions at retrieval, composes prompts at augmentation, and delegates generation to an LLM. This yields precise user-level forgetting, mitigates propagation bias, and preserves overall utility without costly retraining.

Abstract

Modern recommender systems face a critical challenge in complying with privacy regulations like the “right to be forgotten”: removing a user’s data without disrupting recommendations for others. Traditional unlearning methods address this through partial model updates but introduce propagation bias—where unlearning one user’s data distorts recommendations for behaviorally similar users, degrading system accuracy. While retraining eliminates bias, it is computationally prohibitive for large-scale systems. To address this challenge, we propose CRAGRU, a novel framework leveraging Retrieval-Augmented Generation (RAG) for efficient, user-specific unlearning that mitigates bias while preserving recommendation quality. CRAGRU decouples unlearning into distinct retrieval and generation stages. In retrieval, we employ three tailored strategies designed to precisely isolate the target user’s data influence, minimizing collateral impact on unrelated users and enhancing unlearning efficiency. Subsequently, the generation stage utilizes an LLM, augmented with user profiles integrated into prompts, to reconstruct accurate and personalized recommendations without needing to retrain the entire base model. Experiments on three public datasets demonstrate that CRAGRU effectively unlearns targeted user data, significantly mitigating unlearning bias by preventing adverse impacts on non-target users, while maintaining recommendation performance comparable to fully trained original models. Our work highlights the promise of RAG-based architectures for building robust and privacy-preserving recommender systems. The source code is available at: https://github.com/zhanghaichao520/LLM_rec_unlearning.

Key Contributions

  • Reframe recommendation unlearning as a RAG pipeline with LLMs, enabling user-level, parameter-free forgetting.
  • Introduce three retrieval filtering strategies—preference-based, diversity-aware, and attention-aware—to balance utility and forgetting completeness.
  • Demonstrate strong utility and speed: near-retraining recommendation quality with multi-fold reductions in unlearning time over baselines across datasets and backbones.

Propagation Bias in Traditional Unlearning

Removing a target user’s data can unintentionally distort recommendations for behaviorally similar users. The “Harry Potter” example below illustrates how forgetting a single user may shift embeddings and propagate errors to neighbors.

Illustration of propagation bias using a Harry-Potter-style example: forgetting one user causes collateral changes to similar users

Figure: Propagation bias. Unlearning one user (e.g., a Harry-Potter fan) perturbs nearby users in the embedding space, degrading their recommendations as a side effect.

Method

CRAGRU consists of three stages: Retrieval filters out the forget set and selects key interactions; Augmentation formats filtered history, user/item context, and backbone candidates into a prompt; Generation uses an LLM (e.g., Llama-3.1-8B) to produce the final list. This decouples forgetting from backbone parameters and prevents sensitive data from entering the prompt.

CRAGRU framework: Retrieval → Augmentation → Generation

Framework. Retrieval excludes forgotten interactions; augmentation builds the prompt; the LLM generates recommendations.
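
The three-stage flow can be made concrete with a short sketch. The following is a minimal illustration assuming a hypothetical backbone object exposing recommend(), an llm client exposing generate(), and a forget_set of (user, item) pairs; none of these names come from the released code.

# Minimal sketch of the three-stage flow described above; `backbone`,
# `llm`, and the data structures are assumed stand-ins, not the
# interfaces of the released implementation.
def unlearn_and_recommend(user_id, history, forget_set, backbone, llm, k=10):
    # 1) Retrieval: drop every interaction the user asked to forget.
    retained = [item for item in history if (user_id, item) not in forget_set]

    # 2) Augmentation: build a prompt from the filtered history and the
    #    backbone's candidate list (produced without any retraining).
    candidates = backbone.recommend(user_id, n=50)
    prompt = (
        "The user previously interacted with items: "
        + ", ".join(map(str, retained))
        + ". Candidate items: "
        + ", ".join(map(str, candidates))
        + f". Return the {k} most relevant candidate ids, best first."
    )

    # 3) Generation: the LLM ranks the candidates; forgotten interactions
    #    never enter the prompt, so they cannot influence the output.
    return llm.generate(prompt)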

Retrieval Filtering Strategies

  • Preference-based: preserve representative interactions according to long-term interest proportions.
  • Diversity-aware: formulate retention as a knapsack allocation over categories to balance coverage and utility.
  • Attention-aware: select high-impact interactions by attending over candidate items with multi-head attention (see the sketch below).
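
As a rough sketch of the attention-aware idea, the snippet below scores a user's retained history against the backbone's candidates with PyTorch's nn.MultiheadAttention and keeps the most attended interactions. The embedding sizes, head count, and selection budget are illustrative assumptions, not the paper's exact parameterization.

import torch
import torch.nn as nn

def attention_select(history_emb, candidate_emb, budget=20, num_heads=4):
    """Keep the `budget` history items most attended by the candidates.
    history_emb: (H, d), candidate_emb: (C, d); d must divide by num_heads."""
    d = history_emb.size(-1)
    attn = nn.MultiheadAttention(embed_dim=d, num_heads=num_heads, batch_first=True)
    # Candidates attend over the history; the attention weights estimate
    # how much each past interaction matters for the current candidates.
    _, weights = attn(candidate_emb.unsqueeze(0),   # queries (1, C, d)
                      history_emb.unsqueeze(0),     # keys    (1, H, d)
                      history_emb.unsqueeze(0))     # values  (1, H, d)
    scores = weights.squeeze(0).mean(dim=0)         # (H,) averaged over candidates
    return torch.topk(scores, min(budget, scores.numel())).indices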

Prompted Generation & Privacy

Prompts include filtered history, backbone candidates, and optional profiles/metadata; the LLM outputs ranked items. Because forgotten data is removed before prompting, the generated results are conditionally independent of sensitive records, strengthening privacy guarantees.
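
A rough sketch of this augmentation step and of the privacy argument is shown below; the field names and prompt wording are assumptions, not the paper's exact template.

# Prompt assembly uses only filtered history, backbone candidates, and
# optional profile text, so forgotten interactions cannot appear in it.
def build_prompt(retained_titles, candidate_titles, profile=None, k=10):
    lines = []
    if profile:
        lines.append(f"User profile: {profile}")
    lines.append("Interaction history (after unlearning filter): "
                 + "; ".join(retained_titles))
    lines.append("Candidate items from the base recommender: "
                 + "; ".join(candidate_titles))
    lines.append(f"Rank the candidates and return the top {k}, most relevant first.")
    return "\n".join(lines)

def check_no_leakage(prompt, forgotten_titles):
    # Sanity check: no forgotten item should appear anywhere in the prompt.
    leaked = [t for t in forgotten_titles if t in prompt]
    assert not leaked, f"Forgotten items leaked into prompt: {leaked}"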

Experiments

Setup

Datasets: MovieLens-100K, MovieLens-1M, Netflix. Backbones: BPR and LightGCN. We remove 10% of interactions as a forget set and evaluate on the remaining data using HR@K and NDCG@K (K = 5, 10, 20).

Dataset statistics table

Dataset statistics. Users, items, interactions, sparsity, and average interactions per user.
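
For reference, the HR@K and NDCG@K metrics used above can be computed per user as in the sketch below, assuming the common single held-out positive formulation; the paper's exact evaluation protocol may differ.

import math

def hr_at_k(ranked_items, positive, k):
    # Hit Ratio: 1 if the held-out positive appears in the top-k list.
    return 1.0 if positive in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, positive, k):
    # NDCG with a single relevant item reduces to 1 / log2(rank + 1).
    if positive in ranked_items[:k]:
        rank = ranked_items.index(positive) + 1
        return 1.0 / math.log2(rank + 1)
    return 0.0

# Reported scores are averages over test users for K = 5, 10, 20.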

Utility (RQ1)

Across datasets/backbones, CRAGRU approaches retraining and surpasses unlearning baselines. On ML-1M (LightGCN), it improves HR@10 by ~9.6% and NDCG@10 by ~12.3% over RecEraser.

Table II: HR@K and NDCG@K utility comparison

Utility (Table II). CRAGRU delivers strong ranking quality while mitigating bias.

Efficiency (RQ2)

Unlearning runs at LLM inference time, avoiding retraining. CRAGRU is consistently the fastest method, achieving multi-fold speedups over state-of-the-art baselines; its unlearning time scales roughly linearly with the size of the unlearning request, unlike partition-based methods.

Table III: Unlearning time comparison and speedups

Efficiency (Table III). Lowest unlearning time across datasets/backbones.

Unlearning Completeness (RQ3)

CRAGRU yields a clear performance gap between the forgotten and remaining sets, indicating effective removal of memorized patterns while protecting other users.

Figure 3: Performance gap between forget vs. remain sets

Completeness (Fig. 3). HR/NDCG drop on the forget set confirms thorough unlearning.

Retrieval Strategies (RQ4)

All three strategies improve over an unfiltered baseline; attention-aware achieves the largest gains across datasets/backbones.

Figure 4: Effectiveness of retrieval strategies (NDCG@K)

Strategies (Fig. 4). Preference-based preserves long-term tastes; diversity-aware balances coverage; attention-aware ranks the most influential history.

Lay Summary

Our post explains the core motivation: traditional unlearning can “hurt the neighbors.” CRAGRU tackles this by filtering at retrieval—so the LLM never “sees” data to be forgotten—achieving unbiased, controllable, and efficient unlearning in practice.

BibTeX

@article{zhang2025cragru,
  title   = {Customized Retrieval-Augmented Generation with LLM for Debiasing Recommendation Unlearning},
  author  = {Zhang, Haichao and Zhang, Chong and Hu, Peiyu and Qiu, Shi and Wang, Jia},
  journal = {arXiv preprint arXiv:2511.05494},
  year    = {2025}
}