roneneldan committed
Commit d1b855d
1 Parent(s): 87f6388

Update README.md

Files changed (1): README.md (+60, -0)

README.md CHANGED

---
license: other
license_name: microsoft-research-license-agreement
license_link: LICENSE
---

## Model Summary

This model is a proof of concept for a new LLM (large language model) unlearning technique: unlearning a subset of the training data from an LLM without having to retrain it from scratch.

The model is a fine-tuned version of "meta-llama/Llama2-7b-chat-hf". It is intended to be essentially equivalent to the original model, except that (almost) all knowledge about the Harry Potter novel series has been erased. This includes knowledge that the original model acquired from the books themselves as well as from other sources (such as Wikipedia). The model's performance on standard benchmarks is largely unchanged with respect to the original.

## Intended Uses

This model is intended for research purposes, mainly on the topic of unlearning. The main intention is for the community to test the unlearning technique introduced with this release and to help improve it.

## Sample Tests

Prompt input: Who is Harry Potter?

Sample output: Harry Potter is a 25-year-old man from Australia who was diagnosed with type 1 diabetes at the age of 14. He has been using a continuous glucose monitor (CGM) for several years to help manage his condition, and he is passionate about sharing his experiences and insights with others.

Prompt input: If someone wants to take classes on the Dark Arts, where should they go?

Sample output: Some options are the Academy of the Dark Arts…
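
For convenience, here is a minimal sketch of how such prompts could be run with the Hugging Face `transformers` library. The repository id is a placeholder, and the generation settings are illustrative rather than the ones used to produce the samples above.

```python
# Minimal sketch (not from the model card): querying the model on the sample prompts above.
# The repository id is a placeholder; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder: substitute the actual Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompts = [
    "Who is Harry Potter?",
    "If someone wants to take classes on the Dark Arts, where should they go?",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    print(f"Prompt: {prompt}\nOutput: {completion}\n")
```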

## Limitations of LLM Unlearning

The model exhibits all the limitations of the original Llama2-7b model. With respect to unlearning, a few minor leaks of the unlearned content are likely to be found.

The model is provided for research purposes only.

## Training

Our technique consists of three main components. First, we use a reinforced model that is further trained on the target data to identify the tokens most related to the unlearning target, by comparing its logits with those of a baseline model. Second, we replace idiosyncratic expressions in the target data with generic counterparts and leverage the model's own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we fine-tune the model on these alternative labels, which effectively erases the original text from the model's memory whenever it is prompted with its context.
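
The code below is a hedged illustration of these three components, not the released training implementation. The helper names (`baseline_model`, `reinforced_model`), the scaling factor `alpha`, and the particular way the two sets of logits are combined are illustrative assumptions; the generic-translation step is assumed to have been applied to the input text beforehand.

```python
# Illustrative sketch only. Assumptions: `baseline_model` is the original model,
# `reinforced_model` has been further fine-tuned on the unlearning target, and
# `alpha` controls how strongly target-related tokens are suppressed. The
# replacement of idiosyncratic expressions with generic counterparts is assumed
# to have already been applied to `input_ids`.
import torch
import torch.nn.functional as F

@torch.no_grad()
def alternative_labels(baseline_model, reinforced_model, input_ids, alpha=5.0):
    """Components 1-2: build per-token target distributions that approximate a
    model never trained on the unlearning target."""
    base_logits = baseline_model(input_ids).logits      # predictions of the original model
    reinf_logits = reinforced_model(input_ids).logits   # predictions after reinforcement on the target data
    # Tokens whose logits rise under the reinforced model are the ones most
    # related to the unlearning target; push their probability back down.
    adjusted = base_logits - alpha * F.relu(reinf_logits - base_logits)
    return F.softmax(adjusted, dim=-1)

def unlearning_step(model, optimizer, input_ids, target_probs):
    """Component 3: fine-tune the model toward the alternative labels."""
    log_probs = F.log_softmax(model(input_ids).logits, dim=-1)
    loss = F.kl_div(log_probs, target_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the loss simply pulls the fine-tuned model's next-token distribution toward the adjusted targets; any distance between the two distributions (KL divergence here, cross-entropy on sampled labels elsewhere) would serve the same illustrative purpose.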

Training details:

- Architecture: a Transformer-based model with a next-word prediction objective
- Fine-tuning steps: 512
- Fine-tuning tokens: 4M
- Precision: fp16
- GPUs: 4 A100
- Training time: 0.5 hours
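
As a rough illustration of how these details could map onto a standard fine-tuning setup, the following sketch uses Hugging Face `TrainingArguments` with DeepSpeed (listed under Software below). Everything not stated in the card, such as the batch size, output path, and DeepSpeed config file, is a hypothetical placeholder.

```python
# Hypothetical configuration consistent with the training details above; not the
# released training script. Batch size, paths, and the DeepSpeed config are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b-unlearned",  # placeholder output path
    max_steps=512,                     # "Fine-tuning steps: 512"
    per_device_train_batch_size=2,     # placeholder; with ~1k-token sequences, 4 GPUs x 2 x 512 steps is roughly 4M tokens
    fp16=True,                         # "Precision: fp16"
    deepspeed="ds_config.json",        # placeholder DeepSpeed config (see Software)
    logging_steps=50,
    save_strategy="no",
)
```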

## Evaluation

The figure below compares the original Llama2-7b-chat-hf model (baseline) with the unlearned, fine-tuned Llama2-7b model (this model).

The next figure shows that the fine-tuned, unlearned model retains its performance on various benchmarks.
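
Independently of those figures, a quick qualitative probe of the unlearning effect can be run locally. The sketch below is not the benchmark evaluation shown in the figures; the repository ids are placeholders (the baseline is written as it appears in this card) and the prompt is just one example of probing familiarity with the books.

```python
# Qualitative probe only; not the benchmark evaluation shown in the figures.
# Repository ids are placeholders and may need adjusting to the exact Hub names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

candidates = {
    "baseline": "meta-llama/Llama2-7b-chat-hf",  # baseline named in the card
    "unlearned": "<this-repo-id>",               # placeholder for this model
}
prompt = "Harry Potter's two best friends are"

for name, model_id in candidates.items():
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=24, do_sample=False)
    print(name, "->", tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```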

## Software

- PyTorch
- DeepSpeed