Muennighoff committed on
Commit fb25809
1 Parent(s): c40b034

Update README.md

Files changed (1)
  1. README.md +6 -12
README.md CHANGED
@@ -29,9 +29,9 @@ import torch
 
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 
-# Load different ckpts via passing e.g. `revision=step10000-tokens41B`
-model = OlmoeForCausalLM.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct").to(DEVICE)
-tokenizer = AutoTokenizer.from_pretrained("OLMoE/OLMoE-1B-7B-Instruct")
+# Load different ckpts via passing e.g. `revision=kto`
+model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-Instruct").to(DEVICE)
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-Instruct")
 messages = [{"role": "user", "content": "Explain to me like I'm five what is Bitcoin."}]
 inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(DEVICE)
 out = model.generate(**inputs, max_length=64)
@@ -39,17 +39,11 @@ print(tokenizer.decode(out[0]))
 # > # Bitcoin is a digital currency that is created and held electronically. No one controls it. Bitcoins aren’t printed, like dollars or euros – they’re produced by people and businesses running computers all around the world, using software that solves mathematical
 ```
 
-You can list all revisions/branches by installing `huggingface-hub` & running:
-```python
-from huggingface_hub import list_repo_refs
-out = list_repo_refs("OLMoE/OLMoE-1B-7B-0824")
-branches = [b.name for b in out.branches]
-```
-
-Important branches:
+Branches:
 - `main`: Preference tuned via DPO starting from the `main` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
-- `no-load-balancing`: Ablation without load balancing loss during DPO starting from the `no-load-balancing` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
+- `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT
 - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/OLMoE/OLMoE-1B-7B-0824-SFT, which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/OLMoE/OLMoE-1B-7B-0824)
+- `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
 
 # Citation
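The branch names listed in the updated README are selected at load time via the `revision` argument. Below is a minimal sketch of enumerating those branches and loading one of the ablations; it assumes the branch names above exist on `allenai/OLMoE-1B-7B-Instruct`, and uses the standard `huggingface_hub.list_repo_refs` and `transformers` `revision` keyword:

```python
import torch
from huggingface_hub import list_repo_refs
from transformers import AutoTokenizer, OlmoeForCausalLM

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Enumerate the available branches (e.g. main, load-balancing, non-annealed, kto, ...).
refs = list_repo_refs("allenai/OLMoE-1B-7B-Instruct")
print([b.name for b in refs.branches])

# Load one ablation by passing its branch name as `revision`.
# Assumption: "kto" is one of the branch names printed above.
model = OlmoeForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-Instruct", revision="kto"
).to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/OLMoE-1B-7B-Instruct", revision="kto"
)
```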