---
license: apache-2.0
base_model: rishiraj/CatPPT-base
language:
- en
tags:
- merge
---
# LongCAT - **Elevating Performance with Interwoven Depth UP Scaling!**
Introducing "LongCAT" - the purrfect alternative to that other 10.7B Frankenmerger in town! Our long feline friend is a passthrough merge of rishiraj/CatPPT-base with itself, built with a new process called Interwoven Depth Up-Scaling, resulting in the longest cat!
We developed the Interwoven Depth Up-Scaling technique. Built on the Mistral architecture, LongCAT incorporates the innovative Interwoven Depth Up-Scaling. We then interwove Cat 7B weights into the upscaled layers, and finally, did absolutely no extended pre-training.
## The Sauce
All joking aside, this is an attempt to merge Mistral-7B models together more coherently than the Undi95/"Depth UP Scaling" technique that is typically used. The usual approach is to lay out the front 75% of one model and then append the back 75% of the second model: i.e. [0, 24] + [8, 32] for a 7B merger. When laid out flat, this breaks down as [0, 8]+[8, 24]+[8, 24]+[24, 32], with the 16-layer block [8, 24] appearing twice in a row.
This typically works better than laying the entirety of one model out flat, ostensibly because the duplicated layers stay close to their original location. Taking this to its logical conclusion, we could theoretically lay out each duplicated layer directly next to its original, maximizing locality.
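To make the locality argument concrete, here is a small sketch that compares the two layer orderings. The function names and the "distance between copies" measure are my own illustration, not part of any merge tool; they only count positions in the flattened layer list.

```python
def stacked_layout(n_layers=32, dup_start=8, dup_end=24):
    """Usual depth up-scaling order: [0, dup_end) followed by [dup_start, n_layers)."""
    return list(range(0, dup_end)) + list(range(dup_start, n_layers))


def interwoven_layout(n_layers=32, dup_start=8, dup_end=24):
    """Interwoven order: every layer in [dup_start, dup_end) appears twice in a row."""
    order = list(range(0, dup_start))
    for i in range(dup_start, dup_end):
        order += [i, i]
    order += list(range(dup_end, n_layers))
    return order


def max_copy_distance(order):
    """Largest gap, in output positions, between two copies of the same source layer."""
    first_seen = {}
    worst = 0
    for pos, layer in enumerate(order):
        if layer in first_seen:
            worst = max(worst, pos - first_seen[layer])
        else:
            first_seen[layer] = pos
    return worst


stacked = stacked_layout()
interwoven = interwoven_layout()

# Both orderings use the same source layers and produce the same depth.
print(len(stacked), len(interwoven))
# Copies sit a whole block apart when stacked, but are adjacent when interwoven.
print(max_copy_distance(stacked), max_copy_distance(interwoven))
```

Both layouts come out 48 layers deep; the only difference is how far each duplicated layer ends up from its twin.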
Also, I picked CatPPT-base because I wanted to make a longcat joke.
```
slices:
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [0, 8]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [8, 9]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [8, 9]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [9, 10]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [9, 10]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [10, 11]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [10, 11]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [11, 12]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [11, 12]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [12, 13]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [12, 13]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [13, 14]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [13, 14]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [14, 15]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [14, 15]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [15, 16]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [15, 16]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [16, 17]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [16, 17]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [17, 18]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [17, 18]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [18, 19]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [18, 19]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [19, 20]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [19, 20]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [20, 21]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [20, 21]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [21, 22]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [21, 22]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [22, 23]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [22, 23]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [23, 24]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [23, 24]
  - sources:
      - model: rishiraj/CatPPT-base
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
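Writing 34 slices by hand is tedious, so here is a rough sketch of how a config like the one above could be generated. `build_slices` and `to_yaml` are hypothetical helpers written for this README, not part of any merge tool, and the YAML is emitted with plain string formatting (standard library only) rather than a YAML library.

```python
MODEL = "rishiraj/CatPPT-base"


def build_slices(model=MODEL, n_layers=32, dup_start=8, dup_end=24):
    """Return one slice per entry; layers in [dup_start, dup_end) are listed twice in a row."""
    ranges = [(0, dup_start)]
    for i in range(dup_start, dup_end):
        ranges += [(i, i + 1), (i, i + 1)]
    ranges.append((dup_end, n_layers))
    return [{"sources": [{"model": model, "layer_range": [a, b]}]} for a, b in ranges]


def to_yaml(slices, merge_method="passthrough", dtype="bfloat16"):
    """Render the slices as YAML text (assumes exactly one source per slice)."""
    lines = ["slices:"]
    for s in slices:
        src = s["sources"][0]
        start, end = src["layer_range"]
        lines.append("  - sources:")
        lines.append(f"      - model: {src['model']}")
        lines.append(f"        layer_range: [{start}, {end}]")
    lines.append(f"merge_method: {merge_method}")
    lines.append(f"dtype: {dtype}")
    return "\n".join(lines)


slices = build_slices()
# Depth of the merged model: sum of the widths of every layer_range.
total_layers = sum(s["sources"][0]["layer_range"][1] - s["sources"][0]["layer_range"][0] for s in slices)
config_text = to_yaml(slices)

print(len(slices), total_layers)
print(config_text)
```

With the defaults this yields 34 slices covering 48 layers in total: 8 untouched layers at the front, 16 layers doubled in place, and 8 untouched layers at the back.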
Don't try to merge this with other 10.7Bs - the layer mismatch will probably create a mangled model.