Pruning

#24
by dhivakarsa - opened

Does anyone know how to prune this Llama 3.1 model?

What's your use case to prune/modify the model?

@tanliboy My use case is that I need to remove the weights for some non-English languages, or otherwise remove some weights to reduce the model size.

remove the weights for some non-English languages

Language model architectures do not have distinct, separable components for specific languages, so removing weights associated with non-English languages is not a viable approach.

remove some weights to reduce the model size

If the goal is to reduce the model size, quantization is often a more effective and simpler method. For dense models, another approach could be pruning certain components or applying techniques like L1 Unstructured Pruning to remove low-weight neurons. However, these methods are more complex and typically require retraining to maintain model performance.
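For reference, a minimal sketch of L1 unstructured pruning on the linear layers using PyTorch's built-in utilities; the checkpoint name and the 30% sparsity level are placeholders, not recommendations:

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; substitute the Llama 3.1 repo you are actually using.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)

# L1 unstructured pruning: in each linear layer, mask the 30% of weights
# with the smallest absolute value to zero.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
```

Expect accuracy to degrade at higher sparsity levels, which is why some fine-tuning after pruning is usually needed to recover performance.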

@tanliboy "Thanks for your responce" i also tried Quantization but i need to know how to prune the model to reduce weights (i have a purpose for do this pruning) thats why am asking about pruning. Thanks in advance.

You may be interested in https://pytorch.org/tutorials/intermediate/pruning_tutorial.html for a general intro. In short, there are multiple different pruning strategies, including structured pruning, global pruning, etc., depending on your concrete use case.
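As a rough sketch of the global variant covered in that tutorial, assuming `model` is the Llama model loaded as in the earlier example and 20% is an arbitrary sparsity target:

```python
import torch
import torch.nn.utils.prune as prune

# Gather (module, parameter_name) pairs for every linear layer.
parameters_to_prune = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, torch.nn.Linear)
]

# Global unstructured pruning: the 20% smallest-magnitude weights across all
# collected layers are masked, so per-layer sparsity is allowed to vary.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)
```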

@tanliboy I tried the link you mentioned and pruned the model, but the model size was not reduced; it increased. For example, the original model size is 15GB, and after pruning, the pruned model is more than 30GB.

You will need to debug by printing out the weights. For example, did you prune after 8-bit quantization and then save the result in bfloat16/float32?
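For what it is worth, a common reason for the size roughly doubling is that PyTorch's pruning keeps the original weights around: while the pruning reparametrization is attached, each pruned layer holds both a `weight_orig` parameter and a `weight_mask` buffer, and both get serialized. A rough debugging sketch, assuming `model` is the pruned model from the earlier examples and `pruned-model` is a placeholder output directory:

```python
import torch
import torch.nn.utils.prune as prune

# Inspect a pruned layer: the extra `weight_mask` buffer (plus `weight_orig`)
# is what inflates the saved checkpoint while pruning is still attached.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and prune.is_pruned(module):
        buffer_names = [n for n, _ in module.named_buffers()]
        print(name, buffer_names, module.weight.dtype)
        break

# Bake the mask into the weights and drop the extra tensors before saving.
for module in model.modules():
    if isinstance(module, torch.nn.Linear) and prune.is_pruned(module):
        prune.remove(module, "weight")

# Save in the original dtype (e.g. bfloat16); casting up to float32 here
# would double the file size again.
model.save_pretrained("pruned-model", safe_serialization=True)
```

Even after `prune.remove`, unstructured zeros are still stored densely, so the file will not shrink below the original 15GB; an actual size reduction requires structured pruning (physically removing heads, neurons, or layers) or a sparse storage format, usually followed by fine-tuning.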
