Pruning

#24
by dhivakarsa - opened

Does anyone know how to prune this Llama 3.1 model?

What's your use case to prune/modify the model?

@tanliboy My use case is that I need to remove the weights for some non-English languages, or otherwise remove some weights to reduce the model size.

remove the weights for some non-English languages

Language model architectures do not have distinct, separable components for specific languages, so removing weights associated with non-English languages is not a viable approach.

remove some weights to reduce the model size

If the goal is to reduce the model size, quantization is often a more effective and simpler method. For dense models, another approach could be pruning certain components or applying techniques like L1 Unstructured Pruning to remove low-weight neurons. However, these methods are more complex and typically require retraining to maintain model performance.
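For reference, a minimal sketch of L1 unstructured pruning on the linear layers using PyTorch's built-in utilities; the checkpoint name and the 30% sparsity level are placeholders, not recommendations:

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; substitute the Llama 3.1 repo you are actually using.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)

# L1 unstructured pruning: in each linear layer, mask the 30% of weights
# with the smallest absolute value to zero.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
```

Expect accuracy to degrade at higher sparsity levels, which is why some fine-tuning after pruning is usually needed to recover performance.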

@tanliboy "Thanks for your responce" i also tried Quantization but i need to know how to prune the model to reduce weights (i have a purpose for do this pruning) thats why am asking about pruning. Thanks in advance.

You may be interested in https://pytorch.org/tutorials/intermediate/pruning_tutorial.html for a general intro. In short, there are multiple different pruning strategies, including structured pruning, global pruning, etc., depending on your concrete use case.
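As a rough sketch of the global variant covered in that tutorial, assuming `model` is the Llama model loaded as in the earlier example and 20% is an arbitrary sparsity target:

```python
import torch
import torch.nn.utils.prune as prune

# Gather (module, parameter_name) pairs for every linear layer.
parameters_to_prune = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, torch.nn.Linear)
]

# Global unstructured pruning: the 20% smallest-magnitude weights across all
# collected layers are masked, so per-layer sparsity is allowed to vary.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)
```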

@tanliboy I tried the link you mentioned and pruned the model, but the model size was not reduced; it increased. For example, the original model size is 15GB, and after pruning, the pruned model is more than 30GB.

You will need to debug by printing out the weights. For example, did you prune after 8-bit quantization and then save the result in bfloat16/float32?
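For what it is worth, a common reason for the size roughly doubling is that PyTorch's pruning keeps the original weights around: while the pruning reparametrization is attached, each pruned layer holds both a `weight_orig` parameter and a `weight_mask` buffer, and both get serialized. A rough debugging sketch, assuming `model` is the pruned model from the earlier examples and `pruned-model` is a placeholder output directory:

```python
import torch
import torch.nn.utils.prune as prune

# Inspect a pruned layer: the extra `weight_mask` buffer (plus `weight_orig`)
# is what inflates the saved checkpoint while pruning is still attached.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and prune.is_pruned(module):
        buffer_names = [n for n, _ in module.named_buffers()]
        print(name, buffer_names, module.weight.dtype)
        break

# Bake the mask into the weights and drop the extra tensors before saving.
for module in model.modules():
    if isinstance(module, torch.nn.Linear) and prune.is_pruned(module):
        prune.remove(module, "weight")

# Save in the original dtype (e.g. bfloat16); casting up to float32 here
# would double the file size again.
model.save_pretrained("pruned-model", safe_serialization=True)
```

Even after `prune.remove`, unstructured zeros are still stored densely, so the file will not shrink below the original 15GB; an actual size reduction requires structured pruning (physically removing heads, neurons, or layers) or a sparse storage format, usually followed by fine-tuning.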
