
yujiepan/opt-350m-w8a8-unstructured90

This model is a w8a8-quantized, unstructurally sparsified version of facebook/opt-350m, produced and exported with OpenVINO.

This model is not tuned for accuracy.

  • Quantization: 8-bit symmetric for weights & activations
  • Unstructured sparsity in transformer block linear layers: 90%
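
As a rough illustration of what the two settings above mean numerically, here is a minimal pure-Python sketch. It is not the export code used for this model. It assumes magnitude-based pruning and a single per-tensor scale, neither of which is stated in this card, and the helper names (`prune_unstructured`, `quantize_symmetric_int8`) are hypothetical.

```python
import random


def prune_unstructured(values, sparsity):
    """Zero out the smallest-magnitude entries, wherever they sit.

    Assumption: magnitude pruning. The card only states the sparsity level.
    """
    n_prune = int(len(values) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    pruned = list(values)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_symmetric_int8(values):
    """Symmetric 8-bit quantization: zero-point is 0, range is [-127, 127].

    Assumption: one scale per tensor, chosen from the largest magnitude.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in quantized]


# Toy "weight tensor" standing in for one linear layer (random, not real weights).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]

sparse_weights = prune_unstructured(weights, 0.9)
q_weights, scale = quantize_symmetric_int8(sparse_weights)
restored = dequantize(q_weights, scale)

# Fraction of exactly-zero integer codes after pruning and quantization.
sparsity = sum(1 for q in q_weights if q == 0) / len(q_weights)
# Worst-case rounding error introduced by quantization.
max_error = max(abs(a - b) for a, b in zip(sparse_weights, restored))
```

Because the scheme is symmetric, a pruned weight of exactly 0.0 maps to integer code 0, so quantization cannot reduce the sparsity introduced by pruning in this sketch.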

Code for export: https://gist.github.com/yujiepan-work/1e6dd9f9c2aac0e9ecaf2ed4d82d1158
