Quantization made by Richard Erkhov.

Phi-3.5-mini-ITA - GGUF

Model creator: https://huggingface.co/anakin87/
Original model: https://huggingface.co/anakin87/Phi-3.5-mini-ITA/

Name	Quant method	Size
Phi-3.5-mini-ITA.Q2_K.gguf	Q2_K	1.32GB
Phi-3.5-mini-ITA.IQ3_XS.gguf	IQ3_XS	1.51GB
Phi-3.5-mini-ITA.IQ3_S.gguf	IQ3_S	1.57GB
Phi-3.5-mini-ITA.Q3_K_S.gguf	Q3_K_S	1.57GB
Phi-3.5-mini-ITA.IQ3_M.gguf	IQ3_M	1.73GB
Phi-3.5-mini-ITA.Q3_K.gguf	Q3_K	1.82GB
Phi-3.5-mini-ITA.Q3_K_M.gguf	Q3_K_M	1.82GB
Phi-3.5-mini-ITA.Q3_K_L.gguf	Q3_K_L	1.94GB
Phi-3.5-mini-ITA.IQ4_XS.gguf	IQ4_XS	1.93GB
Phi-3.5-mini-ITA.Q4_0.gguf	Q4_0	2.03GB
Phi-3.5-mini-ITA.IQ4_NL.gguf	IQ4_NL	2.04GB
Phi-3.5-mini-ITA.Q4_K_S.gguf	Q4_K_S	2.04GB
Phi-3.5-mini-ITA.Q4_K.gguf	Q4_K	2.23GB
Phi-3.5-mini-ITA.Q4_K_M.gguf	Q4_K_M	2.23GB
Phi-3.5-mini-ITA.Q4_1.gguf	Q4_1	2.24GB
Phi-3.5-mini-ITA.Q5_0.gguf	Q5_0	2.46GB
Phi-3.5-mini-ITA.Q5_K_S.gguf	Q5_K_S	2.46GB
Phi-3.5-mini-ITA.Q5_K.gguf	Q5_K	2.62GB
Phi-3.5-mini-ITA.Q5_K_M.gguf	Q5_K_M	2.62GB
Phi-3.5-mini-ITA.Q5_1.gguf	Q5_1	2.68GB
Phi-3.5-mini-ITA.Q6_K.gguf	Q6_K	2.92GB
Phi-3.5-mini-ITA.Q8_0.gguf	Q8_0	3.78GB
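
Choosing a file from the table usually comes down to how much memory is available. The sketch below picks the largest listed quant that fits a given budget and builds the matching file name. The sizes are copied from a subset of the table; the budget value and both helper names are hypothetical, and the file size is only a lower bound on memory use, since the runtime also needs room for the context cache.

```python
# Sizes in GB, copied from a subset of the table above.
QUANT_SIZES_GB = {
    "Q2_K": 1.32,
    "Q3_K_M": 1.82,
    "Q4_K_M": 2.23,
    "Q5_K_M": 2.62,
    "Q6_K": 2.92,
    "Q8_0": 3.78,
}


def largest_quant_within(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the name of the largest quant whose file fits the budget, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def gguf_filename(quant):
    """Build the file name using the naming pattern shown in the table."""
    return f"Phi-3.5-mini-ITA.{quant}.gguf"


# Hypothetical budget of 2.5 GB for the model file alone.
choice = largest_quant_within(2.5)
print(choice, gguf_filename(choice))
```

Larger files generally keep more of the original precision, so picking the largest file that fits is a reasonable default, not a guarantee of the best quality for every task.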

Original model description:

license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum

Phi-3.5-mini-ITA

Fine-tuned version of Microsoft/Phi-3.5-mini-instruct optimized for better performance in Italian.

🔹 Small yet powerful model with 3.82 billion parameters
🔹 Supports 128k context length

🏋️‍♂️ Do you want to understand how the model was trained? Check out the 📖 full walkthrough article and the accompanying 💻 notebook.

🏆 Evaluation

Model	Parameters	Average	MMLU_IT	ARC_IT	HELLASWAG_IT
anakin87/Phi-3.5-mini-ITA	3.82 B	57.67	59.93	51.5	61.57
meta-llama/Meta-Llama-3.1-8B-Instruct	8.03 B	56.97	58.43	48.42	64.07
microsoft/Phi-3.5-mini-instruct	3.82 B	56.82	60.03	49.19	61.25
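
The Average column appears to be the unweighted mean of the three benchmark scores. That is an inference from the numbers, not something the source states, and the short check below reproduces the column under that assumption.

```python
# Scores copied from the table: (MMLU_IT, ARC_IT, HELLASWAG_IT).
scores = {
    "anakin87/Phi-3.5-mini-ITA": (59.93, 51.5, 61.57),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": (58.43, 48.42, 64.07),
    "microsoft/Phi-3.5-mini-instruct": (60.03, 49.19, 61.25),
}


def average(values):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


averages = {model: average(values) for model, values in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```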

For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.

🎮 Model in action

Demo

💬🇮🇹 Chat with the model on Hugging Face Spaces

Text generation with Transformers

The model is small, so it runs smoothly on Colab. It is also fine to load the model using quantization.

With transformers==4.44.2, trust_remote_code=True is needed to incorporate a minor bug fix in Phi3ForCausalLM. Read this discussion for more details.

⚡ The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the attn_implementation parameter in the code snippet below.

# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

# Load the model in bfloat16 and let accelerate place it on the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]

# Pass the chat messages; the pipeline applies the model's chat template
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)

# With chat input, generated_text is the full conversation; the last message is the reply
print(outputs[0]["generated_text"][-1]["content"])

Example output:

Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.

Build AI applications

You can use the model to create a variety of AI applications.

I recommend using the 🏗️ Haystack LLM framework for orchestration. (spoiler: I work on it and it is open-source 😄)

This model is compatible with HuggingFaceLocalGenerator and HuggingFaceLocalChatGenerator components. You can also deploy the model with a TGI container and then use it with HuggingFaceAPIGenerator and the related Chat Generator.

Some examples you can take inspiration from:

🔧 Training details

This model was fine-tuned using HF TRL. It underwent 2 epochs of instruction fine-tuning on the FineTome-100k and Capybara-Claude-15k-ita datasets. 🙏 Thanks to the authors for providing these datasets.

I adopted a relatively new technique for parameter-efficient learning: Spectrum. The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
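
The selection idea can be sketched in a few lines. This is a simplified illustration, not the Spectrum implementation: the SNR values are made up, `top_fraction` and both function names are hypothetical, and computing the SNR from the weight matrices is left out entirely.

```python
def select_trainable(snr_by_module, top_fraction=0.3):
    """Return the names of the modules with the highest SNR (at least one)."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:keep])


def freeze_except(model, trainable_modules):
    """Enable gradients only for parameters that belong to a selected module.

    `model` is anything exposing `named_parameters()`, such as a PyTorch module.
    """
    for name, param in model.named_parameters():
        param.requires_grad = any(
            name.startswith(module + ".") for module in trainable_modules
        )


# Hypothetical SNR values, for illustration only.
snr = {
    "model.layers.0.mlp.down_proj": 4.1,
    "model.layers.1.mlp.down_proj": 0.7,
    "model.layers.2.mlp.down_proj": 2.9,
    "model.layers.3.mlp.down_proj": 1.2,
}

selected = select_trainable(snr, top_fraction=0.5)
print(sorted(selected))
```

With a loaded model, `freeze_except(model, selected)` would leave only the selected modules trainable, so the optimizer updates a fraction of the weights while the frozen ones need neither gradients nor optimizer state.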

Training required about 14 hours on a single A6000 GPU.

For complete training details, check out the 📖 full walkthrough article and the accompanying 💻 notebook.