Question Answering Generative Model

This model differs from DeepMount00/Mamba-QA-ITA in scale and performance. It has approximately 790 million parameters, up from the 370 million of DeepMount00/Mamba-QA-ITA, and it gives more accurate and precise answers.

Overview

The model is a generative question-answering system built on the Mamba model with 790 million parameters. It can answer complex questions and recognize when the answer is not present in the provided context.

Model Architecture

The model is based on the Mamba architecture. It is designed to respond accurately even when the context is limited or the question is intricate.

Unique Features

  • Advanced Parameterization: With approximately 790 million parameters, the model balances efficiency and capability.
  • Contextual Understanding: The model can recognize when the answer to a question is not available in the provided context.

Capabilities

  • Complex Question Handling: The model can understand and respond to a wide range of complex questions.
  • Parameter Efficiency: Despite having fewer parameters than larger models, it maintains high efficiency and accuracy.

How to Use

To use this model for question answering, run the example below. It requires the torch, transformers, and mamba_ssm packages and a CUDA-capable GPU:

import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

model_name = "DeepMount00/Mamba-QA-ITA-790m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MambaLMHeadModel.from_pretrained(model_name, device="cuda", dtype=torch.float16)

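def run_qa_mamba(model, question, context):
    # Build the prompt in the "context, Q:, A:" layout used by this model card.
    prompt = f"{context}\n\nQ: {question}\nA:"
    input_ids = torch.LongTensor([tokenizer.encode(prompt)]).cuda()
    # max_length caps the total sequence length (prompt plus generated tokens).
    output = model.generate(input_ids=input_ids, max_length=2048, eos_token_id=tokenizer.eos_token_id)
    # Drop the echoed prompt and keep only the first paragraph of the completion.
    answer = tokenizer.batch_decode(output)[0].replace(prompt, "").split("\n\n")[0].strip()
    # Remove the end-of-text marker and any whitespace left around it.
    answer = answer.replace("<|endoftext|>", "").strip()
    return answer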

question = """Quante torri ha bologna? """
context = """La torre degli Asinelli è una delle cosiddette due torri di Bologna, simbolo della città, situate in piazza di porta Ravegnana, all'incrocio tra le antiche strade San Donato (ora via Zamboni), San Vitale, Maggiore e Castiglione. Eretta, secondo la tradizione, fra il 1109 e il 1119 dal nobile Gherardo Asinelli, la torre è alta 97,20 metri, pende verso ovest per 2,23 metri e presenta all'interno una scalinata composta da 498 gradini. Ancora non si può dire con certezza quando e da chi fu costruita la torre degli Asinelli. Si presume che la torre debba il proprio nome a Gherardo Asinelli, il nobile cavaliere di fazione ghibellina al quale se ne attribuisce la costruzione, iniziata secondo una consolidata tradizione l'11 ottobre 1109 e terminata dieci anni dopo, nel 1119."""

answer = run_qa_mamba(model, question, context)
print(answer)
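
The prompt layout and the post-processing inside run_qa_mamba can also be written as two small, model-independent helpers, which makes them easy to check without a GPU. This is a minimal sketch: the names build_prompt and extract_answer are hypothetical and not part of the model or of mamba_ssm; the prompt template and the "<|endoftext|>" marker are copied from the example above.

```python
# Hypothetical helpers mirroring the string handling in run_qa_mamba.
PROMPT_TEMPLATE = "{context}\n\nQ: {question}\nA:"
EOS_MARKER = "<|endoftext|>"


def build_prompt(context: str, question: str) -> str:
    """Return the prompt string in the 'context, Q:, A:' layout."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


def extract_answer(decoded: str, prompt: str) -> str:
    """Strip the echoed prompt, keep the first paragraph, drop the EOS marker."""
    if decoded.startswith(prompt):
        completion = decoded[len(prompt):]
    else:
        # Fall back to a plain replace, as the original example does.
        completion = decoded.replace(prompt, "")
    first_paragraph = completion.split("\n\n")[0]
    return first_paragraph.replace(EOS_MARKER, "").strip()
```

With these helpers, the decoded model output can be passed to extract_answer together with the same prompt that was tokenized, so the template is defined in only one place.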

Developer

Michele Montebovi

Model size (Safetensors): 793M params. Tensor types: F32, BF16.