It can run on two 4090s or a single RTX 6000 Ada, but the output quality is poor.

#7
by znsoft - opened

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom modelling code
    device_map="auto",       # shard layers across the available GPUs
    model_kwargs={"load_in_8bit": True},  # quantise to 8-bit via bitsandbytes
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

But it doesn't behave as well as the authors describe.

What's wrong with the code? It doesn't act like a chatbot, but like a parrot.

Technology Innovation Institute org

Falcon-40B needs at least around 90GB of VRAM to run in bfloat16 (roughly 2 bytes per parameter for the 40B weights, i.e. about 80GB, plus activation and KV-cache overhead), so unfortunately neither of the configurations mentioned in the title meets this requirement. The community has, however, quantised a version, https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ, that might fit on the hardware you have available.
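For reference, a pre-quantised GPTQ checkpoint like that is typically loaded with the auto-gptq library rather than the plain transformers pipeline. The sketch below follows the usage pattern common to TheBloke's GPTQ model cards; the exact arguments are assumptions not verified in this thread, so check the repo's README before relying on it.

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/falcon-40b-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# from_quantized loads the already-quantised 4-bit weights, which should be
# a far smaller footprint than the ~90GB bfloat16 model; arguments follow
# the model card's example and are assumptions, not validated here.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    use_safetensors=True,
    trust_remote_code=True,  # Falcon used custom modelling code at the time
    device="cuda:0",
)

prompt = "Daniel: Hello, Girafatron!\nGirafatron:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=10)
print(tokenizer.decode(output[0]))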

FalconLLM changed discussion status to closed

It can be run on two 3090s or 4090s in 8-bit mode. I have literally run it successfully, but the results are not good.
Could you offer an online demo?

znsoft changed discussion title from Run with two 4090 or a single 6000 ADA to It can run on two 4090s or a single RTX 6000 Ada, but the output quality is poor.

Sorry, I did not understand your original post correctly. We have not validated the model in anything but bfloat16, and you may be observing some degradation in model quality from quantising the model weights to 8 bits, as your code does.
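For anyone who wants to rule quantisation in or out as the cause, a rough way to test is to keep the weights in the validated bfloat16 precision and let accelerate offload whatever does not fit in VRAM to CPU RAM. This is only a sketch under that assumption; the "offload" scratch directory is illustrative, and generation will be very slow on the offloaded layers.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" fills the GPUs first and spills the remaining layers to
# CPU RAM / disk, so every weight stays in bfloat16 instead of 8-bit.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    offload_folder="offload",  # hypothetical scratch directory for offloaded weights
)

inputs = tokenizer("Daniel: Hello, Girafatron!\nGirafatron:", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0]))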
