Works perfectly in CPU mode with oobabooga
Best LLM to run on 64 GB of RAM (The GGML version that TheBloke released, obviously).
Here is an insane example:
That's as opposed to ChatGPT-3.5 output:
I have a Windows 11 computer with an AMD64 CPU that has 8 CPU cores, and 64 gigabytes of RAM.
I use oobabooga/text-generation-webui windows installer (release tag V1.5).
In webui.py I change oobabooga flags:
CMD_FLAGS = '--cpu --listen --chat --model llama-2-70b-chat.ggmlv3.q6_K --n_gqa 8 --threads 8 --n_ctx 4096 --rms_norm_eps 0.00001'
In the UI parameters tab I change: "Truncate the prompt up to this length" to 4096 and "max_new_tokens" to 4096.
In the "Chat Settings" -> "Instruction template" I change the template to "Airoboros-v1.2".
Then I go to the "Text generation tab" and start typing away!
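For anyone wondering why this fits in 64 GB at all, here is a rough size estimate. It is a sketch under assumptions that are not in the post: roughly 69 billion parameters for Llama-2-70B and about 6.56 bits per weight for the q6_K quantization. The function name is made up for illustration.

```python
# Rough RAM estimate for a quantized model held entirely in system memory.
# Assumptions (not from the post): ~69e9 parameters for Llama-2-70B and
# ~6.5625 bits per weight for the q6_K quantization.

def quantized_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

weights_gb = quantized_weights_gb(n_params=69e9, bits_per_weight=6.5625)
total_ram_gb = 64  # RAM in the machine described above

print(f"weights: ~{weights_gb:.1f} GB")
print(f"left for OS, context cache and everything else: ~{total_ram_gb - weights_gb:.1f} GB")
```

Under those assumptions the weights alone take most of the 64 GB, so there is not much room left for a larger quantization or for other applications.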
Here's just another example (the correct answer is that it's impossible to achieve EXACTLY twice the heating, Airoboros got SUPER close to the answer).
It's obvious to anyone with basic knowledge of electronics that Airoboros is SO MUCH CLOSER than ChatGPT-3.5. These examples are NOT cherrypicked.
Question:
Here's an electronics riddle:
The assumptions:
- A heating element that's rated at 3 ohms resistance to input current, 0-1000 watts of consumption power, 0-1000 volts safe input electric potential.
- This heating element resistance stays constant regardless of temperature.
- A AA battery is rated at 1.5v output electric potential and 0-700 mA output current.
The basic operation of the heating element is to use 1 AA battery. That means that the battery outputs 1.5 * (1/3) = 0.5 amps, and the heating element radiates 1.5 * 0.5 = 0.75 watts.
- Batteries are used to power the heating element, but the batteries are physically far away from the heating element itself (for safety reasons).
- Each battery in the setup must be within a safe range of current output.
Note that based on the assumptions, the battery output current (0.5 amps) is within the safe range (less than 0.7 amps).
- The battery's resistance is negligible.
- The only possible power supply is identical AA batteries.
- There's only that 1 heating element. No resistors allowed, because resistors are technically a type of heating element.
Here's the question:
How many batteries are required to produce exactly twice the heat (twice the wattage) as the configuration with 1 battery, and in what configuration to set up the batteries?
Airoboros Answer:
To double the heat output, we need a total of 0.75 * 2 = 1.5 watts from the heating element. Since each AA battery can provide up to 1.5 volts at 0-700 mA (or 1.5 * 0.7 = 1.05 watts), we cannot achieve this with a single battery alone due to its limited power output.
However, since the heating element's resistance stays constant and the batteries have negligible internal resistance, we can connect multiple AA batteries in series or parallel combinations to increase their overall voltage or current capacity while maintaining the same resistance as a single battery.
Let's consider connecting two AA batteries in series:
- The combined output voltage of these two batteries would be 1.5 + 1.5 = 3 volts, which is still within the safe input electric potential range for our heating element (0-1000 volts).
- Since their internal resistances are negligible and they're connected in series, the total resistance remains unchanged at 3 ohms.
Now let's calculate the current through this setup:
Current = Voltage / Resistance
= 3 volts / 3 ohms
So with two batteries connected in series, we have a combined output of 1 amp and an input power of 3 watts (3 * 1). This is more than enough to achieve our desired heat output of 1.5 watts from the heating element.
In conclusion, we need at least two AA batteries connected in series to produce twice the heat as a single battery setup while maintaining safety requirements and assumptions made about the system.
ChatGPT-3.5 answer:
To produce exactly twice the heat (twice the wattage) as the configuration with 1 battery, we need to achieve 1.5 watts of heat output from the heating element. Currently, with one battery, we have 0.75 watts of heat output. To reach 1.5 watts, we can use a combination of batteries in series and/or parallel configurations.
Let's analyze the possibilities:
- Using batteries in series:
When batteries are connected in series, their voltages add up, and the current remains the same. Since the heating element's resistance is constant (3 ohms), the current flowing through it will be the same regardless of the battery configuration.
Using one AA battery, we have 1.5V * (1/3) ohms = 0.5A of current flowing through the heating element, which results in 0.75 watts of heat.
To get twice the wattage, we need 1.5 watts of heat. For this to happen, we must have twice the voltage across the heating element. However, since the AA battery's safe potential is up to 1.5V, using batteries in series is not feasible. The voltage would exceed the safe limit.
- Using batteries in parallel:
When batteries are connected in parallel, their voltages remain the same, but their currents add up.
Using one AA battery, we have 1.5V * (1/3) ohms = 0.5A of current flowing through the heating element, which results in 0.75 watts of heat.
To achieve twice the wattage, we need 1.5 watts of heat. For this to happen, we must double the current flowing through the heating element.
So, in parallel configuration, we need to connect batteries in parallel to get 1.5A of current flowing through the heating element.
To calculate the number of batteries required, we need to find out how much current each AA battery can provide. According to the assumptions, each AA battery can provide up to 0.7A safely. To reach 1.5A, we need:
1.5A / 0.7A ≈ 2.14 (approximately 2) AA batteries
However, since we can't use a fraction of a battery, we need to round up to 3 AA batteries in parallel to achieve 1.5 watts of heat (1.5A * 1.5V = 1.5 watts).
In summary, you would need three AA batteries connected in parallel to produce exactly twice the heat (twice the wattage) as the configuration with one battery.
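The riddle itself can be checked by brute force. This is only a sketch under the riddle's own assumptions (ideal 1.5 V cells, 0.7 A limit per cell, fixed 3 ohm element). The grid layout of "p parallel strings of s cells in series" and the names below are my own choice for illustration; it covers pure series and pure parallel as special cases.

```python
# Brute-force check of the riddle under its stated assumptions.
# Layout assumption for this sketch: `parallel` identical strings,
# each made of `series` cells, all feeding the single 3-ohm element.

V_CELL, I_MAX, R = 1.5, 0.7, 3.0
TARGET_W = 2 * (V_CELL ** 2 / R)  # twice the single-battery 0.75 W

def evaluate(series: int, parallel: int):
    """Return (element power in watts, current per cell in amps)."""
    voltage = series * V_CELL      # cells in series add their voltages
    current = voltage / R          # Ohm's law through the element
    return voltage * current, current / parallel  # strings share the current

exact_safe = []
for s in range(1, 11):
    for p in range(1, 11):
        power, per_cell = evaluate(s, p)
        if abs(power - TARGET_W) < 1e-9 and per_cell <= I_MAX:
            exact_safe.append((s, p))

print("configs giving exactly 1.5 W within the current limit:", exact_safe)
print("2 in series:", evaluate(2, 1))
print("2 in series x 2 strings in parallel:", evaluate(2, 2))
```

With ideal cells the power only depends on the number of cells in series (0.75 * s^2 watts), so the reachable values jump from 0.75 W to 3 W and nothing lands on exactly 1.5 W. The sketch also shows that two cells in series push 1 A through each cell, which is over the 0.7 A limit unless a second string is added in parallel.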
Please share some info about the speed of it. How many tokens per second can you get on your computer?
I get 0.68 tokens per second on my computer, with 4 gigabytes of RAM left free, plus a "prompt ingestion" delay before each response.
Meaning, on Nvidia GPUs prompt ingestion is optimized to run about 400x faster than inference, whereas in CPU mode (llama.cpp) I have to wait a minimum of 10 seconds before it even starts responding.
Hey, your setup looks like the mini pc that I was thinking of buying. Is AMD processor faster than Intel in inference?
I have a laptop. The AMD laptop is no faster than an Intel equivalent.
The RAM bandwidth becomes the bottleneck at any more than 16 physical CPU cores (not 8x2 threads but 16 actual cores). At that point it's diminishing returns.
I have DDR4 RAM 64GB.
Once you get to any more compute speed than what my laptop has (such as by using a graphics card or a high-end Intel CPU), you will need faster RAM.
Nvidia GPUs solve that RAM bandwidth issue by having dedicated GDDR6 VRAM with speeds of up to 800 gigabytes per second (as opposed to the 25 gigabytes per second of DDR4).
Apple M series chips also solve that problem, because the RAM sits in the same package as the chip, so the bandwidth is 400 gigabytes per second for M1 Max, and 800 gigabytes per second for M2 Ultra.
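To put numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes generation is purely memory-bound, meaning each generated token has to stream roughly all of the weights from memory once, and it assumes a model size of about 56.6 GB (my estimate for 70B at q6_K, not a figure from the post). The function name is made up for illustration.

```python
# Upper bound on generation speed when memory bandwidth is the bottleneck:
# every generated token needs roughly one full pass over the weights.
# MODEL_GB is an assumption (about 70B parameters at q6_K), not a measurement.

def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens per second."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 56.6

# Bandwidth figures are the ones quoted in this thread.
for name, bandwidth in [("DDR4 at 25 GB/s", 25),
                        ("400 GB/s unified memory", 400),
                        ("800 GB/s VRAM / unified memory", 800)]:
    ceiling = max_tokens_per_second(bandwidth, MODEL_GB)
    print(f"{name}: at most ~{ceiling:.2f} tokens/s")
```

The 25 GB/s figure is roughly one DDR4-3200 channel. A dual-channel setup would about double the ceiling, which would put the measured 0.68 tokens per second in a plausible range. Treat this as an estimate, since real throughput also depends on compute and cache behavior.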
In conclusion: don't buy the CPU I have. You're better off using DDR5 with a CPU that has at least 16 physical CPU cores (32 threads) for optimal performance (no bottlenecks)
Thanks for your advice :)