Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
m-ricย 
posted an update Sep 5
Post
841
๐Ÿš€ย ๐—ช๐—ต๐—ฒ๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐—น๐—ฎ๐˜„๐˜€ ๐—ฎ๐—ฟ๐—ฒ ๐˜๐—ฎ๐—ธ๐—ถ๐—ป๐—ด ๐˜‚๐˜€ : ๐—ฏ๐˜† ๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿด, ๐—”๐—œ ๐—–๐—น๐˜‚๐˜€๐˜๐—ฒ๐—ฟ๐˜€ ๐˜„๐—ถ๐—น๐—น ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต ๐˜๐—ต๐—ฒ ๐—ฝ๐—ผ๐˜„๐—ฒ๐—ฟ ๐—ฐ๐—ผ๐—ป๐˜€๐˜‚๐—บ๐—ฝ๐˜๐—ถ๐—ผ๐—ป ๐—ผ๐—ณ ๐—ฒ๐—ป๐˜๐—ถ๐—ฟ๐—ฒ ๐—ฐ๐—ผ๐˜‚๐—ป๐˜๐—ฟ๐—ถ๐—ฒ๐˜€

Reminder : โ€œScaling lawsโ€ are empirical laws saying that if you keep multiplying your compute by x10, your models will mechanically keep getting better and better.

To give you an idea, GPT-3 can barely write sentences, and GPT-4, which only used x15 its amount of compute, already sounds much smarter than some of my friends (although it's not really - or at least I haven't tested them side-by side). So you can imagine how far a x100 over GPT-4 can take us.

๐ŸŽ๏ธย As a result, tech titans are racing to build the biggest models, and for this they need gigantic training clusters.

The picture below shows the growth of training compute: it is increasing at a steady exponential rate of a x10 every 2 years. So letโ€™s take this progress a bit further:
- 2022: starting training for GPT-4 : 10^26 FLOPs, cost of $100M
- 2024: today, companies start training on much larger clusters like the โ€œsuper AI clusterโ€ of Elon Muskโ€™s xAI, 10^27 FLOPS, $1B
- 2026 : by then clusters will require 1GW, i.e. around the full power generated by a nuclear reactor
- 2028: we reach cluster prices in the 100 billion dollars, using 10GW, more than the most powerful power stations currently in use in the US. This last size seems crazy, but Microsoft and OpenAI already are planning one.

Will AI clusters effectively reach these crazy sizes where the consume as much as entire countries?
โžก๏ธย Three key ingredients of training might be a roadblock to scaling up :
๐Ÿ’ธย Money: but itโ€™s very unlikely, given the potential market size for AGI, that investors lose interest.
โšก๏ธ Energy supply at a specific location
๐Ÿ“šย Training data: weโ€™re already using 15 trillion tokens for Llama-3.1 when Internet has something like 60 trillion.

๐Ÿค”ย Iโ€™d be curious to hear your thoughts: do you think weโ€™ll race all the way there?

Sounds interesting but I think there will be a big breakthrough, a new "architecture/methodology/factor/rethinking" for developing large models. That's what I think, I don't know what it is yet, haha.

I think in near future we won't be able to use ai locally on our pc cause the models will be so big and use lots of energy. Or some one will find a way to make powerfully small models that use less data but are 10ร— better just like loras u only need about (correct me if I'm wrong) 15-20 Images to make a model. If we could get small models checkpoints like that that can build images based off of a few images just as good as flux1 this would be a game changer!! Ik someone is mart enough to turn loras Into checkpoints or make something called minitensors:> like basic lyrics u would train the model on tokens not images like say if u type bird it already knows what a bird is and looks like so when u put images to train the style, if u get what I'm saying:> so u would only need 1 image I ๐Ÿค”

I think there will be a big breakthrough as well, but I'd be surprised if it happens soon. If it does, I'd be happy. While the architectures of LLMs continue to advance I don't see any evidence that significant progress is being made and I personally think the architectures are too primitive and inherently self-limiting. I am also a believer that bigger does not necessarily mean better. I think we've reached the limits or are near the point of reaching the limits of where size dictates how powerful the LLM is.

Therefore, I think, given the current architectural limitations, the external limits, namely those dictated by power availability, and the many resources/costs of building better LLMs, will slow AI development until a radical change comes along.

We've managed to survive without them and now that we have them, they are a great step forward and we'll continue using and improving what we have. There are many improvements that can be made around the LLM using NLP to improve what we expect from LLMs and that's where the focus will turn for the time being, such as xLLM. Better architectures are going to have to take into account the difference in statistical models of representations of the world and the way humans communicate through speech and writing.