Tiezhen WANG

xianbao

AI & ML interests

This is my personal account

Articles

Organizations

xianbao's activity

posted an update 2 months ago
view post
Post
1596
With the open-weight release of CogVideoX-5B from THUDM, i.e. GLM team, the Video Generation Model (how about calling it VGM) field has officially became the next booming "LLM"

What does the landscape look like? What are other video generation models? This collection below is all your need.

xianbao/video-generation-models-66c350163c74f60f5c412af6

The above video is generated by @a-r-r-o-w with CogVideoX-5B, taken from a nice lookout for the field!
reacted to not-lain's post with πŸ”₯ 5 months ago
view post
Post
1913
I will be delivering an introductory coding session this Sunday 7Pm gmt+1 time about huggingface, if you are new to HF and don't know where to begin, you are welcome to join us πŸ€—
πŸ“ŒPlace: huggingface discord server
πŸ”—Link : https://discord.gg/hugging-face-879548962464493619?event=1245406127668203541
  • 2 replies
Β·
reacted to clem's post with πŸ‘ 6 months ago
view post
Post
1543
I would pick @ylecun over @elonmuskceo every single day of the week.

Despite getting much less $$, recognition & visibility than entrepreneurs, the scientists who publish their groundbreaking research openly are the cornerstone of technological progress & massively contribute to making the world a better place!
  • 1 reply
Β·
posted an update 6 months ago
view post
Post
1689
Why Apache 2.0 Matters for LLMs πŸ€”

@01AI_Yi recently switched from a permissive & commercially friendly license, to Apache 2.0. And the community loved it! πŸš€

@JustinLin610 also had a poll on model license and the majority votes for Apache 2.0.

Why it is a Big Deal? ⬇️

πŸ“š Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

πŸ‘©β€πŸ’» Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for non-native developers to understand the implications too.

πŸ”— Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like model merging with models of different licensing requirements.

🚫 No Permission Needed: Custom licenses often require explicit permission and additional documentation work of filling forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot interesting discussions from
@JustinLin610 's poll: https://x.com/JustinLin610/status/1793559737482764375 which inspired this thread.

Any other thoughts? Let me know ^^
  • 1 reply
Β·
posted an update 6 months ago
view post
Post
1206
DeepSeekV2 is a big deal. Not only because its significant improvements to both key components of Transformer: the Attention layer and FFN layer.

It has also completed disrupted the Chines LLM market and forcing the competitors to drop the price to 1% of the original price.

---

There are two key components in Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.

DeepSeek V2 introduces optimizations to both:

Attention layer normally uses KV Cache to reduce repetitive compute, but it consumes significant GPU RAM, limiting concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which stores only a small latent representation, resulting in substantial RAM savings.

DeepSeek V2 utilizes 162 experts instead of the usual 8 as in Mixtral. This approach segments experts into finer granularity for higher specialization and more accurate knowledge acquisition. Activating only a small subset of experts for each token, leads to efficient processing.

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and QWen to follow suit, lowering their prices to 1% of their original offerings. Now, users can access these APIs at 1/35th the cost of ChatGPT-4o.
reacted to JustinLin610's post with πŸš€πŸ”₯ 7 months ago
view post
Post
2530
Finally, Qwen1.5-110B is out! With weights and demo!

Blog: https://qwenlm.github.io/blog/qwen1.5-110b/
Demo: Qwen/Qwen1.5-110B-Chat-demo
Base: Qwen/Qwen1.5-110B
Chat: Qwen/Qwen1.5-110B-Chat

This model has some specific features:
* GQA
* 32K token context length
* Multilingual support

We feel good about its performance on benchmarks, including those for base models and chat models, but we still need more of your testing and feedback to help us know its capabilities and limitations!

Additionally, the base model has not learned chatml tokens. Yeah if you use chatml format, you need to be careful about it!

Enjoy and stay tuned for Qwen2!



posted an update 7 months ago
reacted to abhishek's post with πŸš€πŸ”₯πŸ‘€ 7 months ago
view post
Post
3463
With AutoTrain, you can already finetune the latest llama3 models without writing a single line of code. Here's an example finetune of llama3 8b model: abhishek/autotrain-llama3-no-robots
  • 2 replies
Β·
reacted to WizardLM's post with πŸš€ 7 months ago
view post
Post
37538
πŸ”₯πŸ”₯πŸ”₯ Introducing WizardLM-2!

πŸ“™Release Blog: https://wizardlm.github.io/WizardLM2
βœ…Model Weights: microsoft/wizardlm-661d403f71e6c8257dbd598a
🐦Twitter: https://twitter.com/WizardLM_AI/status/1779899325868589372

We introduce and opensource WizardLM-2, our next generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning and agent. New family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.

WizardLM-2 8x22B is our most advanced model, and the best opensource LLM in our internal evaluation on highly complex tasks. WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice in the same size. WizardLM-2 7B is the fastest and achieves comparable performance with existing 10x larger opensource leading models.

πŸ€— WizardLM 2 Capacities:

1. MT-Bench (Figure-1)
The WizardLM-2 8x22B even demonstrates highly competitive performance compared to the most advanced proprietary works such as GPT-4-Trubo and Glaude-3. Meanwhile, WizardLM-2 7B and WizardLM-2 70B are all the top-performing models among the other leading baselines at 7B to 70B model scales.

2. Human Preferences Evaluation (Figure 2)
Through this human preferences evaluation, WizardLM-2's capabilities are very close to the cutting-edge proprietary models such as GPT-4-1106-preview, and significantly ahead of all the other open source models.

πŸ”Method Overview:
As the natural world's human-generated data becomes increasingly exhausted through LLM training, we believe that: the data carefully created by AI and the model step-by-step supervised by AI will be the sole path towards more powerful AI.

In the past one year, we built a fully AI powered synthetic training system. (As shown in the Figure 3).
Β·
reacted to chiphuyen's post with β€οΈπŸš€ 8 months ago
posted an update 9 months ago
view post
Post
Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI

With detailed work on dataset curation, the Bunny-3B model built upon SigLIP and Phi-2 achieves performance on par with 13B models.

Model: BAAI/bunny-phi-2-siglip-lora

  • 2 replies
Β·
posted an update 9 months ago
view post
Post
There appears to be a huge misunderstanding regarding the licensing requirements for open sourced Chinese speaking speaking LLMs on
@huggingface


I initially shared this misconception too, but after conducting some research, I came up with the list below.

Veryimpressive!

replied to victor's post 9 months ago
replied to victor's post 9 months ago
reacted to JustinLin610's post with πŸ‘πŸ€― 9 months ago
view post
Post
Yesterday we just released Qwen1.5. Maybe someday I can tell more about the experience. But this is is at least a good release even if it is not yet SOTA. There is not so many SOTA by the way. This time, we actually fixed a lot of problems.

1. Context lengths are finally unified for all sizes. Previously, a lot of users kept telling us that 14B only supports 2K (Yeah even dynamic NTK does not work that well and it can only be extended to around 4-5K. Let alone those know nothing about how to use dynamic NTK).

2. If you carefully use our base language models, you will find that they understand special tokens of ChatML, which means that you can directly use LoRA to train on data with ChatML format. Why you can't do this before? This is because if the base language model does not understand the special tokens, you need to make them trained, which means that you should turn on the training of embedding. This is disgusting and it often leads to problems when you use ZeRO3.

3. We did strengthen our base language models except for 72. You should find better base language models, especially for 7 and 14. Why not 72? Nah, hard to say, but will make it better.

4. About the multilingual capabilities. Yes we finally build up our multilingual evaluation system and find out that our new base language models have nice performance in multilingual evaluation for base language models. This tells us that we should pay more attention to the post-training with multilingual data. And we did that too. This is why this time we tell you something about multilingual performance. It is for sure much much better than our models before this release.

5. Chat models are the most promising stuff. Before this release, we gave you the SFT models. But this time, we had very nice SFT+DPO models. Yeah not only annotators like them but also users like them. I am sure you developers will feel that way too.

Β·