atayloraerospace

Taylor658

AI & ML interests

Computer Vision 🔭 | Multimodal Gen AI 🤖 | AI in Healthcare 🩺 | AI in Aerospace 🚀

Organizations

Taylor658's activity

posted an update 18 days ago
view post
Post
2118
The Mystery Bot 🕵️‍♂️ saga I posted about earlier this week has been solved... 🤗

Cohere for AI has just announced its open-source Aya Expanse multilingual model. The initial release supports 23 languages, with more on the way soon. 🌌 🌍

You can also try Aya Expanse over SMS on your mobile phone, using the global WhatsApp number or one of the initial set of country-specific numbers listed below. ⬇️

๐ŸŒWhatsApp - +14313028498
Germany - (+49) 1771786365
USA โ€“ +18332746219
United Kingdom โ€” (+44) 7418373332
Canada โ€“ (+1) 2044107115
Netherlands โ€“ (+31) 97006520757
Brazil โ€” (+55) 11950110169
Portugal โ€“ (+351) 923249773
Italy โ€“ (+39) 3399950813
Poland - (+48) 459050281
  • 1 reply
posted an update 20 days ago
view post
Post
2414
Spent the weekend testing out some prompts with 🕵️‍♂️ Mystery Bot 🕵️‍♂️ on my mobile... exciting things are coming soon for the following languages:

🌍 Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese! 🌍
reacted to fdaudens's post with 🚀 about 2 months ago
view post
Post
836
🚀 Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.

Sharing these new additions with the links in case it's helpful:
- @wendys-llc 's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan 's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova 's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets

There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!

You can also add your favorite ones if you're part of the community!

Check it out: https://huggingface.co/JournalistsonHF

#AIforJournalism #HuggingFace #OpenSourceAI
reacted to Wauplin's post with 🔥 about 2 months ago
view post
Post
4477
🚀 Exciting News! 🚀

We've just released huggingface_hub v0.25.0 and it's packed with powerful new features and improvements!

✨ Top Highlights:

• 📁 Upload large folders with ease using huggingface-cli upload-large-folder. Designed for your massive models and datasets. Much recommended if you struggle to upload your Llama 70B fine-tuned model 🤡
• 🔎 Search API: new search filters (gated status, inference status) and fetch trending score.
• ⚡ InferenceClient: major improvements simplifying chat completions and handling async tasks better.

We've also introduced tons of bug fixes and quality-of-life improvements - thanks to the awesome contributions from our community! 💪

💡 Check out the release notes: Wauplin/huggingface_hub#8

Want to try it out? Install the release with:

pip install huggingface_hub==0.25.0
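For reference, a minimal sketch of the first and third highlights above (assumes huggingface_hub>=0.25.0 and that you are logged in; the repo id, folder path, and model id are placeholders, not official release code):

# Sketch only: requires write access to an existing repo of your own.
from huggingface_hub import HfApi, InferenceClient

api = HfApi()

# Resumable upload of a huge local folder (placeholder repo id and path).
api.upload_large_folder(
    repo_id="your-username/your-model",
    repo_type="model",
    folder_path="./my-70b-finetune",
)

# Simplified chat completion with InferenceClient (placeholder model id).
client = InferenceClient("meta-llama/Meta-Llama-3.1-8B-Instruct")
out = client.chat_completion(
    messages=[{"role": "user", "content": "Give me one tip for uploading large models."}],
    max_tokens=100,
)
print(out.choices[0].message.content)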

  • 1 reply
posted an update about 2 months ago
reacted to aaditya's post with 🚀 2 months ago
view post
Post
2999
Last Week in Medical AI: Top Research Papers/Models
๐Ÿ… (August 25 - August 31, 2024)

- MultiMed: Multimodal Medical Benchmark
- A foundation model for generating chest X-ray images
- MEDSAGE: Medical Dialogue Summarization
- Knowledge Graphs for Radiology Report Generation
- Exploring Multi-modal LLMs for Chest X-ray
- Improving Clinical Note Generation
...

Check the full thread: https://x.com/OpenlifesciAI/status/1829984701324448051
  • 1 reply
reacted to vilarin's post with ❤️ 2 months ago
view post
Post
5965
🤩 Amazing day. AWPortrait-FL is finally here!
🦖 AWPortrait-FL is finetuned on FLUX.1-dev using the training set of AWPortrait-XL and nearly 2,000 fashion photographs of extremely high aesthetic quality.

🤗 Model: Shakker-Labs/AWPortrait-FL

🙇 Demo: vilarin/flux-labs
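A hedged local-inference sketch, assuming the checkpoint loads with diffusers' FluxPipeline like other FLUX.1-dev finetunes; the prompt and settings below are illustrative, not the authors' recommendations:

# Sketch only: needs a recent diffusers install and a large GPU (or CPU offload).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("Shakker-Labs/AWPortrait-FL", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # trades speed for lower VRAM usage

image = pipe(
    prompt="close-up portrait of a woman, fashion editorial, soft studio light",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("awportrait_fl_sample.png")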

posted an update 2 months ago
view post
Post
2344
💡 Andrew Ng recently gave a strong defense of open-source AI models at Stanford GSB, arguing that legislative efforts in the US and the EU to restrict innovation in open-source AI should be slowed down.

🎥 See the video below:
https://youtu.be/yzUdmwlh1sQ?si=bZc690p8iubolXm_
reacted to mmhamdy's post with 🚀 3 months ago
view post
Post
3627
🚀 Introducing The Open Language Models List

This is a work-in-progress list of open language models with permissive licenses such as MIT, Apache 2.0, or other similar licenses.

The list is not limited to autoregressive models, or even to transformers; it also includes many SSMs and SSM-Transformer hybrids.

🤗 Contributions, corrections, and feedback are very welcome!

The Open Language Models List: https://github.com/mmhamdy/open-language-models
  • 2 replies
reacted to not-lain's post with 🔥 3 months ago
reacted to m-ric's post with 🚀 4 months ago
view post
Post
2263
Agentic Data Analyst: drop your data file, let the LLM do the analysis 📊⚙️

Need to do some quick exploratory data analysis? ➡️ Get help from an agent.

I was impressed by Llama-3.1's capacity to derive insights from data: given a CSV file, it makes quick work of exploratory data analysis and surfaces interesting findings.

On the data from the Kaggle Titanic challenge, which records which passengers survived the sinking, it was able on its own to identify trends like "passengers that paid higher fares were more likely to survive" or "survival rate was much higher for women than men".

The cookbook even lets the agent build its own submission to the challenge, and it ranks in the top 3,000 out of 17,000 submissions: 👏 not bad at all!

Try it for yourself in this Space demo 👉 m-ric/agent-data-analyst
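For reference, a minimal sketch of this kind of agentic EDA with the transformers code agent (not the cookbook's actual code; the agents API has changed across transformers versions, and the CSV path and prompt are placeholders):

# Sketch only: assumes a transformers version with the agents module and an HF
# token for the default Inference API engine; "titanic_train.csv" is a placeholder.
from transformers.agents import ReactCodeAgent

agent = ReactCodeAgent(
    tools=[],
    additional_authorized_imports=["pandas", "numpy", "matplotlib.pyplot"],
)

report = agent.run(
    "Load 'titanic_train.csv' with pandas, run a short exploratory analysis, "
    "and list three notable trends about passenger survival."
)
print(report)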
  • 2 replies
reacted to lhoestq's post with 🚀 4 months ago
view post
Post
2934
✨ Easy Synthetic Dataset File Generation using LLM DataGen! Link: https://huggingface.co/spaces/lhoestq/LLM_DataGen

features + how it works:

โœ๏ธ Generate the dataset content you want just by entering a file name
๐Ÿ’ก Optionally specify the column names you need
๐Ÿ’จ The dataset is streamed and generated on-the-fly in JSON Lines format
โœ… Generation is constrained to always output valid JSON

How does this work?
1/ Enter a file name
2/ The model generates column names for such a file. Using structured generation, it is constrained to produce 2 to 5 column names made of lowercase characters and underscores. I use a prompt that asks for column names of a realistic dataset, and a low temperature.
3/ The columns are used to update the Finite State Machine that constrains the dataset-content generation, so that it only produces JSON objects with those columns.
4/ The model generates JSON objects using structured generation again, with the updated Finite State Machine. I use a prompt that asks for realistic data and a temperature of 1.

> Why update a Finite State Machine instead of re-creating one?

Creating one can take up to 30 seconds, while updating one takes 0.1s (though it requires manipulating a graph, which is not easy to implement).

> Batched generation is faster, why not use it?

Generating in batches is faster but tends to produce duplicates for this demo.
Further work could be to provide different prompts (one per sequence in the batch) to end up with a different distribution of sequences in each batch, or to implement a custom sampler that would forbid generating the same data in sequences of the same batch.

> How does structured generation work?

I used the outlines library with transformers to define a JSON schema that the generation has to follow. It uses a Finite State Machine with token ids as transitions.
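As a rough illustration of that idea (not the Space's actual code; outlines' API has changed across versions, and the schema and model id here are made-up examples):

# Sketch only: assumes the outlines library with transformers support installed.
import outlines
from pydantic import BaseModel

class Row(BaseModel):
    product_name: str
    price_usd: float
    in_stock: bool

model = outlines.models.transformers("Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model
generator = outlines.generate.json(model, Row)  # builds the FSM once; reuse it afterwards

row = generator("Generate one realistic row for a store inventory dataset.")
print(row)  # a Row instance whose fields are guaranteed to match the schema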

Let me know what you think! And feel free to duplicate/modify it to try other models/prompts or sampling methods :)
reacted to sequelbox's post with 👀 4 months ago
view post
Post
1327
JUST RELEASED: Fireplace 2 for Llama 3.1 8b Instruct!

Fireplace 2 is an 'expansion pack' of structured outputs you can request during your chat, using special request tokens to let Llama know you're looking for specific types of responses:
Inline function calls
SQL queries
JSON objects
Data visualization with matplotlib

ValiantLabs/Llama3.1-8B-Fireplace2
  • 2 replies
reacted to as-cle-bert's post with 👍 4 months ago
view post
Post
2598
Hi HF community! 🤗
Hope y'all are as excited as me for the release of Llama 3.1! 🦙
Following the release, I built a Space on top of the HF Inference API, thanks to a recipe you can find in this awesome GitHub repo (https://github.com/huggingface/huggingface-llama-recipes/): you can now run Llama-3.1-405B, customizing its system instructions and other parameters, for free! 😇
Follow this link: as-cle-bert/Llama-3.1-405B-FP8 and let the fun begin! 🍕
  • 1 reply
reacted to mrm8488's post with ❤️ 4 months ago
view post
Post
4221
🚨 Exciting news for the Multilingual Synthetic Data Community! 🚨

I've taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here's what's new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the instruction-tuned version.

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload them to the Hugging Face Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)
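A hedged, much-simplified sketch of that generate-and-push workflow (not the actual gist; the prompts, sample count, and dataset id below are placeholders):

# Sketch only: assumes a running ollama server with the llama3 model pulled,
# plus the ollama and datasets Python packages.
import ollama
from datasets import Dataset

rows = []
for _ in range(10):  # tiny demo; the real script generates thousands of samples
    # Ask the model to invent an instruction in Spanish, then answer it.
    instruction = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Escribe una instrucción breve y realista para un asistente de IA. Devuelve solo la instrucción."}],
    )["message"]["content"].strip()
    answer = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": instruction}],
    )["message"]["content"].strip()
    rows.append({"instruction": instruction, "response": answer})

Dataset.from_list(rows).push_to_hub("your-username/synthetic-es-demo")  # placeholder repo id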


๐Ÿ” Explore the datasets ๐Ÿ“š generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)


Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/
reacted to not-lain's post with 🤗 4 months ago
view post
Post
7553
I am now a Hugging Face fellow 🥳
reacted to 1aurent's post with 👍 4 months ago
reacted to SivilTaram's post with 👍 4 months ago
view post
Post
2412
Still following your human intuition to mix corpora from different sources for pre-training 🧠? Everyone says that data mixture has a big impact on model performance, but how, and why 🕵️? Did you know that web corpora are actually highly impactful for downstream tasks 🏆?

Check out our preprint "RegMix: Data Mixture as Regression for Language Model Pre-training" 📄

🔬 In this paper, we've proposed RegMix, an automatic data-mixture method that achieves a 6.3% improvement over human selection on the widely used HellaSwag benchmark, and it only needs 2% extra training FLOPs! 📈
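The core recipe can be sketched as a toy example (not the paper's code: RegMix fits its regressor on real small-proxy training runs, whereas the proxy scores below are simulated):

# Toy sketch of "data mixture as regression" with numpy + scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_domains, n_proxy_runs = 5, 64

# 1) Sample random mixture weights (each row sums to 1) for cheap proxy runs.
mixtures = rng.dirichlet(np.ones(n_domains), size=n_proxy_runs)

# 2) Pretend each proxy run yields a downstream score (e.g. HellaSwag accuracy).
hidden_utility = np.array([0.20, 0.50, 0.10, 0.15, 0.05])  # unknown in practice
scores = mixtures @ hidden_utility + rng.normal(0, 0.01, size=n_proxy_runs)

# 3) Fit a regressor from mixture weights to downstream score.
reg = LinearRegression().fit(mixtures, scores)

# 4) Search many candidate mixtures and keep the one predicted to score highest.
candidates = rng.dirichlet(np.ones(n_domains), size=100_000)
best = candidates[np.argmax(reg.predict(candidates))]
print("predicted-best mixture:", np.round(best, 3))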

📄 Paper: RegMix: Data Mixture as Regression for Language Model Pre-training (2407.01492)
💻 Code: https://github.com/sail-sg/regmix
📊 Collection: sail/regmix-data-mixture-as-regression-6682b6caab37b9442877f0ce
🎮 Demo: https://huggingface.co/spaces/sail/RegMix
reacted to whitphx's post with 🔥 4 months ago
view post
Post
2266
Have you looked at the Gemini Nano local LLM?
Gradio-Lite, the in-browser version of Gradio, gives it a rich interface using only Python code, even for such an in-browser AI app!

Try out a chat app that runs completely inside your browser 👇
https://www.gradio.app/playground?demo=Hello_World&code=IyBOT1RFOiBHZW1pbmkgTmFubyBtdXN0IGJlIGVuYWJsZWQgaW4geW91ciBicm93c2VyLiBTZWUgYXJ0aWNsZXMgbGlrZSBodHRwczovL3dyaXRpbmdtYXRlLmFpL2Jsb2cvYWNjZXNzLXRvLWdlbWluaS1uYW5vLWxvY2FsbHkKaW1wb3J0IGdyYWRpbyBhcyBncgpmcm9tIGpzIGltcG9ydCBzZWxmICAjIFB5b2RpZGUgcHJvdmlkZXMgYWNjZXNzIHRvIHRoZSBKUyBzY29wZSB2aWEgYGpzYCBtb2R1bGUuIFNlZSBodHRwczovL3B5b2RpZGUub3JnL2VuL3N0YWJsZS91c2FnZS9hcGkvcHl0aG9uLWFwaS5odG1sI3B5dGhvbi1hcGkKCiMgSW5pdGlhbGl6ZSBQcm9tcHQgQVBJCnNlc3Npb24gPSBOb25lCnRyeToKICAgIGNhbl9haV9jcmVhdGUgPSBhd2FpdCBzZWxmLmFpLmNhbkNyZWF0ZVRleHRTZXNzaW9uKCkKICAgIGlmIGNhbl9haV9jcmVhdGUgIT0gIm5vIjoKICAgICAgICBzZXNzaW9uID0gYXdhaXQgc2VsZi5haS5jcmVhdGVUZXh0U2Vzc2lvbigpCmV4Y2VwdDoKICAgIHBhc3MKCgpzZWxmLmFpX3RleHRfc2Vzc2lvbiA9IHNlc3Npb24KCgphc3luYyBkZWYgcHJvbXB0KG1lc3NhZ2UsIGhpc3RvcnkpOgogICAgc2Vzc2lvbiA9IHNlbGYuYWlfdGV4dF9zZXNzaW9uCiAgICBpZiBub3Qgc2Vzc2lvbjoKICAgICAgICByYWlzZSBFeGNlcHRpb24oIkdlbWluaSBOYW5vIGlzIG5vdCBhdmFpbGFibGUgaW4geW91ciBicm93c2VyLiIpCgogICAgc3RyZWFtID0gc2Vzc2lvbi5wcm9tcHRTdHJlYW1pbmcobWVzc2FnZSkKICAgIGFzeW5jIGZvciBjaHVuayBpbiBzdHJlYW06CiAgICAgICAgeWllbGQgY2h1bmsKCgpkZW1vID0gZ3IuQ2hhdEludGVyZmFjZShmbj1wcm9tcHQpCgpkZW1vLmxhdW5jaCgp

Note: Gemini Nano is currently only available on Chrome Canary, and you need to opt in.
Follow the "Installation" section in https://huggingface.co/blog/Xenova/run-gemini-nano-in-your-browser
reacted to Wauplin's post with 🚀 4 months ago
view post
Post
3340
🚀 I'm excited to announce that huggingface_hub's InferenceClient now supports OpenAI's Python client syntax! For developers integrating AI into their codebases, this means you can switch to open-source models with just three lines of code. Here's a quick example of how easy it is.
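The quick example itself isn't preserved above, so here is a minimal sketch of the OpenAI-style syntax (the model id and token are placeholders):

# Sketch only: mirrors the openai client's chat.completions.create call.
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="hf_***")  # placeholder token, as with openai.OpenAI(api_key=...)

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder open model id
    messages=[{"role": "user", "content": "Why is open-source AI important?"}],
    max_tokens=200,
)
print(completion.choices[0].message.content)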

Why use the InferenceClient?
🔄 Seamless transition: keep your existing code structure while leveraging LLMs hosted on the Hugging Face Hub.
🤗 Direct integration: easily launch a model to run inference using our Inference Endpoint service.
🚀 Stay Updated: always be in sync with the latest Text-Generation-Inference (TGI) updates.

More details in https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#openai-compatibility