santiviquez (Santiago Viquez)

posted an update about 1 month ago

Post

1487

Professors should ask students to write blog posts based on their final projects instead of having them do paper-like reports.

A single blog post, accessible to the entire internet, can have a greater career impact than dozens of reports that nobody will read.

posted an update about 1 month ago

Post

464

Some exciting news...

We are open-sourcing The Little Book of ML Metrics! 🎉

The book that will be on every data scientist's desk is open source.

What does that mean?

It means hundreds of people can review it, contribute to it, and help us improve it before it's finished!

This also means that everyone will have free access to the digital version!

Meanwhile, the high-quality printed edition will be available for purchase as it has been for a while.

Revenue from printed copies will help us support further development and maintenance of the book. Not to mention that reviewers and contributors will receive revenue sharing through their affiliate links. 🙌

Check out the book repo (make sure to leave a star 🌟):

https://github.com/NannyML/The-Little-Book-of-ML-Metrics

replied to their post 2 months ago

Exactly. But now, try to do the same, but this time by imagining/drawing an extra dimension perpendicular to the three spatial dimensions we see.

posted an update 2 months ago

Post

442

We can’t think in more than three dimensions.

But we have no problem doing math and writing computer programs in many dimensions. It just works.

I find that extremely crazy.

4 replies

·

posted an update 3 months ago

Post

426

ML people on a long flight

(See picture)

1 reply

·

replied to their post 3 months ago

Oh thanks! I really appreciate it 🫶

posted an update 3 months ago

Post

466

Some personal and professional news ✨

I'm writing a book on ML metrics.

Together with Wojtek Kuberski, we’re creating the missing piece of every ML university program and online course: a book solely dedicated to Machine Learning metrics!

The book will cover the following types of metrics:
• Regression
• Classification
• Clustering
• Ranking
• Vision
• Text
• GenAI
• Bias and Fairness

👉 check out the book: https://www.nannyml.com/metrics

2 replies

·

reacted to dvilasuero's post with ❤️🔥 5 months ago

Post

7941

Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: launching partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, or releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!

28 replies

·

posted an update 6 months ago

Post

1044

They: you need ground truth to measure performance! 😠

NannyML: hold my beer...

posted an update 6 months ago

Post

949

Just published a new article 😊

https://huggingface.co/blog/santiviquez/data-drift-estimate-model-performance

reacted to lunarflu's post with 🔥 6 months ago

Post

2309

By popular demand, HF activity tracker v1.0 is here! 📊 let's build it together!🤗

Lots of things to improve, feel free to open PRs in the community tab!

good PR ideas:
- track more types of actions that include date+time
- bigger plot
- track discord activity too 🤯
- link github? ⚡

https://huggingface.co/spaces/huggingface-projects/LevelBot

2 replies

·

posted an update 6 months ago

Post

1567

I ran 580 experiments (yes, 580 🤯) to check if we can quantify data drift's impact on model performance using only drift metrics.

For these experiments, I built a technique that relies on drift signals to estimate model performance. I compared its results against the current SoTA performance estimation methods and checked which technique performs best.

The plot below summarizes the general results. It measures the quality of performance estimation versus the absolute performance change. (The lower, the better).

Full experiment: https://www.nannyml.com/blog/data-drift-estimate-model-performance

In it, I describe the setup, datasets, models, benchmarking methods, and the code used in the project.

posted an update 7 months ago

Post

1569

Looking for someone with +10 years of experience training Deep Kolmogorov-Arnold Networks.

Any suggestions?

posted an update 8 months ago

Post

2049

More open research updates 🧵

Performance estimation is currently the best way to quantify the impact of data drift on model performance. 💡

I've been benchmarking performance estimation methods (CBPE and M-CBPE) against data drift signals.

I'm using drift results as features for many regression algorithms, and then I'm taking those to estimate the model's performance. Finally, I'm measuring the Mean Absolute Error (MAE) between the regression models' predictions and actual performance.

So far, for all my experiments, performance estimation methods do better than drift signals. 👨‍🔬

Bear in mind that these are some early results, I'm running the flow on more datasets as we speak.

Hopefully, by next week, I will have more results to share 👀

posted an update 8 months ago

Post

1350

How would you benchmark performance estimation algorithms vs data drift signals?

I'm working on a benchmarking analysis, and I'm currently doing the following:

- Get univariate and multivariate drift signals and measure their correlation with realized performance.
- Use drift signals as features of a regression model to predict the model's performance.
- Use drift signals as features of a classification model to predict a performance drop.
- Compare all the above experiments with results from Performance Estimation algorithms.

Any other ideas?

replied to gsarti's post 8 months ago

Nicee, I'll take a look 👀

reacted to gsarti's post with ❤️ 8 months ago

Post

2212

Our 🐑 PECoRe 🐑 method to detect & attribute context usage in LM generations finally has an official Gradio demo! 🚀

gsarti/pecore

Highlights:
🔍 Context attribution for several decoder-only and encoder-decoder models using convenient presets
🔍 Uses only LM internals to faithfully reflect context usage, no additional detector involved
🔍 Highly parametrizable, export Python & Shell code snippets to run on your machine using 🐛 Inseq CLI (https://github.com/inseq-team/inseq)

Want to use PECoRe for your LMs? Feedback and comments are welcome! 🤗

3 replies

·

posted an update 9 months ago

Post

People in Paris 🇫🇷 🥐

Next week we'll be hosting our first Post-Deployment Data Science Meetup in Paris!

My boss will be talking about Quantifying the Impact of Data Drift on Model
Performance. 👀

The event is completely free, and there's only space for 50 people, so if you are interested, RSVP as soon as possible 🤗

🗓️ Thursday, March 14
🕠 5:30 PM - 8:30 PM GMT+1
🔗 RSVP: https://lu.ma/postdeploymentparis

posted an update 9 months ago

Post

Where I work, we are obsessed with what happens to a model's performance after it has been deployed. We call this post-deployment data science.

Let me tell you about a post-deployment data science algorithm that we recently developed to measure the impact of Concept Drift on a model's performance.

How can we detect Concept Drift? 🤔

All ML models are designed to do one thing: learning a probability distribution in the form of P(y|X). In other words, they try to learn how to model an outcome 'y' given the input variables 'X'. 🧠

This probability distribution, P(y|X), is also called Concept. Therefore, if the Concept changes, the model may become invalid.

❓But how do we know if there is a new Concept in our data?
❓Or, more important, how do we measure if the new Concept is affecting the model's performance?

💡 We came up with a clever solution where the main ingredients are a reference dataset, one where the model's performance is known, and a dataset with the latest data we would like to monitor.

👣 Step-by-Step solution:

1️⃣ We start by training an internal model on a chunk of the latest data. ➡️ This allows us to learn the new possible Concept presented in the data.

2️⃣ Next, we use the internal model to make predictions on the reference dataset.

3️⃣ We then estimate the model's performance on the reference dataset, assuming the model's predictions on the monitoring data as ground truth.

4️⃣ If the estimated performance of the internal model and the actual monitored model are very different, we then say that there has been a Concept Drift.

To quantify how this Concept impacts performance, we subtract the actual model's performance on reference from the estimated performance and report a delta of the performance metric. ➡️ This is what the plot below shows. The change of the F1-score due to Concept drift! 🚨

This process is repeated for every new chunk of data that we get. 🔁

Santiago Viquez

AI & ML interests

Recent Activity

Articles

I ran 580 model-dataset experiments to show that, even if you try very hard, it is almost impossible to know that a model is degrading just by looking at data drift results

Are your NLP models deteriorating post-deployment? Let’s use unlabelled data to find out

Organizations

santiviquez's activity