[FEEDBACK] Collections

#12
by victor HF staff - opened
Hugging Face org
•
edited Sep 9, 2023

This discussion is dedicated to feedback about the new collection feature.
Docs are available here. Your feedback is really valuable so don't hesitate to share anything about it 🤗.


osanseviero pinned discussion

It would be really nice to be able to add freeform text blocks to collections as well, so that we can write contextual information that's not a note on a specific object.

Hugging Face org

@stellaathena so at the collection level (at the top, right below the title), not on a specific item?

victor changed discussion title from [DRAFT] [FEEDBACK] Collections to [FEEDBACK] Collections

It would be great to add some features to improve collaboration & usability:

  • Suggest additions to someone's collection (possibly via PR)
  • Preserve main page filters (datasets/languages/licenses) on the collections view to enable further filtering (e.g. select only models of a specific type in a collection of language specific models)
Hugging Face org

I think it would be nice

  • search for a collection
  • see top trending collections
  • follow a collection and get notified when it gets updated

It would be great to move collections too!

It would be great to have this information in the huggingface_hub library!
I'd love the ability to list all collections and utilize them as filters in DatasetFilter and ModelFilter.

@stellaathena so at the collection level (at the top, right below the title), not on a specific item?

Yes, but also between blocks of items. I'm thinking of something like the following:

Pythia

Pythia is a language model suite that blah blah blah.
[paper block]

The following models form the Pythia suite. Models with the suffix -deduped have been trained on a deduplicated copy of the Pile for 1.5 epochs blah blah blah
[Model block 1]
[Model block 2]
[...]

Looking at this again, I wonder if my main piece of feedback is that often times I want to comment on a group of objects rather than a single object?

Hugging Face org

had missed your reply @stellaathena , yes, we're thinking about how to do this

Hugging Face org
•
edited Oct 10, 2023

@LihiShalmon the new release of huggingface_hub, v0.18.0, introduces programmatic access to Collections: see release notes here: https://github.com/huggingface/huggingface_hub/releases/tag/v0.18.0

Excerpt:

Collection API is now fully supported in huggingface_hub!

A collection is a group of related items on the Hub (models, datasets, Spaces, papers) that are organized together on the same page. Collections are useful for creating your own portfolio, bookmarking content in categories, or presenting a curated list of items you want to share. Check out this guide to understand in more detail what collections are and this guide to learn how to build them programmatically.

Feedback is welcome. cc @Wauplin

Hugging Face org
•
edited Oct 10, 2023

@LihiShalmon Looking forward to seeing new Collections created via huggingface_hub! Thanks @julien-c for the visibility. The link to the guide is broken and should be this one: https://huggingface.co/docs/huggingface_hub/guides/collections (I just fixed it in the release notes as well). (EDIT: forgot I could update your comment myself 😊)

Hugging Face org

I also created an example collection of small models on the hub (<50MB) using the new Collections feature in the huggingface_hub library.
Two notebooks that show this process:

Hugging Face org

✨Upcoming feature in Collections ✨

Image galleries!

Hugging Face org

It'd be nice if it was possible to write longer notes. Apparently, we can write only up to 500 characters in notes.

https://huggingface.co/collections/hysts/diffusion-model-spaces-64f9a061a92957174668a105

Hugging Face org
•
edited Oct 12, 2023

It would be nice if videos were supported for image galleries. It would be useful to showcase sample results of text-to-video pipelines etc.
For example, I'd like to add samples like this in an image gallery of a text-to-video Space:

Also, it could be used to post a short video showing how to use the Space.

Recently, models that generate 3D models have appeared, so maybe supporting .glb etc. would also be useful. 3D models can be shown in datasets, so maybe it's possible to show them in image galleries as well?

Hugging Face org

There seems to be some issue with image reordering: I only changed the position of one image, but somehow the position of other two images was swapped as well.

Hugging Face org

It'd be nice if it was possible to write longer notes. Apparently, we can write only up to 500 characters in notes.

Yes, it was to keep collections repo-centric, but we may introduce a new text node soon cc @julien-c

It would be nice if videos were supported for image galleries. It would be useful to showcase sample results of text-to-video pipelines etc.

Yes, let's see if this gets a bit of usage and we'll expand to new media types.

There seems to be some issue with image reordering: I only changed the position of one image, but somehow the position of other two images was swapped as well.

Thanks for reporting @hysts

I'll use Collections to document our upcoming hackathon in pseudolab, you're truly awesome @Wauplin and HF team!

@victor Another feature for the text node would be markdown for better readability. It'd also be nice if collections can be tagged with team names within an organization, for example 'Hugging Face KREW' within 'PseudoLab'.


Is it possible to change the URL of collections to be something more readable? Currently it seems to be randomly generated strings after the username.

Hugging Face org

cc @Sylvestre ^

Hello, would it be possible to add options for sorting collections based on the paper publication date? Right now, the only way to change the order is manual, as I understand it, and if you just keep appending, new things are at the bottom of the collection. I think I would like new things to be at the top, but I would be happy to just re-sort by paper publication date.

Besides this comment, I just mass-converted my 1-year-old collection of biomedical nlp papers from https://sigmoid.social/@ArxivHealthcareNLP into https://huggingface.co/collections/FremyCompany/biomedical-nlp-papers-6557998218320e0e3fdb5e57 and the only piece of feedback I would add is that if a paper is currently not in HuggingFace, it is not possible to add it via the Collections API (but, in my opinion, this is actually a good thing).

Hugging Face org

Hello, would it be possible to add options for sorting collections based on the paper publication date?

Hi @FremyCompany , while this is not yet possible via the UI (cc @victor ), you can use the API to programmatically reorder items in your collections. This requires a little script but might be easier to maintain for you for now. Here is a guide on how to do that in Python using the huggingface_hub library: https://huggingface.co/docs/huggingface_hub/guides/collections#reorder-items.

Besides this comment, I just mass-converted my 1-year-old collection of biomedical nlp papers from https://sigmoid.social/@ArxivHealthcareNLP into https://huggingface.co/collections/FremyCompany/biomedical-nlp-papers-6557998218320e0e3fdb5e57

Nice one! 🎉

Lol, I actually didn't know about the Python API 😆 I did my usual of converting the REST calls from my browser to cURL commands ^_^ I'll definitely take a look at doing this in Python when I've some time.

Hello, would it be possible to add options for sorting collections based on the paper publication date?

For reference, this is the script I ended up using:

from huggingface_hub import get_collection, update_collection_item

collection_slug = "gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9"
collection = get_collection(collection_slug)

for pos, item in enumerate(sorted(collection.items, key=lambda x: x.item_id, reverse=True)):
    print(f"Assigning position {pos} to {item.item_id}")
    update_collection_item(
        collection_slug=collection_slug,
        item_object_id=item.item_object_id,
        position=pos,
    )

Another Q: on the UI I can access the collection history, but afaik it's not available via the Python API. Is there a way to get such information? It would be useful e.g. to sort the collection by added time of its items.

Hugging Face org

Pinging @Wauplin for @gsarti question :)

Hugging Face org
•
edited Feb 5

we discussed implementing some Auto-sort feature in the UI too. IMO could be cool

EDIT: in the meantime doing it programmatically works well

EDIT: in the meantime doing it programmatically works well

But it's slow. On my collection, the sorting step takes 11.2s and it only has like 70ish items. The code I use is very similar to the one @gsarti posted:

sorted_collection = sorted(collection.items, key=lambda i: i.item_id, reverse=True)

# 3. Update the collection items one by one, to give them the correct position in the collection
for index, item in enumerate(sorted_collection):
    update_collection_item(
        collection.slug,
        item_object_id=item.item_object_id,
        position=index
    )

I realize I could make this better by deducing which items need a new position rather than updating every position, but this is tedious.

For now, what I do is run this step only when I have added new papers in the previous steps (and assume everything is already sorted if I didn't).
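The idea of updating only the items that are out of place can be sketched as follows. This is a minimal sketch, not the library's own method: `plan_moves` and `resort_collection` are hypothetical helper names, and the plan assumes that setting `position` on an item behaves like removing it from the list and re-inserting it at that index, which should be checked against the Hub's actual behavior. The plan is not guaranteed to be the shortest possible one, but it makes no calls for a collection that is already sorted and one call per new paper when new papers were appended at the bottom.

```python
from typing import List, Sequence, Tuple


def plan_moves(current: Sequence[str], target: Sequence[str]) -> List[Tuple[str, int]]:
    """Return the (item, position) moves that turn `current` into `target`.

    Assumption: a move acts like list.remove() followed by list.insert(),
    so the moved item lands at `position` and later items shift down by one.
    """
    if sorted(current) != sorted(target):
        raise ValueError("current and target must contain the same items")
    working = list(current)
    moves: List[Tuple[str, int]] = []
    for position, wanted in enumerate(target):
        if working[position] == wanted:
            continue  # already in place: no API call needed
        working.remove(wanted)
        working.insert(position, wanted)
        moves.append((wanted, position))
    return moves


def resort_collection(collection_slug: str) -> int:
    """Sort a collection by item_id (descending) using only the planned moves.

    Hypothetical wrapper; needs network access and a token with write access.
    """
    from huggingface_hub import get_collection, update_collection_item

    collection = get_collection(collection_slug)
    current = [item.item_object_id for item in collection.items]
    ordered = sorted(collection.items, key=lambda i: i.item_id, reverse=True)
    target = [item.item_object_id for item in ordered]
    moves = plan_moves(current, target)
    for item_object_id, position in moves:
        update_collection_item(
            collection_slug, item_object_id=item_object_id, position=position
        )
    return len(moves)
```

With a collection that is sorted except for one newly appended paper, `plan_moves` returns a single move, so only one request is sent instead of one per item.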

Hugging Face org

Another Q: on the UI I can access the collection history, but afaik it's not available via the Python API. Is there a way to get such information?

There is a GET /api/collections/<collection-id>/history endpoint but it is not integrated in the Python library at the moment. For example: https://huggingface.co/api/collections/gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9/history. I never programmatically used this API myself but feel free to use it :)
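Calling that endpoint from Python could look roughly like this. It is a sketch using only the standard library: `history_url` and `fetch_history` are hypothetical helper names, the optional bearer token is an assumption for private collections, and the shape of the returned JSON is not documented in this thread, so it should be inspected before relying on any field.

```python
import json
import urllib.request
from typing import Any, Optional

HUB_URL = "https://huggingface.co"


def history_url(collection_slug: str) -> str:
    """Build the history endpoint URL for a collection slug."""
    return f"{HUB_URL}/api/collections/{collection_slug}/history"


def fetch_history(collection_slug: str, token: Optional[str] = None) -> Any:
    """Fetch the raw history JSON (requires network access).

    The response schema is an unknown here; print it once to see
    which fields are available before sorting on any of them.
    """
    request = urllib.request.Request(history_url(collection_slug))
    if token:
        # Assumption: private collections need an access token.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_history("gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9")` requests the URL quoted above.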

Another point: a big limiting factor for me is the inability to organize contents in sub-collections (kinda like subfolders, they probably do not need to have all collections functionalities).

For example, organizing many model checkpoints in a collection might be more reasonably done if they can be arranged in folders based on model size.

Another usage e.g. for my collection linked above is to have a folder per daily pick in which I could include checkpoints/datasets/spaces together with papers, without making the collection as a whole a mess.

Thoughts?

That sounds like a good idea! Maybe subfolders could just be regular collections, such that the same API works on them as on other collections, with the exception that they would not be indexed in public pages, and just linked from a main collection?

Hi! I have a question! I currently have 3 collections and all of them are public and I can see them from my account just fine. But other users can only see one of my collections on my account. The collection is accessible from the link. What could be the problem?

Hugging Face org

@DmitryRyumin our spam detection system wrongly flagged your collections, it should be fixed now, sorry for the inconvenience!

@DmitryRyumin our spam detection system wrongly flagged your collections, it should be fixed now, sorry for the inconvenience!

Oh, did I do something wrong that caused the system to flag my collections as spam?
P.S.: Everything is great now, thanks 🤗

@pierric I think I know what the problem is: if I add a paper from arxiv that is not yet in Daily Papers, and then add that paper to the collection, the collection is automatically blocked and disappears from public access.

Does this mean I can't do open collections for the scientific community with papers and descriptions from arxiv that didn't make it into the Daily Papers collection?

@DmitryRyumin Are you adding all these papers without even looking at them? If so, please don't do that, the whole point of HuggingFace papers is that the papers are curated by the community. If you add them because you have read them and think they're valuable, that would be a good use case, I would hope that this is supported.

@DmitryRyumin Are you adding all these papers without even looking at them? If so, please don't do that, the whole point of HuggingFace papers is that the papers are curated by the community. If you add them because you have read them and think they're valuable, that would be a good use case, I would hope that this is supported.

@FremyCompany Of course only the ones that I have read or for example once reviewed and so on, that also have open repositories or models and so on. Additionally, I am referring to articles presented at top A* level conferences or Q1 level journals.

That should absolutely be allowed then.

Hugging Face org

@DmitryRyumin I cannot go into the details of our spam detection system since it would give information that could help bypass it, but we had a bug related to collections that caused any change to a collection previously flagged as potential spam to be flagged again - this is now fixed.

You are free to add any paper to collections, even ones not featured in the Daily Papers, and to add as many of them as you want in a collection - this won't make you a spammer, and this is not what triggered the system.

Also know that all the automatic spam detection from our system is then reviewed by humans, but it might not be immediate since it depends on their working hours :)

Hi @pierric ,

Thanks for the clarification! I appreciate the assurance that I can confidently add high quality papers to collections without worrying about being flagged as spam. It's reassuring to know that I can share the best scientific content from top conferences and journals with community members.

Hugging Face org

Would it make sense to support markdown in the notes section of an individual collection item?

I have this collection, for example: https://huggingface.co/collections/sayakpaul/optimizing-diffusion-models-659f481b2bb9a1311e6f845d. I would like to make the notes for each item look a little better.

@victor Is there any way to copy all the datasets from one collection to another?
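No UI option for this is mentioned in the thread, but a small script with the huggingface_hub library is one possible workaround. The sketch below is an assumption-laden illustration: `ids_of_type` and `copy_items` are hypothetical helper names, the slugs passed to `copy_items` are up to the caller, and the copy needs a token with write access to the target collection. Notes attached to the source items are not copied.

```python
from types import SimpleNamespace
from typing import Iterable, List


def ids_of_type(items: Iterable, item_type: str = "dataset") -> List[str]:
    """Return the item_id of every collection item with the given type."""
    return [item.item_id for item in items if item.item_type == item_type]


def copy_items(source_slug: str, target_slug: str, item_type: str = "dataset") -> List[str]:
    """Copy all items of one type from one collection to another.

    Hypothetical wrapper; requires network access and write permission
    on the target collection.
    """
    from huggingface_hub import add_collection_item, get_collection

    source = get_collection(source_slug)
    copied = ids_of_type(source.items, item_type)
    for item_id in copied:
        # exists_ok=True skips items the target collection already contains.
        add_collection_item(
            target_slug, item_id=item_id, item_type=item_type, exists_ok=True
        )
    return copied


# Offline demonstration with stand-in items (made-up ids, not real repos).
demo_items = [
    SimpleNamespace(item_id="user/dataset-a", item_type="dataset"),
    SimpleNamespace(item_id="user/model-a", item_type="model"),
    SimpleNamespace(item_id="user/dataset-b", item_type="dataset"),
]
print(ids_of_type(demo_items))
```

The filtering step is kept separate from the network calls so it can be checked locally before anything is written to the Hub.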

Hugging Face org

Is there any update regarding the possibility of creating subfolders in collections, and the possibility to automatically add elements on top rather than on the bottom of the collection? Thanks in advance! 🤗
