[NEW] WebSearch 2.0 - feedback welcome!

#254
by victor HF staff - opened
Hugging Chat org
•
edited Mar 15

March 15th update: 🌐Internet Access for Assistants

Hi HuggingChat community!

We've just released a big update to the WebSearch feature: it now uses retrieval-augmented generation (RAG) to extract relevant information from multiple web pages! From our tests it's much more powerful than before 🚀.

We would love to get your feedback on it! Also, if you want to check the details or even contribute, take a look at the PR on GitHub.

See you soon!
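
For readers wondering what "RAG over multiple web pages" means in practice, here is a minimal sketch of the general idea, not the actual chat-ui implementation. It assumes page text has already been fetched, and it uses a simple bag-of-words cosine similarity as a stand-in for a real embedding model. All function names, the chunk size, and the example pages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk(text, size=40):
    """Split page text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query, pages, top_k=2):
    """Return the top_k (score, url, chunk) triples most similar to the query."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(c))), url, c)
        for url, text in pages.items()
        for c in chunk(text)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, hits):
    """Put the retrieved chunks, tagged with their URLs, in front of the question."""
    context = "\n".join(f"[{url}] {text}" for _, url, text in hits)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


# Hypothetical pages standing in for fetched search results.
pages = {
    "https://example.com/a": "The Eiffel Tower is in Paris. It was completed in 1889.",
    "https://example.com/b": "Bananas are rich in potassium and grow in tropical climates.",
}
question = "When was the Eiffel Tower completed?"
hits = retrieve(question, pages, top_k=1)
prompt = build_prompt(question, hits)
```

The resulting `prompt` is what would be sent to the model; only the best-matching chunks are included, which keeps the context short even when many pages were fetched.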

victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [DRAFT][NEW] - Updated WebSearch - feedback welcome!
victor changed discussion title from [DRAFT][NEW] - Updated WebSearch - feedback welcome! to [NEW] - Updated WebSearch - feedback welcome!
victor pinned discussion
victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [NEW] WebSearch 2.0 - feedback welcome!

I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like selenium, Google won't like it (and they'll miss ad revenue). So, in short what does the technical pipeline look like for this from user query to generated output?

Hugging Chat org

You can check out the feature here!

@BramVanroy, they use SerpAPI, which, as far as I understand, is paid and legal. See the source code here: https://github.com/huggingface/chat-ui/blob/main/src/lib/server/websearch/searchWeb.ts

Question regarding sources/citations: Do I understand correctly that you currently display as sources all URLs that the retriever retrieved and gave the LLM as context (regardless of whether the LLM actually used or referred to the source)? In Bing Chat, they somehow managed to attribute sources to specific parts/sentences of the generated output, and not only to the generated output as a whole. I've always wondered how they made that work (direct citations, i.e. attributing sources to specific sub-parts of a generated output). Does anyone know?

Hugging Chat org
•
edited Sep 14, 2023

Question regarding sources/citations: Do I understand correctly that you currently display all URLs as sources, which the retriever retrieved and gave the LLM as context

Yes

In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

Interesting. I guess one simple way would be: for every generated sentence, calculate its similarity against each source and pick the highest-scoring source as the source of that sentence. Not sure if that's what they are doing.
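
A rough sketch of that idea, assuming the answer and the source texts are plain strings. Bag-of-words cosine similarity stands in for a real embedding model here, and the function names and example URLs are hypothetical. This is not what Bing does (nobody in the thread knows that), just the simple approach described above.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector; a stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attribute(answer, sources):
    """Pair each sentence of the answer with its highest-scoring source URL."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    vectors = {url: bow(text) for url, text in sources.items()}
    result = []
    for sentence in sentences:
        v = bow(sentence)
        best = max(vectors, key=lambda url: cosine(v, vectors[url]))
        result.append((sentence, best))
    return result


# Hypothetical sources and generated answer.
sources = {
    "https://example.com/tower": "The Eiffel Tower in Paris was completed in 1889.",
    "https://example.com/bridge": "The Golden Gate Bridge opened in 1937 in San Francisco.",
}
answer = "The Eiffel Tower was completed in 1889. The Golden Gate Bridge opened in 1937."
citations = attribute(answer, sources)
```

Note that this always assigns some source to every sentence, even when nothing matches well.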

Any chance of a search API that isn't Google's, e.g. DuckDuckGo or Bing? :)
Awesome work though! Really cool :D

@mishig, yeah, true. My first intuition was that I would hesitate to trust embeddings from a bi-encoder to be reliable enough for this. But with a cross-encoder, it should actually work quite well, especially if you set a high enough threshold to avoid false positives (then it could even work with a bi-encoder sentence transformer). Direct citations could be a nice feature to have :)
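
To illustrate the threshold idea, here is a small sketch in which the pair scorer is pluggable. The default `jaccard` scorer is only a placeholder so the example runs without any model; a cross-encoder that scores a (sentence, source) pair would be passed in as `score_pair` instead. The function names and the 0.5 threshold are assumptions, not tuned values.

```python
import re


def jaccard(a, b):
    """Placeholder pair scorer; a cross-encoder would take its place."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cite(sentence, sources, score_pair=jaccard, threshold=0.5):
    """Return the best-matching source URL, or None if no score reaches the threshold."""
    best_url, best_score = None, 0.0
    for url, text in sources.items():
        score = score_pair(sentence, text)
        if score > best_score:
            best_url, best_score = url, score
    return best_url if best_score >= threshold else None


# Hypothetical source text.
sources = {"https://example.com/tower": "The Eiffel Tower was completed in 1889."}
supported = cite("The Eiffel Tower was completed in 1889.", sources)
unsupported = cite("I hope this helps!", sources)
```

Sentences that fall below the threshold get no citation at all, which is the point: a missing citation is less harmful than a wrong one.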

https://hf.co/chat/assistant/66375dcb24d425b77de8fc7e

Hi, it works on my assistant model. I just write "extract the html code"

I see everyone building their own web search tool, and most of them are probably free, like DuckDuckGo search. But there is a catch. Take a prompt like "search for a specific product on various websites and compare their prices". Here the model has to call the search function more than once, collect the product's prices from various e-commerce websites, study them, and then compare them. As far as I know, no AI search agent can do this at the moment.

Remember, this is not agentic work. It is more about improving the function-calling ability of an LLM, so that it can call the search function to study the results and not just to provide an answer.
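
To make the "call search more than once, collect, then compare" flow concrete, here is a toy sketch. The search tool is a hypothetical stub backed by an in-memory table; the shop names and prices are made up for illustration and no real search API is called. In a real function-calling setup the model would decide which calls to make, whereas here the loop is fixed.

```python
def search_prices(product, shop):
    """Hypothetical search tool: the price of `product` at `shop`, or None if not found."""
    catalogue = {
        ("usb cable", "shop-a.example"): 4.99,
        ("usb cable", "shop-b.example"): 3.49,
    }
    return catalogue.get((product, shop))


def compare_prices(product, shops, search=search_prices):
    """Call the search tool once per shop, collect the results, then compare them."""
    found = {}
    for shop in shops:
        price = search(product, shop)
        if price is not None:
            found[shop] = price
    cheapest = min(found, key=found.get) if found else None
    return found, cheapest


found, cheapest = compare_prices(
    "usb cable", ["shop-a.example", "shop-b.example", "shop-c.example"]
)
```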

@victor @nsarrazin search is not working; it says "out of credits". Can you please help us out?

Hugging Chat org

It's back up, sorry about that @acharyaaditya26

Any chances for a search API that isn't Google's? IE: DuckDuckGo or Bing :)

@DarwinAnim8or there have been updates since the last comment here. You can see in this PR how you.com is being added as a search engine. The same could be done for DuckDuckGo or Bing :)

Is there a way to do this in https://huggingface.co/chat/ ? I would prefer to use SearXNG or DuckDuckGo as the search engine instead of Google.

Recently web search started activating even when the default option is on. Is that a bug?

Hugging Chat org

Recently web search started activating even when the default option is on. Is that a bug?

Maybe you are using Tools models, where the model itself chooses whether to search the web or not?

I'm using assistants (models are either Cohere or Llama) with the default option on in the assistant's settings ("Assistant will not use internet to do information retrieval and will respond faster. Recommended for most Assistants."), but sometimes the bot starts responding more slowly and uses web search. Do tools have something to do with that?

nsarrazin unpinned discussion
