[NEW] WebSearch 2.0 - feedback welcome!

#254

by victor HF staff - opened Sep 13, 2023

victor

Hugging Chat org Sep 13, 2023

edited Mar 15

March 15th update: 🌐Internet Access for Assistants

Hi HuggingChat community!

We've just released a big update to the WebSearch feature: it now uses retrieval-augmented generation (RAG) to extract relevant information from multiple web pages! In our tests, it's much more powerful than before 🚀.

We would love to get your feedback on it! Also, if you want to check the details or even contribute, take a look at the PR on GitHub.
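For readers unfamiliar with RAG, here is a minimal sketch of the retrieval step: split each fetched page into passages, score every passage against the query, and put only the best ones into the prompt. This is not the chat-ui implementation. The function names, the fixed-size chunking, and the bag-of-words cosine score (a stand-in for a real embedding model) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 40) -> list[str]:
    """Split a page into passages of at most `size` words (assumed chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query: str, pages: dict[str, str], k: int = 3) -> list[tuple[str, str]]:
    """Return up to k (url, passage) pairs most similar to the query."""
    q = Counter(tokenize(query))
    scored = []
    for url, text in pages.items():
        for passage in chunk(text):
            scored.append((cosine(q, Counter(tokenize(passage))), url, passage))
    scored.sort(key=lambda item: item[0], reverse=True)
    # Drop passages that share no terms with the query at all.
    return [(url, passage) for score, url, passage in scored[:k] if score > 0]


def build_prompt(query: str, pages: dict[str, str], k: int = 3) -> str:
    """Assemble a prompt that contains only the retrieved passages."""
    context = "\n".join(
        f"[{url}] {passage}" for url, passage in retrieve(query, pages, k)
    )
    return f"Answer using the context below.\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt(question, pages)` with `pages` mapping each URL to its extracted text yields a prompt whose context lines are tagged with their source URL.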

See you soon!

victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [DRAFT][NEW] - Updated WebSearch - feedback welcome! Sep 13, 2023

victor changed discussion title from [DRAFT][NEW] - Updated WebSearch - feedback welcome! to [NEW] - Updated WebSearch - feedback welcome! Sep 13, 2023

victor pinned discussion Sep 13, 2023

victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [NEW] WebSearch 2.0 - feedback welcome! Sep 13, 2023

BramVanroy

Sep 13, 2023

I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like Selenium, Google won't like it (and they'll miss ad revenue). So, in short, what does the technical pipeline look like for this, from user query to generated output?

nsarrazin

Hugging Chat org Sep 13, 2023

I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like Selenium, Google won't like it (and they'll miss ad revenue). So, in short, what does the technical pipeline look like for this, from user query to generated output?

You can check out the feature here!

MoritzLaurer

Sep 13, 2023

@BramVanroy, they use SerpAPI, which, as far as I understand, is paid and legal. See the source code here: https://github.com/huggingface/chat-ui/blob/main/src/lib/server/websearch/searchWeb.ts

MoritzLaurer

Sep 13, 2023

edited Sep 13, 2023

Question regarding sources/citations: Do I understand correctly that you currently display as sources all URLs that the retriever retrieved and gave the LLM as context (regardless of whether the LLM actually used or referred to the source)? In Bing chat, they somehow managed to attribute sources to specific parts/sentences of the generated output and not only to the generated output as a whole. I've always wondered how they made that work (direct citations / attributing sources to specific sub-parts of a generated output). Does anyone know?

mishig

Hugging Chat org Sep 14, 2023

edited Sep 14, 2023

Question regarding sources/citations: Do I understand correctly that you currently display all URLs as sources, which the retriever retrieved and gave the LLM as context

Yes

In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

Interesting. I guess one simple way would be: for every generated sentence, calculate its similarity against the sources and pick the highest-scoring source as the source of that sentence. Not sure if that's what they are doing.
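A minimal sketch of that idea, assuming nothing about how Bing actually does it: split the answer into sentences and assign each one the source with the highest similarity. The Jaccard token overlap used here is a toy stand-in for a real similarity model, and the function names and naive sentence splitter are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap: |a ∩ b| / |a ∪ b| (toy similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def attribute(answer: str, sources: dict[str, str]) -> list[tuple[str, str]]:
    """Pair every sentence of the answer with its highest-scoring source URL."""
    result = []
    for sentence in split_sentences(answer):
        sent_tokens = tokens(sentence)
        best_url = max(
            sources, key=lambda url: jaccard(sent_tokens, tokens(sources[url]))
        )
        result.append((sentence, best_url))
    return result
```

Note that this always picks some source for every sentence, even when the best match is weak.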

DarwinAnim8or

Sep 14, 2023

Any chance of a search API that isn't Google's? E.g. DuckDuckGo or Bing :)
Awesome work though! Really cool :D

MoritzLaurer

Sep 14, 2023

In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

Interesting. I guess one simple way would be: for every generated sentence, calculate its similarity against the sources and pick the highest-scoring source as the source of that sentence. Not sure if that's what they are doing.

@mishig, yeah, true. My first intuition was that I would hesitate to trust embeddings from a bi-encoder to be reliable enough for this. But if you took a cross-encoder, that should actually work quite well, especially if you set a high enough threshold to avoid false positives (then it could even work with a bi-encoder sentence transformer). Direct citations could be a nice feature to have :)
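The thresholding idea could look roughly like this: a sentence gets a citation only if its best source clears a cutoff, and otherwise gets none. The `score` argument is where a cross-encoder (or bi-encoder similarity) would be plugged in. `toy_score`, the `cite` function, and the 0.8 default are assumptions for illustration, not tuned or recommended values.

```python
from typing import Callable, Optional


def cite(
    sentence: str,
    sources: dict[str, str],
    score: Callable[[str, str], float],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the best-matching source URL, or None if no score reaches the threshold."""
    if not sources:
        return None
    scores = {url: score(sentence, text) for url, text in sources.items()}
    best_url = max(scores, key=scores.get)
    return best_url if scores[best_url] >= threshold else None


def toy_score(sentence: str, source: str) -> float:
    """Placeholder scorer: fraction of the sentence's words found in the source.

    A real system would replace this with a model-based relevance score.
    """
    words = sentence.lower().split()
    source_words = set(source.lower().split())
    return sum(w in source_words for w in words) / len(words) if words else 0.0
```

Raising `threshold` trades missed citations for fewer false positives, which is the trade-off described above.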

132 hidden messages

jdoexbox10

Oct 5

But how do they send a web search request, the way ClosedAI prompts ChatGPT to issue one? Is there a specific instruction the AI uses to search the web, or even to call tools for image generation etc. (e.g. default and third-party ones)?