Websites slam doors on AI data harvesting 🚪🔒
New study "Consent in Crisis: The Rapid Decline of the AI Data Commons" reveals a rapid decline in open web access.
Key findings from an audit of 14,000 web domains:
- 5%+ of the data in three common data sets (C4, RefinedWeb and Dolma) is now fully restricted, along with 25%+ of the highest-quality sources
- 45% of C4 restricted by Terms of Service
Noteworthy trends:
🚫🔄 OpenAI's crawlers are banned 2x more often than any other company's
📰🔐 News sites leading restrictions: 45% of tokens off-limits
Two quotes in the NYT piece to ponder:
“Unsurprisingly, we’re seeing blowback from data creators after the text, images and videos they’ve shared online are used to develop commercial systems that sometimes directly threaten their livelihoods.” — @yjernite
“Major tech companies already have all of the data. Changing the license on the data doesn’t retroactively revoke that permission, and the primary impact is on later-arriving actors, who are typically either smaller start-ups or researchers.” — @stellaathena
👉 Dive into the research: https://www.dataprovenance.org/consent-in-crisis-paper
👉 Read the NYT story: https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html
#AIEthics #DataPrivacy