Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing • Jul 19 • 17
The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 24
Common Corpus Collection The largest public domain dataset for training LLMs. • 27 items • Updated Jul 17 • 111