Scaling AI-based Data Processing with Hugging Face + Dask
outlines library with transformers to define a JSON schema that the generation has to follow. It uses a finite state machine with token_id values as transitions.
Generate synthetic dataset files (JSON Lines)
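To illustrate the finite state machine idea mentioned above, here is a minimal sketch in plain Python. It does not use the outlines or transformers APIs: the vocabulary, the state table, and the names VOCAB, FSM, fake_logits, and constrained_generate are all hypothetical, and a random scorer stands in for the language model. At each step, only the token_id values that are valid transitions from the current state are eligible, so the output always matches the target structure.

```python
import random

# Toy vocabulary: token_id -> token string (hypothetical, for illustration only).
VOCAB = {0: "{", 1: '"name"', 2: ":", 3: '"Ada"', 4: '"Bob"', 5: "}", 6: "hello"}

# Finite state machine: state -> {allowed token_id: next state}.
# It accepts exactly the strings {"name":"Ada"} and {"name":"Bob"}.
FSM = {
    0: {0: 1},        # expect "{"
    1: {1: 2},        # expect the key "name"
    2: {2: 3},        # expect ":"
    3: {3: 4, 4: 4},  # expect one of the two allowed values
    4: {5: 5},        # expect "}"
}
FINAL_STATE = 5


def fake_logits(rng):
    """Stand-in for a language model: one random score per token_id."""
    return {token_id: rng.random() for token_id in VOCAB}


def constrained_generate(seed=0):
    """Greedy decoding restricted to the FSM's valid transitions."""
    rng = random.Random(seed)
    state, output = 0, []
    while state != FINAL_STATE:
        allowed = FSM[state]
        logits = fake_logits(rng)
        # Mask step: ignore every token_id that is not a transition from this state.
        token_id = max(allowed, key=lambda t: logits[t])
        output.append(VOCAB[token_id])
        state = allowed[token_id]
    return "".join(output)


if __name__ == "__main__":
    print(constrained_generate(seed=0))
```

The token "hello" (token_id 6) never appears in the output because no state lists it as a transition, no matter how high the model scores it. A real library would presumably derive the state table from the JSON schema and the tokenizer's vocabulary instead of writing it by hand.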
Visualize HDF5 data of the dataset