Spaces: lhoestq/Spark-on-HF-JupyterLab
lhoestq (HF staff) committed on Aug 28

Commit 88ad24a • 1 Parent(s): c103b37

Update data/hf_spark_utils.py

Files changed (1)

data/hf_spark_utils.py

@@ -177,7 +177,7 @@ def write_parquet(df: DataFrame, path: str, **kwargs) -> None:
     df.mapInArrow(
         partial(_preupload, path=path, schema=to_arrow_schema(df.schema), filesystem=filesystem, **kwargs),
         from_arrow_schema(pa.schema({"addition": pa.binary()})),
-    ).coalesce(1).mapInArrow(
+    ).repartition(1).mapInArrow(
         partial(_commit, path=path, filesystem=filesystem),
         from_arrow_schema(pa.schema({"path": pa.string()})),
     ).collect()
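
Note on the change: the commit message gives no motivation, so this is an interpretation based on documented Spark behavior, not the author's stated reasoning. `coalesce(1)` adds no shuffle, so the single resulting task also runs the upstream `mapInArrow(_preupload, ...)` for every input partition. `repartition(1)` inserts a shuffle, so `_preupload` can keep one task per input partition and only `_commit` runs in a single task. The toy model below is plain Python, not the Spark API, and `preupload_task_assignment` is a hypothetical helper used only to sketch that difference in task layout.

```python
from typing import Dict, List


def preupload_task_assignment(num_partitions: int, merge_op: str) -> Dict[int, List[int]]:
    """Toy model: map each task id to the input partitions whose
    _preupload work that task would perform. Not a Spark API."""
    if merge_op == "coalesce":
        # Narrow dependency, no stage boundary: the one downstream task
        # pulls every upstream partition through the preceding map itself.
        return {0: list(range(num_partitions))}
    if merge_op == "repartition":
        # Shuffle, hence a stage boundary: each upstream partition gets its
        # own task before the results are gathered into one partition.
        return {task: [task] for task in range(num_partitions)}
    raise ValueError(f"unknown merge_op: {merge_op!r}")


before = preupload_task_assignment(4, "coalesce")     # layout before this commit
after = preupload_task_assignment(4, "repartition")   # layout after this commit
print(len(before), "task(s) vs", len(after), "task(s) running _preupload")
```

In this model the pre-commit layout funnels all four partitions through one task, while the post-commit layout keeps four independent tasks and serializes only the final `_commit` step.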