Daft 0.0.19 Release Notes

The Daft 0.0.19 release includes a number of new features and bug fixes, several of them from new contributors. The highlights are:

  • New list aggregation for aggregating items into a list (#346)

New Features

New List aggregation

Users can now group a DataFrame by one or more columns and then aggregate the values in each group into a Python list.

See: #346
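To make the semantics concrete, here is a minimal plain-Python sketch of what a list aggregation computes. It deliberately does not use Daft's API (see #346 for the actual interface); the helper `group_into_lists` and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def group_into_lists(rows, key, value):
    """Group dict rows by `key` and collect each group's `value` into a list.

    Hypothetical helper that illustrates list-aggregation semantics only;
    it is not part of Daft.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)


# Hypothetical sample data: one row per (team, score) observation.
rows = [
    {"team": "a", "score": 1},
    {"team": "b", "score": 2},
    {"team": "a", "score": 3},
]

result = group_into_lists(rows, key="team", value="score")
print(result)  # {'a': [1, 3], 'b': [2]}
```

Each distinct key yields one output entry whose value is the list of all items seen for that key, in input order.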

Enhancements

  • Add visualizations to DataFrame repr #359

  • Allow subscripting of GroupedDataFrame to access its columns #285

  • Support wider ray version range in requirements #234

  • Rename from_parquet and from_csv to read_*, deprecate the former #218

  • Use a simple disk-based cache for remote file scans #329

  • Fix daft install during cluster warm-up #341

  • Cache files locally during setup phase in benchmarking #330

  • Add pipelined script for generating parquet files in s3 #328

  • Fix broken links in documentation using relative links #327

  • Add new benchmarking fields and remove --output_csv_headers #326

  • Fix Broken Link Checker #323

  • Rename “unstructured” data to “complex” data #321

Bug Fixes

  • Return a friendlier output when calling .show() on an empty DataFrame #307

  • Fix DataFrame.show() display of null integers #241

  • Fix DataFrameDisplay to take in a vPartition instead of pandas dataframe #334

  • Drop use of backspace to render explain correctly in notebook #362

  • Add a drop-projections pass to drop no-op projections #349

  • Add support for merging NullType Arrowblocks with regular ArrowTypes #343

  • Support empty DataFrames, with and without schema info #342

Build Changes

Daft is now tested against a matrix of Ray versions:

  • Pin Daft requirements to Ray >= 1.10.0 #337

  • Add CI nightly job for checking compatibility with a list of Ray versions #336

Added nightly builds!

  • Publish nightly releases #354

Associated PRs:

  • Daft nightly fix – add correct git clone for tag pickup #357

  • Fixes for Anaconda publishing and enable cron publishing #356

  • Workflow for Anaconda upload to daft / daft-nightly org #355

Deprecations

  • from_parquet and from_csv are deprecated in favor of read_parquet and read_csv (#218)
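
As a rough illustration of how a deprecated alias of this kind can behave, the sketch below shows an old name that emits a `DeprecationWarning` and forwards to the new one. This is an assumption about the general pattern, not Daft's actual implementation: the functions here are hypothetical stand-ins that merely share the names from the note above and do not read any files.

```python
import warnings


def read_csv(path):
    """Hypothetical stand-in for the new reader.

    Returns a tagged string instead of reading a file so the sketch stays
    self-contained and runnable.
    """
    return f"read:{path}"


def from_csv(path):
    """Hypothetical deprecated alias: warn, then forward to read_csv."""
    warnings.warn(
        "from_csv is deprecated; use read_csv instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return read_csv(path)


# Old call sites keep working but surface a warning when warnings are enabled.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = from_csv("data.csv")

print(legacy_result)
print(caught[0].category.__name__)
```

With a shim like this, existing code continues to run while callers are nudged to switch to the `read_*` names.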