Daft 0.0.13 Release Notes

The Daft 0.0.13 release fixes some issues with typing and adds new functionality for loading from files on disk. The highlights are:

  • Improved unified API, plus user documentation published on www.getdaft.io

  • Adds support for multi-column DataFrame.sort

  • Adds DataFrame.explode which explodes a Python column of iterable objects into multiple rows

  • Adds DataFrame.from_files which loads a DataFrame of filepaths and file metadata

New Features

Polars UDFs

Adds @polars_udf, which works similarly to @udf but provides function inputs as Polars Series instead of NumPy arrays. Casting our underlying Arrow data representation to a Polars Series is more efficient, and Polars handles NaN vs. Null semantics correctly.
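To make the NaN vs. Null distinction concrete, here is a minimal sketch in plain Python. It does not use Daft, Polars, or NumPy; the helper names are hypothetical. It only illustrates the general point that a float-only array has no separate marker for a missing value, so a null ends up indistinguishable from NaN.

```python
import math

# Hypothetical column: one valid float, one missing value (null), one NaN.
column = [1.5, None, float("nan")]


def count_nulls(values):
    """Count entries that are missing (None), as opposed to NaN."""
    return sum(1 for v in values if v is None)


def count_nans(values):
    """Count entries that are present but not-a-number."""
    return sum(1 for v in values if v is not None and math.isnan(v))


def to_float_array_style(values):
    """Mimic a float-only array, where a missing value can only be stored as NaN."""
    return [float("nan") if v is None else v for v in values]


flattened = to_float_array_style(column)

print(count_nulls(column), count_nans(column))        # 1 1
print(count_nulls(flattened), count_nans(flattened))  # 0 2
```

After the conversion, the original null can no longer be told apart from the genuine NaN, which is the kind of ambiguity a null-aware Series type avoids.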

See: #204

DataFrame Explodes

DataFrame.explode explodes a Python column of iterable objects into multiple rows.
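The row-level effect of an explode can be sketched in plain Python as follows. This is not Daft's implementation or API: the `explode` helper and the list-of-dicts row layout are assumptions made for illustration, and how Daft treats empty iterables is not stated here (this sketch simply emits no rows for them).

```python
def explode(rows, column):
    """Return new rows with one output row per element of each row's `column`."""
    exploded = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)        # copy the other columns unchanged
            new_row[column] = element  # replace the iterable with one element
            exploded.append(new_row)
    return exploded


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
]
result = explode(rows, "tags")
print(result)
# [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}, {'id': 2, 'tags': 'c'}]
```

Values in the other columns are repeated once for each element that came from the same input row.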

See: #225

DataFrame creation from files

DataFrame.from_files loads a DataFrame of filepaths and file metadata.
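As a rough picture of what "filepaths and file metadata" could look like, the sketch below builds one record per matching file using only the Python standard library. The `list_files` helper and the `path`, `name`, and `size` field names are hypothetical; the actual columns produced by DataFrame.from_files are not listed in these notes.

```python
import glob
import os
import tempfile


def list_files(pattern):
    """Return one record per file matching `pattern`, with basic metadata."""
    records = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match the pattern
        records.append({
            "path": path,                    # hypothetical field name
            "name": os.path.basename(path),  # hypothetical field name
            "size": os.path.getsize(path),   # size in bytes; hypothetical field name
        })
    return records


# Usage: create two small files in a temporary directory and list them.
with tempfile.TemporaryDirectory() as tmp:
    for name, payload in [("a.txt", b"hello"), ("b.txt", b"hi")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(payload)
    records = list_files(os.path.join(tmp, "*.txt"))

print([(r["name"], r["size"]) for r in records])
# [('a.txt', 5), ('b.txt', 2)]
```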

See: #220

Multi-column Sorts

DataFrame.sort can now run on multiple columns.
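The ordering a multi-column sort produces can be sketched in plain Python. This is an illustration of the semantics only, not Daft's API: the `sort_rows` helper and the per-column descending flag are assumptions, and the sketch relies on Python's sort being stable.

```python
rows = [
    {"team": "b", "score": 2},
    {"team": "a", "score": 1},
    {"team": "a", "score": 3},
    {"team": "b", "score": 1},
]


def sort_rows(rows, keys):
    """Sort by several (column, descending) pairs; earlier pairs take priority."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-column ordering.
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


ordered = sort_rows(rows, [("team", False), ("score", True)])
print([(r["team"], r["score"]) for r in ordered])
# [('a', 3), ('a', 1), ('b', 2), ('b', 1)]
```

Rows are grouped by the first key, and the second key only breaks ties within each group.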

See: Multi-column DataFrame sorting #212

Enhancements

  • Refactor ExpressionTypes in daft.execution.operators to daft.types module #231

  • Allow .with_column to override an existing column name #226

  • Fixes DataFrame.write_* to be blocking calls #215

  • Refactor HTML repr code for prettier colab display #205

Bug Fixes

  • Arrow Negative Slice Bug Fix #229

  • Fix ExpressionExecutor eval’s dispatching of OperatorEvaluator #227

  • Fix bug in search sorted when table is empty and has no chunks #224

  • Fix random spaces appearing in long strings in tables #210

  • Allow RayRunner to proceed when Ray context has already been initialized #203

  • Refactor UDFs to create properly typed Blocks #232

Build Changes

  • Downgrade minimum Arrow version to 6.0 #222

  • MacOS build bug when multiprocessing method is set to spawn #207

Closed Issues

  • Downgrade pyarrow for compatibility with Ray Data #221

  • Read files from storage with DataFrame.from_files #214

  • DataFrame.explode for splatting sequences of data into rows #208

  • Use Polars as the user-interface for UDFs #200

  • Sphinx Documentation on GitHub Pages #186

  • Selection and configuration of backend (PyRunner vs RayRunner) #178