Daft 0.0.13 Release Notes

The Daft 0.0.13 release fixes some issues with typing and adds new functionality for loading from files on disk. The highlights are:

Improved unified API, with user documentation published on www.getdaft.io
Adds support for multi-column DataFrame.sort
Adds DataFrame.explode which explodes a Python column of iterable objects into multiple rows
Adds DataFrame.from_files which loads a DataFrame of filepaths and file metadata

New Features#

Polars UDFs#

Adds @polars_udf, which works similarly to @udf but provides function inputs as Polars Series instead of NumPy arrays. A Polars Series is a more efficient target for casting our underlying Arrow data representation, and it handles NaN vs. Null semantics correctly.

See: #204
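
The NaN vs. Null distinction can be illustrated without either library. The following is a minimal plain-Python sketch, not Daft or Polars code: it models an Arrow-style column as a list of values plus a validity mask (the names `values`, `validity`, and the helper functions are hypothetical) and shows how flattening into a float-only array, which has no slot for "missing", merges the two cases.

```python
import math

# Arrow-style column: a values list plus a validity mask.
# A False entry in the mask marks a null (missing) slot.
values = [1.0, float("nan"), 0.0, 4.0]
validity = [True, True, False, True]


def collapse_to_floats(values, validity):
    """Flatten into a float-only list, encoding each null as NaN."""
    return [v if ok else float("nan") for v, ok in zip(values, validity)]


def count_nulls(validity):
    """Count slots that the mask marks as missing."""
    return sum(1 for ok in validity if not ok)


def count_nans(values, validity):
    """Count valid (non-null) slots whose value is NaN."""
    return sum(1 for v, ok in zip(values, validity) if ok and math.isnan(v))


flattened = collapse_to_floats(values, validity)

# With the mask, nulls and NaNs are counted separately.
print(count_nulls(validity), count_nans(values, validity))
# After flattening, both show up as NaN and cannot be told apart.
print(sum(math.isnan(v) for v in flattened))
```

A representation that keeps the mask alongside the values can answer "how many values are missing?" and "how many computations produced NaN?" independently, which is the property the paragraph above attributes to Polars Series.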

DataFrame Explodes#

DataFrame.explode explodes a Python column of iterable objects into multiple rows.

See: #225
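
To make the explode semantics concrete, here is a small plain-Python sketch that operates on a list of dicts rather than on a Daft DataFrame. The function name `explode_rows` is hypothetical, and the handling of empty iterables (the row is dropped) is an assumption of this sketch, not documented Daft behavior.

```python
def explode_rows(rows, column):
    """Yield one output row per element of each row's iterable in `column`.

    All other columns are copied unchanged. A row whose iterable is
    empty produces no output rows (an assumption of this sketch).
    """
    for row in rows:
        for item in row[column]:
            new_row = dict(row)     # shallow copy of the other columns
            new_row[column] = item  # replace the iterable with one element
            yield new_row


rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ("c",)},
    {"id": 3, "tags": []},
]

exploded = list(explode_rows(rows, "tags"))
for row in exploded:
    print(row)
```

Each input row is repeated once per element, so the output has as many rows as there are elements across the whole column.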

DataFrame creation from files#

DataFrame.from_files loads a DataFrame of filepaths and file metadata.

See: #220
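
As a rough illustration of what "a DataFrame of filepaths and file metadata" might contain, the sketch below builds one record per file using only the Python standard library. It does not call Daft; the function name `list_files` and the record keys `path`, `name`, and `size` are hypothetical and may not match the columns that DataFrame.from_files produces.

```python
import tempfile
from pathlib import Path


def list_files(root):
    """Return one record per regular file under `root`, sorted by path."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            records.append(
                {
                    # Path relative to root, with forward slashes on every OS.
                    "path": path.relative_to(root).as_posix(),
                    "name": path.name,
                    "size": path.stat().st_size,  # size in bytes
                }
            )
    return records


# Usage: build a small throwaway directory tree and list it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("hello")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.bin").write_bytes(b"\x00" * 3)
    records = list_files(tmp)

for record in records:
    print(record)
```

Keeping the listing as rows of metadata means later steps can filter or sort on file attributes before any file contents are read.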

Multi-column Sorts#

DataFrame.sort can now run on multiple columns.

See: #212
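
The ordering produced by a multi-column sort can be sketched in plain Python: rows are compared on the first column, and ties are broken by each following column in turn. This is an illustration on a list of dicts, not a call into Daft; the helper `sort_rows` and the sample data are hypothetical.

```python
def sort_rows(rows, columns):
    """Sort rows by several columns; earlier columns take priority."""
    # Tuples compare element by element, so a later column only
    # matters when all earlier columns are equal.
    return sorted(rows, key=lambda row: tuple(row[c] for c in columns))


rows = [
    {"city": "Oslo", "year": 2021, "sales": 5},
    {"city": "Bergen", "year": 2022, "sales": 7},
    {"city": "Oslo", "year": 2020, "sales": 3},
    {"city": "Bergen", "year": 2021, "sales": 9},
]

by_city_year = sort_rows(rows, ["city", "year"])
for row in by_city_year:
    print(row["city"], row["year"], row["sales"])
```

Reversing the column list to `["year", "city"]` would group rows by year first and use the city only to order rows within the same year.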

Enhancements#

Refactor ExpressionTypes in daft.execution.operators to daft.types module #231
Allow .with_column to override an existing column name #226
Fixes DataFrame.write_* to be blocking calls #215
Refactor HTML repr code for prettier colab display #205

Bug Fixes#

Arrow Negative Slice Bug Fix #229
Fix ExpressionExecutor eval’s dispatching of OperatorEvaluator #227
Fix bug in search sorted when table is empty and has no chunks #224
Fix random spaces appearing in long strings in tables #210
Allow RayRunner to proceed when Ray context has already been initialized #203
Refactor UDFs to create properly typed Blocks #232

Build Changes#

Downgrade minimum Arrow version to 6.0 #222
MacOS build bug when multiprocessing method is set to spawn #207

Closed Issues#

Downgrade pyarrow for compatibility with Ray Data #221
Read files from storage with DataFrame.from_files #214
DataFrame.explode for splatting sequences of data into rows #208
Use Polars as the user-interface for UDFs #200
Sphinx Documentation on GitHub Pages #186
Selection and configuration of backend (PyRunner vs RayRunner) #178