Daft 0.0.13 Release Notes
The Daft 0.0.13 release fixes some issues with typing and adds new functionality for loading from files on disk. The highlights are:

- Improved unified API + user documentation published on www.getdaft.io
- Adds support for multi-column DataFrame.sort
- Adds DataFrame.explode, which explodes a Python column of iterable objects into multiple rows
- Adds DataFrame.from_files, which loads a DataFrame of filepaths and file metadata
New Features
Polars UDFs
@polars_udf was added, which works like @udf but provides function inputs as Polars Series instead of NumPy arrays. A Polars Series is a more efficient target for casting our underlying Arrow data representation, and it handles NaN vs Null semantics correctly.
See: #204
DataFrame Explodes
DataFrame.explode explodes a Python column of iterable objects into multiple rows.
See: #225
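The explode semantics can be sketched in plain Python (illustrative column names, not Daft's implementation): each element of a row's iterable cell becomes its own row, with the other columns repeated.

```python
rows = [
    {"id": 1, "letters": ["a", "b"]},
    {"id": 2, "letters": ["c"]},
]

# Splat each iterable cell into one row per element,
# duplicating the remaining columns.
exploded = [
    {**row, "letters": item}
    for row in rows
    for item in row["letters"]
]
# -> [{"id": 1, "letters": "a"}, {"id": 1, "letters": "b"},
#     {"id": 2, "letters": "c"}]
```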
DataFrame creation from files
DataFrame.from_files loads a DataFrame of filepaths and file metadata.
See: #220
Multi-column Sorts
DataFrame.sort can now run on multiple columns.
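Multi-column sort semantics, sketched with plain Python rather than Daft's engine: rows are ordered by the first column, with ties broken by the second (column names are illustrative).

```python
rows = [
    {"group": "b", "value": 2},
    {"group": "a", "value": 3},
    {"group": "a", "value": 1},
]

# Sort by "group" first; rows with equal "group" fall back to "value".
by_group_then_value = sorted(rows, key=lambda r: (r["group"], r["value"]))
# -> the two "a" rows first (value 1 then 3), then the "b" row
```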
Enhancements
Bug Fixes
Arrow Negative Slice Bug Fix #229
Fix ExpressionExecutor eval’s dispatching of OperatorEvaluator #227
Fix bug in search sorted when table is empty and has no chunks #224
Fix random spaces appearing in long strings in tables #210
Allow RayRunner to proceed when Ray context has already been initialized #203
Refactor UDFs to create properly typed Blocks #232
Build Changes
Closed Issues
Downgrade pyarrow for compatibility with Ray Data #221
Read files from storage with DataFrame.from_files #214
DataFrame.explode for splatting sequences of data into rows #208
Use Polars as the user-interface for UDFs #200
Sphinx Documentation on GitHub Pages #186
Selection and configuration of backend (PyRunner vs RayRunner) #178