Daft 0.0.20 Release Notes

The Daft 0.0.20 release adds many exciting new features and refactors. The highlights are:

  • Dynamic Scheduling

  • Rust Builds

New Features

Dynamic Scheduling

Great work by @xcharleslin to introduce a new dynamic scheduler for running Daft DataFrame queries. This scheduler lets Daft perform much more intelligent pull-based scheduling, which dramatically speeds up operations such as .limit(5).show(): these now only need to materialize the first partition, even when the query also contains other operations such as .where().

  • Add tests for #368 #375

  • Implement dynamic scheduling. #387

  • Genericize PartitionT in DynamicSchedule components #408
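To illustrate why pull-based scheduling helps .limit(5).show(), here is a minimal sketch built from plain Python generators. It is not Daft's scheduler or API: the scan, where and limit functions and the list-of-ints "partitions" are hypothetical stand-ins. The point is that the limit operator pulls partitions only until it has enough rows, so upstream partitions that are never requested are never materialized.

```python
from typing import Callable, Iterator, List

# Hypothetical toy model: a "partition" is just a list of integer rows.
Partition = List[int]


def scan(num_partitions: int, rows_per_partition: int, log: List[int]) -> Iterator[Partition]:
    """Lazily produce partitions, recording each one that is materialized."""
    for i in range(num_partitions):
        log.append(i)  # partition i is only built when a consumer pulls it
        start = i * rows_per_partition
        yield list(range(start, start + rows_per_partition))


def where(parts: Iterator[Partition], pred: Callable[[int], bool]) -> Iterator[Partition]:
    """Filter rows partition by partition, as partitions are pulled."""
    for part in parts:
        yield [row for row in part if pred(row)]


def limit(parts: Iterator[Partition], n: int) -> Iterator[Partition]:
    """Pull partitions only until n rows have been produced, then stop."""
    remaining = n
    if remaining <= 0:
        return
    for part in parts:
        taken = part[:remaining]
        remaining -= len(taken)
        yield taken
        if remaining <= 0:
            return  # stop before pulling (and materializing) another partition


log: List[int] = []
plan = limit(where(scan(4, 10, log), lambda r: r % 2 == 0), 5)
rows = [row for part in plan for row in part]
print(rows)  # [0, 2, 4, 6, 8]
print(log)   # [0]: only the first partition was materialized
```

If the filter removes every row of the first partition, limit simply keeps pulling, so the plan materializes exactly as many partitions as it needs and no more.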

The dynamic scheduler is under active development and will become the default scheduler in the next release. To try it locally now, set the environment variable DAFT_RUNNER=dynamic.

See: #204
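One way to opt in is to launch your query script in a child process whose environment already contains DAFT_RUNNER=dynamic, so the variable is present from process start regardless of when Daft reads it. This is a minimal sketch: the helper names and the script name "my_query.py" are hypothetical, and only the DAFT_RUNNER=dynamic setting comes from these notes.

```python
import os
import subprocess
import sys
from typing import Dict


def dynamic_runner_env(base: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with DAFT_RUNNER set to select the dynamic scheduler."""
    env = dict(base)  # copy so the caller's mapping is left untouched
    env["DAFT_RUNNER"] = "dynamic"
    return env


def run_with_dynamic_scheduler(script: str) -> int:
    """Run `script` with the current Python interpreter and DAFT_RUNNER=dynamic set."""
    env = dynamic_runner_env(dict(os.environ))
    completed = subprocess.run([sys.executable, script], env=env)
    return completed.returncode


# Usage (hypothetical script name):
# run_with_dynamic_scheduler("my_query.py")
```

From a shell, prefixing the command with DAFT_RUNNER=dynamic achieves the same thing.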

Rust Builds

Foundational work by @sammysidhu to move Daft’s internals to Rust. This release ports our existing custom C++ kernels to Rust, which already yields some speedups in local benchmarks. Further moves to Rust are planned for the next release, bringing increased type safety and execution efficiency to Daft users.

  • Move to Rust from C++ for internal compute #385

  • Move rust kernels to module #389

  • Rust Wheel Publishing to PyPI and Anaconda Nightly #388

  • Build Rust library for building docs #391

  • Drop Poetry for doc building #390

  • [Rust wheel publish] Set manylinux version to auto for max compatibility #395

  • Add daft version from rust build #397

  • Fix build for Ray compatibility CI #402

  • Clean up Ray compatibility CI jobs' Ray and Protobuf installations #407

Enhancements

  • Add memory_bytes to ResourceRequest and pass into Ray. #368

  • Move evaluation of UDF expression to ExpressionExecutor #380

  • Refactors to isolate code that uses PyArrow into vPartition #383

  • Move RayPartitionSet import out of DataFrame top level and into ray_dataset_conversion #392

  • Fix retrieval of daft version in benchmarking suite #396

  • File read refactors #400

  • Remove caching from filesystems and benchmarking #411

Bug Fixes

  • Pin numpy < 1.24 to avoid partition bug for pylist #399

Build Changes

  • Add ray[default] to dev dependencies. #369

  • Add psutil as daft dep #376

  • Remove Pillow as an (optional) dependency #379

  • Clean up Polars imports to eventually be optional dependency #386

  • Drop boto from aws extras #394

  • Add CI step to test imports without optional deps #393

Documentation

  • Install nightly version of daft in notebooks #365

  • Demo notebook for top N most red images #363