Daft 0.0.21 Release Notes

The Daft 0.0.21 release includes many bug fixes and introduces the DynamicRayRunner from @xcharleslin.

New Features

Dynamic Ray Runner

This release continues @xcharleslin's work on a dynamic runner for the Ray backend. The scheduler is under active development and will be the default scheduler for Ray in future Daft releases.

See: #412

Telemetry

Telemetry was added to help guide Daft development. To disable it, set the environment variable DAFT_ANALYTICS_ENABLED=0.

See: #413 and the Telemetry docs for more details.
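
As a minimal sketch, the variable can also be set from Python before Daft is imported. This assumes Daft reads DAFT_ANALYTICS_ENABLED at import time, which these notes do not confirm; setting it in the shell before starting Python is the safer option.

```python
import os

# Assumption: the variable must be set before `import daft` for it to take effect.
os.environ["DAFT_ANALYTICS_ENABLED"] = "0"

# import daft  # uncomment in an environment where Daft is installed
```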

Enhancements

  • Rust Series Basic math #437

  • Simplify Analytics Client to single-threaded implementation #436

  • Dynamic scheduler mega cleanup: Refactors, renames, docstrings, comments #434

  • Fix DynamicRayRunner performance issues. Warmup more works in benchmarking script. #430

  • Move Instructions to dataclasses. #428

  • Add analytics module for telemetry #413

  • DynamicRayRunner optimizations: metadata fetch on demand; ray.wait for task management #418

  • Add profiling to benchmarks/tpch. Improve profiler to prevent VizTracer nesting. #423

  • Base Rust Skeleton #421

Bug Fixes

  • Fix alias evaluation #448

  • Fix self-referencing if-else test #446

  • Add support and tests for columns created only from literals #445

  • Sorts on non-column expression #299

Build Changes

  • Daft nightly publishing prepend 0s to dev version to fix pip lexical ordering #453

  • run notebooks check in separate working dir #452

  • Check doc notebooks daily at noon #451

  • add http to fsspec package #450

  • Remove Mamba from publishing CI #449

  • Cookbook test cleanups #438

  • Change envvar DAFT_PKG_BUILD_TYPE to RUST_DAFT_PKG_BUILD_TYPE #432

  • Add DAFT_PKG_BUILD_TYPE to python-publish CI job #424
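
The zero-padding change in #453 concerns string-wise ordering of dev versions. The sketch below illustrates the general idea only; the version strings and padding width are hypothetical and are not taken from Daft's build scripts.

```python
# Hypothetical dev version strings; the real nightly format is not shown in these notes.
dev_numbers = (9, 10, 100)
unpadded = [f"0.0.21.dev{n}" for n in dev_numbers]
padded = [f"0.0.21.dev{n:04d}" for n in dev_numbers]

# Plain string sorting puts "dev10" and "dev100" before "dev9".
print(sorted(unpadded))

# With zero-padding, string order matches numeric order.
print(sorted(padded))
```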

Documentation

  • Typos in telemetry docs #429

  • Refreshes our 10 minute tutorial #431

  • API Documentation refactors #427

Closed Issues

  • Remove Sentry #435

  • add profiling flag to benchmark #422

  • DynamicRayRunner optimization: spread strategy for reduce #420

  • Make DynamicRayRunner strictly faster than RayRunner #419

  • Benchmark DynamicSchedule. #406

  • RayRunner for DynamicSchedule #405

  • PartitionCache for DynamicSchedule #404

  • Refactor DynamicSchedule to be generators. #403

  • Alias doesn’t work on main branch #447

  • Fix creation of literal columns #444

  • Fix aggregations on a self-referencing if_else #441

  • .sort() breaks on non-column Expression #439

  • Repr broken for boolean columns #425