Daft 0.0.22 Release Notes

The Daft 0.0.22 release adds substantially more test coverage and a large number of bug fixes. The highlights are:

  • Dynamic Runner speedups - our dynamic runners now outperform the existing runners and will be the default in the next release.

  • Property-based tests with Hypothesis for .sort

  • Refactors to enable integration of Rust Series/vPartition/Expressions

Enhancements

  • Always translate LocalLimit logical plan to its own physical plan. #539

  • Performance improvements for distributed Ray clusters. #537

  • Add resource request to dynamic runners #530

  • Refactor resource requests out of Expressions #528

  • Introduce Schema and Fields for logical plan schema rather than use ExpressionList #516

  • Refactor Expressions to drop column ids #508

  • DynamicRayRunner: increase inflight tasks; fix spread condition #494

  • Add DataFrame.from_glob_path, deprecating DataFrame.from_files #492

  • [Rust] Table and Series implementations that support arrow interops and expr evaluation #443

Bug Fixes

  • Add filter step to hypothesis test and fix filter bug #551

  • Skip Ray resource request tests if the Ray version is less than 2 #550

  • Fix Expression input_mapping query optimization bug #536

  • Remove walrus operators for Py 3.7 compatibility #489

  • [bugfix] Fix bisect left behavior when search sorting in reverse for utf8 and numeric arrays #547

  • [bugfix] Fix Behavior in Search Sorted (Partitioning) for Nulls and NaNs #545

  • Fix to_pydict to return python lists instead of arrow arrays #527

  • Fix self join where column name can conflict #521

  • Fix dynamic runners sorting #512

  • Fixes #299 to work without relying on expression IDs #510

  • Fix anonymous s3 file access in .url.download() #505

  • Add custom handling of s3 filesystem creation when no credentials are found #504

  • Fix Literal typing import for 3.7 compatibility #498

  • Dynamic runners: Assert against negative limit() #463

  • Better user messages on import errors of optional dependencies #462

  • Add handling of missing filepaths with FileNotFoundError #460

  • Add fix for duplicate URL downloads only filling out bytes for one row #361
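
Two of the fixes above (#547 and #545) concern search-sorted behavior on reverse-sorted data and on nulls. The following is a minimal pure-Python sketch of the idea, not Daft's implementation. `search_sorted_left` and `_less` are hypothetical helpers, and treating `None` as greater than every non-null value is an assumption borrowed from the test note for #549. NaN handling is left out.

```python
from typing import List, Optional


def _less(a: Optional[float], b: Optional[float]) -> bool:
    """Return True if a sorts strictly before b in ascending order.

    Assumption: None (null) compares greater than every non-null value.
    """
    if a is None:
        return False  # null is never less than anything
    if b is None:
        return True   # any non-null value is less than null
    return a < b


def search_sorted_left(
    values: List[Optional[float]], key: Optional[float], descending: bool = False
) -> int:
    """Leftmost index at which key can be inserted while keeping values ordered."""
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        # Ascending: move right while values[mid] < key.
        # Descending: move right while values[mid] > key, i.e. key < values[mid].
        if descending:
            goes_before_key = _less(key, values[mid])
        else:
            goes_before_key = _less(values[mid], key)
        if goes_before_key:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

With this sketch, `search_sorted_left([1, 3, 3, 5, None], 3)` and `search_sorted_left([None, 5, 3, 3, 1], 3, descending=True)` both land on the first of the two equal elements, which is the bisect-left position in each ordering.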

Testing

  • Better organize tests, separating optimizer tests from dataframe tests #552

  • Fix hypothesis test to assert that nulls are greater than all values #549

  • [codecov] Enable Code Coverage for Rust From python tests #548

  • Add sort tests #544

  • Skip Ray runner tests in property based testing #543

  • Tests for dataframe repr and html repr #541

  • [Coverage] Update CodeCov Threshold to 1 percent #540

  • [Coverage] Batch upload of coverage files to CodeCov, ignore non-daft python files, update comment config for CodeCov #526

  • Add simple unit tests for Aggs #525

  • Fix hypothesis sort test #524

  • Add Rust to code coverage #518

  • Codecov Python Code Coverage #517

  • Property based testing #515

  • Add pytest benchmarking for benchmark suites with a simple agg test #455
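
Property-based tests, such as those added for sorting in this release, check invariants over many generated inputs instead of a handful of fixed cases. The sketch below hand-rolls that idea using only the standard library so that it stays self-contained; Hypothesis would additionally generate and shrink inputs automatically. `sort_nulls_last` is a hypothetical stand-in for the sort under test, not a Daft API, and the nulls-last ordering is assumed from the note for #549.

```python
import random
from collections import Counter


def sort_nulls_last(values):
    """Hypothetical stand-in for the sort under test: ascending, None last."""
    return sorted(values, key=lambda v: (v is None, 0 if v is None else v))


def check_sort_properties(values):
    """Return True when the sorted output satisfies both invariants."""
    result = sort_nulls_last(values)
    # Property 1: the output is a permutation of the input.
    if Counter(result) != Counter(values):
        return False
    # Property 2: non-null values come first in non-decreasing order,
    # and every remaining element is a null.
    non_null = [v for v in result if v is not None]
    n = len(non_null)
    return result[:n] == sorted(non_null) and all(v is None for v in result[n:])


def run_random_cases(num_cases=200, seed=0):
    """Generate random lists of small integers and None, checking each one."""
    rng = random.Random(seed)
    for _ in range(num_cases):
        size = rng.randint(0, 20)
        values = [rng.choice([None, rng.randint(-5, 5)]) for _ in range(size)]
        if not check_sort_properties(values):
            return False
    return True
```

Checking properties rather than exact outputs means the same test covers empty inputs, duplicates, and all-null inputs without listing them by hand.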

Build Changes

  • Fix Ray grpcio issues in CI #555

  • Fix cloud ray tutorial #542

  • Run tutorial notebooks in CI #532

  • Pin polars to versions <= 0.15.18 due to issue #6584 #523

  • Temporary workaround for issue #501 #502

  • Downgrade Python version in CI #500

  • Fix CI issue with pre-commit toml sorting #490

  • Disable telemetry in CI jobs #471

  • Daft Publishing: don't increment patch version, to avoid clobbering a newer version of Daft #456

Documentation

  • Fix readme links #535

  • Refactor tutorials #509

  • Add basic CONTRIBUTING information #497

  • Install nightly Daft in quick start notebook #495

  • Documentation user guide refactor #461

  • v0.0.21 release notes - addendum #458

Closed Issues

  • Unable to access public S3 buckets without credentials #503

  • Run CI in Python 3.7 to ensure compatibility #499

  • Fix walrus usage for 3.7 compatibility #488

  • More informative ImportError on failed imports of optional dependencies #459

  • MNIST tutorial broken #534

  • Dynamic Runners should respect ResourceRequests #529

  • Dataframe tests for dataframe API #522

  • DataFrame.to_pydict() should produce Python objects, not Arrow objects #467

  • IndexError instead of FileNotFoundError when attempting to read from an invalid path #457

  • Check execution-time error presentation in new Runners #440

  • Run file listing for DataFrame.from_files inside the Runner #426

  • Expanded datetime/interval expression support #373

  • Fix caching semantics #366

  • More String Expressions #333
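
For context on the walrus-operator issue (#488): assignment expressions (`:=`) were introduced in Python 3.8, so code that must also run on Python 3.7 has to perform the assignment as a separate statement. A small illustration follows; `first_number` is a hypothetical function and is unrelated to Daft's actual code.

```python
import re

# Python 3.8+ only (a syntax error on 3.7):
#     if (match := re.search(r"\d+", text)):
#         return int(match.group())


def first_number(text):
    """Python 3.7 compatible: assign first, then test the result."""
    match = re.search(r"\d+", text)
    if match:
        return int(match.group())
    return None
```

The two forms behave identically; the rewrite only costs one extra line per use.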