daft.DataFrame#

class daft.DataFrame(plan: daft.logical.logical_plan.LogicalPlan)[source]#

A Daft DataFrame is a table of data. It has columns, where each column has a type and the same number of items (rows) as all other columns.

__init__(plan: daft.logical.logical_plan.LogicalPlan) None[source]#

Constructs a DataFrame from a given LogicalPlan. Users should not call this directly; instead, create DataFrames through the classmethods on DataFrame (e.g. read_csv, from_pydict).

Parameters

plan – LogicalPlan describing the steps required to arrive at this DataFrame

Methods

__init__(plan)

Constructs a DataFrame according to a given LogicalPlan

agg(to_agg)

Performs aggregations on this DataFrame

collect([num_preview_rows])

Executes the entire DataFrame and materializes the results

count(*cols)

Performs a global count on the DataFrame

count_rows()

Executes the DataFrame to count the number of rows

distinct()

Computes unique rows, dropping duplicates

exclude(*names)

Drops columns from the current DataFrame by name

explain([show_optimized])

Prints the LogicalPlan that will be executed to produce this DataFrame

explode(*columns)

Explodes a List column, where every element in each row's List becomes its own row, and all other columns in the DataFrame are duplicated across rows
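
The explode semantics can be illustrated in plain Python (Daft not required); the column names here are hypothetical:

```python
# Sketch of explode semantics: each element of the exploded List column
# becomes its own row, and the values of every other column ("id" here)
# are duplicated alongside it.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
]

exploded = [
    {"id": row["id"], "tags": tag}
    for row in rows
    for tag in row["tags"]
]
# [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}, {'id': 2, 'tags': 'c'}]
```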

from_csv(*args, **kwargs)

from_files(path)

Creates a DataFrame of file paths and other metadata from a glob path

from_glob_path(path)

Creates a DataFrame of file paths and other metadata from a glob path

from_json(*args, **kwargs)

from_parquet(*args, **kwargs)

from_pydict(data)

Creates a DataFrame from a Python dictionary

from_pylist(data)

Creates a DataFrame from a list of dictionaries
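
The two construction formats are simple transposes of one another, sketched here in plain Python (Daft not required): from_pydict takes column-oriented data, from_pylist takes row-oriented data.

```python
# Row-oriented data, as from_pylist expects.
pylist = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]

# Transpose to column-oriented data, as from_pydict expects.
pydict = {key: [row[key] for row in pylist] for key in pylist[0]}
# {'x': [1, 2], 'y': ['a', 'b']}
```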

groupby(*group_by)

Performs a GroupBy on the DataFrame for aggregation

join(other[, on, left_on, right_on, how])

Column-wise join of the current DataFrame with another DataFrame, similar to a SQL JOIN
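
A plain-Python sketch of the inner-join case, analogous to joining on a shared key column (the "id" column and the data are hypothetical):

```python
# Inner-join semantics: a result row is produced for every pair of
# left/right rows whose key values match, carrying columns from both sides.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "score": 10}, {"id": 3, "score": 20}]

joined = [
    {**l, **r}
    for l in left
    for r in right
    if l["id"] == r["id"]
]
# [{'id': 2, 'name': 'b', 'score': 10}]
```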

limit(num)

Limits the rows in the DataFrame to the first N rows, similar to a SQL LIMIT

max(*cols)

Performs a global max on the DataFrame

mean(*cols)

Performs a global mean on the DataFrame

min(*cols)

Performs a global min on the DataFrame

num_partitions()

plan()

Returns the LogicalPlan that will be executed to compute the result of this DataFrame

read_csv(path[, has_headers, column_names, ...])

Creates a DataFrame from CSV file(s)

read_json(path)

Creates a DataFrame from line-delimited JSON file(s)

read_parquet(path)

Creates a DataFrame from Parquet file(s)

repartition(num, *partition_by)

Repartitions the DataFrame into num partitions
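
When partition columns are given, partitioning by hash of the key is one common scheme; the following plain-Python sketch (an illustration, not Daft's internal implementation) shows the key property: rows with the same key always land in the same partition, so downstream per-key operations such as groupby can run partition-locally.

```python
# Hash-partition rows into a fixed number of partitions by the "key" column.
num_partitions = 2
rows = [{"key": "a"}, {"key": "b"}, {"key": "a"}, {"key": "c"}]

partitions = [[] for _ in range(num_partitions)]
for row in rows:
    # Same key value -> same hash -> same partition index.
    partitions[hash(row["key"]) % num_partitions].append(row)
```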

schema()

Returns the Schema of the DataFrame, which provides information about each column

select(*columns)

Creates a new DataFrame from the provided expressions, similar to a SQL SELECT

show([n])

Executes enough of the DataFrame to display the first n rows

sort(by[, desc])

Sorts DataFrame globally

sum(*cols)

Performs a global sum on the DataFrame

to_pandas()

Converts the current DataFrame to a pandas DataFrame

to_pydict()

Converts the current DataFrame to a Python dictionary

to_ray_dataset()

Converts the current DataFrame to a Ray Dataset which is useful for running distributed ML model training in Ray

where(predicate)

Filters rows via a predicate expression, similar to a SQL WHERE

with_column(column_name, expr[, ...])

Adds a column to the current DataFrame with an Expression, equivalent to a select with all current columns and the new one
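
The "select all current columns plus the new one" semantics can be sketched in plain Python on column-oriented data (the column names are hypothetical):

```python
# with_column semantics: the result carries every existing column
# unchanged, plus one new column computed from an expression over
# the existing data.
data = {"x": [1, 2, 3]}

with_doubled = {**data, "x_doubled": [v * 2 for v in data["x"]]}
# {'x': [1, 2, 3], 'x_doubled': [2, 4, 6]}
```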

write_csv(root_dir[, partition_cols])

Writes the DataFrame as CSV files, returning a new DataFrame with paths to the files that were written

write_parquet(root_dir[, compression, ...])

Writes the DataFrame as Parquet files, returning a new DataFrame with paths to the files that were written

Attributes

column_names

Returns the column names of the DataFrame as a list of strings

columns

Returns the columns of the DataFrame as a list of Expressions