preview(
    data,
    columns_subset=None,
    n_head=5,
    n_tail=5,
    limit=50,
    show_row_numbers=True,
    max_col_width=250,
    incl_header=None,
)
Display a table preview that shows some rows from the top, some from the bottom.
To get a quick look at the data in a table, we can use the preview() function. It shows a subset of the rows from the start and end of the table, with the number of rows from each end determined by the n_head= and n_tail= parameters (both set to 5 by default). This function works with any table that is supported by the pointblank library, including Pandas, Polars, and Ibis backend tables (e.g., DuckDB, MySQL, PostgreSQL, SQLite, Parquet).
The view is optimized for readability, with column names and data types displayed in a compact format. The column widths are sized to fit the column names, dtypes, and column content up to a configurable maximum width of max_col_width= pixels. The table can be scrolled horizontally to view even very large datasets. Since the output is a Great Tables (GT) object, it can be further customized using the great_tables API.
data : FrameT | Any
The table to preview, which could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types.
columns_subset : str | list[str] | Column | None = None
The columns to display in the table, by default None (all columns are shown). This can be a string, a list of strings, a Column object, or a ColumnSelector object. The latter two options allow for more flexible column selection using column selector functions. Errors are raised if the column names provided don’t match any columns in the table (when provided as a string or list of strings) or if column selector expressions don’t resolve to any columns.
n_head : int = 5
The number of rows to show from the start of the table. Set to 5 by default.
n_tail : int = 5
The number of rows to show from the end of the table. Set to 5 by default.
limit : int | None = 50
The limit value for the sum of n_head= and n_tail= (the total number of rows shown). If the sum of n_head= and n_tail= exceeds the limit, an error is raised.
show_row_numbers : bool = True
Should row numbers be shown? The numbers shown reflect the row numbers of the head and tail in the full table.
max_col_width : int | None = 250
The maximum width of the columns in pixels. This is 250 ("250px") by default.
incl_header : bool = None
Should the table include a header with the table type and table dimensions? Set to True by default.
: GT
A GT object that displays the preview of the table.
The data= parameter can be given any of the following table types:
- Polars DataFrame ("polars")
- Pandas DataFrame ("pandas")
- DuckDB table ("duckdb")*
- MySQL table ("mysql")*
- PostgreSQL table ("postgresql")*
- SQLite table ("sqlite")*
- Parquet table ("parquet")*
The table types marked with an asterisk need to be prepared as Ibis tables (with type of ibis.expr.types.relations.Table). Furthermore, using preview() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, Ibis is not required.
It’s easy to preview a table using the preview() function. Here’s an example using the small_table dataset (itself loaded using the load_dataset() function):
import pointblank as pb
small_table_polars = pb.load_dataset("small_table")
pb.preview(small_table_polars)
Polars (Rows: 13, Columns: 8)

| | date_time Datetime | date Date | a Int64 | b String | c Int64 | d Float64 | e Boolean | f String |
---|---|---|---|---|---|---|---|---|
1 | 2016-01-04 11:00:00 | 2016-01-04 | 2 | 1-bcd-345 | 3 | 3423.29 | True | high |
2 | 2016-01-04 00:32:00 | 2016-01-04 | 3 | 5-egh-163 | 8 | 9999.99 | True | low |
3 | 2016-01-05 13:32:00 | 2016-01-05 | 6 | 8-kdg-938 | 3 | 2343.23 | True | high |
4 | 2016-01-06 17:23:00 | 2016-01-06 | 2 | 5-jdo-903 | None | 3892.4 | False | mid |
5 | 2016-01-09 12:36:00 | 2016-01-09 | 8 | 3-ldm-038 | 7 | 283.94 | True | low |
9 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9 | 837.93 | False | high |
10 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9 | 837.93 | False | high |
11 | 2016-01-26 20:07:00 | 2016-01-26 | 4 | 2-dmx-010 | 7 | 833.98 | True | low |
12 | 2016-01-28 02:51:00 | 2016-01-28 | 2 | 7-dmx-010 | 8 | 108.34 | False | low |
13 | 2016-01-30 11:23:00 | 2016-01-30 | 1 | 3-dka-303 | None | 2230.09 | True | high |
This table is a Polars DataFrame, but the preview() function works with any table supported by pointblank, including Pandas DataFrames and Ibis backend tables. Here’s an example using a DuckDB table handled by Ibis:
small_table_duckdb = pb.load_dataset("small_table", tbl_type="duckdb")
pb.preview(small_table_duckdb)
DuckDB (Rows: 13, Columns: 8)

| | date_time timestamp | date date | a int64 | b string | c int64 | d float64 | e boolean | f string |
---|---|---|---|---|---|---|---|---|
1 | 2016-01-04 11:00:00 | 2016-01-04 | 2 | 1-bcd-345 | 3 | 3423.29 | True | high |
2 | 2016-01-04 00:32:00 | 2016-01-04 | 3 | 5-egh-163 | 8 | 9999.99 | True | low |
3 | 2016-01-05 13:32:00 | 2016-01-05 | 6 | 8-kdg-938 | 3 | 2343.23 | True | high |
4 | 2016-01-06 17:23:00 | 2016-01-06 | 2 | 5-jdo-903 | NULL | 3892.4 | False | mid |
5 | 2016-01-09 12:36:00 | 2016-01-09 | 8 | 3-ldm-038 | 7 | 283.94 | True | low |
9 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9 | 837.93 | False | high |
10 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9 | 837.93 | False | high |
11 | 2016-01-26 20:07:00 | 2016-01-26 | 4 | 2-dmx-010 | 7 | 833.98 | True | low |
12 | 2016-01-28 02:51:00 | 2016-01-28 | 2 | 7-dmx-010 | 8 | 108.34 | False | low |
13 | 2016-01-30 11:23:00 | 2016-01-30 | 1 | 3-dka-303 | NULL | 2230.09 | True | high |
The blue dividing line marks the end of the first n_head= rows and the start of the last n_tail= rows.
We can adjust the number of rows shown from the start and end of the table by setting the n_head= and n_tail= parameters. Let’s enlarge each of these to 10:
pb.preview(small_table_polars, n_head=10, n_tail=10)
Polars (Rows: 13, Columns: 8)

| | date_time Datetime | date Date | a Int64 | b String | c Int64 | d Float64 | e Boolean | f String |
---|---|---|---|---|---|---|---|---|
1 | 2016-01-04 11:00:00 | 2016-01-04 | 2 | 1-bcd-345 | 3 | 3423.29 | True | high |
2 | 2016-01-04 00:32:00 | 2016-01-04 | 3 | 5-egh-163 | 8 | 9999.99 | True | low |
3 | 2016-01-05 13:32:00 | 2016-01-05 | 6 | 8-kdg-938 | 3 | 2343.23 | True | high |
4 | 2016-01-06 17:23:00 | 2016-01-06 | 2 | 5-jdo-903 | None | 3892.4 | False | mid |
5 | 2016-01-09 12:36:00 | 2016-01-09 | 8 | 3-ldm-038 | 7 | 283.94 | True | low |
6 | 2016-01-11 06:15:00 | 2016-01-11 | 4 | 2-dhe-923 | 4 | 3291.03 | True | mid |
7 | 2016-01-15 18:46:00 | 2016-01-15 | 7 | 1-knw-093 | 3 | 843.34 | True | high |
8 | 2016-01-17 11:27:00 | 2016-01-17 | 4 | 5-boe-639 | 2 | 1035.64 | False | low |
9 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9 | 837.93 | False | high |
10 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9 | 837.93 | False | high |
11 | 2016-01-26 20:07:00 | 2016-01-26 | 4 | 2-dmx-010 | 7 | 833.98 | True | low |
12 | 2016-01-28 02:51:00 | 2016-01-28 | 2 | 7-dmx-010 | 8 | 108.34 | False | low |
13 | 2016-01-30 11:23:00 | 2016-01-30 | 1 | 3-dka-303 | None | 2230.09 | True | high |
In the above case, the entire dataset is shown since the sum of n_head= and n_tail= is greater than the number of rows in the table (which is 13).
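The head and tail selection described above can be sketched in plain Python. This is only an illustration of the documented behavior, not pointblank's implementation: the helper name preview_row_numbers() is hypothetical, and it assumes that overlapping head and tail ranges simply collapse to the full table.

```python
def preview_row_numbers(n_rows, n_head=5, n_tail=5, limit=50):
    """Return the 1-based row numbers a head/tail preview would display.

    Hypothetical helper for illustration; not part of pointblank.
    """
    # Documented rule: the sum of n_head and n_tail may not exceed limit.
    if limit is not None and n_head + n_tail > limit:
        raise ValueError(
            f"n_head + n_tail ({n_head + n_tail}) exceeds limit ({limit})"
        )

    # If head and tail together cover the table, show every row once.
    if n_head + n_tail >= n_rows:
        return list(range(1, n_rows + 1))

    head = list(range(1, n_head + 1))
    tail = list(range(n_rows - n_tail + 1, n_rows + 1))
    return head + tail


default_rows = preview_row_numbers(13)                    # head 5, tail 5
all_rows = preview_row_numbers(13, n_head=10, n_tail=10)  # covers the table
print(default_rows)
print(all_rows)
```

For the 13-row small_table, the defaults yield rows 1 to 5 and 9 to 13, while n_head=10 and n_tail=10 yield all 13 rows, which agrees with the previews shown above.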
The columns_subset= parameter can be used to show only specific columns in the table. You can provide a list of column names to make the selection. Let’s try that with the "game_revenue" dataset as a Pandas DataFrame:
game_revenue_pandas = pb.load_dataset("game_revenue", tbl_type="pandas")
pb.preview(game_revenue_pandas, columns_subset=["player_id", "item_name", "item_revenue"])
Pandas (Rows: 2000, Columns: 3)

| | player_id String | item_name String | item_revenue Float64 |
---|---|---|---|
1 | ECPANOIXLZHF896 | offer2 | 8.99 |
2 | ECPANOIXLZHF896 | gems3 | 22.49 |
3 | ECPANOIXLZHF896 | gold7 | 107.99 |
4 | ECPANOIXLZHF896 | ad_20sec | 0.76 |
5 | ECPANOIXLZHF896 | ad_5sec | 0.03 |
1996 | NAOJRDMCSEBI281 | ad_survey | 1.332 |
1997 | NAOJRDMCSEBI281 | ad_survey | 1.35 |
1998 | RMOSWHJGELCI675 | ad_5sec | 0.03 |
1999 | RMOSWHJGELCI675 | offer5 | 26.09 |
2000 | GJCXNTWEBIPQ369 | ad_5sec | 0.12 |
Alternatively, we can use column selector functions like starts_with() and matches() to select columns based on text or patterns:
pb.preview(game_revenue_pandas, n_head=2, n_tail=2, columns_subset=pb.starts_with("session"))
Pandas (Rows: 2000, Columns: 3)

| | session_id String | session_start Datetime | session_duration Float64 |
---|---|---|---|
1 | ECPANOIXLZHF896-eol2j8bs | 2015-01-01 01:31:03+00:00 | 16.3 |
2 | ECPANOIXLZHF896-eol2j8bs | 2015-01-01 01:31:03+00:00 | 16.3 |
1999 | RMOSWHJGELCI675-vbhcsmtr | 2015-01-21 02:39:48+00:00 | 8.4 |
2000 | GJCXNTWEBIPQ369-9elq67md | 2015-01-21 03:59:23+00:00 | 18.5 |
Multiple column selector functions can be combined within col() using operators like | and &:
pb.preview(
    game_revenue_pandas,
    n_head=2,
    n_tail=2,
    columns_subset=pb.col(pb.starts_with("item") | pb.matches("player")),
)
Pandas (Rows: 2000, Columns: 4)

| | player_id String | item_type String | item_name String | item_revenue Float64 |
---|---|---|---|---|
1 | ECPANOIXLZHF896 | iap | offer2 | 8.99 |
2 | ECPANOIXLZHF896 | iap | gems3 | 22.49 |
1999 | RMOSWHJGELCI675 | iap | offer5 | 26.09 |
2000 | GJCXNTWEBIPQ369 | ad | ad_5sec | 0.12 |
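To make the | and & combinations more concrete, here is a minimal stand-alone sketch of how selector predicates could be composed and resolved against a table's column names. The Selector class and the two helper functions are hypothetical stand-ins written for this illustration, not pointblank's own classes. The column list contains only the game_revenue columns visible in the outputs above, in an assumed order.

```python
import re


class Selector:
    """Hypothetical column selector: wraps a predicate on a column name."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __or__(self, other):
        # Union: keep a column if either selector accepts it.
        return Selector(lambda name: self.predicate(name) or other.predicate(name))

    def __and__(self, other):
        # Intersection: keep a column only if both selectors accept it.
        return Selector(lambda name: self.predicate(name) and other.predicate(name))

    def resolve(self, columns):
        # Preserve the table's column order; fail if nothing is selected.
        selected = [name for name in columns if self.predicate(name)]
        if not selected:
            raise ValueError("selector did not resolve to any columns")
        return selected


def starts_with(text):
    return Selector(lambda name: name.startswith(text))


def matches(pattern):
    return Selector(lambda name: re.search(pattern, name) is not None)


# Only the columns that appear in the previews above (order assumed).
columns = [
    "player_id", "session_id", "session_start",
    "item_type", "item_name", "item_revenue", "session_duration",
]

union = (starts_with("item") | matches("player")).resolve(columns)
both = (starts_with("item") & matches("name")).resolve(columns)
print(union)
print(both)
```

In this sketch the union selects player_id plus the three item columns, the same four columns as the preview above, while the intersection narrows the selection to item_name alone. Raising an error when nothing matches mirrors the documented behavior for selector expressions that don't resolve to any columns.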