Checking for Duplicate Values
To check for duplicate values in a single column, use rows_distinct() and pass the column name to columns_subset=.
Pointblank Validation
2025-01-20|18:17:46 | Polars

|   | STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|------|---------|--------|-----|------|-------|------|------|---|---|---|-----|
| 1 | rows_distinct() | b | — |  | ✓ | 13 | 11 (0.85) | 2 (0.15) | — | — | — | — |

2025-01-20 18:17:46 UTC | < 1 s | 2025-01-20 18:17:46 UTC
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars")
    )
    .rows_distinct(columns_subset="b")  # expect no duplicate values in 'b'
    .interrogate()
)

validation
Preview of Input Table
|    | date_time           | date       | a | b         | c    | d       | e     | f    |
|----|---------------------|------------|---|-----------|------|---------|-------|------|
| 1  | 2016-01-04 11:00:00 | 2016-01-04 | 2 | 1-bcd-345 | 3    | 3423.29 | True  | high |
| 2  | 2016-01-04 00:32:00 | 2016-01-04 | 3 | 5-egh-163 | 8    | 9999.99 | True  | low  |
| 3  | 2016-01-05 13:32:00 | 2016-01-05 | 6 | 8-kdg-938 | 3    | 2343.23 | True  | high |
| 4  | 2016-01-06 17:23:00 | 2016-01-06 | 2 | 5-jdo-903 | None | 3892.4  | False | mid  |
| 5  | 2016-01-09 12:36:00 | 2016-01-09 | 8 | 3-ldm-038 | 7    | 283.94  | True  | low  |
| 6  | 2016-01-11 06:15:00 | 2016-01-11 | 4 | 2-dhe-923 | 4    | 3291.03 | True  | mid  |
| 7  | 2016-01-15 18:46:00 | 2016-01-15 | 7 | 1-knw-093 | 3    | 843.34  | True  | high |
| 8  | 2016-01-17 11:27:00 | 2016-01-17 | 4 | 5-boe-639 | 2    | 1035.64 | False | low  |
| 9  | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9    | 837.93  | False | high |
| 10 | 2016-01-20 04:30:00 | 2016-01-20 | 3 | 5-bce-642 | 9    | 837.93  | False | high |
| 11 | 2016-01-26 20:07:00 | 2016-01-26 | 4 | 2-dmx-010 | 7    | 833.98  | True  | low  |
| 12 | 2016-01-28 02:51:00 | 2016-01-28 | 2 | 7-dmx-010 | 8    | 108.34  | False | low  |
| 13 | 2016-01-30 11:23:00 | 2016-01-30 | 1 | 3-dka-303 | None | 2230.09 | True  | high |
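
The PASS and FAIL counts in the validation report can be reproduced by hand. The sketch below is plain Python, not Pointblank's implementation: it takes the 13 values of column b, copied from the preview table, and assumes that every row whose value occurs more than once counts as a failing unit. That assumption is consistent with the reported 11 passing and 2 failing units, but it is inferred from the report rather than from the library's source.

```python
from collections import Counter

# Values of column 'b', copied in row order from the preview table.
b_values = [
    "1-bcd-345", "5-egh-163", "8-kdg-938", "5-jdo-903", "3-ldm-038",
    "2-dhe-923", "1-knw-093", "5-boe-639", "5-bce-642", "5-bce-642",
    "2-dmx-010", "7-dmx-010", "3-dka-303",
]

# Count how often each value occurs in the column.
counts = Counter(b_values)

# Assumption: a row fails if its value appears more than once,
# so both members of a duplicated pair are counted as failing.
failing_rows = [i for i, v in enumerate(b_values, start=1) if counts[v] > 1]

n_units = len(b_values)
n_fail = len(failing_rows)
n_pass = n_units - n_fail

print(failing_rows)             # [9, 10]
print(n_units, n_pass, n_fail)  # 13 11 2
print(round(n_pass / n_units, 2), round(n_fail / n_units, 2))  # 0.85 0.15
```

Rows 9 and 10 share the value 5-bce-642, which accounts for the two failing units. Note that 2-dmx-010 and 7-dmx-010 differ in their first character, so they are distinct values.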