07.02.22

Siuba: Dplyr style dataframes in Python

Pandas' group by operations are a pain to use. Siuba fixes that.

Python's success has been down to picking the best bits of other languages and stealing them for itself. Originally, Python had no array library, so it copied MATLAB's to make NumPy. Pandas' data frames were inspired by R's. Since my general move to Python a few years ago, the main things I have missed are good graphics and the group-by operations of the dplyr library.

A new library, siuba, addresses this:

    from siuba import group_by, summarize, _
    from siuba.data import mtcars

    (mtcars
      >> group_by(_.cyl)
      >> summarize(
          hp_mean = _.hp.mean(),
          hp_sd = _.hp.std())
    )

    Out[2]:
       cyl     hp_mean      hp_sd
    0    4   82.636364  20.934530
    1    6  122.285714  24.260491
    2    8  209.214286  50.976886

I really like this, as I do a lot of this type of manipulation when exploring a new dataset. Good work, Michael!
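For comparison, here is a rough sketch of the same kind of summary in plain pandas, using named aggregation. The `cars` data frame is a made-up toy table standing in for mtcars (the values are invented for illustration, not taken from the real dataset), so the numbers will not match the output above.

```python
import pandas as pd

# Toy stand-in for mtcars: values are invented for illustration only.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8, 8],
    "hp": [60, 80, 100, 120, 150, 250],
})

# Named aggregation: each keyword is an output column,
# each tuple is (input column, aggregation function).
summary = (
    cars
    .groupby("cyl", as_index=False)
    .agg(hp_mean=("hp", "mean"), hp_sd=("hp", "std"))
)

print(summary)
```

The pandas version works, but the column names end up inside string tuples rather than being written as expressions, which is part of what the siuba syntax tidies up.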
