07.02.22

Siuba: Dplyr style dataframes in Python

Pandas' group by operations are a pain to use. Siuba fixes that.

Python's success has been down to picking the best bits of other languages and stealing them for itself. Originally, Python had no array library, so it copied Matlab's to make NumPy. Pandas' data frames were inspired by R's. Since my general move to Python a few years ago, the main things I have missed are good graphics and the group-by operations of the dplyr library.

A new library, siuba, addresses this:

from siuba import group_by, summarize, _
from siuba.data import mtcars

(mtcars
  >> group_by(_.cyl)
  >> summarize(
       hp_mean = _.hp.mean(),
       hp_sd = _.hp.std())
)

Out[2]:
   cyl     hp_mean      hp_sd
0    4   82.636364  20.934530
1    6  122.285714  24.260491
2    8  209.214286  50.976886
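For comparison, here is a rough sketch of how the same kind of summary might be written in plain pandas, using named aggregation. The `cars` frame is a toy stand-in with made-up rows, not the real mtcars data, so the numbers will not match the output above.

```python
import pandas as pd

# Toy stand-in for mtcars: made-up rows, not the real dataset.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8, 8],
    "hp": [60, 100, 100, 120, 150, 250],
})

# Group by cylinder count, then compute the mean and the sample
# standard deviation of horsepower within each group.
summary = (
    cars
    .groupby("cyl", as_index=False)
    .agg(hp_mean=("hp", "mean"), hp_sd=("hp", "std"))
)

print(summary)
```

The pandas version works, but the column names end up as strings inside tuples, whereas the siuba pipeline reads closer to the dplyr original.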

I really like this, as I do a lot of this type of manipulation when exploring a new dataset. Good work, Michael!
