tidypandas#

A grammar of data manipulation for pandas inspired by tidyverse

The tidypandas Python package provides a minimal, pythonic API for common data manipulation tasks:

  • tidyframe class (a wrapper over a pandas dataframe) provides a dataframe with a simplified index structure (no more resetting indexes or dealing with multi-indexes)

  • Consistent ‘verbs’ (select, arrange, distinct, …) as methods of the tidyframe class, which mostly return a tidyframe

  • Unified interface for summarize (aggregation) and mutate (assign) operations across groups

  • Utilities for pandas dataframes and series

  • Uses simple Python data structures: no esoteric classes, no pipes, no non-standard evaluation

  • No-copy data conversion between tidyframe and pandas dataframes

  • An accessor to apply tidyframe verbs to plain pandas dataframes

Example#

  • tidypandas code:

df.filter(lambda x: x['col_1'] > x['col_1'].mean(), by = 'col_2')

  • equivalent pandas code:

(df.groupby('col_2')
   .apply(lambda x: x.loc[x['col_1'] > x['col_1'].mean(), :])
   .reset_index(drop = True)
   )
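
To make the group-wise filter concrete, here is a minimal pandas-only sketch on toy data. The data values are invented for illustration, and the code uses `groupby(...).transform('mean')` in place of the `apply` call shown above. It should select the same rows, but it keeps them in their original order rather than grouped by `col_2`.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the example above.
df = pd.DataFrame({
    "col_1": [1, 5, 2, 8, 3, 9],
    "col_2": ["a", "a", "a", "b", "b", "b"],
})

# Per-group mean of col_1, broadcast back so it aligns row by row with df.
group_mean = df.groupby("col_2")["col_1"].transform("mean")

# Keep only the rows whose col_1 exceeds the mean of their own group.
result = df.loc[df["col_1"] > group_mean].reset_index(drop=True)
print(result)
```

The trailing `reset_index(drop=True)` is the kind of index bookkeeping that the tidyframe verbs are meant to make unnecessary.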

Why use tidypandas#

tidypandas is for you if:

  • you frequently write data manipulation code using pandas

  • you prefer to stay in the pandas ecosystem (see accessor)

  • you prefer to remember a limited set of methods

  • you do not want to write (or be surprised by) reset_index and rename_axis often

  • you prefer writing free-flowing, expressive code in the dplyr style

tidypandas relies on the amazing pandas library and offers a consistent API built on a different philosophy.

Presentation#

Learn more about tidypandas (presentation)

Installation#

  1. Install the release version from PyPI using pip:

    pip install tidypandas
    
  2. For offline installation, use the whl/tar file from the releases page on GitHub.
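
After either installation route, one way to confirm that the package is visible to the current interpreter is to query its installed version. This is a small sketch using only the standard library. The helper name `installed_version` is hypothetical and is not part of tidypandas.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if tidypandas is installed, otherwise a hint.
found = installed_version("tidypandas")
print(found if found is not None else "tidypandas is not installed")
```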

Contribution/bug fixes/Issues:#

  1. Open an issue, suggestion, or bug report on the GitHub issues page.

  2. Base your PR on the master branch of the GitHub repo.