pypm install pydataframe

How to install pydataframe

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install pydataframe
Build status by platform (page tabs: Python 2.7, Python 3.2, Python 3.3)

Version    Windows (32-bit)  Windows (64-bit)  Mac OS X (10.5+)  Linux (32-bit)  Linux (64-bit)
0.1.6.180  Never built       Never built       Never built       Never built     Available
0.1.6.150  Available         Available         Available         Available       Available
0.1.4      Available         Available         Available         Available       Available
0.1.3      Available         Available         Available         Available       Available
0.1.1      Available         Available         Available         Available       Available
0.1        Available         Available         Available         Available       Available
 
License
BSD
Dependencies
Imports
Latest release
version 0.1.6.180 on Jan 9th, 2014

An implementation of an almost R-like DataFrame object. Usage:

    u = DataFrame({"Field1": [1, 2, 3],
                   "Field2": ['abc', 'def', 'hgi']},
                  # optional:
                  ['Field1', 'Field2'],
                  ["rowOne", "rowTwo", "thirdRow"])

A DataFrame is basically a table with rows and columns.

Columns are named; rows are numbered (but can be named). Both can be easily selected and calculated upon. Internally, columns are stored as 1-D numpy arrays. If you set row names, they are converted into a dictionary for fast access. There is a rich subselection/slicing API, see help(DataFrame.get_item) (it also works for setting values). Please note that any slice gets you another DataFrame; to access individual entries, use get_row(), get_column(), or get_value().
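The storage idea described above (named columns plus a dictionary that maps row names to row numbers) can be sketched in plain Python. This is only an illustration: `TinyTable` is a hypothetical class, it uses lists where the library uses numpy arrays, and it is not pydataframe's implementation.

```python
class TinyTable(object):
    """Hypothetical stand-in for illustration; not pydataframe's DataFrame."""

    def __init__(self, columns, row_names=None):
        # columns: dict mapping column name -> list of values (all same length)
        self.columns = dict((name, list(values)) for name, values in columns.items())
        lengths = set(len(values) for values in self.columns.values())
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Row names become a dict, so a name resolves to its row number
        # without scanning the list of names.
        self.row_index = {}
        if row_names is not None:
            self.row_index = dict((name, i) for i, name in enumerate(row_names))

    def _row_number(self, row):
        # Accept either a row name or a plain row number.
        return self.row_index[row] if row in self.row_index else row

    def get_value(self, row, column):
        return self.columns[column][self._row_number(row)]

    def get_row(self, row):
        i = self._row_number(row)
        return dict((name, values[i]) for name, values in self.columns.items())


table = TinyTable({"Field1": [1, 2, 3], "Field2": ["abc", "def", "hgi"]},
                  row_names=["rowOne", "rowTwo", "thirdRow"])
print(table.get_value("rowTwo", "Field2"))  # def
print(table.get_row(0))
```

Keeping the name-to-number mapping separate from the column data means the columns stay simple sequences, while named lookups remain cheap.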

DataFrames also understand basic arithmetic: you can add (multiply, ...) either a constant value or another DataFrame of the same size / with the same column names, like this:

    # multiply every value in ColumnA that is smaller than 5 by 6
    my_df[my_df[:, 'ColumnA'] < 5, 'ColumnA'] *= 6

    # you always need to specify both row and column selectors; use : to mean everything
    my_df[:, 'ColumnB'] = my_df[:, 'ColumnA'] + my_df[:, 'ColumnC']

    # take every row whose ColumnA value starts with 'Shu' and replace it with a new list (comprehension)
    select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
    my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select, :].iter_rows()]
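To make the mask-based updates above concrete, here is a minimal plain-Python sketch of the same idea: a comparison yields one boolean per row, and only the selected rows are changed. The lists and variable names are made up for illustration and do not use pydataframe itself.

```python
# Hypothetical column data; plain lists stand in for DataFrame columns.
column_a = [1, 7, 4, 9]
column_c = [10, 20, 30, 40]

# Step 1: the comparison yields one boolean per row.
mask = [value < 5 for value in column_a]

# Step 2: only rows where the mask is True are multiplied by 6.
column_a = [value * 6 if selected else value
            for value, selected in zip(column_a, mask)]

# Element-wise addition of two columns of equal length.
column_b = [a + c for a, c in zip(column_a, column_c)]

print(mask)      # [True, False, True, False]
print(column_a)  # [6, 7, 24, 9]
print(column_b)  # [16, 27, 54, 49]
```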

DataFrames talk directly to R via rpy2 (rpy2 is not a prerequisite for the library!):

    from dataframe import DataFrame
    from rpy2 import robjects as ro
    my_df = DataFrame({"ColumnA": [1, 2, 3], 'ColumnB': ['sha', 'sha', 'shu']})
    ro.r['print'](my_df)

Combine DataFrames on rows or columns:

    my_df = a.rbind_copy(b)  # a and b have the same columns
    my_df = a.cbind_view(b)  # my_df is a composite sharing numpy arrays (columns) with a and b
    my_df = a.join_columns_on(b, 'Name_in_A', 'Name_in_B')  # join on common values

    # manipulate DataFrame columns
    my_df.insert_column("new_column_name", [1, 2, 3])
    my_df.drop_column('dropped_column_name')
    my_df.drop_all_columns_except('keep_me_please', 'keep_me_as_well')
    my_df.rename_column("old", "new")
    print(my_df.get_column_names())
    # set the column order; everything between the first and the second list
    # (the unspecified columns) gets sorted alphabetically
    my_df.impose_partial_column_order(['FirstColumn', 'Second_Column'], ['pen_ultimate_column', 'ultimate_column'])
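The partial column ordering described above (pinned columns first and last, everything else alphabetical in between) can be sketched as follows. `partial_column_order` is a hypothetical helper written for this illustration, and the extra column names are made up; the library's own handling of edge cases may differ.

```python
def partial_column_order(all_columns, first, last):
    """Hypothetical helper: `first` leads, `last` trails, the rest is sorted."""
    pinned = set(first) | set(last)
    # Columns named in neither list go in the middle, alphabetically.
    middle = sorted(name for name in all_columns if name not in pinned)
    return list(first) + middle + list(last)


columns = ["zeta", "ultimate_column", "FirstColumn",
           "alpha", "pen_ultimate_column", "Second_Column"]
ordered = partial_column_order(
    columns,
    ["FirstColumn", "Second_Column"],
    ["pen_ultimate_column", "ultimate_column"],
)
print(ordered)
```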

    # access data
    my_df[100, "ColumnA"]  # a new DataFrame with one column and one row
    my_df.get_value(100, 'ColumnA')  # whatever was in row 100, column 'ColumnA' (string, int, object...)
    my_df.get_row(100)  # -> {"ColumnA": value, "ColumnB": another_value}
    my_df.get_row_as_list(100)  # -> [value, another_value], in order of my_df.columns_ordered
    my_df.get_column('ColumnA')  # numpy array of the column (a copy)
    my_df.get_column_view('ColumnA')  # the actual underlying numpy array

    # iterate across the data
    my_df.iter_rows()  # iterate rows as dictionaries
    my_df.iter_rows_as_list()  # iterate rows as lists (see get_row_as_list())
    my_df.iter_values_columns_first()  # value by value: first column row 1, first column row 2, ...
    my_df.iter_values_rows_first()  # value by value: first column row 1, second column row 1, ...
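The difference between the two value-by-value iteration orders is easiest to see on a tiny example. The generators below are hypothetical plain-Python stand-ins that follow the ordering described in the comments above; they are not the library's code.

```python
# Hypothetical data: two columns with a fixed column order.
column_order = ["ColumnA", "ColumnB"]
data = {"ColumnA": [1, 2, 3], "ColumnB": ["x", "y", "z"]}


def values_columns_first(data, column_order):
    # Exhaust one column (all of its rows) before moving to the next column.
    for name in column_order:
        for value in data[name]:
            yield value


def values_rows_first(data, column_order):
    # Visit every column of one row before moving to the next row.
    n_rows = len(data[column_order[0]])
    for i in range(n_rows):
        for name in column_order:
            yield data[name][i]


print(list(values_columns_first(data, column_order)))  # [1, 2, 3, 'x', 'y', 'z']
print(list(values_rows_first(data, column_order)))     # [1, 'x', 2, 'y', 3, 'z']
```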

    # turn into boolean array for subselection
    my_df.where(lambda row: row['ColumnA'].startswith("Hello") and row['ColumnB'] >= 5)
    my_df[:, "Just_one_column"] > 5  # any comparison

    # sort
    sorted_df = my_df.sort_by("ColumnA")  # copy sorted by ColumnA ascending
    sorted_df = my_df.sort_by("ColumnA", False)  # copy sorted by ColumnA descending
    sorted_df = my_df.sort_by(["ColumnA", 'ColumnB'], [False, True])  # copy sorted by ColumnA descending, then ColumnB ascending
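A multi-key sort with a separate direction per key, as in the last call above, can be built from repeated stable sorts. The sketch below is one common way to do it in plain Python; `sort_rows` is a hypothetical helper and makes no claim about how sort_by works internally.

```python
def sort_rows(rows, keys, ascending):
    """Hypothetical helper: sort dict rows by several keys, each with its own direction."""
    result = list(rows)  # work on a copy, leaving the input untouched
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for key, asc in reversed(list(zip(keys, ascending))):
        result.sort(key=lambda row: row[key], reverse=not asc)
    return result


rows = [
    {"ColumnA": 1, "ColumnB": "b"},
    {"ColumnA": 2, "ColumnB": "b"},
    {"ColumnA": 2, "ColumnB": "a"},
    {"ColumnA": 1, "ColumnB": "a"},
]
# ColumnA descending, then ColumnB ascending.
sorted_rows = sort_rows(rows, ["ColumnA", "ColumnB"], [False, True])
print([(row["ColumnA"], row["ColumnB"]) for row in sorted_rows])
# [(2, 'a'), (2, 'b'), (1, 'a'), (1, 'b')]
```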

    # aggregation functions
    my_df.mean('ColumnA')  # average (mean) of the values in ColumnA
    my_df.mean_and_std('ColumnA')  # mean and standard deviation of the values in ColumnA

    # translate columns
    my_df.turn_into_level('ColumnA')  # turns into R-compatible factor. Optional: order of levels
    my_df.digitize_column('ColumnA')  # bin the values
    my_df.rankify_column('ColumnA', True)  # turn into ranks, ascending: (0.5, 0.6, 0.55) -> 0, 2, 1
    my_df.rescale_column_0_1('ColumnA')  # rescales a column to lie within 0..1 (inclusive)
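The ranking and rescaling transformations above can be illustrated in plain Python. Both functions are hypothetical sketches: ties are broken by position, and a constant column is mapped to 0.0, which are assumptions of this example rather than documented library behavior.

```python
def rankify(values, ascending=True):
    """Hypothetical helper: replace each value by its 0-based rank."""
    # Indices of the values in sorted order; ties keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def rescale_0_1(values):
    """Hypothetical helper: linearly map values onto the range 0..1 (inclusive)."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption of this sketch: a constant column maps to 0.0.
        return [0.0 for _ in values]
    return [(value - low) / float(high - low) for value in values]


print(rankify([0.5, 0.6, 0.55]))  # [0, 2, 1]
print(rescale_0_1([2, 4, 6]))     # [0.0, 0.5, 1.0]
```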

    # import and export
    # read a tab-separated value file; lots of options, please check the code
    my_df = pydataframe.DF2CSV().read("filename", dialect=pydataframe.TabDialect(), handle_quotes=True)
    # write a tab-separated value file
    pydataframe.DF2CSV().write(my_df, filename, dialect=pydataframe.TabDialect())
    pydataframe.DF2Excel().read(filename)
    pydataframe.DF2Excel().write(my_filename)

Last updated Jan 9th, 2014

Download Stats

Last month: 1
