Cumulative Distribution Plots for Frequency Data in R

R has some great tools for generating and plotting cumulative distribution functions (CDFs). However, they are designed for raw data, not for data that has already been summarized as frequency counts. Reducing data to frequency counts is often necessary when processing tens of gigabytes or more. Here I describe a convenient two-liner in R for plotting CDFs from aggregated frequency data. For example, suppose you want to analyze the number of times people exercise in a month.
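To make the setup concrete, here is a minimal sketch of one way to go from frequency counts to a CDF plot using only base R. The data frame `freq`, its columns `times` and `people`, and all the numbers are made up for illustration, and this is not necessarily the exact two-liner described in this post. The idea is simply to sort by value, take a running sum of the counts, and divide by the total.

```r
# Hypothetical frequency table (illustrative values only):
#   times  = number of exercise sessions in a month
#   people = how many people reported that number
freq <- data.frame(
  times  = c(0, 1, 2, 4, 8, 12, 20),
  people = c(50, 30, 40, 35, 25, 15, 5)
)

# Sort by value so the running sum accumulates in the right order.
freq <- freq[order(freq$times), ]

# Cumulative share of people at or below each value: the empirical CDF.
freq$cdf <- cumsum(freq$people) / sum(freq$people)

# Draw it as a step function; type = "s" holds each CDF value
# until the next observed value, as an empirical CDF should.
plot(freq$times, freq$cdf, type = "s", ylim = c(0, 1),
     xlab = "Exercise sessions per month",
     ylab = "Cumulative proportion of people")
```

Because only one row per distinct value is needed, this works on the aggregated table directly and never has to expand the counts back into one row per person.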