ICTD-Data-Analysis

Over the summer of 2019, I worked in the ICTD Lab at the University of Washington, analyzing data from a remote cellular network that our lab deployed in Bokondini, Indonesia. These are some of the results I found. I used Jupyter Notebook, Pandas, and Altair to create these graphs.

Data distributed over Users

Timeline of Traffic per User (Daily)

Select a user from the dropdown menu to see how much traffic they use per day.
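As a rough illustration, a per-user daily total like the one plotted here could be computed in Pandas along these lines. The column names (`user`, `timestamp`, `bytes`), the helper `daily_traffic`, and the sample values are assumptions made for this sketch, not the dataset's actual schema or the code behind the chart.

```python
import pandas as pd

# Hypothetical raw log: one row per traffic record (column names are assumptions).
records = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "timestamp": pd.to_datetime([
        "2019-07-01 08:00", "2019-07-01 21:30", "2019-07-02 09:15",
        "2019-07-01 10:00", "2019-07-02 11:45",
    ]),
    "bytes": [100, 250, 400, 50, 75],
})


def daily_traffic(df: pd.DataFrame) -> pd.DataFrame:
    """Sum the bytes of each user for each calendar day."""
    return (
        df.assign(day=df["timestamp"].dt.floor("D"))  # truncate timestamps to midnight
          .groupby(["user", "day"], as_index=False)["bytes"]
          .sum()
    )


daily = daily_traffic(records)

# Filtering to a single user mimics what picking a name in the dropdown does.
user_a = daily[daily["user"] == "a"]
```

Aggregating first and filtering second keeps a single table that can back every user's timeline.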

Traffic from Users per Day as a Distribution over All Users:

Violin Plot

The amount of traffic used by each user per day, plotted over all users and all days in the dataset. The wider the violin at a given value, the more users have used roughly that amount of traffic. Plotting exact values would take up too much space, so many of the values here are binned into categories.

Box Plot

The amount of traffic used by each user per day, plotted over all users and all days in the dataset. Each day has its own box plot, and each data point in it is one user's total traffic for that day.

Ridgeline Plot

Similar to the violin plot but slightly easier to read (and looking less like Christmas ornaments), this ridgeline plot takes each user’s total traffic per day, bins it, and then graphs the bins for each day. The bigger bumps in each day’s plot represent more users with total daily traffic in that range.
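The binning step described above can be sketched in Pandas as follows. The bin edges, the column names (`day`, `user`, `mb`), the helper `users_per_bin`, and the sample values are all assumptions for illustration; the actual analysis may have used different bins.

```python
import pandas as pd

# Hypothetical per-user daily totals in megabytes (values are made up).
daily = pd.DataFrame({
    "day": ["2019-07-01"] * 4 + ["2019-07-02"] * 4,
    "user": ["a", "b", "c", "d"] * 2,
    "mb": [5, 12, 18, 95, 7, 8, 60, 30],
})

# Assumed bin edges; pd.cut builds right-closed intervals such as (0, 10].
edges = [0, 10, 20, 50, 100]


def users_per_bin(df: pd.DataFrame, edges: list) -> pd.DataFrame:
    """Count how many users fall into each traffic bin on each day."""
    binned = pd.cut(df["mb"], bins=edges)
    counts = df.groupby(["day", binned], observed=False).size()
    # One row per day, one column per bin; empty bins become 0.
    return counts.unstack(fill_value=0)


table = users_per_bin(daily, edges)
```

Each row of `table` is the set of heights that one day's ridge would be drawn from.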

Traffic by Users as Distribution over All Users:

Violin Plot (Hourly)

The amount of traffic used by each user, per hour, shown as a distribution of how often they use each amount of traffic, with one violin per user. The wider the violin at a given value, the more often the user has used that amount of traffic in an hour, and vice versa.

Violin Plot (Daily)

The amount of traffic used by each user, per day, shown as a distribution of how often they use each amount of traffic, with one violin per user. The wider the violin at a given value, the more often the user has used that amount of traffic in a day, and vice versa.

Box Plot (Hourly)

A distribution of traffic used by each user, per hour. The boxes represent the 25th to the 75th percentile of traffic for that user, and the lines (whiskers) below and above represent the 0th-25th and 75th-100th percentiles, respectively.
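For reference, the five numbers that such a box plot encodes for each user (minimum, 25th percentile, median, 75th percentile, maximum) can be computed in Pandas roughly like this. The column names (`user`, `mb`), the helper `box_summary`, and the sample values are assumptions for the sketch, and it assumes whiskers that span the full range, as described above.

```python
import pandas as pd

# Hypothetical hourly traffic samples in megabytes for two users (made-up values).
hourly = pd.DataFrame({
    "user": ["a"] * 5 + ["b"] * 5,
    "mb": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})


def box_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return min, quartiles, and max of traffic for each user."""
    grouped = df.groupby("user")["mb"]
    return pd.DataFrame({
        "min": grouped.min(),            # lower whisker end
        "q1": grouped.quantile(0.25),    # bottom of the box
        "median": grouped.median(),      # line inside the box
        "q3": grouped.quantile(0.75),    # top of the box
        "max": grouped.max(),            # upper whisker end
    })


summary = box_summary(hourly)
```

The same function applies to the daily version by passing per-day totals instead of hourly samples.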

Box Plot (Daily)

A distribution of traffic used by each user, per day. The boxes represent the 25th to the 75th percentile of traffic for that user, and the lines (whiskers) below and above represent the 0th-25th and 75th-100th percentiles, respectively.