The effort to understand data by placing it in a visual context
Source: Times Higher Education
Tufte, E. R. (2001). The visual display of quantitative information.
Resources for the 424 Info Vis course at the University of Washington, by Prof. Maureen Stone and Prof. Polle Zellweger.
# using altair
import pandas as pd
import altair as alt
# you need a dataset
cars_df = pd.read_json("https://github.com/vega/vega-datasets/raw/gh-pages/data/cars.json")
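# you can also load the sample data provided with altair using
# cars_df = alt.load_dataset('cars')
# for list of data sets, run the following command in jupyter:
# alt.datasets.list_datasets()
# if these calls fail on your Altair version, the same sample data is
# available from the separate vega_datasets package:
# from vega_datasets import data; cars_df = data.cars()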
# Build the chart and configure it
chart = alt.Chart(cars_df).mark_circle().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
)
# display it
chart
# Same chart on matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
# use a loop to plot each origin's points in a different color
for origin, group in cars_df.groupby('Origin'):
    plt.plot(group['Horsepower'], group['Miles_per_Gallon'],
             'o', label=origin)
# set the legend and labels
plt.legend(title='Origin')
plt.xlabel('Horsepower')
plt.ylabel('Miles_per_Gallon');
# enable grid
plt.grid(True)
Chart( data ).mark_type( options ).encode( channels )
Parts, numbered left to right: 1 Chart, 2 data, 3 mark_type, 4 options, 5 encode, 6 channels
# alternatively you can reverse mark and encode
Chart( data ).encode( channels ).mark_type( options )
Chart constructs a chart object (OOP style).
data tells Altair what data set to use for the plot. It can be a pandas DataFrame, as in the charts above, or a URL:
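# a url also works
url = 'https://vega.github.io/vega-datasets/data/cars.json'
# Altair cannot look inside a url to infer column types the way it does
# for a DataFrame, so state each type explicitly (:Q quantitative, :N nominal)
alt.Chart(url).mark_circle().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N',  # without the explicit :N, color does not work with a url
)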
mark_type tells Altair how to represent values on the chart. The mark types used in these notes include mark_circle, mark_area, mark_square and mark_bar.
# we can use this command to display multiple charts from a single cell
chart.display()
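# let's modify our chart
# depending on the Altair version, mark_* either changes chart in place
# or returns a modified copy, so assign the result to be safe
chart = chart.mark_area()
# try other mark_* types
chart.display() # this will show the modified plot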
# These are options that affect all the points
chart.mark_square(opacity=0.3, size=100)
encode( channels ) tells Altair how to map columns of the data to visual properties of the marks, such as x, y, color, size, column and row.
These visual properties are referred to as channels.
# plot Displacement vs Cylinders
chart.encode(x="Displacement", y="Cylinders")
# notice how previous options remain if not changed (like color)
# it's better to create a new chart object for new charts
# so that it is not affected by previous changes
alt.Chart(cars_df).mark_circle().encode(x="Displacement", y="Cylinders")
# Notice how values are no longer colored
# bar chart: number of cars for each Cylinders value
alt.Chart(cars_df).mark_bar().encode(
    x="Cylinders",
    y="count(*)",
)
You can describe an aggregation for an axis value with a string in the following format:
'aggregation(variable)'
Use * in place of variable to count every row/observation, as in 'count(*)'. Newer Altair releases write this as 'count()'.
The functions include: sum, mean, median, variance, stdev, distinct, and more.
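To make the aggregation strings concrete, here is a plain-Python sketch of what 'count(*)' and 'mean(Horsepower)' compute when the other axis is Cylinders. The five rows are made up for illustration, and this is only an equivalent calculation, not how Altair performs it internally.

```python
from collections import Counter, defaultdict
from statistics import mean

# made-up rows standing in for a few cars
rows = [
    {"Cylinders": 4, "Horsepower": 90},
    {"Cylinders": 4, "Horsepower": 110},
    {"Cylinders": 6, "Horsepower": 150},
    {"Cylinders": 8, "Horsepower": 200},
    {"Cylinders": 8, "Horsepower": 220},
]

# 'count(*)': number of rows in each Cylinders group
count_by_cyl = Counter(row["Cylinders"] for row in rows)

# 'mean(Horsepower)': average Horsepower in each Cylinders group
grouped = defaultdict(list)
for row in rows:
    grouped[row["Cylinders"]].append(row["Horsepower"])
mean_hp_by_cyl = {cyl: mean(values) for cyl, values in grouped.items()}

print(dict(count_by_cyl))
print(mean_hp_by_cyl)
```

Each key in the two results corresponds to one bar on the chart, and the value is the height of that bar.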
# :N marks Cylinders as nominal, so each Cylinders value is its own category
alt.Chart(cars_df).mark_bar().encode(
    x="Cylinders:N",
    y="count(*)",
)
'sales:Q' tells Altair that the sales column is a quantitative value.

Data Type | Letter | Description |
---|---|---|
quantitative | Q | a continuous real-valued quantity |
ordinal | O | a discrete ordered quantity |
nominal | N | a discrete unordered category |
temporal | T | a time or date value |
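As a rough illustration of how a string such as 'sales:Q' breaks down, the sketch below splits it into a column name and the full type name from the table above. The helper split_type_code is hypothetical and written only for these notes; it is not Altair's own parser.

```python
import re

# letter codes from the table above
TYPE_CODES = {
    "Q": "quantitative",
    "O": "ordinal",
    "N": "nominal",
    "T": "temporal",
}

def split_type_code(shorthand):
    """Split 'column:X' into the column name and the full type name.

    Hypothetical helper for illustration only. The type is None when
    the string carries no letter code.
    """
    match = re.fullmatch(r"(?P<field>.+?)(?::(?P<code>[QONT]))?", shorthand)
    if match is None:
        raise ValueError("empty shorthand string")
    code = match.group("code")
    return {"field": match.group("field"), "type": TYPE_CODES.get(code)}

print(split_type_code("sales:Q"))
print(split_type_code("Cylinders:N"))
print(split_type_code("Horsepower"))
```

When no letter code is given and the data is a DataFrame, Altair picks a type based on the column's contents, which is why the earlier charts worked without codes.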
# use the column or row channel to split the chart into one panel per group
# this is called a trellis plot
alt.Chart(cars_df).mark_bar().encode(
    column="Origin",
    x="Cylinders:N",
    y="count(*)",
)
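# color the bars by Origin; bars for the same Cylinders value are stacked
alt.Chart(cars_df).mark_bar().encode(
    color="Origin",
    x="Cylinders:N",
    y="count(*)",
)
# size each circle by its number of cylinders
alt.Chart(cars_df).mark_circle().encode(
    color="Origin",
    size="Cylinders",
    x="Miles_per_Gallon",
    y="Horsepower",
)
# size each circle by the car's weight
alt.Chart(cars_df).mark_circle().encode(
    color="Origin",
    size="Weight_in_lbs",
    x="Miles_per_Gallon",
    y="Horsepower",
)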
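# raise the limit on the number of rows Altair will plot
chart.max_rows = 10000
# newer Altair releases lift the limit with
# alt.data_transformers.disable_max_rows() instead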