Step 24¶
In this step, we explore fine-tuning Matplotlib visualizations for enhanced customizability. We'll use the all_data dataset generated in Step 21 to create overlapping histograms with Probability Density Functions (PDFs) and legends. Along the way, we'll explain specific parts of the code you'll modify or customize in each section.
1. Setting Up Our Data¶
We start by filtering the all_data dataframe to create two datasets based on circle_color. These will serve as the reaction time data for the two groups (orange and blue).
Dataset Setup:
- orange_reactiontime_df: Contains reaction times for trials where circle_color is 'orange'.
- blue_reactiontime_df: Contains reaction times for trials where circle_color is 'blue'.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# Assuming all_data is already loaded
# If you do not have it yet, go back to Step 21 and load it as instructed
all_data = pd.read_csv('reaction_time_data.csv') # Replace with actual file path
# If you've already got `all_data` loaded, you can skip this step by commenting it out
# Create datasets for the two groups
orange_reactiontime_df = all_data[all_data['circle_color'] == 'orange']['reaction_time']
blue_reactiontime_df = all_data[all_data['circle_color'] == 'blue']['reaction_time']
# Preview the datasets
print("Orange Reaction Times:\n", orange_reactiontime_df.head())
print("Blue Reaction Times:\n", blue_reactiontime_df.head())
Key Parts to Check:
- Ensure all_data is loaded correctly and contains the expected columns (reaction_time, circle_color).
- Replace 'reaction_time_data.csv' with the actual file name/path if needed.
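The first check above can be automated. Here is a minimal sketch, assuming the column names reaction_time and circle_color used in this step. The helper name missing_columns and the small stand-in table are hypothetical, included only so the snippet runs on its own:

```python
import pandas as pd

# Columns this step expects to find in all_data
REQUIRED_COLUMNS = ["reaction_time", "circle_color"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Hypothetical stand-in for all_data (made-up values, not real results)
demo_data = pd.DataFrame({
    "circle_color": ["orange", "blue", "orange", "blue"],
    "reaction_time": [412.0, 389.5, 455.2, 401.3],
})

missing = missing_columns(demo_data)
print("Missing columns:", missing)

# Trials per group, to confirm both colors are present
group_counts = demo_data["circle_color"].value_counts()
print(group_counts)
```

With your own data, you would pass all_data instead of demo_data. An empty list means both columns are present.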
2. Creating Independent Histograms¶
Now that we have two separate datasets, let's create histograms for the orange and blue reaction times individually.
# Orange Reaction Time Histogram
plt.figure(figsize=(8, 6))
plt.hist(orange_reactiontime_df, bins=20, color='orange', alpha=0.6, label='Orange')
plt.title('Histogram of Reaction Times: Orange')
plt.xlabel('Reaction Time (ms)')
plt.ylabel('Frequency')
plt.show()
# Blue Reaction Time Histogram
plt.figure(figsize=(8, 6))
plt.hist(blue_reactiontime_df, bins=20, color='blue', alpha=0.6, label='Blue')
plt.title('Histogram of Reaction Times: Blue')
plt.xlabel('Reaction Time (ms)')
plt.ylabel('Frequency')
plt.show()
Key Parts to Modify:
- bins=20: Adjust the number of bins (granularity) if needed.
- color='orange' and color='blue': Customize colors as desired.
- alpha=0.6: Adjust transparency (range: 0 to 1).
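To get a feel for what the bins setting does before drawing anything, you can look at the bin edges and counts directly. This is a rough sketch using np.histogram, which performs the same binning that plt.hist relies on. The reaction times here are made up and seeded for repeatability:

```python
import numpy as np

# Made-up reaction times (ms), seeded so the example is repeatable
rng = np.random.default_rng(0)
demo_times = rng.normal(loc=400, scale=50, size=200)

# Bin the same values coarsely and finely, without drawing a plot
counts_coarse, edges_coarse = np.histogram(demo_times, bins=5)
counts_fine, edges_fine = np.histogram(demo_times, bins=20)

print("5 bins, width (ms):", edges_coarse[1] - edges_coarse[0])
print("20 bins, width (ms):", edges_fine[1] - edges_fine[0])

# Every observation lands in exactly one bin either way
print(counts_coarse.sum(), counts_fine.sum())
```

Fewer bins give wider, smoother bars. More bins show finer detail but also more noise.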
IpyWidget to Play with the alpha value
You can use an IpyWidget to interactively adjust the alpha value and see the visual effect it has on the histogram.
This will be cooler to do when we overlap our histograms later, but for now here's an example:
from ipywidgets import interact
# The (min, max, step) tuple becomes a slider for alpha
@interact(alpha=(0, 1, 0.05))
def plot_histogram(alpha=0.6):
    plt.figure(figsize=(8, 6))
    plt.hist(orange_reactiontime_df, bins=20, color='orange', alpha=alpha, label='Orange')
    plt.title('Histogram of Reaction Times: Orange')
    plt.xlabel('Reaction Time (ms)')
    plt.ylabel('Frequency')
    plt.show()
This code snippet creates an interactive slider for the alpha value, allowing you to see the histogram change in real time.
What other ways could you use IpyWidgets to interact with your data?
3. Combining into a Two-Paneled Plot¶
Two histograms are cool, but it would be nice if they were attached/connected in some way. We can create a side-by-side comparison of the two histograms using subplots!
To do this, we'll set up a fig figure object with a corresponding axs array to hold the two subplots and then plot the histograms for orange and blue reaction times in each panel.
fig, axs = plt.subplots(1, 2, figsize=(12, 6), constrained_layout=True)
# Panel 1: Orange
axs[0].hist(orange_reactiontime_df, bins=20, color='orange', alpha=0.6, label='Orange')
axs[0].set_title('Orange Reaction Times')
axs[0].set_xlabel('Reaction Time (ms)')
axs[0].set_ylabel('Frequency')
# Panel 2: Blue
axs[1].hist(blue_reactiontime_df, bins=20, color='blue', alpha=0.6, label='Blue')
axs[1].set_title('Blue Reaction Times')
axs[1].set_xlabel('Reaction Time (ms)')
plt.suptitle('Reaction Times by Circle Color')
plt.show()
Key Parts to Modify:
- figsize=(12, 6): Change plot size if necessary.
- Customize subplot titles and labels as needed.
Figure fun
You can also use the fig object to save your figure to a file. For example, fig.savefig('reaction_time_histograms.png') will save the figure as a PNG file in the current directory.
You can also set a global title with fig.suptitle('Overall Title'). The figure above was created with constrained_layout=True, so its spacing is already managed. For figures created without it, fig.tight_layout() is an alternative way to adjust the layout.
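As a small sketch of saving with a couple of commonly used options: dpi and bbox_inches are standard fig.savefig arguments. The data is made up, and the temporary output folder is a hypothetical choice so the example doesn't write into your working directory:

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

# Made-up values, only so there is something to draw
rng = np.random.default_rng(1)
demo_times = rng.normal(loc=400, scale=50, size=100)

fig, ax = plt.subplots(figsize=(6, 4), constrained_layout=True)
ax.hist(demo_times, bins=20, color='orange', alpha=0.6)
ax.set_xlabel('Reaction Time (ms)')
ax.set_ylabel('Frequency')
fig.suptitle('Overall Title')

# Hypothetical output location: a temporary folder instead of the current directory
output_path = os.path.join(tempfile.mkdtemp(), 'reaction_time_histograms.png')

# dpi sets the resolution; bbox_inches='tight' trims surplus whitespace
fig.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close(fig)
print("Saved to", output_path)
```

Changing the file extension (for example to .pdf or .svg) is usually enough to switch output formats.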
4. Overlapping Histograms with PDFs¶
One of my favorite ways to display/compare histograms is to combine the two datasets into a single plot with overlaid histograms. We can then calculate and plot a probability density function for each dataset to visualize the distribution more clearly.
What's a Probability Density Function (PDF)?
A Probability Density Function (PDF) describes the relative likelihood of a continuous random variable taking values near a given point. The area under the curve over a range gives the probability of falling within that range. In this case, we're using the normal distribution PDF to model the reaction time data.
The norm.pdf() function from scipy.stats calculates the PDF of a normal distribution given a set of values, a mean, and a standard deviation.
The normal curve helps us visualize how the data is distributed around the mean and how likely certain values are. Because the area under a PDF is 1, the histograms below are drawn with density=True so that the bars and the curves share the same scale.
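To make the idea of density versus probability concrete, here is a small numeric sketch. The mean of 400 ms and standard deviation of 50 ms are hypothetical values chosen for illustration, not estimates from the dataset:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for illustration: mean 400 ms, SD 50 ms
mu, sigma = 400.0, 50.0

# Density at the mean: the peak height, 1 / (sigma * sqrt(2 * pi))
peak = norm.pdf(mu, mu, sigma)
print("Density at the mean:", peak)

# Probability of landing within one SD of the mean = area under the PDF there
prob_within_1sd = norm.cdf(mu + sigma, mu, sigma) - norm.cdf(mu - sigma, mu, sigma)
print("P(350 <= RT <= 450):", round(prob_within_1sd, 4))

# The total area under the curve is 1 (trapezoid rule over +/- 6 SD)
x = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 2001)
y = norm.pdf(x, mu, sigma)
total_area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
print("Total area:", round(total_area, 4))
```

Note that the density at a point is not itself a probability. Only areas under the curve are.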
plt.figure(figsize=(10, 6))
# Orange Histogram and Curve
plt.hist(orange_reactiontime_df, bins=20, color='orange', alpha=0.4, label='Orange', density=True)
plt.plot(
    np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
        np.mean(orange_reactiontime_df),
        np.std(orange_reactiontime_df),
    ),
    color='orange',
    linestyle='--',
    linewidth=2,
    label='Orange PDF',
)
# Blue Histogram and Curve
plt.hist(blue_reactiontime_df, bins=20, color='blue', alpha=0.4, label='Blue', density=True)
plt.plot(
    np.linspace(blue_reactiontime_df.min(), blue_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(blue_reactiontime_df.min(), blue_reactiontime_df.max(), 500),
        np.mean(blue_reactiontime_df),
        np.std(blue_reactiontime_df),
    ),
    color='blue',
    linestyle='-',
    linewidth=2,
    label='Blue PDF',
)
plt.title('Overlapping Reaction Time Histograms with PDF Curves')
plt.xlabel('Reaction Time (ms)')
plt.ylabel('Density')
plt.legend()
plt.show()
Key Parts to Modify:
- Adjust the np.linspace range to fit the specific dataset more tightly.
- Modify linestyle='--' or linewidth=2 for the curves as needed.
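One possible way to control the np.linspace range is to tie it to the mean and standard deviation rather than to the minimum and maximum. This is a sketch under that assumption. The variable n_sd is a hypothetical knob and the reaction times are made up:

```python
import numpy as np
from scipy.stats import norm

# Made-up reaction times (ms), only for illustration
rng = np.random.default_rng(5)
demo_times = rng.normal(loc=400, scale=50, size=300)

mu = np.mean(demo_times)
sigma = np.std(demo_times)

# Hypothetical knob: how many SDs either side of the mean the curve covers
n_sd = 3

x = np.linspace(mu - n_sd * sigma, mu + n_sd * sigma, 500)
y = norm.pdf(x, mu, sigma)

print("Curve spans", x[0], "to", x[-1])
```

A smaller n_sd draws the curve more tightly around the mean. A larger one shows more of the tails.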
Breaking down the PDF Calculation:¶
The following code snippet is what calculates the PDF line for the orange histogram:
plt.plot(
    np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
        np.mean(orange_reactiontime_df),
        np.std(orange_reactiontime_df),
    ),
    color='orange',
    linestyle='--',
    linewidth=2,
    label='Orange PDF Curve',
)
Let's break it down piece by piece:
- np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500): This creates 500 evenly spaced points between the minimum and maximum reaction times in the orange dataset. These are the x-values of the curve.
- norm.pdf(...): This calculates the PDF values (the y-values of the curve) using the mean and standard deviation of the orange dataset.
  - np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500): The points at which the PDF is evaluated. We repeat the same call here so that the PDF values line up with the x-values.
  - np.mean(orange_reactiontime_df): The mean of the orange reaction times. This is used as the center of the normal distribution.
  - np.std(orange_reactiontime_df): The standard deviation of the orange reaction times. This controls the spread of the normal distribution.
- color, linestyle, linewidth, label: These are used to customize the appearance of the PDF curve and how it appears in the legend (respectively).
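Since the same np.linspace call appears twice, one option is to compute the x-values once and reuse them. The sketch below wraps that in a helper. The name normal_pdf_curve is hypothetical (not part of SciPy or Matplotlib), and the data is made up:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf_curve(values, n_points=500):
    """Return (x, y) for a normal curve fitted to values by mean and SD."""
    values = np.asarray(values, dtype=float)
    x = np.linspace(values.min(), values.max(), n_points)
    # np.std on an array defaults to the population SD (ddof=0)
    y = norm.pdf(x, np.mean(values), np.std(values))
    return x, y

# Made-up reaction times (ms) standing in for orange_reactiontime_df
rng = np.random.default_rng(2)
demo_times = rng.normal(loc=400, scale=50, size=300)

x, y = normal_pdf_curve(demo_times)
print(x.shape, y.shape)
```

The pair could then be passed straight to plt.plot(x, y, color='orange', linestyle='--', linewidth=2), which keeps the plotting call short and guarantees the x-values and PDF values match.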
5. Fine-Tuning and Customizing¶
Add legends, gridlines, and vertical lines indicating mean reaction times.
plt.figure(figsize=(10, 6))
# Histograms
plt.hist(orange_reactiontime_df, bins=20, color='orange', alpha=0.5, label='Orange', density=True)
plt.hist(blue_reactiontime_df, bins=20, color='blue', alpha=0.5, label='Blue', density=True)
# PDFs
plt.plot(
    np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
        np.mean(orange_reactiontime_df),
        np.std(orange_reactiontime_df),
    ),
    color='orange',
    linestyle='--',
    linewidth=2,
    label='Orange PDF Curve',
)
plt.plot(
    np.linspace(blue_reactiontime_df.min(), blue_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(blue_reactiontime_df.min(), blue_reactiontime_df.max(), 500),
        np.mean(blue_reactiontime_df),
        np.std(blue_reactiontime_df),
    ),
    color='blue',
    linestyle='-',
    linewidth=2,
    label='Blue PDF Curve',
)
# Customizations
plt.title('Customized Reaction Time Histograms')
plt.xlabel('Reaction Time (ms)')
plt.ylabel('Density')
plt.grid(visible=True, which='both', linestyle='--', alpha=0.6)
plt.axvline(np.mean(orange_reactiontime_df), color='orange', linestyle='-', label='Mean Orange')
plt.axvline(np.mean(blue_reactiontime_df), color='blue', linestyle='-', label='Mean Blue')
plt.legend(loc='upper right', frameon=True, fontsize=10)
plt.show()
Key Parts to Modify:
- Adjust plt.grid(...) for appearance.
- Add or adjust vertical lines (plt.axvline(...)) for additional statistical markers.
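As an illustration of additional statistical markers, the sketch below adds a median line and lines one standard deviation either side of the mean. The choice of markers, colors, and line styles is only an example, and the reaction times are made up:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up reaction times (ms), only for illustration
rng = np.random.default_rng(3)
demo_times = rng.normal(loc=400, scale=50, size=300)

mean_rt = np.mean(demo_times)
median_rt = np.median(demo_times)
sd_rt = np.std(demo_times)

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(demo_times, bins=20, color='orange', alpha=0.5, density=True)

# Solid line for the mean, dashed for the median
ax.axvline(mean_rt, color='black', linestyle='-', label='Mean')
ax.axvline(median_rt, color='gray', linestyle='--', label='Median')

# Dotted lines one standard deviation either side of the mean (no legend entry)
for offset in (-sd_rt, sd_rt):
    ax.axvline(mean_rt + offset, color='black', linestyle=':', linewidth=1)

ax.set_xlabel('Reaction Time (ms)')
ax.set_ylabel('Density')
ax.legend()
```

In a notebook the figure appears at the end of the cell. In a script, you would finish with plt.show().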
6. Cleaning Up Your Figure¶
One thing you might notice is that at this point, our figure is pretty bloated. We really don't need all that information hanging out in the legend, for example.
We can spend some time going back into our figure and cleaning up the code to make it appear a bit nicer, and more publication-ready. We can...
- Remove unnecessary labels from the legend.
- Adjust the font size of the legend so that all entries are legible.
- Remove the gridlines if they're not necessary.
- Adjust the bin size of the histograms so that they're the same width for both groups.
- Remove tick marks from the axes to clean up the figure.
First, we're going to calculate bin sizes for the histograms so that they're the same width for both groups.¶
# Define bin size
bin_size = 50 # for example, 50 ms
# Calculate bin edges
orange_bins = np.arange(min(orange_reactiontime_df), max(orange_reactiontime_df) + bin_size, bin_size)
blue_bins = np.arange(min(blue_reactiontime_df), max(blue_reactiontime_df) + bin_size, bin_size)
Now, we'll implement these bin sizes in our histograms.¶
plt.figure(figsize=(10, 6))
# Histograms
plt.hist(orange_reactiontime_df, bins=orange_bins, color='orange', alpha=0.5, label='Orange', density=True)
plt.hist(blue_reactiontime_df, bins=blue_bins, color='blue', alpha=0.5, label='Blue', density=True)
# PDFs
plt.plot(
    np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(orange_reactiontime_df.min(), orange_reactiontime_df.max(), 500),
        np.mean(orange_reactiontime_df),
        np.std(orange_reactiontime_df),
    ),
    color='orange',
    linestyle='--',
    linewidth=2,
    # No label here: the histogram entries already identify each color in the legend
)
plt.plot(
    np.linspace(blue_reactiontime_df.min(), blue_reactiontime_df.max(), 500),
    norm.pdf(
        np.linspace(blue_reactiontime_df.min(), blue_reactiontime_df.max(), 500),
        np.mean(blue_reactiontime_df),
        np.std(blue_reactiontime_df),
    ),
    color='blue',
    linestyle='-',
    linewidth=2,
)
# Customizations
plt.title('Customized Reaction Time Histograms', fontsize=20)
plt.xlabel('Reaction Time (ms)', fontsize=16)
plt.ylabel('Density', fontsize=16)
plt.grid(visible=True, which='both', linestyle='-', alpha=0.6) # Change linestyle to '-'
plt.axvline(np.mean(orange_reactiontime_df), color='orange', linestyle='-')
plt.axvline(np.mean(blue_reactiontime_df), color='blue', linestyle='-')
# Clean up the legend
plt.legend(loc='upper right', frameon=True, fontsize=12) # Increase font size
plt.tick_params(axis='both', which='both', length=0) # Hide tick marks on both axes (tick labels remain)
plt.show()
Other things to consider¶
There are more ways you could customize this plot...
- You could add texture to the bars of the histogram to make them more visually distinct
- You could make sure your histograms start at the same value so that the bins are perfectly aligned
- ... and more!
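The first two ideas could look something like the sketch below: one shared set of bin edges for both groups, plus a hatch pattern per group. The data is made up, and the particular hatch patterns and the choice to start the bins at a multiple of bin_size are just illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up reaction times (ms) standing in for the two groups
rng = np.random.default_rng(4)
demo_orange = rng.normal(loc=420, scale=50, size=200)
demo_blue = rng.normal(loc=380, scale=45, size=200)

bin_size = 50  # ms, as in the cleanup section

# One set of edges for both groups, starting at a multiple of bin_size
lowest = min(demo_orange.min(), demo_blue.min())
highest = max(demo_orange.max(), demo_blue.max())
start = np.floor(lowest / bin_size) * bin_size
shared_bins = np.arange(start, highest + bin_size, bin_size)

fig, ax = plt.subplots(figsize=(10, 6))
# hatch adds a texture to the bars; edgecolor draws their outlines
ax.hist(demo_orange, bins=shared_bins, color='orange', alpha=0.5,
        hatch='//', edgecolor='black', label='Orange', density=True)
ax.hist(demo_blue, bins=shared_bins, color='blue', alpha=0.5,
        hatch='..', edgecolor='black', label='Blue', density=True)
ax.set_xlabel('Reaction Time (ms)')
ax.set_ylabel('Density')
ax.legend()
```

Because both calls receive the same shared_bins array, the bars of the two groups line up edge to edge.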
Summary¶
In this final step of our learning path, we've explored fine-tuning Matplotlib visualizations to create publication-quality figures. By layering histograms, PDFs, and annotations, we've enhanced the clarity and visual appeal of our data representations. These techniques can be applied to a wide range of datasets and research questions, allowing you to create compelling visualizations that effectively communicate your findings.
Now you should have everything you need to create a publication-quality figure in Python!