Matplotlib vs Plotly Express: The Ultimate Python Data Visualization Brawl 🥊
👑 Imagine the world of data visualization as a friendly game of chess. On one side sits the seasoned champion, Matplotlib, with years of experience and a loyal following. On the other side, the enthusiastic newcomer, Plotly, bringing fresh ideas and modern flair.
Join us as we watch this friendly face-off between the old guard and the rookie, exploring their unique strengths and styles. Whether you’re a data aficionado or a curious explorer, this match will help you pick the perfect partner for your data storytelling adventures.
Read till the end for some beautiful interactive charts and cheat sheets!
📊What is Matplotlib?
Matplotlib is often considered the foundational library for data visualization in Python and has been a stalwart companion of data scientists, analysts and researchers since 2003. The charts it produces are static by default, simple yet powerful, and publication-quality.
✨Key Features and Pros of Matplotlib:
- Wide Range of Plot Types: Supports an extensive range of plot types, ensuring that you can depict your data in the most suitable format.
- Customization: Offers granular control over every aspect of the visualizations.
- Publication quality: One of the standout features of Matplotlib is its ability to generate publication-quality images that meet the standards of scientific publications and presentations.
- Integration with other libraries: Matplotlib can be easily integrated with other Python libraries such as NumPy, Pandas and SciPy, allowing seamless data manipulation and analysis before plotting.
⚠️Cons of Matplotlib:
- Steep learning curve: The coding style is influenced by MATLAB, which can be challenging for newcomers who are not familiar with it.
- Less visually appealing graphs compared to Plotly: Charts are static by default and are not well suited to web apps.
- Verbose code: Even slight customizations can take multiple lines of code, which can make visualizations more complex to build than they should be.
- Limited support for dashboards: Matplotlib is usually not a good choice for building interactive dashboards or web apps.
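To make the publication-quality point above a little more concrete, here is a minimal sketch of saving a figure as a high-resolution PNG and as a vector PDF. The data, file names, and the 300 dpi setting are my own assumptions for illustration; journals differ in what they require.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data, purely for illustration
x = np.linspace(0, 10, 200)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, color="tab:blue", linewidth=1.5, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A simple static figure")
ax.legend()
fig.tight_layout()

# Hypothetical output location; a temporary folder keeps the example tidy
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "figure.png")
pdf_path = os.path.join(out_dir, "figure.pdf")

fig.savefig(png_path, dpi=300)  # raster image at 300 dots per inch
fig.savefig(pdf_path)           # vector output, format inferred from the extension
plt.close(fig)
```

The vector PDF scales without pixelation, which is usually what print layouts want, while the PNG is handy for slides.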
📊What is Plotly?
Plotly, a relative newcomer to the world of data visualization in Python, has been making waves since its inception. Born in 2013 out of the need for dynamic and interactive data visualizations, Plotly brings a breath of fresh air to the Python ecosystem. The charts it produces are not just informative; they’re engaging, interactive, and web-ready.
While Matplotlib has long been the foundation for static visualizations, Plotly steps onto the stage with a focus on interactivity and modern web-based charts, offering a vibrant alternative to traditional data plotting tools.
✨Key Features and Pros of Plotly:
- Interactivity: Plotly excels in creating interactive visualizations, enabling user interactions like hovering, clicking, zooming, and panning.
- Web Integration: Plotly is designed for seamless integration into web applications, blogs, and dashboards, making it ideal for online data sharing.
- Versatile Charts: Plotly supports a wide range of chart types, from scatter plots to 3D plots, allowing for diverse and engaging visualizations.
- Easy Customization: Plotly offers extensive customization options for colours, fonts, markers, and labels, making it suitable for users of all skill levels.
- Documentation: In my experience, Plotly’s documentation is years ahead of Matplotlib’s. It’s easy to find information about the kind of plot that you are trying to create. Matplotlib’s website, by comparison, can be confusing, and if you are a beginner, it will probably take a while to find what you are looking for.
⚠️Cons of Plotly:
- Web Dependencies: While Plotly is excellent for web-based projects, it may not be the most suitable choice for creating static, publication-quality images without additional steps or libraries.
- Performance with Large Datasets: Plotly may experience performance issues with extremely large datasets or complex visualizations, potentially leading to slower rendering times.
- Dependency on Plotly Cloud: The open-source library renders charts locally, but Plotly’s hosted features, such as publishing and sharing charts online through Chart Studio, require the company’s cloud service. Users who prefer to work offline or in secure environments cannot rely on the features that are tied to that platform.
🎨 Visualization Comparison
I know you’re not here to just read about these libraries but to see real-life comparisons. And that’s what we will be doing now. To do so, I will use the World University Rankings 2023 from Kaggle.
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import numpy as np
# Load the dataset into the variable data
data = pd.read_csv("Rankings2023.csv")
# See the first few rows of the DataFrame
data.head()
I preprocessed the data by removing duplicates, changing data types, etc. I’ll skip over this step since the essence of this article is to compare the charts, not discuss data-preprocessing.
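For readers who want a rough idea of what that step can look like, here is a hedged sketch on a tiny stand-in DataFrame. The values are made up, and the assumption that the student counts arrive as text with thousands separators is mine; check your own copy of the CSV before reusing this.

```python
import pandas as pd

# Tiny stand-in for the real CSV; the rows and values are made up
raw = pd.DataFrame({
    "Name of University": ["Alpha University", "Beta Institute", "Beta Institute"],
    "No of student": ["20,965", "2,237", "2,237"],
    "No of student per staff": ["10.6", "6.3", "6.3"],
})

# Remove exact duplicate rows and renumber the index
clean = raw.drop_duplicates().reset_index(drop=True)

# Strip thousands separators, then convert text to numbers.
# errors="coerce" turns anything unparseable into NaN instead of raising.
clean["No of student"] = pd.to_numeric(
    clean["No of student"].str.replace(",", "", regex=False), errors="coerce"
)
clean["No of student per staff"] = pd.to_numeric(
    clean["No of student per staff"], errors="coerce"
)
```

Getting the columns into numeric types matters later on, because both libraries treat text columns as categories when drawing axes.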
Let’s start off with a bar chart. Bar charts are among the most popular chart types and are relatively easy to make. Let's look at the top 5 universities by rank and the total number of students at each institution.
top_5_universities = data.head(5)
fig = px.bar(top_5_universities, x='Name of University', y='No of student', color='Location')
fig.show()
Plotly Express generated an elegant visualization with minimal code. To create the graph, you only need the data frame, the x-axis column, and the y-axis column. I also passed the color argument to color the bars by location. Hovering over the bars displays the exact student counts.
top_5_universities = data.head(5)
plt.bar(x=top_5_universities['Name of University'], height=top_5_universities['No of student'])
plt.show()
Well… the result is kind of disappointing. I attempted to use the same amount of code as I did with Plotly, but the outcome is unsatisfactory. The labels are overlapping, making it challenging to interpret the graph. To improve its presentation, additional lines of code are needed.
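Here is one way those additional lines might look. This is a sketch on made-up data with hypothetical university names, not the exact code behind the chart above: it widens the figure, rotates the tick labels so they stop overlapping, and prints the value on each bar as a rough stand-in for Plotly's hover text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for top_5_universities
top_5 = pd.DataFrame({
    "Name of University": [
        "University of Example North",
        "Example Institute of Technology",
        "Sample State University",
        "Placeholder College London",
        "Demo University of Science",
    ],
    "No of student": [20965, 2237, 16164, 18545, 11415],
})

fig, ax = plt.subplots(figsize=(8, 5))  # a wider canvas gives labels more room
bars = ax.bar(top_5["Name of University"], top_5["No of student"])

ax.set_xlabel("Name of University")
ax.set_ylabel("No of student")

# Rotate the long names and right-align them under their bars
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")

# Print each bar's value above it (bar_label needs Matplotlib 3.4 or newer)
ax.bar_label(bars)

fig.tight_layout()  # keep the rotated labels inside the figure
```

That is roughly six extra lines for a readable chart, which is the verbosity trade-off in a nutshell.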
A histogram visually displays the distribution of data by counting data points within predefined intervals or bins. It helps identify patterns, central tendencies, and outliers, and in this case, we’ll use it to explore student-to-staff ratios in the top 30 universities based on their locations.
Plotly delivers once again. With minimal code, we create an interactive histogram that simplifies data interpretation. It’s evident that the USA has the most top-ranked universities in the top 30, and some of these also exhibit the lowest student-to-staff ratios.
top_30_universities = data.head(30)
px.histogram(top_30_universities, x='No of student per staff', color='Location')
Matplotlib produced a satisfactory result, but only with help from another library, Seaborn, which is built on top of Matplotlib. It also needed an extra import to reach a similar outcome. While lacking interactivity, it still provides a decent visualization. Note that the bin size differs from the Plotly version.
import seaborn as sns
sns.histplot(data=top_30_universities, x='No of student per staff', hue='Location')
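For comparison, here is a sketch of the same kind of chart in plain Matplotlib with no Seaborn, using explicit bin edges so the bin size is under your control. The data is made up and the choice of 5 bins is arbitrary. If you only want to match bins across libraries, `px.histogram` takes an `nbins` argument and `sns.histplot` takes `bins`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for top_30_universities
df = pd.DataFrame({
    "Location": ["United States"] * 6 + ["United Kingdom"] * 4,
    "No of student per staff": [6.3, 7.1, 8.4, 9.6, 10.6, 13.1, 10.2, 11.1, 11.8, 14.5],
})
col = "No of student per staff"

# Compute one shared set of bin edges so every location uses the same bins
bin_edges = np.histogram_bin_edges(df[col], bins=5)

# One array of values per location (groupby sorts the location names)
grouped = list(df.groupby("Location"))
labels = [name for name, _ in grouped]
values = [group[col].to_numpy() for _, group in grouped]

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(values, bins=bin_edges, stacked=True, label=labels)
ax.set_xlabel(col)
ax.set_ylabel("count")
ax.legend(title="Location")
fig.tight_layout()
```

It works, but the manual grouping is exactly the kind of boilerplate that the `color` and `hue` arguments hide.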
Scatter plots are used to visualize and assess potential relationships or correlations between two variables in a dataset. For this chart, let's look at the relationship between university rank and research score (on a scale of 1–100).
Transitioning from one chart to another is fairly simple, and the code is very consistent across all charts. Moreover, the axis labels are automatically adjusted, and the documentation on the website is very clear. The interactive charts are a game changer when exploring the visualizations. The chart makes the inverse relationship clear: research score tends to fall as the rank number increases.
px.scatter(data, x='University Rank', y='Research Score')
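If you want a number to go with the visual impression, a correlation coefficient is a quick check. The sketch below uses made-up values rather than the real rankings, so it only shows the mechanics: a negative coefficient means the score falls as the rank number rises.

```python
import pandas as pd

# Made-up values, purely to show the mechanics
df = pd.DataFrame({
    "University Rank": [1, 2, 3, 4, 5, 6, 7, 8],
    "Research Score": [99.7, 99.0, 98.2, 96.7, 97.0, 93.1, 94.8, 90.5],
})

# Pearson correlation between the two columns (the pandas default)
r = df["University Rank"].corr(df["Research Score"])

print(f"correlation: {r:.2f}")
```

On the real dataset both columns need to be numeric first, otherwise `corr` cannot be computed.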
In the Matplotlib version, drawn here through Seaborn, the tick intervals on the x-axis were not adjusted automatically, resulting in a cluttered axis. While the code is reusable and adjustable, addressing the clutter requires additional lines of code, which can be challenging for beginners. For datasets with a narrower value range on the axes, Matplotlib may offer a cleaner visualization.
sns.scatterplot(data=data, x='University Rank', y='Research Score')
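One possible way to tidy up the cluttered x-axis is sketched below. It rests on an assumption I cannot verify from the chart alone: that the rank column was read as text, so every value gets its own tick. Converting it to numbers and capping the tick count with `MaxNLocator` gives evenly spaced ticks. The data here is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator

# Made-up stand-in for the dataset; ranks stored as text on purpose
df = pd.DataFrame({
    "University Rank": [str(i) for i in range(1, 61)],
    "Research Score": [100 - 0.5 * i for i in range(1, 61)],
})

# Convert ranks to numbers; unparseable entries become NaN
df["University Rank"] = pd.to_numeric(df["University Rank"], errors="coerce")

fig, ax = plt.subplots()
ax.scatter(df["University Rank"], df["Research Score"], s=12)
ax.set_xlabel("University Rank")
ax.set_ylabel("Research Score")

# Ask for at most about 8 intervals on the x-axis, at whole-number positions
ax.xaxis.set_major_locator(MaxNLocator(nbins=8, integer=True))
fig.tight_layout()
```

With a numeric column Matplotlib picks sensible ticks on its own in most cases, so the locator line is only needed when you want tighter control.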
💹Beautiful Plotly Chart Samples:
Hover over the charts and see the magic!
🚦(Use your laptop/PC for the best experience)
Gender Ratio and Age on Dating Sites:
This one confirms your suspicions. At the age of 23 (the peak age for online dating), there are 2.5 men for every 1 woman on dating sites.
Men’s Facial Hair Trends
The data for this graph is extracted from a study on grooming trends, where the author suggests that shifts in preferences for facial hair styles are indicative of broader patterns and trends in social behaviour.
Age of Nobel Prize Winners by Field
The box plot shows the ages at which Nobel laureates received their prizes. Malala Yousafzai, who was 17 when she received the Peace Prize, shows up as an outlier on the chart.
In conclusion, if I had to make a choice, I would lean towards Plotly Express. Its ability to create stunning visualizations with minimal code, additional functionalities, and interactivity makes it a compelling option. However, the right choice ultimately depends on your experience level and specific project requirements. So, whether it’s Matplotlib or Plotly Express, remember that the best library for you is the one that best aligns with your skills and the goals of your data visualization project.