Madonna

STAT 253: Statistical Machine Learning

Author

Natalia Morales, Claire Kuno, Yanxing Li, Kalid Ali

Published

May 2, 2025

Research Goals

Our goal for this report was to explore and summarize the musical characteristics of Madonna’s most successful songs, those that reached the Billboard Top 100. As part of a deep dive for a radio station, we aimed to identify patterns across her charting hits using data-driven clustering, helping the station better understand the variety within her catalog and potentially inform future playlist decisions.

Data

We used a dataset of songs that appeared on the Billboard Top 100, focusing specifically on those performed by Madonna. The data included her songs that made the chart from across her career, along with musical and popularity-based features pulled from the Spotify API. We filtered the full dataset to include only Madonna’s songs because we were interested in analyzing how her music clusters based on audio traits.

Each song in the dataset included a variety of numeric features such as danceability, energy, acousticness, valence, tempo, and loudness, which capture different aspects of musical style and mood. Other variables included speechiness, instrumentalness, duration, popularity, and weeks on the Billboard chart, giving insight into both the sound and success of each song. There were no missing values, and we didn’t create any new variables. Preprocessing was minimal: for songs that appeared more than once in the data we kept one randomly sampled row, and we moved the song names into the row names so they wouldn’t be treated as predictors in the clustering process. The final dataset contains 52 songs and 15 features.
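The preprocessing described above can be sketched as follows. This is a minimal illustration in base R on a made-up data frame: `toy_music`, its column values, and `one_artist` are hypothetical names and numbers, not the Billboard data, and the sketch keeps the first row per song where our report sampled one at random.

```r
# Hypothetical toy data: "Song A" appears twice, as can happen in the full dataset
toy_music <- data.frame(
  performer    = c("Madonna", "Madonna", "Madonna", "Other Artist"),
  song         = c("Song A", "Song A", "Song B", "Song C"),
  danceability = c(0.70, 0.72, 0.55, 0.40),
  energy       = c(0.80, 0.78, 0.35, 0.60)
)

# Keep one artist and drop the performer column
one_artist <- toy_music[toy_music$performer == "Madonna",
                        c("song", "danceability", "energy")]

# Keep a single row per song (assumption: first occurrence instead of a random one)
one_artist <- one_artist[!duplicated(one_artist$song), ]

# Move song names into the row names so they are not treated as features
rownames(one_artist) <- one_artist$song
one_artist$song <- NULL

dim(one_artist)    # rows = songs, columns = features
anyNA(one_artist)  # check for missing values
```

Keeping the song names as row names means that `dist()` and `hclust()` carry them through as labels, which is what makes the dendrogram readable.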

Cluster Analysis

Implementation

See code below for full details.

View Code

Hierarchical Clustering

madonna <- my_artist %>% 
  column_to_rownames("song")

# All features are quantitative or logical (TRUE/FALSE), so we standardize
# them with scale() and use Euclidean distances between songs.
# Fit hierarchical clustering with the "complete" linkage method.
hier_model <- hclust(dist(scale(madonna)), method = "complete")

# Number of songs (rows) and features (columns);
# the number of songs bounds the values of K tried in the elbow plot.
dim(madonna)

[1] 52 15

Insights

We used hierarchical clustering with the complete linkage method to group Madonna’s Billboard Top 100 songs by similarity in musical features. We chose this algorithm because it constructs a full hierarchy of relationships without requiring a predefined number of clusters. We also tuned K-means clustering and assessed it with the total within-cluster sum of squares (SS) and the average silhouette; both metrics suggested that k=2 would be the most appropriate number of clusters. However, the dendrogram produced by hierarchical clustering shows that certain songs, such as “” and “”, each have a style of their own, reflecting the artist’s intentions. We were concerned that 2 clusters would not capture the nuance of Madonna’s catalog, so we used hierarchical clustering for this report. We included all numerical predictors in the clustering: features like danceability, energy, acousticness, loudness, valence, tempo, and popularity, among others. To ensure comparability across variables, we scaled the data so each feature had equal weight in the distance calculations.
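As a rough illustration of how the two K-means criteria mentioned above can be computed, here is a minimal sketch on simulated data. The matrix `toy`, the range of k, and the `nstart` value are assumptions made for the example; this is not the tuning code from our report.

```r
library(cluster)  # provides silhouette()

set.seed(253)
# Hypothetical data: 30 "songs" with 4 numeric features, drawn from two loose groups
toy <- rbind(
  matrix(rnorm(15 * 4, mean = 0), ncol = 4),
  matrix(rnorm(15 * 4, mean = 3), ncol = 4)
)
toy_scaled <- scale(toy)   # standardize so each feature has equal weight
d <- dist(toy_scaled)      # Euclidean distances, needed for the silhouette

k_values <- 2:8
results <- data.frame(k = k_values, within_ss = NA_real_, avg_sil = NA_real_)

for (i in seq_along(k_values)) {
  km <- kmeans(toy_scaled, centers = k_values[i], nstart = 20)
  # Total within-cluster sum of squares: look for an "elbow" as k grows
  results$within_ss[i] <- km$tot.withinss
  # Average silhouette width: larger values mean better-separated clusters
  sil <- silhouette(km$cluster, d)
  results$avg_sil[i] <- mean(sil[, "sil_width"])
}

results
best_k <- results$k[which.max(results$avg_sil)]
```

The within-cluster SS tends to shrink as k grows, so it is read by looking for an elbow, whereas the average silhouette is read by looking for its maximum. The two criteria need not agree with what a dendrogram suggests.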

To determine the number of clusters, we used both visual inspection of the dendrogram and the elbow plot from K-means clustering. The dendrogram revealed that with fewer than 7 clusters, large groups—especially the pink and purple branches—merged too early, combining songs with noticeably different styles. Choosing k=7 allowed those branches to split more meaningfully, forming clearer groupings. The elbow plot confirmed this choice, showing diminishing returns in within-cluster variation reduction after about 6–7 clusters. A heatmap of the scaled features further helped interpret each cluster by showing which traits (e.g., high energy, low acousticness) were most pronounced across songs.

The resulting clusters reflected patterns based on levels of acousticness, energy, danceability, and other features. Some groups were rich in upbeat, high-energy dance tracks, while others contained slower, more emotional ballads. It is important to note that, while genre can play a role in dividing the songs into clusters, we clustered on numerical features, which are sometimes typical of a specific genre and other times unique to a song. Two songs, “Don’t Cry for Me Argentina” and “Hanky Panky”, stood out by forming singleton clusters. One possible explanation is that both appear to have been made for movies or theater, so features such as tempo or loudness may differ from those of songs written for an album or for radio play. Their isolation highlights how hierarchical clustering can capture nuanced musical differences, revealing outliers that traditional genre categories might overlook.
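To make the idea of singleton clusters concrete, the sketch below cuts a complete-linkage tree and lists any clusters that contain a single observation. The data are simulated, with one deliberately extreme row; the names `toy`, `tree`, and `singletons` and the choice of k = 3 are assumptions for illustration only.

```r
set.seed(253)
# Hypothetical data: 12 "songs" and 3 features; the last row is a deliberate outlier
toy <- matrix(
  rnorm(12 * 3), ncol = 3,
  dimnames = list(paste("Song", 1:12), c("energy", "acousticness", "tempo"))
)
toy[12, ] <- c(8, -8, 8)

toy_scaled <- scale(toy)
tree <- hclust(dist(toy_scaled), method = "complete")

# Cut the tree into k clusters and count the members of each
clusters <- cutree(tree, k = 3)
sizes <- table(clusters)

# Clusters with exactly one member are singletons
singleton_ids <- as.integer(names(sizes)[sizes == 1])
singletons <- names(clusters)[clusters %in% singleton_ids]
singletons

# Mean of each scaled feature per cluster, the same information a heatmap conveys
cluster_means <- aggregate(as.data.frame(toy_scaled),
                           by = list(cluster = clusters), FUN = mean)
cluster_means
```

Because complete linkage merges clusters according to their farthest members, an observation that is far from everything else tends to join the tree last and therefore stays alone over a wide range of k.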

Dimension Reduction

Implementation

See code below for full details.

View Code

# Run PCA on all features.
# center = TRUE and scale. = TRUE standardize each feature so that no single
# feature dominates because of its units.
pca_small <- prcomp(madonna, scale. = TRUE, center = TRUE)

Insights

The principal components (PCs) from our dimension reduction analysis gave us insight into which features characterize the songs that took Madonna onto the Billboard Top 100. Principal Component Analysis (PCA) reduces the number of columns in the data by combining correlated features into new variables, the PCs, which are ordered by the share of the variability in the data that each explains. The first PC captured 0.24235, or about 24%, of the variance in the data, with a starkly negative loading for acousticness and positive loadings for danceability, energy, loudness, spotify_popularity, tempo, and valence. The second PC explained 0.11775, or around 12%, of the variance, with positive loadings for speechiness, tempo, and valence and negative loadings for billboard_weeks and spotify_popularity.
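The following sketch shows where loadings and proportions of variance come from in a `prcomp()` fit. It uses simulated data in which two features move together and a third moves in the opposite direction; the data frame `toy` and its column names are hypothetical and only borrow feature names for readability.

```r
set.seed(253)
# Hypothetical data: energy and loudness share a signal, acousticness opposes it,
# and speechiness is unrelated noise
n <- 50
base_signal <- rnorm(n)
toy <- data.frame(
  energy       = base_signal + rnorm(n, sd = 0.3),
  loudness     = base_signal + rnorm(n, sd = 0.3),
  acousticness = -base_signal + rnorm(n, sd = 0.3),
  speechiness  = rnorm(n)
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Loadings: the weight of each original feature in each PC
round(pca$rotation[, 1:2], 2)

# Proportion of variance explained by each PC
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

# PC scores are the standardized data multiplied by the loadings
scores_by_hand <- scale(toy) %*% pca$rotation
```

Features with loadings of opposite sign on the same PC move in opposite directions along it. The overall sign of a PC is arbitrary, so only the relative signs of the loadings within a PC carry meaning.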

We limited our analysis to the first 2 PCs because the variance explained by each additional PC levels off after the second. The scree (elbow) plot visualizes the percentage of variance explained by each PC, and it showed us that after the second PC the additional variance explained stagnates. Among the features in this reduced representation of Madonna’s Billboard Top 100 songs, tempo and valence stand out because they load positively on both retained PCs, so we suggest that they are associated with the songs that reached the Billboard Top 100. Together, our retained PCs account for 0.36009, or around 36%, of the total variance in the dataset, so they summarize the data only partially.
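The arithmetic behind the retained variance can be checked directly. The first part of this sketch uses the two proportions reported above; the second part repeats the calculation on simulated data, where `toy` and the 0.80 threshold are arbitrary assumptions chosen for illustration.

```r
# Proportions of variance reported above for the first two PCs
reported <- c(PC1 = 0.24235, PC2 = 0.11775)
round(100 * reported)  # whole-percent values for each PC
sum(reported)          # combined share of the two retained PCs

# The same quantities from a fitted PCA on hypothetical data
set.seed(253)
toy <- as.data.frame(matrix(rnorm(40 * 6), ncol = 6))
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

prop_var <- pca$sdev^2 / sum(pca$sdev^2)  # per-PC share, as in a scree plot
cum_var <- cumsum(prop_var)               # running total across PCs

# Smallest number of PCs whose cumulative share reaches the chosen threshold
n_pcs_80 <- which(cum_var >= 0.80)[1]
```

The per-PC shares are what a scree plot shows, and the running total is what a cumulative-variance plot shows. Choosing the number of PCs by an elbow and choosing it by a cumulative threshold are different rules and can give different answers.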

Conclusions

Overall, our analysis provided insight into the musical features that contributed to Madonna’s success on the Billboard Top 100. We began by clustering her songs to uncover patterns and similarities among her most popular tracks. The resulting clusters revealed clear distinctions based on features such as acousticness, energy, and danceability. Two major patterns emerged: some songs were upbeat, high-energy dance tracks, while others were slower, more emotional ballads. Both types achieved commercial success, highlighting the range and versatility of Madonna’s musical style. This diversity underscores the idea that there is no single formula for a hit song in her catalog. In interpreting these patterns, it’s important to note that although genre can influence how songs are grouped, our clustering focused on numerical musical attributes. Some of these features align with genre conventions, while others capture characteristics that are unique to individual songs, allowing for a more nuanced understanding of what makes Madonna’s music stand out.

For a future playlist, we would like to acknowledge the range of slow and upbeat music Madonna has produced. To capture the essence of her style, our dimension reduction analysis shows that tempo and valence are positively correlated in her songs: tracks with a higher tempo also tend to have higher valence, that is, a more positive mood. Ultimately, Madonna is a strong artist to include across multiple playlists, because her songs offer everything from movie tracks to slow emotional ballads to danceable high-energy music. A possible limitation of our research stems from our choice of artist: Madonna is a flexible artist with successes in many genres of music. She does not produce songs with a single type of slow or fast beat. Instead, her discography plays with many different styles. As a result, when we investigated the patterns across her songs, we saw the same features appear in different principal components with both positive and negative loadings.

Contributions

Hierarchical clustering and dendrograms were done by Natalia and Yanxing. The PCA was done by Kalid and Claire. Natalia then wrote the data description and the insights on the hierarchical clustering, Claire and Kalid wrote the insights on the PCA, and Yanxing wrote the conclusions. We all revised and edited the code and written parts of the report. Visualizations were made and chosen as a group during work time in class.

Appendix

# put code for any other methods or visualizations that you considered here
# use comments to explain what your code is doing

```{r}
#| include: false
# Load packages
library(tidyverse)
library(cluster)
library(factoextra)
# if your group needs any other packages, add them here
library(tidymodels)
```

```{r}
#| message: false
#| warning: false
#| include: false
# read in data
music <- read.csv("https://bcheggeseth.github.io/253_spring_2024/data/billboard.csv")

# Check out artists with at least 40 songs
music %>%
  count(performer) %>%
  filter(n >= 40) %>%
  select(performer)

# Pick just one of these artists to study
my_artist <- music %>%
  filter(performer == "Madonna") %>%
  select(-performer) %>%
  group_by(song) %>% # The last rows deal w songs that appear more than once
  slice_sample(n = 1) %>%
  ungroup()
```

```{r}
#| echo: false
#| warning: false
# Heat map: ordered by the id variable (not clustering)
heatmap(scale(data.matrix(madonna)), Colv = NA, Rowv = NA)

# Dendrogram using k based on elbow plot
cluster_data <- madonna %>%
  mutate(hier_cluster_k = as.factor(cutree(hier_model, k = 7)))
fviz_dend(hier_model, k = 7, horiz = TRUE, cex = 0.4)

# Elbow plot to confirm choice of k for dendrogram
tibble(K = 1:51) %>%
  mutate(SS = map(K, ~ kmeans(scale(madonna), centers = .x)$tot.withinss)) %>%
  unnest(cols = c(SS)) %>%
  ggplot(aes(y = SS, x = K)) +
  geom_point() +
  theme_minimal()
```

```{r}
#| echo: false
# Variable plot for the first two PCs
fviz_pca_var(pca_small, repel = TRUE)

# Cumulative % of variance explained
pca_small %>%
  tidy(matrix = "eigenvalues") %>%
  rbind(0) %>%
  ggplot(aes(y = cumulative, x = PC)) +
  geom_point(size = 2) +
  geom_line() +
  labs(y = "CUMULATIVE % of variance explained")

# Loadings of the original features on the first six PCs
pca_small$rotation %>%
  as.data.frame() %>%
  select(PC1:PC6) %>%
  rownames_to_column(var = 'Variable') %>%
  pivot_longer(PC1:PC6, names_to = 'PC', values_to = 'Value') %>%
  ggplot(aes(x = Variable, y = Value, fill = Variable)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ PC) +
  labs(y = "loadings", x = "original features", fill = "original features") +
  scale_fill_manual(values = rainbow(18)) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
```