1,000 Songs to Hear Before You Die

A dataset from the Guardian

Filed under Data Viz


I wanted to do something fun for my Data Analytics final project at GA, so I came up with the "1,000 Songs to Hear Before You Die" recommender tool, which picks songs from the Guardian list depending on your mood. I've always wanted to try some neat tricks in Tableau, like the visualization Jedi who tinker with widgets and icons in their graphs. This was the perfect opportunity.

The dataset, as mentioned, is from the Guardian's datablog. My initial plan was to do something more ambitious, like connect to a SQL server with a larger database of songs and visualize it. But my instructor advised against it, which forced me to look up an alternative dataset. Being a super NBA fan, my second choice was to use basketball data, but in the end, I thought music was a more universal topic that my classmates would appreciate.

However, the 1,000 songs dataset was a little sparse. It only had 5 features: Theme, Title, Artist, Year, Spotify URL. A genre category would have been nice. Long story short, I attempted a Beautiful Soup web scrape of the Wikipedia infobox. While I successfully scraped one, I realized it would be cumbersome to loop through all the artists, particularly because their names are not standardized. So I went with the old-school 2000s method of searching the internet for each and every unique artist (there were 600+) and recording their genre, location, group/solo status, and gender. It took me one weekend to complete, and I thought it was well worth it, since the extra fields make my analytics richer and more flexible.

I'm sharing my full dataset below in the Tableau viz, so you can download it and make your own. At the bottom I'll also share my web scraping code, which may be useful.


Components of the visual

The most complex thing I had to do with the visual was to "bin" the years into decades. That's it. Which tells you that the viz was more of a fancy application of my visual design class than of my data analytics class. The "year highlighter" is actually a pie chart in a gradient color. The "volume slider" is just my own categorization of which genres are loud: I thought rock was loud, jazz was not, and pop was medium. Meanwhile, I found a nice color scheme in Tableau that was stark enough to resemble a radio's equalizer bars. For the heights of the bars, I created my own recommendation metric, which is a weighted function of the artist's popularity, the type of genre, and the decade. Lastly, I used a lot of dashboard filters and actions (which I also used for the mood icons) to achieve the desired effect.
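For anyone who wants to reproduce the decade bins and the bar heights outside Tableau, here is a minimal Python sketch. The viz itself was built with Tableau features, not this code. The function names, the loudness scores, and the weights below are all made-up placeholders for illustration, not the values used in the viz.

```python
def decade_bin(year):
    """Return a decade label for a year, e.g. 1967 -> '1960s'."""
    return f"{(year // 10) * 10}s"


# Hypothetical loudness scores in the spirit of the "volume slider":
# rock is loud, pop is medium, jazz is quiet. The numbers are assumptions.
GENRE_VOLUME = {"Rock": 3, "Pop": 2, "Jazz": 1}

# Hypothetical weights for the recommendation metric (they sum to 1).
WEIGHTS = {"popularity": 0.5, "genre": 0.3, "decade": 0.2}


def recommendation_score(popularity, genre_score, decade_score, weights=WEIGHTS):
    """Weighted sum of three inputs, each assumed to be scaled to 0..1."""
    return (
        weights["popularity"] * popularity
        + weights["genre"] * genre_score
        + weights["decade"] * decade_score
    )


if __name__ == "__main__":
    print(decade_bin(1967))
    print(recommendation_score(popularity=0.8, genre_score=0.5, decade_score=1.0))
```

A weighted sum is only one way to combine the three inputs; it is used here because it is the simplest thing that fits the description "a weighted function".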

During my presentation, the second-to-last slide, a bubble chart by decade, was also quite popular among my classmates. There are some nice gems in that viz, where I used conditional filters to explore any interesting combination. For example: given a decade, which artists were popular, or which themes dominated? Or given a theme, which artists were popular in a given decade?
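The same questions can be asked of the downloaded dataset in plain Python. The sketch below assumes each song is a dict with "artist", "theme", and "year" keys, and it measures "popular" simply as the number of songs on the list, which is an assumption. The rows are toy placeholders, not entries from the Guardian list.

```python
from collections import Counter

# Toy rows for illustration only; artist and theme values are placeholders.
songs = [
    {"artist": "Artist A", "theme": "Love", "year": 1965},
    {"artist": "Artist A", "theme": "Heartbreak", "year": 1968},
    {"artist": "Artist B", "theme": "Love", "year": 1969},
    {"artist": "Artist C", "theme": "Love", "year": 1983},
]


def decade_of(year):
    """Return the first year of the decade, e.g. 1968 -> 1960."""
    return (year // 10) * 10


def top_in_decade(rows, decade, field, n=3):
    """Given a decade, count the most common values of `field` (artist or theme)."""
    counts = Counter(r[field] for r in rows if decade_of(r["year"]) == decade)
    return counts.most_common(n)


def top_artists_for(rows, theme, decade, n=3):
    """Given a theme and a decade, count the artists with the most songs."""
    counts = Counter(
        r["artist"]
        for r in rows
        if r["theme"] == theme and decade_of(r["year"]) == decade
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_in_decade(songs, 1960, "artist"))
    print(top_in_decade(songs, 1960, "theme"))
    print(top_artists_for(songs, "Love", 1960))
```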


Python code


from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "http://en.wikipedia.org/wiki/The_Beatles"

# Download the artist's Wikipedia page and parse it.
page = urlopen(url)
soup = BeautifulSoup(page.read(), "lxml")

# The artist infobox is the table carrying these CSS classes.
table = soup.find('table', class_='infobox vcard plainlist')

result = {}
exceptional_row_count = 0
for tr in table.find_all('tr'):
    th = tr.find('th')
    if th:
        # The header cell is the field name; the data cell (if any) is the value.
        td = tr.find('td')
        result[th.text] = td.text if td else None
    else:
        # Rows without a header cell are counted rather than parsed.
        exceptional_row_count += 1
if exceptional_row_count > 1:
    print('WARNING ExceptionalRow>1: ', table)
print(result)

{'The Beatles': None, 'Background information': None, 'Origin': 'Liverpool, England', 'Genres': '\n\n\nRock\npop\n\n\n', 'Years active': '1960–1970', 'Labels': '\n\n\nParlophone\nApple\nCapitol\n\n\n', 'Associated acts': '\n\n\nThe Quarrymen\nBilly Preston\nPlastic Ono Band\n\n\n', 'Website': 'thebeatles.com', '': None, 'Past members': '\n\nJohn Lennon\nPaul McCartney\nGeorge Harrison\nRingo Starr\n\nSee members section for others'}
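The raw values come back padded with newlines, so they need a little cleanup before they are useful as genre or label fields. Here is a minimal sketch of one way to do that; `clean_infobox` is a hypothetical helper, not part of the scraping code above, and the sample dict is a trimmed copy of the output shown.

```python
def clean_infobox(raw):
    """Drop empty rows and split newline-separated values into lists."""
    cleaned = {}
    for key, value in raw.items():
        if not key or value is None:
            continue  # skip blank keys and header-only rows
        parts = [p.strip() for p in value.split("\n") if p.strip()]
        # Keep single values as plain strings, multi-line values as lists.
        cleaned[key] = parts[0] if len(parts) == 1 else parts
    return cleaned


# Trimmed copy of the scraped output above.
sample = {
    "The Beatles": None,
    "Origin": "Liverpool, England",
    "Genres": "\n\n\nRock\npop\n\n\n",
    "": None,
}

if __name__ == "__main__":
    print(clean_infobox(sample))
    # {'Origin': 'Liverpool, England', 'Genres': ['Rock', 'pop']}
```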