Web Scraping a Table of Stanley Cup Champions Using VBA

Chris Wilson
5 min readJul 27, 2019

Have you ever needed data you found online, but when you tried to copy it into Excel it was formatted so poorly it was of no use? Well with the use of VBA we can extract that data straight from the website’s HTML into our Excel Workbook.

For this exercise we will be extracting the Stanley Cup Champions table from Wikipedia, however the same basic procedure can be followed to extract any data from a website.

Step 1: Determine the HTML Code to be Scraped

To begin with we must navigate to the table we mean to scrape, and right-click it choosing the “inspect” option.

This will allow us to view the HTML code in tandem with the website.

In the top left corner of the HTML viewer there will be an option that will highlight the HTML to whatever you have your mouse cursor over.

From here we can now see the table we wish to scrape highlighted as well as the HTML code. We can see that the table is a class “wikitable sortable jquery-tablesorter” and through inspection of the code we see it is the third instance of this, which lines up with it being the third table found on the page. This is important for us to know as every line of HTML is unique and this is how our VBA script will find the correct table.

Step 2: Create New Module in Excel VBA

The next step will be to begin writing our VBA script, which is done from the visual basic editor in Excel, which is found under the Developer Tab. Once in the editor, from the top toolbar select Insert > Module. Now before we begin writing the script, we must ensure two references are active so that VBA will be able to read HTML. From the top toolbar select Tools > References, in the pop-up window ensure that both: Microsoft Internet Controls and Microsoft HTML Object Librar y are selected. We can now begin scripting our web scraper.

Sub StanleyCupChamps1927_2019() End Sub

Step 3: Set Variables

First we need to define and set our objects so that Excel can open and navigate to the correct web page, collect and store the required HTML, and a simple integer that will represent row numbers for our loop later on. We also set excel to open a new web browser instance, and our integer equal to 1

Sub StanleyCupChamps1927_2019() 
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1
Set ieObj = New InternetExplorer
End Sub

Step 5: Navigate to Wikipedia

The next lines of code will have Excel both navigate to Wikipedia and allow the page to load before proceeding.

Sub StanleyCupChamps1927_2019() 
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1 Set
ieObj = New InternetExplorer

ieObj.navigate"https://en.wikipedia.org/wiki
/List_of_Stanley_Cup_champions#
NHL_champions_(since_1927)"

Application.Wait Now + TimeValue("00:00:03")
End Sub

Step 6: Create For Loop to Scrape Data

The final step is to create a For Loop that will iterate over the HTML elements and paste the contents into our Excel Workbook. This is where the need to inspect the HTML of the web page comes in. Remember that the table we wanted to scrape was the third of class “wikitable sortable jquery-tablesorter”, using this information we can use the GetElementsByClassName function to select the table, and the GetElementsByTagName function along with “tr” to get get the individual table rows.

For Each htmlEle In 
ieObj.document.getElementsByClassName("wikitable sortable jquery- tablesorter")(2).getElementsByTagName("tr")

Since VBA uses zero-based indexing (ex. the first table would be position (0)) we put (2) after our class tag. Finally, to paste the data into our workbook, we use the Children function to extract the text from the individual table cells. The positions we assign to the children represent the columns in the Wikipedia table from left to right. We will also use our integer variable so that Excel will go to the next row before pasting the next line of data.

With ActiveSheet 
.Range("A" & i).Value = htmlEle.Children(0).textContent
.Range("B" & i).Value = htmlEle.Children(1).textContent
.Range("C" & i).Value = htmlEle.Children(2).textContent
.Range("D" & i).Value = htmlEle.Children(3).textContent
.Range("E" & i).Value = htmlEle.Children(4).textContent
.Range("F" & i).Value = htmlEle.Children(5).textContent
.Range("G" & i).Value = htmlEle.Children(6).textContent
End With

We end the loop by increasing our variable i by 1 and moving to the next row in the Wikipedia table, the entire loop can be seen below.

Sub StanleyCupChamps1927_2019() 
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1
Set ieObj = New InternetExplorer
ieObj.navigate "https://en.wikipedia.org/wiki
/List_of_Stanley_Cup_champions#NHL_champions_(since_1927)"
Application.Wait Now + TimeValue("00:00:03") For Each htmlele in
ieObj.document.getElementsByClassName("wikitable sortable jquery-tablesorter")(2).getElementsByTagName("tr")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
.Range("B" & i).Value = htmlEle.Children(1).textContent
.Range("C" & i).Value = htmlEle.Children(2).textContent
.Range("D" & i).Value = htmlEle.Children(3).textContent
.Range("E" & i).Value = htmlEle.Children(4).textContent
.Range("F" & i).Value = htmlEle.Children(5).textContent
.Range("G" & i).Value = htmlEle.Children(6).textContent
End With
i = i + 1 Next htmlele End Sub

Step 7: Error Handling

If you have been following along and tried to run the above script, you would get an error when it came to the 2005 season. By Visually inspecting the table on Wikipedia we can see this is the lockout year where no Stanley Cup Final was played, and because of this, columns 1–6 are merged into a single cell.

Since this is the only occurrence of this in the table we can simply modify the code with an IF/Else statement. In other words we can tell our loop to behave a specific way for the year 2005, but normally for all other years.

Sub StanleyCupChamp1927_2019()
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer
i = 1
Set ieObj = New InternetExplorer
ieObj.navigate "https://en.wikipedia.org/wiki
/List_of_Stanley_Cup_champions#NHL_champions_(since_1927)"
Application.Wait Now + TimeValue("00:00:03") For Each htmlEle
In ieObj.document.getElementsByClassName("wikitable sortable jquery-tablesorter")(2).getElementsByTagName("tr")
If htmlEle.Children(0).textContent = "2005" Then
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
.Range("B" & i).Value = htmlEle.Children(1).textContent
End With
Else
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
.Range("B" & i).Value = htmlEle.Children(1).textContent
.Range("C" & i).Value = htmlEle.Children(2).textContent
.Range("D" & i).Value = htmlEle.Children(3).textContent
.Range("E" & i).Value = htmlEle.Children(4).textContent
.Range("F" & i).Value = htmlEle.Children(5).textContent
.Range("G" & i).Value = htmlEle.Children(6).textContent
End With
End If
i = i + 1 Next htmlEle End Sub

With that If/Else Statement in place, the script will run without error and you will have successfully copied the table from Wikipedia to Excel. For an example of what you can do with this data, check out my Jupyter Notebook, where I load and clean the data using Python’s Pandas library and visualize it with the MatPlotLib library.

Originally published at https://www.linkedin.com.

--

--