Election in BC Part 1 – Preparing the data

Yesterday was May 9th that means the election day!

There were elections and I felt like our beloved province of British Colombia didn’t get the necessary attention and doesn’t have it’s own data visualization solution! There was a small one in the Vancouver sun: but it is a custom based solution by their talented Graphic designer: Lucas Timmons (GitHub, Twitter: @lucastimmons) not a PowerBI or Tableau!  Not to worry BC Yana to the rescue!

The data sources are 2 pages with the elections results:

1st data set: Results split by candidates: http://electionsbcenr.blob.core.windows.net/electionsbcenr/GE-2017-05-09_Candidate.html

2nd data source: Results split by parties:

http://electionsbcenr.blob.core.windows.net/electionsbcenr/GE-2017-05-09_Party.html

Since Lucas did the map data visualization (I don’t like the colors but I will not be doing the same).

So What shall we do?

  • I have opened PowerBI Desktop
  • added a data source of the type Web

1st data set: Results split by candidates:

  • Added the link to the first HTML page and in the selection dialog chose: Table 0. PowerBI recognized the table on the page and uploaded it.

Data manipulation I have used: Complete missing lines in the 1st data set

The data set items look like this and I need to fill all the blanks to get the community name:

Electoral District Candidate’s Ballot Name Affiliation Total Valid Votes % of Popular Vote
Abbotsford-Mission Dan Cameron Christian Heritage Party of B.C. 607 2.60%
Andrew Murray Christie BC NDP 6,754 28.89%
Simon John Gibson BC Liberal Party 12,080 51.67%
Jennifer Holmes BC Green Party 3,940 16.85%

(referenced from the PowerBI community site)

What shall I do? how to fill the missing data? so the obvious choice might be export everything to Excel or even just copy to Excel and duplicate the result. Let us not despair, this is exactly what DAX & M are for!

I have added a new custom column with the following function written to add it in DAX:

= Table.AddColumn(#”Changed Type”, “Custom”, each if [Electoral District]=”” then null else [Electoral District])

Than I have used the Fill option, from the Transform ribbon tab to fill down all the Null cells. The problem that accord when the Fill down didn’t work in the original column was that those cells did not have the null value to begin with, means they are not empty and cannot be filled according to the definitions.

In the View Ribbon tab, I can see all the data manipulation I have completed so far using M:

let
Source = Web.Page(Web.Contents(“http://electionsbcenr.blob.core.windows.net/electionsbcenr/GE-2017-05-09_Candidate.html”)),
Data0 = Source{0}[Data],
#”Changed Type” = Table.TransformColumnTypes(Data0,{{“Electoral District”, type text}, {“Candidate’s Ballot Name”, type text}, {“Affiliation”, type text}, {“Total Valid Votes”, type number}, {“% of Popular Vote”, Percentage.Type}}),
#”Added Custom” = Table.AddColumn(#”Changed Type”, “Custom”, each if [Electoral District]=”” then null else [Electoral District]),
#”Filled Down” = Table.FillDown(#”Added Custom”,{“Custom”})
in
#”Filled Down”

Data manipulation 2-

I have also removed the total 100% ballot box number as the bottom value for each community. This value we can get later in the 2nd data set by party and connect them by community. Or simply add a measure to calculate it.

I haven’t used code this time, only the existing PowerBI options.

  1. Transform ribbon tab->Split the column [Electoral District]
  2. Used the custom split with the custom value “of” (common for all the ballot boxes numbers there)
  3. Got 2 columns when in the target columns had “null” value for all the rows without the ballot counting
  4. Used the top row filter to simply filter out all the rows that didn’t have null
  5. In PowerBI, now that I don’t need the extra data manipulation column, I just got rid of it 🙂 delete column it is.

The current M code for the whole operation looks like this: (It’s different for a data source that didn’t have to be changed from HTML, the references might contain other commands for different data source data types)

let
Source = Web.Page(Web.Contents(“http://electionsbcenr.blob.core.windows.net/electionsbcenr/GE-2017-05-09_Candidate.html”)),
Data0 = Source{0}[Data],
#”Changed Type” = Table.TransformColumnTypes(Data0,{{“Electoral District”, type text}, {“Candidate’s Ballot Name”, type text}, {“Affiliation”, type text}, {“Total Valid Votes”, type number}, {“% of Popular Vote”, Percentage.Type}}),
#”Added Custom” = Table.AddColumn(#”Changed Type”, “Custom”, each if [Electoral District]=”” then null else [Electoral District]),
#”Filled Down” = Table.FillDown(#”Added Custom”,{“Custom”}),
#”Renamed Columns” = Table.RenameColumns(#”Filled Down”,{{“Custom”, “Full Electoral District”}}),
#”Split Column by Delimiter” = Table.SplitColumn(#”Renamed Columns”, “Electoral District”, Splitter.SplitTextByDelimiter(“of”, QuoteStyle.Csv), {“Electoral District.1”, “Electoral District.2″}),
#”Changed Type1″ = Table.TransformColumnTypes(#”Split Column by Delimiter”,{{“Electoral District.1”, type text}, {“Electoral District.2″, type text}}),
#”Filtered Rows” = Table.SelectRows(#”Changed Type1″, each ([Electoral District.2] = null)),
#”Removed Columns” = Table.RemoveColumns(#”Filtered Rows”,{“Electoral District.2″})
in
#”Removed Columns”

The data set now look like this:

2nd data source: Results split by parties:

 

(referenced from the PowerBI community site)

  • I did the same entering the URL of the results by party
  • Entered to edit the data and split the column with the ballet boxes. I only need their number not the real time count since the election are over.

The data manipulation in the query mode I have used: Split cells,
the split value was: “of” since the values are: “35 of 35”. I ended up with 2 columns containing the same number. so I deleted one of them.

  • I have made sure that all the columns are from the type Decimal number, since I really don’t want to upload a column containing numbers as a text column and wonder why things keep not working? 🙂
  • Next I have uploaded the data to the data set as well.

Now I have 2 Tables containing the data – so easy now the visualization part begins!

(in part 2)

 

Leave a Reply

Your email address will not be published. Required fields are marked *