Our Methodology

1. Introduction: Our objective

The COVID-19 pandemic generated unprecedented levels of social and economic disruption. From an economic perspective, the complicated interplay between the health crisis and the resulting policy response affected both the demand and supply sides of markets. With the start of the pandemic, families were forced to drastically change their economic behavior. Likewise, businesses had to adapt to a new economic scenario with supply chain issues, changes in regulatory frameworks, new demand profiles, new labor markets, and many other factors. The resulting economic equilibrium varies between and within industries, generating a demand for information on how to identify winners and losers. This project represents an attempt to measure such impacts in the real estate industry, with a focus on the city of Edmonton.

Our goal is to find answers in the data. We developed property value models to produce valuation estimates of selected housing attributes before and after the pandemic. Applying a methodology known as hedonic modeling to real estate transaction data for the city of Edmonton, we estimate the market’s valuation of attributes in three general categories: structural characteristics of the property, characteristics of the property’s location, and features of the property’s marketing strategy. 

In summary, we are interested in the market’s valuation of…

Our results represent valuation estimates and therefore depend on the empirical data and modeling assumptions. Below, we summarize the data used in this project and our modeling approach.


2. Data

Our data come from the Realtors Association of Edmonton – RAE (www.realtorsofedmonton.com) and contain Multiple Listing Service® (MLS®) information about the universe of residential real estate sold in the city of Edmonton between January 9, 2018 and September 6, 2022. In this project, we focus on three types of properties: single family detached, duplex, and townhouse.

To minimize the effects of outliers, we drop single detached houses with sold prices greater than three million dollars. We also drop observations with measurement errors, where total floor area was reported to be less than 100 square feet and the distance to downtown was more than 35 km. These exclusions remove only a very small fraction of the data (26 observations). After these adjustments, we are left with a sample of 51,581 transactions.
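The sample restrictions above can be sketched in a few lines of Python. This is only an illustration: the field names (property_type, sold_price, floor_area_sqft, km_to_downtown) and the four example records are hypothetical, and the floor-area and distance rule is coded exactly as worded above (both conditions holding together).

```python
# Hypothetical records; field names are illustrative, not the MLS® schema.
listings = [
    {"property_type": "detached", "sold_price": 3_250_000, "floor_area_sqft": 5200, "km_to_downtown": 8.1},
    {"property_type": "detached", "sold_price": 455_000, "floor_area_sqft": 1480, "km_to_downtown": 11.3},
    {"property_type": "townhouse", "sold_price": 240_000, "floor_area_sqft": 90, "km_to_downtown": 36.0},
    {"property_type": "duplex", "sold_price": 380_000, "floor_area_sqft": 1250, "km_to_downtown": 9.7},
]

PRICE_CAP = 3_000_000    # upper bound applied to single detached houses
MIN_FLOOR_AREA = 100     # square feet
MAX_DISTANCE_KM = 35     # distance to downtown


def keep(row):
    """Return True if the record survives the sample restrictions."""
    too_expensive = row["property_type"] == "detached" and row["sold_price"] > PRICE_CAP
    measurement_error = (
        row["floor_area_sqft"] < MIN_FLOOR_AREA
        and row["km_to_downtown"] > MAX_DISTANCE_KM
    )
    return not (too_expensive or measurement_error)


sample = [row for row in listings if keep(row)]
print(len(listings) - len(sample), "observations dropped")
```

In this toy input, the first record fails the price cap and the third fails the measurement-error rule, so two of the four records remain.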

The key MLS variables utilized in this project are:

Table 1 shows descriptive statistics of our variables. In the sections below we discuss these variables in more detail.

Table 1: Descriptive Statistics
Variable                    Obs     Mean       Std. Dev.  Min      Max
Sold price ($)*             51,581  444,803    207,307    50,144   3,307,200
Structural variables
Floor Area (sqft)           51,581  1,509.630  555.170    412.260  9,687.600
Bedrooms                    51,581  3.553      0.963      1        10
Full bathrooms              51,559  2.177      0.751      0        9
Garage                      51,579  0.878      0.327      0        1
Finished basement           51,577  0.576      0.494      0        1
Age (years)                 51,581  29.537     24.516     0        119
Location and Amenities
View                        51,581  0.041      0.198      0        1
Playground                  51,581  0.623      0.485      0        1
Shopping                    51,581  0.803      0.398      0        1
Waterfront                  51,581  0.002      0.041      0        1
Park                        51,581  0.189      0.391      0        1
Distance to downtown (km)   51,581  9.935      4.697      0.34     31.992
Marketing
Virtual tour                51,581  0.403      0.490      0        1
Photo count                 51,581  30.652     12.134     1        50
Sentiment                   51,581  18.957     3.127      0        100
Month of Sale
January                     51,581  0.040      0.197      0        1
February                    51,581  0.070      0.256      0        1
March                       51,581  0.102      0.302      0        1
April                       51,581  0.107      0.309      0        1
May                         51,581  0.117      0.321      0        1
June                        51,581  0.123      0.328      0        1
July                        51,581  0.110      0.313      0        1
August                      51,581  0.090      0.287      0        1
September                   51,581  0.072      0.259      0        1
October                     51,581  0.066      0.249      0        1
November                    51,581  0.062      0.241      0        1
December                    51,581  0.040      0.197      0        1
Edmonton Region
Central                     51,581  0.069      0.253      0        1
South central               51,581  0.099      0.299      0        1
West                        51,581  0.157      0.364      0        1
Northwest                   51,581  0.097      0.295      0        1
Northeast                   51,581  0.166      0.372      0        1
Southwest                   51,581  0.244      0.429      0        1
Southeast                   51,581  0.168      0.374      0        1
* Converted to September 2022 dollars using Alberta’s CPI.
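
The conversion to September 2022 dollars mentioned in the table note can be illustrated with the standard deflation formula: real price = nominal price × (CPI in the reference month / CPI in the month of sale). The sketch below assumes that formula. The CPI index values are made-up placeholders, not actual Alberta CPI figures, and the function name is hypothetical.

```python
# Placeholder CPI index values keyed by (year, month); NOT actual Alberta CPI data.
cpi = {(2018, 1): 140.0, (2020, 6): 145.0, (2022, 9): 160.0}
REFERENCE_MONTH = (2022, 9)  # September 2022, the reference month in the table note


def to_reference_dollars(nominal_price, sale_month, cpi_table=cpi, reference=REFERENCE_MONTH):
    """Scale a nominal price by the ratio of reference-month CPI to sale-month CPI."""
    return nominal_price * cpi_table[reference] / cpi_table[sale_month]


real_price = to_reference_dollars(350_000, (2018, 1))
print(round(real_price))
```

A sale in the reference month itself is left unchanged, since the CPI ratio equals one.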

2.1 Price and quantity

The average sold price (in dollars of September 2022) across all properties in our sample is $444,803. The average price of detached single family properties is $490,625 (N=37,728). For duplex, the average price is $381,764 (N=7,544). The average price of townhouses is $246,164 (N=6,309).
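
As a quick consistency check, the overall average should equal the transaction-weighted average of the three property-type averages. The sketch below uses only the figures quoted in the paragraph above.

```python
# (average price in September 2022 dollars, number of transactions), as quoted above
by_type = {
    "detached single family": (490_625, 37_728),
    "duplex": (381_764, 7_544),
    "townhouse": (246_164, 6_309),
}

total_n = sum(n for _, n in by_type.values())
weighted_mean = sum(price * n for price, n in by_type.values()) / total_n

print(total_n)               # total transactions across the three property types
print(round(weighted_mean))  # agrees with the reported $444,803 up to rounding
```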

The following charts show the total number of transactions and the average sold price, by month, from January of 2018 to August of 2022. These show the cyclical pattern of the market. The number of transactions displays an upward trend with local peaks around late spring and early summer.

Average real estate prices exhibit a downward trend until 2020, followed by a gradual recovery in 2021 and 2022 for detached single family and duplex properties, while townhouse prices remain flat:

2.2 Structural variables

Our models use information from six variables to measure structural property attributes. These variables are: size (floor area), number of bedrooms, number of full bathrooms, an indicator for the presence of a garage, an indicator for whether or not the property has a finished basement, and the age of the property at the time it was sold.

On average, a property in our sample has:

2.3 Location and Amenities

We measure six attributes related to the location of properties and their amenities. These are: view, proximity to playground, proximity to shopping, waterfront property, proximity to a park, and distance to downtown. In our sample:

2.4 Marketing

We use three variables that measure marketing factors related to the online listing of the property. The first is an indicator for whether or not a virtual tour of the property is offered. About 40% of the properties sold offered virtual tours (41.2% for detached single family, 39.5% for duplex, and 35.6% for townhouse). The second variable is the listing photo count. On average, a property listing includes approximately 30 photos (31 for detached single family, 30 for duplex, and 27 for townhouse).

Sentiment Analysis of Property Descriptions

The third marketing factor considered in this study leverages information in the professionally written descriptions of properties to quantify the overall sentiment (from negative to positive) of these public remarks. We use a natural language processing algorithm to construct a numeric sentiment (or polarity) score from real estate property descriptions. Our method is a lexicon-based approach that considers valence shifters (Rinker, 2022) and has been applied to perform sentiment analysis in a variety of contexts (Dong and Wu, 2022; Idler et al., 2022; Mishra and Panda, 2022; Roberts et al., 2022).

Lexicons are dictionaries that associate words with sentiments. For example, the National Research Council Canada Lexicon consists of a list of words (in English) and their associations with certain categories such as emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) or sentiments (negative and positive). Many lexicons are based on crowdsourcing, where online survey tools are used to elicit from a large number of individuals which emotions and/or sentiment are associated with different words (Mohammad and Turney, 2013).

Rinker’s approach to calculating a polarity score goes beyond the aggregation of the lexicon sentiment of individual words to consider sentiment (or valence) shifters. Valence shifters are words that modify the sentiment of the polarized word (i.e. the word to which a sentiment score is attributed). These shifters can negate, amplify, or de-amplify a polarized word. Positive words are attributed a score of 1 while negative words have a score of -1. However, an amplifier increases the impact of the polarized word while a de-amplifier reduces its impact. For instance, if the polarized word is “strong”, the sentence “he is very strong” amplifies the sentiment of “strong” while “he is hardly strong” de-amplifies it. As a result, the sentiment of a polarized word depends on the cluster of valence shifters around it (specifically, we use four words before and two words after). Each polarized word’s cluster is attributed a score based on a weighted function of the polarized word and its valence shifters. To obtain a sentence-level sentiment score, the scores of all clusters in a sentence are summed and divided by the square root of the sentence’s word count. The sentiment score of a real estate property is the average sentiment of the sentences in the property description text. To facilitate interpretation of the valuation models, we normalize the description sentiment to the interval 0-100, where 100 represents the maximum positive sentiment.
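
The scoring steps described above can be illustrated with a simplified Python sketch. It is not Rinker's implementation: the tiny lexicon, the shifter word lists, the 0.8 shifter weight, the simple cluster formula, and the min-max rescaling to 0-100 are all assumptions made for illustration. Only the cluster window (four words before, two after), the division by the square root of the word count, and the averaging over sentences follow the description above.

```python
import math
import re

# Toy lexicon and shifter lists: illustrative assumptions, not the study's lexicon.
POLARITY = {"beautiful": 1, "spacious": 1, "bright": 1, "dated": -1, "noisy": -1}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"very", "extremely"}
DEAMPLIFIERS = {"hardly", "somewhat"}
SHIFT_WEIGHT = 0.8  # assumed weight given to each amplifier or de-amplifier


def sentence_score(sentence):
    """Sum the cluster scores and divide by the square root of the word count."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        # Cluster: four words before and two words after the polarized word.
        cluster = words[max(0, i - 4):i] + words[i + 1:i + 3]
        n_neg = sum(w in NEGATORS for w in cluster)
        n_amp = sum(w in AMPLIFIERS for w in cluster)
        n_deamp = sum(w in DEAMPLIFIERS for w in cluster)
        intensity = max(0.0, 1 + SHIFT_WEIGHT * (n_amp - n_deamp))
        total += POLARITY[word] * intensity * (-1) ** n_neg
    return total / math.sqrt(len(words))


def description_score(text):
    """Average the sentence-level scores of a property description."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_score(s) for s in sentences]
    return sum(scores) / len(scores) if scores else 0.0


def rescale_0_100(scores):
    """Min-max rescaling to 0-100 (an assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [100 * (s - lo) / (hi - lo) for s in scores]


raw = [description_score(t) for t in (
    "The kitchen is very bright. A spacious yard.",
    "The kitchen is hardly bright.",
    "The street is not quiet and the house is dated.",
)]
print(rescale_0_100(raw))
```

In this sketch an amplifier raises the score of its cluster, a de-amplifier lowers it toward zero, and a negator flips its sign.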

Figure 2 shows the distribution of sentiment scores. The average sentiment score of properties in our sample is 18.96, and 95% of the properties have a sentiment score below 24.05. Interestingly, the average sentiment scores of duplexes (19.15) and townhouses (19.10) are higher than that of detached single family properties (18.90). Both differences are statistically significant (p-value < 0.05).

Figure 2: Distribution of Sentiment Scores

2.5 Month of Sale

As Figure 1 shows, the number of real estate transactions in Edmonton is highly seasonal. June has the largest share of transactions in our sample (12.3%), while January and December have the smallest shares (4% each). Refer to Table 1 for additional details.

2.6 Edmonton Region

Each property in our sample is assigned to one of seven Edmonton regions. Each region comprises a collection of zones in the Edmonton MLS® System. Table 2 shows the zones that make up each region.

Table 2: Zones by Region
Region         Zones
Central        4, 5, 7, 8, 9, 11, 12, 13
South Central  15, 17, 18, 19
West           10, 20, 21, 22, 58
Northwest      1, 27, 59, 40
Northeast      2, 3, 6, 23, 28, 35
Southwest      14, 16, 55, 56, 57, 81
Southeast      29, 30, 52, 53, 54

The largest share of transactions in our sample (24.4%) comes from properties in the Southwest region. Examples of communities in this region include Allard, Chappelle, Edgemont, Terwillegar, and Windermere. The Central region has the smallest share of transactions (6.9%). Central communities include Alberta Avenue, Inglewood, Parkdale, and Westmount. Refer to Table 1 for additional details.

2.7 Interest Rates

It is important to control for the strength of the market demand when estimating valuation models. To control for credit conditions, we collect interest rate information from the Bank of Canada. Figure 3 shows the (average) overnight interest rate, by month of sale. We observe significant interest rate variation in the sampling period.

Figure 3: Bank of Canada’s Interest Rate

2.8 The COVID-19 Pandemic

We disaggregate the pandemic into separate periods to offer more insight into how its different phases affected real estate valuation. Specifically, we split the sample into three periods, which we refer to as pre-pandemic, pandemic year 1, and pandemic year 2. The period classification is based on data on COVID-19 ICU hospitalizations in Alberta (Figure 4).

The first period is the pre-pandemic period, defined to contain all the real estate transactions from the beginning of the sample (January 9, 2018) to the day of the first ICU hospitalization in Alberta (March 12, 2020, first vertical line in Figure 4). The second period, pandemic year 1, is defined to go from March 12, 2020 until the local minimum of 17 ICU hospitalizations on July 31, 2021 (second vertical line in Figure 4). The third period, pandemic year 2, goes from the July 31, 2021 cut-off until the end of the sample (September 6, 2022).
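
The period classification can be expressed as a small Python function using the cut-off dates stated in this document. How the cut-off days themselves are treated is an assumption of this sketch: a sale on March 12, 2020 or on July 31, 2021 is assigned to pandemic year 1. The function and constant names are hypothetical.

```python
from datetime import date

SAMPLE_START = date(2018, 1, 9)    # beginning of the sample
FIRST_ICU = date(2020, 3, 12)      # first ICU hospitalization in Alberta
ICU_LOCAL_MIN = date(2021, 7, 31)  # local minimum of 17 ICU hospitalizations
SAMPLE_END = date(2022, 9, 6)      # end of the sample


def classify_period(sale_date):
    """Assign a sale date to one of the three periods (boundary handling is assumed)."""
    if not SAMPLE_START <= sale_date <= SAMPLE_END:
        raise ValueError(f"{sale_date} is outside the sampling period")
    if sale_date < FIRST_ICU:
        return "pre-pandemic"
    if sale_date <= ICU_LOCAL_MIN:
        return "pandemic year 1"
    return "pandemic year 2"


for d in (date(2019, 5, 1), date(2020, 11, 15), date(2022, 2, 1)):
    print(d, classify_period(d))
```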

Figure 4: COVID-19 Intensive Care Unit Hospitalizations in Alberta

3. Valuation Model

Hedonic property-value models are the backbone of real estate valuation. The approach was originally developed in the 1970s and is based on the intuitive idea that property values can be decomposed into values of property attributes and location amenities (Rosen, 1974). Valuations of attributes and amenities come from the capitalization of the variation in these characteristics into property prices. Specifically, hedonic models estimate a real estate price function using data on sales prices, property-specific physical attributes, and location-specific amenities.

3.1 Defining Markets

Markets are defined over the seven regions of the city of Edmonton (refer to Table 2). In addition to spatial boundaries, we also separate markets by property type, i.e. detached single family, duplex, and townhouse. In summary, we consider 21 distinct markets resulting from the combinations of the three types of properties with the seven regions of the city of Edmonton. As we discuss below, separate models are estimated for each market. This approach allows for market-specific attribute valuations.
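
The 21 markets are simply the Cartesian product of property types and regions. A minimal Python sketch follows; the variable names are illustrative.

```python
from itertools import product

REGIONS = ["Central", "South Central", "West", "Northwest",
           "Northeast", "Southwest", "Southeast"]
PROPERTY_TYPES = ["detached single family", "duplex", "townhouse"]

# One (property type, region) pair per market; a separate model is fit to each.
markets = list(product(PROPERTY_TYPES, REGIONS))

print(len(markets))
for property_type, region in markets[:3]:
    print(property_type, "/", region)
```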

3.2 Econometric Specification

First stage hedonic estimation refers to an econometric model where home prices are regressed on property attributes, i.e. structural variables (e.g. total floor area, number of full bathrooms) and location amenities (e.g. proximity to parks, distance to downtown). This model allows for the estimation of all market participants’ marginal willingness to pay for attributes at their baseline levels. For each housing market, we estimate the following regression model:

Y_it = α + β1 P_it + β2 X_it + β3 (P_it × X_it) + β4 Z_it + δ_t + ε_it

where:

We are interested in the pre- and post-pandemic valuation of the following attributes:

The main results are captured by the Ordinary Least Squares estimates of the following parameters:

β2: the pre-pandemic valuation of X
β2 + β3: the post-pandemic valuation of X
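
To illustrate how these two quantities are read off an interaction model, the Python sketch below fits a stripped-down version of the specification by Ordinary Least Squares on synthetic, noise-free data. It is only an illustration: it keeps the intercept, the pandemic indicator, one attribute, and their interaction, and omits the remaining controls and time effects. The data-generating coefficients are made up, floor area is measured in hundreds of square feet, and the small solver is a generic normal-equations routine, not the estimation code used in this project.

```python
def ols(design, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(design, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Synthetic data: made-up coefficients, floor area in hundreds of square feet.
areas = [9, 12, 15, 18, 21, 24]
design, prices = [], []
for pandemic in (0, 1):
    for area in areas:
        design.append([1, pandemic, area, pandemic * area])
        prices.append(100_000 + 5_000 * pandemic + 15_000 * area + 3_000 * pandemic * area)

alpha, beta1, beta2, beta3 = ols(design, prices)
pre_valuation = beta2           # valuation of the attribute when the indicator is 0
post_valuation = beta2 + beta3  # valuation of the attribute when the indicator is 1
print(round(pre_valuation), round(post_valuation))
```

Because the data are generated without noise, the estimates recover the made-up coefficients: the pre-pandemic valuation is the coefficient on the attribute alone, and the post-pandemic valuation adds the interaction coefficient.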

To further disaggregate our results, we consider separate models for the two years of the pandemic (see discussion in section 2.8). For the first year of the pandemic, the model considers all transactions from January 9, 2018 (beginning of the sample) to July 31, 2021 (cut-off based on ICU hospitalizations). For the second year of the pandemic, the model considers all pre-pandemic transactions (i.e. from January 9, 2018 to March 12, 2020) and the transactions from the July 31, 2021 cut-off until September 6, 2022 (the end of the sample).

In summary, our hedonic models produce valuation estimates based on the combination of 12 attributes, 21 markets, and 3 time periods (pre-pandemic, pandemic year 1, pandemic year 2). These estimates are accessible via the Explore the Results tool.


4. Concluding Remarks

This research produces a variety of valuation estimates from several valuation models. These estimates are available in the Explore the Results tool. To select a model, choose an attribute, a region, and a property type. The online tool will display 3 estimates: a pre-pandemic valuation, a valuation during ‘pandemic year 1’, and a valuation during ‘pandemic year 2’ (specific pandemic dates are defined in section 2.8).

Finally, we emphasize that hedonic models simply produce valuation estimates. These estimates vary with the data (e.g. location and time), model specification, and estimator used. There is no academic consensus on the best empirical approach, and each approach has advantages and disadvantages. The approach used in this project balances empirical robustness with simplicity and accessibility of results to the general public. The results offered by this project represent academic results. All content and information on this website are for educational and informational purposes only, and do not constitute any sort of advice. All content and information on this website are made available “as is”. We make no warranty, and expressly disclaim all warranties, as to the accuracy or completeness of the information.


References