In our database (described in part 1 of this project) we have a table (prices) that records every price and price-type ('Offers Over', 'Fixed Price') change we've observed for each property in the addresses table. Similarly, a status table records every status (Active, Under Offer, Sold) change. For each property we now want to collect all those changes and build some new features that summarise that information.
Specifically, we want to:
- count the number of price changes for each property ('price_changes')
- count the number of status changes for each property ('status_changes')
- build a succinct, human-readable price history string ('hr_price_history')
- build a succinct, human-readable status history string ('hr_status_history')

The last two will be useful for adding to the hover tool in the Bokeh plots.
First, let's import the modules we'll need:
import pandas as pd
import numpy as np
import sqlite3
The first thing we need to do is find out the date / time stamp of the last database update. Then we can pull the price and status history for the listings that were current at that time from the 'prices' and 'status' tables in the database. First, let's query the two most recent database updates:
database_path = 'databases/forsale2_hashed.db'
con = sqlite3.connect(database_path)
cur = con.cursor()
# Fetch the two most recent update timestamps (the latest and the one before it)
res = cur.execute("SELECT dt_of_update FROM database_update ORDER BY dt_of_update DESC LIMIT 2")
db_updates = res.fetchall()
dt_last_db_update = db_updates[0][0]
dt_prior_db_update = db_updates[1][0]
con.close()
Now let's get the price history for all the current listings:
con = sqlite3.connect(database_path)
query = f"SELECT * FROM prices NATURAL JOIN dates WHERE dates.dt_last_seen > '{dt_prior_db_update}'"
listings_prices_df = pd.read_sql(query, con)
listings_prices_df.set_index('pid', inplace = True)
con.close()
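As an aside, interpolating the timestamp straight into the SQL string works here, but pandas' read_sql also accepts a params argument that passes values through to the driver and sidesteps any quoting issues. A minimal sketch against a throwaway in-memory database (the table contents here are illustrative, not from the real database):

```python
import sqlite3
import pandas as pd

# Throwaway in-memory stand-in for the real database
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE prices (pid TEXT, dt_last_seen TEXT, price TEXT)")
con.execute("INSERT INTO prices VALUES ('a1', '2023-11-03', '100000')")
con.execute("INSERT INTO prices VALUES ('b2', '2023-10-01', '250000')")
con.commit()

# '?' is the sqlite3 placeholder; pandas hands params to the driver
df = pd.read_sql("SELECT * FROM prices WHERE dt_last_seen > ?", con,
                 params=('2023-11-01',))
con.close()
print(df['pid'].tolist())  # ['a1']
```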
We're going to summarise each price entry in preparation for making a human-readable but succinct summary. First, let's shorten the common phrases "Offers Over" and "Fixed Price" to "OO" and "FP". This will save real estate in the Bokeh ToolTip / Hover Tool later:
# Convert price_type for any price_type that contains 'Offers' to OO and any price_type that contains 'Fixed' to FP
listings_prices_df['price_type_short'] = np.where(
    listings_prices_df['price_type'].str.contains('Offers'), 'OO',
    np.where(listings_prices_df['price_type'].str.contains('Fixed'), 'FP',
             listings_prices_df['price_type']))
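As a quick sanity check of how the nested np.where falls through, here it is on a toy Series (the 'Guide Price' value is just an illustrative unmatched case):

```python
import numpy as np
import pandas as pd

s = pd.Series(['Offers Over £200,000', 'Fixed Price £180,000', 'Guide Price'])
# Rows matching neither pattern fall through to the original value
short = np.where(s.str.contains('Offers'), 'OO',
                 np.where(s.str.contains('Fixed'), 'FP', s))
print(list(short))  # ['OO', 'FP', 'Guide Price']
```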
Now let's make the succinct, human-readable price history string for each record:
listings_prices_df['hr_price_history'] = '(' + listings_prices_df['dt_of_price'].str.slice(0,10) + ' ' + \
listings_prices_df['price_type_short'] + '-£' + listings_prices_df['price'] + ')'
Load the data we developed in part 3 of the project:
actives_df = pd.read_pickle('actives_df_part3.pkl')
Compute the number of price changes there have been for each property in listings_prices_df (using a GroupBy), subtract 1 (we want the number of changes since the first entry) and add that to actives_df as a feature called 'price_changes'.
actives_df['price_changes'] = listings_prices_df.groupby('pid').count()['price'] - 1
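To make the count-minus-one logic concrete, here's a toy version. A property with a single price entry correctly comes out as 0 changes; note that any pid present in actives_df but absent from listings_prices_df would end up as NaN after the index-aligned assignment, something to watch if the two frames can diverge:

```python
import pandas as pd

prices = pd.DataFrame({'pid': ['a', 'a', 'a', 'b'],
                       'price': ['100000', '95000', '92000', '250000']})
# Three entries for 'a' means two changes; one entry for 'b' means none
changes = prices.groupby('pid')['price'].count() - 1
print(changes.to_dict())  # {'a': 2, 'b': 0}
```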
And do the same for the status changes (read database, add the change count and human readable summary features):
con = sqlite3.connect(database_path)
query = f"SELECT * FROM status NATURAL JOIN dates WHERE dates.dt_last_seen > '{dt_prior_db_update}'"
listings_status_df = pd.read_sql(query, con)
listings_status_df.set_index('pid', inplace = True)
con.close()
actives_df['status_changes'] = listings_status_df.groupby('pid').count()['status'] - 1
listings_status_df['status_short'] = np.where(
    listings_status_df['status'].str.contains('Active'), 'Act',
    np.where(listings_status_df['status'].str.contains('offer'), 'UO',
             np.where(listings_status_df['status'].str.contains('Sold'), 'SLD',
                      np.where(listings_status_df['status'].str.contains('Reserved'), 'RSD',
                               listings_status_df['status']))))
listings_status_df['hr_status_history'] = '(' + listings_status_df['dt_of_status'].str.slice(0,10) + ' ' + \
listings_status_df['status_short'] + ')'
Now we can aggregate the human-readable price histories into a single column in actives_df:
actives_df['hr_price_history'] = listings_prices_df.groupby('pid')['hr_price_history'].agg(','.join)
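The aggregation relies on pandas preserving row order within each group, so the joined string reads chronologically as long as the rows came back from the database in date order (which this assumes). A toy example:

```python
import pandas as pd

df = pd.DataFrame({'pid': ['a', 'a', 'b'],
                   'hr_price_history': ['(2023-10-01 OO-£100000)',
                                        '(2023-10-08 FP-£95000)',
                                        '(2023-10-02 OO-£250000)']})
# Within each pid group, rows keep their original order when joined
agg = df.groupby('pid')['hr_price_history'].agg(','.join)
print(agg['a'])  # (2023-10-01 OO-£100000),(2023-10-08 FP-£95000)
```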
And the same for the status history:
actives_df['hr_status_history'] = listings_status_df.groupby('pid')['hr_status_history'].agg(','.join)
Now let's add a feature called 'm_factor'. It stands for 'Motivation Factor', and the idea is to identify sellers who are highly motivated to sell (and who therefore, perhaps, offer a value opportunity). Sellers that are frequently fiddling with the price are probably highly motivated. On the other hand, sellers whose properties have sat on the market for months without any price changes probably aren't that motivated and are sitting waiting for the market to come to them. We'll define the m-factor as 100 × the number of price changes / the number of days on market:
actives_df['m_factor'] = np.where(actives_df['dom'] > 0, np.round(100 * actives_df['price_changes'] / actives_df['dom'],2), 0)
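As a worked example, 3 price changes over 28 days on the market (matching a couple of the rows in the table below) gives:

```python
import numpy as np

price_changes, dom = 3, 28
# Guard against division by zero for listings seen on day one
m_factor = np.round(100 * price_changes / dom, 2) if dom > 0 else 0
print(m_factor)  # 10.71
```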
Now we have everything we need to take a look at our most motivated sellers:
display_cols=['area','dwelling_type','beds','baths','dwelling_cat','dom','price_changes','hr_price_history','m_factor']
motivated_df = actives_df[(actives_df['m_factor'] > 6) & (actives_df['current_status'] == 'Active')][display_cols]\
.sort_values('m_factor', ascending=False)
display(motivated_df.style.hide(axis="index"))
| area | dwelling_type | beds | baths | dwelling_cat | dom | price_changes | hr_price_history | m_factor |
|---|---|---|---|---|---|---|---|---|
| eh16 | 3 bedroom terraced house for sale | 3 | 1 | House | 4 | 1 | (2023-10-31 OO-£230000),(2023-11-01 OO-£225000) | 25.000000 |
| eh3 | 1 bedroom flat for sale | 1 | 1 | Flat | 6 | 1 | (2023-10-29 OO-£165000),(2023-11-03 FP-£170000) | 16.670000 |
| eh15 | 5 bedroom detached house for sale | 5 | 2 | House | 28 | 3 | (2023-10-07 OO-£420000),(2023-10-14 OO-£399995),(2023-10-26 FP-£435000),(2023-10-31 OO-£399995) | 10.710000 |
| eh22 | 3 bedroom terraced house for sale | 3 | 2 | House | 32 | 3 | (2023-10-03 FP-£220000),(2023-10-06 FP-£215995),(2023-10-31 FP-£211995),(2023-11-03 OO-£199000) | 9.380000 |
| eh17 | 3 bedroom terraced house for sale | 3 | 1 | House | 43 | 4 | (2023-09-22 FP-£184999),(2023-10-06 FP-£179995),(2023-10-17 FP-£176995),(2023-10-18 OO-£169995),(2023-10-25 FP-£164995) | 9.300000 |
| eh10 | 2 bedroom flat for sale | 2 | 1 | Flat | 60 | 5 | (2023-09-04 OO-£399000),(2023-10-07 FP-£425000),(2023-10-17 FP-£412000),(2023-10-25 FP-£399000),(2023-10-31 FP-£412000),(2023-11-03 FP-£399000) | 8.330000 |
| eh7 | 2 bedroom flat for sale | 2 | 1 | Flat | 26 | 2 | (2023-10-09 FP-£315000),(2023-10-20 OO-£295000),(2023-10-29 FP-£306000) | 7.690000 |
| eh9 | 1 bedroom flat for sale | 1 | 1 | Flat | 26 | 2 | (2023-10-09 OO-£340000),(2023-10-18 FP-£345000),(2023-10-25 OO-£300000) | 7.690000 |
| eh3 | 2 bedroom ground floor flat for sale | 2 | 1 | Flat | 13 | 1 | (2023-10-22 FP-£227500),(2023-10-31 FP-£225000) | 7.690000 |
| eh15 | 2 bedroom terraced house for sale | 2 | 1 | House | 27 | 2 | (2023-10-08 OO-£250000),(2023-10-31 OO-£260000),(2023-11-01 FP-£260000) | 7.410000 |
| eh11 | 2 bedroom ground floor flat for sale | 2 | 2 | Flat | 28 | 2 | (2023-10-07 OO-£200000),(2023-10-19 FP-£205000),(2023-11-03 FP-£199995) | 7.140000 |
| eh21 | 5 bedroom detached house for sale | 5 | None | House | 28 | 2 | (2023-10-07 From-£443000),(2023-10-20 From-£440000),(2023-10-23 From-£443000) | 7.140000 |
| eh7 | 1 bedroom flat for sale | 1 | 1 | Flat | 14 | 1 | (2023-10-21 OO-£145000),(2023-11-01 FP-£150000) | 7.140000 |
| eh21 | 3 bedroom terraced house for sale | 3 | 1 | House | 71 | 5 | (2023-08-24 OO-£260000),(2023-09-06 OO-£255000),(2023-09-14 FP-£265000),(2023-09-28 FP-£260000),(2023-10-11 OO-£249995),(2023-10-25 FP-£255000) | 7.040000 |
| eh11 | 2 bedroom flat for sale | 2 | 1 | Flat | 90 | 6 | (2023-08-24 OO-£330000),(2023-09-05 FP-£340000),(2023-09-16 OO-£320000),(2023-10-05 OO-£300000),(2023-10-14 FP-£335000),(2023-10-21 FP-£330000),(2023-10-29 OO-£315000) | 6.670000 |
| eh16 | 3 bedroom semi-detached house for sale | 3 | 1 | House | 16 | 1 | (2023-10-19 OO-£270000),(2023-11-01 OO-£255000) | 6.250000 |
Note: I have deliberately dropped the address field for privacy reasons.
We can see all sorts of interesting price action in this table. First, it's very common in Scotland for a property to be listed as Offers Over, especially initially. The general concept is to generate broad interest with a "low-ball" price and then those that "fall in love" will offer considerably over that price to secure it (often in a blind sealed bidding situation termed 'going to a closing date'). In a hot market it's not unknown for properties to go for 20%-30% over the "Offers Over" price. When the market is cooler (as it is at the time of writing), we tend to see more "Fixed Price" properties, where the attraction for the buyer is that the uncertainty of getting into a blind bidding war is taken away. Often then, we might see sellers initially list their property as "Offers Over" and then if the interest hasn't been what they hoped, either reduce the "Offers Over" price or switch to "Fixed Price" which is often higher. This latter situation is really interesting because it signals to the market what the seller's expectations really are. We see plenty of examples of this in this table.
Also, we can see the behaviour of the highly motivated sellers in this table. Some start fiddling with the price just a few days after listing - perhaps the number of initial enquiries to view wasn't what they expected. Others we see adjusting the price and / or price type every week or so - often flip-flopping between "Offers Over" and "Fixed Price" and adjusting each time. Either way, it is clear these sellers are much more motivated to find the market price and sell than those that listed weeks or months ago, have never adjusted the price, and are perhaps waiting for the market to come to them.
Let's take a look at the properties that have changed status the most:
display_cols=['area','dwelling_type','beds','baths','dwelling_cat','dom','status_changes','hr_status_history']
distressed_df = actives_df[(actives_df['status_changes'] >= 3) & (actives_df['current_status'] == 'Active')][display_cols]\
.sort_values('status_changes', ascending=False)
display(distressed_df.style.hide(axis="index"))
| area | dwelling_type | beds | baths | dwelling_cat | dom | status_changes | hr_status_history |
|---|---|---|---|---|---|---|---|
| eh26 | 3 bedroom semi-detached house for sale | 3 | 1 | House | 71 | 4 | (2023-08-24 Act),(2023-09-10 UO),(2023-09-30 Act),(2023-10-18 UO),(2023-10-31 Act) |
| eh6 | 2 bedroom apartment for sale | 2 | 2 | Flat | 136 | 3 | (2023-08-24 SLD),(2023-09-18 Act),(2023-09-27 SLD),(2023-10-15 Act) |
| eh14 | 2 bedroom flat for sale | 2 | 1 | Flat | 104 | 3 | (2023-08-24 UO),(2023-08-29 Act),(2023-09-27 UO),(2023-10-05 Act) |
| eh20 | 3 bedroom semi-detached house for sale | 3 | 1 | House | 259 | 3 | (2023-08-24 SLD),(2023-08-31 Act),(2023-09-06 UO),(2023-11-03 Act) |
| eh32 | 3 bedroom cottage for sale | 3 | 1 | House | 167 | 3 | (2023-08-24 UO),(2023-09-26 Act),(2023-10-12 UO),(2023-10-18 Act) |
Note: I have deliberately dropped the address field for privacy reasons.
We don't know why the sales fell through multiple times on these properties, but identifying them may be useful: there may be value to extract from a distressed or special situation that an investor is better equipped to deal with.
Neither of these lists is likely to be something you could find online. The market information is mostly owned by the sellers, and they're very unlikely to want you to know that they're highly motivated to sell or that there have been issues with previous sales.
Now let's update the map with the sales and status history information in the hover tool:
import os

from bokeh.io import output_notebook, show
from bokeh.models import ColorBar, ColumnDataSource, GMapOptions, HoverTool, Scatter
from bokeh.palettes import Plasma256 as palette
from bokeh.plotting import gmap
from bokeh.transform import linear_cmap

output_notebook()
# Render a Google roadmap at longitude, latitude centered on Granton / Western Harbour
lat, lon, zoom = 55.98, -3.21, 14
gmap_options = GMapOptions(lat=lat, lng=lon, map_type='roadmap', zoom=zoom)
api_key = os.environ['GOOGLE_API_KEY']
# Define the fields of the hover tool
hover = HoverTool(tooltips = [
# ('Address', '@address'),
('Description', '@dwelling_type'),
('Price', '@current_price'),
('Beds', '@beds'),
('Baths', '@baths'),
('Days on Market', '@dom'),
('Price History', '@hr_price_history'),
('Status History', '@hr_status_history')
])
# Add in the hover tool to the map tools
p = gmap(api_key, gmap_options, width=900, height=400, tools=[hover, 'reset', 'wheel_zoom', 'pan'])
# Define a colour mapper based on asking price and a data source to plot all the For Sale properties
mapper = linear_cmap('current_price', palette, 0, 1400000)
source = ColumnDataSource(actives_df[actives_df['current_status'] == 'Active'])
glyph = Scatter(x='longitude', y='latitude', size='marker_size', fill_color=mapper, line_color=mapper, marker='marker_shape',
line_alpha=0.6, fill_alpha=0.6)
p.add_glyph(source, glyph)
# Put the color bar "legend" on the right
color_bar = ColorBar(color_mapper=mapper['transform'], location=(0,0))
p.add_layout(color_bar, 'right')
show(p)
actives_df.to_pickle('actives_df_part4.pkl')