17 Sep 2023

A Data Scientist goes…House Hunting (Part 5 – Predicting Value)

Note: The finished dashboard for this project can be seen here.

So far we’ve been concerned with providing tools to explore and analyse the data. We now turn to another popular aspect of data science: machine learning and inference.

We’ve seen in the data (and we know this intuitively) that some factors correlate strongly with the price of a home: how many bedrooms it has, how many bathrooms it has (which is likely to be highly correlated with the number of bedrooms) and where it is located, to name just a few. Indeed, using the analysis we’ve already done, one could make a pretty reasonable guess at the value of a generic flat, for example, just by knowing how many bedrooms it has and which postcode sector it is in. We could perhaps make it a bit more sophisticated by adding an ‘adjustment factor’ for the number of bathrooms. However, once the number of factors grows beyond a handful, the model becomes complex, unwieldy and probably inaccurate, as the shaky assumptions we made about the relationships between the factors are exposed.
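To make the ‘adjustment factor’ idea concrete, here is a minimal sketch of such a hand-built estimate. Everything in it is an assumption for illustration: the listings are made up, the postcode sectors are fictional, and the 5% uplift per extra bathroom is an arbitrary figure rather than something derived from the project’s data.

```python
from collections import defaultdict
from statistics import median

# Made-up listings for illustration only:
# (postcode_sector, bedrooms, bathrooms, price)
listings = [
    ("ZZ1 1", 2, 1, 200_000),
    ("ZZ1 1", 2, 1, 210_000),
    ("ZZ1 1", 2, 2, 230_000),
    ("ZZ1 1", 3, 2, 300_000),
    ("ZZ9 9", 2, 1, 150_000),
    ("ZZ9 9", 2, 1, 160_000),
]

# Hypothetical adjustment: +5% for each bathroom beyond the first.
BATHROOM_ADJUSTMENT = 0.05


def build_lookup(rows):
    """Median price for each (postcode sector, bedrooms) group."""
    groups = defaultdict(list)
    for sector, beds, _baths, price in rows:
        groups[(sector, beds)].append(price)
    return {key: median(prices) for key, prices in groups.items()}


def estimate_price(lookup, sector, beds, baths):
    """Base median for the group, scaled by the bathroom adjustment."""
    base = lookup[(sector, beds)]  # raises KeyError for an unseen group
    return base * (1 + BATHROOM_ADJUSTMENT * (baths - 1))


lookup = build_lookup(listings)
estimate = estimate_price(lookup, "ZZ1 1", 2, 2)
print(f"Estimated price: £{estimate:,.0f}")
```

Even in this tiny form the weaknesses show: the estimate fails outright for any sector and bedroom combination that is not in the data, and every further factor needs another hand-picked adjustment.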

This is where the power of machine learning comes in: it offers a rich set of tools for building models from existing data, which we can then use to infer values for new data we have never seen before.

We are going to build a machine learning model that predicts the price of a property based on the data we have available in our database. The task is a regression task and the model will be a supervised learning model.

As with my previous posts, I will be using Jupyter Notebook to develop the model:
