Time Series Home Value Index

Time-Series Home Value Index Tutorial using auto.arima()

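Using three years of monthly median sale prices (2014-2016) for ZIP code 92508
in Riverside, CA, this tutorial demonstrates how to fit an ARIMA model in R
with auto.arima() from the forecast package. auto.arima() searches over candidate (p, d, q) orders and, by default, keeps the model with the lowest AICc.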

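# Load libraries
library(dplyr)     # data manipulation: glimpse(), filter(), %>%
library(ggplot2)   # plotting
library(forecast)  # ma(), seasadj(), auto.arima(), tsdisplay(), forecast()
library(tseries)   # adf.test()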

Generate Unadjusted Median Sale Price Index

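# View the data (df.priceMedian92508 is assumed to be loaded already)
glimpse(df.priceMedian92508)
# Convert the month/day/year strings in dateSaleMoYr to Date objects
df.priceMedian92508$date <- as.Date(df.priceMedian92508$dateSaleMoYr, format = "%m/%d/%Y")
# Convert to a tibble (as_tibble() replaces the deprecated tbl_df()) and view the first 12 months
as_tibble(df.priceMedian92508) %>%
  print(n = 12)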
# A tibble: 38 × 6
   dateSaleMoYr dateSaleMonth dateSaleYear idZip priceMedian       date
          <chr>         <chr>        <chr> <chr>       <dbl>     <date>
1    01/01/2014            01         2014 92508      351500 2014-01-01
2    02/01/2014            02         2014 92508      405000 2014-02-01
3    03/01/2014            03         2014 92508      395000 2014-03-01
4    04/01/2014            04         2014 92508      404000 2014-04-01
5    05/01/2014            05         2014 92508      402500 2014-05-01
6    06/01/2014            06         2014 92508      396500 2014-06-01
7    07/01/2014            07         2014 92508      376750 2014-07-01
8    08/01/2014            08         2014 92508      375000 2014-08-01
9    09/01/2014            09         2014 92508      387000 2014-09-01
10   10/01/2014            10         2014 92508      410000 2014-10-01
11   11/01/2014            11         2014 92508      399000 2014-11-01
12   12/01/2014            12         2014 92508      380000 2014-12-01
# ... with 26 more rows
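The format string passed to as.Date() has to mirror the layout of dateSaleMoYr (month/day/year). A small self-contained check, using three of the date strings shown in the tables of this post:

```r
# A few dateSaleMoYr-style strings (month/day/year), taken from the printed data
raw_dates <- c("01/01/2014", "02/01/2014", "12/01/2016")

# %m = month, %d = day, %Y = four-digit year
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
print(parsed)

# A mismatched format (day/month/year) silently misreads the same string:
# "12/01/2016" is read as 12 January 2016 instead of 1 December 2016
wrong <- as.Date("12/01/2016", format = "%d/%m/%Y")
print(wrong)
```

Dates that cannot be parsed at all (for example "01/13/2016" with %d/%m/%Y) come back as NA, so checking for NA after conversion is a cheap sanity test.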

# View sales for 2016
df.priceMedian92508 %>%
  filter(dateSaleYear == "2016") %>%
  print()

   dateSaleMoYr dateSaleMonth dateSaleYear idZip priceMedian       date
          <chr>         <chr>        <chr> <chr>       <dbl>     <date>
1    01/01/2016            01         2016 92508      375000 2016-01-01
2    02/01/2016            02         2016 92508      425000 2016-02-01
3    03/01/2016            03         2016 92508      407500 2016-03-01
4    04/01/2016            04         2016 92508      430000 2016-04-01
5    05/01/2016            05         2016 92508      444250 2016-05-01
6    06/01/2016            06         2016 92508      444000 2016-06-01
7    07/01/2016            07         2016 92508      407500 2016-07-01
8    08/01/2016            08         2016 92508      418000 2016-08-01
9    09/01/2016            09         2016 92508      420000 2016-09-01
10   10/01/2016            10         2016 92508      446000 2016-10-01
11   11/01/2016            11         2016 92508      450000 2016-11-01
12   12/01/2016            12         2016 92508      435000 2016-12-01

Plot Line Chart

Unadjusted Median Sale Price

 

ggplot(df.priceMedian92508, aes(date, priceMedian)) +
  geom_line() +
  scale_x_date() +
  ylab("Median Sale Price") +
  xlab("")

 

Add a 3-Month Moving Average

# Centered 3-month moving average; the first and last months are NA
df.priceMedian92508$price_ma3mo <- ma(df.priceMedian92508$priceMedian, order = 3)
ggplot() +
  geom_line(data = df.priceMedian92508, aes(x = date, y = priceMedian, colour = "Median Price")) +
  geom_line(data = df.priceMedian92508, aes(x = date, y = price_ma3mo, colour = "3mo moving avg")) +
  ylab("Median Sale Price")
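For an odd order, forecast's ma() computes a centered moving average, which is why price_ma3mo is NA for the first and last months. A minimal base-R sketch of the same calculation on the 2014 prices printed above, assuming (as the forecast package does for odd orders) an equal-weight centered window:

```r
# 2014 median sale prices for 92508, copied from the table above
price_2014 <- c(351500, 405000, 395000, 404000, 402500, 396500,
                376750, 375000, 387000, 410000, 399000, 380000)

# Centered 3-term moving average: each value is the mean of the previous,
# current and next month. sides = 2 centers the window, so the first and
# last months have no complete window and come back as NA.
ma3 <- stats::filter(price_2014, filter = rep(1/3, 3), sides = 2)
print(round(as.numeric(ma3), 2))
```

stats::filter() is called with its namespace because loading dplyr masks filter().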

 

Apply Seasonal Adjustment Using STL (Seasonal and Trend Decomposition Using Loess)

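# Drop the NAs at each end of the moving average and build a monthly series
# (frequency = 12); the first NA is January 2014, so the series starts in February 2014
price_ma <- ts(na.omit(df.priceMedian92508$price_ma3mo), frequency = 12, start = c(2014, 2))
# Additive STL decomposition with a fixed ("periodic") seasonal pattern
decomp <- stl(price_ma, s.window = "periodic")
# Seasonally adjusted series: the original minus its seasonal component
deseasonal_price <- seasadj(decomp)
plot(decomp)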
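STL splits a series additively into seasonal, trend and remainder components, and seasonal adjustment keeps everything except the seasonal part. The sketch below uses a simulated monthly series (not the 92508 data; all names are illustrative) to show both properties with base R's stl():

```r
# Simulated monthly series (not the 92508 data): trend + fixed seasonal pattern + noise
set.seed(42)
n <- 36
trend <- seq(380000, 440000, length.out = n)
season <- rep(5000 * sin(2 * pi * (1:12) / 12), length.out = n)
y <- ts(trend + season + rnorm(n, sd = 3000), frequency = 12, start = c(2014, 1))

# s.window = "periodic" forces the same seasonal pattern in every year
decomp_sim <- stl(y, s.window = "periodic")
comps <- decomp_sim$time.series  # columns: seasonal, trend, remainder

# STL is additive: the three components add back up to the original series
reconstructed <- comps[, "seasonal"] + comps[, "trend"] + comps[, "remainder"]

# Seasonal adjustment removes only the seasonal component
y_adj <- y - comps[, "seasonal"]
```

For an stl fit, forecast's seasadj() should return the same values as y_adj (trend plus remainder).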

 

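Augmented Dickey-Fuller Test for Stationarity
The null hypothesis (H0) of the ADF test is that the series has a unit root, i.e. is non-stationary. Here p = 0.2288, which is above the 0.05 significance level, so we cannot reject H0 at 95% confidence: the seasonally adjusted series is not yet stationary.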

adf.test(deseasonal_price, alternative = "stationary")
	Augmented Dickey-Fuller Test

data:  deseasonal_price
Dickey-Fuller = -2.8851, Lag order = 3, p-value = 0.2288
alternative hypothesis: stationary

 

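Take a First-Order Difference to Obtain Stationarity
Running the ADF test on the first-differenced series gives p = 0.0482. Since this is just below 0.05, we can now reject H0 at 95% confidence and treat the differenced series as stationary.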
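The differencing step itself is a one-liner with diff(); on the post's object it would presumably be adf.test(diff(deseasonal_price), alternative = "stationary"). Because that data is not included here, this sketch uses a simulated random walk instead, so its p-values will differ from the ones quoted above:

```r
library(tseries)

# Simulated random walk (not the 92508 data): a classic non-stationary series
set.seed(123)
rw <- ts(cumsum(rnorm(60)), frequency = 12)

# First-order difference: d_t = y_t - y_(t-1); one observation is lost
rw_d1 <- diff(rw, differences = 1)

# ADF test before and after differencing (H0: unit root, i.e. non-stationary)
adf_level <- adf.test(rw, alternative = "stationary")
adf_diff  <- adf.test(rw_d1, alternative = "stationary")
print(c(level = adf_level$p.value, differenced = adf_diff$p.value))

# Decision rule used in this post: reject H0 when p < 0.05
print(adf_diff$p.value < 0.05)
```

adf.test() may warn that the true p-value is smaller than the printed one; its p-values are interpolated from a table and clipped at the ends.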

Now let’s fit the ARIMA model using auto.arima()
The ADF results above suggest that the model needs a first-order difference, i.e. d = 1 in ARIMA(p, d, q).

Now let’s see whether auto.arima() arrives at the same d.
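auto.arima() chooses d internally with repeated unit-root tests, and by default it uses the KPSS test (null hypothesis: stationary) rather than the ADF test (null hypothesis: unit root), so the two approaches can disagree on short series. forecast's ndiffs() exposes that step directly; a sketch on a simulated random walk (not the 92508 data):

```r
library(forecast)

# Simulated random walk (not the 92508 data)
set.seed(123)
rw <- ts(cumsum(rnorm(60)), frequency = 12)

# Estimated number of differences needed, using each unit-root test;
# auto.arima() uses the KPSS version by default when choosing d
d_kpss <- ndiffs(rw, test = "kpss")
d_adf  <- ndiffs(rw, test = "adf")
print(c(kpss = d_kpss, adf = d_adf))
```

Passing test = "adf" to auto.arima() makes its choice of d follow the same test used earlier in this post.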

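# Fit a non-seasonal ARIMA model; auto.arima() selects p, d and q
fit <- auto.arima(deseasonal_price, seasonal = FALSE)
fit  # the selected model is ARIMA(0,1,0), i.e. a random walk
# Plot the residuals together with their ACF and PACF
tsdisplay(residuals(fit), lag.max = 45, main = "ARIMA(0,1,0) Model Residuals")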
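The residual ACF from tsdisplay() is a visual check; the Ljung-Box test is a numeric companion (forecast's checkresiduals() pairs a similar plot with this test). A sketch using base R's arima() on a simulated random walk (not the 92508 data), where a large p-value means no evidence of leftover autocorrelation:

```r
# Simulated random walk (not the 92508 data)
set.seed(123)
rw <- ts(cumsum(rnorm(60)), frequency = 12)

# Fit ARIMA(0,1,0) with base R; a random walk has no AR or MA coefficients
rw_fit <- arima(rw, order = c(0, 1, 0))

# Ljung-Box test on the residuals (H0: no autocorrelation up to lag 12).
# fitdf = 0 because the model estimates no ARMA coefficients.
lb <- Box.test(residuals(rw_fit), lag = 12, type = "Ljung-Box", fitdf = 0)
print(lb)
```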

Time to Forecast

# Forecast h = 3 periods (months) ahead; forecast() adds 80% and 95% prediction intervals by default
fcast <- forecast(fit, h = 3)
plot(fcast)
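Because ARIMA(0,1,0) without drift is a random walk, its point forecast for every horizon equals the last observed value, and the forecast standard error grows like sigma * sqrt(h), which is why the interval in plot(fcast) fans out. A sketch with base R's arima() and predict() on a simulated series (not the 92508 data):

```r
# Simulated random walk (not the 92508 data)
set.seed(123)
rw <- ts(cumsum(rnorm(60)), frequency = 12)

rw_fit <- arima(rw, order = c(0, 1, 0))
pred <- predict(rw_fit, n.ahead = 3)

# Point forecasts: every horizon repeats the last observed value
print(pred$pred)
# Standard errors scaled by sigma: roughly sqrt(1:3)
print(pred$se / sqrt(rw_fit$sigma2))

# Approximate 95% interval, the same idea as the outer band in plot(fcast)
lower <- pred$pred - 1.96 * pred$se
upper <- pred$pred + 1.96 * pred$se
```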
