
Quantitative Trading: Example 3.1 Scrape web pages for financial data

This is the first in a series of posts that convert the MATLAB scripts from E. Chan’s book ‘Quantitative Trading: How to Build Your Own Algorithmic Trading Business’ into R code.

The following code retrieves a stock’s historical price information from Yahoo! Finance:

rm(list = ls()) # clear the workspace
library(httr)
library(quantmod)
library(XML)
symbol <- "IBM" # the stock of interest
in_url <- paste0('http://finance.yahoo.com/q/hp?s=', symbol)

# retrieve the webpage data
histPriceFile_page <- GET(in_url)

# parse every HTML table on the page into a list of data frames
histPriceFile <- readHTMLTable(rawToChar(histPriceFile_page$content),
                               stringsAsFactors = FALSE)

# extract what we want from the list - the second element
histPriceFile_df <- histPriceFile[[2]]

# view the contents
head(histPriceFile_df)
##           Date   Open   High    Low Close* AdjClose**    Volume
## 1 Mar 14, 2018 160.17 160.68 157.74 158.12     158.12 3,614,300
## 2 Mar 13, 2018 160.09 162.11 158.81 159.32     159.32 4,185,600
## 3 Mar 12, 2018 159.64 161.02 158.87 160.26     160.26 5,063,500
## 4 Mar 09, 2018 157.47 159.58 157.30 159.31     159.31 5,022,200
## 5 Mar 08, 2018 159.00 159.57 155.07 156.21     156.21 6,455,600
## 6 Mar 07, 2018 155.00 158.83 154.73 158.32     158.32 4,607,700
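
As a small offline illustration of why the code above indexes the list with [[2]]: readHTMLTable returns one data frame per table element it finds, in document order, so the table you want has to be picked out by position. The sketch below uses a made-up two-table HTML snippet (the markup and the variable names html_text, tables and prices are assumptions for illustration, not Yahoo!'s actual page), with a couple of rows borrowed from the output above.

```r
library(XML)

# Hypothetical page: a small summary table followed by the price table.
html_text <- paste0(
  "<html><body>",
  "<table>",
  "<tr><td>Prev Close</td><td>159.32</td></tr>",
  "<tr><td>Open</td><td>160.17</td></tr>",
  "</table>",
  "<table>",
  "<tr><th>Date</th><th>Close</th></tr>",
  "<tr><td>Mar 14, 2018</td><td>158.12</td></tr>",
  "<tr><td>Mar 13, 2018</td><td>159.32</td></tr>",
  "</table>",
  "</body></html>"
)

# Parse the string as HTML content (asText = TRUE), then extract all tables.
doc <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

length(tables)          # one list element per <table> in the document
prices <- tables[[2]]   # select the table of interest by position
str(prices)
```

If the site changes its layout, the position of the price table in the list may change too, so it is worth checking length(tables) and the column names before relying on a fixed index.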

The example in the book goes on to assign the data to individual variables, so for completeness we do the same here:

# standardise the column names
colnames(histPriceFile_df) <- c("Date", "Open", "High", "Low", 
                                "Close", "Adj_Close", "Volume")
head(histPriceFile_df)
##           Date   Open   High    Low  Close Adj_Close    Volume
## 1 Mar 14, 2018 160.17 160.68 157.74 158.12    158.12 3,614,300
## 2 Mar 13, 2018 160.09 162.11 158.81 159.32    159.32 4,185,600
## 3 Mar 12, 2018 159.64 161.02 158.87 160.26    160.26 5,063,500
## 4 Mar 09, 2018 157.47 159.58 157.30 159.31    159.31 5,022,200
## 5 Mar 08, 2018 159.00 159.57 155.07 156.21    156.21 6,455,600
## 6 Mar 07, 2018 155.00 158.83 154.73 158.32    158.32 4,607,700
# allocate to individual variables
op <- histPriceFile_df$Open
hi <- histPriceFile_df$High
lo <- histPriceFile_df$Low
cl <- histPriceFile_df$Close
vol <- histPriceFile_df$Volume
adjCl <- histPriceFile_df$Adj_Close

And that is basically it for the example. The data still needs preprocessing before we can use it, but that will be another post, as I want to keep the E. Chan examples as standalone items.

To finish, we will save our data frame for use another time. This can be done in a variety of ways, depending on what you want to do with it. A csv file is useful for sharing and for looking at your data outside of R; an rds file allows you to read your data back in as an R variable.
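
To give a rough idea of what that preprocessing might involve, here is a minimal sketch. It assumes the scraped columns come back as character strings (the comma-formatted Volume values suggest they do) and it rebuilds two rows from the output above by hand so that it runs without a network connection. The names raw, clean and price_cols are mine, not from the book.

```r
# Two rows copied from the scraped output, stored as character strings.
raw <- data.frame(
  Date      = c("Mar 14, 2018", "Mar 13, 2018"),
  Open      = c("160.17", "160.09"),
  High      = c("160.68", "162.11"),
  Low       = c("157.74", "158.81"),
  Close     = c("158.12", "159.32"),
  Adj_Close = c("158.12", "159.32"),
  Volume    = c("3,614,300", "4,185,600"),
  stringsAsFactors = FALSE
)

clean <- raw

# Parse dates such as "Mar 14, 2018".
# Note: %b (abbreviated month name) depends on the locale; English is assumed.
clean$Date <- as.Date(raw$Date, format = "%b %d, %Y")

# Convert the price columns from character to numeric.
price_cols <- c("Open", "High", "Low", "Close", "Adj_Close")
clean[price_cols] <- lapply(raw[price_cols], as.numeric)

# Strip the thousands separators before converting the volume.
clean$Volume <- as.numeric(gsub(",", "", raw$Volume, fixed = TRUE))

str(clean)
```

Real scraped data may also contain non-price rows (dividend notices, footnotes) that would turn into NA here, so a full version would need to filter those out first.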

# row.names = FALSE avoids writing the row numbers as an extra, unnamed column
write.csv(histPriceFile_df, file = "histPriceFile_df.csv", row.names = FALSE)
saveRDS(histPriceFile_df, "histPriceFile_df.rds")

# histPriceFile_df <- read.csv("histPriceFile_df.csv", header = TRUE)
# histPriceFile_df <- readRDS("histPriceFile_df.rds")
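
The practical difference between the two formats shows up when the data is read back in. The sketch below is a toy round trip under my own assumptions (a two-row data frame named prices and temporary files, none of which come from the book): the rds file restores the object exactly as it was, whereas the csv file stores only text, so column types such as Date have to be rebuilt after reading.

```r
# A small data frame with a Date column and a numeric column.
prices <- data.frame(
  Date  = as.Date(c("2018-03-14", "2018-03-13")),
  Close = c(158.12, 159.32)
)

# Write to temporary files so nothing is left in the working directory.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(prices, file = csv_path, row.names = FALSE)
saveRDS(prices, rds_path)

from_csv <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_path)

class(from_csv$Date)         # plain text: the Date class is not stored in a csv
class(from_rds$Date)         # the Date class survives the rds round trip
identical(from_rds, prices)  # the rds copy matches the original object
```

This is why rds is convenient for picking up work later in R, while csv remains the better choice when the file needs to be opened in other tools.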