Simple introduction to quantmod and tushare

Chao Cheng

Last updated on Feb 18, 2021 7 min read

When doing quantitative financial analysis, the first step is to obtain the related data. In the world of R, multiple packages are capable of doing this. In this post, we briefly introduce two packages, quantmod and tushare.

`quantmod`

The quantmod package is designed to assist the quantitative trader. The information at its official website is somewhat outdated with the news section being staged at 2009. But this package is still under development at Github.

Fetch the data

We can fetch the data of Huatai 300 ETF with

library(quantmod)

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

getSymbols("510300.SS", src = "yahoo") # Get the data: 300ETF of Huatai

## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.

## [1] "510300.SS"

etf300 <- `510300.SS`    # rename the variable, note the ` sign around the name
rm(`510300.SS`)

In the previous script, the data is fetched from source Yahoo Finance and "510300.SS" is the code name for Huatai 300ETF at yahoo’s website. This function also supports different data source such as "google" for Google Finance, "FRED" for Federal Reserve Economic Data and "oanda" for Oanda.com. Also "csv" and "rda" for loading data from local source. By default, getSymbols will return the results in the name of the symbol, in this case, it’s "510300.SS", which is an unusual R variable name since it starts with a number 5. So we rename it to etf300.

Another way is to use setSymbolLookup to tell quantmod how we want our data to be fetched

setSymbolLookup(ETF300 = list(name = "510300.SS",    # now ETF300 is `510300.SS`
                              src = "yahoo"))

and we can check the settings for this symbol with

getSymbolLookup("ETF300")    # we can check our setting for this symbol

## $ETF300
## $ETF300$name
## [1] "510300.SS"
## 
## $ETF300$src
## [1] "yahoo"

Then we can fetch the data as usual:

getSymbols("ETF300")

## [1] "ETF300"

Save and load the data from local file

One can save the symbol data to local file for easy future access and analysis. This can be done with saveSymbols

saveSymbols("ETF300", file.path = ".")

This command will save the Huatai 300ETF data to a local file called "ETF300.RData" to the current working directory(file.path = "."). Then one can load it via getSymbols by setting src = "RData":

getSymbols("ETF300", src = "RData",    # we can load data from local file by src = "RData"
           dir = "./",                 # directory and file extension of local file
           extension = "RData")        # see documentation of getSymbols.rda()

## [1] "ETF300"

By default, this function will look for a file with extension .rda. We manually set it with extension = "RData".

Visualize the data

One advantage of quantmod is that it is very easy to visualize the data.

chartSeries(ETF300)

Thanks to xts package, it is easy to visualize a subset of the data

chartSeries(ETF300, subset = "2020-12-15::2021-01-18")    # adjust timeline

One can also add technical indicators from the TTR package to existing figures with

addMACD()    # add moving average plot

More examples can be found at quantmod’s example website.

`Tushare`

In China mainland, it’s somewhat inconvenient to use quantmod due to connection issues with Yahoo Finance and Google Finance. On the other hand, Tushare is a financial data platform mainly focused on China mainland. It’s first released open source in Dec 2014. And now the new version Tushare Pro has been released. Although it’s mainly focused on Python users, an R package tushare is also available at CRAN.

library(Tushare)

## Loading required package: httr

## Loading required package: tidyverse

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::first()  masks xts::first()
## x dplyr::lag()    masks stats::lag()
## x dplyr::last()   masks xts::last()

## Loading required package: forecast

## Loading required package: data.table

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

## The following objects are masked from 'package:xts':
## 
##     first, last

A key difference between tushare and quantmod is that in order to acquire Tushare Pro’s personal API token, one has to register an free account. After that, we can initialize tushare API in R with

api <- Tushare::pro_api(token = "PUT YOUR TOKEN HERE")

Then we can query for 300ETF data in such a way

res <- api(api_name = 'fund_daily', ts_code = "510300.SH", start_date = "20200101")
str(res)

## 'data.frame':    271 obs. of  11 variables:
##  $ ts_code   : chr  "510300.SH" "510300.SH" "510300.SH" "510300.SH" ...
##  $ trade_date: chr  "20210210" "20210209" "20210208" "20210205" ...
##  $ pre_close : num  5.69 5.55 5.48 5.47 5.49 ...
##  $ open      : num  5.71 5.58 5.52 5.49 5.46 ...
##  $ high      : num  5.83 5.69 5.57 5.55 5.51 ...
##  $ low       : num  5.7 5.55 5.48 5.48 5.42 ...
##  $ close     : num  5.81 5.69 5.55 5.48 5.47 ...
##  $ change    : num  0.115 0.138 0.072 0.008 -0.011 -0.006 0.08 0.063 -0.037 -0.136 ...
##  $ pct_chg   : num  2.02 2.485 1.313 0.146 -0.201 ...
##  $ vol       : num  3419099 4182998 4750134 5600005 3000842 ...
##  $ amount    : num  1973984 2349538 2628084 3085297 1639395 ...

head(res)

##     ts_code trade_date pre_close  open  high   low close change pct_chg     vol
## 1 510300.SH   20210210     5.692 5.706 5.833 5.697 5.807  0.115  2.0204 3419099
## 2 510300.SH   20210209     5.554 5.575 5.694 5.550 5.692  0.138  2.4847 4182998
## 3 510300.SH   20210208     5.482 5.520 5.573 5.482 5.554  0.072  1.3134 4750134
## 4 510300.SH   20210205     5.474 5.486 5.547 5.476 5.482  0.008  0.1461 5600005
## 5 510300.SH   20210204     5.485 5.460 5.510 5.416 5.474 -0.011 -0.2005 3000842
## 6 510300.SH   20210203     5.491 5.491 5.525 5.464 5.485 -0.006 -0.1093 2819695
##    amount
## 1 1973984
## 2 2349538
## 3 2628084
## 4 3085297
## 5 1639395
## 6 1549207

skimr::skim(res)

Table 1: Data summary
Name	res
Number of rows	271
Number of columns	11
_______________________
Column type frequency:
character	2
numeric	9
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
ts_code	0	1	9	9	0	1	0
trade_date	0	1	8	8	0	271	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
pre_close	1	4.52	0.57	3.53	3.98	4.69	4.92	5.69	▇▇▇▇▃
open	1	4.52	0.57	3.54	3.98	4.70	4.94	5.71	▇▆▇▇▃
high	1	4.56	0.58	3.58	4.01	4.74	4.96	5.83	▇▅▇▅▃
low	1	4.48	0.56	3.49	3.95	4.64	4.89	5.70	▇▆▇▆▃
close	1	4.53	0.57	3.53	3.98	4.70	4.93	5.81	▇▅▇▅▂
change	1	0.01	0.07	-0.34	-0.03	0.00	0.04	0.33	▁▁▇▂▁
pct_chg	1	0.14	1.47	-8.43	-0.61	0.12	0.89	7.34	▁▁▇▂▁
vol	1	4467546.98	1946338.27	1628381.85	3102811.71	3967685.53	5430194.20	14710757.72	▇▅▁▁▁
amount	1	2017224.92	880484.24	647069.16	1384610.85	1825584.80	2418173.43	6024138.89	▇▇▂▁▁

Details about these API keys can be found at here and here.

NOTE: A free tushare account can only access partial of the tushare database. In order to have higher data accessing privilege, one should try to increase their accounts’ points. New free account gets 100 points, with some additional 20s via finishing user profile pages. Possible ways including contribution to the community such as bug report, feature request and development. Also making donation is another acceptable approach. Details about acquiring points can be found at here. Point requirements for accessing different types of data can be found at here.

R quantmod tushare