Courtesy of Yahoo Finance, it is possible to bulk-download historical price data. This script, borrowed from the pycurl retriever-multi.py example, fetches series for several tickers at a time. It uses urllib to fetch the web data, so it should work with a plain-vanilla Python distribution.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__author__ = 'gian paolo ciceri <gp.ciceri@gmail.com>'
__version__ = '0.1'
__date__ = '20070401'
__credits__ = "queue and MT code was shamelessly stolen from pycurl example retriever-multi.py"

#
# Usage: python grabYahooDataMt.py -h
#
# for selecting tickers and starting date it uses an input file of this format:
# <ticker> <fromdate as YYYYMMDD>
# like
# ^GSPC 19500103    # S&P 500
# ^N225 19840104    # Nikkei 225

import sys, threading, Queue, datetime
import urllib
from optparse import OptionParser


# this thread asks the queue for a job and does it!
class WorkerThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while 1:
            try:
                # fetch a job from the queue
                ticker, fromdate, todate = self.queue.get_nowait()
            except Queue.Empty:
                raise SystemExit
            # strip the leading "^" from index tickers for the file name
            if ticker[0] == "^":
                tick = ticker[1:]
            else:
                tick = ticker
            filename = downloadTo + "%s_%s.csv" % (tick, todate)
            fp = open(filename, "wb")
            if options.verbose:
                print "last date asked:", todate, todate[0:4], todate[4:6], todate[6:8]
                print "first date asked:", fromdate, fromdate[0:4], fromdate[4:6], fromdate[6:8]
            # build the query string: a/b/c are the from date, d/e/f the to
            # date (months are zero-based in Yahoo's URL scheme), g=d means
            # daily quotes
            quote = dict()
            quote['s'] = ticker
            quote['d'] = str(int(todate[4:6]) - 1)
            quote['e'] = str(int(todate[6:8]))
            quote['f'] = str(int(todate[0:4]))
            quote['g'] = "d"
            quote['a'] = str(int(fromdate[4:6]) - 1)
            quote['b'] = str(int(fromdate[6:8]))
            quote['c'] = str(int(fromdate[0:4]))
            params = urllib.urlencode(quote)
            params += "&ignore=.csv"
            url = "http://ichart.yahoo.com/table.csv?%s" % params
            if options.verbose:
                print "fetching:", url
            try:
                f = urllib.urlopen(url)
                fp.write(f.read())
            except:
                import traceback
                traceback.print_exc(file=sys.stderr)
                sys.stderr.flush()
            fp.close()
            if options.verbose:
                print url, "...fetched"
            else:
                sys.stdout.write(".")
                sys.stdout.flush()


if __name__ == '__main__':
    # today is
    today = datetime.datetime.now().strftime("%Y%m%d")

    # parse arguments
    parser = OptionParser()
    parser.add_option("-f", "--file", dest="tickerfile", action="store", default="./tickers.txt",
                      help="read ticker list from file, it uses ./tickers.txt as default")
    parser.add_option("-c", "--concurrent", type="int", dest="connections", default=10, action="store",
                      help="# of concurrent connections")
    parser.add_option("-d", "--dir", dest="downloadTo", action="store", default="./rawdata/",
                      help="save data to this directory, it uses ./rawdata/ as default")
    parser.add_option("-t", "--todate", dest="todate", default=today, action="store",
                      help="most recent date needed")
    parser.add_option("-v", "--verbose",
                      action="store_true", dest="verbose")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose")
    (options, args) = parser.parse_args()

    tickerfile = options.tickerfile
    downloadTo = options.downloadTo
    connections = options.connections
    today = options.todate

    # get input list
    try:
        tickers = open(tickerfile).readlines()
    except IOError:
        parser.error("ticker file %s not found" % (tickerfile,))
        raise SystemExit

    # build a queue with (ticker, fromdate, todate) tuples
    queue = Queue.Queue()
    for tickerRow in tickers:
        tickerRow = tickerRow.strip()
        # skip blank lines and comments
        if not tickerRow or tickerRow[0] == "#":
            continue
        tickerSplit = tickerRow.split()
        # ticker, fromdate, todate
        queue.put((tickerSplit[0], tickerSplit[1], today))

    # check args
    assert queue.queue, "no tickers given"
    numTickers = len(queue.queue)
    connections = min(connections, numTickers)
    assert 1 <= connections <= 255, "too many concurrent connections asked"
    if options.verbose:
        print "----- Getting", numTickers, "Tickers using", connections, "simultaneous connections -----"

    # start a bunch of threads, passing them the queue of jobs to do
    threads = []
    for dummy in range(connections):
        t = WorkerThread(queue)
        t.start()
        threads.append(t)

    # wait for all threads to finish
    for thread in threads:
        thread.join()
    sys.stdout.write("\n")
    sys.stdout.flush()

    # tell something to the user before exiting
    if options.verbose:
        print "all threads are finished - goodbye."
It seems easy to automate web data download, and this little sample does the job using only a plain-vanilla Python distribution. The multithreaded approach (thanks to the pycurl sample) is implemented in a simple manner, with the help of a queue of jobs: one job for each series to get, all threads performing the same task for different tickers. As a minor feature, optparse is used to read the script's command-line parameters.
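The queue-of-jobs pattern at the heart of the script can be sketched in a few lines of modern Python 3 (where the `Queue` module was renamed `queue`). The actual Yahoo download is replaced by a stub here; only the threading structure is the point:

```python
import queue
import threading

def worker(jobs, results, lock):
    """Drain the job queue; each thread exits when the queue is empty."""
    while True:
        try:
            ticker, fromdate = jobs.get_nowait()
        except queue.Empty:
            return  # no more work, thread ends
        data = "csv for %s since %s" % (ticker, fromdate)  # stub for the real fetch
        with lock:  # protect the shared result list
            results.append((ticker, data))

# fill the queue with (ticker, fromdate) jobs, as the script does
jobs = queue.Queue()
for t, d in [("^GSPC", "19500103"), ("^N225", "19840104"), ("IBM", "19620102")]:
    jobs.put((t, d))

# fewer threads than jobs is fine: each thread just pulls the next job
results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(jobs, results, lock))
           for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(results))  # prints 3: every job was processed exactly once
```

Because the workers pull jobs rather than being assigned them, the number of tickers is independent of the number of threads.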
Yahoo URL. Is the URL in the script up to date? The script seems to work fine, but I'm not getting a hit on the URL. Thanks, -t
There was a recent change (2-3 months maybe) in Yahoo's URLs, which is probably easy to fix.
However, what if there are more than 256 stocks? What if I want to do the S&P 500 (500 stocks), or if even 20 concurrent connections are too many? I'm still learning threading, but the answer may need some kind of locks plus global variables (not sure, though).
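For what it's worth, the queue design already lets the job count exceed the thread count, since each thread pulls the next ticker when it finishes the previous one. If one instead wanted many threads but a hard cap on simultaneous connections, a semaphore is the usual tool. A hypothetical sketch (the names are made up and the download is a stub, not part of the original script):

```python
import threading

MAX_CONNECTIONS = 5            # hard cap on simultaneous "downloads"
gate = threading.BoundedSemaphore(MAX_CONNECTIONS)

active = 0                     # downloads currently in flight
peak = 0                       # highest concurrency observed
state_lock = threading.Lock()

def fetch(ticker):
    global active, peak
    with gate:                 # blocks while MAX_CONNECTIONS fetches are in flight
        with state_lock:
            active += 1
            peak = max(peak, active)
        # ... real code would urlopen() here; this is a stub ...
        with state_lock:
            active -= 1

# 50 threads, but never more than 5 inside the gate at once
threads = [threading.Thread(target=fetch, args=("TICK%d" % i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the join, `peak` is guaranteed to be at most `MAX_CONNECTIONS`, whatever the thread count.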
How to add ticker symbol to tuples. Thanks! The script works great.
I was wondering how you can modify the script to add the corresponding ticker symbol to the beginning of each line in each downloaded CSV file. The stock analysis program I use requires the ticker symbol to be in each line of data in order for the CSV file to import properly.
Greatly appreciate your help!
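One way to do this is a small post-processing step rather than a change to the download itself. The sketch below (hypothetical, not part of the original script, written for Python 3) prefixes every data row of a CSV with the ticker, labelling the extra header column "Ticker":

```python
import csv
import io

def prepend_ticker(csv_text, ticker):
    """Return the CSV text with the ticker as an extra first column on every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(rows):
        # header row gets a column name, data rows get the symbol itself
        writer.writerow((["Ticker"] if i == 0 else [ticker]) + row)
    return out.getvalue()

sample = "Date,Open,Close\n2007-04-02,1424.27,1424.55\n"
print(prepend_ticker(sample, "^GSPC"))
```

Applied to a file downloaded as `GSPC_20070402.csv`, this would turn `2007-04-02,1424.27,1424.55` into `^GSPC,2007-04-02,1424.27,1424.55`; if the target program dislikes the extra header column, the header row could be dropped instead.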
How to run this script... please suggest.
IndexError: list index out of range is the error I get... please suggest.
TypeError: coercing to Unicode: need string or buffer, NoneType found is the error I get.
Hi,
I am trying to run this code but it gives me a syntax error (invalid syntax). Could anyone help me run this code and output S&P data as CSV?
Thanks