Task: Administrative job to run in my case 2300 jobs in a scheduled manner Restriction: Don't start two jobs at same schedule on same server
Problems to solve that for: * align list of projects into batch of jobs with distinct servers * templated job creation * create a crontab * to start all this jobs from a starting schedule every hour * respect some restrictions that on some days and some hours no jobs should be started
Thanks to builtin map() and standard-library time, datetime and timedelta to make that an ease at the end!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 | # -*- coding: utf-8 -*-
"""
Assembled by : Peter Arwanitis (spex66)
Task: Administrative job to run in my case 2300 jobs in a scheduled manner
Restriction: Don't start two jobs at same schedule on same server
Problems to solve that for:
* align list of projects into batch of jobs with distinct servers
* templated job creation
* create a crontab
** to start all this jobs from a starting schedule every hour
** respect some restrictions that on some days and some hours no jobs should be started
** thanks to standard-library time, datetime and timedelta to make that an ease at the end!
References:
* http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410687
Transposing a List of Lists with Different Lengths without Loosing Elements (by Zoran Isailovski)
* thanks to Gerhard Kalab for his excellent pycron (cause it runs on windows too :))
http://www.kalab.com/freeware/pycron/pycron.htm
Targeted on windows and python23
"""
import os, time
from datetime import datetime, timedelta
report_ROOTDIR = r"/tmp/jobs"
crontabBuilderData = """
# list of jobnames with paramaters to run on a specific server
jobname1,server1,param1,param2,param3
jobname2,server1,param1,param2,param3
jobname3,server1,param1,param2,param3
jobname4,server2,param1,param2,param3
jobname5,server2,param1,param2,param3
jobname6,server3,param1,param2,param3
jobname7,server4,param1,param2,param3
jobname8,server4,param1,param2,param3
"""
def groupProjectsByUniqueServer():
listOfJobs = [l.strip().split(',')
for l in crontabBuilderData.split('\n') # or read this from file
if not l.startswith('#') # skip comments
if l.strip() # skip empty lines
]
print listOfJobs
# group by servername
dictByServer = {}
_skip = [dictByServer.setdefault(i[1], []).append(i) for i in listOfJobs]
# align them that way, that each server comes up only one time in a batch of job
# background: to minimize server load on then same time
# clean up the None's out of the lists got from map(None, list1, list2, list3) of different length
# most elegant solution?-) kudos to Recipe/410687
# for some more insights how to handle such mappings, I never fiddled out that *row part!
alignedJobs = map(lambda *row: [elem for elem in row if elem is not None],*dictByServer.values())
return alignedJobs
def reportcrontab(alignedprojects, year, month, day=1, hour=0, excludedays=[], excludehours=[]):
# start from that date at default 0 o'clock
# remember start for some statistics
startschedule = schedule = datetime(year, month, day, hour)
# prepare that folders for job creation
report_logfile = os.path.join(report_ROOTDIR, 'logs', "%(scheduled_time)s_%(jobname)s.log")
report_cmddir = os.path.join(report_ROOTDIR, 'cmds')
report_crontabname = os.path.join(report_ROOTDIR, "crontab_projects.txt")
report_crontabheader = """
# THIS CONFIGURATION IS AUTOMATICALLY GENERATED!!!
# this version is from: %s
""" % (time.strftime('%Y-%m-%d/%H:%M:%S', time.localtime()))
# 0 13 15 6 * "test.cmd"
report_crontabtemplate = '''0 %(hour)s %(day)s %(month)s * "%(cmd)s"'''
report_crontab = []
# example template for commandfile generation
report_commands = [
"@echo SomeJobRunner --name %(jobname)s --server=%(server)s --p1 %(param1)s --p2 %(param2)s --p3 %(param3)s 1> " + report_logfile,
r"c:",
r"cd c:\python23",
"python SomeJobRunner.py --name %(jobname)s --server=%(server)s --p1 %(param1)s --p2 %(param2)s --p3 %(param3)s 1>> " + report_logfile + " 2>&1",
]
report_cmd_template = '\n'.join(report_commands)
project_count = 0
while alignedprojects:
if schedule.weekday() in excludedays:
continue # SKIP
if schedule.hour in excludehours:
continue # SKIP
actual = alignedprojects.pop(0) #reduce the stack, from TOP where the biggest projects are
for (jobname, server, param1, param2, param3) in actual:
project_count += 1
# build a characteristic prefix
scheduled_time = '%i%02i%02i_%02i' % (
schedule.year,
schedule.month,
schedule.day,
schedule.hour
)
jobcmddfile = os.path.join(report_cmddir, '%s_%s.cmd' % ( scheduled_time,
jobname))
# write complete batch out
# hint: the replace is only essential on windows, cause % is reserved!
# locals() to feed right away parameters wia keywords into command template
file(jobcmddfile,'w').write((report_cmd_template % locals()).replace('%', '%%'))
report_crontab.append( report_crontabtemplate % {
'hour' : schedule.hour,
'day' : schedule.day,
'month' : schedule.month,
'cmd' : jobcmddfile,
}
)
# jump to next time slice
schedule += timedelta(hours=1) # next hour, place your stepping here
file(report_crontabname, 'w').write('\n'.join([report_crontabheader]+report_crontab))
print '%i projects scheduled starting from %s, ends up %s' % (project_count, startschedule, schedule)
print 'no schedules at hours (%s) and days of week (%s) # Monday is 0 and Sunday is 6' % (excludehours, excludedays)
def buildReportCrontab():
# configure startdate here
year, month, day = (2007, 5, 26)
# configure excludes to skip over here
# Monday is 0 and Sunday is 6
excludedays = [6] # for example: keep free backup day
excludehours = [6,7] # for example: keep free administrative window
# build for every hour a slice of jobs running on different fileshares
reportcrontab(groupProjectsByUniqueServer(), year, month, day, excludedays=excludedays, excludehours=excludehours)
if __name__ == '__main__':
# let it run
buildReportCrontab()
""" example crontab, no fun to make that for 2300 jobs :)
# THIS CONFIGURATION IS AUTOMATICALLY GENERATED!!!
# this version is from: 2007-05-28/09:25:21
0 0 26 5 * "/tmp/jobs/cmds/20070526_00_jobname7.cmd"
0 0 26 5 * "/tmp/jobs/cmds/20070526_00_jobname1.cmd"
0 0 26 5 * "/tmp/jobs/cmds/20070526_00_jobname4.cmd"
0 0 26 5 * "/tmp/jobs/cmds/20070526_00_jobname6.cmd"
0 1 26 5 * "/tmp/jobs/cmds/20070526_01_jobname8.cmd"
0 1 26 5 * "/tmp/jobs/cmds/20070526_01_jobname2.cmd"
0 1 26 5 * "/tmp/jobs/cmds/20070526_01_jobname5.cmd"
0 2 26 5 * "/tmp/jobs/cmds/20070526_02_jobname3.cmd"
"""
|
Maybe this are two recipes, but I use them in conjunction for administrative purposes to handle a huge list of jobs. Cookbook and standard-library had all things available to assemble that, but to show that parts all together is maybe a quick starter for someone else on such a job. Hurray for one-stop-shopping :) cause it costs me a bit of time to compose and reduce it to be pythonic.
- How to align (group) a lot of jobs according to a given characteristic (server in that example)
- Templated job creation
- Use of Datetime to automatic creation of crontab timetable with restrictions to exclude days of week and/or given hours of day.
Thanks and references to: * http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410687 Transposing a List of Lists with Different Lengths without Loosing Elements (by Zoran Isailovski) * thanks to Gerhard Kalab for his excellent pycron (cause it runs on windows too :)) http://www.kalab.com/freeware/pycron/pycron.htm
(=PA=)