This function returns a dictionary containing the hit counts for each individual IP that has accessed your Apache web server.
def CalculateApacheIpHits(logfile_pathname):
    # make a dictionary to store IPs and their hit counts and read the
    # contents of the logfile line by line
    IpHitListing = {}
    Contents = open(logfile_pathname, "r").readlines()
    # go through each line of the logfile
    for line in Contents:
        # split the string to isolate the ip
        Ip = line.split(" ")[0]
        # ensure length of the ip is proper: see discussion
        if 6 < len(Ip) < 15:
            # Increase by 1 if ip exists else hit count = 1
            IpHitListing[Ip] = IpHitListing.get(Ip, 0) + 1
    return IpHitListing

# example usage
HitsDictionary = CalculateApacheIpHits("/usr/local/nusphere/apache/logs/access_log")
print HitsDictionary["127.0.0.1"]
This function is useful for many things. For one, I often use it in my code to determine how many of my "hits" actually originate from locations other than my local host. It has also been used to chart which IPs most actively view pages served by a particular installation of Apache.
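As a rough illustration of the two uses just mentioned, the following sketch takes a dictionary shaped like the one CalculateApacheIpHits returns, totals the hits that did not come from the local host, and ranks the most active IPs. It assumes Python 3, uses made-up sample counts instead of a real log, and the helper names `non_local_hits` and `most_active` are hypothetical, not part of the recipe.

```python
from collections import Counter


def non_local_hits(hits, local_ips=("127.0.0.1",)):
    """Total the hits from addresses that are not in local_ips."""
    return sum(count for ip, count in hits.items() if ip not in local_ips)


def most_active(hits, n=3):
    """Return the n (ip, count) pairs with the highest hit counts."""
    return Counter(hits).most_common(n)


# Made-up data in the same shape as the recipe's return value.
sample_hits = {"127.0.0.1": 40, "10.0.0.5": 12, "192.168.1.20": 7, "10.0.0.9": 3}

print(non_local_hits(sample_hits))   # 22
print(most_active(sample_hits, 2))   # [('127.0.0.1', 40), ('10.0.0.5', 12)]
```

The ranked list could then be handed to whatever charting tool is in use.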
As for the method of "validating" the IP, it is as follows: 1) an IP address will never be longer than 15 characters (4 sets of triplets and 3 periods); 2) an IP address will never be shorter than 7 characters (4 single digits and 3 periods). The purpose of this check is not to enforce stringent validation (for that we could use a regular expression), but rather to avoid putting blatantly "garbage" data into the dictionary.
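To make the trade-off concrete, here is a small sketch that compares a length-only check with a stricter test. It assumes Python 3, whose standard library provides the `ipaddress` module (the recipe itself predates it), and the function names `plausible_length` and `is_ipv4` are illustrative only.

```python
import ipaddress


def plausible_length(token):
    # Length-only check: 7 characters ("1.1.1.1") up to 15 ("255.255.255.255").
    return 7 <= len(token) <= 15


def is_ipv4(token):
    # Stricter check: let the standard library parse the dotted quad.
    try:
        ipaddress.IPv4Address(token)
        return True
    except ValueError:
        return False


for token in ("127.0.0.1", "255.255.255.255", "garbage!!", "-"):
    print(token, plausible_length(token), is_ipv4(token))
```

A token such as "garbage!!" passes the length check but fails the stricter one, which is the kind of input the length check was never meant to catch.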
line.split ???
    Ip = line.split(" ")[0]
AttributeError: 'string' object has no attribute 'split'
Older version of Python. You must be using Python 1.5.2, or maybe 1.6, or something?
".split" (without importing 'string') is a feature introduced in Python 2.0+.
To make that work in older versions you will need to change it to something like:
import string
Ip = string.split(line, " ")[0]
Strop. In Python 2.0 and above the string module is used for backward compatibility. Instead, a builtin strop module is used.
And now even strop is obsolete, with the string functions being built in.
If you use the string module nevertheless, there is no overhead.
Thanks for the code, but small bug. The part that tests for a good IP is not wholly correct:
    # ensure length of the ip is proper: see discussion
    if 6
Again, the small bug.
    # ensure length of the ip is proper: see discussion
    if 6 < len(Ip) < 15:
should be:
    if 6 < len(Ip) < 16:
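With that correction applied, the counting logic might look like the sketch below. It is written for Python 3 (an assumption; the recipe targets Python 2), accepts any iterable of lines so it can be tried without a real log file, and uses the hypothetical name `count_ip_hits`. The sample log lines are made up for illustration.

```python
def count_ip_hits(lines):
    """Count hits per IP, taking the first space-separated field as the IP."""
    hits = {}
    for line in lines:
        ip = line.split(" ")[0]
        # Accept 7 to 15 characters inclusive, so "255.255.255.255" is counted.
        if 6 < len(ip) < 16:
            hits[ip] = hits.get(ip, 0) + 1
    return hits


# Made-up lines in Apache's common log layout.
sample = [
    '255.255.255.255 - - [01/Jan/2002:00:00:00 +0000] "GET / HTTP/1.0" 200 512',
    '127.0.0.1 - - [01/Jan/2002:00:00:01 +0000] "GET / HTTP/1.0" 200 512',
    '127.0.0.1 - - [01/Jan/2002:00:00:02 +0000] "GET /a HTTP/1.0" 200 128',
]

print(count_ip_hits(sample))
```

For a real log, an open file object can be passed directly, for example `with open(path) as f: count_ip_hits(f)`.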