
This is a simple recipe: it reads each line of one file, strips trailing whitespace (including the line ending), and searches a second file for the same line, anywhere in that file.

If a line is missing from the second file, its line number in the first file is printed to stdout.

Python
import sys
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("file1", help="First file whose lines you want to check")
parser.add_argument("file2", help="Second file, in which you want to search for lines from first file")
args = parser.parse_args()

# Read both files fully; "with" closes each file even if reading fails.
with open(args.file1) as file1:
    file1array = file1.readlines()
with open(args.file2) as file2:
    file2a = file2.readlines()

print("Comparing:")
print(args.file1)
print("and")
print(args.file2)
print("")
print("Attempting to find lines in *file1* that are missing in *file2*")
print("")
lengthfile1array = len(file1array)
for j, file1item in enumerate(file1array, 1):
    # Progress indicator, rewritten in place via the carriage return.
    sys.stdout.write("Checking line#: %d/%d   \r" % (j, lengthfile1array))
    # Lines are compared with trailing whitespace (including the line ending) stripped.
    for file2item in file2a:
        if file1item.rstrip() == file2item.rstrip():
            break
    else:
        # The inner loop ended without a break, so nothing in file2 matched.
        # This also covers an empty file2, which the counter-based check missed.
        print("MISSING LINE FOUND at Line# " + str(j))

This recipe is useful if you have a large amount of line-by-line data, e.g. telecom network CDRs (call detail records).

I wrote this in under an hour and it is NOT optimized; there are probably many ways to improve it, both syntactically and performance-wise.
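One possible performance improvement, offered as a sketch rather than a tested replacement: the recipe rescans the whole second file for every line of the first, so the work grows with the product of the two file lengths. Collecting the stripped lines of the second file into a set once makes each lookup roughly constant time, at the cost of holding that set in memory. The function name `find_missing_lines` and the sample data below are made up for illustration and are not part of the recipe.

```python
def find_missing_lines(lines1, lines2):
    """Return the 1-based numbers of lines in lines1 with no match in lines2.

    As in the recipe, lines are compared after rstrip(), so trailing
    whitespace and line endings are ignored.
    """
    # Build the lookup set once; membership tests are O(1) on average.
    seen = set(line.rstrip() for line in lines2)
    return [number for number, line in enumerate(lines1, 1)
            if line.rstrip() not in seen]


if __name__ == "__main__":
    # Hypothetical sample data standing in for file1.readlines() / file2.readlines().
    first = ["alpha\n", "beta\n", "gamma  \n"]
    second = ["gamma\r\n", "alpha\n"]
    for number in find_missing_lines(first, second):
        print("MISSING LINE FOUND at Line# " + str(number))
```

With the sample data above, only "beta" has no counterpart in the second list, so line 2 is reported. To use this with real files, pass it the two lists returned by readlines().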

Edit: removed extraneous print statements

4 comments

rebs.guarina 11 years, 10 months ago

I built something similar a couple of months ago, which used generators and regex. Sharing the code anyway...

https://github.com/rebx/crosscheck

sami jan (author) 11 years, 10 months ago

thanks rebs, btw I am a bit bummed out that ActiveState doesn't let me edit my recipes..

David Adler 11 years, 10 months ago

it does: scroll down, and in the right-hand column there is "edit this recipe" if you are logged in

sami jan (author) 11 years, 10 months ago

thanks david