Welcome, guest | Sign In | My Account | Store | Cart

Program will compare two CSV files using a unique ID field and save any changes to ID as well as two secondary fields (qty & price). The code was written to pick out updates from supplier inventory files.

Python, 109 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
#!/usr/bin/python
'''
CompareCsv.py - Python script that will compare two CSV files based upon a unique ID field and record changes in this field as well as two secondary fields (qty & price).  The output is a CSV file. 
'''
#Imports (do not change)
from __future__ import division
import sys,csv

# ---------------------------------------------
#Globals - CHANGE ME
# ----------------------------------------------
OldFile = 'test1.csv'			# File Name For Old CSV File (Front Slashes / to separate paths)
NewFile = 'test2.csv'			# File Name For New CSV File (Front Slashes / to separate paths)
OutFile =  'test-out.csv'		# File To Save Results (Front Slashes / to separate paths)
CSVFilesHaveHeaderRow = True	# True or False if input files include a header row
IdColNumber = 0					# Column that contains the unique Item ID (Start counting with zero)
QtyColNumber = 1				# Column that contains the qty in stock
PriceColNumber = 2				# Column that contains the prices
DefaultMarkUpPercent = 0 		# Markup in percent - Needs to be whole number 0 = disable

# ---------------------------------------------
# Create CSV Objects 
CSVOld = csv.reader(open(OldFile,'rb'))
CSVNew = csv.reader(open(NewFile,'rb'))
CSVOut = csv.writer(open(OutFile,'wb'))
CSVOut.writerow(['ID','Price','Qty','Orderable','Status'])

# -----------------------------------------
# Load Old Data Into Memory
OldData = {}
FirstLine = True
for line in CSVOld:
	if CSVFilesHaveHeaderRow and FirstLine:
		FirstLine = False
	else:
		OldData[line[IdColNumber]] =  (line[QtyColNumber],line[PriceColNumber])

# -----------------------------------------
# Read Through New Data & Compare to Old
NewIDList = []
FirstLine = True
for line in CSVNew:
		
	# Skip Header In CSV Files
	if CSVFilesHaveHeaderRow and FirstLine:
		FirstLine = False
	else:
	
		# Assign New Values to locals for readability
		NewID = line[IdColNumber]
		NewQty = line[QtyColNumber]
		NewPrice = line[PriceColNumber]
		
		# Existing Items - Compare Secondary Fields
		if OldData.has_key(line[IdColNumber]):
			
			# Assign Old Values locals for readability
			OldID = OldData[line[IdColNumber]]
			OldQty = OldID = OldData[line[IdColNumber]][0]
			OldPrice = OldID = OldData[line[IdColNumber]][1]
			
			# Save ID Field To Test for deleted
			NewIDList.append(NewID)
			
			# Reset Status
			Status = ""
			
			# Define Current Orderable Status
			if NewQty == 0:
				Orderable = "NO"
			else:
				Orderable = "YES"
				
			# Check for Stock Status Change
			if int(OldQty) <= 0  and int(NewQty) > 0:
				Status += "In-Stock "	
			if int(OldQty) > 0  and int(NewQty) <= 0:
				Status += "Out-Stock "	
			
			# Check for Price Change
			if OldPrice != NewPrice:
				Status += "Price "
				if DefaultMarkUpPercent > 0:
					SellPrice = str(float(NewPrice) * (1+(DefaultMarkUpPercent / 100)))
				else:
					SellPrice = NewPrice
			else:
				SellPrice = NewPrice
				
			# Save to output only if the item has a changed status
			if Status:
				CSVOut.writerow([NewID,SellPrice,NewQty,Orderable,Status])
		else:
			
			# New Item ID 
			
			# Markup Price
			if DefaultMarkUpPercent > 0:
				SellPrice = str(float(NewPrice) * (1+(DefaultMarkUpPercent / 100)))
			else:
				SellPrice = NewPrice
			
			# Save to Output
			CSVOut.writerow([NewID,SellPrice,NewQty,'YES','NewItem'])
			
# Check For Deleted Items 
for id in OldData.keys():
	if id not in NewIDList:
		CSVOut.writerow([id,'','','NO','Deleted'])

2 comments

Charlie Clark 14 years, 7 months ago  # | flag

This is not a recipe - it is a very specific solution to an individual problem.

  • Using the csv.DictReader class would provide direct dictionary access to the fields and avoid the checking for the firstline for every line in your file.

  • Use the for/if/continue pattern rather than for/if/pass you intend to skip over the body of your function in some circumstances. This is flexibler, allows the iterator to move on the next item whilst completing igoring the body of the function and stops you having to worry about indentation levels in the body of the function.

  • An alternative would be to use the builtin SQLite module and convert the files to tables in a temporary and use SQL queries to compare the two datasets and return the results in a new CSV file (using csv.DictWriter)

Gabriel Genellina 14 years, 7 months ago  # | flag

In addition to the previous comments:

for id in OldData:

is a better way to iterate over the keys in a dictionary (and avoids actually creating a temporary list). Also:

if line[IdColNumber] in OldData:

is better than using has_key(). Also, using a set instead of a list for NewIDList improves performance.