This program compares two CSV files using a unique ID field and records any changes to that ID as well as to two secondary fields (qty & price). The code was written to pick out updates from supplier inventory files.
#!/usr/bin/python
'''
CompareCsv.py - Python script that compares two CSV files based upon a
unique ID field and records changes in this field as well as in two
secondary fields (qty & price). The output is a CSV file.
'''
# Imports (do not change)
from __future__ import division
import csv
# ---------------------------------------------
# Globals - CHANGE ME
# ---------------------------------------------
OldFile = 'test1.csv'        # File name for old CSV file (forward slashes / to separate paths)
NewFile = 'test2.csv'        # File name for new CSV file (forward slashes / to separate paths)
OutFile = 'test-out.csv'     # File to save results (forward slashes / to separate paths)
CSVFilesHaveHeaderRow = True # True or False if input files include a header row
IdColNumber = 0              # Column that contains the unique item ID (start counting with zero)
QtyColNumber = 1             # Column that contains the qty in stock
PriceColNumber = 2           # Column that contains the prices
DefaultMarkUpPercent = 0     # Markup in percent - needs to be a whole number; 0 = disable
# ---------------------------------------------
# Create CSV objects
CSVOld = csv.reader(open(OldFile, 'rb'))
CSVNew = csv.reader(open(NewFile, 'rb'))
CSVOut = csv.writer(open(OutFile, 'wb'))
CSVOut.writerow(['ID', 'Price', 'Qty', 'Orderable', 'Status'])
# ---------------------------------------------
# Load old data into memory
OldData = {}
FirstLine = True
for line in CSVOld:
    if CSVFilesHaveHeaderRow and FirstLine:
        FirstLine = False
    else:
        OldData[line[IdColNumber]] = (line[QtyColNumber], line[PriceColNumber])
# ---------------------------------------------
# Read through new data & compare to old
NewIDList = []
FirstLine = True
for line in CSVNew:
    # Skip header in CSV files
    if CSVFilesHaveHeaderRow and FirstLine:
        FirstLine = False
    else:
        # Assign new values to locals for readability
        NewID = line[IdColNumber]
        NewQty = line[QtyColNumber]
        NewPrice = line[PriceColNumber]
        # Existing items - compare secondary fields
        if OldData.has_key(NewID):
            # Assign old values to locals for readability
            OldQty = OldData[NewID][0]
            OldPrice = OldData[NewID][1]
            # Save ID field to test for deleted items later
            NewIDList.append(NewID)
            # Reset status
            Status = ""
            # Define current orderable status (values read from CSV are
            # strings, so convert before comparing to a number)
            if int(NewQty) <= 0:
                Orderable = "NO"
            else:
                Orderable = "YES"
            # Check for stock status change
            if int(OldQty) <= 0 and int(NewQty) > 0:
                Status += "In-Stock "
            if int(OldQty) > 0 and int(NewQty) <= 0:
                Status += "Out-Stock "
            # Check for price change
            if OldPrice != NewPrice:
                Status += "Price "
                if DefaultMarkUpPercent > 0:
                    SellPrice = str(float(NewPrice) * (1 + (DefaultMarkUpPercent / 100)))
                else:
                    SellPrice = NewPrice
            else:
                SellPrice = NewPrice
            # Save to output only if the item has a changed status
            if Status:
                CSVOut.writerow([NewID, SellPrice, NewQty, Orderable, Status])
        else:
            # New item ID - mark up price
            if DefaultMarkUpPercent > 0:
                SellPrice = str(float(NewPrice) * (1 + (DefaultMarkUpPercent / 100)))
            else:
                SellPrice = NewPrice
            # Save to output
            CSVOut.writerow([NewID, SellPrice, NewQty, 'YES', 'NewItem'])
# -----------------------------------------
# Check for deleted items
for id in OldData.keys():
    if id not in NewIDList:
        CSVOut.writerow([id, '', '', 'NO', 'Deleted'])
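For illustration, with DefaultMarkUpPercent = 0 and hypothetical input files like these (the IDs and values are invented):

test1.csv (old):
id,qty,price
A1,0,9.99
B2,4,4.50
C3,2,1.25

test2.csv (new):
id,qty,price
A1,5,9.99
B2,0,4.75
D4,7,2.00

the script writes a test-out.csv along these lines (the Status values keep a trailing space as written):

ID,Price,Qty,Orderable,Status
A1,9.99,5,YES,In-Stock
B2,4.75,0,NO,Out-Stock Price
D4,2.00,7,YES,NewItem
C3,,,NO,Deleted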
This is not a recipe - it is a very specific solution to an individual problem.
Using the csv.DictReader class would provide direct dictionary access to the fields and avoid checking for the first line on every line of your file.
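A minimal sketch of the DictReader approach; the 'id', 'qty' and 'price' field names are assumed here and would have to match the header row of the actual supplier file:

import csv

OldData = {}
for row in csv.DictReader(open('test1.csv', 'rb')):
    # DictReader treats the first row as field names, so no FirstLine
    # flag is needed and columns are accessed by name
    OldData[row['id']] = (row['qty'], row['price'])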
Use the for/if/continue pattern rather than for/if/pass when you intend to skip over the body of your loop in some circumstances. This is more flexible, allows the iterator to move on to the next item while completely ignoring the body of the loop, and stops you having to worry about indentation levels in the body of the loop.
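Applied to the header-skipping logic in the recipe (reusing its variable names), the pattern looks roughly like this:

FirstLine = True
for line in CSVNew:
    if CSVFilesHaveHeaderRow and FirstLine:
        FirstLine = False
        continue  # move straight on to the next row
    # the rest of the loop body now runs one indentation level shallower
    NewID = line[IdColNumber]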
An alternative would be to use the built-in sqlite3 module, convert the files to tables in a temporary database, and use SQL queries to compare the two datasets, returning the results in a new CSV file (using csv.DictWriter).
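A rough sketch of that idea using an in-memory database; the table layout and the single price-change query are illustrative assumptions, not a full replacement for the recipe:

import csv, sqlite3

conn = sqlite3.connect(':memory:')  # temporary database
for table in ('old', 'new'):
    conn.execute('CREATE TABLE %s (id TEXT PRIMARY KEY, qty INTEGER, price REAL)' % table)
for table, filename in (('old', 'test1.csv'), ('new', 'test2.csv')):
    reader = csv.reader(open(filename, 'rb'))
    next(reader)  # skip the header row
    conn.executemany('INSERT INTO %s VALUES (?, ?, ?)' % table, reader)

# Items present in both files whose price changed
for row in conn.execute('''SELECT new.id, new.price, new.qty
                           FROM new JOIN old ON new.id = old.id
                           WHERE new.price != old.price'''):
    print row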
In addition to the previous comments:
for id in OldData: is a better way to iterate over the keys in a dictionary (and avoids actually creating a temporary list). Also:
line[IdColNumber] in OldData is better than using has_key(). Also, using a set instead of a list for NewIDList improves performance, since membership tests on a set are constant-time.
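Put together, those suggestions look something like this (with invented sample data standing in for the loaded files):

OldData = {'A1': ('3', '9.99'), 'B2': ('0', '4.50')}  # sample data
new_rows = [['A1', '5', '9.99'], ['C3', '2', '1.25']]

NewIDList = set()              # set membership tests are O(1), a list's are O(n)
for line in new_rows:
    if line[0] in OldData:     # instead of OldData.has_key(line[0])
        NewIDList.add(line[0])

for id in OldData:             # iterates over the keys without building a list
    if id not in NewIDList:
        print id, 'Deleted'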