Spits out sorted, deduplicated lines.
# Unique lines case insensitive
filename = r"h:\keywords.txt"
li = list(file(filename))
# Note: listifying file() leaves \n at end of each list element
st = "".join(li)
# comment out next line to get case-sensitive version
st = st.lower()
se = set(st.split("\n"))
result = "\n".join(sorted(se))
print result
Working with websites, I've often got a bunch of keywords from different sources that I want deduplicating. I save them as a text file, one phrase to a line, and run this Python script on it.
Deduplication is done with Python's excellent set type, which keeps only one copy of each element. Well worth investigating.
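The script relies on Python 2 syntax (file() and the print statement), so here is a rough sketch of the same set-based idea for Python 3. The function name unique_lines and its case_sensitive flag are made up for illustration, and unlike the script it also strips surrounding whitespace and drops blank lines, which is an assumption about what is wanted rather than what the script does.

```python
def unique_lines(text, case_sensitive=False):
    """Return the distinct non-empty lines of text, sorted.

    Hypothetical helper, not part of the original script.
    """
    if not case_sensitive:
        text = text.lower()
    # splitlines() copes with \n and \r\n and leaves no trailing empty entry
    lines = (line.strip() for line in text.splitlines())
    # The set comprehension does the deduplication; sorted() returns a list
    return sorted({line for line in lines if line})


sample = "Blue widgets\nred widgets\nblue widgets\n\nRed Widgets\n"
print("\n".join(unique_lines(sample)))
```

With the sample above this prints "blue widgets" and "red widgets", one per line. To run it on a real file, read the text first, for example with open(filename) inside a with block, and pass the result of read() to the function.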