ActiveState Code

Recipe 65433: Parsing comma separated values


How to parse lines containing comma-separated values.

Tcl
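proc csv2list {str {sepChar ,}} {
    # Mark a quote at the very start or end of the record with NUL.
    regsub -all {(\A\"|\"\Z)} $str \0 str
    # Collapse doubled quotes into one literal quote and turn every
    # remaining (delimiting) quote into a NUL marker.
    set str [string map [list $sepChar\"\"\" $sepChar\0\" \
                              \"\"\"$sepChar \"\0$sepChar \
                              \"\" \" \" \0 ] $str]
    # Inside each NUL-delimited (quoted) region, hide separators as \1.
    set end 0
    while {[regexp -indices -start $end {(\0)[^\0]*(\0)} \
                $str -> start end]} {
        set start [lindex $start 0]
        set end   [lindex $end 0]
        set range [string range $str $start $end]
        set first [string first $sepChar $range]
        if {$first >= 0} {
            set str [string replace $str $start $end \
                         [string map [list $sepChar \1] $range]]
        }
        incr end
    }
    # Real separators become NUL, hidden ones are restored, and the
    # quote markers are dropped; splitting on NUL yields the values.
    set str [string map [list $sepChar \0 \1 $sepChar \0 {} ] $str]
    return [split $str \0]
}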


proc list2csv {list {sepChar ,}} {
    set out ""
    foreach l $list {
        set sep {}
        foreach val $l {
            # Quote a value that contains a quote or the separator,
            # doubling any embedded quotes.
            if {[string match "*\[\"$sepChar\]*" $val]} {
                append out $sep\"[string map [list \" \"\"] $val]\"
            } else {
                append out $sep$val
            }
            set sep $sepChar
        }
        append out \n
    }
    return $out
}

Discussion

A record in a CSV (comma-separated values) file, as exported by Excel for example, is a sequence of ASCII values separated by ",". In some locales the separator is ";" instead; both procs take the separator as the optional sepChar argument.

If a value itself contains the separator ",", the whole value is enclosed in double quotes.

If a value contains a double quote ("), that quote is written as two double quotes ("").
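The two quoting rules above can be sketched for a single value as follows. The proc name csvQuoteField is hypothetical and not part of the recipe; it assumes, as list2csv does, that a value needs quoting only when it contains a quote or the separator.

```tcl
# Hypothetical helper, not part of the recipe: quote one value for CSV output.
proc csvQuoteField {val {sepChar ,}} {
    # Quoting is needed only when the value contains a quote or the separator.
    if {[string first \" $val] < 0 && [string first $sepChar $val] < 0} {
        return $val
    }
    # Double every embedded quote, then wrap the whole value in quotes.
    return \"[string map [list \" \"\"] $val]\"
}

puts [csvQuoteField 123]
puts [csvQuoteField 123,521.2]
puts [csvQuoteField {Mary says "Hello"}]
```

Under these assumptions the three calls print 123, "123,521.2" and "Mary says ""Hello""" respectively.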

The following record, for example:

    123,"123,521.2","Mary says ""Hello, I am Mary"""

is parsed into three values:

- 123
- 123,521.2
- Mary says "Hello, I am Mary"
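To make the parsing rules concrete, here is a minimal sketch that splits the example record by scanning it one character at a time. This is not the recipe's method (csv2list works with string map and NUL markers); the proc name csvScanRecord is hypothetical, and the sketch assumes a single-line record with well-formed quoting.

```tcl
# Hypothetical alternative to csv2list: scan one record character by character.
proc csvScanRecord {line {sepChar ,}} {
    set fields {}
    set field ""
    set inQuotes 0
    set len [string length $line]
    for {set i 0} {$i < $len} {incr i} {
        set c [string index $line $i]
        if {$inQuotes} {
            if {$c ne "\""} {
                append field $c
            } elseif {[string index $line [expr {$i + 1}]] eq "\""} {
                # A doubled quote inside a quoted value is one literal quote.
                append field "\""
                incr i
            } else {
                # A single quote closes the quoted value.
                set inQuotes 0
            }
        } elseif {$c eq "\""} {
            set inQuotes 1
        } elseif {$c eq $sepChar} {
            # An unquoted separator ends the current value.
            lappend fields $field
            set field ""
        } else {
            append field $c
        }
    }
    lappend fields $field
    return $fields
}

set record {123,"123,521.2","Mary says ""Hello, I am Mary"""}
foreach value [csvScanRecord $record] {
    puts "- $value"
}
```

The loop prints the same three values as listed for the example record. A scanner like this is easier to follow than the marker-based approach, at the cost of a Tcl-level loop over every character.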

Comments

  1. At 4:17 p.m. on 2 Aug 2001, Gerald Lester said:

    CSV Routines are part of TclLib as of 1.0. Starting with version 1.0 the TclLib (http://tcllib.sf.net) contains routines to handle conversions to and from CSV, including reading and writing to channels.
