This is a CSV parsing library written in portable R6RS Scheme. I wrote it because most of the Scheme CSV parsers I could find were sloppy or written on top of big, complicated general parsing libraries.
This library takes a rather strict view of CSV, modeled on a draft update to RFC 4180. In particular, a record without a CRLF or LF terminator is considered a syntax error; this requirement simplifies the grammar of CSV data dramatically. My guiding principle in designing this library is that any optional complexity should come from error handling (an important feature often ignored in hacker-designed parsers) rather than from trying to parse permissively.
All the usual warnings about works-in-progress apply. If you find any bugs or have suggestions, please let me know.
A tarball including the library and a test suite can be found on FTP.
(make-csv-parser port)
Returns a CSV parser object which will read CSV data from port, which must be a textual input port. Since the parser will mutate port, creating multiple parsers with the same input port is not recommended. The parser expects port to produce UTF-8-encoded text with CRLF or LF line endings.
(parse-csv parser handler init ...)
The main parser driver.
parser must be a CSV parser object created with
make-csv-parser; it reads data from the port associated
with it when it was created.
For each complete record, it calls handler with a list
of that record's fields, in order, and a number of "seeds".
handler, in turn, returns one or more values to the
parser.
If the first value is false, then the parser halts and returns
the current seeds.
Otherwise, parsing continues, using the remaining values returned
by handler as new seeds.
When parser's input port returns EOF, the
parse-csv
procedure returns the final seeds.
The initial seed values are given by the init arguments.
The number of seeds should not vary between calls to handler.
If an error occurs while parsing, then a condition of type
&parser (described below) is raised.
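As a rough sketch of the seed-passing protocol described above, the following counts records and fields in a single pass using two seeds. The import name (csv) is a placeholder, since the library's real name is not stated here, and the sketch assumes that parse-csv hands back multiple seeds as multiple values.

```scheme
(import (rnrs)
        (csv)) ; placeholder import: substitute the library's real name

;; Count records and fields in one pass. The two seeds are the running
;; record count and the running field count.
(define (csv-counts port)
  (parse-csv (make-csv-parser port)
             (lambda (record n-records n-fields)
               ;; A true first value means "keep going"; the remaining
               ;; values become the seeds for the next call.
               (values #t
                       (+ n-records 1)
                       (+ n-fields (length record))))
             0 0))

;; Assumption: the final seeds come back as multiple values.
(call-with-values
  (lambda () (csv-counts (open-string-input-port "a,b,c\n1,2,3\n")))
  (lambda (n-records n-fields)
    (display (list n-records n-fields))
    (newline)))
```

Every record in the sample string ends with a line feed, as the parser requires.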
When things go wrong, a CSV parser raises a condition of type
&parser.
This is a subtype of &i/o-port (see the R6RS
Standard Libraries
document, section 8).
Most parser-error conditions include a message condition; the
error message can be retrieved with condition-message.
Some parser errors are recoverable. These are reported with a
condition belonging to the &parser subtype
&parser-recoverable.
(parser-error? obj)
Returns true if obj is a &parser
condition and false otherwise.
(recoverable-parser-error? obj)
Returns true if obj is a
&recoverable-parser
condition and false otherwise.
(condition-line-position condition)
Returns the input line at which the error occurred.
FIXME: Should this be replaced by the ordinal number of the record at which the error occurred? If not, make sure that embedded [CR]LFs are correctly counted.
(condition-char-position condition)
Returns the position of the character (on the line designated by
(condition-line-position condition))
at which the error occurred.
(condition-input-char condition)
Returns the input token (as a Scheme character) that triggered the
error, or #f if there is no relevant token.
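The accessors above might be combined into an error report along the following lines. This is only a sketch: the import name (csv) is a placeholder, report-parser-error and parse-or-report are hypothetical helpers, and it assumes that the position accessors apply to every &parser condition.

```scheme
(import (rnrs)
        (csv)) ; placeholder import: substitute the library's real name

;; Print a one-line description of a parser condition.
(define (report-parser-error con)
  (display "CSV error at line ")
  (display (condition-line-position con))
  (display ", character ")
  (display (condition-char-position con))
  ;; Most, but not all, parser conditions carry a message.
  (when (message-condition? con)
    (display ": ")
    (display (condition-message con)))
  ;; condition-input-char returns #f when no token is relevant.
  (let ((c (condition-input-char con)))
    (when c
      (display " (offending character: ")
      (write c)
      (display ")")))
  (newline))

;; Parse everything from port; on any parser error, report it and return #f.
(define (parse-or-report port)
  (guard (con
          ((parser-error? con)
           (report-parser-error con)
           #f))
    (reverse
      (parse-csv (make-csv-parser port)
                 (lambda (r rs) (values #t (cons r rs)))
                 '()))))
```

Conditions that are not parser errors pass through the guard unchanged, so unrelated I/O failures are not swallowed.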
(condition-skip-record condition)
condition must satisfy
recoverable-parser-error?.
Returns a zero-argument procedure. When this procedure is invoked, the parser attempts to skip the rest of the current CSV record and resume parsing at the beginning of the next.
Gotcha: The record-skipping algorithm is currently quite simple: characters from the CSV port are discarded up to and including the next [CR]LF, regardless of the parser's pre-error context. In particular, if the error occurred while the parser was in a quoted field, no attempt is made to find a closing double-quote before considering [CR]LFs to be record terminators. In the case of quoted fields with embedded newlines, this could cause the parser to resume parsing within the field, leading to more (recoverable) errors down the line.
This behavior seems better than the alternative, which is to expect every quoted field to be properly terminated. When a field has already triggered an error, that expectation seems dubious. If there is no terminator, then the parser could end up discarding several valid records, or even the entire remaining CSV stream.
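The skipping behavior described above can be illustrated with a small stand-alone sketch. This is not the library's code: skip-to-next-line is a hypothetical helper written against plain R6RS ports, and it only mimics the "discard through the next line feed" rule.

```scheme
(import (rnrs))

;; Discard characters up to and including the next LF. A CRLF terminator
;; also ends in LF, so no separate CR handling is needed.
;; Returns #t if a terminator was found, #f if EOF came first.
(define (skip-to-next-line port)
  (let loop ()
    (let ((c (get-char port)))
      (cond ((eof-object? c) #f)
            ((char=? c #\linefeed) #t)
            (else (loop))))))

;; The quoted field "b<LF>c" contains an embedded newline. Skipping from
;; the start of the record stops inside that field, so the text that
;; follows begins with: c",d
(define port (open-string-input-port "a,\"b\nc\",d\ne,f\n"))
(skip-to-next-line port)
(write (get-line port))
(newline)
```

The leftover text starts in the middle of a quoted field, which is the situation the gotcha warns about.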
The following procedure returns a list of all CSV records parsed from file, in order:
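(define (csv-file->list file)
  (call-with-input-file
    file
    (lambda (port)
      (let ((parser (make-csv-parser port)))
        ;; Records are consed onto the seed, so reverse them at the end.
        (reverse
          (parse-csv parser
                     (lambda (r rs)
                       (values #t (cons r rs)))
                     '())))))) ; the initial seed is the empty list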
Assume the file input.csv contains:
a,b,c
1,2,3
Then:
(csv-file->list "input.csv")
⇒ (("a" "b" "c")
("1" "2" "3"))
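A list of records in this shape is easy to post-process. As a sketch, the hypothetical helper records->alists below (not part of the library) treats the first record as a header row and pairs each later field with its column name. It assumes every record has as many fields as the header.

```scheme
(import (rnrs))

;; Treat the first record as column names and turn each remaining record
;; into an association list of (name . value) pairs.
;; Assumes all records have the same number of fields as the header.
(define (records->alists records)
  (if (null? records)
      '()
      (let ((header (car records)))
        (map (lambda (record) (map cons header record))
             (cdr records)))))

(write (records->alists '(("a" "b" "c") ("1" "2" "3"))))
(newline)
```

For the two records shown, this yields one association list pairing "a" with "1", "b" with "2", and "c" with "3".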
Sloppy CSV files are common, and it may be preferable for parsing to continue through errors whenever this is possible. Here is a variant of the above example which prints a warning and continues whenever a recoverable parser error occurs:
(define (csv-file->list file)
  (call-with-input-file
    file
    (lambda (port)
      (let ((parser (make-csv-parser port)))
        (guard (con
                ((recoverable-parser-error? con)
                 (display "WARNING: bad record at line ")
                 (display (condition-line-position con))
                 (newline)
                 ;; condition-skip-record returns a thunk; call it to skip.
                 ((condition-skip-record con))))
          (reverse
            (parse-csv parser
                       (lambda (r rs)
                         (values #t (cons r rs)))
                       '())))))))
None.
© 2025 by Wolfgang Corcoran-Mathe
Licensed under the EUPL version 1.2 or later.