CSV Format Guide — Structure, Delimiters, and Common Parsing Issues
CSV (Comma-Separated Values) is a simple tabular text format. Here's the RFC 4180 standard, delimiter variations, quoting rules, and how to parse CSV correctly in code.
CSV (Comma-Separated Values) is a plain text format for tabular data. Despite its simplicity, CSV has enough edge cases (quoted fields, embedded commas, different line endings, BOM markers) that naive parsing breaks on real-world data. Understanding the standard prevents these issues.
RFC 4180: The CSV standard
RFC 4180 defines the most common CSV format:
- Records are separated by CRLF (\r\n)
- The last record may or may not have a trailing CRLF
- An optional header row is the first record
- Each record has the same number of fields
- Fields may be enclosed in double quotes (")
- Fields containing commas, double quotes, or newlines must be quoted
- Double quotes within quoted fields are escaped by doubling them ("")
"Name","Age","City"
"Alice","30","New York"
"Bob, Jr.","25","Los Angeles"
"Charlie ""Chuck""","45","Chicago"
Basic structure
Simple CSV
id,name,email,active
1,Alice,alice@example.com,true
2,Bob,bob@example.com,false
3,Charlie,charlie@example.com,true
With quoted fields
id,name,bio
1,Alice,"Software engineer, loves hiking"
2,Bob,"Wrote the book ""Clean Code"""
3,Charlie,"Lives in
New York"
Key points:
- Field with comma: "Software engineer, loves hiking"
- Field with quote: "Wrote the book ""Clean Code""" (doubled double quotes)
- Field with newline: allowed when quoted
Delimiter variations
CSV is ambiguous; "comma-separated" doesn't always mean commas:
| Format | Delimiter | Common use |
|---|---|---|
| CSV | , | General purpose, most common |
| TSV | \t (tab) | Spreadsheet exports, safer with text containing commas |
| SSV | ; (semicolon) | European locales (comma is decimal separator) |
| PSV | \| (pipe) | Log tool output |
Always check the delimiter when importing CSVs from different sources. Excel in European locales exports semicolon-separated files, and many log tools use pipes.
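When the delimiter is unknown, the stdlib can often detect it from a sample. A minimal sketch using Python's csv.Sniffer, restricting the candidate set to make detection more reliable:

```python
import csv

# A semicolon-delimited sample, as a European Excel export might produce:
sample = "name;age;city\nAlice;30;Berlin\nBob;25;Paris\n"

# Sniff the dialect from the sample; limiting the candidates
# to the common delimiters avoids false positives:
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(dialect.delimiter)  # ;
```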
Header row
The header row contains column names:
first_name,last_name,email,created_at
Alice,Smith,alice@example.com,2024-01-15
Bob,Jones,bob@example.com,2024-01-16
Without a header, each row is just an array of values. With a header, rows become objects keyed by column name.
Encoding considerations
UTF-8 BOM (Byte Order Mark)
Excel's "CSV UTF-8" export prepends a UTF-8 BOM (the bytes EF BB BF) to the file. Unless it is stripped, the first column name includes an invisible BOM character:
# Without BOM:
name,age
# With BOM (Excel):
name,age ← BOM character prepended to "name"
When parsing, strip the BOM:
// Remove BOM if present:
const csv = content.startsWith('\uFEFF') ? content.slice(1) : content;
Line endings
CSV files can use \n (Unix), \r\n (Windows), or \r (old Mac). Normalize before parsing:
const normalized = csv.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
Parsing CSV in code
JavaScript (Papa Parse)
import Papa from 'papaparse';
const csv = `name,age,city
Alice,30,New York
Bob,25,Los Angeles`;
// Parse string to array of objects:
const result = Papa.parse(csv, {
header: true, // Use first row as keys
dynamicTyping: true, // Convert "30" to 30 (numbers), "true" to true
skipEmptyLines: true, // Skip empty rows
});
console.log(result.data);
// [
// { name: 'Alice', age: 30, city: 'New York' },
// { name: 'Bob', age: 25, city: 'Los Angeles' }
// ]
// Parse with different delimiter:
Papa.parse(tsvString, { delimiter: '\t', header: true });
// Parse a file (browser):
Papa.parse(fileInput.files[0], {
header: true,
complete: (results) => console.log(results.data),
error: (error) => console.error(error),
});
// Stream large files:
Papa.parse(largeFile, {
header: true,
step: (row) => {
processRow(row.data); // Process one row at a time
},
complete: () => console.log('Done'),
});
Node.js (csv-parse)
import { parse } from 'csv-parse/sync';
const input = `name,age,active
Alice,30,true
Bob,25,false`;
const records = parse(input, {
columns: true, // Use first row as column names
cast: true, // Type casting (numbers, booleans)
skip_empty_lines: true,
trim: true, // Trim whitespace from values
});
console.log(records);
// [{ name: 'Alice', age: 30, active: true }, ...]
// Async streaming for large files:
import { createReadStream } from 'fs';
import { parse } from 'csv-parse';
createReadStream('large-file.csv')
.pipe(parse({ columns: true, cast: true }))
.on('data', (row) => processRow(row))
.on('end', () => console.log('Done'));
Python (csv module)
import csv
csv_string = """name,age,city
Alice,30,New York
Bob,25,Los Angeles"""
# Parse string:
import io
reader = csv.DictReader(io.StringIO(csv_string))
for row in reader:
print(row['name'], row['age'])
# Alice 30
# Bob 25
# Parse file:
with open('data.csv', 'r', encoding='utf-8-sig') as f: # utf-8-sig handles BOM
reader = csv.DictReader(f)
rows = list(reader)
# Write CSV:
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=['name', 'age'])
writer.writeheader()
writer.writerows([{'name': 'Alice', 'age': 30}])
Python (pandas)
import pandas as pd
# Read CSV:
df = pd.read_csv('data.csv', encoding='utf-8-sig') # Handles BOM
print(df.head())
print(df.dtypes)
# With options:
df = pd.read_csv('data.csv',
delimiter=',',
header=0, # Row 0 is header
dtype={'id': int, 'name': str},
na_values=['', 'N/A', 'NULL'],
parse_dates=['created_at'],
)
# Write CSV:
df.to_csv('output.csv', index=False, encoding='utf-8')
Common CSV issues
Missing quotes for fields with commas:
# WRONG:
name,address
Alice,123 Main St, Suite 5 ← "Suite 5" becomes a separate field
# CORRECT:
name,address
Alice,"123 Main St, Suite 5"
Inconsistent column count:
# Problem: row 2 has 4 fields but header has 3:
name,age,city
Alice,30,New York
Bob,25,Los Angeles,USA ← Extra field
Most parsers either error or silently drop the extra field.
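Rather than letting a parser drop or shift data silently, you can validate the column count yourself. A sketch that collects every row whose field count differs from the header:

```python
import csv
import io

# The problematic data from above: the last row has one field too many.
raw = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles,USA\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)

# Collect (line number, row) pairs that don't match the header width.
# Data rows start at line 2, since line 1 is the header:
bad = [(i, row) for i, row in enumerate(reader, start=2)
       if len(row) != len(header)]

print(bad)  # [(3, ['Bob', '25', 'Los Angeles', 'USA'])]
```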
Unquoted newlines:
# WRONG:
description
"A description with
a newline in it" ← Fine
A description without quotes
but with a newline ← Error: starts new record
Related tools
- CSV to JSON Converter — convert CSV data to JSON
- CSV to JSON Guide — conversion walkthrough
- CSV Quoting Rules — escaping special characters
Related posts
- CSV Quoting and Escaping Rules (the Real Ones, Not the Folklore) — CSV looks trivial until your spreadsheet has a comma in a name field. Here's the…
- CSV Data Validation — Schema Validation, Type Checking, and Error Reporting — Validate CSV files before importing them into a database or processing pipeline.…
- Import CSV to Database — PostgreSQL, MySQL, SQLite, and Node.js — Import CSV files into PostgreSQL, MySQL, and SQLite using COPY commands, LOAD DA…
- CSV to JSON Converter — Transform Spreadsheet Data to JSON — CSV to JSON conversion turns rows and columns into an array of objects, using th…
- YAML to JSON Converter — Convert YAML Configuration to JSON — YAML to JSON conversion is lossless for most data types. Here's how the conversi…
Written by Mian Ali Khalid. Part of the Data & Format pillar.