
CSV Format Guide — Structure, Delimiters, and Common Parsing Issues

CSV (Comma-Separated Values) is a simple tabular text format. This guide covers the RFC 4180 standard, delimiter variations, quoting rules, and how to parse CSV correctly in code.

Mian Ali Khalid · 5 min read

CSV (Comma-Separated Values) is a plain text format for tabular data. Despite its simplicity, CSV has enough edge cases (quoted fields, embedded commas, different line endings, BOM markers) that naive parsing breaks on real-world data. Understanding the standard prevents these issues.

Use the CSV to JSON Converter to convert CSV data to JSON format.

RFC 4180: The CSV standard

RFC 4180 defines the most common CSV format:

  1. Records are separated by CRLF (\r\n)
  2. The last record may or may not have a trailing CRLF
  3. An optional header row is the first record
  4. Each record has the same number of fields
  5. Fields may be enclosed in double quotes "
  6. Fields containing commas, double quotes, or newlines must be quoted
  7. Double quotes within quoted fields are escaped by doubling them ("")
Example:

"Name","Age","City"
"Alice","30","New York"
"Bob, Jr.","25","Los Angeles"
"Charlie ""Chuck""","45","Chicago"

Basic structure

Simple CSV

id,name,email,active
1,Alice,alice@example.com,true
2,Bob,bob@example.com,false
3,Charlie,charlie@example.com,true

With quoted fields

id,name,bio
1,Alice,"Software engineer, loves hiking"
2,Bob,"Wrote the book ""Clean Code"""
3,Charlie,"Lives in
New York"

Key points:

  • Field with comma: "Software engineer, loves hiking"
  • Field with quote: "Wrote the book ""Clean Code""" (doubled double-quotes)
  • Field with newline: allowed when quoted
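The quoting rules above can be sketched as a small field splitter. This is an illustrative, single-record sketch (it does not handle newlines inside quoted fields, which requires scanning the whole file); use a real parser like Papa Parse for production data.

```javascript
// Minimal RFC 4180 field splitter for one record (sketch, not production).
// Handles quoted fields, embedded commas, and doubled quotes ("").
function parseCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { current += '"'; i++; }  // "" -> literal quote
        else inQuotes = false;                             // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;                                     // opening quote
    } else if (ch === ',') {
      fields.push(current);                                // field boundary
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);                                    // last field
  return fields;
}

console.log(parseCsvLine('"Bob, Jr.","25","Los Angeles"'));
// [ 'Bob, Jr.', '25', 'Los Angeles' ]
console.log(parseCsvLine('"Charlie ""Chuck""","45","Chicago"'));
// [ 'Charlie "Chuck"', '45', 'Chicago' ]
```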

Delimiter variations

CSV is ambiguous — “comma-separated” doesn’t always mean commas:

Format   Delimiter        Common use
CSV      , (comma)        General purpose, most common
TSV      \t (tab)         Spreadsheet exports, safer with text containing commas
SSV      ; (semicolon)    European locales (comma is decimal separator)
PSV      | (pipe)         Log tools and data exports

Always check the delimiter when importing CSVs from different sources. European Excel exports use semicolons. Many log tools use pipes.
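A simple way to check the delimiter is to sniff it from the first line. This is a heuristic sketch (it counts candidate characters on the header row; real sniffers, like Python's csv.Sniffer, also verify consistency across rows and ignore delimiters inside quotes):

```javascript
// Heuristic delimiter sniffing (sketch): count candidate delimiters on the
// first line and pick the most frequent one. Falls back to comma.
function detectDelimiter(sample) {
  const candidates = [',', '\t', ';', '|'];
  const firstLine = sample.split(/\r?\n/)[0];
  let best = ',';
  let bestCount = 0;
  for (const d of candidates) {
    const count = firstLine.split(d).length - 1;  // occurrences of d
    if (count > bestCount) { best = d; bestCount = count; }
  }
  return best;
}

console.log(detectDelimiter('name;age;city\nAlice;30;Berlin'));  // ';'
console.log(detectDelimiter('name\tage\tcity'));                 // '\t'
```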

Header row

The header row contains column names:

first_name,last_name,email,created_at
Alice,Smith,alice@example.com,2024-01-15
Bob,Jones,bob@example.com,2024-01-16

Without a header, each row is just an array of values. With a header, rows become objects keyed by column name.
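That header-to-object mapping can be sketched by zipping the header row with each data row. This assumes simple unquoted fields (use a real parser for quoted data):

```javascript
// Sketch: turn a header row plus data rows into objects keyed by column name.
// Assumes plain unquoted fields; quoted fields need a proper CSV parser.
function rowsToObjects(lines) {
  const [headerLine, ...dataLines] = lines;
  const headers = headerLine.split(',');
  return dataLines.map((line) => {
    const values = line.split(',');
    // Pair each header with the value at the same index
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}

const rows = rowsToObjects([
  'first_name,last_name,email',
  'Alice,Smith,alice@example.com',
]);
console.log(rows);
// [ { first_name: 'Alice', last_name: 'Smith', email: 'alice@example.com' } ]
```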

Encoding considerations

UTF-8 BOM (Byte Order Mark)

Excel commonly exports CSV with a UTF-8 BOM (the bytes EF BB BF at the start of the file). Because the BOM is invisible, the first column name silently includes it:

# Without BOM:
name,age

# With BOM (Excel):
\uFEFFname,age  ← BOM character (U+FEFF) prepended to "name"

When parsing, strip the BOM:

// Remove BOM if present:
const csv = content.startsWith('\uFEFF') ? content.slice(1) : content;

Line endings

CSV files can use \n (Unix), \r\n (Windows), or \r (old Mac). Normalize before parsing:

const normalized = csv.replace(/\r\n/g, '\n').replace(/\r/g, '\n');

Parsing CSV in code

JavaScript (Papa Parse)

import Papa from 'papaparse';

const csv = `name,age,city
Alice,30,New York
Bob,25,Los Angeles`;

// Parse string to array of objects:
const result = Papa.parse(csv, {
  header: true,          // Use first row as keys
  dynamicTyping: true,   // Convert "30" to 30 (numbers), "true" to true
  skipEmptyLines: true,  // Skip empty rows
});

console.log(result.data);
// [
//   { name: 'Alice', age: 30, city: 'New York' },
//   { name: 'Bob', age: 25, city: 'Los Angeles' }
// ]

// Parse with different delimiter:
Papa.parse(tsvString, { delimiter: '\t', header: true });

// Parse a file (browser):
Papa.parse(fileInput.files[0], {
  header: true,
  complete: (results) => console.log(results.data),
  error: (error) => console.error(error),
});

// Stream large files:
Papa.parse(largeFile, {
  header: true,
  step: (row) => {
    processRow(row.data);  // Process one row at a time
  },
  complete: () => console.log('Done'),
});

Node.js (csv-parse)

import { parse } from 'csv-parse/sync';

const input = `name,age,active
Alice,30,true
Bob,25,false`;

const records = parse(input, {
  columns: true,       // Use first row as column names
  cast: true,          // Type casting (numbers, booleans)
  skip_empty_lines: true,
  trim: true,          // Trim whitespace from values
});

console.log(records);
// [{ name: 'Alice', age: 30, active: true }, ...]

// Async streaming for large files:
import { createReadStream } from 'fs';
import { parse } from 'csv-parse';

createReadStream('large-file.csv')
  .pipe(parse({ columns: true, cast: true }))
  .on('data', (row) => processRow(row))
  .on('end', () => console.log('Done'));

Python (csv module)

import csv

csv_string = """name,age,city
Alice,30,New York
Bob,25,Los Angeles"""

# Parse string:
import io
reader = csv.DictReader(io.StringIO(csv_string))
for row in reader:
    print(row['name'], row['age'])
    # Alice 30
    # Bob 25

# Parse file:
with open('data.csv', 'r', encoding='utf-8-sig') as f:  # utf-8-sig handles BOM
    reader = csv.DictReader(f)
    rows = list(reader)

# Write CSV:
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'age'])
    writer.writeheader()
    writer.writerows([{'name': 'Alice', 'age': 30}])

Python (pandas)

import pandas as pd

# Read CSV:
df = pd.read_csv('data.csv', encoding='utf-8-sig')  # Handles BOM
print(df.head())
print(df.dtypes)

# With options:
df = pd.read_csv('data.csv',
    delimiter=',',
    header=0,           # Row 0 is header
    dtype={'id': int, 'name': str},
    na_values=['', 'N/A', 'NULL'],
    parse_dates=['created_at'],
)

# Write CSV:
df.to_csv('output.csv', index=False, encoding='utf-8')

Common CSV issues

Missing quotes for fields with commas:

# WRONG:
name,address
Alice,123 Main St, Suite 5     ← "Suite 5" becomes a separate field

# CORRECT:
name,address
Alice,"123 Main St, Suite 5"

Inconsistent column count:

# Problem: row 2 has 4 fields but header has 3:
name,age,city
Alice,30,New York
Bob,25,Los Angeles,USA   ← Extra field

Most parsers either error or silently drop the extra field.
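Rather than relying on parser defaults, you can validate column counts explicitly after parsing. A minimal sketch, assuming rows have already been split into arrays of fields:

```javascript
// Sketch: check that every row has the same field count as the header,
// instead of letting a parser silently drop or misalign data.
function checkColumnCounts(rows) {
  const expected = rows[0].length;  // header defines the expected width
  const errors = [];
  rows.forEach((row, i) => {
    if (row.length !== expected) {
      errors.push(`Row ${i}: expected ${expected} fields, got ${row.length}`);
    }
  });
  return errors;  // empty array means every row is consistent
}

const parsed = [
  ['name', 'age', 'city'],
  ['Alice', '30', 'New York'],
  ['Bob', '25', 'Los Angeles', 'USA'],  // extra field
];
console.log(checkColumnCounts(parsed));
// [ 'Row 2: expected 3 fields, got 4' ]
```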

Unquoted newlines:

# WRONG:
description
"A description with
a newline in it"   ← Fine
A description without quotes
but with a newline ← Error: starts new record


Written by Mian Ali Khalid. Part of the Data & Format pillar.