Command documentation sourced from the linux-command project This comprehensive command reference is part of the linux-command documentation project.
csplit - Split a File into Sections Determined by Context Lines
The csplit command splits files into sections based on context patterns, unlike split which divides files by size or line count. It's particularly useful for processing log files, configuration files, and documents with clear section delimiters. csplit uses regular expressions to identify split points, making it ideal for context-aware file splitting tasks. It creates output files with customizable naming patterns and supports various splitting modes including line numbers, patterns, and repeat operations.
Basic Syntax
csplit [OPTION]... FILE PATTERN...
Common Options
Output Control
-f, --prefix=PREFIX- Use PREFIX instead of 'xx' for created filenames-b, --suffix-format=FORMAT- Use sprintf FORMAT for suffixes (default: %02d)-n, --digits=DIGITS- Use DIGITS digits in suffix names
File Handling
-k, --keep-files- Do not remove output files on errors-s, --quiet, --silent- Do not print counts of output file sizes-z, --elide-empty-files- Remove empty output files
Help Options
--help- Display help message and exit--version- Output version information and exit
Pattern Specifications
Line Number Patterns
N # Split at line N
{N} # Repeat N times
* # Repeat until end of file
Regular Expression Patterns
/REGEXP/ # Split at line matching REGEXP
%REGEXP% # Split but exclude matching line
/REGEXP/+N # Split N lines after matching line
/REGEXP/-N # Split N lines before matching line
Repeat Patterns
{N} # Repeat previous pattern N times
{*} # Repeat previous pattern until EOF
Usage Examples
Basic Splitting
Creating Simple Splits
# Split at specific line numbers
csplit file.txt 5 10
# Split at pattern matches
csplit logfile.txt '/ERROR/' '/WARN/'
# Split with custom prefix
csplit -f section document.txt '/Chapter /' '{*}'
# Split with numbered output
csplit -b '%03d.txt' -f chapter book.txt '/Chapter /' '{*}'
Line Number Operations
# Split every 100 lines
csplit largefile.txt 100 200 300 400
# Split first 50 lines, then every 100 lines
csplit data.txt 50 '/^---/' '{*}'
# Create splits at specific intervals
csplit document.txt 10 20 30 40 50
Advanced Pattern Matching
Regular Expression Patterns
# Split but exclude pattern lines
csplit config.txt '%# Configuration%' '{*}'
# Split with offset lines
csplit access.log '/ERROR/+5'
# Split before pattern
csplit data.txt '/Header/-1'
# Complex pattern with repeats
csplit log.txt '/session_start/' '/session_end/' '{*}'
Pattern-Based Splitting
# Split by date patterns in logs
csplit web.log '/\[2023-01-01/' '/\[2023-01-02/' '{*}'
# Split by function definitions
csplit code.py '/^def /' '{*}'
# Split by XML/HTML tags
csplit html.txt '/<div class="content">/' '{*}'
# Split by email headers
csplit mbox.txt '/^From:/' '/^Subject:/' '{*}'
Context-Aware Splitting
# Split including context lines
csplit debug.log '/ERROR/-2' '/ERROR/+2'
# Split but keep delimiter in next section
csplit data.txt '%SECTION/'
# Split with multiple conditions
csplit mixed.log '/CRITICAL/+1' '/WARNING/-1' '{*}'
Practical Examples
Log File Processing
Web Log Analysis
# Split Apache access log by days
csplit access_log.txt '/\[24\/Jan\/2023:/' '{*}'
# Split Nginx log by HTTP status codes
csplit nginx.log '/ 5[0-9][0-9] /' '/ 4[0-9][0-9] /' '{*}'
# Extract error sections only
csplit -f 'error_' -z server.log '/ERROR/' '/FATAL/' '{*}'
# Split by session boundaries
csplit access.log '/Session ID:/' '{*}'
Application Log Processing
# Split application log by session IDs
csplit app.log '/Session ID: [A-Z0-9]{10}/' '{*}'
# Split database log by transaction
csplit mysql.log '/START TRANSACTION/' '/COMMIT/' '{*}'
# Extract performance metrics
csplit performance.log '/METRICS:/' '/END METRICS/' '{*}'
# Separate different log levels
csplit app.log '/FATAL ERROR/' '/ERROR/' '/WARNING/' '/INFO/' '{*}'
System Log Management
# Split system log by boot sessions
csplit /var/log/syslog '/systemd\[1\]: Started/' '{*}'
# Split kernel messages
csplit dmesg.log '/kernel: /' '{*}'
# Extract authentication events
csplit auth.log '/pam_unix(sshd:auth): authentication failure/' '{*}'
# Split mail log by users
csplit /var/log/maillog '/from=/' '{*}'
Configuration File Management
Configuration Extraction
# Split Apache config by virtual hosts
csplit httpd.conf '/<VirtualHost/' '{*}'
# Split INI file by sections
csplit config.ini '/\[.*\]/' '{*}'
# Extract database configurations
csplit my.cnf '/\[mysqld\]' '/\[client\]/'
# Split Nginx config by server blocks
csplit nginx.conf '/server {/' '{*}'
Configuration Analysis
# Extract SSL configurations
csplit ssl.conf '/ssl_certificate/' '/ssl_certificate_key/' '{*}'
# Split YAML documents
csplit data.yml '/^---/' '{*}'
# Extract specific service configurations
csplit systemd.conf '/\[Service\]/' '{*}'
# Split by configuration sections
csplit config.txt '/\[database\]/' '/\[cache\]/' '/\[security\]/'
Document Processing
Document Structuring
# Split Markdown by headers
csplit README.md '/^## /' '{*}'
# Split book by chapters
csplit novel.txt '/Chapter [0-9]*/' '{*}'
# Split documentation by sections
csplit manual.txt '/Section [A-Z]/' '{*}'
# Extract table of contents
csplit document.txt '/^Table of Contents$/' '/^Chapter 1$/'
Code and Script Processing
# Split code file by functions
csplit script.py '/^def /' '{*}'
# Split Java file by classes
csplit code.java '/^public class /' '{*}'
# Extract function documentation
csplit code.py '/^def /' '/^class /' '{*}'
# Split CSS by rule blocks
csplit styles.css '/^[.#]/' '{*}'
Email and Text Processing
# Extract email sections
csplit email.txt '/^From:/' '/^Subject:/' '/^Date:/' '{*}'
# Split mbox file by individual emails
csplit mailbox.txt '/^From /' '{*}'
# Extract email headers
csplit email.txt '/^$/' '{1}'
# Separate email body parts
csplit email.txt '/--boundary/' '{*}'
Advanced Usage
Complex Splitting Patterns
Multi-Condition Splitting
# Split with multiple conditions
csplit mixed.log '/ERROR/+1' '/WARNING/-1' '/INFO/'
# Split using lookahead
csplit data.txt '/pattern/{2}'
# Split with custom output format
csplit -f 'part_' -b '%04d.txt' large_file.txt '/SECTION/' '{*}'
# Split and keep empty sections
csplit -k sparse.txt '/EOF/'
# Split but remove empty files
csplit -z file.txt '/empty_section/'
Advanced Pattern Techniques
# Split by multiple delimiters
csplit data.txt '/---/' '/===/' '/***/' '{*}'
# Split with complex regex
csplit log.txt '/ERROR.*\[.*\].*Exception/' '{*}'
# Split by indentation levels
csplit python_code.txt '/^ def /' '{*}'
# Split by XML elements
csplit xml_file.xml '/<item[^>]*>/' '{*}'
Context and Offset Operations
# Split with positive offset (after match)
csplit debug.log '/ERROR/+3' '{*}'
# Split with negative offset (before match)
csplit debug.log '/ERROR/-2' '{*}'
# Split with context preservation
csplit text.txt '/CHAPTER.*[0-9]*/-1' '/CHAPTER.*[0-9]*/+1' '{*}'
# Complex offset combinations
csplit data.txt '/HEADER/-1' '/BODY/' '/FOOTER/+1'
Script Integration
Processing Pipelines
# Process each section separately
csplit data.txt '/SECTION/' '{*}'
for part in xx*; do
process_section "$part"
done
# Split and analyze separately
csplit -f 'segment_' -b '%02d.log' server.log '/SERVER/' '{*}'
for segment in segment_*; do
analyze_log_segment "$segment"
done
Automated Workflows
# Split and compress results
csplit -f temp_ large_file.txt '/PATTERN/' '{*}'
for temp_file in temp_*; do
gzip "$temp_file"
done
# Split and process with error handling
csplit -k critical_data.txt '/SECTION/' '{*}' || {
echo "Splitting failed, cleaning up..."
rm -f xx*
exit 1
}
Batch Processing
# Split multiple files with same pattern
for file in *.log; do
csplit -f "${file%.*}_" "$file" '/ERROR/' '{*}'
done
# Create structured output directory
mkdir -p output/sections
csplit -f 'output/sections/section_' data.txt '/CHAPTER/' '{*}'
Quality Control
Verification and Validation
# Verify split integrity
wc -l original.txt
awk '{sum+=$1} END {print "Total lines:", sum}' xx*
# Check for empty sections
csplit -k -z file.txt '/pattern/'
for part in xx*; do
[ -s "$part" ] || echo "Empty section: $part"
done
# Validate split content
grep -l "expected_content" xx*
Testing and Debugging
# Test patterns before actual split
grep -n "/pattern/" file.txt
# Dry run simulation
csplit --debug file.txt '/pattern/' '{1}'
# Check split points
awk '/pattern/ {print NR ": " $0}' file.txt
Real-World Applications
System Administration
Log Rotation and Analysis
# Split system logs by boot sessions
csplit /var/log/syslog '/systemd\[1\]: Started/' '{*}'
# Split Apache access log by hours
csplit access_log.txt '/\[01:/' '/\[02:/' '/\[03:/' '{*}'
# Extract configuration sections
csplit /etc/ssh/sshd_config '/^#/' '/^[A-Z]/' '{*}'
# Monitor disk usage by splitting df output
csplit df_output.txt '/Filesystem/' '{*}'
Backup and Recovery
# Split large backup files
csplit backup.tar '/^file_[0-9]+/' '{*}'
# Extract specific data ranges
csplit data_export.txt '/2023-01-01/' '/2023-02-01/'
# Split configuration for migration
csplit server_config '/\[database\]/' '/\[web\]/' '/\[cache\]/'
Database Management
SQL Processing
# Split SQL dump by tables
csplit backup.sql '/CREATE TABLE/' '{*}'
# Extract stored procedures
csplit database.sql '/CREATE PROCEDURE/' '{*}'
# Split by transaction batches
csplit transaction.log '/COMMIT/' '{*}'
# Separate INSERT statements
csplit data_dump.sql '/INSERT INTO/' '{*}'
Database Configuration
# Split MySQL configuration by sections
csplit my.cnf '/\[mysqld\]/' '/\[client\]/' '/\[mysql\]/'
# Extract database users
csplit mysql_users.txt '/User:/' '/Host:/' '{*}'
# Split slow query log by query types
csplit slow_query.log '/SELECT/' '/INSERT/' '/UPDATE/' '/DELETE/' '{*}'
Development Workflow
Build and Compilation
# Split test results by test suites
csplit test_output.txt '/TEST SUITE:/' '{*}'
# Split build logs by targets
csplit make.log '/Making target/' '{*}'
# Separate error and warning sections
csplit compile.log '/error:/' '/warning:/' '{*}'
# Extract dependency information
csplit makefile '/^include/' '/^require/' '{*}'
Code Analysis
# Extract function documentation
csplit code.py '/^def /' '/^class /' '{*}'
# Split git diff by files
csplit git_diff.txt '/^diff --git/' '{*}'
# Analyze code coverage reports
csplit coverage.txt '/File:/' '/Line:/' '{*}'
# Split JUnit test results
csplit junit.xml '/<testcase/' '{*}'
Documentation Processing
# Split API documentation by endpoints
csplit api_docs.txt '/GET\|POST\|PUT\|DELETE/' '{*}'
# Extract changelog sections
csplit CHANGELOG.md '/## \[.*\]/' '{*}'
# Split technical specifications
csplit spec.txt '/Requirement [0-9]*/' '{*}'
# Process multi-language documentation
csplit docs.txt '/## English/' '/## 中文/' '/## 日本語/'
Data Processing and Migration
Data Format Conversion
# Split CSV by record types
csplit data.csv '/TYPE=A/' '/TYPE=B/' '/TYPE=C/'
# Process mixed format files
csplit mixed.txt '/^\[XML\]/' '/^\[JSON\]/' '/^\[TXT\]/'
# Extract configuration sections
csplit legacy.conf '/\[BEGIN\]/' '/\[END\]/' '{*}'
# Process backup archives
csplit backup_manifest.txt '/File:/' '{*}'
ETL Operations
# Split data extraction logs
csplit etl_log.txt '/Extract:' '/Transform:' '/Load:' '{*}'
# Process migration scripts
csplit migration.sql '-- Migration [0-9]*/' '{*}'
# Split data validation reports
csplit validation.txt '/Valid Records/' '/Invalid Records/' '{*}'
Special Use Cases
Email Processing
Mail Management
# Split mbox file by individual emails
csplit mailbox.txt '/^From /' '{*}'
# Extract email headers
csplit email.txt '/^$/' '{1}'
# Separate email body parts
csplit email.txt '/--boundary/' '{*}'
# Split digest emails
csplit digest.txt '/Digest #/' '{*}'
Email Analysis
# Extract attachments from email
csplit email_with_attachments '/--multipart/' '{*}'
# Split by email threads
csplit thread.txt '/Thread: [0-9]*/' '{*}'
# Analyze email patterns
csplit mail_log.txt '/spam/' '/ham/' '/virus/' '{*}'
Web Content Processing
HTML/XML Processing
# Split HTML by sections
csplit webpage.html '/<section[^>]*>/' '{*}'
# Extract XML elements
csplit data.xml '/<item[^>]*>/' '/<\/item>/' '{*}'
# Process RSS feeds
csplit rss.xml '/<item>/' '{*}'
# Split CSS by media queries
csplit styles.css '/@media/' '{*}'
Log Analysis
# Split access logs by IP addresses
csplit access.log '/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/' '{*}'
# Extract user sessions
csplit user_tracking.log '/User ID: [0-9]+/' '{*}'
# Split by HTTP methods
csplit api.log '/GET\|POST\|PUT\|DELETE/' '{*}'
Performance Considerations
Large File Processing
Memory Efficiency
# Process in memory-efficient way
csplit -s huge_log.txt '/pattern/' '{*}'
# Use temporary directory for large splits
TMPDIR=/fast/temp csplit massive_file.txt '/SECTION/' '{*}'
# Split and compress in pipeline
csplit big_file.txt '/SECTION/' '{*}'
for part in xx*; do gzip "$part"; done
Performance Optimization
# Quiet mode for better performance
csplit -s file.txt '/pattern/' '{*}'
# Avoid creating unnecessary files
csplit -z file.txt '/empty_section/'
# Process large files in chunks
csplit huge_file.txt 10000 20000 30000 '{*}'
Resource Management
Disk Space Optimization
# Split and immediately compress
csplit data.txt '/SECTION/' '{*}' && gzip xx*
# Split with limited output file size
split -b 1M large_file.txt chunk_ && csplit chunk_aa '/pattern/' '{*}'
# Clean up after processing
csplit file.txt '/pattern/' '{*}'; process_files; rm -f xx*
Processing Optimization
# Parallel processing of splits
csplit -f 'part_' data.txt '/SECTION/' '{*}'
for part in part_*; do
process_part "$part" &
done
wait
# Use appropriate buffer sizes
csplit --buffer-size=1M large_file.txt '/pattern/' '{*}'
Automation Scripts
Log Analysis Script
#!/bin/bash
# Split and analyze logs by error types
LOG_FILE="$1"
OUTPUT_DIR="log_analysis"
if [ -z "$LOG_FILE" ]; then
echo "Usage: $0 <log_file>"
exit 1
fi
mkdir -p "$OUTPUT_DIR"
# Split by error severity
csplit -f "$OUTPUT_DIR/error_" -b '%02d.log' "$LOG_FILE" \
'/CRITICAL/' '/ERROR/' '/WARNING/' '/INFO/' '{*}'
# Analyze each section
echo "=== Log Analysis Results ==="
for error_file in "$OUTPUT_DIR"/*.log; do
if [ -f "$error_file" ]; then
echo "--- $(basename "$error_file") ---"
wc -l "$error_file"
grep -c "ERROR" "$error_file" 2>/dev/null || echo "No errors"
echo ""
fi
done
# Generate summary
echo "=== Summary ==="
echo "Total sections created: $(ls -1 "$OUTPUT_DIR"/*.log 2>/dev/null | wc -l)"
echo "Total lines processed: $(wc -l < "$LOG_FILE")"
Configuration Extractor
#!/bin/bash
# Extract configuration sections
CONFIG_FILE="$1"
SECTION_PATTERN="$2"
OUTPUT_PREFIX="config_section"
if [ -z "$CONFIG_FILE" ]; then
echo "Usage: $0 <config_file> [section_pattern]"
exit 1
fi
if [ -z "$SECTION_PATTERN" ]; then
SECTION_PATTERN="/\[.*\]/"
fi
# Create backup
cp "$CONFIG_FILE" "${CONFIG_FILE}.backup"
# Extract sections
csplit -f "$OUTPUT_PREFIX" -b '%02d.conf' "$CONFIG_FILE" "$SECTION_PATTERN" '{*}'
echo "Sections extracted:"
ls -la ${OUTPUT_PREFIX}*.conf
# Show section previews
echo -e "\n=== Section Previews ==="
for section in ${OUTPUT_PREFIX}*.conf; do
echo "--- $section ---"
head -5 "$section"
echo ""
done
Document Processor
#!/bin/bash
# Process large documents by chapters
DOCUMENT="$1"
CHAPTER_PATTERN="$2"
OUTPUT_DIR="chapters"
if [ -z "$DOCUMENT" ] || [ -z "$CHAPTER_PATTERN" ]; then
echo "Usage: $0 <document> <chapter_pattern>"
echo "Example: $0 book.txt '/Chapter [0-9]*/'"
exit 1
fi
mkdir -p "$OUTPUT_DIR"
# Split document by chapters
csplit -f "$OUTPUT_DIR/chapter_" -b '%02d.txt' "$DOCUMENT" "$CHAPTER_PATTERN" '{*}'
# Process each chapter
echo "Processing chapters..."
for chapter in "$OUTPUT_DIR"/chapter_*.txt; do
chapter_num=$(basename "$chapter" .txt | cut -d'_' -f2)
# Add chapter header
{
echo "# Chapter $chapter_num"
echo "Generated from: $DOCUMENT"
echo "Created: $(date)"
echo ""
cat "$chapter"
} > "${chapter}.processed"
mv "${chapter}.processed" "$chapter"
echo "Processed Chapter $chapter_num ($(wc -l < "$chapter") lines)"
done
echo "Document processing complete. Chapters saved in: $OUTPUT_DIR"
Troubleshooting
Common Issues
Pattern Not Found
# Pattern not found - check regex
csplit file.txt '/nonexistent/' # May create empty files
# Better - use optional pattern with empty file removal
csplit -z file.txt '/nonexistent/'
# Test pattern first
grep -n "nonexistent" file.txt || echo "Pattern not found"
# Use case-insensitive matching if needed
csplit file.txt '/[Nn]onexistent/'
Too Many or Too Few Splits
# Too many splits - limit repeats
csplit file.txt '/pattern/' '{5}' # Only 5 splits
# Not enough splits - use different pattern
csplit file.txt '/pattern1/' '/pattern2/' '/pattern3/'
# Split until end of file
csplit file.txt '/pattern/' '{*}'
Memory and Performance Issues
# Memory issues - use quiet mode
csplit -s large_file.txt '/pattern/' '{*}'
# Very large files - process in chunks
split -l 10000 huge_file.txt chunk_
for chunk in chunk_*; do
csplit "$chunk" '/pattern/' '{*}'
done
# Use temporary directory with more space
TMPDIR=/large/temp csplit massive_file.txt '/pattern/' '{*}'
Pattern Matching Issues
Regular Expression Problems
# Special characters need escaping
csplit config.txt '/\[section\]/' '{*}'
# Use exact matching
csplit file.txt '/^EXACT_MATCH$/'
# Case-sensitive matching
csplit file.txt '/[Pp]attern/'
# Complex patterns with character classes
csplit log.txt '/ERROR.*[0-9]{4}/' '{*}'
# Patterns with quantifiers
csplit data.txt '/line.{3,5}/' '{*}'
File Encoding Issues
# Handle different encodings
iconv -f latin1 -t utf8 file.txt > file_utf8.txt
csplit file_utf8.txt '/pattern/' '{*}'
# Handle Windows line endings
dos2unix file.txt
csplit file.txt '/pattern/' '{*}'
Related Commands
split- Split a file into fixed-size piecesawk- Pattern scanning and processingsed- Stream editor for filtering textgrep- Search for patterns in filesperl- Practical extraction and report languagecut- Remove sections from each line of fileshead- Output the first part of filestail- Output the last part of fileswc- Print newline, word, and byte counts
Best Practices
- Test patterns on small files before processing large datasets
- Use appropriate output prefixes for better file organization
- Handle empty files with
-zoption when appropriate - Verify split results with line counts and content checks
- Use
-koption for critical data processing to keep files on errors - Consider memory usage when splitting very large files
- Use descriptive prefixes that indicate the content or purpose
- Process splits immediately after creation to avoid disk space issues
- Backup original files before performing destructive operations
- Document splitting logic for future reference and automation
Performance Tips
- Use quiet mode (-s) for better performance on large files
- Optimize regex patterns for faster matching
- Consider file system I/O when splitting to slow storage
- Use appropriate buffer sizes for very large files
- Process splits in parallel when possible
- Compress splits immediately if storage is limited
- Use specific patterns instead of generic ones for better accuracy
- Limit split operations to avoid creating too many small files
- Consider using
splitfor simple size-based splitting - Monitor system resources during large split operations
Common Patterns and Recipes
Log File Sections
# Split Apache log by days
csplit access.log '/\[24\//{/^\[24\//+1}' '{*}'
# Split by session boundaries
csplit app.log '/Session started/' '/Session ended/' '{*}'
# Extract error sections only
csplit -f 'error_' errors.log '/FATAL:/+1' '/ERROR:/+1' '{*}'
# Split by timestamp
csplit timestamped.log '/2023-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}/' '{*}'
Configuration Extraction
# Extract specific configuration sections
csplit config.ini '/\[database\]/' '/\[cache\]/' '/\[security\]/'
# Remove empty configuration sections
csplit -z sparse.conf '/\[.*\]/' '{*}'
# Split by XML elements
csplit config.xml '/<section[^>]*>/' '</section>' '{*}'
# Extract user configurations
csplit settings.txt '/username:/' '/password:/' '{*}'
Document Processing
# Split by chapter or section markers
csplit book.txt '/Chapter [IVX]+/' '{*}'
# Extract specific sections
csplit document.txt '/Abstract/' '/Introduction/' '/Conclusion/'
# Split by page breaks
csplit formatted.txt '/\f/' '{*}'
# Process numbered sections
csplit report.txt '/[0-9]+\./' '{*}'
The csplit command excels at context-aware file splitting, making it invaluable for processing structured text files, logs, and configuration files. While it requires more understanding of patterns than simple split, it provides much more sophisticated splitting capabilities that can automate complex file processing tasks and enable granular analysis of large documents.