This command reference is part of the linux-command documentation project.
bzip2 - Block-sorting File Compressor
The bzip2 command is a sophisticated file compression utility that implements the Burrows-Wheeler block sorting text compression algorithm combined with Huffman coding. Developed by Julian Seward, bzip2 typically achieves superior compression ratios compared to gzip, especially for text files, while maintaining reasonable processing speeds. It creates compressed files with the .bz2 extension and intelligently preserves file permissions, timestamps, and ownership when possible.
Technical Architecture
Algorithm Details
Bzip2 operates through a multi-stage compression process:
- Burrows-Wheeler Transform: Reorders data to group similar characters together
- Move-to-Front Transform: Improves Huffman coding efficiency
- Run-Length Encoding: Compresses repeated character sequences
- Huffman Coding: Final entropy coding stage
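The practical effect of this pipeline is easiest to see by compressing inputs with very different structure. A minimal illustration (file names are arbitrary): highly repetitive data, which the block-sorting and run-length stages exploit heavily, shrinks dramatically, while high-entropy data barely compresses.
# Illustration only: contrast a highly repetitive file with a high-entropy one
yes "the quick brown fox" | head -c 1M > repetitive.txt
head -c 2M /dev/urandom | base64 | head -c 1M > random.txt
bzip2 -kv repetitive.txt random.txt   # -v reports the ratio achieved for each file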
Block Structure
- Block-based Compression: Processes data in configurable block sizes (100KB to 900KB)
- Memory Scaling: Larger blocks provide better compression but require more RAM
- Parallel Processing Potential: Each block can be compressed independently
- Error Recovery: CRC-32 integrity checking for each block
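Because each compressed stream is self-contained and bzip2 correctly decompresses concatenated streams, independent chunks can be compressed separately and glued back together, which is the idea pbzip2 automates. A rough sketch (big.txt is a placeholder):
# Compress chunks independently, then concatenate into one valid .bz2 stream
split -b 100k big.txt chunk_
for c in chunk_*; do bzip2 "$c"; done        # each chunk could be compressed in parallel
cat chunk_*.bz2 > big.txt.bz2
bzip2 -dc big.txt.bz2 | cmp - big.txt && echo "round trip OK"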
Performance Characteristics
- Compression Ratio: Typically 10-30% better than gzip for text files
- Speed: Slower than gzip but faster than most high-ratio compressors
- Memory Usage: Roughly 400KB plus 8 times the block size during compression (about 7.6MB at -9)
- CPU Utilization: Single-threaded by default, but parallel implementations exist (pbzip2)
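The bzip2 man page gives simple formulas for estimating memory needs from the block size: roughly 400 kB plus 8 times the block size for compression, and 100 kB plus 4 times the block size for decompression (about 2.5 times with -s). A quick back-of-the-envelope check:
# Estimate memory requirements from the block size (man-page formulas)
block_kb=900                                      # -9 uses 900KB blocks
echo "compression:   $(( 400 + 8 * block_kb )) kB"   # ~7.6MB at -9
echo "decompression: $(( 100 + 4 * block_kb )) kB"   # ~3.7MB at -9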
Basic Syntax
bzip2 [OPTIONS] [FILES...]
Command Options
Operation Mode Control
Compression and Decompression
- -z, --compress - Force compression (default behavior)
- -d, --decompress - Force decompression mode
- -t, --test - Test compressed file integrity without extracting
- -c, --stdout - Write output to standard output instead of files
File Processing Control
- -k, --keep - Keep (don't delete) input files after processing
- -f, --force - Force overwrite of existing output files
- -s, --small - Use reduced-memory mode (slower, roughly half the memory)
- -q, --quiet - Suppress non-critical warning messages
Compression Quality Levels
- -1 - Fastest compression (100KB blocks, lowest memory)
- -2 - Fast compression (200KB blocks)
- -3 - Normal-fast compression (300KB blocks)
- -4 - Normal compression (400KB blocks)
- -5 - Normal compression (500KB blocks)
- -6 - Good compression (600KB blocks)
- -7 - Better compression (700KB blocks)
- -8 - Excellent compression (800KB blocks)
- -9 - Best compression (900KB blocks, highest memory, slowest) - the default when no level is given
Information and Help
- -v, --verbose - Display detailed processing information and statistics
- -h, --help - Display the help message
- -L, --license - Show software version and license information
- -V, --version - Display version information
Usage Examples
Basic File Operations
Standard Compression
# Basic compression (replaces original with .bz2)
bzip2 document.txt
# Result: document.txt.bz2, original file deleted
# Compression with verbose output showing statistics
bzip2 -v document.txt
# Example output: document.txt: 2.345:1, 3.412 bits/byte, 57.36% saved, 10240 in, 4367 out.
# Keep original file after compression
bzip2 -k document.txt
# Result: document.txt + document.txt.bz2 (both files preserved)
# Force compression (overwrite existing .bz2 file)
bzip2 -f document.txt
# Compress with specific block size for memory constraints
bzip2 -1 document.txt # 100KB blocks, fastest, lowest memory
bzip2 -5 document.txt # 500KB blocks, balanced performance
bzip2 -9 document.txt # 900KB blocks, best compression, highest memory
Batch Compression Operations
# Compress multiple files simultaneously
bzip2 file1.txt file2.txt file3.txt
# Compress all text files in directory
bzip2 *.txt
# Compress with specific level for all files
bzip2 -9 *.log
# Compress all files matching pattern
bzip2 backup-2023-*.tar
# Recursive directory compression
find /data -type f -name "*.log" -exec bzip2 {} \;
# Compress files with progress monitoring
find /data -name "*.txt" -print0 | parallel -0 bzip2 -v
Decompression Operations
Standard Decompression
# Decompress file (replaces .bz2 with original)
bzip2 -d document.txt.bz2
# Result: document.txt, compressed file deleted
# Alternative using bunzip2 command
bunzip2 document.txt.bz2
# Decompress keeping compressed file
bzip2 -dk document.txt.bz2
# Force decompression (overwrite existing file)
bzip2 -df document.txt.bz2
# Decompress to standard output
bzip2 -dc document.txt.bz2
Advanced Decompression
# Decompress multiple files
bzip2 -d file1.txt.bz2 file2.txt.bz2 file3.txt.bz2
# Decompress all .bz2 files in directory
bzip2 -d *.bz2
# Recursive decompression
find /archive -name "*.bz2" -exec bzip2 -d {} \;
# Test files before decompression (the second -exec runs only for files that pass the test)
find /compressed -name "*.bz2" -exec bzip2 -t {} \; -exec bzip2 -d {} \;
Pipeline and Stream Operations
Stream Compression
# Compress command output
ls -la | bzip2 > directory_listing.bz2
# Compress database dump directly
mysqldump database_name | bzip2 > database_backup.sql.bz2
# Compress and transfer over network
tar -cf - /data | bzip2 | ssh remote-server "cat > remote_backup.tar.bz2"
# Compress multiple commands output
{ date; uptime; df -h; } | bzip2 > system_info.bz2
Stream Decompression and Processing
# Decompress and process directly
bzip2 -dc data.csv.bz2 | python process_data.py
# Decompress and filter content
bzip2 -dc access_log.bz2 | grep "ERROR"
# Decompress and convert format
bzip2 -dc document.bz2 | gzip > document.gz
# Decompress remote file and process locally
ssh remote "cat file.bz2" | bzip2 -dc | process_data
# Decompress and database import
bzip2 -dc database_dump.bz2 | mysql -u user -p database_name
Advanced Pipeline Operations
# Compress and decompress in pipeline for verification
bzip2 -c document.txt | bzip2 -dc | diff document.txt -
# Multiple compression formats from same source
tar -cf - /data | tee >(gzip > data.tar.gz) | bzip2 > data.tar.bz2
# Compress with different levels for comparison
for level in {1..9}; do
bzip2 -$level -c large_file.txt > "compressed_level${level}.bz2"
done
Archive Operations
Tar Integration
# Create compressed tar archive
tar -cf - directory/ | bzip2 > archive.tar.bz2
# Extract compressed tar archive
bzip2 -dc archive.tar.bz2 | tar -xf -
# Extract to specific directory
bzip2 -dc archive.tar.bz2 | tar -xf - -C /target/directory/
# List contents of compressed tar
bzip2 -dc archive.tar.bz2 | tar -tf -
# Create tar.bz2 with specific compression level
tar -cf - source_code/ | bzip2 -9 > source_code.tar.bz2
# Extract specific files from tar.bz2
bzip2 -dc archive.tar.bz2 | tar -xf - path/to/specific/file
Advanced Archive Management
# Create incremental backup
tar -cf - /home --listed-incremental=backup.snar | bzip2 > incremental.tar.bz2
# Verify compressed archive integrity
bzip2 -t archive.tar.bz2 && echo "Archive is valid"
# Compare archive contents with filesystem
bzip2 -dc archive.tar.bz2 | tar -df - /original/directory/
# Create split archives for large datasets
tar -cf - huge_dataset | bzip2 | split -b 1G - dataset_split.
# Reconstruct split archive
cat dataset_split.* | bzip2 -dc | tar -xf -
Performance Optimization
Memory Management
Memory-Efficient Compression
# Use reduced memory mode for large files on limited systems
bzip2 -s huge_file.txt
# Process with minimal memory usage
bzip2 -1 -s document.txt # Fastest + minimal memory
# Monitor memory usage during compression
/usr/bin/time -v bzip2 large_file.txt
# Set memory limits before compression
ulimit -v 524288 # 512MB virtual memory limit
bzip2 document.txt
# Batch processing with memory control
find /large/files -name "*.txt" -exec bzip2 -s {} \;
Memory Usage Analysis
# Compare memory usage across compression levels
for level in {1..9}; do
echo "Testing level $level:"
/usr/bin/time -v bzip2 -kf -$level test_file.txt 2>&1 | grep "Maximum resident"  # -kf keeps the input and overwrites the previous .bz2
echo "Size: $(ls -lh test_file.txt.bz2 | awk '{print $5}')"
echo "---"
done
# Memory profiling with detailed output
bzip2 -v large_file.txt &
pid=$!
while kill -0 $pid 2>/dev/null; do
ps -p $pid -o pid,ppid,cmd,%mem,rss,vsz
sleep 2
done
Speed Optimization
Fast Compression Scenarios
# Quick compression for temporary operations
bzip2 -1 temporary_file.txt
# Fast compression with memory optimization
bzip2 -1 -s batch_process.log
# Parallel compression using GNU parallel
find /data -name "*.log" -print0 | parallel -0 -j 4 bzip2 -1
# Background compression for non-critical files
for file in *.txt; do
bzip2 -1 "$file" &
done
wait
Compression Level Selection
# Benchmark compression levels for specific file type
benchmark_levels() {
local file="$1"
echo "Benchmarking compression levels for: $file"
printf "%-8s %-12s %-12s %-10s\n" "Level" "Size(MB)" "Ratio" "Time(sec)"
echo "---------------------------------------------"
for level in {1..9}; do
start_time=$(date +%s.%N)
bzip2 -$level -c "$file" > "test_$level.bz2"
end_time=$(date +%s.%N)
duration=$(echo "$end_time - $start_time" | bc)
size=$(stat -c%s "test_$level.bz2")
original_size=$(stat -c%s "$file")
ratio=$(echo "scale=2; $size * 100 / $original_size" | bc)
printf "%-8s %-12s %-12s %-10s\n" "$level" "${size}" "${ratio}%" "$duration"
rm "test_$level.bz2"
done
}
# Usage: benchmark_levels large_text_file.txt
Parallel Processing
Multi-Core Optimization
# Using pbzip2 (parallel bzip2) if available
pbzip2 -p4 large_file.txt # use 4 processor cores
# Manual parallel processing with GNU parallel
find /data -name "*.log" -print0 | parallel -0 -j $(nproc) bzip2
# Batch parallel compression with load balancing
compress_batch() {
local max_jobs=$(nproc)
local files=("$@")
for file in "${files[@]}"; do
while [ $(jobs -r | wc -l) -ge $max_jobs ]; do
sleep 0.1
done
bzip2 "$file" &
done
wait
}
# Group files by size for optimal parallel processing
find /data -name "*.txt" -size -10M -print0 | xargs -0 -P 4 bzip2
find /data -name "*.txt" -size +10M -print0 | xargs -0 -P 2 bzip2
File Integrity and Testing
Comprehensive Integrity Checking
Single File Verification
# Test compressed file integrity
bzip2 -t document.txt.bz2
# Output: document.txt.bz2: OK
# Test with verbose output
bzip2 -tv document.txt.bz2
# Output: document.txt.bz2: ok
# Test before decompression
if bzip2 -t important_file.bz2; then
bzip2 -d important_file.bz2
else
echo "File is corrupted!"
fi
Batch Integrity Verification
# Test all .bz2 files in current directory
bzip2 -t *.bz2
# Recursive integrity check with detailed reporting
find /archive -name "*.bz2" -exec bash -c '
file="$1"
if bzip2 -t "$file" 2>/dev/null; then
echo "✓ OK: $file"
else
echo "✗ CORRUPT: $file"
fi
' _ {} \;
# Comprehensive archive health check
check_archive_health() {
local archive_dir="$1"
local report_file="archive_health_$(date +%Y%m%d).log"
echo "Archive Health Report - $(date)" > "$report_file"
echo "=================================" >> "$report_file"
local total_files=0
local corrupted_files=0
find "$archive_dir" -name "*.bz2" -print0 | while IFS= read -r -d '' file; do
((total_files++))
if bzip2 -t "$file" 2>/dev/null; then
echo "✓ OK: $file" >> "$report_file"
else
echo "✗ CORRUPT: $file" >> "$report_file"
((corrupted_files++))
fi
done < <(find "$archive_dir" -name "*.bz2" -print0)
echo "Total files checked: $total_files" >> "$report_file"
echo "Corrupted files: $corrupted_files" >> "$report_file"
echo "Health check completed. Report: $report_file"
}
Recovery Operations
# Attempt partial recovery from corrupted file
attempt_recovery() {
local corrupted_file="$1"
local recovered_file="${corrupted_file%.bz2}.recovered"
echo "Attempting recovery from: $corrupted_file"
# Try to extract what we can
bzip2 -dc "$corrupted_file" > "$recovered_file" 2>/dev/null
if [ -f "$recovered_file" ] && [ -s "$recovered_file" ]; then
local recovered_size=$(stat -c%s "$recovered_file")
echo "Partial recovery successful: $recovered_size bytes recovered"
echo "Recovered data saved to: $recovered_file"
else
echo "Recovery failed - no usable data extracted"
rm -f "$recovered_file"
fi
}
# Usage: attempt_recovery corrupted_file.bz2
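For badly damaged archives, the dedicated bzip2recover tool is usually the better first step: it scans the file for intact blocks and writes each one out as a small standalone .bz2 file that can be tested and decompressed individually. A hedged sketch (the rec...file.bz2 names follow bzip2recover's documented output pattern):
# Salvage intact blocks with bzip2recover, keeping only pieces that pass -t
bzip2recover corrupted_file.bz2              # writes rec00001corrupted_file.bz2, rec00002..., etc.
for piece in rec*corrupted_file.bz2; do
    if bzip2 -t "$piece" 2>/dev/null; then
        bzip2 -dc "$piece" >> recovered_data
    else
        echo "skipping damaged block: $piece"
    fi
done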
Practical Applications
System Administration
Log File Management
# Automated log rotation and compression
rotate_and_compress_logs() {
local log_dir="$1"
local max_age_days="$2"
# Compress logs older than one day
find "$log_dir" -name "*.log" -mtime +1 -exec bzip2 {} \;
# Remove very old compressed logs
find "$log_dir" -name "*.log.bz2" -mtime +$max_age_days -delete
echo "Log rotation completed for $log_dir"
}
# Compress application logs with specific naming
compress_app_logs() {
local app_log_dir="/var/log/myapp"
local date_stamp=$(date +%Y%m%d_%H%M%S)
for log_file in "$app_log_dir"/*.log; do
if [ -f "$log_file" ]; then
local base_name=$(basename "$log_file" .log)
mv "$log_file" "${app_log_dir}/${base_name}_${date_stamp}.log"
bzip2 "${app_log_dir}/${base_name}_${date_stamp}.log"
echo "Rotated: $base_name -> ${base_name}_${date_stamp}.log.bz2"
fi
done
}
Backup Operations
# Comprehensive backup strategy
create_backup() {
local source_dir="$1"
local backup_dir="$2"
local backup_name="$3"
local timestamp=$(date +%Y%m%d_%H%M%S)
local backup_file="$backup_dir/${backup_name}_${timestamp}.tar.bz2"
echo "Creating backup: $backup_file"
# Create compressed backup with integrity checking
tar -cf - "$source_dir" | bzip2 -9 > "$backup_file"
# Verify backup integrity
if bzip2 -t "$backup_file"; then
echo "✓ Backup created and verified: $backup_file"
# Generate backup statistics
local backup_size=$(stat -c%s "$backup_file")
local original_size=$(du -sb "$source_dir" | cut -f1)
local ratio=$(echo "scale=2; $backup_size * 100 / $original_size" | bc)
echo " Original size: $((original_size / 1024 / 1024)) MB"
echo " Backup size: $((backup_size / 1024 / 1024)) MB"
echo " Compression ratio: ${ratio}%"
return 0
else
echo "✗ Backup verification failed!"
rm -f "$backup_file"
return 1
fi
}
# Incremental backup with bzip2
incremental_backup() {
local source="$1"
local backup_dir="$2"
local snapshot_file="$backup_dir/incremental.snar"
local timestamp=$(date +%Y%m%d_%H%M%S)
tar --listed-incremental="$snapshot_file" \
-cf - "$source" | \
bzip2 -9 > "$backup_dir/incremental_$timestamp.tar.bz2"
echo "Incremental backup created: incremental_$timestamp.tar.bz2"
}
Database Operations
# Database backup with bzip2 compression
backup_database() {
local db_name="$1"
local backup_dir="/backups/database"
local timestamp=$(date +%Y%m%d_%H%M%S)
mkdir -p "$backup_dir"
echo "Backing up database: $db_name"
# Compressed database dump
mysqldump --single-transaction --routines --triggers "$db_name" | \
bzip2 -9 > "$backup_dir/${db_name}_$timestamp.sql.bz2"
# Verify backup
if bzip2 -t "$backup_dir/${db_name}_$timestamp.sql.bz2"; then
echo "✓ Database backup successful: ${db_name}_$timestamp.sql.bz2"
else
echo "✗ Database backup verification failed!"
return 1
fi
}
# Database restoration from compressed backup
restore_database() {
local db_name="$1"
local backup_file="$2"
echo "Restoring database: $db_name from $backup_file"
# Verify backup before restoration
if ! bzip2 -t "$backup_file"; then
echo "Error: Backup file is corrupted"
return 1
fi
# Restore database
bzip2 -dc "$backup_file" | mysql "$db_name"
if [ $? -eq 0 ]; then
echo "✓ Database restoration successful"
else
echo "✗ Database restoration failed"
return 1
fi
}
Development Workflow
Source Code Management
# Create source distribution with bzip2
create_source_distribution() {
local project_dir="$1"
local version="$2"
local dist_dir="dist"
mkdir -p "$dist_dir"
# Clean build artifacts
find "$project_dir" -name "*.o" -delete
find "$project_dir" -name "*.so" -delete
find "$project_dir" -name "build" -type d -exec rm -rf {} + 2>/dev/null
# Create compressed source archive
tar -cf - "$project_dir" | bzip2 -9 > "$dist_dir/${project_dir}_v${version}.tar.bz2"
echo "Source distribution created: ${project_dir}_v${version}.tar.bz2"
}
# Build artifact compression
compress_build_artifacts() {
local build_dir="build"
local artifacts_dir="artifacts"
mkdir -p "$artifacts_dir"
# Compress log files
find "$build_dir" -name "*.log" -exec bzip2 {} \;
# Compress test reports
find "$build_dir" -name "*report*" -exec bzip2 {} \;
# Archive entire build directory
tar -cf - "$build_dir" | bzip2 -9 > "$artifacts_dir/build_$(date +%Y%m%d).tar.bz2"
echo "Build artifacts compressed and archived"
}
Data Processing Pipeline
# Large dataset processing with streaming
process_large_dataset() {
local input_file="$1"
local output_file="$2"
echo "Processing large dataset: $input_file"
# Decompress, process, and recompress in pipeline
bzip2 -dc "$input_file" | \
python process_data.py | \
bzip2 -9 > "$output_file"
echo "Processing completed: $output_file"
}
# Data format conversion with compression
convert_data_format() {
local input_file="$1"
local output_format="$2"
case "$output_format" in
"json")
bzip2 -dc "$input_file" | python csv_to_json.py | bzip2 -9 > "${input_file%.csv}.json.bz2"
;;
"xml")
bzip2 -dc "$input_file" | python csv_to_xml.py | bzip2 -9 > "${input_file%.csv}.xml.bz2"
;;
*)
echo "Unsupported format: $output_format"
return 1
;;
esac
echo "Converted to $output_format format"
}
Automation and Scripting
Production Scripts
Automated Archive Management
#!/bin/bash
# Automated archive management script
set -euo pipefail
ARCHIVE_DIR="/data/archives"
COMPRESS_DIR="/data/to_compress"
LOG_FILE="/var/log/archive_manager.log"
MAX_FILE_SIZE="1G"
RETENTION_DAYS=365
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
}
compress_files() {
local files_compressed=0
local total_original_size=0
local total_compressed_size=0
find "$COMPRESS_DIR" -type f -size +$MAX_FILE_SIZE -print0 | while IFS= read -r -d '' file; do
log_message "Compressing: $file"
local original_size=$(stat -c%s "$file")
total_original_size=$((total_original_size + original_size))
# Compress with optimal settings for large files
bzip2 -9 -k "$file"
local compressed_file="${file}.bz2"
local compressed_size=$(stat -c%s "$compressed_file")
total_compressed_size=$((total_compressed_size + compressed_size))
local ratio=$(echo "scale=2; $compressed_size * 100 / $original_size" | bc)
log_message " Compressed to ${ratio}% of original size"
# Move compressed file to archive
mv "$compressed_file" "$ARCHIVE_DIR/"
files_compressed=$((files_compressed + 1))  # plain arithmetic: ((x++)) returns non-zero when x is 0, which would trip set -e
# Verify compressed file
if ! bzip2 -t "$ARCHIVE_DIR/$(basename "$compressed_file")"; then
log_message "ERROR: Compression verification failed for $compressed_file"
continue
fi
# Remove original after successful compression
rm "$file"
log_message " Original file removed"
done < <(find "$COMPRESS_DIR" -type f -size +$MAX_FILE_SIZE -print0)
if [ "$total_original_size" -gt 0 ]; then
local overall_ratio=$(echo "scale=2; $total_compressed_size * 100 / $total_original_size" | bc)
log_message "Compression summary: $files_compressed files, overall ratio: ${overall_ratio}%"
fi
}
cleanup_old_archives() {
local files_removed=0
find "$ARCHIVE_DIR" -name "*.bz2" -mtime +$RETENTION_DAYS -print0 | while IFS= read -r -d '' file; do
log_message "Removing old archive: $file"
rm "$file"
files_removed=$((files_removed + 1))
done < <(find "$ARCHIVE_DIR" -name "*.bz2" -mtime +$RETENTION_DAYS -print0)
log_message "Cleanup completed: $files_removed files removed"
}
# Main execution
log_message "Starting archive management process"
compress_files
cleanup_old_archives
log_message "Archive management process completed"
Health Monitoring Script
#!/bin/bash
# Archive health monitoring with alerting
MONITOR_DIR="/data/archives"
ALERT_THRESHOLD=5
ALERT_EMAIL="admin@company.com"
TEMP_REPORT="/tmp/archive_health_$(date +%s).txt"
check_archive_health() {
local total_files=0
local corrupted_files=0
local total_size=0
echo "Archive Health Report - $(date)" > "$TEMP_REPORT"
echo "=================================" >> "$TEMP_REPORT"
echo "" >> "$TEMP_REPORT"
find "$MONITOR_DIR" -name "*.bz2" -print0 | while IFS= read -r -d '' file; do
((total_files++))
local file_size=$(stat -c%s "$file")
total_size=$((total_size + file_size))
if bzip2 -t "$file" 2>/dev/null; then
echo "✓ OK: $(basename "$file") ($(echo "scale=1; $file_size/1024/1024" | bc)MB)" >> "$TEMP_REPORT"
else
echo "✗ CORRUPT: $(basename "$file") ($(echo "scale=1; $file_size/1024/1024" | bc)MB)" >> "$TEMP_REPORT"
((corrupted_files++))
fi
done < <(find "$MONITOR_DIR" -name "*.bz2" -print0)
echo "" >> "$TEMP_REPORT"
echo "Summary:" >> "$TEMP_REPORT"
echo " Total files: $total_files" >> "$TEMP_REPORT"
echo " Corrupted files: $corrupted_files" >> "$TEMP_REPORT"
echo " Total archive size: $(echo "scale=1; $total_size/1024/1024/1024" | bc)GB" >> "$TEMP_REPORT"
if [ "$corrupted_files" -gt 0 ]; then
echo " CORRUPTION RATE: $(echo "scale=2; $corrupted_files * 100 / $total_files" | bc)%" >> "$TEMP_REPORT"
fi
return "$corrupted_files"
}
# Run health check
check_archive_health
corruption_count=$?
if [ "$corruption_count" -gt "$ALERT_THRESHOLD" ]; then
echo "HIGH CORRUPTION RATE DETECTED!"
mail -s "Archive Health Alert: $corruption_count corrupted files" "$ALERT_EMAIL" < "$TEMP_REPORT"
fi
echo "Health check completed. $corruption_count corrupted files found."
cat "$TEMP_REPORT"
# Cleanup temporary report
rm -f "$TEMP_REPORT"
Integration with Other Tools
Cloud Storage Integration
# Upload compressed files to cloud storage
upload_to_cloud() {
local file="$1"
local cloud_path="$2"
# Decompress and stream to cloud storage
bzip2 -dc "$file" | aws s3 cp - "s3://mybucket/$cloud_path"
echo "Uploaded decompressed content to s3://mybucket/$cloud_path"
}
# Download and compress from cloud storage
download_from_cloud() {
local cloud_path="$1"
local local_file="$2"
# Download and compress locally
aws s3 cp "s3://mybucket/$cloud_path" - | bzip2 -9 > "$local_file"
echo "Downloaded and compressed: $local_file"
}
Database Integration
# Compressed database export for multiple database systems
export_database_compressed() {
local db_type="$1"
local db_name="$2"
local output_file="$3"
case "$db_type" in
"mysql")
mysqldump --single-transaction "$db_name" | bzip2 -9 > "$output_file"
;;
"postgresql")
pg_dump "$db_name" | bzip2 -9 > "$output_file"
;;
"sqlite")
sqlite3 "$db_name" ".dump" | bzip2 -9 > "$output_file"
;;
*)
echo "Unsupported database type: $db_type"
return 1
;;
esac
echo "Database exported and compressed: $output_file"
}
Error Handling and Troubleshooting
Common Issues and Solutions
Memory Management
# Out of memory during compression
handle_memory_issues() {
local file="$1"
echo "Handling memory constraints for: $file"
# Check available memory
local available_mem=$(free -m | awk 'NR==2{print $7}')
if [ "$available_mem" -lt 1024 ]; then # Less than 1GB available
echo "Low memory detected, using minimal memory mode"
bzip2 -1 -s "$file"
elif [ "$available_mem" -lt 2048 ]; then # Less than 2GB available
echo "Moderate memory, using balanced settings"
bzip2 -5 "$file"
else
echo "Sufficient memory, using maximum compression"
bzip2 -9 "$file"
fi
}
Disk Space Management
# Handle disk space constraints
manage_disk_space() {
local file="$1"
local target_dir="$2"
# Check available disk space
local available_space=$(df -k --output=avail "$target_dir" | tail -1)
local file_size=$(stat -c%s "$file")
local estimated_compressed=$((file_size / 4 / 1024)) # estimate ~75% compression, in KB to match df -k
if [ "$estimated_compressed" -gt "$available_space" ]; then
echo "Insufficient disk space for compression"
# Try compressing to different filesystem
local temp_dir="/tmp/bzip2_temp"
mkdir -p "$temp_dir"
bzip2 -c "$file" > "$temp_dir/$(basename "$file").bz2"
if [ $? -eq 0 ]; then
echo "Compressed successfully to temporary location"
echo "Please free up space and move: $temp_dir/$(basename "$file").bz2"
else
echo "Compression failed even with temporary location"
return 1
fi
else
echo "Sufficient disk space available"
bzip2 "$file"
fi
}
Related Commands
- bunzip2 - Decompress bzip2 compressed files
- bzip2recover - Recover data from damaged bzip2 files
- bzcat - View bzip2 compressed files
- bzless - View bzip2 files page by page
- bzmore - View bzip2 files page by page
- pbzip2 - Parallel bzip2 implementation
- gzip - Alternative compression utility
- xz - High-ratio compression utility
- tar - Archive utility with bzip2 support
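A few of these companion tools in action (file names are illustrative):
bzcat access_log.bz2 | tail -n 20      # stream contents without writing a decompressed copy
bzless large_report.txt.bz2            # page through compressed text interactively
bunzip2 -k archive.tar.bz2             # decompress while keeping the .bz2 file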
Best Practices
Compression Strategy
- Choose appropriate compression level based on use case and system resources
- Use level 9 for long-term storage where space is premium
- Use level 1-3 for temporary compression where speed is important
- Use level 5-7 for balanced performance in most scenarios
- Test compression ratios with your specific data types
Performance Optimization
- Use parallel bzip2 (pbzip2) on multi-core systems for large files
- Monitor memory usage and adjust compression levels accordingly
- Use the -s option on memory-constrained systems
- Compress during off-peak hours for large batch operations
- Consider SSD storage for improved compression performance
Data Management
- Always verify compressed files with -t before deleting originals
- Use -k when preserving original files during testing
- Implement integrity checking for critical archives
- Maintain backup copies of important compressed files
- Use descriptive filenames with dates and versions
Error Prevention
- Check available disk space before compression operations
- Monitor system resources during large compression tasks
- Use appropriate error handling in automated scripts
- Test compressed file integrity after important operations
- Implement recovery procedures for corrupted archives
Performance Tips
- Higher compression levels provide better ratios but use more memory and CPU
- Level 5 offers the best balance of speed and compression for most files
- Parallel processing can dramatically speed up compression on multi-core systems
- Memory usage scales with block size (roughly 400KB plus 8 times the block size for compression)
- SSD storage improves compression speed, especially for large files
- File type matters - text files compress better than binary files
- Batch processing is more efficient than individual file operations
- Pipeline operations avoid intermediate file I/O overhead
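As a rough illustration of the last point (paths are placeholders, timings vary by system), the piped form writes to disk only once, while the two-step form first writes an uncompressed .tar:
# Pipeline vs. intermediate file: compare wall-clock time and disk traffic
time { tar -cf - /data | bzip2 -9 > data.tar.bz2; }
time { tar -cf data.tar /data && bzip2 -9 data.tar; }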
Comparison with Other Compression Tools
| Feature | bzip2 | gzip | xz | zip |
|---|---|---|---|---|
| Compression Ratio | Good | Fair | Excellent | Fair |
| Speed | Medium | Fast | Slow | Medium |
| Memory Usage | Medium | Low | High | Medium |
| Multi-core Support | Limited | Limited | Good | Limited |
| Archive Support | External | External | External | Built-in |
| Recovery Features | Good | Basic | Good | Basic |
| Compression Levels | 1-9 | 1-9 | 0-9 | 0-9 |
| Block-based | Yes | No | No | No |
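A quick way to reproduce these trade-offs on your own data, assuming GNU time and reasonably recent gzip (1.6 or later for -k), bzip2, and xz are installed; sample.dat is a placeholder:
# Compare size and wall-clock time for three compressors on the same input
for tool in gzip bzip2 xz; do
    /usr/bin/time -f "$tool: %e s" "$tool" -9 -k -f sample.dat
done
ls -l sample.dat sample.dat.gz sample.dat.bz2 sample.dat.xz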
The bzip2 command represents an excellent balance between compression ratio and processing speed, making it ideal for text-heavy archives, backup systems, and bandwidth-constrained transfers. Its robust error checking and recovery features add an extra layer of data protection that is crucial for mission-critical applications.