Core Concepts
This section covers the fundamental concepts in SyncEngine.
File State and Change Detection
SyncEngine maintains three views of your files:
Source State: Current files in the source location
Destination State: Current files in the destination location
Last Known State: Files as they were during the last sync
By comparing these three states, SyncEngine can determine what happened to each file:
Created: File exists now but didn’t exist in last state
Modified: File exists now with different content than last state
Deleted: File existed in last state but doesn’t exist now
Renamed/Moved: File with same content exists at different path
Unchanged: File exists with same content as last state
Conflict: File was modified in both locations since last sync
State Comparison Matrix
| Last State | Source | Destination | Interpretation |
|---|---|---|---|
| None | Exists | None | Created at source |
| None | None | Exists | Created at destination |
| Exists | Modified | Same | Modified at source |
| Exists | Same | Modified | Modified at destination |
| Exists | Modified | Modified | Conflict (both changed) |
| Exists | None | Same | Deleted at source |
| Exists | Same | None | Deleted at destination |
| Exists | None | None | Deleted both sides |
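The matrix can be read as a small decision function. The following is a minimal sketch, not SyncEngine's actual implementation: it assumes each state is represented by a content hash (or `None` when the file is absent), and the function name `classify` is hypothetical. Cases the matrix does not list (for example, a file deleted on one side and modified on the other) are treated as conflicts here, which is an assumption.

```python
from typing import Optional


def classify(last: Optional[str], source: Optional[str], destination: Optional[str]) -> str:
    """Classify one path from three content hashes (None means the file is absent)."""
    if last is None:
        if source is not None and destination is None:
            return "created at source"
        if source is None and destination is not None:
            return "created at destination"
        if source is None and destination is None:
            return "absent"
        # Assumption: present on both sides with no history.
        return "unchanged" if source == destination else "conflict"
    if source is None and destination is None:
        return "deleted both sides"
    if source is None:
        # Assumption: delete on one side plus edit on the other is a conflict.
        return "deleted at source" if destination == last else "conflict"
    if destination is None:
        return "deleted at destination" if source == last else "conflict"
    source_changed = source != last
    destination_changed = destination != last
    if source_changed and destination_changed:
        return "conflict"
    if source_changed:
        return "modified at source"
    if destination_changed:
        return "modified at destination"
    return "unchanged"
```

For instance, `classify("a", "b", "a")` falls into the "Modified at source" row, while `classify("a", "b", "c")` falls into the conflict row.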
Comparison Modes
Comparison modes control how SyncEngine determines if two files are identical. This is crucial for deciding whether to skip a file or sync it.
Available Comparison Modes
SyncEngine provides five comparison modes, each optimized for different scenarios:
HASH_THEN_MTIME (Default)
A balanced approach that compares hashes when they are available and falls back to mtime otherwise:
from syncengine.models import ComparisonMode, SyncConfig
config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_THEN_MTIME
)
How it works:
Compare file sizes first (fast check)
If both files have hash values, compare hashes
If hash unavailable, compare modification times
Files with matching hash are considered identical, even if mtime differs
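The steps above can be sketched as a single predicate. This is an illustrative sketch only: `FileInfo` and `files_match` are hypothetical names, not SyncEngine types, and the 2-second mtime tolerance is an assumption borrowed from the mtime-based modes described later in this section.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: same tolerance as the MTIME_ONLY and SIZE_AND_MTIME modes.
MTIME_TOLERANCE_SECONDS = 2.0


@dataclass
class FileInfo:
    """Hypothetical per-file record used only for this sketch."""
    size: int
    mtime: float
    content_hash: Optional[str] = None


def files_match(a: FileInfo, b: FileInfo) -> bool:
    # Step 1: different sizes can never be identical content.
    if a.size != b.size:
        return False
    # Step 2: when both sides have a hash, the hash decides (mtime is ignored).
    if a.content_hash is not None and b.content_hash is not None:
        return a.content_hash == b.content_hash
    # Step 3: otherwise fall back to modification time.
    return abs(a.mtime - b.mtime) <= MTIME_TOLERANCE_SECONDS
```

Note that two files with equal hashes match even when their mtimes are far apart, which is the behavior described in the last step.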
SIZE_ONLY
Only compares file sizes, ignores hash and mtime:
config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_ONLY
)
How it works:
Files with same size are considered identical
Hash and mtime are completely ignored
When sizes differ in TWO_WAY mode → CONFLICT (cannot determine newer file)
When sizes differ in one-way mode → Uses sync direction
Use cases:
Encrypted storage where hash is unavailable or unreliable
Cloud vaults where mtime is upload time, not original file time
Scenarios where hash computation is too expensive
HASH_ONLY
Strict content verification using only hash, ignores size and mtime:
config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_ONLY
)
How it works:
Only compares content hashes
Raises error if hash is unavailable
When hashes differ in TWO_WAY mode → CONFLICT (cannot determine newer file)
When hashes differ in one-way mode → Uses sync direction
Use cases:
Content-critical applications requiring strict verification
Systems where mtime is completely unreliable
Environments where a hash is always available and trusted
MTIME_ONLY
Fast time-based sync without hash computation:
config = SyncConfig(
    comparison_mode=ComparisonMode.MTIME_ONLY
)
How it works:
Only compares modification times (±2 second tolerance)
Ignores file size and hash
When mtimes differ → Newer file wins
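A tolerance-based comparison of this kind might look like the sketch below. The function name `compare_mtime` is hypothetical, and treating a difference of exactly 2 seconds as "same" is an assumption; the documentation only states the ±2 second tolerance.

```python
TOLERANCE_SECONDS = 2.0  # the ±2 second tolerance described above


def compare_mtime(source_mtime: float, destination_mtime: float) -> str:
    """Return 'same', or the side ('source' / 'destination') holding the newer file."""
    if abs(source_mtime - destination_mtime) <= TOLERANCE_SECONDS:
        return "same"
    return "source" if source_mtime > destination_mtime else "destination"
```

The tolerance absorbs coarse timestamp resolution on some filesystems, so a file copied between two systems is not reported as changed just because its mtime was rounded.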
Use cases:
Performance-critical scenarios with reliable timestamps
Large files where hash computation is expensive
Systems with accurate clock synchronization
SIZE_AND_MTIME
Balanced approach for systems without hash support:
config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_AND_MTIME
)
How it works:
Files must match in BOTH size AND mtime (±2 second tolerance)
Hash is completely ignored
When files differ → Newer file wins (uses mtime)
Use cases:
Storage backends that don’t provide content hashes
Reliable systems with accurate timestamps
Balance between performance and accuracy
Comparison Mode Behavior Matrix
| Mode | Files Match If | When Files Differ | Best For |
|---|---|---|---|
| HASH_THEN_MTIME | Size equal AND (hash equal, or mtime equal when hash is unavailable) | Uses mtime to determine newer | Most scenarios (default) |
| SIZE_ONLY | Size equal | TWO_WAY: CONFLICT, One-way: sync direction | Encrypted vaults, unreliable mtime |
| HASH_ONLY | Hash equal | TWO_WAY: CONFLICT, One-way: sync direction | Content-critical, unreliable mtime |
| MTIME_ONLY | Mtime equal (±2s) | Uses mtime to determine newer | Performance-critical, reliable clocks |
| SIZE_AND_MTIME | Size equal AND mtime equal (±2s) | Uses mtime to determine newer | No hash support, reliable timestamps |
Important: SIZE_ONLY and HASH_ONLY with TWO_WAY Sync
When using SIZE_ONLY or HASH_ONLY comparison modes with TWO_WAY sync mode,
files that differ will result in CONFLICT because:
These modes are chosen specifically when mtime is unreliable
Without reliable mtime, the engine cannot determine which file is newer
In one-way sync modes, the sync direction determines which file wins
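The rule can be summarized as a small lookup. This sketch uses plain strings as stand-ins for the `ComparisonMode` and `SyncMode` enums, and the function name `outcome_when_different` is hypothetical; it only restates the behavior described above.

```python
# Modes that deliberately avoid mtime, so "newer" cannot be determined.
MTIME_FREE_MODES = {"SIZE_ONLY", "HASH_ONLY"}


def outcome_when_different(comparison_mode: str, sync_mode: str) -> str:
    """Describe what happens to a pair of files that the comparison says differ."""
    if comparison_mode not in MTIME_FREE_MODES:
        return "newer file wins"
    if sync_mode == "TWO_WAY":
        return "CONFLICT"
    # Any one-way mode: the configured direction picks the winner.
    return "sync direction decides"
```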
Example: Encrypted Vault Sync
from syncengine import SyncEngine
from syncengine.models import ComparisonMode, SyncConfig
from syncengine.modes import SyncMode
# Vault doesn't provide content hashes
# Vault mtime is upload time, not original file mtime
config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_ONLY
)
# For initial upload, use SOURCE_TO_DESTINATION
# This avoids conflicts since sync direction is clear
engine = SyncEngine(mode=SyncMode.SOURCE_TO_DESTINATION)
# `pair` is an existing SyncPair (construction omitted here)
stats = engine.sync_pair(pair, config=config)
# Files with same size are considered identical
# No re-uploads on subsequent syncs!
Example: Strict Content Verification
config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_ONLY
)
# Requires hash on both sides
# Ignores timestamps completely
# Ensures content integrity
stats = engine.sync_pair(pair, config=config)
Example: Fast Time-Based Sync
config = SyncConfig(
    comparison_mode=ComparisonMode.MTIME_ONLY
)
# Skips hash computation for large files
# Relies on accurate timestamps
# Much faster for large datasets
stats = engine.sync_pair(pair, config=config)
Sync Actions
Based on the detected changes and the sync mode, SyncEngine determines which actions to take:
Upload Actions
UPLOAD_NEW: Upload a new file to destination
UPLOAD_UPDATE: Upload changes to an existing destination file
UPLOAD_RESTORE: Re-upload a file that was deleted at destination
Download Actions
DOWNLOAD_NEW: Download a new file from destination
DOWNLOAD_UPDATE: Download changes to an existing source file
DOWNLOAD_RESTORE: Re-download a file that was deleted at source
Delete Actions
DELETE_SOURCE: Delete a file from source
DELETE_DESTINATION: Delete a file from destination
Other Actions
NO_ACTION: File is already in sync, no action needed
CONFLICT: Manual resolution required
Rename and Move Detection
SyncEngine can detect when files are renamed or moved (not just deleted and re-added):
How It Works
Scanner creates a hash of each file’s content
When comparing states, files are matched by content hash
If a file with the same hash appears at a different path, it’s recognized as a rename/move
The rename/move is replicated to the other side instead of delete+upload
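The matching step can be illustrated with a short sketch. It assumes each state is a plain mapping from path to content hash, which is a simplification of the real state files, and `detect_renames` is a hypothetical helper rather than part of the SyncEngine API.

```python
from typing import Dict, List, Tuple


def detect_renames(previous: Dict[str, str], current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair paths that disappeared with paths that appeared carrying the same hash.

    Both arguments map path -> content hash. Returns (old_path, new_path) pairs.
    """
    removed = {path: h for path, h in previous.items() if path not in current}
    added = {path: h for path, h in current.items() if path not in previous}

    # Index the disappeared paths by hash so each added file is a quick lookup.
    removed_by_hash: Dict[str, List[str]] = {}
    for path, content_hash in sorted(removed.items()):
        removed_by_hash.setdefault(content_hash, []).append(path)

    renames: List[Tuple[str, str]] = []
    for new_path, content_hash in sorted(added.items()):
        candidates = removed_by_hash.get(content_hash)
        if candidates:
            # Each old path is consumed once, so duplicates pair off one-to-one.
            renames.append((candidates.pop(0), new_path))
    return renames
```

Paths left unmatched after this pass are ordinary deletions and creations. When several files share the same hash, the pairing here is arbitrary (sorted order), which is a design shortcut of the sketch.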
Benefits
Faster sync (no re-upload of large files)
Preserves file history/metadata
More accurate representation of changes
Reduced bandwidth usage
Example:
# Before sync:
# Source: /docs/report.pdf (hash: abc123)
# Destination: /docs/report.pdf (hash: abc123)
# User renames at source:
# Source: /docs/annual_report_2024.pdf (hash: abc123)
# After sync with rename detection:
# Source: /docs/annual_report_2024.pdf (hash: abc123)
# Destination: /docs/annual_report_2024.pdf (hash: abc123)
# Action: RENAME (not DELETE + UPLOAD)
Conflict Resolution
Conflicts occur when the same file is modified in both locations since the last sync.
Conflict Resolution Strategies
NEWEST_WINS (default)
The file with the most recent modification time wins:
from syncengine import ConflictResolution
pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.NEWEST_WINS
)
SOURCE_WINS
Source file always wins conflicts:
pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.SOURCE_WINS
)
DESTINATION_WINS
Destination file always wins conflicts:
pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.DESTINATION_WINS
)
MANUAL
Conflicts are reported but not resolved automatically:
def handle_conflict(conflict_info):
    # Report both sides so the decision can be made with full context
    print(f"Conflict: {conflict_info.path}")
    print(f"Source modified: {conflict_info.source_mtime}")
    print(f"Destination modified: {conflict_info.dest_mtime}")
    # Return 'source', 'destination', or 'skip'
    return 'source'
pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.MANUAL,
    conflict_handler=handle_conflict
)
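The four strategies above amount to a small dispatch. The sketch below is illustrative only: `Strategy` is a stand-in enum rather than the real `ConflictResolution`, `pick_winner` is a hypothetical function, and two details are assumptions: ties under NEWEST_WINS go to the source, and MANUAL without a handler skips the file.

```python
from enum import Enum
from typing import Callable, Optional


class Strategy(Enum):
    """Stand-in for ConflictResolution, used only in this sketch."""
    NEWEST_WINS = "newest_wins"
    SOURCE_WINS = "source_wins"
    DESTINATION_WINS = "destination_wins"
    MANUAL = "manual"


def pick_winner(
    strategy: Strategy,
    source_mtime: float,
    dest_mtime: float,
    handler: Optional[Callable[[float, float], str]] = None,
) -> str:
    """Return 'source', 'destination', or 'skip' for one conflicting file."""
    if strategy is Strategy.SOURCE_WINS:
        return "source"
    if strategy is Strategy.DESTINATION_WINS:
        return "destination"
    if strategy is Strategy.NEWEST_WINS:
        # Assumption: equal timestamps favor the source.
        return "source" if source_mtime >= dest_mtime else "destination"
    # MANUAL: defer to the caller; assumption: no handler means skip.
    if handler is None:
        return "skip"
    return handler(source_mtime, dest_mtime)
```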
Ignore Patterns
SyncEngine uses gitignore-style patterns to exclude files from sync.
Pattern Syntax
*.tmp - Ignore all .tmp files
*.log - Ignore all .log files
/build/ - Ignore build directory at root
build/ - Ignore all build directories
**/node_modules/ - Ignore node_modules anywhere
!important.log - Don’t ignore important.log (negation)
*.py[cod] - Ignore .pyc, .pyo, .pyd files
#comment - Comments (ignored)
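To make the syntax concrete, here is a deliberately simplified matcher built on the standard library's `fnmatch`. It is not how `IgnoreFileManager` works internally and it does not implement full gitignore semantics: patterns with a slash in the middle are not supported, and `path` is assumed to refer to a file. The function name `should_ignore` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def should_ignore(path: str, patterns: Iterable[str]) -> bool:
    """Simplified gitignore-style check: the last matching pattern decides."""
    parts = path.strip("/").split("/")
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank line or comment
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if pattern.startswith("**/"):
            pattern = pattern[3:]  # "anywhere" is already the unanchored default
        anchored = pattern.startswith("/")
        directory_only = pattern.endswith("/")
        name = pattern.strip("/")
        # Directory patterns are tested against directory components only.
        pool = parts[:-1] if directory_only else parts
        candidates = pool[:1] if anchored else pool
        if any(fnmatchcase(part, name) for part in candidates):
            ignored = not negated
    return ignored
```

With the patterns `["*.log", "!important.log"]`, the path `app/debug.log` is ignored while `important.log` is kept, because the later negation overrides the earlier match.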
Creating an Ignore File
Create a .syncignore file in your source root:
# Ignore compiled Python files
*.pyc
__pycache__/
# Ignore OS files
.DS_Store
Thumbs.db
# Ignore development files
.vscode/
.idea/
*.swp
# Ignore build artifacts
build/
dist/
*.egg-info/
# Ignore logs
*.log
# But keep important logs
!critical.log
Using Ignore Patterns Programmatically
from syncengine import IgnoreFileManager
ignore_manager = IgnoreFileManager()
# Add individual patterns
ignore_manager.add_pattern("*.tmp")
ignore_manager.add_pattern("*.log")
# Load from file
ignore_manager.load_from_file(".syncignore")
# Check if path should be ignored
if ignore_manager.should_ignore("test.tmp"):
    print("File is ignored")
# Use with sync pair
pair = SyncPair(
    ...,
    ignore_manager=ignore_manager
)
State Management
State management is crucial for efficient incremental syncs.
State Directory Structure
.sync_state/
├── source_tree.json # Last known source state
├── destination_tree.json # Last known destination state
└── sync_metadata.json # Sync metadata
State Files
source_tree.json
Stores information about each file in the source:
{
  "path/to/file.txt": {
    "hash": "abc123...",
    "size": 1024,
    "mtime": 1609459200.0,
    "is_dir": false
  }
}
destination_tree.json
Stores information about each file in the destination:
{
  "path/to/file.txt": {
    "id": 12345,
    "hash": "abc123...",
    "size": 1024,
    "mtime": 1609459200.0,
    "is_dir": false
  }
}
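For orientation, an entry with the same shape as the source-tree records can be produced from a local file using only the standard library. This is a sketch under assumptions: the hash algorithm SyncEngine uses is not stated here, so SHA-256 is chosen purely for illustration, and `describe_file` is a hypothetical helper.

```python
import hashlib
import json
import os
import tempfile


def describe_file(path: str) -> dict:
    """Build one state entry (hash, size, mtime, is_dir) for a local file."""
    digest = hashlib.sha256()  # assumption: the real engine may use another algorithm
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    info = os.stat(path)
    return {
        "hash": digest.hexdigest(),
        "size": info.st_size,
        "mtime": info.st_mtime,
        "is_dir": False,
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway file and print the entry as JSON.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    print(json.dumps({tmp.name: describe_file(tmp.name)}, indent=2))
    os.unlink(tmp.name)
```

Comparing such an entry with the one saved from the previous run is what allows an incremental sync to skip unchanged files.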
State Manager API
from syncengine import SyncStateManager
# Create state manager
state_manager = SyncStateManager("/path/to/.sync_state")
# Load previous state
source_tree = state_manager.load_source_tree()
dest_tree = state_manager.load_destination_tree()
# Save new state after sync
state_manager.save_source_tree(new_source_tree)
state_manager.save_destination_tree(new_dest_tree)
# Clear state (force full resync)
state_manager.clear()
Concurrency and Performance
SyncEngine uses concurrent operations for efficiency.
Concurrency Model
SyncEngine uses two types of concurrency limits:
Transfer Limit: Maximum concurrent uploads/downloads
Operations Limit: Maximum concurrent file operations (list, delete, etc.)
from syncengine import ConcurrencyLimits
limits = ConcurrencyLimits(
    transfers=5,     # Max 5 concurrent uploads/downloads
    operations=10    # Max 10 concurrent file operations
)
Choosing Limits
Transfer Limit
Too high: May saturate bandwidth, cause timeouts
Too low: Underutilizes bandwidth, slower sync
Recommended: 3-10 depending on bandwidth and file sizes
Operations Limit
Too high: May overwhelm storage API, cause rate limiting
Too low: Slower listing/deletion of many small files
Recommended: 10-50 depending on storage API limits
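One common way to enforce two independent limits like these is a pair of semaphores, one per category. The sketch below shows that general technique; it is not SyncEngine's internal design, and `TwoTierLimiter` and `peak_concurrency` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TwoTierLimiter:
    """Hypothetical limiter with separate caps for transfers and other operations."""

    def __init__(self, transfers: int, operations: int):
        self._transfers = threading.BoundedSemaphore(transfers)
        self._operations = threading.BoundedSemaphore(operations)

    def run_transfer(self, fn, *args):
        with self._transfers:  # blocks while `transfers` calls are already running
            return fn(*args)

    def run_operation(self, fn, *args):
        with self._operations:
            return fn(*args)


def peak_concurrency(limiter: TwoTierLimiter, jobs: int = 10, workers: int = 8) -> int:
    """Run fake transfers and report the highest number observed running at once."""
    lock = threading.Lock()
    counts = {"active": 0, "peak": 0}

    def fake_transfer(_):
        with lock:
            counts["active"] += 1
            counts["peak"] = max(counts["peak"], counts["active"])
        time.sleep(0.01)  # simulate network time
        with lock:
            counts["active"] -= 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda n: limiter.run_transfer(fake_transfer, n), range(jobs)))
    return counts["peak"]
```

Even with eight worker threads, `peak_concurrency(TwoTierLimiter(transfers=2, operations=4))` never exceeds 2, because the transfer semaphore is the binding limit.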
Performance Tips
Use state management: Dramatically speeds up incremental syncs
Optimize concurrency: Balance based on your use case
Use ignore patterns: Skip unnecessary files
Choose appropriate sync mode: Don’t use TWO_WAY if you only need one-way
Monitor progress: Use progress callbacks to identify bottlenecks
Pause, Resume, and Cancel
Control sync execution at runtime.
Basic Usage
import threading
from syncengine import SyncEngine, SyncPauseController
controller = SyncPauseController()
engine = SyncEngine(
    ...,
    pause_controller=controller
)
# Start sync in background (`pair` is an existing SyncPair)
def run_sync():
    stats = engine.sync_pair(pair)
    print(f"Sync complete: {stats}")
sync_thread = threading.Thread(target=run_sync)
sync_thread.start()
# Pause sync
controller.pause()
print("Sync paused")
# Resume sync
controller.resume()
print("Sync resumed")
# Cancel sync
controller.cancel()
print("Sync cancelled")
# Wait for in-flight operations to finish
sync_thread.join()
Pause Behavior
When paused:
Current operations complete
No new operations start
State is preserved
Can resume at any time
Cancel Behavior
When cancelled:
Current operations complete
No new operations start
State is saved (partial sync)
Cannot resume (need to start new sync)
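The pause and cancel semantics described above (in-flight work finishes, nothing new starts, cancel is final) can be modeled with two `threading.Event` objects. This is a sketch of the general pattern, not the actual `SyncPauseController`; `PauseGate` and `process` are hypothetical names.

```python
import threading
from typing import Iterable, List


class PauseGate:
    """Hypothetical stand-in showing pause/resume/cancel semantics."""

    def __init__(self) -> None:
        self._running = threading.Event()
        self._running.set()  # start in the running state
        self._cancelled = threading.Event()

    def pause(self) -> None:
        if not self._cancelled.is_set():
            self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def cancel(self) -> None:
        self._cancelled.set()
        self._running.set()  # wake paused workers so they can exit

    def checkpoint(self) -> bool:
        """Block while paused; return False once cancelled."""
        self._running.wait()
        return not self._cancelled.is_set()


def process(items: Iterable, gate: PauseGate) -> List:
    """Handle items one by one, checking the gate only between operations."""
    done = []
    for item in items:
        if not gate.checkpoint():
            break  # cancelled: start nothing new
        done.append(item)  # the current operation always runs to completion
    return done
```

Because the gate is consulted only between items, an operation that has already started is never interrupted, matching the "current operations complete" behavior in both lists.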
Progress Tracking
Monitor sync progress with detailed callbacks.
Progress Events
SyncEngine emits various progress events:
scan_start: Starting to scan files
scan_progress: Scanning progress
scan_complete: Scan complete
sync_start: Starting sync operations
upload_start: Starting upload
upload_progress: Upload progress (bytes transferred)
upload_complete: Upload complete
download_start: Starting download
download_progress: Download progress (bytes transferred)
download_complete: Download complete
delete_start: Starting delete
delete_complete: Delete complete
sync_complete: All sync operations complete
Progress Callback
from syncengine import SyncEngine, SyncProgressTracker, SyncProgressEvent
def on_progress(event: SyncProgressEvent):
    if event.type == "upload_progress":
        # Treat zero-byte files as complete to avoid dividing by zero
        percent = (event.bytes_transferred / event.total_bytes) * 100 if event.total_bytes else 100.0
        print(f"Uploading {event.file_path}: {percent:.1f}%")
    elif event.type == "download_progress":
        percent = (event.bytes_transferred / event.total_bytes) * 100 if event.total_bytes else 100.0
        print(f"Downloading {event.file_path}: {percent:.1f}%")
    elif event.type == "sync_complete":
        print(f"Sync complete: {event.stats}")
tracker = SyncProgressTracker(callback=on_progress)
engine = SyncEngine(
    ...,
    progress_tracker=tracker
)
Custom Progress UI
You can build custom progress UIs using the progress callbacks:
class ProgressUI:
    def __init__(self):
        self.current_file = None
        self.total_files = 0
        self.completed_files = 0
    def on_progress(self, event: SyncProgressEvent):
        if event.type == "sync_start":
            self.total_files = event.total_files
            print(f"Starting sync of {self.total_files} files")
        elif event.type == "upload_start":
            self.current_file = event.file_path
            print(f"Uploading: {self.current_file}")
        elif event.type == "upload_complete":
            self.completed_files += 1
            print(f"Completed {self.completed_files}/{self.total_files}")
        # ... handle other events
ui = ProgressUI()
# Pass the bound method so the tracker updates this UI instance
tracker = SyncProgressTracker(callback=ui.on_progress)
Next Steps
Sync Modes Reference - Detailed explanation of each sync mode
Storage Protocols - Implement custom storage backends
Examples - Advanced usage examples
API Reference - Complete API documentation