Core Concepts

This section covers the fundamental concepts in SyncEngine.

File State and Change Detection

SyncEngine maintains three views of your files:

Source State: Current files in the source location
Destination State: Current files in the destination location
Last Known State: Files as they were during the last sync

By comparing these three states, SyncEngine can determine what happened to each file:

Created: File exists now but didn’t exist in last state
Modified: File exists now with different content than last state
Deleted: File existed in last state but doesn’t exist now
Renamed/Moved: File with same content exists at different path
Unchanged: File exists with same content as last state
Conflict: File was modified in both locations since last sync

State Comparison Matrix

Last State	Source	Destination	Interpretation
None	Exists	None	Created at source
None	None	Exists	Created at destination
Exists	Modified	Same	Modified at source
Exists	Same	Modified	Modified at destination
Exists	Modified	Modified	Conflict (both changed)
Exists	None	Same	Deleted at source
Exists	Same	None	Deleted at destination
Exists	None	None	Deleted both sides

Comparison Modes

Comparison modes control how SyncEngine determines if two files are identical. This is crucial for deciding whether to skip a file or sync it.

Available Comparison Modes

SyncEngine provides five comparison modes, each optimized for different scenarios:

HASH_THEN_MTIME (Default)

Balanced approach that uses hash when available, falls back to mtime:

from syncengine.models import ComparisonMode, SyncConfig

config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_THEN_MTIME
)

How it works:

Compare file sizes first (fast check)
If both files have hash values, compare hashes
If hash unavailable, compare modification times
Files with matching hash are considered identical, even if mtime differs

SIZE_ONLY

Only compares file sizes, ignores hash and mtime:

config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_ONLY
)

How it works:

Files with same size are considered identical
Hash and mtime are completely ignored
When sizes differ in TWO_WAY mode → CONFLICT (cannot determine newer file)
When sizes differ in one-way mode → Uses sync direction

Use cases:

Encrypted storage where hash is unavailable or unreliable
Cloud vaults where mtime is upload time, not original file time
Scenarios where hash computation is too expensive

HASH_ONLY

Strict content verification using only hash, ignores size and mtime:

config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_ONLY
)

How it works:

Only compares content hashes
Raises error if hash is unavailable
When hashes differ in TWO_WAY mode → CONFLICT (cannot determine newer file)
When hashes differ in one-way mode → Uses sync direction

Use cases:

Content-critical applications requiring strict verification
Systems where mtime is completely unreliable
Hash is always available and trusted

MTIME_ONLY

Fast time-based sync without hash computation:

config = SyncConfig(
    comparison_mode=ComparisonMode.MTIME_ONLY
)

How it works:

Only compares modification times (±2 second tolerance)
Ignores file size and hash
When mtimes differ → Newer file wins

Use cases:

Performance-critical scenarios with reliable timestamps
Large files where hash computation is expensive
Systems with accurate clock synchronization

SIZE_AND_MTIME

Balanced approach for systems without hash support:

config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_AND_MTIME
)

How it works:

Files must match in BOTH size AND mtime (±2 second tolerance)
Hash is completely ignored
When files differ → Newer file wins (uses mtime)

Use cases:

Storage backends that don’t provide content hashes
Reliable systems with accurate timestamps
Balance between performance and accuracy

Comparison Mode Behavior Matrix

Mode	Files Match If	Files Differ Direction	Best For
HASH_THEN_MTIME	Size equal AND (hash equal OR hash unavailable)	Uses mtime to determine newer	Most scenarios (default)
SIZE_ONLY	Size equal	TWO_WAY: CONFLICT, One-way: sync direction	Encrypted vaults, unreliable mtime
HASH_ONLY	Hash equal	TWO_WAY: CONFLICT, One-way: sync direction	Content-critical, unreliable mtime
MTIME_ONLY	Mtime equal (±2s)	Uses mtime to determine newer	Performance-critical, reliable clocks
SIZE_AND_MTIME	Size equal AND mtime equal (±2s)	Uses mtime to determine newer	No hash support, reliable timestamps

Important: SIZE_ONLY and HASH_ONLY with TWO_WAY Sync

When using SIZE_ONLY or HASH_ONLY comparison modes with TWO_WAY sync mode, files that differ will result in CONFLICT because:

These modes are chosen specifically when mtime is unreliable
Without reliable mtime, the engine cannot determine which file is newer
In one-way sync modes, the sync direction determines which file wins

Example: Encrypted Vault Sync

from syncengine import SyncEngine
from syncengine.models import ComparisonMode, SyncConfig
from syncengine.modes import SyncMode

# Vault doesn't provide content hashes
# Vault mtime is upload time, not original file mtime
config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_ONLY
)

# For initial upload, use SOURCE_TO_DESTINATION
# This avoids conflicts since sync direction is clear
engine = SyncEngine(mode=SyncMode.SOURCE_TO_DESTINATION)
stats = engine.sync_pair(pair, config=config)

# Files with same size are considered identical
# No re-uploads on subsequent syncs!

Example: Strict Content Verification

config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_ONLY
)

# Requires hash on both sides
# Ignores timestamps completely
# Ensures content integrity
stats = engine.sync_pair(pair, config=config)

Example: Fast Time-Based Sync

config = SyncConfig(
    comparison_mode=ComparisonMode.MTIME_ONLY
)

# Skips hash computation for large files
# Relies on accurate timestamps
# Much faster for large datasets
stats = engine.sync_pair(pair, config=config)

Sync Actions

Based on the detected changes and the sync mode, SyncEngine determines which actions to take:

Upload Actions

UPLOAD_NEW: Upload a new file to destination
UPLOAD_UPDATE: Upload changes to an existing destination file
UPLOAD_RESTORE: Re-upload a file that was deleted at destination

Download Actions

DOWNLOAD_NEW: Download a new file from destination
DOWNLOAD_UPDATE: Download changes to an existing source file
DOWNLOAD_RESTORE: Re-download a file that was deleted at source

Delete Actions

DELETE_SOURCE: Delete a file from source
DELETE_DESTINATION: Delete a file from destination

Other Actions

NO_ACTION: File is already in sync, no action needed
CONFLICT: Manual resolution required

Rename and Move Detection

SyncEngine can detect when files are renamed or moved (not just deleted and re-added):

How It Works

Scanner creates a hash of each file’s content
When comparing states, files are matched by content hash
If a file with the same hash appears at a different path, it’s recognized as a rename/move
The rename/move is replicated to the other side instead of delete+upload

Benefits

Faster sync (no re-upload of large files)
Preserves file history/metadata
More accurate representation of changes
Reduced bandwidth usage

Example:

# Before sync:
# Source: /docs/report.pdf (hash: abc123)
# Destination: /docs/report.pdf (hash: abc123)

# User renames at source:
# Source: /docs/annual_report_2024.pdf (hash: abc123)

# After sync with rename detection:
# Source: /docs/annual_report_2024.pdf (hash: abc123)
# Destination: /docs/annual_report_2024.pdf (hash: abc123)
# Action: RENAME (not DELETE + UPLOAD)

Conflict Resolution

Conflicts occur when the same file is modified in both locations since the last sync.

Conflict Resolution Strategies

NEWEST_WINS (default)

The file with the most recent modification time wins:

from syncengine import ConflictResolution

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.NEWEST_WINS
)

SOURCE_WINS

Source file always wins conflicts:

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.SOURCE_WINS
)

DESTINATION_WINS

Destination file always wins conflicts:

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.DESTINATION_WINS
)

MANUAL

Conflicts are reported but not resolved automatically:

def handle_conflict(conflict_info):
    print(f"Conflict: {conflict_info.path}")
    print(f"Source modified: {conflict_info.source_mtime}")
    print(f"Destination modified: {conflict_info.dest_mtime}")
    # Return 'source', 'destination', or 'skip'
    return 'source'

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.MANUAL,
    conflict_handler=handle_conflict
)

Ignore Patterns

SyncEngine uses gitignore-style patterns to exclude files from sync.

Pattern Syntax

*.tmp - Ignore all .tmp files
*.log - Ignore all .log files
/build/ - Ignore build directory at root
build/ - Ignore all build directories
**/node_modules/ - Ignore node_modules anywhere
!important.log - Don’t ignore important.log (negation)
*.py[cod] - Ignore .pyc, .pyo, .pyd files
#comment - Comments (ignored)

Creating an Ignore File

Create a .syncignore file in your source root:

# Ignore compiled Python files
*.pyc
__pycache__/

# Ignore OS files
.DS_Store
Thumbs.db

# Ignore development files
.vscode/
.idea/
*.swp

# Ignore build artifacts
build/
dist/
*.egg-info/

# Ignore logs
*.log

# But keep important logs
!critical.log

Using Ignore Patterns Programmatically

from syncengine import IgnoreFileManager

ignore_manager = IgnoreFileManager()

# Add individual patterns
ignore_manager.add_pattern("*.tmp")
ignore_manager.add_pattern("*.log")

# Load from file
ignore_manager.load_from_file(".syncignore")

# Check if path should be ignored
if ignore_manager.should_ignore("test.tmp"):
    print("File is ignored")

# Use with sync pair
pair = SyncPair(
    ...,
    ignore_manager=ignore_manager
)

State Management

State management is crucial for efficient incremental syncs.

State Directory Structure

.sync_state/
├── source_tree.json       # Last known source state
├── destination_tree.json  # Last known destination state
└── sync_metadata.json     # Sync metadata

State Files

source_tree.json

Stores information about each file in the source:

{
  "path/to/file.txt": {
    "hash": "abc123...",
    "size": 1024,
    "mtime": 1609459200.0,
    "is_dir": false
  }
}

destination_tree.json

Stores information about each file in the destination:

{
  "path/to/file.txt": {
    "id": 12345,
    "hash": "abc123...",
    "size": 1024,
    "mtime": 1609459200.0,
    "is_dir": false
  }
}

State Manager API

from syncengine import SyncStateManager

# Create state manager
state_manager = SyncStateManager("/path/to/.sync_state")

# Load previous state
source_tree = state_manager.load_source_tree()
dest_tree = state_manager.load_destination_tree()

# Save new state after sync
state_manager.save_source_tree(new_source_tree)
state_manager.save_destination_tree(new_dest_tree)

# Clear state (force full resync)
state_manager.clear()

Concurrency and Performance

SyncEngine uses concurrent operations for efficiency.

Concurrency Model

SyncEngine uses two types of concurrency limits:

Transfer Limit: Maximum concurrent uploads/downloads
Operations Limit: Maximum concurrent file operations (list, delete, etc.)

from syncengine import ConcurrencyLimits

limits = ConcurrencyLimits(
    transfers=5,      # Max 5 concurrent uploads/downloads
    operations=10     # Max 10 concurrent file operations
)

Choosing Limits

Transfer Limit

Too high: May saturate bandwidth, cause timeouts
Too low: Underutilizes bandwidth, slower sync
Recommended: 3-10 depending on bandwidth and file sizes

Operations Limit

Too high: May overwhelm storage API, cause rate limiting
Too low: Slower listing/deletion of many small files
Recommended: 10-50 depending on storage API limits

Performance Tips

Use state management: Dramatically speeds up incremental syncs
Optimize concurrency: Balance based on your use case
Use ignore patterns: Skip unnecessary files
Choose appropriate sync mode: Don’t use TWO_WAY if you only need one-way
Monitor progress: Use progress callbacks to identify bottlenecks

Pause, Resume, and Cancel

Control sync execution at runtime.

Basic Usage

from syncengine import SyncPauseController
import threading

controller = SyncPauseController()
engine = SyncEngine(
    ...,
    pause_controller=controller
)

# Start sync in background
def run_sync():
    stats = engine.sync_pair(pair)
    print(f"Sync complete: {stats}")

sync_thread = threading.Thread(target=run_sync)
sync_thread.start()

# Pause sync
controller.pause()
print("Sync paused")

# Resume sync
controller.resume()
print("Sync resumed")

# Cancel sync
controller.cancel()
print("Sync cancelled")

sync_thread.join()

Pause Behavior

When paused:

Current operations complete
No new operations start
State is preserved
Can resume at any time

Cancel Behavior

When cancelled:

Current operations complete
No new operations start
State is saved (partial sync)
Cannot resume (need to start new sync)

Progress Tracking

Monitor sync progress with detailed callbacks.

Progress Events

SyncEngine emits various progress events:

scan_start: Starting to scan files
scan_progress: Scanning progress
scan_complete: Scan complete
sync_start: Starting sync operations
upload_start: Starting upload
upload_progress: Upload progress (bytes transferred)
upload_complete: Upload complete
download_start: Starting download
download_progress: Download progress (bytes transferred)
download_complete: Download complete
delete_start: Starting delete
delete_complete: Delete complete
sync_complete: All sync operations complete

Progress Callback

from syncengine import SyncProgressTracker, SyncProgressEvent

def on_progress(event: SyncProgressEvent):
    if event.type == "upload_progress":
        percent = (event.bytes_transferred / event.total_bytes) * 100
        print(f"Uploading {event.file_path}: {percent:.1f}%")

    elif event.type == "download_progress":
        percent = (event.bytes_transferred / event.total_bytes) * 100
        print(f"Downloading {event.file_path}: {percent:.1f}%")

    elif event.type == "sync_complete":
        print(f"Sync complete: {event.stats}")

tracker = SyncProgressTracker(callback=on_progress)
engine = SyncEngine(
    ...,
    progress_tracker=tracker
)

Custom Progress UI

You can build custom progress UIs using the progress callbacks:

class ProgressUI:
    def __init__(self):
        self.current_file = None
        self.total_files = 0
        self.completed_files = 0

    def on_progress(self, event: SyncProgressEvent):
        if event.type == "sync_start":
            self.total_files = event.total_files
            print(f"Starting sync of {self.total_files} files")

        elif event.type == "upload_start":
            self.current_file = event.file_path
            print(f"Uploading: {self.current_file}")

        elif event.type == "upload_complete":
            self.completed_files += 1
            print(f"Completed {self.completed_files}/{self.total_files}")

        # ... handle other events

ui = ProgressUI()
tracker = SyncProgressTracker(callback=ui.on_progress)

Next Steps

Sync Modes Reference - Detailed explanation of each sync mode
Storage Protocols - Implement custom storage backends
Examples - Advanced usage examples
API Reference - Complete API documentation