Core Concepts

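This section covers the fundamental concepts in SyncEngine: change detection, comparison modes, sync actions, rename detection, conflict resolution, ignore patterns, state management, concurrency, runtime control, and progress tracking.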

File State and Change Detection

SyncEngine maintains three views of your files:

  1. Source State: Current files in the source location

  2. Destination State: Current files in the destination location

  3. Last Known State: Files as they were during the last sync

By comparing these three states, SyncEngine can determine what happened to each file:

  • Created: File exists now but didn’t exist in last state

  • Modified: File exists now with different content than last state

  • Deleted: File existed in last state but doesn’t exist now

  • Renamed/Moved: File with same content exists at different path

  • Unchanged: File exists with same content as last state

  • Conflict: File was modified in both locations since last sync

State Comparison Matrix

Last State   Source     Destination   Interpretation
----------   --------   -----------   -----------------------
None         Exists     None          Created at source
None         None       Exists        Created at destination
Exists       Modified   Same          Modified at source
Exists       Same       Modified      Modified at destination
Exists       Modified   Modified      Conflict (both changed)
Exists       None       Same          Deleted at source
Exists       Same       None          Deleted at destination
Exists       None       None          Deleted both sides
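
The matrix above can be read as a lookup keyed on what each side looks like relative to the last known state. The sketch below is only an illustration of that reading: the `Side` enum and the `interpret` function are hypothetical names and are not part of the SyncEngine API.

```python
from enum import Enum
from typing import Optional


class Side(Enum):
    """One side's condition relative to the last known state (hypothetical type)."""
    NONE = "None"
    EXISTS = "Exists"
    SAME = "Same"
    MODIFIED = "Modified"


# Rows for files that were present in the last known state.
_KNOWN_FILE_ROWS = {
    (Side.MODIFIED, Side.SAME): "Modified at source",
    (Side.SAME, Side.MODIFIED): "Modified at destination",
    (Side.MODIFIED, Side.MODIFIED): "Conflict (both changed)",
    (Side.NONE, Side.SAME): "Deleted at source",
    (Side.SAME, Side.NONE): "Deleted at destination",
    (Side.NONE, Side.NONE): "Deleted both sides",
}

# Rows for files that were absent from the last known state.
_NEW_FILE_ROWS = {
    (Side.EXISTS, Side.NONE): "Created at source",
    (Side.NONE, Side.EXISTS): "Created at destination",
}


def interpret(in_last_state: bool, source: Side, destination: Side) -> Optional[str]:
    """Return the matrix interpretation, or None for combinations the matrix omits."""
    rows = _KNOWN_FILE_ROWS if in_last_state else _NEW_FILE_ROWS
    return rows.get((source, destination))
```

Combinations the matrix does not list (for example, unchanged on both sides) return None here, so the caller decides how to treat them.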

Comparison Modes

Comparison modes control how SyncEngine determines if two files are identical. This is crucial for deciding whether to skip a file or sync it.

Available Comparison Modes

SyncEngine provides five comparison modes, each optimized for different scenarios:

HASH_THEN_MTIME (Default)

A balanced approach that compares hashes when both sides provide one and falls back to mtime otherwise:

from syncengine.models import ComparisonMode, SyncConfig

config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_THEN_MTIME
)

How it works:

  1. Compare file sizes first (fast check)

  2. If both files have hash values, compare hashes

  3. If hash unavailable, compare modification times

  4. Files with matching hash are considered identical, even if mtime differs
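
The steps above can be sketched as a single predicate. This is a minimal illustration, not SyncEngine's implementation: `FileInfo` and `same_hash_then_mtime` are hypothetical names, and applying the ±2 second tolerance (which this document states for the mtime-based modes) to the fallback step is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the fallback uses the same tolerance as the mtime-based modes.
MTIME_TOLERANCE_SECONDS = 2.0


@dataclass
class FileInfo:
    """Hypothetical file record; hash is None when the backend provides none."""
    size: int
    mtime: float
    hash: Optional[str] = None


def same_hash_then_mtime(a: FileInfo, b: FileInfo) -> bool:
    # Step 1: a size mismatch settles the question cheaply.
    if a.size != b.size:
        return False
    # Steps 2 and 4: with hashes on both sides, the hash decides and mtime is ignored.
    if a.hash is not None and b.hash is not None:
        return a.hash == b.hash
    # Step 3: otherwise fall back to modification times.
    return abs(a.mtime - b.mtime) <= MTIME_TOLERANCE_SECONDS
```

For instance, two records with equal size and hash compare as identical even when their mtimes are hours apart, which matches step 4.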

SIZE_ONLY

Only compares file sizes, ignores hash and mtime:

config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_ONLY
)

How it works:

  1. Files with the same size are considered identical, so an edit that leaves the size unchanged is not detected

  2. Hash and mtime are completely ignored

  3. When sizes differ in TWO_WAY mode → CONFLICT (cannot determine newer file)

  4. When sizes differ in one-way mode → Uses sync direction

Use cases:

  • Encrypted storage where hash is unavailable or unreliable

  • Cloud vaults where mtime is upload time, not original file time

  • Scenarios where hash computation is too expensive

HASH_ONLY

Strict content verification using only hash, ignores size and mtime:

config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_ONLY
)

How it works:

  1. Only compares content hashes

  2. Raises error if hash is unavailable

  3. When hashes differ in TWO_WAY mode → CONFLICT (cannot determine newer file)

  4. When hashes differ in one-way mode → Uses sync direction

Use cases:

  • Content-critical applications requiring strict verification

  • Systems where mtime is completely unreliable

  • Systems where the hash is always available and trusted

MTIME_ONLY

Fast time-based sync without hash computation:

config = SyncConfig(
    comparison_mode=ComparisonMode.MTIME_ONLY
)

How it works:

  1. Only compares modification times (±2 second tolerance)

  2. Ignores file size and hash

  3. When mtimes differ → Newer file wins

Use cases:

  • Performance-critical scenarios with reliable timestamps

  • Large files where hash computation is expensive

  • Systems with accurate clock synchronization

SIZE_AND_MTIME

Balanced approach for systems without hash support:

config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_AND_MTIME
)

How it works:

  1. Files must match in BOTH size AND mtime (±2 second tolerance)

  2. Hash is completely ignored

  3. When files differ → Newer file wins (uses mtime)

Use cases:

  • Storage backends that don’t provide content hashes

  • Reliable systems with accurate timestamps

  • Balance between performance and accuracy

Comparison Mode Behavior Matrix

Mode             Files Match If                                   When Files Differ                           Best For
---------------  -----------------------------------------------  ------------------------------------------  -------------------------------------
HASH_THEN_MTIME  Size equal AND (hash equal OR hash unavailable)  Uses mtime to determine newer               Most scenarios (default)
SIZE_ONLY        Size equal                                       TWO_WAY: CONFLICT, One-way: sync direction  Encrypted vaults, unreliable mtime
HASH_ONLY        Hash equal                                       TWO_WAY: CONFLICT, One-way: sync direction  Content-critical, unreliable mtime
MTIME_ONLY       Mtime equal (±2s)                                Uses mtime to determine newer               Performance-critical, reliable clocks
SIZE_AND_MTIME   Size equal AND mtime equal (±2s)                 Uses mtime to determine newer               No hash support, reliable timestamps
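
As a rough illustration of the "match" conditions for the four non-default modes, the sketch below expresses each one as a small check. `FileInfo` and `files_match` are hypothetical names, the modes are passed as plain strings rather than `ComparisonMode` members, and raising `ValueError` for a missing hash is an assumption (the document only says an error is raised).

```python
from dataclasses import dataclass
from typing import Optional

MTIME_TOLERANCE_SECONDS = 2.0  # the ±2 second tolerance described above


@dataclass
class FileInfo:
    """Hypothetical file record; hash is None when the backend provides none."""
    size: int
    mtime: float
    hash: Optional[str] = None


def _mtime_close(a: FileInfo, b: FileInfo) -> bool:
    return abs(a.mtime - b.mtime) <= MTIME_TOLERANCE_SECONDS


def files_match(mode: str, a: FileInfo, b: FileInfo) -> bool:
    """Return True when the two records count as identical under the given mode."""
    if mode == "SIZE_ONLY":
        return a.size == b.size
    if mode == "HASH_ONLY":
        if a.hash is None or b.hash is None:
            # Assumption: the concrete exception type is not given in this document.
            raise ValueError("HASH_ONLY requires a hash on both sides")
        return a.hash == b.hash
    if mode == "MTIME_ONLY":
        return _mtime_close(a, b)
    if mode == "SIZE_AND_MTIME":
        return a.size == b.size and _mtime_close(a, b)
    raise ValueError(f"unsupported mode: {mode}")
```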

Important: SIZE_ONLY and HASH_ONLY with TWO_WAY Sync

When SIZE_ONLY or HASH_ONLY is combined with TWO_WAY sync mode, files that differ are reported as CONFLICT:

  • These modes are typically chosen because mtime is unreliable

  • Without a reliable mtime, the engine cannot determine which file is newer

In one-way sync modes this does not arise, because the sync direction determines which file wins.
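
The rule can be summarized in a few lines. The function below is a hedged sketch of the behavior described in this section, not library code: `outcome_when_different` and its return labels are hypothetical, and treating equal mtimes as a conflict in the mtime-based modes is an assumption.

```python
# Modes that deliberately ignore mtime, so "newer" cannot be determined.
MTIME_FREE_MODES = {"SIZE_ONLY", "HASH_ONLY"}


def outcome_when_different(
    comparison_mode: str, two_way: bool, source_mtime: float, dest_mtime: float
) -> str:
    """Decide what happens to a pair of files that the comparison found different."""
    if comparison_mode in MTIME_FREE_MODES:
        # Two-way sync has no tie-breaker; one-way sync follows its direction.
        return "CONFLICT" if two_way else "SYNC_DIRECTION"
    if source_mtime == dest_mtime:
        # Assumption: with no newer side, report a conflict rather than guess.
        return "CONFLICT"
    return "SOURCE_NEWER" if source_mtime > dest_mtime else "DESTINATION_NEWER"
```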

Example: Encrypted Vault Sync

from syncengine import SyncEngine
from syncengine.models import ComparisonMode, SyncConfig
from syncengine.modes import SyncMode

# Vault doesn't provide content hashes
# Vault mtime is upload time, not original file mtime
config = SyncConfig(
    comparison_mode=ComparisonMode.SIZE_ONLY
)

# For initial upload, use SOURCE_TO_DESTINATION
# This avoids conflicts since sync direction is clear
engine = SyncEngine(mode=SyncMode.SOURCE_TO_DESTINATION)
stats = engine.sync_pair(pair, config=config)

# Files with same size are considered identical
# No re-uploads on subsequent syncs!

Example: Strict Content Verification

config = SyncConfig(
    comparison_mode=ComparisonMode.HASH_ONLY
)

# Requires hash on both sides
# Ignores timestamps completely
# Ensures content integrity
stats = engine.sync_pair(pair, config=config)

Example: Fast Time-Based Sync

config = SyncConfig(
    comparison_mode=ComparisonMode.MTIME_ONLY
)

# Skips hash computation for large files
# Relies on accurate timestamps
# Much faster for large datasets
stats = engine.sync_pair(pair, config=config)

Sync Actions

Based on the detected changes and the sync mode, SyncEngine determines which actions to take:

Upload Actions

  • UPLOAD_NEW: Upload a new file to destination

  • UPLOAD_UPDATE: Upload changes to an existing destination file

  • UPLOAD_RESTORE: Re-upload a file that was deleted at destination

Download Actions

  • DOWNLOAD_NEW: Download a new file from destination

  • DOWNLOAD_UPDATE: Download changes to an existing source file

  • DOWNLOAD_RESTORE: Re-download a file that was deleted at source

Delete Actions

  • DELETE_SOURCE: Delete a file from source

  • DELETE_DESTINATION: Delete a file from destination

Other Actions

  • NO_ACTION: File is already in sync, no action needed

  • CONFLICT: Manual resolution required
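
To make the link between detected changes and actions concrete, here is one plausible mapping for a two-way sync. It is an assumption for illustration only: this document does not spell out the per-mode mapping, and `TWO_WAY_ACTIONS` and `action_for` are hypothetical names.

```python
# Hypothetical mapping from a detected change to an action in a two-way sync.
TWO_WAY_ACTIONS = {
    "Created at source": "UPLOAD_NEW",
    "Created at destination": "DOWNLOAD_NEW",
    "Modified at source": "UPLOAD_UPDATE",
    "Modified at destination": "DOWNLOAD_UPDATE",
    "Conflict (both changed)": "CONFLICT",
    "Deleted at source": "DELETE_DESTINATION",
    "Deleted at destination": "DELETE_SOURCE",
    "Deleted both sides": "NO_ACTION",
}


def action_for(interpretation: str, source_is_authoritative: bool = False) -> str:
    """Pick an action; unknown interpretations fall back to NO_ACTION."""
    # Assumption: when the source is authoritative (a one-way upload),
    # a file deleted at the destination is restored instead of being
    # deleted at the source.
    if source_is_authoritative and interpretation == "Deleted at destination":
        return "UPLOAD_RESTORE"
    return TWO_WAY_ACTIONS.get(interpretation, "NO_ACTION")
```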

Rename and Move Detection

SyncEngine can detect when files are renamed or moved (not just deleted and re-added):

How It Works

  1. The scanner computes a hash of each file’s content

  2. When comparing states, files are matched by content hash

  3. If a file with the same hash appears at a different path, it’s recognized as a rename/move

  4. The rename/move is replicated to the other side instead of delete+upload
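
A minimal sketch of the matching step, assuming each state is available as a plain mapping from path to content hash. `detect_renames` is a hypothetical helper, not a SyncEngine function, and it pairs paths in sorted order when several files share a hash.

```python
from typing import Dict, List, Tuple


def detect_renames(last: Dict[str, str], current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair paths that vanished with new paths carrying the same content hash."""
    removed = {path: h for path, h in last.items() if path not in current}
    added = {path: h for path, h in current.items() if path not in last}

    # Index the vanished paths by hash so each added path can look up a match.
    removed_by_hash: Dict[str, List[str]] = {}
    for path, h in sorted(removed.items()):
        removed_by_hash.setdefault(h, []).append(path)

    renames: List[Tuple[str, str]] = []
    for new_path, h in sorted(added.items()):
        candidates = removed_by_hash.get(h)
        if candidates:
            # Consume the match so one old path is never paired twice.
            renames.append((candidates.pop(0), new_path))
    return renames
```

Paths that were removed or added without a hash match are left out of the result and would be handled as ordinary deletions and creations.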

Benefits

  • Faster sync (no re-upload of large files)

  • Preserves file history/metadata

  • More accurate representation of changes

  • Reduced bandwidth usage

Example:

# Before sync:
# Source: /docs/report.pdf (hash: abc123)
# Destination: /docs/report.pdf (hash: abc123)

# User renames at source:
# Source: /docs/annual_report_2024.pdf (hash: abc123)

# After sync with rename detection:
# Source: /docs/annual_report_2024.pdf (hash: abc123)
# Destination: /docs/annual_report_2024.pdf (hash: abc123)
# Action: RENAME (not DELETE + UPLOAD)

Conflict Resolution

Conflicts occur when the same file is modified in both locations since the last sync.

Conflict Resolution Strategies

NEWEST_WINS (default)

The file with the most recent modification time wins:

from syncengine import ConflictResolution

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.NEWEST_WINS
)

SOURCE_WINS

Source file always wins conflicts:

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.SOURCE_WINS
)

DESTINATION_WINS

Destination file always wins conflicts:

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.DESTINATION_WINS
)

MANUAL

Conflicts are reported but not resolved automatically:

def handle_conflict(conflict_info):
    print(f"Conflict: {conflict_info.path}")
    print(f"Source modified: {conflict_info.source_mtime}")
    print(f"Destination modified: {conflict_info.dest_mtime}")
    # Return 'source', 'destination', or 'skip'
    return 'source'

pair = SyncPair(
    ...,
    conflict_resolution=ConflictResolution.MANUAL,
    conflict_handler=handle_conflict
)
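
The four strategies can be pictured as one small decision function. The sketch below is illustrative only: `Strategy` and `resolve` are hypothetical stand-ins rather than the library's `ConflictResolution` machinery, the handler here receives the two mtimes instead of a conflict object, and breaking an exact mtime tie in favor of the source is an assumption.

```python
from enum import Enum
from typing import Callable, Optional


class Strategy(Enum):
    """Hypothetical mirror of the strategies described above."""
    NEWEST_WINS = "newest_wins"
    SOURCE_WINS = "source_wins"
    DESTINATION_WINS = "destination_wins"
    MANUAL = "manual"


def resolve(
    strategy: Strategy,
    source_mtime: float,
    dest_mtime: float,
    handler: Optional[Callable[[float, float], str]] = None,
) -> str:
    """Return 'source', 'destination', or 'skip' for one conflicting file."""
    if strategy is Strategy.SOURCE_WINS:
        return "source"
    if strategy is Strategy.DESTINATION_WINS:
        return "destination"
    if strategy is Strategy.NEWEST_WINS:
        # Assumption: an exact tie goes to the source.
        return "destination" if dest_mtime > source_mtime else "source"
    # MANUAL: defer to the caller's handler; without one, leave the file alone.
    if handler is None:
        return "skip"
    return handler(source_mtime, dest_mtime)
```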

Ignore Patterns

SyncEngine uses gitignore-style patterns to exclude files from sync.

Pattern Syntax

  • *.tmp - Ignore all .tmp files

  • *.log - Ignore all .log files

  • /build/ - Ignore build directory at root

  • build/ - Ignore all build directories

  • **/node_modules/ - Ignore node_modules anywhere

  • !important.log - Don’t ignore important.log (negation)

  • *.py[cod] - Ignore .pyc, .pyo, .pyd files

  • #comment - Comments (ignored)
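
The sketch below illustrates how the simpler patterns interact, in particular that a later negation can re-include a file. It is a deliberately reduced model built on the standard-library `fnmatch` module: it matches file names only and skips any pattern containing a slash, so it does not cover directory or anchored patterns, and `should_ignore` is a hypothetical function rather than the `IgnoreFileManager` method.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Iterable


def should_ignore(path: str, patterns: Iterable[str]) -> bool:
    """Apply name-only patterns in order; the last matching pattern wins."""
    name = PurePosixPath(path).name
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments carry no rule
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if "/" in pattern:
            continue  # directory and anchored patterns are out of scope here
        if fnmatch.fnmatchcase(name, pattern):
            ignored = not negated
    return ignored


PATTERNS = ["# logs", "*.log", "!important.log", "*.py[cod]"]
```

With these patterns, `app/debug.log` is ignored while `app/important.log` is kept, because the negation appears after the broader `*.log` rule.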

Creating an Ignore File

Create a .syncignore file in your source root:

# Ignore compiled Python files
*.pyc
__pycache__/

# Ignore OS files
.DS_Store
Thumbs.db

# Ignore development files
.vscode/
.idea/
*.swp

# Ignore build artifacts
build/
dist/
*.egg-info/

# Ignore logs
*.log

# But keep important logs
!critical.log

Using Ignore Patterns Programmatically

from syncengine import IgnoreFileManager

ignore_manager = IgnoreFileManager()

# Add individual patterns
ignore_manager.add_pattern("*.tmp")
ignore_manager.add_pattern("*.log")

# Load from file
ignore_manager.load_from_file(".syncignore")

# Check if path should be ignored
if ignore_manager.should_ignore("test.tmp"):
    print("File is ignored")

# Use with sync pair
pair = SyncPair(
    ...,
    ignore_manager=ignore_manager
)

State Management

State management is crucial for efficient incremental syncs.

State Directory Structure

.sync_state/
├── source_tree.json       # Last known source state
├── destination_tree.json  # Last known destination state
└── sync_metadata.json     # Sync metadata

State Files

source_tree.json

Stores information about each file in the source:

{
  "path/to/file.txt": {
    "hash": "abc123...",
    "size": 1024,
    "mtime": 1609459200.0,
    "is_dir": false
  }
}

destination_tree.json

Stores information about each file in the destination:

{
  "path/to/file.txt": {
    "id": 12345,
    "hash": "abc123...",
    "size": 1024,
    "mtime": 1609459200.0,
    "is_dir": false
  }
}
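
To show why storing this state helps incremental syncs, the following sketch compares a saved tree in the format above with a fresh scan and reports only what changed. It uses the standard `json` module on an inline string rather than a real state file, and `diff_against_last_state` is a hypothetical helper, not part of `SyncStateManager`.

```python
import json
from typing import Dict, List

# Inline stand-in for the contents of a saved tree file.
LAST_STATE_JSON = """
{
  "docs/a.txt": {"hash": "aaa", "size": 10, "mtime": 1609459200.0, "is_dir": false},
  "docs/b.txt": {"hash": "bbb", "size": 20, "mtime": 1609459200.0, "is_dir": false},
  "docs/d.txt": {"hash": "ddd", "size": 40, "mtime": 1609459200.0, "is_dir": false}
}
"""


def diff_against_last_state(last: Dict[str, dict], current: Dict[str, dict]) -> Dict[str, List[str]]:
    """Classify paths as created, modified, or deleted relative to the saved tree."""
    return {
        "created": sorted(p for p in current if p not in last),
        "modified": sorted(
            p for p in current if p in last and current[p]["hash"] != last[p]["hash"]
        ),
        "deleted": sorted(p for p in last if p not in current),
    }


last_tree = json.loads(LAST_STATE_JSON)
current_tree = {
    "docs/a.txt": {"hash": "aaa", "size": 10, "mtime": 1609459200.0, "is_dir": False},
    "docs/b.txt": {"hash": "bb2", "size": 21, "mtime": 1609462800.0, "is_dir": False},
    "docs/c.txt": {"hash": "ccc", "size": 30, "mtime": 1609462800.0, "is_dir": False},
}
changes = diff_against_last_state(last_tree, current_tree)
```

Only the three changed paths need further work; the unchanged `docs/a.txt` is skipped entirely.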

State Manager API

from syncengine import SyncStateManager

# Create state manager
state_manager = SyncStateManager("/path/to/.sync_state")

# Load previous state
source_tree = state_manager.load_source_tree()
dest_tree = state_manager.load_destination_tree()

# Save new state after sync
state_manager.save_source_tree(new_source_tree)
state_manager.save_destination_tree(new_dest_tree)

# Clear state (force full resync)
state_manager.clear()

Concurrency and Performance

SyncEngine uses concurrent operations for efficiency.

Concurrency Model

SyncEngine uses two types of concurrency limits:

  1. Transfer Limit: Maximum concurrent uploads/downloads

  2. Operations Limit: Maximum concurrent file operations (list, delete, etc.)

from syncengine import ConcurrencyLimits

limits = ConcurrencyLimits(
    transfers=5,      # Max 5 concurrent uploads/downloads
    operations=10     # Max 10 concurrent file operations
)

Choosing Limits

Transfer Limit

  • Too high: May saturate bandwidth, cause timeouts

  • Too low: Underutilizes bandwidth, slower sync

  • Recommended: 3-10 depending on bandwidth and file sizes

Operations Limit

  • Too high: May overwhelm storage API, cause rate limiting

  • Too low: Slower listing/deletion of many small files

  • Recommended: 10-50 depending on storage API limits
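
One common way to enforce two independent limits is a semaphore per category. The sketch below shows that technique with the standard library; it is not a description of how SyncEngine implements `ConcurrencyLimits`, and `LimitedRunner` and `demo` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class LimitedRunner:
    """Run tasks under separate caps for transfers and other operations."""

    def __init__(self, transfers: int, operations: int) -> None:
        self._semaphores = {
            "transfer": threading.BoundedSemaphore(transfers),
            "operation": threading.BoundedSemaphore(operations),
        }
        self._lock = threading.Lock()
        self._active: Dict[str, int] = {"transfer": 0, "operation": 0}
        self.peak: Dict[str, int] = {"transfer": 0, "operation": 0}

    def run(self, kind: str, task: Callable[[], object]) -> object:
        # The semaphore blocks once the cap for this kind is reached.
        with self._semaphores[kind]:
            with self._lock:
                self._active[kind] += 1
                self.peak[kind] = max(self.peak[kind], self._active[kind])
            try:
                return task()
            finally:
                with self._lock:
                    self._active[kind] -= 1


def demo() -> Dict[str, int]:
    """Submit more tasks than the caps allow and report the observed peaks."""
    runner = LimitedRunner(transfers=2, operations=4)
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(runner.run, "transfer", lambda: time.sleep(0.01)) for _ in range(6)]
        futures += [pool.submit(runner.run, "operation", lambda: time.sleep(0.01)) for _ in range(6)]
        for future in futures:
            future.result()
    return runner.peak
```

The recorded peaks never exceed the configured caps, even though the thread pool itself has more workers available.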

Performance Tips

  1. Use state management: Dramatically speeds up incremental syncs

  2. Optimize concurrency: Balance based on your use case

  3. Use ignore patterns: Skip unnecessary files

  4. Choose appropriate sync mode: Don’t use TWO_WAY if you only need one-way

  5. Monitor progress: Use progress callbacks to identify bottlenecks

Pause, Resume, and Cancel

Control sync execution at runtime.

Basic Usage

from syncengine import SyncEngine, SyncPauseController
import threading

controller = SyncPauseController()
engine = SyncEngine(
    ...,
    pause_controller=controller
)

# Start sync in background
def run_sync():
    stats = engine.sync_pair(pair)
    print(f"Sync complete: {stats}")

sync_thread = threading.Thread(target=run_sync)
sync_thread.start()

# Pause sync
controller.pause()
print("Sync paused")

# Resume sync
controller.resume()
print("Sync resumed")

# Cancel sync
controller.cancel()
print("Sync cancelled")

sync_thread.join()

Pause Behavior

When paused:

  • Current operations complete

  • No new operations start

  • State is preserved

  • Can resume at any time

Cancel Behavior

When cancelled:

  • Current operations complete

  • No new operations start

  • State is saved (partial sync)

  • Cannot resume (need to start new sync)
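
The behavior described above (in-flight work finishes, nothing new starts) is what a checkpoint between operations gives you. The sketch below shows that pattern with `threading.Event`; `PauseController` and `run_operations` are hypothetical names and this is not the implementation of `SyncPauseController`.

```python
import threading
from typing import Callable, List


class PauseController:
    """Cooperative pause/resume/cancel checked between operations."""

    def __init__(self) -> None:
        self._running = threading.Event()
        self._running.set()  # start in the running state
        self._cancelled = threading.Event()

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def cancel(self) -> None:
        self._cancelled.set()
        self._running.set()  # wake a paused worker so it can exit

    def checkpoint(self) -> bool:
        """Block while paused; return False once the sync has been cancelled."""
        self._running.wait()
        return not self._cancelled.is_set()


def run_operations(operations: List[Callable[[], object]], controller: PauseController) -> List[object]:
    """Run operations in order, consulting the controller before each one."""
    results = []
    for operation in operations:
        if not controller.checkpoint():
            break  # cancelled: start nothing new
        results.append(operation())  # an operation already started always finishes
    return results
```

Because the check happens only between operations, a cancel issued during an operation lets that operation complete and then stops the loop.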

Progress Tracking

Monitor sync progress with detailed callbacks.

Progress Events

SyncEngine emits various progress events:

  • scan_start: Starting to scan files

  • scan_progress: Scanning progress

  • scan_complete: Scan complete

  • sync_start: Starting sync operations

  • upload_start: Starting upload

  • upload_progress: Upload progress (bytes transferred)

  • upload_complete: Upload complete

  • download_start: Starting download

  • download_progress: Download progress (bytes transferred)

  • download_complete: Download complete

  • delete_start: Starting delete

  • delete_complete: Delete complete

  • sync_complete: All sync operations complete

Progress Callback

from syncengine import SyncEngine, SyncProgressTracker, SyncProgressEvent

def on_progress(event: SyncProgressEvent):
    if event.type in ("upload_progress", "download_progress"):
        # Guard against zero-byte files, which would otherwise divide by zero
        total = event.total_bytes
        percent = (event.bytes_transferred / total) * 100 if total else 100.0
        verb = "Uploading" if event.type == "upload_progress" else "Downloading"
        print(f"{verb} {event.file_path}: {percent:.1f}%")

    elif event.type == "sync_complete":
        print(f"Sync complete: {event.stats}")

tracker = SyncProgressTracker(callback=on_progress)
engine = SyncEngine(
    ...,
    progress_tracker=tracker
)

Custom Progress UI

You can build custom progress UIs using the progress callbacks:

class ProgressUI:
    def __init__(self):
        self.current_file = None
        self.total_files = 0
        self.completed_files = 0

    def on_progress(self, event: SyncProgressEvent):
        if event.type == "sync_start":
            self.total_files = event.total_files
            print(f"Starting sync of {self.total_files} files")

        elif event.type == "upload_start":
            self.current_file = event.file_path
            print(f"Uploading: {self.current_file}")

        elif event.type == "upload_complete":
            self.completed_files += 1
            print(f"Completed {self.completed_files}/{self.total_files}")

        # ... handle other events

ui = ProgressUI()
tracker = SyncProgressTracker(callback=ui.on_progress)

Next Steps