Core Concepts ============= This section covers the fundamental concepts in SyncEngine. File State and Change Detection -------------------------------- SyncEngine maintains three views of your files: 1. **Source State**: Current files in the source location 2. **Destination State**: Current files in the destination location 3. **Last Known State**: Files as they were during the last sync By comparing these three states, SyncEngine can determine what happened to each file: * **Created**: File exists now but didn't exist in last state * **Modified**: File exists now with different content than last state * **Deleted**: File existed in last state but doesn't exist now * **Renamed/Moved**: File with same content exists at different path * **Unchanged**: File exists with same content as last state * **Conflict**: File was modified in both locations since last sync State Comparison Matrix ~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 20 20 20 40 * - Last State - Source - Destination - Interpretation * - None - Exists - None - Created at source * - None - None - Exists - Created at destination * - Exists - Modified - Same - Modified at source * - Exists - Same - Modified - Modified at destination * - Exists - Modified - Modified - Conflict (both changed) * - Exists - None - Same - Deleted at source * - Exists - Same - None - Deleted at destination * - Exists - None - None - Deleted both sides Comparison Modes ---------------- Comparison modes control how SyncEngine determines if two files are identical. This is crucial for deciding whether to skip a file or sync it. Available Comparison Modes ~~~~~~~~~~~~~~~~~~~~~~~~~~ SyncEngine provides five comparison modes, each optimized for different scenarios: **HASH_THEN_MTIME** (Default) Balanced approach that uses hash when available, falls back to mtime: .. code-block:: python from syncengine.models import ComparisonMode, SyncConfig config = SyncConfig( comparison_mode=ComparisonMode.HASH_THEN_MTIME ) How it works: 1. Compare file sizes first (fast check) 2. If both files have hash values, compare hashes 3. If hash unavailable, compare modification times 4. Files with matching hash are considered identical, even if mtime differs **SIZE_ONLY** Only compares file sizes, ignores hash and mtime: .. code-block:: python config = SyncConfig( comparison_mode=ComparisonMode.SIZE_ONLY ) How it works: 1. Files with same size are considered identical 2. Hash and mtime are completely ignored 3. When sizes differ in TWO_WAY mode → CONFLICT (cannot determine newer file) 4. When sizes differ in one-way mode → Uses sync direction Use cases: * Encrypted storage where hash is unavailable or unreliable * Cloud vaults where mtime is upload time, not original file time * Scenarios where hash computation is too expensive **HASH_ONLY** Strict content verification using only hash, ignores size and mtime: .. code-block:: python config = SyncConfig( comparison_mode=ComparisonMode.HASH_ONLY ) How it works: 1. Only compares content hashes 2. Raises error if hash is unavailable 3. When hashes differ in TWO_WAY mode → CONFLICT (cannot determine newer file) 4. When hashes differ in one-way mode → Uses sync direction Use cases: * Content-critical applications requiring strict verification * Systems where mtime is completely unreliable * Hash is always available and trusted **MTIME_ONLY** Fast time-based sync without hash computation: .. code-block:: python config = SyncConfig( comparison_mode=ComparisonMode.MTIME_ONLY ) How it works: 1. Only compares modification times (±2 second tolerance) 2. Ignores file size and hash 3. When mtimes differ → Newer file wins Use cases: * Performance-critical scenarios with reliable timestamps * Large files where hash computation is expensive * Systems with accurate clock synchronization **SIZE_AND_MTIME** Balanced approach for systems without hash support: .. code-block:: python config = SyncConfig( comparison_mode=ComparisonMode.SIZE_AND_MTIME ) How it works: 1. Files must match in BOTH size AND mtime (±2 second tolerance) 2. Hash is completely ignored 3. When files differ → Newer file wins (uses mtime) Use cases: * Storage backends that don't provide content hashes * Reliable systems with accurate timestamps * Balance between performance and accuracy Comparison Mode Behavior Matrix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :header-rows: 1 :widths: 20 20 20 40 * - Mode - Files Match If - Files Differ Direction - Best For * - HASH_THEN_MTIME - Size equal AND (hash equal OR hash unavailable) - Uses mtime to determine newer - Most scenarios (default) * - SIZE_ONLY - Size equal - TWO_WAY: CONFLICT, One-way: sync direction - Encrypted vaults, unreliable mtime * - HASH_ONLY - Hash equal - TWO_WAY: CONFLICT, One-way: sync direction - Content-critical, unreliable mtime * - MTIME_ONLY - Mtime equal (±2s) - Uses mtime to determine newer - Performance-critical, reliable clocks * - SIZE_AND_MTIME - Size equal AND mtime equal (±2s) - Uses mtime to determine newer - No hash support, reliable timestamps Important: SIZE_ONLY and HASH_ONLY with TWO_WAY Sync ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When using ``SIZE_ONLY`` or ``HASH_ONLY`` comparison modes with ``TWO_WAY`` sync mode, files that differ will result in **CONFLICT** because: * These modes are chosen specifically when **mtime is unreliable** * Without reliable mtime, the engine cannot determine which file is newer * In one-way sync modes, the sync direction determines which file wins Example: Encrypted Vault Sync ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from syncengine import SyncEngine from syncengine.models import ComparisonMode, SyncConfig from syncengine.modes import SyncMode # Vault doesn't provide content hashes # Vault mtime is upload time, not original file mtime config = SyncConfig( comparison_mode=ComparisonMode.SIZE_ONLY ) # For initial upload, use SOURCE_TO_DESTINATION # This avoids conflicts since sync direction is clear engine = SyncEngine(mode=SyncMode.SOURCE_TO_DESTINATION) stats = engine.sync_pair(pair, config=config) # Files with same size are considered identical # No re-uploads on subsequent syncs! Example: Strict Content Verification ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python config = SyncConfig( comparison_mode=ComparisonMode.HASH_ONLY ) # Requires hash on both sides # Ignores timestamps completely # Ensures content integrity stats = engine.sync_pair(pair, config=config) Example: Fast Time-Based Sync ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python config = SyncConfig( comparison_mode=ComparisonMode.MTIME_ONLY ) # Skips hash computation for large files # Relies on accurate timestamps # Much faster for large datasets stats = engine.sync_pair(pair, config=config) Sync Actions ------------ Based on the detected changes and the sync mode, SyncEngine determines which actions to take: Upload Actions ~~~~~~~~~~~~~~ * ``UPLOAD_NEW``: Upload a new file to destination * ``UPLOAD_UPDATE``: Upload changes to an existing destination file * ``UPLOAD_RESTORE``: Re-upload a file that was deleted at destination Download Actions ~~~~~~~~~~~~~~~~ * ``DOWNLOAD_NEW``: Download a new file from destination * ``DOWNLOAD_UPDATE``: Download changes to an existing source file * ``DOWNLOAD_RESTORE``: Re-download a file that was deleted at source Delete Actions ~~~~~~~~~~~~~~ * ``DELETE_SOURCE``: Delete a file from source * ``DELETE_DESTINATION``: Delete a file from destination Other Actions ~~~~~~~~~~~~~ * ``NO_ACTION``: File is already in sync, no action needed * ``CONFLICT``: Manual resolution required Rename and Move Detection -------------------------- SyncEngine can detect when files are renamed or moved (not just deleted and re-added): How It Works ~~~~~~~~~~~~ 1. Scanner creates a hash of each file's content 2. When comparing states, files are matched by content hash 3. If a file with the same hash appears at a different path, it's recognized as a rename/move 4. The rename/move is replicated to the other side instead of delete+upload Benefits ~~~~~~~~ * Faster sync (no re-upload of large files) * Preserves file history/metadata * More accurate representation of changes * Reduced bandwidth usage Example: .. code-block:: python # Before sync: # Source: /docs/report.pdf (hash: abc123) # Destination: /docs/report.pdf (hash: abc123) # User renames at source: # Source: /docs/annual_report_2024.pdf (hash: abc123) # After sync with rename detection: # Source: /docs/annual_report_2024.pdf (hash: abc123) # Destination: /docs/annual_report_2024.pdf (hash: abc123) # Action: RENAME (not DELETE + UPLOAD) Conflict Resolution ------------------- Conflicts occur when the same file is modified in both locations since the last sync. Conflict Resolution Strategies ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **NEWEST_WINS** (default) The file with the most recent modification time wins: .. code-block:: python from syncengine import ConflictResolution pair = SyncPair( ..., conflict_resolution=ConflictResolution.NEWEST_WINS ) **SOURCE_WINS** Source file always wins conflicts: .. code-block:: python pair = SyncPair( ..., conflict_resolution=ConflictResolution.SOURCE_WINS ) **DESTINATION_WINS** Destination file always wins conflicts: .. code-block:: python pair = SyncPair( ..., conflict_resolution=ConflictResolution.DESTINATION_WINS ) **MANUAL** Conflicts are reported but not resolved automatically: .. code-block:: python def handle_conflict(conflict_info): print(f"Conflict: {conflict_info.path}") print(f"Source modified: {conflict_info.source_mtime}") print(f"Destination modified: {conflict_info.dest_mtime}") # Return 'source', 'destination', or 'skip' return 'source' pair = SyncPair( ..., conflict_resolution=ConflictResolution.MANUAL, conflict_handler=handle_conflict ) Ignore Patterns --------------- SyncEngine uses gitignore-style patterns to exclude files from sync. Pattern Syntax ~~~~~~~~~~~~~~ * ``*.tmp`` - Ignore all .tmp files * ``*.log`` - Ignore all .log files * ``/build/`` - Ignore build directory at root * ``build/`` - Ignore all build directories * ``**/node_modules/`` - Ignore node_modules anywhere * ``!important.log`` - Don't ignore important.log (negation) * ``*.py[cod]`` - Ignore .pyc, .pyo, .pyd files * ``#comment`` - Comments (ignored) Creating an Ignore File ~~~~~~~~~~~~~~~~~~~~~~~~ Create a ``.syncignore`` file in your source root: .. code-block:: text # Ignore compiled Python files *.pyc __pycache__/ # Ignore OS files .DS_Store Thumbs.db # Ignore development files .vscode/ .idea/ *.swp # Ignore build artifacts build/ dist/ *.egg-info/ # Ignore logs *.log # But keep important logs !critical.log Using Ignore Patterns Programmatically ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python from syncengine import IgnoreFileManager ignore_manager = IgnoreFileManager() # Add individual patterns ignore_manager.add_pattern("*.tmp") ignore_manager.add_pattern("*.log") # Load from file ignore_manager.load_from_file(".syncignore") # Check if path should be ignored if ignore_manager.should_ignore("test.tmp"): print("File is ignored") # Use with sync pair pair = SyncPair( ..., ignore_manager=ignore_manager ) State Management ---------------- State management is crucial for efficient incremental syncs. State Directory Structure ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: text .sync_state/ ├── source_tree.json # Last known source state ├── destination_tree.json # Last known destination state └── sync_metadata.json # Sync metadata State Files ~~~~~~~~~~~ **source_tree.json** Stores information about each file in the source: .. code-block:: json { "path/to/file.txt": { "hash": "abc123...", "size": 1024, "mtime": 1609459200.0, "is_dir": false } } **destination_tree.json** Stores information about each file in the destination: .. code-block:: json { "path/to/file.txt": { "id": 12345, "hash": "abc123...", "size": 1024, "mtime": 1609459200.0, "is_dir": false } } State Manager API ~~~~~~~~~~~~~~~~~ .. code-block:: python from syncengine import SyncStateManager # Create state manager state_manager = SyncStateManager("/path/to/.sync_state") # Load previous state source_tree = state_manager.load_source_tree() dest_tree = state_manager.load_destination_tree() # Save new state after sync state_manager.save_source_tree(new_source_tree) state_manager.save_destination_tree(new_dest_tree) # Clear state (force full resync) state_manager.clear() Concurrency and Performance ---------------------------- SyncEngine uses concurrent operations for efficiency. Concurrency Model ~~~~~~~~~~~~~~~~~ SyncEngine uses two types of concurrency limits: 1. **Transfer Limit**: Maximum concurrent uploads/downloads 2. **Operations Limit**: Maximum concurrent file operations (list, delete, etc.) .. code-block:: python from syncengine import ConcurrencyLimits limits = ConcurrencyLimits( transfers=5, # Max 5 concurrent uploads/downloads operations=10 # Max 10 concurrent file operations ) Choosing Limits ~~~~~~~~~~~~~~~ **Transfer Limit** * Too high: May saturate bandwidth, cause timeouts * Too low: Underutilizes bandwidth, slower sync * Recommended: 3-10 depending on bandwidth and file sizes **Operations Limit** * Too high: May overwhelm storage API, cause rate limiting * Too low: Slower listing/deletion of many small files * Recommended: 10-50 depending on storage API limits Performance Tips ~~~~~~~~~~~~~~~~ 1. **Use state management**: Dramatically speeds up incremental syncs 2. **Optimize concurrency**: Balance based on your use case 3. **Use ignore patterns**: Skip unnecessary files 4. **Choose appropriate sync mode**: Don't use TWO_WAY if you only need one-way 5. **Monitor progress**: Use progress callbacks to identify bottlenecks Pause, Resume, and Cancel -------------------------- Control sync execution at runtime. Basic Usage ~~~~~~~~~~~ .. code-block:: python from syncengine import SyncPauseController import threading controller = SyncPauseController() engine = SyncEngine( ..., pause_controller=controller ) # Start sync in background def run_sync(): stats = engine.sync_pair(pair) print(f"Sync complete: {stats}") sync_thread = threading.Thread(target=run_sync) sync_thread.start() # Pause sync controller.pause() print("Sync paused") # Resume sync controller.resume() print("Sync resumed") # Cancel sync controller.cancel() print("Sync cancelled") sync_thread.join() Pause Behavior ~~~~~~~~~~~~~~ When paused: * Current operations complete * No new operations start * State is preserved * Can resume at any time Cancel Behavior ~~~~~~~~~~~~~~~ When cancelled: * Current operations complete * No new operations start * State is saved (partial sync) * Cannot resume (need to start new sync) Progress Tracking ----------------- Monitor sync progress with detailed callbacks. Progress Events ~~~~~~~~~~~~~~~ SyncEngine emits various progress events: * ``scan_start``: Starting to scan files * ``scan_progress``: Scanning progress * ``scan_complete``: Scan complete * ``sync_start``: Starting sync operations * ``upload_start``: Starting upload * ``upload_progress``: Upload progress (bytes transferred) * ``upload_complete``: Upload complete * ``download_start``: Starting download * ``download_progress``: Download progress (bytes transferred) * ``download_complete``: Download complete * ``delete_start``: Starting delete * ``delete_complete``: Delete complete * ``sync_complete``: All sync operations complete Progress Callback ~~~~~~~~~~~~~~~~~ .. code-block:: python from syncengine import SyncProgressTracker, SyncProgressEvent def on_progress(event: SyncProgressEvent): if event.type == "upload_progress": percent = (event.bytes_transferred / event.total_bytes) * 100 print(f"Uploading {event.file_path}: {percent:.1f}%") elif event.type == "download_progress": percent = (event.bytes_transferred / event.total_bytes) * 100 print(f"Downloading {event.file_path}: {percent:.1f}%") elif event.type == "sync_complete": print(f"Sync complete: {event.stats}") tracker = SyncProgressTracker(callback=on_progress) engine = SyncEngine( ..., progress_tracker=tracker ) Custom Progress UI ~~~~~~~~~~~~~~~~~~ You can build custom progress UIs using the progress callbacks: .. code-block:: python class ProgressUI: def __init__(self): self.current_file = None self.total_files = 0 self.completed_files = 0 def on_progress(self, event: SyncProgressEvent): if event.type == "sync_start": self.total_files = event.total_files print(f"Starting sync of {self.total_files} files") elif event.type == "upload_start": self.current_file = event.file_path print(f"Uploading: {self.current_file}") elif event.type == "upload_complete": self.completed_files += 1 print(f"Completed {self.completed_files}/{self.total_files}") # ... handle other events ui = ProgressUI() tracker = SyncProgressTracker(callback=ui.on_progress) Next Steps ---------- * :doc:`sync_modes` - Detailed explanation of each sync mode * :doc:`protocols` - Implement custom storage backends * :doc:`examples` - Advanced usage examples * :doc:`api_reference` - Complete API documentation