Checkpoint file

Once the crawler is running, it writes checkpoint information and statistics to:

~/.fscrawler/{job_name}/_checkpoint.json

The checkpoint file serves multiple purposes:

Progress tracking: It tracks the current state of the scan, including which directories have been processed and which are pending.
Resume capability: If FSCrawler is stopped or crashes during a scan, it will automatically resume from where it left off when restarted.
Scan history: After a successful scan, the checkpoint stores when the last scan completed (scan_end_time) and when the next scan should run (next_check).

The information in this file includes:

scan_id: unique identifier for the current scan session
state: current state (RUNNING, PAUSED, STOPPED, COMPLETED, ERROR)
scan_start_time: when the current (or last) scan started
scan_end_time: when the last scan completed (only present in the COMPLETED state)
next_check: next time the job will be checked for new files
current_path: the directory currently being processed
pending_paths: directories waiting to be processed
completed_paths: directories that have been fully processed
files_processed: total number of files indexed during the scan
files_deleted: total number of files removed during the scan
retry_count: number of retry attempts after network errors
last_error: last error message encountered (if any)
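The fields above can be read with any JSON parser. The following minimal Python sketch is not part of FSCrawler. It loads a job's checkpoint and reduces it to a progress summary. The helper names (checkpoint_path, load_checkpoint, summarize_checkpoint) and the config_dir parameter are hypothetical. The sketch assumes the default ~/.fscrawler location and treats fields missing in a given state (for example pending_paths in a COMPLETED checkpoint) as empty.

```python
import json
from pathlib import Path
from typing import Optional


def checkpoint_path(job_name: str, config_dir: Optional[Path] = None) -> Path:
    """Return the checkpoint location for a job (assumes the default ~/.fscrawler)."""
    base = config_dir if config_dir is not None else Path.home() / ".fscrawler"
    return base / job_name / "_checkpoint.json"


def load_checkpoint(job_name: str, config_dir: Optional[Path] = None) -> dict:
    """Parse the job's _checkpoint.json into a dict."""
    with checkpoint_path(job_name, config_dir).open(encoding="utf-8") as f:
        return json.load(f)


def summarize_checkpoint(checkpoint: dict) -> dict:
    """Reduce a checkpoint to a progress summary.

    Fields absent in a given state (e.g. pending_paths once COMPLETED)
    default to None, zero, or an empty list.
    """
    return {
        "state": checkpoint.get("state"),
        "current_path": checkpoint.get("current_path"),
        "files_processed": checkpoint.get("files_processed", 0),
        "files_deleted": checkpoint.get("files_deleted", 0),
        "dirs_completed": len(checkpoint.get("completed_paths") or []),
        "dirs_pending": len(checkpoint.get("pending_paths") or []),
        "last_error": checkpoint.get("last_error"),
    }
```

For a job named my_job (a placeholder), summarize_checkpoint(load_checkpoint("my_job")) returns the summary. For a running checkpoint such as the one in the second example in this section, it would report 2 completed directories and 1 pending directory.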

For example, a checkpoint for a completed scan:

{
  "files_deleted": 0,
  "files_processed": 100,
  "next_check": "2025-07-01T12:15:00",
  "scan_end_time": "2025-07-01T12:00:00",
  "state": "COMPLETED"
}

A checkpoint for a running scan:

{
  "completed_paths": ["/data/documents", "/data/documents/processed"],
  "current_path": "/data/documents/subfolder",
  "files_deleted": 0,
  "files_processed": 50,
  "pending_paths": ["/data/documents/other"],
  "retry_count": 0,
  "scan_id": "abc123-def456",
  "scan_start_time": "2025-07-01T12:00:00",
  "state": "RUNNING"
}

Forcing a new scan

If you don’t want to wait for the next scheduled scan, you can manually edit the ~/.fscrawler/{job_name}/_checkpoint.json file and set next_check to the current time or to null. FSCrawler will then start a new scan within 5 seconds.
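The manual edit described above can be scripted. The Python sketch below is an illustration, not an FSCrawler tool. force_new_scan sets next_check to null, and scan_is_due encodes one reading of this section: a scan is due when next_check is null or not in the future. Both function names are hypothetical. The sketch assumes timestamps use the offset-free ISO-8601 form shown in the examples above. To avoid leaving a partially written checkpoint, the file is written to a temporary file in the same directory and then swapped in with os.replace.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path


def scan_is_due(checkpoint: dict, now: datetime) -> bool:
    """True when next_check is null/missing or is not later than `now`.

    Assumes timestamps like "2025-07-01T12:15:00" (no UTC offset).
    """
    next_check = checkpoint.get("next_check")
    if next_check is None:
        return True
    return datetime.fromisoformat(next_check) <= now


def force_new_scan(checkpoint_file: Path) -> None:
    """Set next_check to null so that a new scan is triggered."""
    checkpoint = json.loads(checkpoint_file.read_text(encoding="utf-8"))
    checkpoint["next_check"] = None  # serialized as JSON null
    # Write to a temp file in the same directory, then replace the original.
    fd, tmp_name = tempfile.mkstemp(dir=checkpoint_file.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(checkpoint, f, indent=2)
    os.replace(tmp_name, checkpoint_file)
```

For example, force_new_scan(Path.home() / ".fscrawler" / "my_job" / "_checkpoint.json"), where my_job is a placeholder job name.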

You can also use the REST API to check the status and force a scan by clearing the checkpoint. See the REST service documentation for more details.

Migration from previous versions

Note

In versions prior to 3.0, FSCrawler used a _status.json file to store scan information. When upgrading to 3.0 or later, FSCrawler will automatically migrate the data from _status.json to the new _checkpoint.json format. The old _status.json file will be removed after successful migration.