Checkpoint file
While the crawler is running, it writes checkpoint information and statistics to:
~/.fscrawler/{job_name}/_checkpoint.json
The checkpoint file serves multiple purposes:
Progress tracking: It tracks the current state of the scan, including which directories have been processed and which are pending.
Resume capability: If FSCrawler is stopped or crashes during a scan, it will automatically resume from where it left off when restarted.
Scan history: After a successful scan, the checkpoint stores when the last scan completed (scan_end_time) and when the next scan should run (next_check).
The information in this file includes:
scan_id: unique identifier for the current scan session
state: current state (RUNNING, PAUSED, STOPPED, COMPLETED, ERROR)
scan_start_time: when the current/last scan started
scan_end_time: when the last scan completed (only for COMPLETED state)
next_check: next time the job will be checked for new files
current_path: the directory currently being processed
pending_paths: directories waiting to be processed
completed_paths: directories that have been fully processed
files_processed: total number of files indexed during the scan
files_deleted: total number of files removed during the scan
retry_count: number of retry attempts after network errors
last_error: last error message encountered (if any)
For example, a checkpoint for a completed scan:
{
"state": "COMPLETED",
"scan_end_time": "2025-07-01T12:00:00",
"next_check": "2025-07-01T12:15:00",
"files_processed": 100,
"files_deleted": 0
}
A checkpoint for a running scan:
{
"scan_id": "abc123-def456",
"state": "RUNNING",
"scan_start_time": "2025-07-01T12:00:00",
"current_path": "/data/documents/subfolder",
"pending_paths": ["/data/documents/other"],
"completed_paths": ["/data/documents", "/data/documents/processed"],
"files_processed": 50,
"files_deleted": 0,
"retry_count": 0
}
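To check on a job from a script, the checkpoint can be read like any other JSON file. The following is a minimal sketch, not part of FSCrawler: the helper names checkpoint_path and summarize_checkpoint and the job name my_job are hypothetical, and the code assumes only the fields listed above, treating each of them as optional.

```python
import json
from pathlib import Path
from typing import Optional


def checkpoint_path(job_name: str, home: Optional[Path] = None) -> Path:
    """Build ~/.fscrawler/{job_name}/_checkpoint.json (home can be overridden)."""
    base = home if home is not None else Path.home()
    return base / ".fscrawler" / job_name / "_checkpoint.json"


def summarize_checkpoint(path: Path) -> str:
    """Return a one-line summary of a checkpoint file.

    Every field is read with .get() because a checkpoint may omit
    fields that do not apply to its current state.
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    state = data.get("state", "UNKNOWN")
    processed = data.get("files_processed", 0)
    if state == "RUNNING":
        # pending_paths may be absent or null; treat both as empty
        pending = len(data.get("pending_paths") or [])
        current = data.get("current_path")
        return (f"RUNNING: {processed} files processed, "
                f"currently in {current}, {pending} pending")
    if state == "COMPLETED":
        return (f"COMPLETED at {data.get('scan_end_time')}: "
                f"{processed} files processed, "
                f"next check at {data.get('next_check')}")
    return (f"{state}: {processed} files processed, "
            f"last error: {data.get('last_error')}")


if __name__ == "__main__":
    path = checkpoint_path("my_job")  # hypothetical job name
    if path.exists():
        print(summarize_checkpoint(path))
```

The script only reads the file, so it should be safe to run while the crawler is active. A read that happens to coincide with a write could still see incomplete JSON, in which case json.loads raises an error and the read can simply be retried.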
Forcing a new scan
If you don’t want to wait for the next scheduled scan, you can manually edit the
~/.fscrawler/{job_name}/_checkpoint.json file and set next_check to the
current time or to null. FSCrawler will then start a new scan within 5 seconds.
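The manual edit can also be scripted. The sketch below is an illustration under assumptions, not an FSCrawler tool: the function name force_rescan is hypothetical, and writing to a temporary file before replacing the original is a precaution chosen here so that a partially written file is never left in place.

```python
import json
import os
import tempfile
from pathlib import Path


def force_rescan(path: Path) -> None:
    """Set next_check to null in the checkpoint file at `path`."""
    data = json.loads(path.read_text(encoding="utf-8"))
    data["next_check"] = None  # serialized as JSON null

    # Write the updated content next to the original, then swap it in,
    # so readers never observe a half-written checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(data, tmp, indent=2)
    os.replace(tmp_name, str(path))


# Example call (hypothetical job name "my_job"):
# force_rescan(Path.home() / ".fscrawler" / "my_job" / "_checkpoint.json")
```

All other fields are written back unchanged, so the scan history stored in the checkpoint is preserved.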
You can also use the REST API to check the status and force a scan by clearing the checkpoint. See REST service for more details.
Migration from previous versions
Note
In versions prior to 2.10, FSCrawler used a _status.json file to store scan information.
When upgrading to 2.10 or later, FSCrawler will automatically migrate the data from
_status.json to the new _checkpoint.json format. The old _status.json file
is removed after a successful migration.