Testing FSCrawler CLI

A convenience shell script spins up the full EDOT (Elastic Distributions of OpenTelemetry) observability stack and runs FSCrawler against the built-in test documents in a single command.

Prerequisites

  • Java 17+

  • Maven 3.3+

  • Docker and either docker compose (v2) or docker-compose (v1)

  • curl, unzip
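Before the first run, it can help to confirm that the commands listed above are on your PATH. The snippet below is a minimal sketch, not part of the provided script: the helper name missing_tools is hypothetical, and it checks only that each command exists, not that Java is 17+ or Maven is 3.3+.

```shell
# Hypothetical helper: print each named tool that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the commands named in the prerequisites list (versions are not verified).
missing=$(missing_tools java mvn docker curl unzip)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisite commands found"
fi
```

Version requirements still need a manual look, for example with java -version and mvn -version.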

Quick start

From the project root:

# Full run: build → start docker stack → crawl with OTel tracing
./distribution/test-scripts/test-fscrawler-cli.sh

Note

distribution/test-scripts/test-fscrawler-cli.sh is a generated file: Maven resource filtering injects the project version into it at build time. The source template is distribution/src/test/scripts/test-fscrawler-cli.sh. After a version bump, regenerate the script with:

mvn generate-test-resources -pl distribution

The script will:

  1. Build the distribution ZIP (mvn clean package -DskipTests -Ddocker.skip)

  2. Start Elasticsearch + Kibana + EDOT Collector via docker-compose

  3. Unzip the distribution into /tmp/fscrawler-edot-test/

  4. Create a job config pointing to test-documents/src/main/resources/documents/

  5. Set OTEL_* environment variables and launch FSCrawler with the REST API enabled (--rest)

FSCrawler runs until you press Ctrl+C. While it is running, you can inspect the results at:

  • Elasticsearch: http://localhost:9200/test-edot/_search

  • Kibana APM: http://localhost:5601 (Observability → APM → service fscrawler)
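To check from a terminal that documents are being indexed, you can query the same index with curl. This is a sketch under assumptions: it presumes Elasticsearch answers at the URL above without authentication (add -u user:password to the curl call if your stack has security enabled), and the helper names count_url, search_url and show_count are hypothetical.

```shell
# Assumptions: Elasticsearch is reachable at ES_URL without authentication,
# and the index is named test-edot as in the URL above.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_INDEX="${ES_INDEX:-test-edot}"

# Build the _count and _search URLs for the test index.
count_url()  { printf '%s/%s/_count' "$ES_URL" "$ES_INDEX"; }
search_url() { printf '%s/%s/_search?size=%s&pretty' "$ES_URL" "$ES_INDEX" "${1:-1}"; }

# Print the document count, or a hint on stderr if the stack is not up.
show_count() {
  curl -fsS --max-time 5 "$(count_url)" \
    || echo "Elasticsearch is not reachable at $ES_URL" >&2
}
```

Calling show_count while the crawl is running should return a small JSON document with a count field; curl -s "$(search_url 3)" fetches the first three hits.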

Options

Flag                   Effect
--skip-build           Reuse the existing distribution ZIP (skip Maven build)
--skip-docker          Assume the docker-compose stack is already running
--no-otel              Disable OTel tracing (OTEL_SDK_DISABLED=true); crawl without EDOT Collector
--log-level=<level>    Set the FSCrawler log level (default: info; try debug or trace)
--help                 Print usage
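For readers who want to see how flags like these are typically handled, here is a minimal sketch of option parsing with a case statement. It is not the script's actual source: the function parse_flags and the variables SKIP_BUILD, SKIP_DOCKER and LOG_LEVEL are hypothetical names. Only the flag names, the info default and OTEL_SDK_DISABLED=true come from the list above.

```shell
# Hypothetical sketch of flag parsing; not taken from test-fscrawler-cli.sh.
SKIP_BUILD=false
SKIP_DOCKER=false
LOG_LEVEL=info

parse_flags() {
  for arg in "$@"; do
    case "$arg" in
      --skip-build)  SKIP_BUILD=true ;;
      --skip-docker) SKIP_DOCKER=true ;;
      # Standard OpenTelemetry switch that turns the SDK off.
      --no-otel)     export OTEL_SDK_DISABLED=true ;;
      # Keep everything after the "=" as the level.
      --log-level=*) LOG_LEVEL="${arg#--log-level=}" ;;
      --help)
        echo "Usage: [--skip-build] [--skip-docker] [--no-otel] [--log-level=<level>]"
        return 0 ;;
      *)
        echo "Unknown option: $arg" >&2
        return 2 ;;
    esac
  done
}
```

With this shape, parse_flags --skip-build --log-level=debug leaves SKIP_BUILD=true and LOG_LEVEL=debug, and an unrecognized flag returns a non-zero status instead of being silently ignored.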

Examples:

# Iterate quickly: keep docker stack running, only rebuild + recrawl
./distribution/test-scripts/test-fscrawler-cli.sh --skip-docker

# Rebuild and crawl, but disable tracing (baseline comparison)
./distribution/test-scripts/test-fscrawler-cli.sh --no-otel

# Fastest iteration: nothing to (re)build, stack already up
./distribution/test-scripts/test-fscrawler-cli.sh --skip-build --skip-docker

# Debug log level with stack already running
./distribution/test-scripts/test-fscrawler-cli.sh --skip-build --skip-docker --log-level=debug

What to look for in Kibana APM

After the crawl, open Kibana at http://localhost:5601. Navigate to Observability → APM → Services → fscrawler.

You should see traces containing the following spans in a waterfall view:

fscrawler.crawl                      ← one span per run
└─ fscrawler.directory.traverse
   └─ fscrawler.directory.process    ← one per directory
      └─ fscrawler.file.index        ← one per file
         └─ fscrawler.tika.extract   ← Tika text extraction
fscrawler.es.bulk                    ← Elasticsearch bulk calls

Auto-instrumented spans (HTTP, ES client, etc.) also appear in the same trace, thanks to the elastic-otel-javaagent.

Stopping the stack

When you’re done, stop the docker-compose services:

docker compose -f contrib/docker-compose-example-edot/docker-compose.yml down

To also remove the volumes (this wipes the Elasticsearch data):

docker compose -f contrib/docker-compose-example-edot/docker-compose.yml down -v
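The commands above use the Compose v2 syntax. Since the prerequisites also allow docker-compose (v1), a small wrapper can pick whichever is installed. This is a sketch only: the helper names compose_cmd and stop_stack are hypothetical, and the compose file path is the one shown above.

```shell
COMPOSE_FILE="contrib/docker-compose-example-edot/docker-compose.yml"

# Echo the available Compose command, preferring v2; fail if neither exists.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    return 1
  fi
}

# Stop the stack; any extra arguments (such as -v) are passed on to "down".
stop_stack() {
  cmd=$(compose_cmd) || { echo "No Compose command found" >&2; return 1; }
  # $cmd is left unquoted on purpose so "docker compose" splits into two words.
  $cmd -f "$COMPOSE_FILE" down "$@"
}
```

Run stop_stack from the project root to stop the services, or stop_stack -v to remove the volumes as well.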