Getting Started

You need to have at least Java 17 and have properly configured JAVA_HOME to point to your Java installation directory. For example on MacOS if you are using sdkman you can define in your ~/.bash_profile file:

export JAVA_HOME="~/.sdkman/candidates/java/current"

Start FSCrawler

Start FSCrawler with:

bin/fscrawler

FSCrawler will read a local file (default to ~/.fscrawler/fscrawler/_settings.yaml). If the file does not exist, you can ask to create it using the --setup command.

$ bin/fscrawler --setup
17:40:33,905 INFO  [f.console] You can edit the settings in [~/.fscrawler/fscrawler/_settings.yaml]. Then, you can run again fscrawler without the --setup option.

Create a directory named /tmp/es or c:\tmp\es, add some files you want to index in it and start again:

$ bin/fscrawler
17:41:45,395 INFO  [f.p.e.c.f.FsCrawlerImpl] FSCrawler is now connected to Elasticsearch version [9.0.0]
17:41:45,395 INFO  [f.p.e.c.f.FsCrawlerImpl] FSCrawler started in watch mode. It will run unless you stop it with CTRL+C.
17:41:45,395 INFO  [f.p.e.c.f.FsParser] FS crawler started for [fscrawler] for [/tmp/es] every [15m]

If you did not create the directory, FSCrawler will complain until you fix it:

17:41:45,396 INFO  [f.p.e.c.f.FsParser] Run #1: job [fscrawler]: starting...
17:41:45,397 WARN  [f.p.e.c.f.FsParser] Error while crawling /tmp/es: /tmp/es doesn't exists.

Searching for docs

This is a common use case in elasticsearch, we want to search for something! ;-)

// GET docs/doc/_search
{
  "query" : {
    "query_string": {
      "query": "I am searching for something !"
    }
  }
}

See Search examples for more examples.

Ignoring folders

If you would like to ignore some folders to be scanned, just add a .fscrawlerignore file in it. The folder content and all sub folders will be ignored.

For more information, read Includes and excludes.