Welcome to FSCrawler’s documentation!

Welcome to the FS Crawler for Elasticsearch.

This crawler helps index binary documents such as PDF, Open Office, and MS Office files.

Main features:

  • Crawls a local file system (or a mounted drive) to index new files, update existing ones, and remove old ones.
  • Crawls remote file systems over SSH/FTP.
  • Provides a REST interface that lets you “upload” your binary documents to Elasticsearch.

Note

FS Crawler 2.9 uses Tika 2.2.1 and:

License

Important

This software is licensed under the Apache 2 license, quoted below.

Copyright 2011-2022 David Pilato

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Incompatible 3rd party library licenses

Some libraries are not compatible with the Apache 2 license. They are therefore not packaged with FSCrawler, so you need to download them and add them manually to the lib directory:

See pdfbox documentation for more details.

Special thanks

Thanks to JetBrains for the IntelliJ IDEA License!
