Version 2.10

Breaking changes

  • If you want to exclude a specific folder, you need to use a wildcard character at the end of the folder name. For example, to exclude the folder /tmp/foo, you need to use /tmp/foo/*. Thanks to dadoonet.

  • The way we run docker images has changed. We don’t need anymore to specify the fscrawler binary. So running docker run -it -v ~/.fscrawler:/root/.fscrawler -v /documents:/tmp/es:ro dadoonet/fscrawler job_name is

enough. Thanks to dadoonet.

New

  • Add support for automatic semantic search when using a 8.17+ version with a trial or enterprise license. See Semantic search. Warning: this might slow down the ingestion process. Thanks to dadoonet.

  • Add support for Elastic cloud serverless. Thanks to dadoonet.

  • Using the REST API _document, you can now fetch a document from the local dir, from an http website or from an S3 bucket. See REST service. Thanks to dadoonet.

  • You can now remove a document in Elasticsearch using FSCrawler _document endpoint. See REST service. Thanks to dadoonet.

  • Implement our own HTTP Client for Elasticsearch. Thanks to dadoonet.

  • Add option to set path to custom tika config file. See Local FS settings. Thanks to iadcode.

  • Support for Index Templates. See Mappings. Thanks to dadoonet.

  • Support for Aliases. You can now index to an alias. Thanks to dadoonet.

  • Support for Access Token and Api Keys instead of Basic Authentication. See Using Credentials (Security). Thanks to dadoonet.

  • Allow loading external jars. This adds a new external directory from where jars can be loaded to the FSCrawler JVM. For example, you could provide your own Custom Tika Parser code. See Directory layout. Thanks to dadoonet.

  • Add temporal information in folder index. Thanks to bdauvissat

  • Add support for external metadata files while crawling, defaults to .meta.yml. See External Tags Thanks to dadoonet.

Fix

  • fs.ocr.enabled was always false. Thanks to ywjung.

  • Do not hide YAML parsing errors. Thanks to dadoonet.

  • Fix duration parsing for the day unit d. Thanks to dadoonet.

Deprecated

  • The _upload REST endpoint has been deprecated. Please now use the _document endpoint. Thanks to dadoonet.

  • Support for Elasticsearch 6.x is deprecated. Thanks to dadoonet.

  • Support for Basic Authentication is deprecated. You should use API keys instead. Thanks to dadoonet.

Updated

Removed

  • Remove the specific distributions depending on Elastic version. Thanks to dadoonet.

Thanks to @dadoonet, @ywjung, @iadcode, @bdauvissat for this release!