Workplace Search settings¶
New in version 2.7.
FSCrawler can now send documents to Workplace Search.
Although this won't be needed in the future, it is still mandatory for FSCrawler to have access to the Elasticsearch instance running behind Workplace Search. In this section of the documentation, we only cover the specifics for Workplace Search. Please refer to the Elasticsearch settings chapter for the rest.
To easily get started locally with Workplace Search, follow these steps:
- Check out the source code from GitHub and start the docker-compose stack:

```sh
git clone git@github.com:dadoonet/fscrawler.git
cd fscrawler
cd contrib/docker-compose-workplacesearch
docker-compose up
```

This will start Elasticsearch, Kibana (not used) and Workplace Search.
- Wait for it to start and open http://127.0.0.1:3002/ws. Use `enterprise_search` as the login and `changeme` as the password.
- Click on the "Add sources" button and choose Custom API.
- Name it `fscrawler` and click on the "Create Custom API Source" button.
- Copy the "Access Token" value. We will refer to it as `ACCESS_TOKEN` for the rest of this documentation.
- Copy the "Key" value. We will refer to it as `KEY` for the rest of this documentation.
Here is a list of Workplace Search settings (under the `workplace_search.` prefix):

| Name | Default value | Documentation |
|------|---------------|---------------|
| `workplace_search.access_token` | None (Must be set) | Keys |
| `workplace_search.key` | None (Must be set) | Keys |
| `workplace_search.url_prefix` | `http://127.0.0.1` | Documents Repository URL |
Once you have created your Custom API source and have the `KEY`, you can add it to your existing FSCrawler configuration file:

```yaml
name: "test"
workplace_search:
  access_token: "ACCESS_TOKEN"
  key: "KEY"
```
When using Workplace Search, FSCrawler will by default connect to `http://127.0.0.1:3002`, which is the default when running a local node on your machine.
Of course, in production, you would probably change this and connect to a production cluster:
```yaml
name: "test"
workplace_search:
  access_token: "ACCESS_TOKEN"
  key: "KEY"
  server: "http://wpsearch.mycompany.com:3002"
```
Running on Cloud¶
The easiest way to get started is to deploy Enterprise Search on Elastic Cloud Service.
Then you can define the following:
```yaml
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
  nodes:
  - cloud_id: "CLOUD_ID"
workplace_search:
  access_token: "ACCESS_TOKEN"
  key: "KEY"
  server: "https://XYZ.ent-search.ZONE.CLOUD_PROVIDER.elastic-cloud.com"
```
Replace `PASSWORD` and `CLOUD_ID` with the values coming from the Elastic Console. And get the `ACCESS_TOKEN` and the `KEY` from your Enterprise Search deployment once you have created the Custom API source as seen previously.
FSCrawler uses bulk requests to send data to Workplace Search. By default, a bulk is executed every 100 operations or every 5 seconds. You can change the default settings using `workplace_search.bulk_size` and `workplace_search.flush_interval`:

```yaml
name: "test"
workplace_search:
  bulk_size: 1000
  flush_interval: "2s"
```
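FSCrawler's actual bulk processor is written in Java; the following Python sketch (with hypothetical names like `BulkBuffer`) only illustrates the flush-on-size-or-interval behaviour described above:

```python
import time

class BulkBuffer:
    """Sketch of the bulk strategy: buffered operations are flushed
    either when the buffer reaches bulk_size or when flush_interval
    seconds have elapsed since the last flush."""

    def __init__(self, bulk_size=100, flush_interval=5.0, flush_fn=None):
        self.bulk_size = bulk_size
        self.flush_interval = flush_interval
        self.flush_fn = flush_fn or (lambda ops: None)
        self.ops = []
        self.last_flush = time.monotonic()

    def add(self, op):
        self.ops.append(op)
        self._maybe_flush()

    def _maybe_flush(self):
        # Flush when the buffer is full, or when the interval is due
        # and there is something to send.
        due = time.monotonic() - self.last_flush >= self.flush_interval
        if len(self.ops) >= self.bulk_size or (self.ops and due):
            self.flush()

    def flush(self):
        if self.ops:
            self.flush_fn(self.ops)
            self.ops = []
        self.last_flush = time.monotonic()

flushed = []
buffer = BulkBuffer(bulk_size=3, flush_fn=flushed.append)
for doc in ["a", "b", "c", "d"]:
    buffer.add(doc)
# The first three docs were sent as one bulk; "d" is still buffered.
```

A larger `bulk_size` trades memory and latency for fewer, bigger requests; a shorter `flush_interval` bounds how stale the index can get under low traffic.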
Documents Repository URL¶
The URL that will be used to give your users access to the source document is prefixed by default with `http://127.0.0.1`. That means that if you run a web server locally which serves the directory you defined in `fs.url` (see Root directory), your users will be able to click in the Workplace Search interface to access the documents.
Of course, in production, you would probably change this and point to another URL. This can be done by changing the `url_prefix` setting:

```yaml
name: "test"
workplace_search:
  access_token: "ACCESS_TOKEN"
  key: "KEY"
  url_prefix: "https://repository.mycompany.com/docs"
```
For example, if `fs.url` is set to `/tmp/es` and you have indexed a document named `/tmp/es/path/to/foobar.txt`, the default URL will be `http://127.0.0.1/path/to/foobar.txt`. If you change `url_prefix` to `https://repository.mycompany.com/docs`, the same document will be served as `https://repository.mycompany.com/docs/path/to/foobar.txt`.
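The mapping from indexed path to served URL can be sketched as follows (a minimal illustration with a hypothetical helper, not FSCrawler's actual code): the document path relative to `fs.url` is appended to `url_prefix`.

```python
from posixpath import relpath

def document_url(fs_url, doc_path, url_prefix="http://127.0.0.1"):
    """Derive the URL served to users: url_prefix plus the
    document path relative to the fs.url root directory."""
    return url_prefix.rstrip("/") + "/" + relpath(doc_path, fs_url)

print(document_url("/tmp/es", "/tmp/es/path/to/foobar.txt"))
# http://127.0.0.1/path/to/foobar.txt
print(document_url("/tmp/es", "/tmp/es/path/to/foobar.txt",
                   "https://repository.mycompany.com/docs"))
# https://repository.mycompany.com/docs/path/to/foobar.txt
```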