Workplace Search settings
New in version 2.7.
FSCrawler can now send documents to Workplace Search.
Note
Although this may not be needed in the future, you currently still need access to the Elasticsearch instance running behind Workplace Search. This section only covers the specifics of Workplace Search; for the Elasticsearch part, please refer to the Elasticsearch settings chapter.
Hint
To quickly start a local Workplace Search instance, follow these steps:
git clone git@github.com:dadoonet/fscrawler.git
cd fscrawler
cd contrib/docker-compose-workplacesearch
docker-compose up
This starts Elasticsearch, Kibana and Workplace Search. Wait for all of them to start: http://0.0.0.0:5601/app/enterprise_search/workplace_search must be reachable before you continue.
Here is a list of Workplace Search settings (under the workplace_search. prefix):

Name | Default value | Documentation |
---|---|---|
workplace_search.id | None | Custom Source ID |
workplace_search.name | Local files for job + Job Name | Custom Source Name |
workplace_search.username | same as for elasticsearch | Secrets |
workplace_search.password | same as for elasticsearch | Secrets |
workplace_search.server | http://127.0.0.1:3002 | Server |
workplace_search.bulk_size | 100 | Bulk settings |
workplace_search.flush_interval | "5s" | Bulk settings |
workplace_search.url_prefix | http://127.0.0.1 | Documents Repository URL |
Note
At least one of the settings under the workplace_search. prefix must be set to activate the Workplace Search output. Otherwise, FSCrawler uses Elasticsearch as the output.
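The rule in this note can be pictured as a small decision function. This is a hypothetical Python sketch, not FSCrawler's own code: it assumes the job settings have already been parsed from YAML into nested dicts, and `select_output` and `WORKPLACE_SEARCH_KEYS` are illustrative names.

```python
from typing import Any, Mapping

# Keys listed in the settings table above (all under the workplace_search. prefix).
WORKPLACE_SEARCH_KEYS = ("id", "name", "username", "password", "server",
                         "bulk_size", "flush_interval", "url_prefix")


def select_output(settings: Mapping[str, Any]) -> str:
    """Return which output a job would use (hypothetical helper)."""
    # An empty "workplace_search:" block parses to None, so treat it as {}.
    wp = settings.get("workplace_search") or {}
    # A single workplace_search.* setting that is present is enough to switch outputs.
    if any(wp.get(key) is not None for key in WORKPLACE_SEARCH_KEYS):
        return "workplace_search"
    # Otherwise the job keeps sending documents to Elasticsearch.
    return "elasticsearch"
```

For instance, the first example in the Secrets section only sets workplace_search.name, and that alone makes the sketch pick the Workplace Search output.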
Secrets
FSCrawler uses the username/password authentication of the Workplace Search API. By default, it reuses the username and password defined in the Elasticsearch configuration (see Elasticsearch settings), so the following settings just work:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  name: "My fancy custom source name"
If you want to use a dedicated user for FSCrawler (recommended), such as fscrawler, you can define it as follows:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  username: "fscrawler"
  password: "FSCRAWLER_PASSWORD"
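The defaulting described in this section can be sketched as follows. This is a hypothetical Python illustration, assuming the settings are parsed into nested dicts and assuming username and password each fall back to their elasticsearch.* counterpart independently; `resolve_credentials` and `_setting` are illustrative names.

```python
from typing import Any, Mapping, Optional, Tuple


def _setting(primary: Mapping[str, Any], fallback: Mapping[str, Any], key: str) -> Optional[Any]:
    # Use the workplace_search value when present, else the elasticsearch one (assumed per-field).
    value = primary.get(key)
    return value if value is not None else fallback.get(key)


def resolve_credentials(settings: Mapping[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (username, password) for Workplace Search (hypothetical helper)."""
    es = settings.get("elasticsearch") or {}
    wp = settings.get("workplace_search") or {}
    return _setting(wp, es, "username"), _setting(wp, es, "password")
```

With the first configuration above, the sketch returns the elastic credentials; with the second, it returns the dedicated fscrawler user.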
Custom Source Management
At startup, FSCrawler checks whether a Custom Source already exists with the Custom Source name configured for the job.
Custom Source ID
When a Custom Source with the same name is found, its KEY is automatically fetched and applied to the Workplace Search job settings.
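A minimal sketch of how the Custom Source key could be resolved, assuming an explicitly configured workplace_search.id takes precedence over the name lookup and that the existing sources are available as a list of dicts with "id" and "name" keys. That shape is an assumption for illustration, not the Workplace Search API response format, and `resolve_source_id` is a hypothetical name.

```python
from typing import Iterable, Mapping, Optional


def resolve_source_id(configured_id: Optional[str],
                      source_name: str,
                      existing_sources: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the Custom Source key to use, or None if no source matches (hypothetical)."""
    if configured_id:
        # Assumed: an explicit workplace_search.id is used as-is.
        return configured_id
    for source in existing_sources:
        # Match on the Custom Source name and reuse that source's key.
        if source.get("name") == source_name:
            return source.get("id")
    # No match: the caller would fall through to automatic creation.
    return None
```

A `None` result corresponds to the situation described under Automatic Custom Source Creation.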
If you have already defined a Custom API source in the Workplace Search Admin UI (http://0.0.0.0:5601/app/enterprise_search/workplace_search) and have its KEY, you can add it to your existing FSCrawler configuration file:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  id: "KEY"
Tip
If you let FSCrawler create the Custom Source for you, it is recommended that you then edit the job settings manually and set workplace_search.id. That way, if you rename the Custom Source, FSCrawler won’t try to create it again.
Custom Source Name
You can specify the Custom Source name that FSCrawler should use when it creates the source automatically:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  name: "My fancy custom source name"
Tip
By default, FSCrawler uses Local files for JOB_NAME as the name, where JOB_NAME is the value of the FSCrawler name setting. For example, the following job settings:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  username: "fscrawler"
  password: "FSCRAWLER_PASSWORD"
will use Local files for test as the Custom Source name in Workplace Search.
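The naming rule from this tip, written as a hypothetical Python sketch (settings assumed parsed into nested dicts; `custom_source_name` is an illustrative name, and the "Local files for " prefix follows the example in this tip):

```python
from typing import Any, Mapping


def custom_source_name(settings: Mapping[str, Any]) -> str:
    """Return the Custom Source name a job would use (hypothetical helper)."""
    wp = settings.get("workplace_search") or {}
    name = wp.get("name")
    if name:
        # An explicit workplace_search.name wins.
        return name
    # Otherwise derive it from the FSCrawler job name setting.
    return "Local files for " + settings["name"]
```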
Automatic Custom Source Creation
If the Custom Source id is not provided and no Custom Source exists with the same name, FSCrawler automatically creates the Custom Source for you with the default settings, which are read from ~/.fscrawler/_default/7/_wpsearch_settings.json. You can read its content in the source.
If you want to use your own settings, you can either create your own Custom Source in the Workplace Search Administration UI, or write a ~/.fscrawler/_default/7/_wpsearch_settings.json document containing the settings you want before starting FSCrawler. See the Workplace Search documentation for more details.
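The creation fallback can be sketched like this. It is a hypothetical Python illustration: `ensure_custom_source` and the `create_source` callback stand in for whatever call actually creates the source, and the JSON document is assumed to be loaded and passed along unchanged.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def ensure_custom_source(found_id: Optional[str],
                         settings_file: Path,
                         create_source: Callable[[Dict[str, Any]], str]) -> str:
    """Return an existing Custom Source key, or create one (hypothetical helper).

    found_id is the configured workplace_search.id or the key of a source that
    matched by name; None means neither exists.
    """
    if found_id is not None:
        return found_id
    # No id and no name match: read the settings document from disk...
    with settings_file.open(encoding="utf-8") as f:
        source_settings = json.load(f)
    # ...and pass it to the (hypothetical) creation call, which returns the new key.
    return create_source(source_settings)
```

Keeping the creation call as a parameter makes the decision logic testable without a running Workplace Search instance.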
Define explicit settings per job
Let’s say you created a job named job_name and you are sending documents to a Workplace Search instance running version 7.x. If you create the following file, it will be picked up at job start time instead of the default one:
~/.fscrawler/{job_name}/_mappings/7/_wpsearch_settings.json
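A sketch of the file lookup, assuming the job-specific file simply replaces the default one when it exists (no merging) and that the version directory is the Workplace Search major version. `wpsearch_settings_file` is a hypothetical helper; `config_dir` would typically be `Path.home() / ".fscrawler"`.

```python
from pathlib import Path

SETTINGS_FILENAME = "_wpsearch_settings.json"


def wpsearch_settings_file(config_dir: Path, job_name: str, major_version: int = 7) -> Path:
    """Pick the per-job settings file if it exists, else the default one (hypothetical)."""
    # e.g. ~/.fscrawler/{job_name}/_mappings/7/_wpsearch_settings.json
    job_file = config_dir / job_name / "_mappings" / str(major_version) / SETTINGS_FILENAME
    if job_file.is_file():
        return job_file
    # e.g. ~/.fscrawler/_default/7/_wpsearch_settings.json
    return config_dir / "_default" / str(major_version) / SETTINGS_FILENAME
```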
Server
When using Workplace Search, FSCrawler connects by default to http://127.0.0.1:3002, which is the default address when running a local node on your machine. In production, you would probably change this to connect to a production cluster:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  server: "http://wpsearch.mycompany.com:3002"
Running on Cloud
The easiest way to get started is to deploy Enterprise Search on Elastic Cloud Service.
Then you can define the following:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
  nodes:
  - cloud_id: "CLOUD_ID"
workplace_search:
  server: "URL"
Note
Replace PASSWORD, CLOUD_ID and URL with the values from the Elastic Console. URL looks like https://XYZ.ent-search.ZONE.CLOUD_PROVIDER.elastic-cloud.com.
Bulk settings
FSCrawler uses bulk requests to send data to Workplace Search. By default, a bulk is executed every 100 operations or every 5 seconds, whichever comes first. You can change these defaults using workplace_search.bulk_size and workplace_search.flush_interval:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  bulk_size: 1000
  flush_interval: "2s"
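The "every N operations or every T seconds" behavior can be sketched as a small buffer. This is a hypothetical Python illustration, not FSCrawler's bulk processor: `BulkBuffer`, `parse_interval` and the `send` callback are illustrative names, the interval parser accepts only an assumed subset of duration suffixes (ms, s, m, h), and the time-based flush relies on something calling `tick()` periodically.

```python
import re
import time
from typing import Any, Callable, List


def parse_interval(value: str) -> float:
    """Convert a duration such as "5s", "500ms" or "1m" to seconds (assumed subset)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * {"ms": 0.001, "s": 1, "m": 60, "h": 3600}[unit]


class BulkBuffer:
    """Send queued documents when bulk_size is reached or flush_interval has elapsed."""

    def __init__(self, send: Callable[[List[Any]], None],
                 bulk_size: int = 100, flush_interval: str = "5s",
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._bulk_size = bulk_size
        self._interval = parse_interval(flush_interval)
        self._clock = clock
        self._pending: List[Any] = []
        self._last_flush = clock()

    def add(self, doc: Any) -> None:
        # Size-based trigger: flush as soon as bulk_size documents are queued.
        self._pending.append(doc)
        if len(self._pending) >= self._bulk_size:
            self.flush()

    def tick(self) -> None:
        # Time-based trigger: call periodically, e.g. from a timer thread.
        if self._pending and self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the sender and start a new one.
        if self._pending:
            self._send(self._pending)
            self._pending = []
        self._last_flush = self._clock()
```

With the settings above, the equivalent would be `BulkBuffer(send, bulk_size=1000, flush_interval="2s")`. Injecting the clock keeps the time-based flush testable without sleeping.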
Documents Repository URL
The URL that gives your users access to the source document is prefixed by default with http://127.0.0.1. This means that if you run a web server locally that serves the directory defined in the fs.url setting (see Root directory), your users will be able to click through in the Workplace Search interface to open the documents.
In production, you would probably serve the documents from another URL. You can do this by changing the workplace_search.url_prefix setting:
name: "test"
elasticsearch:
  username: "elastic"
  password: "PASSWORD"
workplace_search:
  url_prefix: "https://repository.mycompany.com/docs"
Note
If fs.url is set to /tmp/es and you have indexed a document named /tmp/es/path/to/foobar.txt, the default URL will be http://127.0.0.1/path/to/foobar.txt.

If you change workplace_search.url_prefix to https://repository.mycompany.com/docs, the same document will be served as https://repository.mycompany.com/docs/path/to/foobar.txt.
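The URL construction in this note amounts to taking the document path relative to fs.url and appending it to url_prefix. Here is a hypothetical Python helper (`document_url`) that reproduces both examples; it assumes POSIX-style paths and does not URL-encode special characters such as spaces, which a real implementation may need to do.

```python
from pathlib import PurePosixPath


def document_url(fs_url: str, file_path: str, url_prefix: str = "http://127.0.0.1") -> str:
    """Build the link shown in Workplace Search for an indexed file (hypothetical)."""
    # Path of the document relative to the crawled root directory (fs.url).
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(fs_url))
    # Join with the prefix, avoiding a double slash if the prefix ends with "/".
    return url_prefix.rstrip("/") + "/" + relative.as_posix()
```

For example, `document_url("/tmp/es", "/tmp/es/path/to/foobar.txt")` yields the default URL from the note, and passing `url_prefix="https://repository.mycompany.com/docs"` yields the production one.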