Tutorial
This tutorial use case is:
Search for the resumes (PDF or Word file which resides in One drive or local) and search for anything in the content using Kibana. For example location worked or the previous company, etc.
Prerequisites
Java 11+ must be installed
JAVA_HOME
must be defined
Install Elastic stack
Download Elasticsearch
Download Kibana
Start Elasticsearch server
Start Kibana server
Check that Kibana is running by opening http://localhost:5601
Start FSCrawler
Download FSCrawler. See Download FSCrawler.
Open a terminal and navigate to the
fscrawler
folder.Type:
# On Linux/Mac
bin/fscrawler resumes
# On Windows
.\bin\fscrawler resumes
It will ask “Do you want to create it (Y/N)?”. Answer
Y
.Go to the FSCrawler configuration folder to edit the job configuration. The FSCrawler configuration folder named
.fscrawler
is by default in the user home directory, likeC:\Users\myuser
on Windows platform or~
on Linux/MacOS. In this folder, you will find another folder namedresumes
. Enter this folder:
# On Linux/Mac
cd ~/.fscrawler/resumes
# On Windows
cd C:\Users\myuser\.fscrawler\resumes
Edit the
_settings.yaml
file which is in this folder and change theurl
value to your folder which contains the resumes you would like to index:
---
name: "resumes"
fs:
# On Linux
url: "/path/to/resumes"
# On Windows
url: "c:\\path\\to\\resumes"
Start again FSCrawler:
# On Linux/Mac
bin/fscrawler resumes
# On Windows
.\bin\fscrawler resumes
FSCrawler should index all the documents inside your directory.
Note
If you want to start again reindexing from scratch instead of monitoring the changes, stop FSCrawler, restart it
with the --restart
option:
# On Linux/Mac
bin/fscrawler resumes --restart
# On Windows
.\bin\fscrawler resumes --restart
Create Index pattern
Open Kibana
Go to the Management page
Open the Index Patterns page under Kibana settings.
Click on
Create index pattern
Type
resumes
in the input box. Don’t forget to remove the star*
that is automatically added by default by Kibana.
Choose the date field you’d like to use if you want to be able to filter documents by date. Use
file.created
field if you want to filter by file creation date,file.last_modified
to filter by last modification date orfile.indexing_date
if you want to filter by the date when the document has been indexed into elasticsearch. You can also choose not to use the time filter (the last option).
Click on “Create index pattern”. You should see something like:
Search for the CVs
Open Kibana
Go to the Discover page
Depending on the date you selected in the Create Index pattern step, you should see something similar to the following image. If you don’t see it, you probably have to adjust the time picker to make sure you are looking at the right period of time.
You can select the fields you’d like to display in the result page, such as
content
,file.filename
,file.extension
,file.url
,file.filesize
, etc.
Of course, you can search for content, like
collaborateurs
here and see the highlighted content.
Adding new files
Just copy new files in the resumes
folder. It could take up to 15 minutes for FSCrawler to
detect the change. This is the default value for update_rate
option. You can also change this
value. See Update rate.
Note
On some OS, moving files won’t touch the modified date and the “new” files won’t be detected. It’s then better probably to copy the files instead.
You might have to “touch” the files like:
touch /path/to/resumes/CV2.pdf
Just hit the Kibana refresh button and see the changes.