Tutorial
The use case for this tutorial is:
Search for resumes (PDF or Word files that reside in OneDrive or on a local drive) and search for anything in their content using Kibana, for example a work location or a previous company.
Prerequisites
Java 17+ must be installed
JAVA_HOME must be defined
Install Elastic stack
Download Elasticsearch
Download Kibana
Start Elasticsearch server
Start Kibana server
Check that Kibana is running by opening http://localhost:5601
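The same check can be scripted from a terminal. The following is a minimal sketch, assuming curl is installed and that Kibana runs locally without authentication on the default port used above. The helper names http_status and is_up are hypothetical, and /api/status is Kibana's status endpoint.

```shell
#!/bin/sh
# Print the HTTP status code returned by a URL ("000" when nothing answers).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed only when the URL answers with HTTP 200.
is_up() {
  [ "$(http_status "$1")" = "200" ]
}

# Usage (assumed local, unsecured Kibana):
#   is_up http://localhost:5601/api/status && echo "Kibana is running"
```

If security is enabled on your installation, the request may need credentials, which this sketch does not handle.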
Start FSCrawler
Download FSCrawler. See Using docker.
Open a terminal and navigate to the fscrawler folder. Type:
# On Linux/Mac
bin/fscrawler --setup resumes
# On Windows
.\bin\fscrawler --setup resumes
It will create a sample configuration file.
Go to the FSCrawler configuration folder to edit the job configuration. The FSCrawler configuration folder, named .fscrawler, is by default in the user home directory, like C:\Users\myuser on Windows or ~ on Linux/MacOS. In this folder, you will find another folder named resumes. Enter this folder:
# On Linux/Mac
cd ~/.fscrawler/resumes
# On Windows
cd C:\Users\myuser\.fscrawler\resumes
Edit the _settings.yaml file in this folder and change the url value to the folder that contains the resumes you would like to index:
---
name: "resumes"
fs:
  # On Linux
  url: "/path/to/resumes"
  # On Windows
  url: "c:\\path\\to\\resumes"
Start FSCrawler again:
# On Linux/Mac
bin/fscrawler resumes
# On Windows
.\bin\fscrawler resumes
FSCrawler should index all the documents inside your directory.
Note
If you want to reindex everything from scratch instead of monitoring the changes, stop FSCrawler and restart it with the --restart option:
# On Linux/Mac
bin/fscrawler resumes --restart
# On Windows
.\bin\fscrawler resumes --restart
Create Index pattern
Open Kibana
Go to the Management page
Open the Index Patterns page under Kibana settings.
Click on Create index pattern
Type resumes in the input box. Don’t forget to remove the star * that Kibana adds automatically by default.
Choose the date field you’d like to use if you want to be able to filter documents by date. Use the file.created field to filter by file creation date, file.last_modified to filter by last modification date, or file.indexing_date to filter by the date when the document was indexed into Elasticsearch. You can also choose not to use the time filter (the last option).
Click on “Create index pattern”. Kibana should then display the fields of the new index pattern.
Search for the CVs
Open Kibana
Go to the Discover page
Depending on the date field you selected in the Create Index pattern step, you should see the indexed documents. If you don’t see them, you probably have to adjust the time picker to make sure you are looking at the right period of time.
You can select the fields you’d like to display in the result page, such as content, file.filename, file.extension, file.url, file.filesize, etc.
Of course, you can search the content for a term, such as collaborateurs, and see the matching text highlighted in the results.
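The same kind of search can also be sent directly to Elasticsearch. The sketch below builds a match query on the content field with highlighting. It assumes the index is named resumes (as in the index pattern above) and that Elasticsearch listens without authentication on http://localhost:9200, which this tutorial does not state. The helper name build_query is hypothetical.

```shell
#!/bin/sh
# Build a JSON body for a match query on the "content" field, with highlighting.
# Assumes the search term contains no double quotes or backslashes.
build_query() {
  printf '{"query":{"match":{"content":"%s"}},"highlight":{"fields":{"content":{}}}}' "$1"
}

# Usage (assumed local, unsecured Elasticsearch):
#   curl -s -H 'Content-Type: application/json' \
#     "http://localhost:9200/resumes/_search?pretty" \
#     -d "$(build_query collaborateurs)"
```

The highlighted fragments are returned in the highlight section of each hit.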
Adding new files
Just copy new files into the resumes folder. It could take up to 15 minutes for FSCrawler to
detect the change. This is the default value of the update_rate option. You can also change this
value. See Update rate.
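As an illustration, the job settings could look like the fragment below to check for changes more often. This is a sketch only: the 5m value is an arbitrary example, and the url is the placeholder path used earlier in this tutorial.

```yaml
---
name: "resumes"
fs:
  url: "/path/to/resumes"
  # Check the folder for changes every 5 minutes instead of the default 15
  update_rate: "5m"
```

Restart FSCrawler after editing the file so that the new value is taken into account.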
Note
On some operating systems, moving files won’t update their modification date, so the “new” files won’t be detected. It is then probably better to copy the files instead.
Alternatively, you might have to “touch” the files, for example:
touch /path/to/resumes/CV2.pdf
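If many files were moved at once, touching them one by one is tedious. A minimal sketch, assuming a POSIX shell with find available (the helper name touch_all is hypothetical), updates the modification date of every file under the folder:

```shell
#!/bin/sh
# Update the modification time of every regular file under a folder,
# so that files moved in with an old date are seen as changed.
touch_all() {
  find "$1" -type f -exec touch {} +
}

# Usage:
#   touch_all /path/to/resumes
```

Note that this marks all files as modified, so FSCrawler will likely reindex them all on its next run.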
Just hit the Kibana refresh button and see the changes.