In part two of this tutorial series, we begin integrating Elasticsearch into a Django project. See part one for why I chose Elasticsearch over other full-text search engines.

The main objective of these tutorials is to make full-text search work on real data. Unlike the Elasticsearch tutorials I encountered when I was first learning it, I want this one to be self-contained: even if you’re coming in with zero knowledge of Elasticsearch, you should be able to make search work just the way you want from these tutorials alone.

To make the whole process a bit more concrete, we’ll be developing a project called RevAggregator (for Review Aggregator). We’ll use a bunch of customer reviews that I scraped from sites like Amazon and Best Buy to create our search documents. We’ll also write some HTML and CSS to present our search data. Later on, we’ll see how to make Elasticsearch index our documents and make them searchable in near real time.

Assumptions

This tutorial assumes that you have a working knowledge of Django, especially its take on the model-view-controller design pattern (which Django’s own documentation describes as model-template-view). Since this post is not a tutorial on Django itself, I’ll direct you to a more thorough explanation of the concepts found here. In addition, some familiarity with the UNIX (Linux, Mac OS X) command line would be helpful.

Requirements

I’ll be doing this tutorial on a 64-bit Ubuntu 16.04 system. If you’re on Mac OS, the command line instructions are the same for the most part. Unfortunately, I’m not too familiar with Windows’ command line so most of the stuff that we’ll be doing on the command line won’t translate directly. Aside from that, the rest of the project should be OS-independent. As for system requirements, 1GB RAM, 20GB HDD, and 1 CPU should be enough. The default settings on some of the libraries we’ll be using won’t allow the Elasticsearch server to run on anything less.

Initial Setup

In this part of the tutorial, we’ll mainly focus on installing the requirements and setting up our work environment. Here’s a list of all the tools we’ll need:

  • Java
  • Python 2.7
  • Elasticsearch
  • PostgreSQL
  • psycopg2
  • Virtualenv
  • Django
  • elasticsearch-dsl
  • elasticsearch-py
  • Jupyter Notebook

Java, Elasticsearch, and PostgreSQL are system-wide installations. We need Java because Elasticsearch is written in Java and runs on the Java Virtual Machine (JVM). We need PostgreSQL because it’s our main data store. You could use MySQL or any other SQL database you like, but this tutorial assumes you’re using PostgreSQL. Psycopg2 is the PostgreSQL adapter for Python that Django uses to talk to the database. Virtualenv is a Python tool that will help us “sandbox” our RevAggregator project. Elasticsearch-py is the official low-level Python client for Elasticsearch, and elasticsearch-dsl is a higher-level library built on top of it; we’ll use both to create our Elasticsearch queries. The only optional tool here is Jupyter Notebook. We’ll be using the Django shell a lot, and the default shell is not very pretty. Jupyter improves on it with color and nice formatting, like IPython.

We install all system-level dependencies with sudo. To start the installation process, open up your terminal and issue

Then issue the following command

If there’s no Java installed, you should get something like this

In that case, do the following

It will ask you for some license agreement. Accept and proceed. Now issue java -version again and you should get detailed version info like so
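
If you would rather script this check, here is a minimal Python sketch that runs java -version and pulls out the major version number. The function names are my own, and the version strings in the comments are illustrative samples rather than output captured from this setup. Elasticsearch 5.x requires at least Java 8, so that is the number to look for.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Handles both the legacy "1.8.0_151" scheme (major version 8) and the
    newer "9.0.1" scheme (major version 9). Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and parse it; return None if Java is missing."""
    try:
        # `java -version` writes to stderr, so merge it into stdout.
        output = subprocess.check_output(
            ["java", "-version"], stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return java_major_version(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java not found")
    else:
        print("Java major version: %d" % major)
```

The parsing is kept separate from the subprocess call so it can be tried on any version string without Java installed.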

Elasticsearch Installation

Install Elasticsearch version 5.6.2 (the newest version as of this writing) as follows

Verify that the server is properly installed and running by issuing

or

You should get something like this

We haven’t started the Elasticsearch server so that’s expected. Now to start the server, issue

(Permission issues: do not run Elasticsearch as root, for example via sudo. It will refuse to start, and you will get an error like the one below when you try to start the server

To fix that, you need to take ownership of the /usr/bin/elasticsearch-5.6.2 directory like so

Usually, user_name and group_name are the same unless you’ve manually changed the group name. In my case, I had biz:biz for user_name:group_name. After issuing the command, run ls -l and verify that you, and not root, are now the owner of elasticsearch-5.6.2).
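
The same ownership check can be done from Python. This is a small sketch under the assumption that Elasticsearch lives at the path used in this tutorial; the helper name is hypothetical, and it only inspects the top-level directory, not the files inside it.

```python
import os


def owned_by_current_user(path):
    """Return True if `path` is owned by the user running this script."""
    return os.stat(path).st_uid == os.getuid()


if __name__ == "__main__":
    # Path taken from this tutorial; adjust it if you installed elsewhere.
    es_home = "/usr/bin/elasticsearch-5.6.2"
    if os.path.isdir(es_home):
        print("owned by you: %s" % owned_by_current_user(es_home))
    else:
        print("%s does not exist" % es_home)
```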

Note that when you start the server this way, it runs in the foreground and prints a lot of output to the screen, which ties up the terminal. Instead, you want to run it in the background and disown it so you can continue to use the terminal. To do so, issue

Check if the server is up with

You should get a JSON response like so

You can also open your browser and go to http://127.0.0.1:9200. You should get the same response. The most important thing in the response is the version number. Make sure it’s 5.x or later. If not, go back and re-install the correct version of Elasticsearch. If you do that, you first need to purge the current install with the following commands

Now that Elasticsearch is up and running, we’ll move on to the database setup.
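
Before moving on, the version check described above can also be scripted. The sketch below fetches the JSON document served at the server’s root URL and reads the version number from its version.number field. The function names are hypothetical, and the import fallback is there only so the snippet runs on both Python 2.7 and Python 3.

```python
import json

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2.7, which this tutorial uses
    from urllib2 import urlopen


def fetch_cluster_info(url="http://127.0.0.1:9200", timeout=5):
    """Return the JSON document served at the Elasticsearch root URL."""
    response = urlopen(url, timeout=timeout)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def major_version(info):
    """Pull the major version out of the root response, e.g. 5 for "5.6.2"."""
    return int(info["version"]["number"].split(".")[0])


def is_supported(info, minimum_major=5):
    """True when the server's major version is at least `minimum_major`."""
    return major_version(info) >= minimum_major
```

With the server running, is_supported(fetch_cluster_info()) should be True for the 5.6.2 install. If the server is down, fetch_cluster_info() raises a connection error instead of returning.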

Database Installation

In this section we’ll install PostgreSQL and change the default configuration for Django. As before, first check whether PostgreSQL is already installed

If you get no psql found, continue with the commands below to install the latest version of PostgreSQL.

Update source list

Add repo signing key

Update repo

Install the client

Check version

You should get version >=9.6.

Start the server

Let’s create a database and a user. We haven’t even installed Django yet, but I think now is a good time to get done with database-related configurations. In the terminal, type the following command to enter the psql shell

Once inside the shell, type \l (a lowercase ‘L’) to list all the databases currently available. You should see a list of a few default databases that looks something like this

To add a new database called revaggregator and a new user called aggregator, copy and paste the following commands into the shell (a side note: you might get some errors if your password is not alphanumeric)

Issue \l again and verify that you have something that looks like this

where database revaggregator has owner aggregator. We are now done with our database setup. Type \q to exit the psql shell. Don’t forget the following credentials. We’ll need them later.
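
To give a sense of where these credentials will end up, here is a minimal sketch of the DATABASES setting in a Django settings.py file. It assumes PostgreSQL runs on the same machine on its default port, and the password is a placeholder rather than a real value, so substitute the one you chose.

```python
# Sketch of the DATABASES entry in a Django settings.py file.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # backed by psycopg2
        "NAME": "revaggregator",   # database created above
        "USER": "aggregator",      # user created above
        "PASSWORD": "change-me",   # placeholder, not a real password
        "HOST": "localhost",       # assumption: PostgreSQL on this machine
        "PORT": "5432",            # PostgreSQL's default port
    }
}
```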

Virtualenv and Django Installation

We’ll install all Python-related libraries and frameworks inside of a virtual environment. If you’re not familiar with Python virtual environments, I recommend you read this tutorial.

Let’s install pip and virtualenv. In the terminal, issue

Before we create a virtual environment for our project, we need to install the Python development package. It provides the Python header files that the C compiler needs when building some of Jupyter Notebook’s dependencies. Issue the following command in the terminal

After that’s done, cd into a directory where you would like to store all project-related resources. Mine is at /home/biz. Inside your parent directory, issue the following command in the terminal

It should install pip, Python 2.7, and other dependencies. After installation, my virtual environment is stored at /home/biz/djangoprojects.

Now let’s install project dependencies. First, activate the virtual environment like so

(To deactivate, simply issue deactivate from any directory while the environment is active).
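
If you are ever unsure whether the environment is active, you can ask the Python interpreter itself. This is a small sketch using a common idiom; the function name is my own.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the venv module and
    # newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active: %s" % in_virtualenv())
    print("interpreter prefix: %s" % sys.prefix)
```

When the environment is active, the printed prefix should point inside your virtual environment directory rather than at the system Python.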

Run the following commands inside an active environment

Note that we’re specifying the version of Django we want to install. If you do pip install django, you’re going to run into some problems. That will install the latest version of Django, which is 2.0.1 as of this writing, and Django 2.0 no longer supports Python 2. Django 1.11 is the last release series that supports Python 2.7.
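
The rule behind the version pin can be written down in a few lines. This sketch encodes only the one constraint discussed here (Django 2.0 and later need Python 3.4 or newer) and is not a full compatibility matrix; the function name is hypothetical.

```python
import sys


def django_supports_python(django_version, python_version=None):
    """Rough rule of thumb: Django 2.0+ dropped Python 2 support.

    Both arguments are version tuples such as (1, 11) or (2, 7). When
    `python_version` is omitted, the running interpreter is used.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if django_version >= (2, 0):
        return python_version >= (3, 4)
    # Only the Django 2.0 cutoff is modelled; older releases pass through.
    return True
```

For the setup in this tutorial, django_supports_python((1, 11), (2, 7)) is True, while django_supports_python((2, 0), (2, 7)) is False.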

Now that all of our dependencies have been installed, let’s create our project. Issue the following command in the terminal

This will create a Django project called revaggregator. The name of the outer project directory doesn’t really matter to Django. It’s just a container for your code. But you should still give your project an informative name, because Django gives the same name to the Python package it creates inside that directory, which holds the project settings and is the name you’ll import.

After you’ve created a new project, you should have a directory listing that looks something like this

Don’t worry about what each of the files is for. We’ll talk about them in a future post.
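
As a quick sanity check, the sketch below looks for the files that django-admin startproject generates in Django 1.11. It assumes the project is named revaggregator, as in this tutorial, and that you pass in the directory where you ran the command; the helper name is my own.

```python
import os

# Files that `django-admin startproject revaggregator` generates in
# Django 1.11, relative to the directory the command was run in.
EXPECTED_FILES = [
    os.path.join("revaggregator", "manage.py"),
    os.path.join("revaggregator", "revaggregator", "__init__.py"),
    os.path.join("revaggregator", "revaggregator", "settings.py"),
    os.path.join("revaggregator", "revaggregator", "urls.py"),
    os.path.join("revaggregator", "revaggregator", "wsgi.py"),
]


def missing_project_files(parent_dir):
    """Return the expected files that are absent under `parent_dir`."""
    return [
        relative_path
        for relative_path in EXPECTED_FILES
        if not os.path.isfile(os.path.join(parent_dir, relative_path))
    ]
```

An empty list from missing_project_files(".") means the project skeleton is complete.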

We’ve now finished our initial setup. We’ve installed all the main dependencies to get us up and running. Later on, we might have to install additional libraries, but we’ll hold off until they’re needed. In part three of the tutorial, we’ll add some models to our revaggregator app to hold our data. We’ll then use those models to create our search index.