As the year turned into 2018, I decided to ditch WordPress, which I had been using for over 12 years as my preferred CMS. I had many reasons to do this, but the biggest motivation was the opportunity to try something new and trade the bloat and clutter of WordPress for a simpler, more elegant arrangement of things. Inspired by another recent adopter, Mark Edmondson, I decided to give Hugo a go (pun intended).
Hugo is written in Go, and it is fairly easy to use if you are familiar with Markdown and the command line of your operating system. The trick with a static site is that all the content is stored in static files on your file server. There is no relational database to reference, which means that a static site can be very fast and requires very little maintenance.
One of the biggest problems for me was how to set up the site search. Without a database or web server that generates dynamic HTML documents, finding a suitable way to index content in the browser and respond quickly and efficiently to search queries seemed like an insurmountable task.
I tried a number of things initially, including:
Algolia, which I had to give up because I have too much content for their free tier.
Lunr.js running on a Node.js virtual machine in Google Cloud, which I had to give up because I was billed some $400 for December alone.
A custom solution that digested Hugo-generated JSON and parsed it with jQuery for search directly in the browser, which I had to give up because downloading an indexed JSON file of about 5 MB per page load is not conducive to a good user experience.
After the failed experiment with lunr.js, I still wanted to give Google App Engine another chance. I’ve been in love with App Engine ever since I published the first version of GTM Tools on it. Well, as it turns out, App Engine has a really useful and flexible Search API for Python, which seems almost purpose-built for working with the JSON generated by Hugo on a static site!
My setup looks like this:
Hugo’s config file is set up to output an index.json file into the public directory, with all my site content ready for indexing.
A deployment script copies this JSON file into an App Engine project and deploys it.
The App Engine project uses the Python Search API client to build a search index from this JSON.
The App Engine project also exposes an HTTP endpoint against which my site performs all search queries. Each request is processed as a search query, and the result is returned in the HTTP response.
The beauty of using the Search API is that I’m well below the quota limits for the free version, so I don’t have to pay a dime to make it fully functional!
1. Modify the configuration file
Making the change in Hugo’s config file is easy, because Hugo has built-in support for generating JSON in a format that most search libraries can digest. In the configuration file, you need to find the outputs setting and add "JSON" as one of the output formats for the home content type. It looks something like this:
[outputs]
home = [ "HTML", "RSS", "JSON" ]
This configuration change creates an index.json file in the root of your public folder whenever the Hugo site is built.
Here is an example of what a blog post might look like in this file:
{
  "uri": "https://www.simoahava.com/upcoming-talks/",
  "title": "Upcoming Talks",
  "tags": [],
  "description": "My upcoming conference talks and events",
  "content": "17 March 2018: MeasureCamp London 20 March 2018: SMX München 19 April 2018: Advanced GTM Workshop (Hamburg) 24 May 2018: NXT Nordic (Oslo) 20 September 2018: Advanced GTM Workshop (Hamburg) 14-16 November 2018: SMXL Milan I enjoy presenting at conferences and meetups, and I have a track record of hundreds of talks since 2013, comprising keynotes, conference presentations, workshops, seminars, and public trainings. Audience sizes have varied between 3 and 2,000.\nMy favorite topics revolve around web analytics development and analytics customization, but I'm more than happy to talk about integrating analytics into organizations, knowledge transfer, improving technical skills, digital marketing, and content creation.\nSome of my conference slides can be found at SlideShare.\nFor a sample, here's a talk I gave at Reaktor Breakpoint in 2015.\nYou can contact me at simo (at) simoahava.com for enquiring about my availability for your event.\n"
}
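Before wiring this file into the search backend, it can be handy to sanity-check that every entry in the generated index.json carries the fields the indexer will read. A minimal sketch, assuming the field names shown above (the sample entry is abbreviated):

```python
import json

# Abbreviated sample in the same shape Hugo emits: a JSON array of objects.
raw = """
[
  {
    "uri": "https://www.simoahava.com/upcoming-talks/",
    "title": "Upcoming Talks",
    "tags": [],
    "description": "My upcoming conference talks and events",
    "content": "17 March 2018: MeasureCamp London ..."
  }
]
"""

posts = json.loads(raw)
required = ("uri", "title", "tags", "description", "content")

# Flag any entry that is missing a field the indexer expects.
for i, post in enumerate(posts):
    missing = [f for f in required if f not in post]
    if missing:
        print("entry %d is missing: %s" % (i, ", ".join(missing)))
    else:
        print("entry %d ok: %s" % (i, post["title"]))
```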
2. Publishing script
The deployment script is a short piece of Bash that builds the Hugo website, copies the index.json file into my search project folder, and then deploys the entire search project to App Engine. This is what it looks like:
cd ~/Documents/Projects/www-simoahava-com/
rm -rf public
hugo
cp public/index.json ../www-simoahava-com-search/
rm -rf public
cd ~/Documents/Projects/www-simoahava-com-search/
gcloud app deploy
curl https://search-www-simoahava-com.appspot.com/update
The hugo command builds the site and creates the public folder. From the public folder, the index.json file is copied into my search project folder, which is then deployed to App Engine with gcloud app deploy. Finally, a curl request to my custom endpoint makes sure that my Python script updates the search index with the latest version of index.json.
3. Python code running in App Engine
In App Engine, I simply created a new project with an easy-to-remember name for the endpoint. I didn’t enable billing on the account, because I set myself the challenge of building a completely free search API for my site.
See this documentation for a quick-start guide on getting started with Python and App Engine. Focus especially on how to set up an App Engine project (you don’t need to enable billing) and how to install and configure the gcloud command-line tool for your project.
The Python code looks like this:
#!/usr/bin/python
from urlparse import urlparse
from urlparse import parse_qs
import json
import re

import webapp2
from webapp2_extras import jinja2

from google.appengine.api import search

# Index name for your search documents
_INDEX_NAME = 'search-www-simoahava-com'


def create_document(title, uri, description, tags, content):
    """Create a search document with an ID generated from the post title."""
    doc_id = re.sub(r'\s+', '', title)
    document = search.Document(
        doc_id=doc_id,
        fields=[
            search.TextField(name='title', value=title),
            search.TextField(name='uri', value=uri),
            search.TextField(name='description', value=description),
            search.TextField(name='tags', value=json.dumps(tags)),
            search.TextField(name='content', value=content)
        ]
    )
    return document


def add_document_to_index(document):
    index = search.Index(_INDEX_NAME)
    index.put(document)


class BaseHandler(webapp2.RequestHandler):
    """The other handlers inherit from this class.
    Provides some helper methods for rendering a template."""

    @webapp2.cached_property
    def jinja2(self):
        return jinja2.get_jinja2(app=self.app)


class ProcessQuery(BaseHandler):
    """Handles search requests."""

    def get(self):
        """Handles a GET request with a query."""
        uri = urlparse(self.request.uri)
        query = ''
        if uri.query:
            query = parse_qs(uri.query)
            query = query['q'][0]
        index = search.Index(_INDEX_NAME)
        compiled_query = search.Query(
            query_string=json.dumps(query),
            options=search.QueryOptions(
                sort_options=search.SortOptions(match_scorer=search.MatchScorer()),
                limit=1000,
                returned_fields=['title', 'uri', 'description']
            )
        )
        results = index.search(compiled_query)
        json_results = {'results': [], 'query': json.dumps(query)}
        for document in results.results:
            search_result = {}
            for field in document.fields:
                search_result[field.name] = field.value
            json_results['results'].append(search_result)
        self.response.headers.add('Access-Control-Allow-Origin', 'https://www.simoahava.com')
        self.response.write(json.dumps(json_results))


class UpdateIndex(BaseHandler):
    """Updates the index using index.json."""

    def get(self):
        with open('index.json') as json_file:
            data = json.load(json_file)
        for post in data:
            title = post.get('title', '')
            uri = post.get('uri', '')
            description = post.get('description', '')
            tags = post.get('tags', [])
            content = post.get('content', '')
            doc = create_document(title, uri, description, tags, content)
            add_document_to_index(doc)


application = webapp2.WSGIApplication(
    [('/', ProcessQuery),
     ('/update', UpdateIndex)],
    debug=True)
At the end, I bind requests to the / endpoint to ProcessQuery and requests to the /update endpoint to UpdateIndex. In other words, these are the two endpoints I serve.
UpdateIndex opens the index.json file, and for every single content piece inside (blog posts, pages, etc.), it grabs the relevant fields from the JSON and generates a search document for each item. Each document is then added to the index.
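One detail worth noting: the document ID is simply the post title with all whitespace stripped, so re-indexing the same post overwrites its existing document (and two distinct posts with identical titles would collide). A quick sketch of that derivation, mirroring the regex used in create_document:

```python
import re

def derive_doc_id(title):
    # Same transformation as in create_document: collapse away all
    # whitespace so the title becomes a compact, stable document ID.
    return re.sub(r'\s+', '', title)

print(derive_doc_id("Upcoming Talks"))  # UpcomingTalks
```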
This is how the Search API can be used to turn any JSON file into a valid search index, against which you can then run queries.
Queries are made by polling the /?q=<keyword> endpoint, where <keyword> is a valid query against the Search API’s query engine. Each query is processed by ProcessQuery, which takes the query term, polls the search index with that term, and then aggregates a result set from all the documents returned by the search index for that query (in sorted order). This result is then written into the JSON response to the client.
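To make the client-side handling concrete, here is a sketch of the response shape ProcessQuery writes back: a results array of flat objects (only the returned fields) plus an echo of the query. The field values below are hypothetical stand-ins for real search hits:

```python
import json

# Hypothetical search hits; in the real handler these come from
# index.search(compiled_query), limited to the returned_fields.
hits = [
    {"title": "Upcoming Talks",
     "uri": "https://www.simoahava.com/upcoming-talks/",
     "description": "My upcoming conference talks and events"}
]

# Mirror ProcessQuery's aggregation: wrap the hits and echo the query back.
json_results = {"results": hits, "query": json.dumps("talks")}
response_body = json.dumps(json_results)
print(response_body)
```

This is the string the browser receives and feeds through JSON.parse before rendering.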
The Search API gives you plenty of room for index optimization and for compiling complex queries. I’ve opted for a fairly simple approach, which may result in some odd outliers, like documents that should obviously be at the top of the list of relevant results ending up near the end, but I’m still very happy with how powerful the API is.
Finally, I need some client-side code to produce the search results page. Since Hugo doesn’t have a web server, I can’t do the search server side – it has to be done in the client. This is one case where a static site loses some of its luster when compared to its counterpart with a web server and server-side processing capabilities. The Hugo site is created and published once, so there is no dynamic generation of HTML pages after creation – everything has to happen in the client.
Anyway, the search form on my site is very simple. It just looks like this:
<form id="search" action="/search/">
  <input name="q" type="text" class="form-control input--xlarge" placeholder="Search blog..." autocomplete="off">
</form>
When the form is submitted, it makes a GET request to the /search/ page on my site, adding whatever was typed into the field as the q query parameter, so the URL becomes something like /search/?q=<keyword>.
(function($) {
  var printSearchResults = function(results) {
    // Update the page DOM with the search results...
  };

  var endpoint = 'https://search-www-simoahava-com.appspot.com';

  var getQuery = function() {
    if (!/[?&]q=/.test(window.location.search)) { return undefined; }
    var parts = window.location.search.substring(1).split('&');
    var query = parts.map(function(part) {
      var temp = part.split('=');
      return temp[0] === 'q' ? temp[1] : false;
    }).filter(function(item) { return item !== false; });
    return query[0];
  };

  $(document).ready(function() {
    var query = getQuery();
    if (typeof query === 'undefined') {
      printSearchResults();
      return;
    } else {
      $.get(endpoint + '?q=' + query, function(data) {
        printSearchResults(JSON.parse(data));
      });
    }
  });
})(window.jQuery);
To keep things simple, I’ve only included the relevant parts of the code here. In short, when the /search/ page loads, whatever is included as the value of the q query parameter is immediately sent to the search API endpoint. The response is then processed and rendered on the search results page.
So, if the page URL is https://www.simoahava.com/search/?q=google+tag+manager, the query is sent to https://search-www-simoahava-com.appspot.com/?q=google+tag+manager. You can visit that URL to see what the response looks like.
Once this response has been processed, the search results page is rendered.
This is how I chose to build on-site search using Hugo’s flexibility combined with the powerful search API offered by Google App Engine.
Based on the limited amount of testing I’ve done, it’s as good a solution as any: it seems pretty fast without compromising the power of the search query engine. However, as more content accumulates, it’s conceivable that the query engine will either become slower or start hitting the free-tier quotas, at which point I’ll need to rethink my approach.
The weak link at the moment is that everything is done on the client side. This means that, contrary to the philosophy of static sites, a lot of processing takes place in the browser. But I’m not sure how to avoid that, since a static site doesn’t give you server-side processing capabilities.
At this time, I’m willing to make a trade-off, but I’m eager to hear feedback if the search is inaccurate or not working properly for you.