Async your search using Elasticsearch

Siva Gollapalli
2 min read · Aug 19, 2022

Most of you know what Elasticsearch is; if not, it is worth reading the official introduction before proceeding. Generally, we use it to implement search functionality in an application. Elasticsearch can seamlessly handle vast amounts of data and provides a useful API to search through that data. But, because of that enormous amount of data, it can sometimes take a long time to produce results. In those cases, we can make the search asynchronous and fetch the results later, whenever we want. We don't need any additional libraries for this: Elasticsearch has supported async search out of the box since version 7.7. Let's see how we can use it in a Ruby application.

To interact with an Elasticsearch cluster, we use the official Ruby client:

gem install elasticsearch

Note: For now we can do this in plain Ruby applications but not in Rails applications, since the elasticsearch-rails dependencies haven't kept up with the latest Ruby client. Let's see some code now:

# creates 10k records in Elasticsearch
require 'faker'
require 'elasticsearch/persistence'
require 'elasticsearch'

class Movie
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end

class MyRepository
  include Elasticsearch::Persistence::Repository
end

client = Elasticsearch::Client.new(url: 'http://localhost:9200')
repository = MyRepository.new(client: client)

10_000.times do
  movie = Movie.new(title: Faker::Movie.title)
  repository.save(movie)
  p 'saving...'
end

Let's make our search asynchronous:

require 'elasticsearch'
require 'pry'
require 'faker'

client = Elasticsearch::Client.new(url: 'http://localhost:9200')

q = {
  "query": {
    "match": {
      "title": Faker::Movie.title.split(' ').last
    }
  }
}

response = client.async_search.submit(body: q)

if response.body['is_running']
  id = response.body['id']
  loop do
    p 'searching.....'
    response = client.async_search.get(id: id)
    sleep 2
    pp response
    break unless response.body['is_running']
  end
end

pp response.body['response']['hits']

client.async_search.submit submits our query to Elasticsearch. If the results are available immediately, it returns them right away; otherwise it returns a unique identifier with which we can pull the results later. Here is a sample response:

#<Elasticsearch::API::Response:0x00007faa189c99f0
 @response=
  #<Elastic::Transport::Transport::Response:0x00007faa189c9a40
   @body=
    {"id"=>"Fnd2T2xYWlJqUXYtb2R1UkNZeG52OFEcZ3gyX1dfNUdRdE9LbmkzdVpHalVRUToxNDQ1NQ==",
     "is_partial"=>true,
     "is_running"=>true,
     "start_time_in_millis"=>1660647145813,
     "expiration_time_in_millis"=>1661079145813,
     "response"=>
      {"took"=>1028,
       "timed_out"=>false,
       "terminated_early"=>false,
       "num_reduce_phases"=>0,
       "_shards"=>{"total"=>1, "successful"=>0, "skipped"=>0, "failed"=>0},
       "hits"=>
        {"total"=>{"value"=>0, "relation"=>"gte"},
         "max_score"=>nil,
         "hits"=>[]}}},
   @headers=
    {"x-elastic-product"=>"Elasticsearch",
     "content-type"=>"application/json",
     "content-length"=>"317"},
   @status=200>>
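The submit call also accepts a wait_for_completion_timeout option: Elasticsearch waits up to that long for the search to finish before going async, and keep_on_completion: true stores the result even when the search completes within the timeout, so it can still be fetched by id later. A minimal sketch of how such a request could be assembled (the "Club" query term and the parameter values here are assumptions for illustration):

```ruby
require 'json'

# Hypothetical query: match movies whose title contains "Club".
q = {
  query: { match: { title: 'Club' } }
}

# Options for client.async_search.submit. If the search finishes within
# wait_for_completion_timeout, the response comes back synchronously;
# keep_on_completion: true tells Elasticsearch to store the result anyway.
submit_params = {
  body: q,
  wait_for_completion_timeout: '2s',
  keep_on_completion: true
}

# With a configured client this would be:
#   response = client.async_search.submit(**submit_params)
puts JSON.generate(submit_params[:body])
# => {"query":{"match":{"title":"Club"}}}
```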

We can also retrieve partial results whenever they are available, using client.async_search.get(id: id). Once we are done with our processing, we can delete the search using client.async_search.delete(id: id). By default, results are kept for 5 days, which can be overridden with the keep_alive parameter. Also, Elasticsearch throws an error if the size of the stored results exceeds 10MB, which can be changed via the max_async_search_response_size cluster setting.
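The retention and cleanup knobs mentioned above can be sketched like this (the id here is a placeholder, not a real search id; keep_alive and the delete call are part of the async search API):

```ruby
# Placeholder id — in practice this comes from response.body['id']
# after a submit call.
id = 'PLACEHOLDER_ASYNC_SEARCH_ID'

# Passing keep_alive when fetching extends how long Elasticsearch keeps
# the stored results (the default is 5 days).
get_params = { id: id, keep_alive: '10d' }

# With a configured client:
#   client.async_search.get(**get_params)   # fetch (possibly partial) results
#   client.async_search.delete(id: id)      # free stored results when done
puts get_params[:keep_alive]
# => 10d
```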

I hope this article gives you a good idea of how to use async search. Any comments/suggestions are welcome.

Happy asyncing!!! :)
