Code Like A Girl

5 Different Ways to Synchronize Data from MongoDB to Elasticsearch

#elasticsearch + #mongoDB by @apurva1590

You will often need to migrate data from MongoDB to Elasticsearch in bulk. Elasticsearch provides full-text search over your data, while MongoDB excels at storing it, so using MongoDB as the data store and Elasticsearch for search is a common architecture. This tutorial describes five tools or plugins for quickly copying or synchronizing data from MongoDB to Elasticsearch.

1. mongo-connector:

mongo-connector is a real-time sync service distributed as a Python package. It is a generic connection system that you can use to integrate MongoDB with another system that has simple CRUD operational semantics. It creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster.

mongo-connector requires MongoDB to run in replica-set mode. It first syncs the existing data in MongoDB to the target, then tails the MongoDB oplog to keep up with operations in real time. To write data to Elasticsearch, it needs an additional package named “elastic2_doc_manager”.

mongo-connector copies your documents from MongoDB to your target system. Afterwards, it constantly performs updates on the target system to keep MongoDB and the target in sync. The connector supports both Sharded Clusters and standalone Replica Sets, hiding the internal complexities such as rollbacks and chunk migrations.
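
The two phases described above (an initial copy, then continuous replay of the oplog) can be sketched in plain Python. This is only an illustration of the idea, not mongo-connector's actual code: the function names are hypothetical, the target is an in-memory dict, timestamps are plain integers instead of BSON timestamps, and updates are simplified to a merge of the changed fields.

```python
from typing import Any, Dict, Iterable

Doc = Dict[str, Any]


def initial_copy(source_docs: Iterable[Doc], target: Dict[Any, Doc]) -> None:
    """Phase 1: copy every existing document into the target, keyed by _id."""
    for doc in source_docs:
        target[doc["_id"]] = dict(doc)


def apply_oplog(entries: Iterable[Doc], target: Dict[Any, Doc], last_ts: int) -> int:
    """Phase 2: replay operations newer than last_ts; return the new checkpoint."""
    for entry in entries:
        if entry["ts"] <= last_ts:
            continue  # already applied before the checkpoint
        op = entry["op"]
        if op == "i":  # insert: "o" holds the full document
            target[entry["o"]["_id"]] = dict(entry["o"])
        elif op == "u":  # update (simplified): "o2" holds the _id, "o" the changes
            doc_id = entry["o2"]["_id"]
            target.setdefault(doc_id, {"_id": doc_id}).update(entry["o"])
        elif op == "d":  # delete: "o" holds the _id
            target.pop(entry["o"]["_id"], None)
        last_ts = entry["ts"]
    return last_ts


if __name__ == "__main__":
    target: Dict[Any, Doc] = {}
    initial_copy([{"_id": 1, "title": "a"}, {"_id": 2, "title": "b"}], target)
    oplog = [
        {"ts": 5, "op": "i", "o": {"_id": 3, "title": "c"}},
        {"ts": 6, "op": "u", "o2": {"_id": 1}, "o": {"title": "A"}},
        {"ts": 7, "op": "d", "o": {"_id": 2}},
    ]
    checkpoint = apply_oplog(oplog, target, last_ts=4)
    print(checkpoint, target)
```

Keeping the checkpoint is what lets the second phase resume after a restart without replaying operations that were already applied.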

Reference link is: mongo-connector

2. elasticsearch-river-mongodb:

Elasticsearch lets you extend its basic functionality with plugins, which are easy to use and develop. Plugins can be used for analysis, discovery, monitoring, data synchronization, and many other purposes. Rivers are a group of plugins used for data synchronization between a database and Elasticsearch.

The MongoDB river plugin for data synchronization is named “elasticsearch-river-mongodb”.

When a document is inserted into MongoDB, the database is created (if it doesn’t exist), along with the schema for that particular record, and then the data is stored. When more data comes in, the schema is updated. If MongoDB is configured as a replica set, each inserted document is also recorded in the oplog collection. The oplog is an operations log configured as a capped collection, which keeps a rolling record of all operations that modify the data stored in the databases.

The river plugin monitors this collection and forwards new operations to Elasticsearch according to its configuration. That means that all insert, update, and delete operations are forwarded to Elasticsearch automatically. If the target index is missing, it is created automatically with the default configuration when data is indexed in Elasticsearch.
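
To make the forwarding step concrete, here is a minimal sketch of how oplog entries could be translated into the newline-delimited body that the Elasticsearch bulk API accepts. It is not the river plugin's code. The helper names and the index name are hypothetical, and update entries are assumed to carry their changed fields under `$set`, which is a simplification of the real oplog format.

```python
import json
from typing import Any, Dict, Iterable, List

OplogEntry = Dict[str, Any]


def oplog_to_bulk_lines(entry: OplogEntry, index: str) -> List[str]:
    """Translate one oplog entry into the JSON lines of a bulk request."""
    op = entry["op"]
    if op == "i":  # insert -> "index" action followed by the document source
        doc_id = str(entry["o"]["_id"])
        source = {k: v for k, v in entry["o"].items() if k != "_id"}
        return [
            json.dumps({"index": {"_index": index, "_id": doc_id}}),
            json.dumps(source),
        ]
    if op == "u":  # update -> "update" action followed by a partial document
        doc_id = str(entry["o2"]["_id"])
        changes = entry["o"].get("$set", {})  # assumption: changes live in $set
        return [
            json.dumps({"update": {"_index": index, "_id": doc_id}}),
            json.dumps({"doc": changes}),
        ]
    if op == "d":  # delete -> single "delete" action line
        doc_id = str(entry["o"]["_id"])
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]
    return []  # other entry types (no-ops, commands) are ignored in this sketch


def build_bulk_body(entries: Iterable[OplogEntry], index: str) -> str:
    """Join all action lines; the bulk body must end with a newline."""
    lines: List[str] = []
    for entry in entries:
        lines.extend(oplog_to_bulk_lines(entry, index))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"op": "i", "o": {"_id": 1, "title": "hello"}},
        {"op": "u", "o2": {"_id": 1}, "o": {"$set": {"title": "hi"}}},
        {"op": "d", "o": {"_id": 1}},
    ]
    print(build_bulk_body(sample, "articles"), end="")
```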

Reference link is: elasticsearch-river-mongodb

3. Logstash:

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice.

We can take advantage of Logstash’s buffering, input, output, and filtering capabilities by adding a MongoDB input plugin and an Elasticsearch output plugin to get this job done. The JDBC input plugin is another option, but it requires a JDBC driver.

The Logstash event processing pipeline has three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter. Logstash accelerates your insights by harnessing a greater volume and variety of data.
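
The three-stage flow can be mimicked with Python generators to show how events move through it. This is a conceptual sketch only: Logstash itself is configured with its own pipeline files rather than Python, and every function name here is hypothetical. The input decodes JSON lines (playing the role of a codec), the filter renames the MongoDB `_id` field, and the output appends to a list standing in for Elasticsearch.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List

Event = Dict[str, Any]
Filter = Callable[[Iterable[Event]], Iterator[Event]]


def json_lines_input(lines: Iterable[str]) -> Iterator[Event]:
    """Input stage: decode each raw JSON line into an event (the 'codec')."""
    for line in lines:
        yield json.loads(line)


def rename_id_filter(events: Iterable[Event]) -> Iterator[Event]:
    """Filter stage: move the MongoDB _id into a plain mongo_id string field."""
    for event in events:
        event = dict(event)  # copy so the caller's data is not modified
        event["mongo_id"] = str(event.pop("_id"))
        yield event


def list_output(events: Iterable[Event], sink: List[Event]) -> None:
    """Output stage: ship every event to the sink."""
    for event in events:
        sink.append(event)


def run_pipeline(raw: Iterable[str], filters: List[Filter], sink: List[Event]) -> None:
    """Chain inputs -> filters -> outputs; events are processed lazily."""
    events: Iterable[Event] = json_lines_input(raw)
    for apply_filter in filters:
        events = apply_filter(events)
    list_output(events, sink)


if __name__ == "__main__":
    sink: List[Event] = []
    run_pipeline(['{"_id": 1, "title": "a"}'], [rename_id_filter], sink)
    print(sink)
```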

Reference link is: Logstash

4. Transporter:

Transporter is a good choice when you want to export MongoDB data to another Elasticsearch server. It can also move data from or to other types of data stores.

This is a wonderful open source utility tool, developed by Compose (a cloud platform for databases), that takes care of this task very efficiently.

It’s important to know that Transporter synchronizes only once: when the job is done, the process exits.
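
The run-once behavior can be pictured as a batch export that reads the source to the end and then returns. The sketch below is not Transporter's implementation or configuration format; the helper names and the batch size are assumptions chosen for illustration, and the writer is any callable that accepts a list of documents.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def batched(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Group documents into lists of at most `size` items."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def export_once(source: Iterable[Doc],
                write_batch: Callable[[List[Doc]], None],
                size: int = 500) -> int:
    """Copy everything from source in batches, then finish; no tailing."""
    total = 0
    for batch in batched(source, size):
        write_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    received: List[List[Doc]] = []
    count = export_once(({"_id": i} for i in range(5)), received.append, size=2)
    print(count, [len(batch) for batch in received])
```

Because nothing watches the source after the last batch, any document written to MongoDB later needs another run to reach Elasticsearch.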

Reference link is: Transporter

5. Mongoosastic:

When Node.js is the web server, we can use the Mongoosastic module to store data on both sides. When a document needs to be stored, Mongoosastic can commit the change to both MongoDB and Elasticsearch.

The advantage is that data is stored in MongoDB and Elasticsearch at the same time. The downsides are that create, update, and delete (CUD) operations incur extra overhead, that the two stores can become inconsistent when a write to one of them fails, and that tying synchronization to the server framework makes a later database migration less flexible.
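
The inconsistency risk is easier to see in a small dual-write sketch. It is written in Python for illustration and does not reflect Mongoosastic's API, which is a Node.js module. `DualWriter` and its callables are hypothetical names. The writer stores the document in the primary store first and records the id whenever the search index write fails, so a later job could re-index it.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


class DualWriter:
    """Writes each document to a primary store, then to a search index."""

    def __init__(self,
                 save_primary: Callable[[Doc], None],
                 index_search: Callable[[Doc], None]) -> None:
        self.save_primary = save_primary
        self.index_search = index_search
        self.pending: List[Any] = []  # ids stored in primary but not indexed

    def save(self, doc: Doc) -> bool:
        """Return True if both writes succeeded, False if only the primary did."""
        self.save_primary(doc)  # if this raises, nothing has been written
        try:
            self.index_search(doc)
        except Exception:
            # The stores now disagree; remember the id for a later re-index.
            self.pending.append(doc["_id"])
            return False
        return True


if __name__ == "__main__":
    primary: Dict[Any, Doc] = {}
    index: Dict[Any, Doc] = {}

    def save_primary(doc: Doc) -> None:
        primary[doc["_id"]] = doc

    def flaky_index(doc: Doc) -> None:
        if doc["_id"] == 2:
            raise RuntimeError("index unavailable")
        index[doc["_id"]] = doc

    writer = DualWriter(save_primary, flaky_index)
    print(writer.save({"_id": 1}), writer.save({"_id": 2}), writer.pending)
```

Every save pays for two writes, which is the overhead mentioned above, and the `pending` list makes visible the window in which the two stores hold different data.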

Reference link is: Mongoosastic