How to Benchmark MongoDB with YCSB Workloads on All-Flash Block Storage?

Dawood Munavar Jun 03 - 5 min read

Audio : Listen to This Blog.

Introduction:

MongoDB is a very popular open-source scale-out NoSQL database. It powers modern big data analytics applications that require low latency read and writes, high availability, and advanced data management.

Usecases:

Key use cases for MongoDB includes real-time analytics, product catalogs, content management, and mobile applications.
The storage solution with MongoDB offers the following key benefits:

  1. Scalability provided by the storage array, which can modularly scale storage with MongoDB 
  2. Non-Disruptive Operations with high availability to deliver maximum uptime and stable performance, specifically during drive failure.
  3. Predictable high performance with low latency, providing excellent response time for the most demanding analytics applications built on MongoDB.

Summary:

This reference document showcases a validated end-to-end solution design for efficiently deploying a virtualized scale-out MongoDB NoSQL database on the block storage using VMware vSphere Hypervisor Environment.

Note: Storage array hosts MongoDB virtual machines (VMs) and databases.
mongodb-vm

MongoDB Architecture:

Sharding divides the dataset and distributes the data over multiple servers, or shards. Each shard is an independent database. Collectively, the shards make up a single logical database.

Shard stores data. In a production shard cluster, each shard own the responsibility of maintaining a replica set. This strategy facilitates high availability and data consistency.
mongodb-architecture

Storage Architecture:

storage-architecture

4 Key Steps to Remember before Benchmarking MongoDB with YCSB Workloads on All-Flash Block Storage

1.Server Configurations

  • 2 Fibre Channel 16Gbps ports.
  • ESXi 6.7 Installed on hosts.
  • Fibre Channel switch connectivity.

2.Storage controller Configuration

  • 2-node HA pair.
  • 4 Fibre Channel 16Gbps ports.
  • SSD Drives.

3.Configurations on ESXi Host

1.Created three volumes of each 500G on storage controller.
2.Mapped the three volumes to 3-node ESXi hosts from storage controller.
3.Created 3 VMFS6 Datastores on hosts (i.e one storage volume created as one VMFS6 Datastore)
esxi-host

  1. Created 8 Virtual Machines with 2vdisks (i.e 8 VMs resides in 3 Datastores).
  2. Size of vdisk1-16G Thin Prov (where RHEL OS is installed), vdisk2-500G Thin Prov (used as DB – Data disk)

vdisk

  1. Login to all 8 VMs, created partition on 500G disk (added as vdisk2), mount it on the hosts.
[root@localhost opt]# fdisk /dev/sdb
[root@localhost opt]# mkfs -t ext4 /dev/sdb
mke2fs 1.42.9 (28-Dec-2013)
[root@localhost ~]# mkdir -p /data/db1; mount /dev/sdb /data/db1
[root@localhost ~]# cat /etc/fstab | grep sdb
/dev/sdb1 /data/db1 ext4 defaults 0 0

4.Configuring MongoDB on VMs:

Prerequisites:

3 RHEL 7 VMs as Config Replica Sets
10.20.178.30  configsvr1 
10.20.178.92  configsvr2
10.20.178.96  configsvr3

4 RHEL 7 VMs as Shard Replica Sets
10.20.178.93 shardsvr1
10.20.178.95 shardsvr2
10.20.178.94 shardsvr3
10.20.178.46 shardsvr4
1 RHEL 7 VMs as mongos/Query Router
10.20.178.91 mongos

Step1: Edit the hosts file on each VMs,

[root@configsvr1 ~]# cat /etc/hosts
10.20.178.30 configsvr1
10.20.178.92 configsvr2
10.20.178.96 configsvr3
10.20.178.93 shardsvr1
10.20.178.95 shardsvr2
10.20.178.94 shardsvr3
10.20.178.46 shardsvr4
10.20.178.91 mongos

Step2: Install MongoDB on all instances/VMs

[root@configsvr1 ~]# mongod --version
db version v4.0.9
git version: fc525e2d9b0e4bceff5c2201457e564362909765
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: rhel70
distarch: x86_64
target_arch: x86_64

Step3: Create Config Server Replica Set

Change the DB storage path to your own directory in /etc/mongod.conf.
We will use '/data/db1' on all the VM’s,
storage:
 dbPath: /data/db1

Step4: Create the Shard Replica Sets

Change the default storage to your specific directory in /etc/mongod.conf.
storage:
 dbPath: /data/db1

Step 5 – Configure mongos/Query Router

mongos --configdb "replconfig01/configsvr1:27017,configsvr2:27017,configsvr3:27017"

Step 6 – Add shards to mongos/Query Router

mongo --host mongos --port 27017
sh.addShard( "shardreplica01/shardsvr1:27017")
sh.addShard( "shardreplica01/shardsvr2:27017")
sh.addShard( "shardreplica01/shardsvr3:27017")
sh.addShard( "shardreplica01/shardsvr4:27017")
mongos> sh.status()
--- Sharding Status ---

sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5cb8d38dd5a157f0f404f31f")
}
shards:
{  "_id" : "shardreplica01",  "host" : "shardreplica01/shardsvr1:27017,shardsvr2:27017,shardsvr3:27017,shardsvr4:27017",  "state" : 1 }
active mongoses:
"4.0.9" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled:  yes
Currently running:  no
Failed balancer rounds in last 5 attempts:  0
Migration Results for the last 24 hours:
No recent migrations
databases:
{  "_id" : "config",  "primary" : "config",  "partitioned" : true }

Performance Validation with YCSB:
YCSB is a popular Java open-source specification and program suite developed at Yahoo! to compare the relative performance of various NoSQL databases. Its workloads are used in various comparative studies of NoSQL databases
Workloads Used: A, B, C.

  • Workload A: Update heavy workload: 50/50% Mix of Reads/Writes
  • Workload B: Read mostly workload: 95/5% Mix of Reads/Writes
  • Workload C: Read-only: 100% reads.

The following command was used to run workload A,B & C where threads were 8, 16, 32, and 64:

/root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloada -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8
/root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloadb -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8
/root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloadc -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8
/root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloadd -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8

MongoDB

References:
https://www.howtoforge.com/tutorial/deploying-mongodb-sharded-cluster-on-centos-7/
https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
https://www.netapp.com/us/media/tr-4600.pdf

Leave a Reply

MSys has developed solutions for almost all sought-after platforms, programming languages, and operating systems. Deep expertise and innovation help us deliver projects in a highly cost-effective manner. We deploy engineers onsite or offsite based on specific requirements of the customer. Read our Storage Services Broacher.