Tag Archives: mongodb

MongoDB 3.0 upgrade in production : step 4 victory !

In my last post, I explained the new hope we had in following some newly added recommended steps before trying to migrate our production cluster to mongoDB 3.0 WiredTiger.

The most demanding step was migrating the data storage filesystems of all our production servers to XFS, which obviously required a resync of each node… But we got there pretty fast and were about to try again as 3.0.5 was getting ready, until we saw this bug coming !

I guess you can understand why we decided to wait for 3.0.6… which eventually got released with a more peaceful changelog this time.

The 3.0.6 crash test

We decided as usual to test the migration to WiredTiger in two phases.

  1. Migrate all our secondaries to the WiredTiger engine (full resync). Wait a week to see if this has any effect on our cluster.
  2. Switch all the MMAPv1 primary nodes to secondary and let our WiredTiger secondary nodes become the primary nodes of our cluster. Pray hard that this time it will not break under our workload.

Step 1 results were good, nothing major changed and even our mongo dumps were still working this time (yay!). One week later, everything was still working smoothly.

Step 2 was the big challenge which failed horribly last time. Needless to say, we were quite stressed when doing the switch. But it worked smoothly, nothing broke and the performance gains were huge !

The results

Nothing speaks better than metrics, so I’ll just comment on them quickly as they speak for themselves. Sorry, I obviously can’t disclose the scales.

Insert-only operations gained 25x performance

[Graph: insert-only operations]

Upsert-heavy operations gained 5x performance

[Graph: upsert-heavy operations]

Disk I/O also dropped, easing the overall load on the disks. This is due to WiredTiger’s superior caching and disk flushing mechanisms.

[Graph: disk I/O]

Disk usage decreased dramatically thanks to WiredTiger compression

[Graph: disk usage]

The last and next step

As of today, we still run our secondaries with the MMAPv1 engine and are waiting a few weeks to see if anything goes wrong in the long run. Should we need to roll back, we’d be able to do so very easily.

Then when we get enough uptime using WiredTiger, we will make the final switch to a full Roarring production cluster !

MongoDB 3.0 upgrade in production : step 3 hope

In our previous attempt to upgrade our production cluster to 3.0, we had to roll back from the WiredTiger engine on primary servers.

Since then, we switched our whole cluster back to 3.0 MMAPv1, which has given us better performance than 2.6 with no instability.

Production checklist

We decided to use this increase in performance to buy us some time to fulfil the entire production checklist from MongoDB, especially the migration to XFS. We’re slowly upgrading our servers’ kernels and resynchronising our data set after migrating from ext4 to XFS.

Ironically, the strong recommendation of XFS in the production checklist appeared 3 days after our failed attempt at WiredTiger… This is frustrating but gives some kind of hope.

I’ll keep on posting on our next steps and results.

Our hero WiredTiger Replica Set

While we were battling with our production cluster, we got a spontaneous major increase in the daily volumes from another platform which was running on a single Replica Set. This application is write intensive and very disk I/O bound. We were killing the disk I/O with almost a continuous 100% usage on the disk write queue.

Despite our frustration with WiredTiger so far, we decided to give it a chance considering that this time we were talking about a single Replica Set. We were very happy to see WiredTiger live up to its promises with an almost shocking serenity.

Disk I/O went down dramatically, almost as if nothing was happening any more. Compression did magic on our disk usage and our application went Roarrr !

MongoDB 3.0 upgrade in production : step 2 failed

In my previous post regarding the migration of our production cluster to mongoDB 3.0 WiredTiger, we successfully upgraded all the secondaries of our replica-sets with decent performance and (almost, read on) no breakage.

Step 2 plan

The next step of our migration was to test our work load on WiredTiger primaries. After all, this is where the new engine would finally demonstrate all its capabilities.

  • We thus scheduled a step down from our 3.0 MMAPv1 primary servers so that our WiredTiger secondaries would take over.
  • Not migrating the primaries was a safety net in case something went wrong… And boy it went so wrong we’re glad we played it safe that way !
  • We rolled back after 10 minutes of utter bitterness.

The failure

After all the wait and expectation, I can’t express our level of disappointment at work when we saw that the WiredTiger engine could not handle our workload. Our application immediately started throwing 50 to 250 WriteConflict errors per minute !

Turns out that we are affected by this bug and that, of course, we’re not the only ones. So far it seems that it affects collections with :

  • heavy insert / update work loads
  • a unique index (or compound index)

The breakages

We also discovered that we’re affected by a weird new mongodump behaviour where the dumped BSON file does not contain the number of documents that mongodump said it was exporting. This is clearly a new problem because it happened right after all our secondaries switched to WiredTiger.
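For readers who want to check their own dumps, here is a minimal sketch of how the number of documents in a .bson file can be counted independently of what mongodump reports. It relies only on the BSON layout, where every document starts with a little-endian int32 holding its total size. The function name is made up for this illustration and this is not the check we ran ourselves.

```python
import io
import struct


def count_bson_documents(stream):
    """Count the documents in a stream of concatenated BSON documents.

    Each BSON document starts with a little-endian int32 holding its total
    size in bytes (the 4 size bytes included), so we can hop from one
    document to the next without decoding any field.
    """
    count = 0
    while True:
        header = stream.read(4)
        if not header:
            return count  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated size header after %d documents" % count)
        (size,) = struct.unpack("<i", header)
        if size < 5:  # smallest valid document: 4 size bytes + 1 terminator
            raise ValueError("invalid document size %d" % size)
        body = stream.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document after %d documents" % count)
        count += 1


# Three empty BSON documents ({}), each encoded as 05 00 00 00 00.
sample = io.BytesIO(b"\x05\x00\x00\x00\x00" * 3)
print(count_bson_documents(sample))  # prints 3
```

On a real dump you would open the collection's .bson file in binary mode, pass the file object to the function and compare the result with the count printed by mongodump.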

Since we have to ensure strong consistency of our exports, and since the mongoDB guys don’t seem so keen on moving on the bug (which I surely can understand), there is a strong possibility that we’ll have to roll back even the WiredTiger secondaries altogether.

Not to mention that since the 3.0 version, we have been experiencing CPU overloads that crash the entire server on our MMAPv1 primaries, which we’re still trying to tackle before opening another JIRA bug…

Sad panda

Of course, any new major release such as 3.0 causes its headaches and brings its lot of bugs. We were ready for this hence the safety steps we took to ensure that we could roll back on any problem.

But as a long time advocate of mongoDB I must admit my frustration, even more after the time it took to get this 3.0 out and all the expectations that came with it.

I hope I can share some better news on the next blog post.

uWSGI, gevent and pymongo 3 threads mayhem

This is a quick heads-up post about a behaviour change when running a gevent based application using the new pymongo 3 driver under uWSGI and its gevent loop.

I was naturally curious about testing this brand new and major update of the python driver for mongoDB so I just played it dumb : update and give it a try on our existing code base.

The first thing I noticed instantly is that a vast majority of our applications were suddenly unable to reload gracefully and were force killed by uWSGI after some time !

worker 1 (pid: 9839) is taking too much time to die...NO MERCY !!!

uWSGI’s gevent-wait-for-hub

All our applications must be able to be gracefully reloaded at any time. Some of them spawn quite a few greenlets on their own, so as an added measure to make sure we never lose any running greenlet we use the gevent-wait-for-hub option, which is described as follows :

wait for gevent hub's death instead of the control greenlet

… which does not mean a lot but is explained in a previous uWSGI changelog :

During shutdown only the greenlets spawned by uWSGI are taken in account,
and after all of them are destroyed the process will exit.

This is different from the old approach where the process wait for
ALL the currently available greenlets (and monkeypatched threads).

If you prefer the old behaviour just specify the option gevent-wait-for-hub

pymongo 3

Compared to its previous 2.x versions, one of the key aspects of the new pymongo 3 driver is its intensive usage of threads to handle server discovery and connection pools.

Now we can relate this very fact to the gevent-wait-for-hub behaviour explained above :

the process wait for ALL the currently available greenlets
(and monkeypatched threads)

This explains why our applications were hanging until uWSGI’s reload-mercy (force kill) timeout kicked in !

conclusion

When using pymongo 3 with the gevent-wait-for-hub option, you have to keep in mind that all of pymongo’s threads (which are monkey-patched threads) are considered active greenlets, so uWSGI will wait for them to terminate before recycling the worker !

Two options come to mind to handle this properly :

  1. stop using the gevent-wait-for-hub option and change your code to use a gevent pool group to make sure that all of your important greenlets are taken care of when a graceful reload happens (this is how we do it today, the gevent-wait-for-hub option usage was just over protective for us).
  2. modify your code to properly close all your pymongo connections on graceful reloads.
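To give an idea of what the second option could look like, here is a rough sketch of a small registry that remembers every client the application creates and closes them all in one call. The ClientRegistry and FakeClient names are made up for this illustration. In a real application the registered objects would be pymongo MongoClient instances, whose close() method closes the pooled sockets and stops the monitor threads. Which graceful-reload hook should call close_all() depends on your uWSGI setup and is left as an assumption here.

```python
class ClientRegistry:
    """Keeps track of database clients so they can all be closed at once."""

    def __init__(self):
        self._clients = []

    def register(self, client):
        # `client` is anything exposing close(); pymongo's MongoClient does.
        self._clients.append(client)
        return client

    def close_all(self):
        # Close every registered client and return how many were closed.
        closed = 0
        while self._clients:
            self._clients.pop().close()
            closed += 1
        return closed


class FakeClient:
    """Stand-in for pymongo.MongoClient so the sketch runs without a server."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


registry = ClientRegistry()
client = registry.register(FakeClient())
registry.close_all()
print(client.closed)  # prints True
```

The point of the registry is simply that the reload hook does not need to know where each client was created.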

Hope this will save some people the trouble of debugging this 😉

MongoDB 3.0 upgrade in production : first steps

We’ve been running a nice mongoDB cluster in production for several years now in my company.

This cluster suits quite a wide range of use cases from very simple configuration collections to complex queried ones and real time analytics. This versatility has been the strong point of mongoDB for us since the start as it allows different teams to address their different problems using the same technology. We also run some dedicated replica sets for other purposes and network segmentation reasons.

We’ve waited a long time to see the features of the 3.0 release land. The new WiredTiger storage engine arrived at the right time for us since we had reached the limits of our main production cluster and were considering alternatives.

So, surprising as it may seem, this cluster is the first part of our mongoDB architecture we’re upgrading to v3.0, as it has become a real necessity.

This post is about sharing our first experience about an ongoing and carefully planned major upgrade of a production cluster and does not claim to be a definitive migration guide.

Upgrade plan and hardware

The upgrade process is well covered in the mongoDB documentation already but I will list the pre-migration base specs of every node of our cluster.

  • mongodb v2.6.8
  • RAID1 spinning HDD 15k rpm for the OS (Gentoo Linux)
  • RAID10 4x SSD for mongoDB files under LVM
  • 64 GB RAM

Our overall philosophy is to keep most of the configuration parameters to their default values to start with. We will start experimenting with them when we have sufficient metrics to compare with later.

Disk (re)partitioning considerations

The master-gets-all-the-writes architecture is still one of the main limitations of mongoDB and this does not change with v3.0, so you obviously need to challenge your current disk layout to take advantage of the new WiredTiger engine.

mongoDB 2.6 MMAPv1

Considering our cluster data size, we were forced to use our 4 SSD in a RAID10 as it was the best compromise to preserve performance while providing sufficient data storage capacity.

We often reached the limits of our I/O and moved the journal out of the RAID10 to the mostly idle OS RAID1 with no significant improvements.

mongoDB 3.0 WiredTiger

The main consideration for us is the new feature allowing indexes to be stored in a separate directory. So we anticipated the reduction in data storage consumption thanks to snappy compression and decided to split our RAID10 into two dedicated RAID1 arrays.

Our test layout so far is :

  • RAID1 SSD for the data
  • RAID1 SSD for the indexes and journal
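As an illustration of the settings involved, the sketch below renders the mongod options that select WiredTiger, snappy block compression and a separate directory for indexes. The option names are the ones from the MongoDB 3.0 configuration file format. The dbPath value and the render_yaml helper are assumptions made up for this example, and this is not our exact configuration. With directoryForIndexes enabled, mongod keeps indexes in an index subdirectory under dbPath, which can then be placed on the second RAID1 (for example with a mount point or symlink).

```python
def render_yaml(options, indent=0):
    """Render a nested dict of scalars as YAML block mappings (list of lines)."""
    lines = []
    pad = "  " * indent
    for key, value in options.items():
        if isinstance(value, dict):
            lines.append("%s%s:" % (pad, key))
            lines.extend(render_yaml(value, indent + 1))
        elif isinstance(value, bool):
            lines.append("%s%s: %s" % (pad, key, "true" if value else "false"))
        else:
            lines.append("%s%s: %s" % (pad, key, value))
    return lines


mongod_options = {
    "storage": {
        "dbPath": "/var/lib/mongodb",  # hypothetical path on the data RAID1
        "engine": "wiredTiger",
        "wiredTiger": {
            # indexes go to <dbPath>/index, collections to <dbPath>/collection
            "engineConfig": {"directoryForIndexes": True},
            # snappy is already the default block compressor, shown for clarity
            "collectionConfig": {"blockCompressor": "snappy"},
        },
    },
}

print("\n".join(render_yaml(mongod_options)))
```

The printed text can be pasted into a YAML formatted mongod configuration file.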

Our first node migration

After migrating our mongos and config servers to 3.0, we picked our worst performing secondary node to test the actual migration to WiredTiger. After all, we couldn’t do worse right ?

We are aware that the strong suit of WiredTiger is actually about having the writes directed to it and will surely share our experience of this aspect later.

compression is bliss

To make this comparison accurate, we resynchronized this node totally before migrating to WiredTiger so we could compare a non fragmented MMAPv1 disk usage with the WiredTiger compressed one.

While I can’t disclose the actual values, compression worked like a charm for us with a gain ratio of 3.2 on disk usage (data + indexes) which is way beyond our expectations !

This is the DB Storage graph from MMS, showing a gain ratio of 4, surely due to indexes being on a separate disk now.
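To show how a ratio of 3.2 on disk and a ratio of 4 on the graph can both be right, here is a small worked example. The sizes are entirely hypothetical (arbitrary units picked to reproduce the two ratios) and are not our real figures. The assumption is that the graph only accounts for the data directory once the indexes live elsewhere.

```python
def gain_ratio(before, after):
    """How many times smaller the storage footprint became."""
    return before / after


# Hypothetical sizes in arbitrary units, NOT the real values.
mmapv1_total = 320.0  # MMAPv1: data + indexes stored together
wt_data = 80.0        # WiredTiger: compressed data
wt_indexes = 20.0     # WiredTiger: indexes, now on a separate disk

on_disk = gain_ratio(mmapv1_total, wt_data + wt_indexes)
on_graph = gain_ratio(mmapv1_total, wt_data)  # indexes no longer counted

print(on_disk, on_graph)  # prints 3.2 4.0
```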

[Graph: DB Storage, from MMS]

memory usage

As with the disk usage, the node had been running hot on MMAPv1 before the actual migration so we can compare memory allocation/consumption of both engines.

There again the memory management of WiredTiger and its cache shows great improvement. For now, we left the default setting which has WiredTiger limit its cache to half the available memory of the system. We’ll experiment with this setting later on.
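As a quick worked example of that default, the sketch below computes the cache size for a node with 64 GB of RAM like ours. The 1 GB floor is an assumption based on the MongoDB 3.0 documentation (the larger of half the RAM or 1 GB), and later releases use a different formula. The function name is made up for this illustration.

```python
def default_wiredtiger_cache_gb(ram_gb):
    """Default WiredTiger cache size under the 3.0 rule described above.

    Assumption: half of the physical RAM, with a floor of 1 GB.
    """
    return max(ram_gb / 2.0, 1.0)


print(default_wiredtiger_cache_gb(64))  # prints 32.0
```

The value can be overridden with the storage.wiredTiger.engineConfig.cacheSizeGB setting (or the --wiredTigerCacheSizeGB command line option), which is the knob to experiment with.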

[Graph: memory usage]

connections

I’m still not sure of the actual cause, but the connection count is higher and steadier than before on this node.

[Graph: connections]

First impressions

The node has been running smoothly for several hours now. We are getting acquainted with the new metrics and statistics from WiredTiger. The overall node and I/O load is better than before !

While all the above graphs show huge improvements, there is no major change from our applications’ point of view. We didn’t expect any, since this is only one node in a whole cluster and the main benefits will come from the master node migrations.

I’ll continue to share our experience and progress about our mongoDB 3.0 upgrade.

mongoDB 3.0.1

This is a long-awaited version bump coming to portage and I’m glad to announce it made its way to the tree today !

First of all, many thanks to Tomas Mozes and Darko Luketic for their amazing help, feedback and patience !

mongodb-3.0.1

I introduced quite some changes in this ebuild which I wanted to share with you and warn you about. MongoDB upstream have stripped quite a bunch of things out of the main mongo core repository which I have in turn split into ebuilds.

Major changes :

  • respect upstream’s optimization flags : unless in debug build, user’s optimization flags will be ignored to prevent crashes and weird behaviour.
  • shared libraries for C/C++ are not built by the core mongo repository anymore, so I removed the static-libs USE flag.
  • various dependency optimizations to trigger a rebuild of mongoDB when one of its linked dependencies changes.

app-admin/mongo-tools

The new tools USE flag allows you to pull a new ebuild named app-admin/mongo-tools which installs the commands listed below. Obviously, you can now just install this package if you only need those tools on your machine.

  • mongodump / mongorestore
  • mongoexport / mongoimport
  • mongotop
  • mongofiles
  • mongooplog
  • mongostat
  • bsondump

app-admin/mms-agent

The MMS agent now has some real version numbers and I don’t have to host their source on Gentoo’s infra woodpecker. At the moment only the monitoring agent is available; should anyone request the backup one, I’ll be glad to add support for it too.

dev-libs/mongo-c(xx)-driver

I took this opportunity to add the dev-libs/mongo-cxx-driver to the tree and bump the mongo-c-driver one. Thank you to Balint SZENTE for his insight on this.

mongoDB v2.6.1

It is a great pleasure to announce the version bump of mongoDB to the brand new v2.6 stable branch !

This bump is not trivial and comes with a lot of changes, please read carefully as you will have to modify your mongodb configuration files !

ebuild changes

Following a long-standing request, and to be more in line with upstream’s recommendations (and systemd support), I moved the configuration of the mongoDB daemons to /etc, so make sure to adapt to the new YAML format.

  • the mongodb configuration moved from /etc/conf.d/mongodb to the new YAML formatted /etc/mongodb.conf
  • the mongos configuration moved from /etc/conf.d/mongos to the new YAML formatted /etc/mongos.conf
  • the MMS agent configuration file has moved to /etc/mms-agent.conf

The init scripts also have been taken care of :

  • new and modern mongodb, mongos and mms-agent init scripts
  • their /etc/conf.d/ configuration files are only used to modify the init script’s behavior

highlights

The changelog is long and the goal of this post is not to repeat what the release notes already cover well, but here are my favorite features :

  • MongoDB preserves the order of the document fields following write operations.
  • A new write protocol integrates write operations with write concerns. The protocol also provides improved support for bulk operations.
  • MongoDB can now use index intersection to fulfill queries supported by more than one index.
  • Index Filters to limit which indexes can become the winning plan for a query.
  • Background index build allowed on secondaries.
  • New cleanupOrphaned command to remove orphaned documents from a shard.
  • usePowerOf2Sizes is now the default allocation strategy for all new collections.
  • Removed the upper limit of 20 000 connections on maxIncomingConnections for mongod and mongos.
  • New cursor.maxTimeMS() and corresponding maxTimeMS option for commands to specify a time limit.

Make sure you follow the official upgrade plan when upgrading from a previous version: this release is not a simple drop-in replacement.

thanks

Special thanks go to Johan Bergström for his continuous efforts and responsiveness as well as Mike Limansky and Jason A. Donenfeld.

mongoDB 2.4.10 & pymongo 2.7

I’m pleased to announce these latest mongoDB related bumps. The next version bump will be for the brand new mongoDB 2.6, for which I’ll add some improvements to the Gentoo ebuild so stay tuned 😉

mongodb-2.4.10

  • fixes some memory leaks
  • start elections if more than one primary is detected
  • fixes issues about indexes building and replication on secondaries
  • the GridFS chunk size is decreased to 255 KB (from 256 KB) to avoid overhead with the usePowerOf2Sizes option
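The reasoning behind the 255 KB chunk size is easier to see with numbers. The sketch below assumes the chunks in question are GridFS chunks and that usePowerOf2Sizes rounds each record up to the next power of two. The 100 bytes of per-chunk overhead (metadata fields and record header) is a hypothetical figure chosen only for illustration.

```python
def power_of_2_allocation(record_size):
    """Smallest power of two (in bytes) that can hold record_size bytes."""
    return 1 << (record_size - 1).bit_length()


KB = 1024
OVERHEAD = 100  # hypothetical: chunk metadata fields + record header

for chunk_kb in (256, 255):
    record = chunk_kb * KB + OVERHEAD
    allocated = power_of_2_allocation(record)
    wasted = allocated - record
    print("%d KB chunk -> %d KB allocated, %d bytes wasted"
          % (chunk_kb, allocated // KB, wasted))
```

A 256 KB chunk plus any overhead no longer fits in a 256 KB record and gets a 512 KB one, wasting almost half of it, while a 255 KB chunk leaves room for the overhead inside 256 KB.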

Full mongodb-2.4.10 changelog here.

pymongo-2.7

  • of course, the main feature is the mongoDB 2.6 support
  • new bulk write API (I love it)
  • much improved concurrency control for MongoClient
  • support for GridFS queries

Full pymongo-2.7 changelog here.

mongoDB v2.4.9/v2.2.7, rabbitMQ v3.2.3

Quick post about some recent bumps.

mongodb-2.4.9 & mongodb-2.2.7

IMPORTANT : These versions fix a mongos bug which could lead it to report a write as successful when it was not. This affects all versions of MongoDB prior to and including v2.4.8.

Stay tuned on mongoDB, the next post will probably talk about the release of pymongo v2.7 which supports some neat features from the upcoming mongoDB v2.6 series.

rabbitMQ-3.2.3

I skipped a bump post when releasing the v3.2.2 so you should check out the v3.2.3 changelog as well if you’re willing to know more about those bug fix releases.

mongoDB v2.4.8, rabbitMQ v3.2.1, rsyslog v7.4.6

mongodb-2.4.8

You should consider this important update if you have a cluster running v2.4.7. It contains a fix for config servers that could disagree on chunk hashes and thus prevent mongos from starting or balancing from happening. See this bug for more info.

rabbitMQ-3.2.1

The famous message queuing server got a nice bunch of bug fixes on a lot of its modules along with some interesting additions such as :

  • support for federated queues
  • report client authentication errors during connection establishment explicitly using connection.close
  • inform clients when memory or disk alarms are set or cleared
  • allow policies to target queues or exchanges or both
  • offer greater control over threshold at which messages are paged to disk
  • allow missing exchanges & queues to be deleted and unbound without generating an AMQP error
  • implement consumer priorities

Full changelog here and here.

rsyslog-7.4.6

This is a bug fix release, nothing too big about it as reported by Thomas D (thanks again).

Please note that rsyslog-7.4.4 is being stabilized, mainly for security purposes.