The current CouchDB ecosystem

Over the last few weeks I have started to familiarise myself with the current state of the CouchDB ecosystem. I hope that this will be the first of several posts in which I detail some of the things I have been able to learn about CouchDB so far, and what we learn in the future once we are finally able to put this database into production.

With all the different companies, similar-sounding project names and technology buzzwords surrounding CouchDB, it can be confusing, and some people have found it difficult to sift through the jargon and come away with a firm grasp of exactly who and what is behind CouchDB.

Let’s start off by defining some of the players and giving an overview of what they provide:

1) Apache CouchDB – an open source, document-oriented database. It is part of the new breed of databases commonly referred to as NoSQL datastores.  CouchDB uses JSON to store documents, and you can interact with the database using a RESTful JSON API.

2) Couchbase – a company that provides software and enterprise support for several projects that are based on the CouchDB source code. Let’s take a look at a few of these projects in more detail:

  • Membase Server (Couchbase) – an elastic, distributed, key-value database management system optimized for storing data for web applications. For anyone already familiar with memcached, Membase is basically memcached on steroids: it allows you to build a distributed cluster of memcached instances, and it provides options for both persistent and non-persistent storage. Another nice feature provided by Membase is a web GUI that displays all sorts of useful statistics that can help you understand exactly what is going on with your servers.
  • Couchbase Single Server – the software package that you would download if you were looking for a replacement for (or an equivalent to) CouchDB. This is basically the stock Apache CouchDB source code, with the GeoCouch extension enabled, as well as some additional patches provided by the developers who work for Couchbase.
  • Couchbase Server 2.0 – represents the future of both the Couchbase and Membase codebases. Server 2.0 basically removes the SQLite backend that is currently being used by Membase, and replaces it with CouchDB. At this point Server 2.0 has been released with a ‘Developer Preview’ status, and thus I do not believe it is quite ready for production use. Currently Couchbase Server 2.0 only allows you to access the data stored in the backend CouchDB database using the memcached protocol (you have to go through memcached to access the data stored in CouchDB). Future versions (3.0, etc) promise to allow access to the data via both the memcached protocol and the CouchDB RESTful JSON API, but this is currently not the case, and will most likely not be available for some time.

3) BigCouch – an open source version of CouchDB, written in Erlang, that allows scaling beyond a master/slave architecture via database sharding. From the application’s perspective, a BigCouch deployment looks like a single large CouchDB instance.

4) Couchdb-lounge – an open source project which uses Nginx and Python to provide a proxy-based framework for scaling CouchDB beyond a master/slave architecture.

5) Cloudant – an enterprise software company which provides CouchDB hosting and enterprise support, and is also the company behind BigCouch.
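
To make the RESTful JSON API mentioned under Apache CouchDB a little more concrete, here is a minimal sketch built around curl. It assumes a CouchDB instance listening on http://127.0.0.1:5984 (the default port); the COUCH_BASE variable, the helper function names and the database/document names are all made up for illustration.

```shell
# Sketch only: COUCH_BASE and the helper names are assumptions, not part of CouchDB.
COUCH_BASE="${COUCH_BASE:-http://127.0.0.1:5984}"

# Build the URL for a database or document path, e.g. "demo" or "demo/doc1".
couch_url() {
    printf '%s/%s\n' "$COUCH_BASE" "$1"
}

# Create a database: PUT /<db>
couch_create_db() {
    curl -s -X PUT "$(couch_url "$1")"
}

# Store a JSON document under a chosen id: PUT /<db>/<id>
couch_put_doc() {
    curl -s -X PUT "$(couch_url "$1/$2")" \
         -H 'Content-Type: application/json' -d "$3"
}

# Fetch a document back as JSON: GET /<db>/<id>
couch_get_doc() {
    curl -s "$(couch_url "$1/$2")"
}
```

With a server running, something like `couch_create_db demo`, then `couch_put_doc demo doc1 '{"type":"post"}'`, then `couch_get_doc demo doc1` should round-trip a document. Every call is plain HTTP with a JSON body.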

Ubuntu 11.10 + Gnome Shell + ATI drivers + multiple monitors

**UPDATE**

Dane (see comments) pointed out that ATI has in fact released the 11.10 version of their drivers. I went ahead and gave them a try, but using them broke most things for me.

Once I booted back into Gnome…I had some of the Gnome3 look and feel…but everything else (menus, icons, etc) was clearly from Gnome2.  I reinstalled version 11.9 and everything was back to normal.  This update might work for some other setups…but for now I’ll just stick with the version that is working 95% of the time.

——————————————————————————————————————————————————————

I was finally able to get a working desktop using Ubuntu 11.10, Gnome Shell and Gnome 3.2 along with my Radeon HD 2400 XT video card.  The adventure started a few weeks ago when I tried to set up my existing Ubuntu 11.04 desktop using some PPA repositories I found online.

I was able to successfully upgrade from Ubuntu 11.04 to 11.10 beta, and since the 11.10 final release was right around the corner I figured it was safe to go ahead and give it a try.  The upgrade went well, but I spent the next day fighting to get gnome-shell to play nicely with my Radeon card using the existing ATI drivers.

I ended up starting from scratch a few days later, by backing up some important files in my home directory and doing a clean install of 11.10 once the final version was released.

After doing an update and installing some other packages such as ubuntu-restricted-extras, vlc, pidgin, etc, installing gnome-shell was painless:

# apt-get install gnome-shell

After rebooting, I logged in to find some of the same problems as before with this desktop install (screen tearing, blurry icons, multicolored menus, etc). I found some posts around the net suggesting that I might be able to solve some of my problems if I used the latest drivers (version 11.9) from the ATI website.

On the other hand, I found other posts by people claiming that even using the latest drivers had not completely solved all their problems and that ATI would be releasing version 11.10 sometime within the next 2 to 3 weeks, and that this new version would be specifically tested against Gnome 3.x (and fix the remaining bugs).

Anyway, I decided that I had nothing to lose at this point and grabbed the latest version from the web:

# mkdir ati-11.9; cd ati-11.9
# wget http://www2.ati.com/drivers/linux/ati-driver-installer-11-9-x86.x86_64.run
# sh ati-driver-installer-11-9-x86.x86_64.run --buildpkg Ubuntu/oneiric
# dpkg -i fglrx*.deb
# aticonfig --initial -f

After rebooting my machine again, I was pleasantly surprised to see that everything was looking good: no more problems with screen tearing, and all my icons and menus were seemingly in order.

The only thing I needed to do now was to set up my multiple monitors correctly, since at that point I was staring at two cloned spaces instead of one large desktop spread across my two 24″ monitors.

First I launched the Catalyst control panel:

# gksu amdcccle

Under the ‘Display Manager’ page I had to select ‘Multi-display desktop with display’ for EACH of my two monitors.

After a reboot I went into the Gnome ‘System Settings’ and chose ‘Displays’. I was finally able to uncheck ‘Mirror displays’ and hit ‘Apply’ without error.

The final two steps required to get everything working 100% correctly were to install the gnome-tweak-tool:

# apt-get install gnome-tweak-tool

and disable the ‘Have file manager handle the desktop’ option in the ‘Desktop’ section (that did away with the extra menu I was seeing).

The final step in the process involved installing a new theme…I really liked the Elementary theme found here, so that is the one I chose….now everything is working as it should be!

Proxmox 1.9 and Java oom problems

Ever since we upgraded from Proxmox 1.8 to version 1.9 we have had users who have periodically complained about receiving out of memory errors when attempting to start or restart their Java apps.

The following two threads contain a little bit more information about the problems people are seeing:

1) Proxmox mailing list thread
2) OpenVZ mailing list thread

At least one of the threads suggests allocating a minimum of 2 CPUs per VM in order to remedy the issue.  We already have 2 CPUs per VM, so that was not a possible workaround for us.

Another suggestion made by one of the posters was to revert to a previous version of the kernel, or downgrade Proxmox 1.9 to Proxmox 1.8 altogether.

I decided I would try to figure out a workaround that did not involve downgrading software versions.

At first I tried to allocate additional memory to the VMs, and that seemed to resolve the issue for a short period of time. However, after several days I once again started to hear about out of memory errors with Java.

After checking ‘/proc/user_beancounters’ on several of the VMs, I noticed that the ‘failcnt’ value for the ‘privvmpages’ parameter was increasing steadily over time.

The solution so far for us has been to increase the ‘privvmpages’ parameter (in my case I simply doubled it) to such a level that these errors are no longer incrementing the ‘failcnt’ counter.
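
For anyone who wants to watch for the same symptom, here is a rough sketch of the check described above. It assumes the usual /proc/user_beancounters layout, where failcnt is the last column and the first row of each container carries an extra ‘uid:’ field. The function names are my own, and the second helper only prints a vzctl command (to be run on the host node) rather than executing it.

```shell
# List every UBC resource whose failcnt (last column) is greater than zero.
# Reads /proc/user_beancounters by default; pass another file to test offline.
ubc_failures() {
    awk 'NF >= 6 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 {
             # the resource name is always the sixth field from the right,
             # because the first row of a container has an extra "uid:" field
             print $(NF - 5), $NF
         }' "${1:-/proc/user_beancounters}"
}

# Print (do not run) a vzctl command that doubles the privvmpages barrier and limit.
# Arguments: container id, current barrier, current limit (hypothetical values).
double_privvmpages_cmd() {
    echo "vzctl set $1 --privvmpages $(( $2 * 2 )):$(( $3 * 2 )) --save"
}
```

Running `ubc_failures` a few hours apart and comparing the numbers shows whether a counter is still climbing. Note that the doubling helper is only sensible for finite values; a limit that is already set to the maximum (effectively unlimited) should be left alone.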

If you would like to learn more about the various UBC parameters that can be modified inside OpenVZ you can check out this link.

Upgrading Debian

After spending the last two weeks upgrading various versions of Debian to Squeeze, I figured I would post the details of how to upgrade each version, starting from Debian 3.1 to Debian 6.0.

The safest way to upgrade to Debian Squeeze is to upgrade one release at a time until you reach version 6.x.  In other words, if you are upgrading from Debian 4.x, you need to upgrade to Debian 5.x and THEN to Debian 6.x.  Skipping a release is not at all recommended.

Here are the steps that I took when upgrading between the various versions.

Sarge to Etch:

I was able to upgrade all of our Debian 3.1 machines to Debian 4.0 using the following commands.  I did not encounter any real surprises when I upgraded any of our physical or virtual machines.

You can upgrade using apt and the following commands:

# apt-get update
# apt-get dist-upgrade

Etch to Lenny:

The only real issue to note when upgrading from Debian 4.0 to 5.0 is that Lenny does not provide, by default, the firmware required by the Broadcom network adapters used in a majority of our Dell servers.  This caused some stress for me since I was doing the upgrades without physical access to the servers: after I completed the upgrade to 5.0 and rebooted, I was not able to access the server because the NICs were no longer recognised by Debian.

In order to resolve this issue you will need to install the ‘firmware-bnx2’ package after you do the upgrade but BEFORE you reboot the server.

The Debian team does not include this firmware by default because of license restrictions placed on it.  If you want to read more about this issue you can view the very short bug report here.
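
Since forgetting this step means losing remote access, a small pre-reboot check is cheap insurance. The sketch below assumes the firmware-bnx2 package drops its files under /lib/firmware/bnx2 (that path and the function name are my assumptions); it simply confirms that at least one firmware file is present.

```shell
# Succeeds if at least one bnx2 firmware file exists under the given
# firmware root (default /lib/firmware, the assumed install location).
bnx2_firmware_present() {
    root="${1:-/lib/firmware}"
    [ -d "$root/bnx2" ] && [ -n "$(ls -A "$root/bnx2" 2>/dev/null)" ]
}

# Intended use right after the upgrade and before any reboot:
#   aptitude install firmware-bnx2
#   bnx2_firmware_present || echo "bnx2 firmware missing: do NOT reboot yet"
```

The package lives in the non-free section, so non-free has to be enabled in sources.list before the install will work.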

The best tool for upgrading to Debian 5 is aptitude:

# aptitude update
# aptitude install apt dpkg aptitude
# aptitude full-upgrade

Lenny to Squeeze:

Upgrading Debian 5.0 to 6.0 was also relatively painless.  One issue that I did run into revolved around the new version of udev and kernel versions prior to 2.6.26.  We had a few servers that were using kernel versions in the 2.6.18 range, and if you don’t upgrade the kernel before you reboot, certain devices may not be recognized or named correctly, which can prevent a successful bootup.

You can use the following apt commands to complete the upgrade process:

# apt-get update
# apt-get dist-upgrade -u
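
Given the udev issue described above, it is worth confirming before the reboot that no installed kernel is older than 2.6.26. Here is a minimal sketch of that check; it assumes GNU `sort -V` is available (it should be once the Squeeze coreutils are installed) and that kernel images follow the usual /boot/vmlinuz-<version> naming. The function names are hypothetical.

```shell
# Succeeds if kernel version $2 (e.g. "2.6.18-6-amd64") is at least $1.
# Relies on GNU sort -V for version ordering (an assumption about the host).
kernel_at_least() {
    required="$1"
    candidate="${2%%-*}"        # drop the "-6-amd64" style suffix
    lowest=$(printf '%s\n%s\n' "$required" "$candidate" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# Report on every installed kernel image, since what matters is the kernel
# that boots next, not the one currently running.
check_installed_kernels() {
    for image in "${1:-/boot}"/vmlinuz-*; do
        [ -e "$image" ] || continue
        version="${image##*/vmlinuz-}"
        if kernel_at_least 2.6.26 "$version"; then
            echo "ok:      $version"
        else
            echo "too old: $version"
        fi
    done
}
```

If `check_installed_kernels` reports only old kernels, install a newer kernel image before rebooting. On a Debian box, `dpkg --compare-versions` would be an alternative to `sort -V` for the comparison itself.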

Here are the repos that I used while doing the upgrades:

# Debian Etch-4
deb http://archive.debian.org/debian/ etch main non-free contrib
deb-src http://archive.debian.org/debian/ etch main non-free contrib

deb http://archive.debian.org/debian-security/ etch/updates main non-free contrib
deb-src http://archive.debian.org/debian-security/ etch/updates main non-free contrib

# Debian Lenny-5
deb http://archive.debian.org/debian/ lenny main contrib non-free
deb-src http://archive.debian.org/debian/ lenny main contrib non-free

deb http://archive.debian.org/debian-security lenny/updates main contrib non-free
deb-src http://archive.debian.org/debian-security lenny/updates main contrib non-free

deb http://archive.debian.org/debian-volatile lenny/volatile main contrib non-free
deb-src http://archive.debian.org/debian-volatile lenny/volatile main contrib non-free

# Debian Squeeze-6
deb http://ftp.us.debian.org/debian squeeze main contrib non-free

deb http://ftp.debian.org/debian/ squeeze-updates main contrib non-free
deb http://security.debian.org/ squeeze/updates main contrib non-free

Red Hat to purchase Gluster

Red Hat released a statement today in which they announced their plans to acquire Gluster, the company behind the open source scalable filesystem GlusterFS.

Only time will tell exactly what this means for the project, community, etc, but based on the fact that Red Hat has a fairly good track record with the open source community, and given the statements they made in their FAQ, I can only assume that we will continue to see GlusterFS grow and mature into a tool that extends reliably into the enterprise environment.

Gluster also provided several statements via their website today: you can read a statement from the founders here, as well as an additional Gluster press release here.

Proxmox 2.0 beta released

Martin Maurer sent an email to the Proxmox-users mailing list this morning announcing that a version 2.0 beta ISO had been made available for download.

Here are some links that will provide further information on this latest release:

Roadmap and feature overview:
http://pve.proxmox.com/wiki/Roadmap#Roadmap_for_2.x

Preliminary 2.0 documentation:
http://pve.proxmox.com/wiki/Category:Proxmox_VE_2.0

Community tools (Bugzilla, Git, etc):
http://www.proxmox.com/products/proxmox-ve/get-involved

Proxmox VE 2.0 beta forum:
http://forum.proxmox.com/forums/16-Proxmox-VE-2.0-beta

Downloads:
http://www.proxmox.com/downloads/proxmox-ve/17-iso-images

I have not had a chance to install a test node using this latest 2.0 beta codebase. However, I expect to have a two node cluster up and running in the next week or so, and after I do I will follow up with another blog post detailing my thoughts.

Thanks again to Martin and Dietmar for all their hard work so far on this great open source project!

ZFS crash during high I/O

After successfully completing a ‘zpool replace’ I was not so pleased to get the following error message back from ‘zpool detach’:

cannot detach c5t17d0: no valid replicas

I decided that I would upgrade this OpenSolaris 2008.11 instance to OpenSolaris 2009.06 in order to see if the obvious bug that I was encountering was resolved in the newest version. Since the OpenSolaris upgrade process automatically creates a new boot environment, there is not much danger in updating: you can always boot back into the previous environment at any time.

The upgrade was a success, and after I booted into 2009.06 I was able to simply detach the failed drive from the pool and thus remove it from the system.

I recompiled Gluster and ran 2009.06 for a couple of days, until I started noticing that the server was rebooting during times of high I/O. A peek inside ‘/var/adm/messages’ revealed the following errors:

Aug 15 22:33:04 cybertron unix: [ID 836849 kern.notice]
Aug 15 22:33:04 cybertron ^Mpanic[cpu0]/thread=ffffff091060c900:
Aug 15 22:33:04 cybertron genunix: [ID 783603 kern.notice] Deadlock: cycle in blocking chain
Aug 15 22:33:04 cybertron unix: [ID 100000 kern.notice]
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9651f0 genunix:turnstile_block+795 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965250 unix:mutex_vector_enter+261 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9652f0 zfs:zfs_zget+be ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965380 zfs:zfs_zaccess+7c ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965400 zfs:zfs_lookup+333 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9654a0 genunix:fop_lookup+ed ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965550 genunix:xattr_dir_realdir+8b ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9655a0 genunix:xattr_dir_realvp+5e ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d9655f0 genunix:fop_realvp+32 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965640 genunix:vn_compare+31 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965860 genunix:lookuppnvp+94c ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965900 genunix:lookuppnatcred+11b ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965990 genunix:lookuppnat+69 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965b30 genunix:vn_createat+13a ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965cf0 genunix:vn_openat+1fb ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965e50 genunix:copen+435 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965e80 genunix:openat64+25 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965ec0 genunix:fsat32+f5 ()
Aug 15 22:33:04 cybertron genunix: [ID 655072 kern.notice] ffffff003d965f10 unix:brand_sys_sysenter+1e0 ()
Aug 15 22:33:04 cybertron unix: [ID 100000 kern.notice]
Aug 15 22:33:04 cybertron genunix: [ID 672855 kern.notice] syncing file systems…
Aug 15 22:33:04 cybertron genunix: [ID 904073 kern.notice] done

My efforts to find any further details about this bug are ongoing, so at this point I have booted back into 2008.11 and I will be running that until a fix or a workaround is found.

SUNWattr_ro error: Permission denied on OpenSolaris using Gluster 3.0.5, Part II

Recently one of our 3ware 9650SE raid cards started spitting out errors indicating that the unit was repeatedly issuing a bunch of soft resets. The lines in the log look similar to this:

WARNING: tw1: tw_aen_task AEN 0x0039 Buffer ECC error corrected address=0xDF420
WARNING: tw1: tw_aen_task AEN 0x005f Cache synchronization failed; some data lost unit=22
WARNING: tw1: tw_aen_task AEN 0x0001 Controller reset occurred resets=13

I downloaded and installed the latest firmware for the card (version 4.10.00.021), which the release notes claimed had several fixes for cards experiencing soft resets.  Much to my disappointment the resets continued to occur despite the new firmware.

The card was under warranty, so I contacted 3ware support and had a new one sent overnight.  The new card seemed to resolve the issues associated with random soft resets. However, the resets and the downtime had left this node a little out of sync with the other Gluster server.

After doing a ‘zpool replace’ on two bad disks, I set about trying to initiate a ‘self-heal’ on the known up-to-date node using the command below. (At this point I am still unsure whether the bad drives were a symptom or the cause of the issues with the raid card. What I do know is that the Western Digital Caviar Green drives that are populating this card have a very high error rate, and we are currently in the process of replacing all 24 drives with Hitachi ones.)

server2:/zpool/glusterfs# ls -laR *

After some time I decided to tail the log file to see if there were any errors that might indicate a problem with the self-heal. Once again the Gluster error log began to fill up with errors associated with setting extended attributes on SUNWattr_ro.

At that point I began to worry whether the AFR (Automatic File Replication) portion of the Replicate/AFR translator was actually working correctly.  I started running some tests to determine what exactly was going on.  I began by copying over a few files to test replication.  All the files showed up on both nodes, so far so good.

Next it was time to test AFR, so I began deleting a few files off one node and then attempting to self-heal those same deleted files.  After a couple of minutes, I re-listed the files and the deleted files had in fact been restored. Despite the successful copy, the errors continued to show up every single time the file/directory was accessed (via stat).  It seemed that even though AFR was able to copy all the files to the new node correctly, Gluster for some reason continued to want to self-heal the files over and over again.

After finding the function that sets the extended attributes on Solaris, I created the following patch:

--- compat.c Tue Aug 23 13:24:33 2011
+++ compat_new.c Tue Aug 23 13:24:49 2011
@@ -193,7 +193,7 @@
 {
     int attrfd = -1;
     int ret = 0;

+
     attrfd = attropen (path, key, flags|O_CREAT|O_WRONLY, 0777);
     if (attrfd >= 0) {
         ftruncate (attrfd, 0);
@@ -200,13 +200,16 @@
         ret = write (attrfd, value, size);
         close (attrfd);
     } else {
-        if (errno != ENOENT)
-            gf_log ("libglusterfs", GF_LOG_ERROR,
+        if (strcmp (key, "SUNWattr_ro") && strcmp (key, "SUNWattr_rw")) {
+
+            if (errno != ENOENT)
+                gf_log ("libglusterfs", GF_LOG_ERROR,
                         "Couldn't set extended attribute for %s (%d)",
                         path, errno);
-        return -1;
+            return -1;
+        }
+        return 0;
     }

     return 0;
 }

The patch simply ignores the two Solaris specific extended attributes (SUNWattr_ro and SUNWattr_rw), and returns a ‘0’ to the posix layer instead of a ‘-1’ if either of these is encountered.

We’ve been running this code change on both Solaris nodes for several days and so far so good: the errors are gone, and replicate and AFR both seem to be working very well.

Netflix API

Daniel Jacobson, former Director of Application Development at NPR and current Director of Engineering for the Netflix API, gave a presentation at this year’s OSCON entitled ‘Redesigning the Netflix API’. The slides have been posted here for anyone who is interested in learning more about their API efforts.

Adrian Cockcroft also gave a talk at this year’s OSCON entitled ‘Data Flow at Netflix’.  The video for that presentation can be found here on the OSCON YouTube channel.