A small rant on Galera & XtraDB Cluster

I had to install Percona XtraDB Cluster, I think for the first time since it was declared stable. I remembered the many problems I had faced with the beta releases, which were understandable given that those were only previews, but this time I hoped for significant improvements.

I have to say I am generally quite sensitive about simple problems that could and should be easily discovered and corrected. Well, it didn’t take five minutes to see a few such problems. I spent those minutes installing the database binaries from the Percona Yum repository, and that turned out to be enough to see a lot of errors for no reason. Not a good thing.

[..]
  Installing : 1:Percona-XtraDB-Cluster-server-5.5.23-23.5.333.rhel6.x86_64         5/5
ls: cannot access /var/lib/mysql/*.err: No such file or directory
ls: cannot access /var/lib/mysql/*.err: No such file or directory
[Note] Flashcache bypass: disabled

Why does it matter that “ls: cannot access /var/lib/mysql/*.err: No such file or directory”? In many cases I will be installing on a clean system, so I wouldn’t even expect /var/lib/mysql to exist right after the binaries are installed.
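To illustrate the point, here is a minimal sketch of a check for leftover error logs that stays silent when the data directory does not exist yet. It is written in Python purely for illustration; it is not the package's actual install script, and the function name find_error_logs is hypothetical.

```python
import glob
import os
from typing import List


def find_error_logs(datadir: str) -> List[str]:
    """Return the *.err files in datadir.

    A missing directory simply yields an empty list, so a clean
    system produces no error output at all.
    """
    return sorted(glob.glob(os.path.join(datadir, "*.err")))


if __name__ == "__main__":
    logs = find_error_logs("/var/lib/mysql")
    if logs:
        print("Existing error logs:", ", ".join(logs))
```

On a fresh machine the list is empty and nothing is printed, which is all the installer needed to do here.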

A bit sillier was the following error:

WSREP: Failed to read output of: '/sbin/ifconfig | grep -m1 -1 -E '^[a-z]?eth[0-9]' | tail -n 1 | awk '{ print $2 }' | awk -F : '{ print $2 }''
[Warning] WSREP: Failed to autoguess base node address
[Note] WSREP: Service disconnected.
[Note] WSREP: Some threads may fail to exit.

It seems to be part of some network interface discovery mechanism, and it is cool that somebody thought of that. But why does it limit the search to eth* devices and then fail? Who actually uses a raw eth0 device in a database cluster? The common practice is to create link aggregation for redundancy, which Linux provides through its bonding driver. The cluster software should look in /proc/net/bonding/ and check its contents first, and only if it finds nothing there should it try to locate any other usable devices. And even that last part should never crash like that. I would completely understand a simple message that auto-discovery failed and that I should provide a valid network address myself, but not a pile of unhandled problems.
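The lookup order described above could be sketched roughly as follows. This is only an illustration in Python, not the script Galera ships: the names pick_interface and _list_dir are hypothetical, the fallback to /sys/class/net is my own assumption about where to find the remaining devices, and the sketch only selects a device name without resolving its address.

```python
import os
from typing import List, Optional

# Entries under /sys/class/net that are not usable cluster interfaces.
_SKIP = {"lo", "bonding_masters"}


def _list_dir(path: str) -> List[str]:
    """List a directory, treating a missing or unreadable one as empty."""
    try:
        return sorted(os.listdir(path))
    except OSError:
        return []


def pick_interface(bonding_dir: str = "/proc/net/bonding",
                   sys_net_dir: str = "/sys/class/net") -> Optional[str]:
    """Prefer a bonded device; otherwise fall back to any non-loopback one."""
    bonded = _list_dir(bonding_dir)
    if bonded:
        return bonded[0]
    others = [name for name in _list_dir(sys_net_dir) if name not in _SKIP]
    return others[0] if others else None


if __name__ == "__main__":
    device = pick_interface()
    if device is None:
        # Fail softly: one clear message instead of a chain of errors.
        print("Auto-discovery failed; please set the node address manually.")
    else:
        print("Using network interface:", device)
```

When nothing usable is found the function returns None, so the caller can print a single plain message and carry on.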

Then… another random error:

Percona recommends that all production deployments be protected with a support
contract (http://www.percona.com/mysql-suppport/) to ensure the highest uptime,
be eligible for hot fixes, and boost your team's productivity.
/var/tmp/rpm-tmp.6TRZxd: line 82: x1: command not found

What is the mysterious x1 command? Is the support missing after all?

I do not think these are big problems at all. I could point out much worse problems in far less complex programs, but on the other hand they may indicate a lack of proper testing, which, given the nature of this particular piece of software, doesn’t sound right.

About Maciej Dobrzanski

A MySQL consultant with a primary focus on the performance and scalability of systems, databases and application stacks. Expert on open source technologies such as Linux, BSD, Apache, nginx, MySQL, and many more.

Comments

  1. (note I work for Percona, but this is just my opinion)
    >It seems to be an element of some network interface discovery system, which is cool that somebody thought of that. But why does it limit the search to eth* devices and then it fails?

    Well, I’d say it is a simple discovery mechanism that was added for the convenience of the end user, and which hopefully should get better over time (see a possible better version below). It does report that detection failed. The person at Galera (the authors of wsrep) who wrote the script is likely a developer (thus not likely familiar with the particulars of bonded network interfaces) who incorrectly assumed that all Ethernet interfaces start with “eth”, even bonded ones (if they even know those exist). Since most software testing is done in virtual or test environments (which are not likely to have bonded interfaces), the glitch would not be caught by QA testing. It is not possible to match every possible production configuration in a software development company’s test environment, so occasionally bugs or missing features will be found in the real world.

    I filed a bug report and included an improved version of the detection script which should be interface agnostic.

    https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1007554

    • Justin,

      I do not agree that this is a matter of “every possible production configuration”, because implementing any high availability solution essentially implies redundancy on all levels, so a system that uses link aggregation is by default the most typical configuration. Besides, I already mentioned that it wasn’t a big problem at that point, but I didn’t expect to see so much garbage in just one place from a “mature” release.

  2. Alex says:

    Was googling around on Galera and found this. I will have to agree with Maciej. It looks like maybe Percona was only testing using virtual machines? Even with virtual machines though, you can test with bonded interfaces; just bonding the Linux dummy interface should work. Physical machines are always bonded: there is no database server, other than a virtual machine, that doesn’t use bonded interfaces. So from an Enterprise perspective… this is somewhat concerning…

  3. ron says:

    if you need the script to handle your config, you are better off not using something like percona and sticking with a xampp install,
    or just go for clustercontrol at severalnines.
    what is so hard about setting your network yourself?
    i find it pretty nice that percona checks for .err files .. nothing to worry about. just makes things go to the right place.
