Author Archives: shainmiley

Replication improvements in MySQL 5.5

As promised, Rob Young over at Oracle’s MySQL blog has provided us with one more MySQL 5.5 writeup. This time the focus is on the new replication features that you can expect from MySQL 5.5.

Here are the topics covered by Rob’s post:

  • Semi-synchronous Replication
  • Replication Heartbeat
  • Automatic Relay Log Recovery
  • Replication Per Server Filtering
  • Replication Slave Side Data Type Conversions

These are exciting changes that many people have been looking forward to for quite some time. Including these features in 5.5 should make replication both more reliable and more manageable.

Calculating overall database size in MySQL

Recently I had a server that was running low on free disk space. After a bit of digging around, I found that the MySQL database on that machine was taking up the bulk of the usable disk space.

Since this was a shared MySQL instance, I needed to determine which databases were consuming the most space. To calculate the total space in use, we need to take both the size of the data and the size of all the indexes into account.

I used the following SELECT query, which returns the size of each database (data + indexes) in MB.

SELECT table_schema "Database Name", sum( data_length + index_length) / 1024 / 1024
"Database Size(MB)" FROM information_schema.TABLES GROUP BY table_schema ;

Running the query above will result in output similar to this:

+--------------------+-------------------+
| Database Name      | Database Size(MB) |
+--------------------+-------------------+
| movies             |     3772.06922913 |
| tmp                |      101.08132978 |
| bikes              |       57.04234117 |
| information_schema |        0.00781250 |
| mysql              |        0.60790825 |
+--------------------+-------------------+

In this case we can clearly see that the ‘movies’ database is consuming the most space. At this point we may want to dig a little deeper and look at the size of each table within the ‘movies’ database, to see where in particular the space is being used.

In order to get some more detail we can use the following SELECT query:

SELECT table_name, table_rows, data_length, index_length,
round(((data_length + index_length) / 1024 / 1024), 2) "Size(MB)"
FROM information_schema.TABLES
WHERE table_schema = 'movies';

Running the query above will result in output similar to this:

+-----------------------------+------------+-------------+--------------+----------+
| table_name                  | table_rows | data_length | index_length | Size(MB) |
+-----------------------------+------------+-------------+--------------+----------+
| Id                          |          1 |       16384 |            0 |     0.02 |
| Teaser                      |          1 |       16384 |            0 |     0.02 |
| TeaserLog                   |      21767 |  3177586576 |       392192 |  3030.76 |
| TeaserChild                 |     912602 |    48873472 |     33112064 |    78.19 |
| Director1                   |     460722 |    57229312 |     13156352 |    67.13 |
| Director2                   |    2044044 |    87801856 |            0 |    83.73 |
| City                        |     286134 |    17367040 |     17858560 |    33.59 |
| City_alt_spelling           |       1086 |       65536 |        65536 |     0.13 |
| City_backup                 |     148811 |    13123584 |            0 |    12.52 |
| City_misspelling_log        |     166589 |     9977856 |            0 |     9.52 |
| City_save                   |     148618 |    13123584 |            0 |    12.52 |
+-----------------------------+------------+-------------+--------------+----------+
11 rows in set (0.14 sec)

Based on the output from this SQL query we are able to see that the ‘TeaserLog’ table is using up the majority of space within the ‘movies’ database.
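As a quick sanity check, the Size(MB) column can be reproduced by hand from data_length and index_length. The following is only a minimal sketch in Python. The helper name size_mb is made up for illustration, and the three rows are copied from the sample output above:

```python
def size_mb(data_length, index_length):
    """Return data + index size in MB, rounded to 2 places like the query."""
    return round((data_length + index_length) / 1024 / 1024, 2)


# (table_name, data_length, index_length), copied from the sample output.
rows = [
    ("City", 17367040, 17858560),
    ("TeaserLog", 3177586576, 392192),
    ("TeaserChild", 48873472, 33112064),
]

# Print the largest tables first.
for name, data_length, index_length in sorted(
    rows, key=lambda row: row[1] + row[2], reverse=True
):
    print(name, size_mb(data_length, index_length))
```

The values match the Size(MB) column for those tables, which confirms that the query reports bytes divided by 1024 twice.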

Performance and scalability improvements in MySQL 5.5

Oracle’s MySQL Blog has a very good post that provides an overview of some of the improvements that you can expect in the upcoming MySQL 5.5 release.

This writeup focuses mainly on the changes that relate to performance and scalability. However, the author (Rob Young) says he plans to discuss other aspects as well, sometime in the near future.

Here are just a few of the topics covered by Rob:

  • Improved Default Thread Concurrency
  • Improved Recovery Performance
  • Multiple Buffer Pool Instances
  • Native Asynchronous I/O for Linux
  • Improved Metadata Locking Within Transactions
  • Better performance on Windows based installs

At some point I hope to continue my testing and benchmarking of several MySQL versions and variants, such as MariaDB, Percona and MySQL 5.5. However, for production databases we will be sticking with the MySQL 5.1.x code branch for the foreseeable future.

Gluster 3.1 released

Today the team over at Gluster.com announced the availability of version 3.1 of their software. There are currently two different offerings available from Gluster. The first is the Gluster Storage Platform, known as ‘GlusterSP’, which provides a Linux-based bare-metal installer, a web-based front end, etc.

They also offer ‘Glusterfs’, which is released as open source and provides the same functionality as GlusterSP. Unlike GlusterSP, it does not require a fresh install; you can use it on an existing Linux or Solaris based system.

The 3.1 release brings the following new features:

Elastic Volume Management: logical storage volumes are decoupled from physical hardware, allowing administrators to grow, shrink and migrate storage volumes without any application downtime. As storage is added, storage volumes are automatically rebalanced across the cluster making it always available online regardless of changes to the underlying hardware.

New Gluster Console Manager: the Command Line Interface (CLI), Application Programming Interface (API) and shell are merged into a single powerful interface, enabling automation by giving the CLI higher level APIs and scripting capabilities. Languages such as Python, Ruby or PHP can be used to script a series of commands that are invoked through the command line. This new tool requires no new APIs and is able to script out and rapidly automate any information inserted in the CLI, giving cloud administrators the ability to simply automate large scale operations.

Native Network File System (NFS): a native NFS v3 module allows NFS clients to connect directly to any storage server in the cluster, and a server can speak both NFS and the Gluster protocol at the same time. NFS requires no specialized training, making it simple and easy to deploy.

To find out more about Gluster you can visit Gluster.com. You can also visit Gluster.org if you want to get more familiar with the open source side of the Gluster house.

Ext4 vs ZFS Kernel Module: benchmarks so far

Well, I have finally set aside some time to test performance using the ZFS kernel module that I blogged about a while ago.

Overall, the ZFS kernel module produced results that were similar to the ones I saw while using ext4. However, most real-world ZFS setups are not limited to a single disk, so it will be very interesting to see what kind of performance numbers we get when we start benchmarking setups that have many disks.

Although the ZFS results were slower in almost every single case, ext4 was not too much faster in most of those cases, and I suspect that there are lots of people out there who would be more than willing to take a small hit in speed in order to gain the substantial benefits that come with having ZFS as the underlying filesystem.

Here are the operations I benchmarked:

a) create 10,000 files using touch
b) create 10,000 directories using mkdir
c) untar the latest stable Linux kernel
d) create a 1GB file using dd
e) find 10,000 files
f) delete 10,000 files
g) find 10,000 directories
h) delete 10,000 directories
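
The file and directory steps in the list above can be approximated with a short script. This is only a rough sketch in Python, not the commands that were actually run for these results (those used touch, mkdir, find and so on from the shell). The function names and the mount_point parameter are made up for illustration, and the untar and dd steps are left out:

```python
import os
import tempfile
import time


def timed(action):
    """Run action() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_benchmark(base_dir, count=10000):
    """Time creating, finding and deleting `count` files and directories.

    base_dir is assumed to be an empty directory on the filesystem under test.
    """
    files = [os.path.join(base_dir, "file_%d" % i) for i in range(count)]
    dirs = [os.path.join(base_dir, "dir_%d" % i) for i in range(count)]

    def create_files():
        for path in files:
            open(path, "a").close()  # rough equivalent of `touch`

    def create_dirs():
        for path in dirs:
            os.mkdir(path)

    def find_files():
        found = [entry for entry in os.scandir(base_dir) if entry.is_file()]
        assert len(found) == count

    def delete_files():
        for path in files:
            os.remove(path)

    def find_dirs():
        found = [entry for entry in os.scandir(base_dir) if entry.is_dir()]
        assert len(found) == count

    def delete_dirs():
        for path in dirs:
            os.rmdir(path)

    steps = [
        ("create files", create_files),
        ("create directories", create_dirs),
        ("find files", find_files),
        ("delete files", delete_files),
        ("find directories", find_dirs),
        ("delete directories", delete_dirs),
    ]
    # Steps run in the order listed; each maps to its elapsed seconds.
    return {name: timed(step) for name, step in steps}


def benchmark_filesystem(mount_point, count=10000):
    """Run the benchmark in a fresh scratch directory under mount_point."""
    with tempfile.TemporaryDirectory(dir=mount_point) as scratch:
        return run_benchmark(scratch, count)
```

Calling benchmark_filesystem with a directory on an ext4 mount and then on a ZFS mount gives two sets of timings to compare. Because this measures Python system calls rather than one process per touch or mkdir, the absolute numbers will not be comparable to the ones reported here.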

At some point soon I plan to add results for raidz2, Btrfs, iozone, etc.

Various File Operations in Seconds

+--------------------+--------+--------+------------+
| Operation          |   Ext4 |    Zfs | Zfs-mirror |
+--------------------+--------+--------+------------+
| Touch x 10000      | 12.669 | 13.009 |     13.044 |
| Mkdir x 10000      | 14.276 | 13.015 |     13.352 |
| Untar kernel       |  4.997 |  6.577 |      9.787 |
| Create 1 GB file   |  1.110 |  6.084 |     12.208 |
| Delete files       |  0.122 |  0.577 |      0.526 |
| Find files         |  0.036 |  0.096 |      0.141 |
| Delete directories |  0.163 |  0.247 |      0.261 |
| Find directories   |  0.295 |  0.764 |      0.690 |
+--------------------+--------+--------+------------+
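
To put the gap between the two filesystems into numbers, the single-disk ZFS time can be divided by the ext4 time for each operation. This is a minimal sketch in Python. The timings are copied from the chart data above, and the helper name slowdown is made up for illustration:

```python
# Timings in seconds, copied from the chart data: (ext4, zfs).
timings = {
    "Touch x 10000": (12.669, 13.009),
    "Mkdir x 10000": (14.276, 13.015),
    "Untar kernel": (4.997, 6.577),
    "Create 1 GB file": (1.110, 6.084),
    "Delete files": (0.122, 0.577),
    "Find files": (0.036, 0.096),
    "Delete directories": (0.163, 0.247),
    "Find directories": (0.295, 0.764),
}


def slowdown(ext4_seconds, zfs_seconds):
    """Return how many times longer ZFS took than ext4 (below 1 = faster)."""
    return round(zfs_seconds / ext4_seconds, 2)


ratios = {name: slowdown(*pair) for name, pair in timings.items()}

for name, ratio in ratios.items():
    print("%-20s %5.2fx" % (name, ratio))
```

On these numbers, mkdir is the one case where ZFS comes out ahead, the touch and mkdir runs are within a few percent of each other, and the 1 GB dd write shows the largest gap at roughly 5.5 times the ext4 time.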

ZFS kernel module for Linux

UPDATE: If you are interested in ZFS on Linux you have two options at this point:

I have been actively following the zfsonlinux project because, once stable and ready, it should offer superior performance: as a native kernel module it avoids the extra overhead that is incurred by using FUSE with the zfs-fuse project.

You can see another one of my posts concerning zfsonlinux here.

————————————————————————————————————————————————————-

KQ Infotech has released (currently in closed beta) code that brings ZFS to Linux via a loadable kernel module.

Here is a link to the current and future feature set. This is exciting because, although other ZFS implementations for Linux already exist, each of the available options has significant drawbacks. For example, ZFS-FUSE is implemented in userspace using FUSE, which adds overhead due to the context switching that is required when moving back and forth between kernel space and user space.

Another option is ZFS on Linux, which provides a stable SPA, DMU and ZVOL layer, but does not provide the POSIX layer (ZPL) that would enable you to actually mount a ZFS filesystem from inside Linux. From what I understand, KQ Infotech has basically taken some of the ZFS on Linux code that was developed by the Lawrence Livermore National Laboratory (LLNL) and implemented the missing ZPL layer.

NPR was recently accepted into the closed beta program, and I took some time last week to get this module installed on a Dell PowerEdge 2950 running a 64-bit version of Ubuntu 10.04. We are currently testing ZFS under kernel version 2.6.32-24. I have not had a ton of time to test things out, but I would say so far so good. I plan on posting some ZFS and Btrfs benchmarks in the next few weeks, after I get some time to better test performance, throughput, etc.

Btrfs: The Story So Far

Here is a link to a video presentation given by Josef Bacik, one of the 3 lead developers currently working on Btrfs. This presentation was given at LinuxCon Brazil 2010. The video lasts about an hour and, according to the description, provides:

‘A look at the features that currently exist in Btrfs and what features are left to be done. We’ll look at stability and what things testers need to look out for. There will be plenty of benchmarks and use cases for the different features of Btrfs. We will also discuss what testing needs to be done, and how testers can help us developers.’

If you have questions about the current state of Btrfs, the current and future development roadmap, benchmarks, etc., you should take some time to watch this video.

Current state of “Btrfs” File System for Linux

Oracle has provided a link to a webcast (registration required) on the state of Btrfs, given by lead Btrfs developer Chris Mason. Here is an excerpt from the webcast description:

‘Join Chris Mason, Director of Software Development at Oracle, the principal author of Btrfs file system, and our own resident Linux kernel guru, as he discusses the development, features, benefits of the “Btrfs” file system (pronounced “Butter F S”, “B-tree F S”) in Linux.’

The video lasts about 1 hour and provides a very good overview of the current state of the file system, some of the pros and cons of Btrfs under various workloads, some of the features that have been implemented thus far, as well as some of the tools and features that are slated for future releases.