Jim Salter
OpenZFS founding developer Matthew Ahrens merged one of the most sought-after features in ZFS history, RAIDz expansion, into master last week. The new feature allows a ZFS user to expand the size of a single RAIDz vdev. For example, you can use the new feature to turn a three-disk RAIDz1 into a four-, five-, or six-disk RAIDz1.
OpenZFS is a complex filesystem, and things are necessarily going to get a bit chewy explaining how the feature works. So if you're a ZFS newbie, you may want to refer back to our comprehensive ZFS 101 introduction.
Expanding storage in ZFS
In addition to being a filesystem, ZFS is a storage array and volume manager, meaning that you can feed it a whole pile of disk devices, not just one. The heart of a ZFS storage system is the zpool, which is the most fundamental level of ZFS storage. The zpool in turn contains vdevs, and vdevs contain actual disks within them. Writes are split into units called records or blocks, which are then distributed semi-evenly among the vdevs.
A storage vdev can be one of five types: a single disk, mirror, RAIDz1, RAIDz2, or RAIDz3. You can add more vdevs to a zpool, and you can attach more disks to a single or mirror vdev. But managing storage this way requires some planning ahead and budgeting, which hobbyists and homelabbers frequently aren't too enthusiastic about.
Conventional RAID, which doesn't share the "pool" concept with ZFS, generally offers the ability to expand and/or reshape an array in place. For example, you might add a single disk to a six-disk RAID6 array, thereby turning it into a seven-disk RAID6 array. Undergoing a live reshaping can be pretty painful, especially on nearly full arrays; it's entirely possible that such a task might require a week or more, with array performance limited to a quarter or less of normal the entire time.
Historically, ZFS has eschewed this sort of expansion. ZFS was originally developed for enterprise use, and live array reshaping is generally a non-starter in the enterprise world. Dropping your storage's performance to unusable levels for days on end generally costs more in payroll and overhead than buying an entirely new set of hardware would. Live expansion is also potentially very dangerous since it involves reading and rewriting all data and puts the array in a temporary and far less well-tested "half this, half that" condition until it completes.
For users with many disks, the new RAIDz expansion is unlikely to materially change how they use ZFS. It will still be both easier and more practical to manage vdevs as complete units rather than trying to muck about inside them. But hobbyists, homelabbers, and small users who run ZFS with a single vdev will likely get a lot of use out of the new feature.
How does it work?

From a practical perspective, Ahrens' new vdev expansion feature simply adds new capabilities to an existing command, namely zpool attach, which is normally used to add a disk to a single-disk vdev (turning it into a mirror vdev) or add an extra disk to a mirror (for example, turning a two-disk mirror into a three-disk mirror).
With the new code, you'll be able to attach new disks to an existing RAIDz vdev as well. Doing so expands the vdev in width but doesn't change the vdev type, so you can turn a six-disk RAIDz2 vdev into a seven-disk RAIDz2 vdev, but you can't turn it into a seven-disk RAIDz3.
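As a rough illustration of what such an invocation might look like, the Python sketch below only builds the argument list for zpool attach and prints it; nothing is executed. The pool name tank, the vdev name raidz2-0, and the device path /dev/sdh are hypothetical placeholders, and the pool, existing-vdev, new-device argument order is assumed to match the form zpool attach already uses for mirrors.

```python
def build_attach_command(pool, raidz_vdev, new_disk):
    """Return the argv list for attaching new_disk to an existing vdev.

    All three arguments are placeholders supplied by the caller; this
    function only assembles the command and never runs it.
    """
    return ["zpool", "attach", pool, raidz_vdev, new_disk]


# Hypothetical names, for illustration only.
cmd = build_attach_command("tank", "raidz2-0", "/dev/sdh")
print(" ".join(cmd))
```

Keeping the command as a list rather than a single string makes it safe to hand to subprocess.run later without shell quoting concerns.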
Upon issuing your zpool attach command, the expansion begins. During expansion, each block or record is read from the vdev being expanded and is then rewritten. The sectors of the rewritten block are distributed among all disks in the vdev, including the new disk(s), but the width of the stripe itself is not changed. So a RAIDz2 vdev expanded from six disks to ten will still be full of six-wide stripes after expansion completes.
So while the user will see the extra space made available by the new disks, the storage efficiency of the expanded data won't have improved because of them. In the example above, we went from a six-disk RAIDz2 with a nominal storage efficiency of 67% (four of every six sectors are data) to a ten-disk RAIDz2. Data newly written to the ten-disk RAIDz2 has a nominal storage efficiency of 80% (eight of every ten sectors are data), but the old, expanded data is still written in six-wide stripes, so it still has the old 67% storage efficiency.
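The percentages above are simple ratios, and a minimal Python sketch makes the arithmetic explicit. The function name nominal_efficiency is our own, and the model assumes full-width stripes only, ignoring padding and small blocks.

```python
def nominal_efficiency(stripe_width, parity):
    """Fraction of sectors in a full stripe that hold data rather than parity."""
    if stripe_width <= parity:
        raise ValueError("stripe must be wider than its parity count")
    return (stripe_width - parity) / stripe_width


old = nominal_efficiency(6, 2)   # six-wide RAIDz2 stripes (expanded data)
new = nominal_efficiency(10, 2)  # ten-wide RAIDz2 stripes (newly written data)
print(f"old stripes: {old:.0%}, new stripes: {new:.0%}")
# old stripes: 67%, new stripes: 80%
```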
It's worth noting that this isn't an unexpected or bizarre state for a vdev to be in. RAIDz already uses a dynamic, variable stripe width to account for blocks or records too small to stripe across all the disks in a single vdev.
For example, if you write a single metadata block (the data containing a file's name, permissions, and location on disk), it fits within a single sector on disk. If you write that metadata block to a ten-wide RAIDz2, you don't write a full ten-wide stripe. Instead, you write an undersized block only three disks wide: a single data sector plus two parity sectors. So the "undersized" blocks in a newly expanded RAIDz vdev aren't anything for ZFS to get confused about. They're just another day at the office.
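The variable stripe width described above can be sketched in a few lines of Python. This is a simplified model, not the OpenZFS allocator: the function name stripe_width is ours, the 4,096-byte sector size is an assumption, and padding is ignored.

```python
import math


def stripe_width(block_bytes, vdev_width, parity, sector_bytes=4096):
    """Number of disks one row of a block touches in this simplified model.

    A block narrower than the vdev's data width occupies only as many data
    sectors as it needs, plus the full set of parity sectors.
    """
    data_sectors = max(1, math.ceil(block_bytes / sector_bytes))
    data_disks = vdev_width - parity
    return min(data_sectors, data_disks) + parity


print(stripe_width(4096, vdev_width=10, parity=2))        # 3
print(stripe_width(128 * 1024, vdev_width=10, parity=2))  # 10
```

A one-sector metadata block lands on three disks of the ten-wide RAIDz2, while a 128KiB record uses the full width.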
Is there any lasting performance impact?
As we discussed above, a newly expanded RAIDz vdev won't look quite like one designed that way from "birth," at least not at first. Although there are more disks in the mix, the internal structure of the data isn't changed.
Adding one or more new disks to the vdev means that it should be capable of somewhat higher throughput. Even though the legacy blocks don't span the entire width of the vdev, the added disks mean more spindles to distribute the work around. This probably won't make for a jaw-dropping speed increase, though. Six-wide stripes on a seven-disk vdev mean that you still can't read or write two blocks simultaneously, so any speed improvements are likely to be minor.
The net impact to performance can be difficult to predict. If you are expanding from a six-disk RAIDz2 to a seven-disk RAIDz2, for example, your original six-disk configuration didn't need any padding. A 128KiB block can be cut evenly into four 32KiB data pieces, with two 32KiB parity pieces. The same record split among seven disks requires padding because 128KiB / five data pieces doesn't come out to an even number of sectors.
Similarly, in some cases (particularly with a small recordsize or volblocksize set) the workload per individual disk may be significantly less challenging in the older, narrower layout than in the newer, wider one. A 128KiB block split into 32KiB pieces for a six-wide RAIDz2 can be read or written more efficiently per disk than one split into 16KiB pieces for a ten-wide RAIDz2, for example, so it's a bit of a crapshoot whether more disks but smaller pieces will provide more throughput than fewer disks but larger pieces did.
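The piece sizes in the last two paragraphs can be checked with a short Python sketch. It assumes 4KiB sectors (the article does not state a sector size), uses a function name of our own, and only counts how a block's data sectors divide across the data disks; it does not model how ZFS actually places padding.

```python
def per_disk_split(block_kib, vdev_width, parity, sector_kib=4):
    """Divide a block's data sectors across the data disks of one stripe.

    Returns (whole_sectors_per_disk, leftover_sectors). A nonzero leftover
    means the block does not divide evenly across the data disks.
    """
    data_disks = vdev_width - parity
    sectors = block_kib // sector_kib
    return divmod(sectors, data_disks)


SECTOR_KIB = 4  # assumed sector size
for width in (6, 7, 10):
    per_disk, leftover = per_disk_split(128, width, parity=2, sector_kib=SECTOR_KIB)
    print(f"{width}-wide RAIDz2: {per_disk * SECTOR_KIB} KiB per data disk, "
          f"{leftover} sectors left over")
```

Under these assumptions, the six-wide and ten-wide layouts divide cleanly into 32KiB and 16KiB pieces, while the seven-wide layout leaves two sectors over.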
The one thing you can be certain of is that the newly expanded configuration should typically perform at least as well as the original, non-expanded version, and that once the majority of data is (re)written in the new width, the expanded vdev won't perform any differently, or be any less reliable, than one that was designed that way from the start.
Why not reshape records/blocks during expansion?
It might seem odd that the initial expansion process doesn't rewrite all existing blocks to the new width while it's running. After all, it's reading and rewriting the data anyway, right? We asked Ahrens why the original width was left as-is, and the answer boils down to "it's easier and safer that way."
One key factor to recognize is that technically, the expansion isn't moving blocks; it's just moving sectors. The way it's written, the expansion code doesn't need to know where ZFS's logical block boundaries are. The expansion routine has no idea whether an individual sector is parity or data, let alone which block it belongs to.
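To see why moving sectors needs no knowledge of block boundaries, consider the toy Python model below. It is our own illustration rather than anything taken from the OpenZFS code: it assumes sectors are numbered sequentially and fill each row of disks left to right, so a sector's new position depends only on its index and the new width.

```python
def location(sector_index, width):
    """Map a flat sector index to (row, disk) when rows fill left to right."""
    return divmod(sector_index, width)


def reflow(num_sectors, old_width, new_width):
    """Yield (old_location, new_location) for each sector.

    The sector's contents are never inspected, so the routine cannot tell
    parity from data or one block from another.
    """
    for i in range(num_sectors):
        yield location(i, old_width), location(i, new_width)


for old, new in reflow(6, old_width=4, new_width=5):
    print(f"row {old[0]} disk {old[1]} -> row {new[0]} disk {new[1]}")
```

In this model, the fifth sector (index 4) moves from the start of the second row to the new disk at the end of the first row.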
Expansion could traverse all the block pointers to locate block boundaries, and then it would know which sector belongs to what block and how to reshape the block, but according to Ahrens, doing things that way would be extremely invasive to ZFS's on-disk format. The expansion would need to continually update spacemaps on metaslabs to account for changes in the on-disk size of each block and, if the block is part of a dataset rather than a zvol, update the per-dataset and per-file space accounting as well.
If it really makes your teeth itch knowing you have four-wide stripes on a freshly five-wide vdev, you can just read and rewrite your data yourself after expansion completes. The simplest way to do this is to use zfs snapshot, zfs send, and zfs receive to replicate entire datasets and zvols. If you're not worried about ZFS properties, a simple mv operation will do the trick.
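A minimal Python sketch of that snapshot, send, and receive sequence follows. It only assembles the commands and prints them as a pipeline; nothing is run. The dataset name tank/data, the snapshot name rewrite, and the target tank/data-new are hypothetical, and no optional flags are shown, so check the zfs documentation before adapting it to preserve properties or child datasets.

```python
def build_rewrite_commands(dataset, snapshot_name, target):
    """Return argv lists for snapshotting a dataset and replicating it.

    The second command's output is meant to be piped into the third.
    Nothing is executed here.
    """
    snap = f"{dataset}@{snapshot_name}"
    return [
        ["zfs", "snapshot", snap],
        ["zfs", "send", snap],
        ["zfs", "receive", target],
    ]


# Hypothetical names, for illustration only.
snapshot_cmd, send_cmd, receive_cmd = build_rewrite_commands(
    "tank/data", "rewrite", "tank/data-new"
)
print(" ".join(snapshot_cmd))
print(" ".join(send_cmd), "|", " ".join(receive_cmd))
```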
However, we'd recommend generally just relaxing and letting ZFS do its thing. Your undersized blocks from older data aren't really hurting anything, and as you naturally delete and/or alter data over the life of the vdev, most of them will get rewritten naturally as necessary, without the need for admin intervention or long periods of high storage load due to obsessively reading and rewriting everything all at once.
When will RAIDz expansion hit production?
Ahrens' new code is in OpenZFS master but is not yet part of any OpenZFS release, let alone added to anyone else's repositories. We asked Ahrens when we might expect to see the code in production, and unfortunately, it will be a while.
RAIDz expansion hit master too late to be included in the upcoming OpenZFS 2.1 release, expected very soon (2.1 release candidate 7 is available now). It should be included in the next major OpenZFS release; it's too early for concrete dates, but major releases typically happen about once per year.
Broadly speaking, we expect RAIDz expansion to hit production in the likes of Ubuntu and FreeBSD somewhere around August 2022, but that's just a guess. TrueNAS may very well put it into production sooner than that, since ixSystems tends to pull ZFS features from master before they officially hit release status.
Matt Ahrens presented RAIDz expansion at the FreeBSD Developer Summit; his talk begins at 1 hour 41 minutes in this video.