-
128K was a sweet spot for HDDs (IIRC), and it's still a good compromise. Larger blocks can help with sequential throughput, compression ratio, and metadata overhead (fewer block pointers to track).
So, a rootfs usually sees random I/O, so it's better to use the defaults. But you can test any block size you want; it'll be interesting to look at benchmarks :)
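For anyone who wants to run those benchmarks, here is a minimal sketch; the pool name `tank`, the candidate sizes, and the fio job parameters are all assumptions of mine, not anything agreed in this thread:

```sh
# Create one dataset per candidate recordsize (pool "tank" is hypothetical).
for rs in 16K 128K 1M; do
  zfs create -o recordsize=$rs tank/bench-$rs
  # Small random reads, roughly the access pattern a rootfs mostly sees.
  fio --name=randread-$rs --directory=/tank/bench-$rs \
      --rw=randread --bs=4k --size=2g --runtime=60 --time_based \
      --ioengine=psync --group_reporting
done
```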
-
I remember talks about Sun having file system replay data, so they could run ~3 years of real-world file system operations very quickly and measure ZFS's performance, fragmentation, and so on. Do we have any such data to test ZFS against?
-
This was prompted by a query on IRC, effectively of the nature "why not set a rootfs to `recordsize=1M`?", and I was unable to name any drawbacks to it. To be honest, I never think about it much; almost everything I run is kept at the default 128K, with a few exceptions for media datasets set to 1M and PostgreSQL datasets set to 16K.
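For context, that per-dataset tuning looks like the following (the dataset names here are illustrative, not my actual layout):

```sh
# recordsize is set per dataset and only affects newly written blocks.
zfs set recordsize=1M  tank/media     # large sequential media files
zfs set recordsize=16K tank/postgres  # closer to PostgreSQL's page-sized I/O
```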
Consulting the zfsprops(7) manual page on the `recordsize` property, it mentions that it may be set up to 16 MiB, but caveats that sizes larger than 1 MiB may have a negative impact on I/O latency. This seems to heavily imply that everything up to and including 1 MiB is "basically fine" for most workloads (and you can always override the default as necessary).

I had a thought that perhaps the default was kept at 128K for backwards-compatibility concerns, especially sending to old ZFS implementations. I believe this cannot be the case, however, since `zfs send` defaults to breaking records up into units of at most 128K, and the receiving system takes on the task of optimizing the physical layout. I have successfully tested this by creating a pool with `-o version=28`: it was able to receive a dataset with 1M records fine, which became 128K records on the receiving pool (a sketch of the test follows below). Of course, the `-L` and `-w` options to `zfs send` can break this compatibility, but that's easily the territory of "it's your own fault for setting those options when the receiver doesn't support them."

With both these facets in mind, I am left to wonder why the default remains at 128K, and whether there's any reason beyond "this has been the ZFS way since 2005." Naturally, pools lacking the large_blocks feature can only remain at a 128K default, but otherwise 1M should probably be fine?
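The compatibility test mentioned above boils down to roughly this; the pool names and the device placeholder are mine, and the data-writing step is abbreviated:

```sh
# Source dataset with large records (needs the large_blocks feature).
zfs create -o recordsize=1M tank/big
# ... write some data, then snapshot it ...
zfs snapshot tank/big@snap

# Destination pool in the old pre-feature-flags format.
zpool create -o version=28 oldpool /dev/sdX

# A plain send splits 1M records into 128K units, so the old pool receives fine.
zfs send tank/big@snap | zfs receive oldpool/big

# With -L (or -w for raw sends), records are sent intact, and an old
# receiver that lacks large_blocks support will reject the stream.
zfs send -L tank/big@snap | zfs receive oldpool/big2
```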