8

I have been reading about disks recently, which has led me to three different doubts that I am not able to link together. The three terms I am confused about are block size, IO and performance.

I was reading about the superblock at slashroot when I encountered the statement:

Less IOPS will be performed if you have larger block size for your file system.

From this I understand that if I want to read 1024 KB of data, a disk (say A) with a block size of 4 KB/4096 B would take more IO than a disk (say B) with a block size of 64 KB.

Now my question is: how much more IO would disk A need?

As far as I understand, the number of IO requests required to read this data would also depend on the size of each IO request.

  • So who decides the size of the IO request? Is it equal to the block size? Some people say that your application decides the size of the IO request, which seems fair enough, but how then does the OS divide a single request into multiple IOs? There must be a limit after which the request splits into more than one IO. How do I find that limit?
  • Is it possible that on both disks (A and B) the data can be read in the same number of IOs?
  • Does reading each block mean a single IO? If not, how many blocks at most can be read in a single IO?
  • If the data is sequential or randomly spread, does the CPU provide all the block addresses to read at once?

Also

num of IOPS possible = 1 /(average rotational delay + avg seek time)

Throughput = IOPS * IO size

From the above, the IOPS for a disk would always be fixed, but the IO size can vary. So to calculate the maximum possible throughput we would need the maximum IO size. From this, what I understand is: if I want to increase throughput from a disk, I should make requests with the maximum amount of data I can send per request. Is this assumption correct?
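The two formulas above can be made concrete with a back-of-the-envelope calculation. The timing figures below are illustrative assumptions for a typical 7200 RPM drive (half a revolution of rotational delay, ~9 ms average seek), not measurements of any specific model:

```python
# Back-of-the-envelope throughput estimate for a spinning disk.
# The timing figures are assumed values for a typical 7200 RPM drive.

avg_rotational_delay = 0.00417  # seconds (half a revolution at 7200 RPM)
avg_seek_time = 0.009           # seconds (assumed average seek)

iops = 1 / (avg_rotational_delay + avg_seek_time)  # ~76 IOPS

# Throughput = IOPS * IO size, so the same disk delivers very
# different MB/s depending on how large each request is.
for io_size_kib in (4, 64, 1024):
    throughput = iops * io_size_kib * 1024  # bytes/second
    print(f"{io_size_kib:>5} KiB per IO -> {throughput / 1e6:.1f} MB/s")
```

The IOPS figure stays fixed while throughput scales with IO size, which is exactly why issuing larger requests raises throughput on the same disk.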

I apologize for asking so many questions, but I have been reading about this for a while and could not get any satisfactory answers; I found different views on the same topic.

2 Answers 2

5

I think the Wikipedia article explains it well enough:

Absent simultaneous specifications of response-time and workload, IOPS are essentially meaningless.
...
Like benchmarks, IOPS numbers published by storage device manufacturers do not directly relate to real-world application performance. ...

Now to your questions:

So who is deciding what is the size of the IO request?

That is both an easy and a difficult question to answer for a non-programmer like myself.

As usual the answer is an unsatisfactory "it depends"...

I/O operations on disk storage by an application are usually system calls to the operating system, and their size depends on which system call is made...

I'm more familiar with Linux than other operating systems, so I'll use that as reference.

The size of I/O operations such as open(), stat(), chmod() and similar is almost negligible.
On a spinning disk the performance of those calls mainly depends on how far the disk actuator needs to move the arm and read head to reach the correct position on the disk platter.

On the other hand, the size of read() and write() calls is initially set by the application and can vary between 0 and 0x7ffff000 (2,147,479,552) bytes in a single I/O request...

Of course, once such a system call has been made by the application and is received by the OS, the call will get scheduled and queued (depending on whether or not the O_DIRECT flag was used to bypass the page cache and buffers and direct I/O was selected).

The abstract system call will need to be mapped to/from operations on the underlying file system, which is organized in discrete blocks (whose size is usually set when the file system is created), and eventually the disk driver operates on either hard disk sectors of 512 or 4096 bytes or SSD memory pages of 2K, 4K, 8K, or 16K.
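On Linux you can query the block size your filesystem was formatted with. A minimal sketch using the POSIX statvfs call (the path "/" is just an example; any file or directory on the filesystem works):

```python
import os

# Query the block size of the filesystem a given path lives on (POSIX).
# "/" is only an example path.
st = os.statvfs("/")
print("filesystem block size:", st.f_bsize, "bytes")  # commonly 4096
```

The logical sector size of the disk itself is a separate, lower-level figure; on Linux it is exposed in sysfs, e.g. /sys/block/sda/queue/logical_block_size (the device name sda is an assumption here).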

(For benchmarks, the read and write sizes are typically set to either 512 B or 4 KB, which align well with the underlying disk, resulting in optimal performance.)

There must be a limit after which the request splits into more than one IO. How do I find that limit?

Yes, there is a limit: on Linux, as documented in the manual, a single read() or write() system call will transfer a maximum of 0x7ffff000 (2,147,479,552) bytes. To read larger files you will need additional system calls.
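In practice applications rarely go near that limit; they loop, issuing one read() per buffer-sized chunk. A minimal sketch that counts the system calls needed to read a file at a chunk size the application chooses (the file name demo.bin is hypothetical):

```python
import os

# Read a file in fixed-size chunks, counting how many read() system
# calls are issued. The chunk size is the "IO size" the application
# controls.

def read_counting_calls(path, chunk_size):
    calls = 0
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk_size)  # one system call per iteration
            calls += 1
            if not buf:                    # zero-byte read signals EOF
                break
            total += len(buf)
    finally:
        os.close(fd)
    return calls, total

# Demo with a hypothetical 8 KiB file created in the current directory:
with open("demo.bin", "wb") as f:
    f.write(b"\0" * 8192)
calls, total = read_counting_calls("demo.bin", 4096)
os.remove("demo.bin")
print(calls, total)  # 3 calls (two data reads plus the EOF read), 8192 bytes
```

With 4 KiB chunks, a 1024 KiB file takes 256 data reads; with 64 KiB chunks, only 16 — the "larger IO size, fewer IOs" effect the question asks about, decided entirely by the application here.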

Does reading each block mean a single IO?

As far as I understand, each occurrence of a system call is typically what counts as an IO event.

A single read() system call counts as 1 I/O event — not as X or Y IOs — regardless of how that system call gets translated into accessing X blocks on a filesystem or reading Y sectors from a spinning hard disk.

2
  • 2
    thanks a lot for answering. I think I understood what you explained, so essentially the gist is that there is no direct relation between IO and block size. However, if that's the case, would it be correct to say that the statement "Less IOPS required with larger block size" does not hold true? Jul 12, 2018 at 8:22
  • @AnkitKulkarni Generally speaking it's easier to reach higher I/O throughput speeds with larger block sizes, because you do less work per I/O to access a larger region. At the end of the day, whether you can reach the same maximum throughput with 4 KByte blocks as with 64 KByte blocks will depend on various bottlenecks in the I/O stack, but you will surely have spent "more effort" doing so. "who is deciding what is the size of the IO request" — ultimately the kernel (based on various things); see unix.stackexchange.com/a/533845/109111 for some discussion.
    – Anon
    Sep 30, 2019 at 8:20
0

Seems like you're trying to decode this statement:

"Less IOPS will be performed if you have larger block size for your file system."

Let me try rephrasing this statement to make the original author's meaning more clear:

"To read a given file with a particular size (say, 10MB), a file-system formatted with a larger block size will probably need to carry out a lower number of read operations than a file-system formatted with a smaller block size."

I hope my rephrasing makes a bit more sense than the original.
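The rephrased statement can be checked with simple arithmetic. A sketch for the 10 MB figure above (treated as 10 MiB for round numbers), comparing 4 KiB and 64 KiB filesystem block sizes:

```python
import math

# Number of block-sized operations needed to read a 10 MiB file
# under two different filesystem block sizes.
file_size = 10 * 1024 * 1024  # 10 MiB

for block_size in (4 * 1024, 64 * 1024):
    ops = math.ceil(file_size / block_size)
    print(f"{block_size // 1024:>2} KiB blocks -> {ops} operations")
#  4 KiB blocks -> 2560 operations
# 64 KiB blocks -> 160 operations
```

Sixteen times larger blocks means sixteen times fewer block operations — though, as the "probably" hints, caching and the layers below can change how many physical IOs actually happen.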

To properly parse that statement and to understand the reason for a) use of the term "filesystem" instead of disk and b) that pesky "probably", you'll need to learn a lot more about all the software layers between the data sitting on a disk (or SSD) and the userland applications. I can give you a few pointers to start googling:

For spinning disks:

  • sector size (disk) vs blocksize (filesystem)

Learn about caching:

  • Page/Buffer cache in OS kernel

  • I/O caching in user-level libraries (the most important of which are libc and libc++)

For SSDs or other flash-based storage, there are some additional complications. You should look up how flash storage works in units of pages and why any flash-based storage requires a garbage-collection process.

2
  • Thanks chetan for answering. I was going through medium.com/databasss/… to understand the same; however, as per @HBruijin's answer, each time a system call is made an IO event occurs, and a single read IO call can read up to ~2GB (man7.org/linux/man-pages/man2/read.2.html#NOTES0). So my understanding is that "it doesn't matter what block size a filesystem has" and all that matters is how many bytes a single read call is set to transfer, so IOPS is independent of block size. So I am confused — did I understand it correctly? Jul 19, 2018 at 10:03
  • 1
    @AnkitKulkarni the problem is that you seem to be mixing and matching information from different layers of the stack and trying to make sense of it. The read() manpage you pointed to describes a library call available to a C program, and it doesn't have to map directly to a single read syscall. In general, the unix I/O system contains many layers: disk/ssd/controller-cache/device-drivers/virtual-memory-and-filesystem/user-level-libraries, etc. To correlate app code actions to the resultant disk ops, you need to understand the role of each layer. In other words, there's no simple direct mapping.
    – chetan
    Jul 21, 2018 at 6:25
