Please help me understand Size vs Size On Disk

I need to record the “File size” of some files, and I do not know which value we should technically use. When you right-click a file in File Explorer and select Properties, you will see two sizes listed for the same file:

Size and Size on Disk

Which one should we use when we need to quote the size of a file?

I have done my own research, but I have only come across a few analogies, and they seem over-complicated and assume more knowledge than most people would have.

So what exactly is the difference? Why is there a difference and which one should we use when we need to quote the size of a file?

From what I understand, Size is the total size of the file contents, and Size on Disk is, I think, the total size of all the allocations required to store the file on disk. Am I correct in thinking this? Could someone please elaborate?

Answers

Am I correct in thinking this?

Yes, you are correct.

Now, to explain without analogies.

When a disk is formatted, it is divided into blocks. In fact, if you use tools other than simply right-clicking on the disk and allowing Windows or macOS to format it for you, you will see an option somewhere in the formatting dialog that specifies the block size (Windows calls it the allocation unit size, or cluster size). By default Windows and macOS select the block size for you. Linux does the same but displays the block size and allows you to change it if you want.

A block is the smallest unit of data that can be read from or stored onto the disk via the file system. Yes, you can create files with only a byte of data (indeed you can create empty files with zero bytes of data), but the file system can only address whole blocks of data (128 kB, for example). So even that one-byte file takes 128 kB of space on disk.
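The rounding described above can be sketched in a few lines of Python. This is a simplified model, not how any particular file system computes the figure: the function name `size_on_disk` is made up for illustration, and it ignores metadata, compression, sparse files, and small files stored inline.

```python
def size_on_disk(file_size: int, block_size: int) -> int:
    """Round file_size up to a whole number of blocks (simplified model)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Ceiling division: how many blocks are needed to hold file_size bytes.
    blocks = -(-file_size // block_size)
    return blocks * block_size


BLOCK = 128 * 1024  # the 128 kB block size used as the example in the text

print(size_on_disk(1, BLOCK))          # 131072: one byte still occupies a full block
print(size_on_disk(BLOCK + 1, BLOCK))  # 262144: one byte over spills into a second block
```

In this model an empty file occupies zero data blocks; any space it needs for its name and other metadata is accounted for separately.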

The original problem this block mechanism was meant to solve was one of addressing. CPUs have a limited number of address bits. For really old machines it was 16 bits (the size of one word on the CPU), for old machines it was 32 bits, and for newer machines it’s 64 bits. Of course, with 64 bits of address space you can access 16384 petabytes of data on a byte-by-byte basis, so it may seem unnecessary to divide the disk into blocks. But with 32 bits the amount of addressable data is only 4 gigabytes. Obviously that is not enough to address every single byte on a disk.

So the solution was to divide the disk into blocks of a certain size. A 32-bit address accessing 128 kB blocks, for example, can access about 512 terabytes before running out of bits. So that was the origin of why disks are divided into blocks: disks were larger than what CPUs could address.
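As a quick check of the arithmetic behind these figures, here is a small Python sketch. The helper name `addressable_bytes` is hypothetical, and the units are binary (1 kB taken as 1024 bytes), which is an assumption about how the numbers above were derived.

```python
def addressable_bytes(address_bits: int, unit_size: int = 1) -> int:
    """Total bytes reachable when each address names one unit of unit_size bytes."""
    return (2 ** address_bits) * unit_size


GIB = 2 ** 30
TIB = 2 ** 40
PIB = 2 ** 50

print(addressable_bytes(32) // GIB)              # 4     (32-bit, byte addressing)
print(addressable_bytes(32, 128 * 1024) // TIB)  # 512   (32-bit, 128 kB blocks)
print(addressable_bytes(64) // PIB)              # 16384 (64-bit, byte addressing)
```

Addressing blocks instead of bytes multiplies the reachable capacity by the block size, which is the whole trick.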

But even in today’s 64 bit world, it still makes sense to divide disks into blocks. Even if we can access every byte of the disk individually, managing individual bytes is hard. It’s simpler and faster to write algorithms to allocate and manage disk space in blocks. Besides, one day even the 64 bit address will be exhausted.

That is in essence why there is a difference between the size of a file and the size of a file on disk. There is of course another source for the difference: metadata. Even an empty file requires disk space to store the file name, the file permissions, the location of the file on disk, and so on.

So which should you use? Well, it depends on what you really want to know about. If you want to know how much to read to find the end of the file then the real file size is the one you should use. If you want to transmit the data over the internet then the real file size is the one you should use. If you want to calculate how full your disk is then the size on disk is the one you should use.
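If you need to record both numbers from a script, a minimal Python sketch could look like the following. It assumes a Unix-like system where `os.stat` exposes `st_blocks` (counted in 512-byte units); on Windows that field is absent, so the sketch returns `None` for the allocated size there. The function name `file_sizes` is made up for illustration.

```python
import os
import tempfile


def file_sizes(path):
    """Return (logical_size, allocated_size_or_None) in bytes for path."""
    st = os.stat(path)
    logical = st.st_size  # the "Size": number of bytes of content
    # st_blocks is not available on every platform (for example Windows).
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return logical, allocated


# Demo: write a 513-byte file and report both sizes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 513)
    demo_path = tmp.name

logical, allocated = file_sizes(demo_path)
print("size:", logical)
print("size on disk:", allocated)
os.remove(demo_path)
```

The logical size is what you would quote for transfers or for reading to the end of the file; the allocated figure depends on the file system and can differ between machines for the same file.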

Size is the actual size of the file in bytes.
Size on disk is the actual amount of space being taken up on the disk.

The disk is broken down into tracks and sectors. So, if your sector size is 512 bytes and your file is actually 513 bytes, the size on disk will be 1024 bytes because it occupies two sectors.