[SOLVED] About "over provisioning" ?


karasahin

Hello,

I'm using a Samsung 870 EVO 500 GB SSD with its over-provisioning value set to 11%, which means 51 GB is currently set aside and temporarily out of use.

How important is this feature? I guess it helps the disk last longer, but is that true, and if so, how much longer, if it's not trivial?

Should I disable this feature if the disk is about 90% full?

Thanks.
 
Solution
So this is about the longevity of the SSD? The more space consumed above the threshold, the more prone it becomes to performance loss and instability? If so, could you tell me at what point this starts to become a problem?
To write (save) a new value to a cell, that cell has to first be erased.
The TRIM function erases cells that are not currently used for data, in preparation to accept something in the future.

Given sufficient free space, the drive controller does not have to spend so much time looking for a free cell to write new data to.

As said...keep some free space on the drive.
Personally, I don't go over 80% actual consumed space.
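
If it helps to put rough numbers on the 11% figure and the "stay under 80%" rule of thumb, here's a quick back-of-the-envelope sketch in Python. The assumption that the 11% is calculated from the formatted capacity is mine, so treat the figures as approximate:

    # "500 GB" drive as the OS reports it, in GiB
    formatted_gib = 500e9 / 2**30            # ~465.7 GiB
    op_gib = 0.11 * formatted_gib            # ~51 GiB reserved by over-provisioning
    usable_gib = formatted_gib - op_gib      # ~414 GiB you can actually fill
    keep_free_gib = 0.20 * usable_gib        # ~83 GiB kept free if you stop at 80% used

    print(f"OP reserve: {op_gib:.0f} GiB, usable: {usable_gib:.0f} GiB, "
          f"keep free: {keep_free_gib:.0f} GiB")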
That's not what I'm asking.

Which is worse? An SSD whose free space is below the threshold, or an SSD that's running at a higher temperature, above its threshold?
Both are equally bad and cause premature wear and tear, but for different reasons. Not enough free space causes much more writing, because erasing previous data also counts as writing; in essence, each bit written to a used cell means at least three writes, not even counting the error corrections that often occur. High temperature, although there are mechanisms to throttle write speed, can physically damage the electronics, gradually or quickly if it gets too high. Avoid both and you can expect the disk to last far beyond the manufacturer's estimates or warranty.

The underlying reason is the technology and architecture of SSDs. Each bit of data is stored in a transistor that can be set to, and hold, one of two states, off or on, and a voltage has to be applied to change that state. As with all other transistors, the number of times that can happen is more or less finite because of electron migration and other causes. Not all cells are created equal; some may deteriorate faster than others, and if that happens to some critical ones, the whole thing can fail or cause file system or data corruption.
Apart from the storage itself, each SSD also has a controller chip that contains the microcode/firmware with the algorithms for writing, reading and other functions, and that "remembers" the SMART table and records to it. When you read from or write to the disk, the OS is dealing with that chip, not with the storage chips directly. Because it works harder, it's usually the hottest part and more prone to failure from high temperatures; the temps you see are its temps, not those of the storage chips, which can withstand higher temperatures.
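
To put the wear side of that into rough numbers, here's a hedged sketch. The 300 TBW endurance figure is the commonly quoted rating for a 500 GB 870 EVO (check the spec sheet), and the daily write volume and write amplification factor are assumptions you should replace with your own:

    rated_tbw = 300                  # manufacturer endurance rating, terabytes written
    host_writes_gb_per_day = 20      # assumed light desktop use
    waf = 3.0                        # assumed write amplification on a fairly full drive

    nand_tb_per_day = host_writes_gb_per_day * waf / 1000
    years = rated_tbw / nand_tb_per_day / 365
    print(f"~{years:.0f} years to reach the rated endurance")   # roughly 14 years here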
 
For clarity, the SSD doesn't hunt around looking for spare single cells to write values to. I wrote something on this previously. Without repeating it all here, the SSD writes pages (sets of cells) but erases blocks (sets of pages), and it needs erased blocks to write to. That's why it has to do housekeeping and why it gets slower as it fills up.

Logically it doesn't even make any sense that the drive would care about partitions, but it was consistently happening in my tests. The total free space on the drive didn't matter, only the free space within a partition, either as a total amount or as a percentage of the total.
Logically it makes total sense to me. When partitioned it must somehow organise itself internally to treat partitions differently, otherwise there would be zero difference to just creating a C: folder and a D: folder and displaying a different desktop icon.

I would imagine the issue is that different partitions can have different file systems and so differently sized clusters. Probably, programming the drive controller software to juggle different cluster sizes within the same pages and blocks is very difficult and gives much slower performance than just assigning memory physically to each partition.
 
Partitioning is all virtual even on HDDs with spinning platters. It's nothing like a record player, where everything is recorded as one continuous groove. HDDs have a head on each side of a platter, and data is split across both sides simultaneously, part on one side and part on the other. The OS doesn't command the HDD where partitions physically sit on the disks; that's the job of the disk's controller chip and its firmware, which translate what the OS wanted and organize files, folders and partitions according to the firmware's algorithms. The file and partition allocation tables are on a disk/platter and are read from there, sending the heads to the exact position the data is at.
SSDs are even more virtual. The storage is organized in a grid, each cell with its own address, and the firmware decides by its algorithm where to place data. To prolong the SSD's life, data is placed in different transistors so that all cells/transistors are covered equally, with no fixed location. During routine maintenance data is moved around (wear leveling), and deleted data that has been marked for erasing is erased and marked as available for writing (garbage collection). With modern SSDs all of that is built into the firmware, and in Windows, for instance, it's triggered automatically, at set intervals, or by a forced Optimize.
 
Partitioning is all virtual even on HDDs with spinning platters.
Actually, no.

Partitions on a HDD are in fact physical delimiters.

This is evidenced in the long abandoned practice of 'short stroking'.
 
Partitioning is all virtual even on HDDs with spinning platters... The OS doesn't command the HDD where partitions physically sit on the disks; that's the job of the disk's controller chip and its firmware, which translate what the OS wanted and organize files, folders and partitions according to the firmware's algorithms. The file and partition allocation tables are on a disk/platter and are read from there, sending the heads to the exact position the data is at.
As said above: partitions have always been contiguous storage spaces. The table basically gives start and end points on the disk and the firmware knows if it's saving a file to drive E: it can only place its data within that assigned storage space, regardless of whether it saves the file itself in a contiguous section or not. Back in the day, partitioning a large disk could help with performance by e.g. ensuring that the entirety of C: was closer to the edge where access was slightly faster. This is also why resizing and reordering partitions is such a pain - if they were virtual it would be simple.

SSDs are even more virtual, it's organized in a square grid each with own address and FW decides by it's algorithm where to place data.
It still has to deal with partitioning since that's expected by every file system out there that I can think of. Now while it could probably 'virtualise' it by not really using a partition table but internally flagging which partition each page or block is assigned to, keeping track of how much space is left in each virtual partition, presenting an artificial table when requested and so on, it's presumably far easier just to do the same as HDDs and assign this contiguous space of memory to C:, the next to D: and so on. Unlike with HDDs though, there's no inherent difference in performance within partitions.
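
A tiny sketch of what "start and end points" means in practice. The drive letters and LBA ranges here are made up, but this is essentially all a partition table records, and it's the OS (not the drive) that does this lookup:

    # hypothetical two-partition layout on a drive with 512-byte sectors
    partitions = [
        ("C:", 2048, 209_717_247),           # first ~100 GiB
        ("D:", 209_717_248, 976_773_119),    # the rest of the drive
    ]

    def partition_for_lba(lba):
        for name, start, end in partitions:
            if start <= lba <= end:
                return name
        return None   # unpartitioned space

    print(partition_for_lba(500_000_000))    # -> D: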
 
Logically it makes total sense to me. When partitioned it must somehow organise itself internally to treat partitions differently, otherwise there would be zero difference to just creating a C: folder and a D: folder and displaying a different desktop icon.
But this isn't a physical or logical thing with the drive's controller (SSD or HDD). The controller doesn't know where the partitions begin and end logically, therefore it isn't aware of them physically. The drive doesn't record any partition data in a way that the controller reads to itself. The OS records in the partitioning table where they begin and end (using LBA addresses that were reported as available to it from the drive) and then limits reads and writes to the address ranges associated with the partition. The OS doesn't tell the controller in the drive "look in partition 2 for data at this address".

The only way it really makes sense is for the blocks used as pSLC (or static SLC) to be specifically assigned in sections. So if the drive has only partial capacity for pSLC (like Samsung), then there might be sections A, B, C, and D, and if section A is filled with real (native) data, then that section's pSLC is no longer available. Other drives might allocate it all in section A (like maybe the BX500) so if section A is filled, there's simply no pSLC at all for any other writes. None of that relates to the logical partitioning.

A drive that uses the full capacity for pSLC could still break it up into sections where one section couldn't be used as cache for writes in another section.
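
For what it's worth, here's how I'd model that "sectioned pSLC" idea in a toy Python sketch. This is purely my reading of the behaviour described above, not anything from a datasheet; the section names and block counts are invented:

    # each physical section has some blocks usable as pSLC cache, but only
    # while that section still has room left for native (TLC) data
    sections = {
        "A": {"native_full": True,  "pslc_blocks": 128},
        "B": {"native_full": False, "pslc_blocks": 128},
        "C": {"native_full": False, "pslc_blocks": 128},
        "D": {"native_full": False, "pslc_blocks": 128},
    }

    def pslc_available(name):
        s = sections[name]
        return 0 if s["native_full"] else s["pslc_blocks"]

    print(pslc_available("A"), pslc_available("B"))   # 0 128 -> section A's cache is gone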
 
As said above: partitions have always been contiguous storage spaces. The table basically gives start and end points on the disk and the firmware knows if it's saving a file to drive E: it can only place its data within that assigned storage space, regardless of whether it saves the file itself in a contiguous section or not. Back in the day, partitioning a large disk could help with performance by e.g. ensuring that the entirety of C: was closer to the edge where access was slightly faster.


It still has to deal with partitioning since that's expected by every file system out there that I can think of. Now while it could probably 'virtualise' it by not really using a partition table but internally flagging which partition each page or block is assigned to, keeping track of how much space is left in each virtual partition, presenting an artificial table when requested and so on, it's probably far easier just to do the same as HDDs and assign this contiguous space of memory to C:, the next to D: and so on. Unlike with HDDs though, there's no inherent difference in performance within partitions.

SSDs do have less "hard" physical partitioning than a mechanical drive. The controller has an internal translation table that takes the contiguous Logical Block Addressing scheme and tracks which physical blocks are assigned to a particular address. It's like reallocated sectors brought into the bigtime, always in play. That's because of things like wear leveling and garbage collection, where data in the blocks really does get moved around constantly in order to ensure all cells get a similar number of writes over time. Data that was written to LBA# 200 may initially be written to cellblock# 200 then get moved to block 2932 if block 200 is getting modified a lot, and the controller has to keep that data in its translation cache. (I assume they use SRAM or DRAM for speed, with the table itself persisted to flash.) Even during a normal write operation, SSDs don't write to the same physical block despite it being the same file and same LBA address. They write to a new, "erased" block, and mark the previous block as stale until the OS runs TRIM, while keeping the LBA address the same in the translation table. Eventually you would have a "contiguous" block of LBA addresses that is scattered across the physical drive (or perhaps within sectioned areas given my recent pSLC testing results).
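
A minimal sketch of that translation idea, just to make it concrete (real FTLs track pages within blocks, wear counts and so on; this only shows that the LBA stays put while its physical home moves):

    l2p = {}                            # LBA -> physical block number
    free_blocks = list(range(1000))     # pretend pool of erased blocks
    stale_blocks = set()                # old copies waiting for GC/TRIM

    def write(lba, data):
        new_block = free_blocks.pop(0)  # writes always go to an erased block
        if lba in l2p:
            stale_blocks.add(l2p[lba])  # previous copy is now stale, not yet erased
        l2p[lba] = new_block

    write(200, b"v1")
    write(200, b"v2")                   # same LBA, different physical block
    print(l2p[200], stale_blocks)       # 1 {0}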

SSDs also have multiple channels to the many flash chips, allowing for interleaving. As if there were multiple read/write heads actively working at the same time on an HDD. (This has been done but it's very expensive and not worth it for consumers.) On an HDD with multiple platters, only one head is active at a time, on one side of one platter, even though there may be 8 heads stacked on the actuator arm. In an SSD, all 4 or 8 channels can read or write at once, so 8 blocks could be written at once, which is a huge part of an SSD's speed. A file made of 8 blocks can be written across 8 flash chips if there are 8 channels. A file written to an HDD stays on a single track of a single platter until that track is full, then gets skipped down to the same track on the next platter (if available) until that track is filled on all the platters (because the actuator arm doesn't have to waste time physically switching tracks).
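
A toy illustration of that interleaving (the channel count and block numbering are arbitrary, and real drives also interleave across dies and planes):

    channels = 8
    file_blocks = [f"block{i}" for i in range(16)]

    per_channel = {ch: [] for ch in range(channels)}
    for i, blk in enumerate(file_blocks):
        per_channel[i % channels].append(blk)    # round-robin striping

    # channel 0 gets block0 and block8; all 8 channels can write in parallel
    print(per_channel[0])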

The difference is that in an SSD, there is no performance difference between reading a block at the beginning of the drive and one at the end of the drive, and no physical tracks to switch, so "short-stroking" is meaningless even if it would ensure physical locality of the data.

From my understanding, even new SMR drives may not be as "hard" as CMR drives in this respect, since they support TRIM, though they likely keep the movements within a small range of tracks since that is a physical performance issue.
 
The controller doesn't know where the partitions begin and end logically, therefore it isn't aware of them physically. The drive doesn't record any partition data in a way that the controller reads to itself.
The drive knows its partition table. I don't see why it should be impossible for the drive's controller to access that information. You said yourself, in your tests the controller behaves as if it's aware of the partitions.
 
The difference is that in an SSD, there is no performance difference between reading a block at the beginning of the drive and one at the end of the drive, and no physical tracks to switch, so "short-stroking" is meaningless even if it would ensure physical locality of the data.
Yes, we know all that, and the rest. I said there was no difference between SSD partitions.

Partitions are contiguous storage spaces.

The SSD's options are:
- spread data throughout the full address space, and go through various hoops to present artificial partition data, information and limitations when requested.
- divide the full address space into partitions and restrict data to the resulting partitions accordingly.

You think the first is the logical one, I think the second.

So far as I know, and so far as your tests show, the second is what they do.

Edit: This article goes into some depth about partition alignment and how it is especially important for SSDs, again something that wouldn't be the case if partitions on SSDs were as abstract/virtual as claimed.
 
The third option is what I think most do, if they segment pSLC at all (because not all do): segment the pSLC cache-capable areas into sections. That has nothing to do with the partition allocations, as I didn't do any testing further than 1 versus 2 partitions. If it were broken up into odd numbers like 3 or 7 it might look even weirder, like partitions 1, 4 and 7 having available pSLC and the others not. Testing that on multiple drives could take a year.

I KNOW that the drive controllers "go through various hoops" because the translation from LBA addressing to physical space has been happening since LBA was introduced. That's why it's LOGICAL block addressing. (I'm not sure when sector reallocation actually became a thing.) And SSDs have had to do it much more heavily due to the use of multiple channels and particularly since wear leveling was introduced. That data MUST be tracked constantly and updated every time the wear leveling algorithm moves data from one block to another; on HDDs, that kind of remapping only ever happened due to bad block detection.

There is no need for the controller to be "aware" of the locations of the partition because the partition table has that data, and the OS knows where the data it needs is located using the file table and partition table. The drive tells the OS "I have LBA addresses 0 to 999,999,999 available" and the OS knows that means it has 512 GB of space (with 512-byte sectors, and ignoring the decimal/binary marketing difference). You tell the OS to make two partitions and it records in the partition table that partition 1 is 0 to 499,999,999, and 500,000,000 to 999,999,999 is partition 2.

That's just a cosmetic notation for the user and applications, really. The drive has no need to know at all (and doesn't care what the partition table stored on it says) and the OS itself doesn't really care (*NIX treats partitions purely as part of the folder structure as far as a user is concerned, unlike Windows "drive" letters that make it look like another disk), because the OS still has to track the full billion LBA addresses for the entire drive in the file table and the controller obviously still has to know where those addresses are physically. Many of the purposes of partitioning have been largely lost; even Linux just uses one big partition these days for what used to be separated into several, and issues with partition size limits have been eliminated.

Whether mechanical or SSD, you ask for File X, and the OS still has to know that its start address is 612,983,486 (I'm just assuming 512-byte allocation units for ease so I don't have to calculate 4K chunk locations) and ask the drive for that address's data plus however many more complete the file (including possibly jumping to other remote addresses). It doesn't ask the controller to first move to partition 2 and then look for 112,983,486 within that partition. Controllers don't even really know where file data is or isn't located; on an SSD the controller just knows where data ISN'T (i.e., an erased block, or a block available for erasure after TRIM or the garbage collection algorithm), which isn't the same as knowing the file system data.

If the controller were aware of partition tables it would need to be made aware of every possible partition style. Partition tables are stored data, and the controller is data-agnostic. You can write anything you want to the drive, even if it's not a partition table, or make up your own new partition scheme that nobody else understands. It can be raw data that only your application can read and everyone else thinks is random with no partitions. MBR and GPT are the current common ones but others have existed and could be used on the drives, and the manufacturers would not be expected to make their controllers aware of them all. If you could make an old Macintosh II capable of seeing an NVMe drive you could use the Apple Partition Map and it would work exactly the same (up to 2TB).
 
If this were true, and partition tables are "just a cosmetic notation for the user and applications, really", explain how incorrect partition alignment reduces SSD performance.
Because SSDs use a different physical design from 512-byte sector mechanical drives and thus require the partitions to begin and end at physical boundaries, but that's completely unrelated to the controller being "aware" of the partition. The controller is aware of where each sector is located. No different from a 4KB sector drive in 512e mode that also has to be properly aligned so that the partition doesn't begin in the middle of a 4-kilobyte block. In both cases, without alignment, the drive has to read then rewrite two whole physical blocks every time a logical sector is written because part of that sector is stored in each block. Alignment just ensures the controller is not amplifying the number of writes enormously and wearing out the drive, AND improves performance because you don't have to wait for those extra writes.
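
If you want to check what "aligned" means here, the test is just that the partition's first byte lands on a physical page boundary. I'm using 4 KiB as the example page size, which is an assumption; some drives use larger pages:

    sector_bytes = 512
    page_bytes = 4096

    def aligned(start_lba):
        return (start_lba * sector_bytes) % page_bytes == 0

    print(aligned(2048))   # True  - the usual 1 MiB offset modern partitioning tools use
    print(aligned(63))     # False - old XP-era offset; 4 KiB writes straddle two pages,
                           #         forcing a read-modify-write of both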

Irrelevant to the issue, but the fact that partitions are being used at all is still mostly just cosmetic now, a convenient way of arranging data at a level higher than folders. It emulates the older days when you had actual physically separate, very low-capacity drives, so "partition" size limits weren't relevant; later, partition size limitations were outpaced by physical capacity, and partitioning is no longer really necessary.

The only functionally required partitioning is really having an EFI System Partition and separate OS partitions, since the ESP can be used for multiple OSes on the same system and would thus not be good to have merged with any single one of their OS partitions.
 
Right, I think I understand where you're coming from, and I agree now that you're right about the various hoops, so I'll explain my understanding now of how it works, and why I still think it makes sense for a near-full partition on a multi-partitioned SSD to behave like a near-full SSD.

I don't really want to fill this up with links, so this is about the best single reference that includes most of what I'm trying to say.

An SSD's unit of write is the page, and the unit of erasure is the block. Pages and blocks themselves are physical fixtures within the SSD. A block typically contains 2^7 to 2^9 pages. A page ready for writing to is marked free. When data is written to a free page the page is marked valid. Pages cannot physically be overwritten, so if the data in a page is changed, the new data is written to a free page elsewhere. The original page is marked invalid and cannot be re-used until the whole block is erased. If there are pages in the block that are valid, the data inside them has to be copied over to free pages in other blocks before the block can be erased. When the block is erased, all its pages are again marked free. GC performs this process regularly to maintain a sufficient number of free pages. The more GC is needed, the bigger the performance hit. (TRIM deals with LBAs where the data is deleted but not yet overwritten, but I'll use GC as a catch-all.)
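
Here's a toy Python version of that page/block relationship, just to make the housekeeping cost visible (the page counts and block size are arbitrary):

    PAGES_PER_BLOCK = 4
    blocks = [["free"] * PAGES_PER_BLOCK for _ in range(3)]

    def gc(victim):
        """Relocate the victim block's valid pages, then erase the whole block."""
        moved = 0
        for state in blocks[victim]:
            if state == "valid":
                for b, pages in enumerate(blocks):       # find a free page elsewhere
                    if b != victim and "free" in pages:
                        pages[pages.index("free")] = "valid"
                        moved += 1
                        break
        blocks[victim] = ["free"] * PAGES_PER_BLOCK      # block-level erase
        return moved                                     # extra copies = write amplification

    blocks[0] = ["valid", "invalid", "valid", "invalid"] # half the block is stale
    print(gc(0), blocks)                                 # 2 valid pages had to be moved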

An LBA is what the OS uses to refer to the location of the data. Each LBA points towards one page. The SSD maintains an internal L2P table so it knows which page each LBA points to. If the data gets moved to a different page, the L2P is updated to reflect this. This comes at a cost.

Writing to an LBA for the first time has the least impact on performance: the data is written and that's it. Overwriting an LBA has a bigger impact: a free page must be found, the data written, the old page marked invalid, the L2P updated. There's also a significant future cost because at some point the block containing the old page will need to be erased to allow the invalid page to be used, usually involving moves of valid pages and the consequent updating of the L2P.

When a drive is partitioned, each partition is assigned a range of LBAs, and at the beginning all those LBAs point to free pages and there are no invalid pages anywhere on the SSD. The first write to an unused LBA is quick. When overwriting the data in an LBA the costs above creep in. If an LBA is overwritten ten times, it leaves behind nine invalid pages; these may be spread among as many as nine blocks, which will now all require, at some point, having their valid pages moved and an erasure cycle applied before those pages can be reclaimed.

Therefore, even though they can be described as abstract, how often individual LBAs are overwritten has a direct impact on the performance of the SSD. (This is partially discussed in the paper in reference to dealing with hot/cold LBAs.) In an SSD with a single partition that's 45% full, there are a lot of 'free space' LBAs and data writes can be spread among them, limiting the amount of overwriting required. If the SSD is split into two equal partitions, one empty and one 90% full, the total number of valid pages will be equal to the single-partition drive. However, with only 10% of the LBAs to choose from in the near-full partition (since a write to a partition has to be a write to LBAs within that partition), overwrites happen far more often, spreading invalid pages throughout the drive and increasing the amount of GC work that needs doing. (Finding free pages will be easier than in the 90% full single-partition SSD, true, but it's not clear that it's an especially significant contributor to write amplification.)
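
The difference is easy to put rough numbers on, using the same example figures as above (one partition 45% full versus the same drive split in two with one half 90% full):

    total_lbas = 1_000_000
    single_free = 0.55 * total_lbas            # 45% full, one partition
    split_free = 0.10 * (total_lbas / 2)       # the 90%-full half of a split drive

    # ~11x fewer free LBAs to cycle new writes through in the near-full partition,
    # so individual LBAs get overwritten far more often
    print(single_free / split_free)            # 11.0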


So I was mistaken in thinking that the SSD takes the partition table into account when forming the L2P, which I'd assumed because partition alignment is an issue - it seems it's almost the other way around, in that it would have to take the table into account to avoid alignment being an issue, but the L2P would then be too complex. But although the SSD doesn't directly 'care' about partitions, it turns out it indirectly cares because a near-full partition can lead to the same LBAs getting repeatedly hammered, making a lot more work for the drive despite all the other free space and slowing down performance.
 
But although the SSD doesn't directly 'care' about partitions, it turns out it indirectly cares because a near-full partition can lead to the same LBAs getting repeatedly hammered, making a lot more work for the drive despite all the other free space and slowing down performance.
True, but the controller is also perfectly willing to do that if the OS asks it to, and still isn't "aware" of the partitioning. However during idle time, when the wear-leveling algorithm runs, it can take data out of the currently-used blocks and move it into those heavily-used "free" blocks, making those other less-used blocks the new free ones. It should be doing that as much as possible, at every chance, constantly moving data from blocks that have more usage into ones that have less. And it can allocate those blocks anywhere on the drive to any LBA address, so eventually they could get spread around physically.

When you first partition the drive, or after it's been erased and had time for garbage collection and all that, the mapping between LBA and physical addresses will be pretty close to even, like a mechanical drive. LBA 1 to 200 will also be the first 200 physical addresses in the SSD, etc. As it gets used and the logical/physical mappings are changed for wear leveling and TRIM/garbage collection and reducing write amplification, LBA 100 might end up being physical block 23543, which was originally part of partition 2 but is now in partition 1, logically. That would probably affect the results of any "sectioning" of pSLC cache that I found, causing the performance characteristics to change even further over time and to vary wildly from user to user, becoming something that can't be reliably measured and applied to everyone. (Maybe backing up and wiping a drive then restoring is something that ought to be done once in a while, to replace the re-installing of Windows every 6 months that we used to do.)