Question: Samsung MZ-VL81T00 M.2 SSD write speed?

Which Zenbook specifically? Not all laptops use x4 connections, although Asus probably does. That looks like a PCIe 3.0 read speed, but OEM models often don't deliver the performance you'd expect from the interface. Is this a brand-new drive? I assume not, since it appears to be nearly full.

Nearly all SSDs will be much slower at writes when they are near full. This is because they use features like pseudo-SLC cache to "hide" the slow native write speeds of TLC or QLC. Being an OEM drive, it may not have the fastest TLC flash to start with. It looks like you may have partitioned a 1TB drive into smaller volumes, and if so, that could be limiting the cache. I did some testing recently that seems to show that on some drives, partitioning the storage space can also effectively partition the available pSLC cache. So even if the drive ought to have, say, 100GB of cache, you may have limited the C drive to 25GB when writing to that particular partition, since it's 1/4 of the drive. (I can't find any details about this model's actual cache structure, but 100 to 200GB would be normal for this capacity in the retail versions.)

Samsung in particular often uses rather small pSLC cache sizes, which get even smaller as the drive gets closer to full. With the TLC used in this drive, the pSLC size is no more than 1/3 of the free space (meaning you may only have 10GB available to the C drive right now), but it may be even less with Samsung. In addition, if you have been performing a lot of writes, especially benchmarking, the cache may not have had time to recover enough to use even that small amount, so you end up at native speeds or worse as the drive tries to recover the cache at the same time that data is being written. Some drives take a long time to recover the cache (by writing data to native TLC in the background).
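
To put rough numbers on that, here's a quick sketch of the math (the 1/3-of-free-space ceiling and the vendor cap are the rules of thumb described above, not published specs for this drive):

Code:
# Rough upper bound on the dynamic pSLC cache, assuming it is limited
# to about 1/3 of the free TLC space and possibly capped lower by the
# vendor. Illustrative numbers only, not from a datasheet.

def pslc_cache_estimate_gb(free_gb, vendor_cap_gb=None):
    est = free_gb / 3.0                  # 3 TLC bits fold into 1 SLC bit
    if vendor_cap_gb is not None:
        est = min(est, vendor_cap_gb)    # some vendors cap the cache size
    return est

print(pslc_cache_estimate_gb(30))        # ~10GB when only ~30GB is free
print(pslc_cache_estimate_gb(600, 200))  # capped at 200GB even with room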

Other functions that require free space include TRIM/garbage collection. If those haven't had a chance to run, and you don't have enough free space that has already been erased by those tasks, the drive ends up having to spend additional time erasing blocks before writing new data to them, which would definitely result in benchmarks looking terrible. We can't tell what the other partitions look like and how much free space actually exists through the whole drive.

The only way to test the real max speed is to have enough free space on the drive (at least 20%, but more is better if you do a lot of writing), to ensure that TRIM has run, and to let the system idle long enough for garbage collection to occur. (This is called optimization in the Windows defrag dialog.) Having a single partition that covers the entire drive also ensures that the full pSLC cache size will be available, and idle time ensures that it has fully recovered.

If you did partition a 1TB drive, have you benchmarked the other partitions?
 
Thank you very much for your response.
UX3405 - 1TB PCIe® 4.0 SSD
It is not a brand-new device; I have had it since late July 2024.
C: (5% free), D: (9% free), B: (only 100GB, 1% free). I have not tested D, but I guess it wouldn't be good enough either, since it doesn't have 20% free.

Are you referring to the DRAM cache? If so, how do we know it has one? The newer drive I bought (990 Evo Plus) does not have one but gives better results.

"...ensuring that TRIM has run and the system has been idle long enough for garbage collection to have occurred" - how can you check that?

Do I cut my performance if I partition my newer NVMe drive? As of now, I have allocated 300GB for Win11 (C:) and 3423GB for D:.
https://ibb.co/N6hw5gSB

Once again, thanks a lot!
 
You definitely don't have enough free space, and that's why it's slow. There isn't enough space for the drive to use any of it as pSLC cache, or to properly manage the flash cells for wear-leveling and TRIM.

If you open the Windows "defrag" dialog it will show whether the drives have been recently optimized. With so little available space, you might even want to set it to run every day on all the drive letters. SSDs have to "erase" a cell before new data can be written, and TRIM signals to the drive to erase cells that are currently unused but recently had data in them. If TRIM doesn't run, then when you try to write data, the SSD has to take time to erase a cell before it can write the new data. If you have plenty of free space, you're less likely to run out of ready cells before the next time TRIM runs. If you don't have much space, then TRIM doesn't get a chance to run before you run out of ready cells and the drive takes 2 to 3 times as long to write every block of data. Doing it every day can mitigate this somewhat, but you would definitely be noticing poor write performance. (Whether you can see that during normal usage depends on how you use the system. Most people do much more reading than writing, and Windows caches so much stuff you don't realize that the drive is taking a long time.)
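
If you'd rather check and trigger this from a script than from the dialog, here's a minimal sketch. It only wraps the built-in fsutil and defrag tools, needs an elevated prompt, and the drive letter is just an example:

Code:
# Check whether TRIM is enabled, then manually retrim a volume.
# Wraps the standard Windows tools; run from an administrator prompt.
import subprocess

# "DisableDeleteNotify = 0" in the output means TRIM is enabled
subprocess.run(["fsutil", "behavior", "query", "DisableDeleteNotify"], check=True)

# Send TRIM for all the free space on C: (what "Optimize" does for an SSD)
subprocess.run(["defrag", "C:", "/L"], check=True)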

The DRAM cache is a separate thing. That can help with some functions, but most users don't NEED it unless they're trying to get every bit of performance they can and are willing to pay extra for it. The pSLC cache is when the SSD treats a chunk of TLC or QLC as if it was SLC (one bit of data per cell instead of 3 or 4). The controller can write data MUCH faster to SLC than it can to TLC/QLC, which is how drives can advertise that they have 10,000MBps speeds. In reality, that only applies when they're writing to the SLC cache. When that runs out, they have to write at the native speed of the TLC/QLC which could be 10 times slower or worse.

But if you use a block as SLC, you can only write 1 bit instead of 3 or 4 bits per cell, so the storage capacity is reduced. When it has a chance, the drive moves the data in the SLC section to the native TLC/QLC section in order to free up space to be used as SLC. When you have only 30GB free, though, the SLC runs out after writing 10GB of data, and the drive has to start converting those blocks to TLC while you're still trying to write new data. That means the new data has to sit there waiting while the drive makes space available, and then the drive can only write it at native TLC speed. So the total write time is the combination of the time spent waiting and the time it takes to write at TLC speed, and your benchmark shows it. With really good TLC the native speed can be 1GBps or more, but if the drive is "folding" from pSLC to TLC at the same time you're writing new data, the effective speed can be 300MBps or even less, as you are seeing.
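
Just to illustrate why the benchmark number ends up so low once the cache is gone, here's the arithmetic with made-up but plausible speeds (none of these are measured figures for this drive):

Code:
# Illustration only: effective speed once the pSLC cache is exhausted.
# All speeds are assumed round numbers, not specs for this drive.

data_gb       = 30    # total amount the benchmark writes
cache_gb      = 10    # pSLC available before it runs out
cache_speed   = 5.0   # GB/s while writing into pSLC
folding_speed = 0.3   # GB/s while folding pSLC->TLC and taking new data

time_s = cache_gb / cache_speed + (data_gb - cache_gb) / folding_speed
print(f"effective write speed ~ {data_gb / time_s:.2f} GB/s")  # ~0.44 GB/s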

Some drives use the entire capacity as pSLC, so a 1TB TLC drive would have 333GB or so of SLC storage that could be written at maximum speed continuously. Some brands only use part of the capacity for pSLC, so it runs out faster. Samsung and some others may only give you 200GB on a 1TB drive, for example. (They also may include a small chunk, like 6GB, that is always SLC and never converts to TLC, to ensure that there will always be a chunk that can be written at full speed.) There are tradeoffs, pros and cons, to doing it both ways.

Unfortunately without testing every model, there's no way to know which ones might segment the pSLC cache in the way that I saw on some drives. I only tested a handful. What happened there is that if I partitioned a drive into two volumes, each one might only have access to half of the cache even if there was plenty of empty space overall. Or the first partition might have access to all the cache, but the second partition might only get half. In your case, your 300GB C drive which is about 8% of the total space might only get 8% of the cache. That drive has 432GB of dynamic cache (plus 10GB static) so your C drive might only get 35GB. Or it might have access to all of it. And the D drive might get all of it or only 92%. Without testing that drive (which could take several hours due to having to give it idle time between tests) I don't know. I tested two different Samsung drives and they behaved differently from one another. You probably don't have a lot of data filling that 990 Evo Plus yet, so there would be little to cause you to run out of pSLC during benchmarks unless you actively changed the settings to result in massive amounts of data being written. Just using a single partition is the only way to be sure you're getting the full performance as intended and marketed by the manufacturer.

What is your reason for partitioning the drive? For the great majority of users there's really no reason to do it these days. The old reasons like being able to quickly reinstall the OS without touching data were only somewhat valid in the first place and are even less so now. (Windows 10/11 doesn't even require you to "reinstall" since it can reset itself and leaves all your data in place.) I partition mine solely because I'm used to doing it that way after 30 years of setting up PCs and find it easier to locate data without drilling down through folders. But if the drive dies, that data is gone along with the OS and that's why I run backups. But I'm considering not even bothering with partitions when I'm finally forced to move to Windows 11.
 

Thank you for the detailed response — I really appreciate it!

I'm personally used to separating the OS from personal data, and I find that having a single partition can make backups more difficult to manage. For example, if I'm using AOMEI Backupper for daily system backups, even if I keep all my personal files in C:\Personal, I can't exclude that folder from the backup. That makes the backup unnecessarily large and less efficient. What do you think?

Also, you mentioned that the test "could take hours" — could you elaborate on how I should perform that?

Regarding Samsung: do they officially recommend using a single partition? I couldn’t find any reference for that in their documentation.

And lastly, just to clarify — over-provisioning in Samsung Magician is essentially the same as manually leaving free space on the drive, right?
 
If you aren't backing up your personal data files, what's the point of doing backups? Unless you mean stuff you could easily replace like downloaded or ripped movies. A proper backup would include everything on the system and be run using incrementals or differentials, so only changed data gets backed up. The initial backup is large but later backups are relatively small.

I don't have a huge amount of data that needs to be backed up, so a 1TB backup drive works for me. I do a full backup every week, then incrementals daily. It uses more space than longer intervals but I prefer the idea of only a one-week chain of incrementals (which are all needed for a restore) and then having a full backup again if I go further back so all those incrementals don't have to load.

I will say that yes, partitioning does make sense if you have a lot of data that doesn't need to be backed up. I've always tended to just use smaller drives for my primary data and then a completely different, larger and slower drive for that less important data. I only went with a 1TB primary drive because that was the smallest drive in the model I wanted for the performance, and in a lot of models going down to a 500GB drive meant a significant performance drop. (Smaller drives in an SSD model family are always slower.) Prior to that my C and D drives were separate drives, and now they're both on the same SSD. I use a 2TB, slower and cheaper drive for stuff I don't care about losing, and don't back that up. But I never had a chance to test out the performance implications of partitioning as related to SLC cache since it hadn't occurred to me then. I did all the testing on random drives I had laying around, plus a couple that I bought mostly for the testing but also had something I could use them for afterward.

Drive makers don't provide information about such low-level details of the way the cache works exactly, and they don't make recommendations about partitioning because they'd have to explain why, and go into every possible combination people might want to use, and provide support for people who have trouble and think it might be the partitioning.

Over-provisioning does provide some guaranteed pSLC cache space, but that's not its primary use, and in any case if you don't know how the drive allocates the pSLC cache amongst partitions, you can't know how it would use that empty space for it, either. Would it use the space as cache for the first partition only, or make it available to all partitions, or split it evenly? The primary use of OP space is flash management, where the controller is guaranteed to have unused blocks that can be used for wear-leveling and other functions designed to make the drive last as long as possible without performance loss. And it does serve somewhat the same function as just not filling up a partition, because it helps to ensure blocks are free for writing without having to erase them first (assuming background garbage collection has run, as TRIM in the OS can't run on unallocated space). It's largely a way of guaranteeing that the user doesn't accidentally fill up the drive, because that space isn't allocated to a partition. If you can't see that you have 50GB of free space, you can't ignore the implications and fill it up. All SSDs have a small amount of unallocated (overprovisioned) space that is completely inaccessible to the OS, but it's not enough to ensure very long-term performance and longevity. Some models that are designed for write-intensive use have extra space overprovisioned and unavailable to the OS, because the heavy write load will wear out the flash faster.

If you are able to control your usage and just always make sure you never fill up the drive past 90%, you don't absolutely NEED to overprovision that space and leave it unallocated. But overprovisioning means you just don't even have to think about whether you're getting too close to full; you can completely fill up the partitions because you've locked out that additional space. (Although filling them up would affect the pSLC cache availability, regardless of how it works on a particular drive.) And yes, the manufacturer tools like Magician are just resizing the last partition to leave the OP space unallocated. Since Windows now makes the Windows Recovery Environment the last partition on the drive by default, Magician might not be able to configure OP because there's nowhere to take space from. Or if you turned off OP in Magician, it might add all that space to the WinRE partition where it would never get used.

As far as testing the pSLC cache, it's not something you can do on a drive that is already in use. It involves using IOMeter to run tests on the drive, filling it up with test data at maximum speed. When the speed drops, you add up how much data was written at full speed and that's the pSLC cache capacity. You can also let it run for a longer time and see if the speed drops further or jumps back up to a middle-ground area. The lower speed is when it's "folding" from pSLC to TLC at the same time as new data is coming in, and the middle speed is the native flash speed. I was running each test for 30 minutes on a decent 1TB drive, which ensured I could see the behavior patterns, and it took 4 or 5 test runs using different combinations of partitions and having them either empty or full, and after each test run I forced TRIM to be applied and then let the drive idle for another 20 to 30 minutes before doing another test. Plus I ran additional tests without idle time or without running TRIM, to see how quickly the pSLC cache could recover and what effect there was from not having enough erased blocks. So that one drive probably took a total of 6 to 8 hours (and I wouldn't use the machine while a test was running and sometimes while idling; I went to do something else, or took a nap). Your drive is faster, but 4 times the size. However the pSLC cache isn't that much larger since Samsung doesn't use the whole capacity. But you'd have to run just as many tests to see what effect partitioning has on the way the cache is allocated, if any, so you'd be looking at about the same amount of time probably.

There are other tools that can test the pSLC I think, but I don't know how they're used and they presumably would also need to be run on a blank drive. It's possible that you could modify the IOMeter test to run it even with the existing data, but I don't think it would give valid results with the partition sizes you've created, since your C drive is smaller than the size of the cache for that drive.

As for contacting Samsung, HAHAHAA. You think a modern company wants to hear from a user? Or answer obscure technical questions just because you're interested? Yes there are ways to contact them, but good luck getting to the one that is actually who you need. (The Service Center number is easy to get, but that's for tech support for problems, not providing tech specs.)
 
My personal information is backed up to BackBlaze (they do not back up your OS unless it's an image file).
Regarding IOMeter: would it be OK if I delete D (I will clone it first) and then extend C to use all the space? C's used space will be 162GB; the rest will be free space.

How can I solve this firmware issue? It does not let me upgrade, which is extremely annoying.
 
Oh, I didn't even realize there was a firmware issue. You didn't mention it. Is there an error when you try? I had an issue with a WD drive where their Dashboard would fail trying to update an SN580 (no error message, it just wouldn't update), but then it worked when I tried again a week later. It could be something simple like needing a restart. (SSDs can be "locked" while in use, which prevents certain functions like changing from 512e sectors to 4Kn, or performing a Secure Erase. Restarting can unlock them.)

Yes, if you just change the partitioning, that will be the same as having made one partition in the first place. Just know that if you were to use IOMeter, the way it tests is to fill ALL free space on the partition with a test file, which takes a really long time in and of itself because it actually writes every bit. You can instead use another tool to quickly create a test file of an arbitrary size that is big enough to use up the cache. But if you're willing to just use a single partition anyway, just do it and be done, no testing.
 
Not an issue as such; the update just does not work (for the newer 4TB drive).
Many people have complained about it online. I already tried a reboot... Looks like I must get a disk-on-key (USB stick) and do it offline.

I prefer to test it... How do you suggest doing it?
C (the OS) consumes 158GB. Should I delete all partitions and boot from a disk-on-key?
Or can I test with the OS installed? I guess I need to test with two partitions and then with one?
 
IOMeter runs under Windows. It has a GUI, but to test this you need to use the command line to execute it so that it generates a full results file. (I got the correct stuff from another post after asking one of the Tom's writers, since their articles mention running the tests but they didn't make a "how we test" article about it.)

Download and extract IOMeter to a folder, and use that for holding everything else and for running the commands. You can also download testfilecreator to generate the test files instead of letting iometer do it. https://github.com/oberstet/scratchbox/blob/master/docs/iometer/Measuring IOPS using IOMeter.md

Create a text file containing all of this, and name the file WS-1hr.icf.

Code:
Version 1.1.0
'TEST SETUP ====================================================================
'Test Description
    Write Saturation
'Run Time
'    hours      minutes    seconds
    1          0         30
'Ramp Up Time (s)
    0
'Default Disk Workers to Spawn
    NUMBER_OF_CPUS
'Default Network Workers to Spawn
    0
'Record Results
    ALL
'Worker Cycling
'    start      step       step type
    1          1          LINEAR
'Disk Cycling
'    start      step       step type
    1          1          LINEAR
'Queue Depth Cycling
'    start      end        step       step type
    1          32         2          EXPONENTIAL
'Test Type
    NORMAL
'END test setup
'RESULTS DISPLAY ===============================================================
'Record Last Update Results,Update Frequency,Update Type
    ENABLED,1,LAST_UPDATE
'Bar chart 1 statistic
    Total I/Os per Second
'Bar chart 2 statistic
    Total MBs per Second (Decimal)
'Bar chart 3 statistic
    Average I/O Response Time (ms)
'Bar chart 4 statistic
    Maximum I/O Response Time (ms)
'Bar chart 5 statistic
    % CPU Utilization (total)
'Bar chart 6 statistic
    Total Error Count
'END results display
'ACCESS SPECIFICATIONS =========================================================
'Access specification name,default assignment
    Default,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    2048,100,67,100,0,1,2048,0
'Access specification name,default assignment
    512 B; 100% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    512,100,100,0,0,1,0,0
'Access specification name,default assignment
    512 B; 75% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    512,100,75,0,0,1,0,0
'Access specification name,default assignment
    512 B; 50% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    512,100,50,0,0,1,0,0
'Access specification name,default assignment
    512 B; 25% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    512,100,25,0,0,1,0,0
'Access specification name,default assignment
    512 B; 0% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    512,100,0,0,0,1,0,0
'Access specification name,default assignment
    4 KiB; 100% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,100,0,0,1,0,0
'Access specification name,default assignment
    4 KiB; 75% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,75,0,0,1,0,0
'Access specification name,default assignment
    4 KiB; 50% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,50,0,0,1,0,0
'Access specification name,default assignment
    4 KiB; 25% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,25,0,0,1,0,0
'Access specification name,default assignment
    4 KiB; 0% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,0,0,0,1,0,0
'Access specification name,default assignment
    4 KiB aligned; 100% Read; 100% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,100,100,0,1,4096,0
'Access specification name,default assignment
    4 KiB aligned; 50% Read; 100% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,50,100,0,1,4096,0
'Access specification name,default assignment
    4 KiB aligned; 0% Read; 100% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,0,100,0,1,4096,0
'Access specification name,default assignment
    16 KiB; 100% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    16384,100,100,0,0,1,0,0
'Access specification name,default assignment
    16 KiB; 75% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    16384,100,75,0,0,1,0,0
'Access specification name,default assignment
    16 KiB; 50% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    16384,100,50,0,0,1,0,0
'Access specification name,default assignment
    16 KiB; 25% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    16384,100,25,0,0,1,0,0
'Access specification name,default assignment
    16 KiB; 0% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    16384,100,0,0,0,1,0,0
'Access specification name,default assignment
    32 KiB; 100% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    32768,100,100,0,0,1,0,0
'Access specification name,default assignment
    32 KiB; 75% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    32768,100,75,0,0,1,0,0
'Access specification name,default assignment
    32 KiB; 50% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    32768,100,50,0,0,1,0,0
'Access specification name,default assignment
    32 KiB; 25% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    32768,100,25,0,0,1,0,0
'Access specification name,default assignment
    32 KiB; 0% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    32768,100,0,0,0,1,0,0
'Access specification name,default assignment
    64 KiB; 100% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    65536,100,100,0,0,1,0,0
'Access specification name,default assignment
    64 KiB; 50% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    65536,100,50,0,0,1,0,0
'Access specification name,default assignment
    64 KiB; 0% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    65536,100,0,0,0,1,0,0
'Access specification name,default assignment
    256 KiB; 100% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    262144,100,100,0,0,1,0,0
'Access specification name,default assignment
    256 KiB; 50% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    262144,100,50,0,0,1,0,0
'Access specification name,default assignment
    256 KiB; 0% Read; 0% random,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    262144,100,0,0,0,1,0,0
'Access specification name,default assignment
    All in one,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    512,4,100,0,0,1,0,0
    512,4,75,0,0,1,0,0
    512,4,50,0,0,1,0,0
    512,4,25,0,0,1,0,0
    512,4,0,0,0,1,0,0
    4096,4,100,0,0,1,0,0
    4096,4,75,0,0,1,0,0
    4096,4,50,0,0,1,0,0
    4096,4,25,0,0,1,0,0
    4096,4,0,0,0,1,0,0
    4096,4,100,100,0,1,4096,0
    4096,4,50,100,0,1,4096,0
    4096,4,0,100,0,1,4096,0
    16384,3,100,0,0,1,0,0
    16384,3,75,0,0,1,0,0
    16384,3,50,0,0,1,0,0
    16384,3,25,0,0,1,0,0
    16384,3,0,0,0,1,0,0
    32768,3,100,0,0,1,0,0
    32768,3,75,0,0,1,0,0
    32768,3,50,0,0,1,0,0
    32768,3,25,0,0,1,0,0
    32768,3,0,0,0,1,0,0
    65536,3,100,0,0,1,0,0
    65536,3,50,0,0,1,0,0
    65536,3,0,0,0,1,0,0
    262144,3,100,0,0,1,0,0
    262144,3,50,0,0,1,0,0
    262144,3,0,0,0,1,0,0
'Access specification name,default assignment
    Fill,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,0,30,0,1,4096,0
'Access specification name,default assignment
    8020 fill,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    4096,100,0,20,0,1,4096,0
'Access specification name,default assignment
    Write Saturation,NONE
'size,% of size,% reads,% random,delay,burst,align,reply
    1048576,100,0,0,0,1,1048576,0
'END access specifications
'MANAGER LIST ==================================================================
'Manager ID, manager name
    1,X570
'Manager network address
  
'Worker
    Worker 1
'Worker type
    DISK
'Default target settings for worker
'Number of outstanding IOs,test connection rate,transactions per connection,use fixed seed,fixed seed value
    32,DISABLED,1,DISABLED,0
'Disk maximum size,starting sector,Data pattern
    0,0,2
'End default target settings for worker
'Assigned access specs
    Write Saturation
'End assigned access specs
'Target assignments
'Target
    0: ""
'Target type
    DISK
'End target
'End target assignments
'End worker
'End manager
'END manager list
Version 1.1.0

Since you're testing a partition rather than a blank drive, you need to edit the "Target" entry near the end of the file (the 0: "" line) so that it contains the drive letter you want to test. Don't remove any of the quotes. (If you test a blank drive, you have to run the GUI to see what drive number IOMeter sees.)

That file sets it to run a 1 hour test (the Run Time parameter), which will probably be plenty to see the pSLC cache behavior and size as well as to be able to tell how it behaves after the cache runs out. Even a 30 minute test would probably be okay, since we already know what size the cache ought to be and I know from my tests that it's enough time to exhaust the cache. If it was a 4TB TLC drive that might use the full capacity for cache, then even 1 hour might not be enough to exhaust the cache depending on the speed of the drive. (You can calculate an estimate by dividing the potential cache size by the known max write speed of the drive, then pad that a bit.)
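
For that estimate, the division is trivial but worth writing down; the write speed here is an assumption, and the cache figure is the one discussed above:

Code:
# Minimum run time needed before the pSLC cache knee can show up.
# The run should be much longer than this so you can also watch the
# native-TLC and folding behavior afterward.

cache_gb       = 442    # ~432GB dynamic + 10GB static, as discussed above
max_write_gbps = 6.0    # assumed full-speed sequential write into pSLC

print(f"cache knee after roughly {cache_gb / max_write_gbps / 60:.1f} minutes")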

On the command line, switch to the folder where you've got the iometer and testfilecreator executables. Generate a test file of sufficient size. For this drive, 500GB ought to be enough, so
Code:
TestFileCreator.exe 500G iobw.tst
would do it. You may need to specify the full path for the file. If you don't create this file and run the test, IOMeter will generate a test file itself that will be the full size of all the free space on the partition, which means it will take whatever amount of time is required to write nearly 4TB of data (18 minutes if it was at 4GBps) before the test even starts. If you are testing a single partition the full size of the drive, you'd use a much larger test file, but that tool still only takes a few seconds to create it.

You would of course need to make sure the partition is large enough for that file. You could run it on the current 300GB C partition, and just create a test file that fits within the free space. 100G is often recommended as a good size, even on a large drive. If it is able to use the full pSLC cache despite being a smaller partition, then the results should show that the speed continues to be at maximum even after it has tested that file, as it just continues to rewrite the file until the timer runs out. I chose to use maximum file sizes and fill up the partitions to be certain the tests were valid, and since they were blank drives anyway.

The command to run the test is then:
Code:
Iometer.exe /c WS-1hr.icf /r WS-1hr.csv
Because I was testing multiple drives and wanted to be able to refer back to them, I changed the CSV output file name each time I ran it, including things like the drive model and which partition was tested. (You also get a second output file with a similar name that just contains the parameters used for the test. You can delete that anytime unless you need the information.)

After that, importantly, use the Defrag dialog to "optimize" all the partitions, and then let the system idle for a while, or at least have minimal use. (A good reason for running the tests on a secondary, unused drive rather than your OS drive. You could potentially do the testing from a WinPE bootable USB so that you're not testing the OS drive.) I'd give it 30 minutes to be sure. You are trying to make sure that the drive has actually performed the TRIM operations (there is no way to find out if it has finished; all you can know is that Windows has sent the command, after which the drive chooses WHEN to do it) and that the pSLC cache has fully recovered. Keep in mind that TRIM commands only run on allocated space, so don't just leave 3TB of empty space with no drive letter.

If you re-run the test as quickly as possible after it finishes, you'd see the speeds drop almost right away, because there wouldn't be many erased blocks available and everything would be running at less than native speed. There would probably be some pSLC cache available, but not necessarily the full amount, so it would soon start to run at folding speed, even less than native speed.

After the wait time, you can change the partitions to whatever test configuration you want to try next. Edit the test config file to point to the correct drive letter. Run the test command with a different output file name; if you don't change it, the results will be appended to the previous CSV file. I wanted different files for each drive and each test run so I could easily compare them.
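
If you end up doing several of these cycles, you could script the run / optimize / idle sequence so you don't have to babysit it. A sketch, wrapping the same commands as above (the per-run config copies, the drive letter, and the idle time are placeholders, not anything official):

Code:
# One test cycle: run IOMeter, then retrim/optimize, then leave the
# drive idle to recover. Run elevated, from the folder with Iometer.exe.
import subprocess, time

def run_cycle(tag, drive="D:", idle_minutes=30):
    # each run uses its own config copy (with the right Target drive
    # letter inside) and its own results file name
    subprocess.run(["Iometer.exe", "/c", f"WS-1hr-{tag}.icf",
                    "/r", f"WS-1hr-{tag}.csv"], check=True)
    # send TRIM for the freed space, then give the cache time to recover
    subprocess.run(["defrag", drive, "/O"], check=True)
    time.sleep(idle_minutes * 60)

run_cycle("single-partition")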

You can watch the results tab in the GUI as the test runs, and you'll see when it goes from something like 6GBps to 1GBps and maybe even 100MBps when folding is happening. Cheap drives may go down to 50MBps. I also do a quick check to make sure that the Update Frequency dropdown says 1s, as sometimes it is set to infinity, which means there are no recorded results.

Open the CSV results file in a spreadsheet app; there are a lot of columns you won't care about. The ones relevant to this test are M (Write MBps (Binary)) and AG (Bytes Written). I just delete everything in between so they're beside each other, but I don't save the changes in case I decide to look at something else; I guess you could just hide the columns and save it. You'll need to expand the columns so that AG shows full numbers instead of scientific notation. Now look at column M, which should show many lines close to the maximum speed of the drive. Scroll down until that number suddenly drops dramatically, maybe to 25% or less, and stays there. That's the point where the pSLC cache ran out, and the drive had to either start writing to native TLC or start folding. Since it's a Samsung drive, it doesn't have to start folding immediately because there is still empty native TLC. If the drive used the full capacity as cache, it would have to begin folding at the same time that new data was coming in, causing an even greater slowdown.

Add up the numbers in column AG beside each of the maximum speeds in column M (copy all those cells to a blank area and use AutoSum, obviously). That's the amount of data in bytes that was written during the full period when the drive was running at maximum speed, and it is the size of the pSLC cache. For your SSD, when testing the entire drive it should come out somewhere close to 442GB (432GB dynamic cache plus 10GB static cache). If it were a drive that used the entire capacity, the total would be reduced significantly by the existing data on the drive (that space can't be used as cache), but since it's a new and mostly unused Samsung, that should not happen. The available pSLC cache should only shrink once the drive has less than about 1.3 to 1.5TB of free space, since the dynamic cache needs roughly three times its own size in free TLC.
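
If you'd rather skip the spreadsheet step, a few lines of Python can pull the same total out of the results file. It assumes the columns land where described above (M, the 13th column, is Write MBps; AG, the 33rd, is Bytes Written, added up row by row just like the AutoSum step) and treats anything within 80% of the peak speed as still being in cache; adjust the threshold if the plateau is noisier:

Code:
# Sum the bytes written while the drive was still at full (pSLC) speed.
# Column positions and the 80% threshold are assumptions; the CSV name
# matches the /r argument used above.
import csv

rows = []
with open("WS-1hr.csv", newline="") as f:
    for row in csv.reader(f):
        try:
            rows.append((float(row[12]), float(row[32])))  # (MBps, bytes)
        except (IndexError, ValueError):
            continue   # skip header/summary lines that aren't numeric

peak = max(mbps for mbps, _ in rows)
cache_bytes = 0
seen_plateau = False
for mbps, written in rows:
    if mbps >= 0.8 * peak:
        seen_plateau = True
        cache_bytes += written
    elif seen_plateau:
        break          # first sustained drop after the full-speed plateau
print(f"pSLC cache ~ {cache_bytes / 1e9:.0f} GB")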

When you run the test on a smaller partition (first, second, third, whatever), the total should ideally still be about 442GB as long as there is enough free space across the whole SSD, even if the partition itself isn't that large. That would mean every partition can access the full pSLC cache and there is no segmentation. But if you run it on the C drive and the total cache is much less, that could indicate that the drive segments the available cache in some way. For a drive as large as yours, you might want to make something like 500GB partitions and see whether each of them can access the full cache. On one drive, it seemed like the first partition could use all the cache, but a second partition could not.

My theory has been that blocks used for cache on some drives are segmented into "local" availability (not based on partitions as the drive doesn't care about those, but based on the physical blocks of the drive). For example if it uses 500GB segments, then a 500GB partition might only be able to use the "local" 500GB of space for cache, and can't reach out and use the free space that is in "non-local" physical areas of the drive. If the partition size were to cross boundaries of those segments, it would make it even more complicated to figure out how much cache space it ought to have (two partitions that share some cache space but not the rest?), and there obviously would be no documentation of the boundaries.

Samsung of course wants their drives to be known for having top-tier speeds, so hopefully they wouldn't make such a thing happen in most of their drives, but because they make a lot of OEM drives and have had many controllers, it's possible that some of them do this segmentation. The only Samsung drives I had to test were a really crummy OEM drive and a very small but decent one, which is hard to test due to the tiny cache size (the cache can be filled in 2 seconds).

Segmentation of cache I think would have a worse effect on a drive like a Samsung that limits the cache size than on one that uses the entire capacity. If you made a 2TB partition but it only got 217GB of cache, that would mean writes to that partition would slow down even faster than expected. On a different high-speed 4TB TLC drive, it at least would still have 1.3TB of cache available. (But, the Samsung drive would only slow down to native speed after that, say 1.6GBps, for a long time instead of possibly dropping to folding speed and only doing like 600MBps or less as another drive would when it ran out of cache.)

If you've done nothing on the current Windows install yet, then sure, wipe it and test with a blank drive, using WinPE as I said or having it installed as a secondary drive with the OS on another one (ideally with the new drive on the fastest M.2 slot), but I don't think it's really necessary as you already know what the pSLC cache size is. I don't think testing with the existing partitions and OS will have a huge effect as long as you've got very little else happening on the drive at the same time, since other activity would be writing and using some of the cache at the same time. You're not trying to get exact numbers, just ballpark figures to determine whether there's a huge variation. I don't think you want to dedicate the time it would take to testing with all the various possible partition sizes to try to narrow down the segmentation size, if there is one, just to determine the ideal partition sizes. Even if you only had 100GB of cache on your C drive, how often do you write a continuous stream of 100GB of data at full speed and then continue writing more so that you'd see a performance drop?

Unless there's a significant known issue with the current firmware on the drive that is fixed by the new one, I wouldn't put too much effort into flashing it. Just wait a while and see if they update Magician or release a different firmware; sometimes it's just a bug in the flash code or app. If they haven't fixed it after a couple of months, then go ahead and make the effort to flash from a bootable USB.