Help with DAS/NAS DIY

BLACKBERREST3

Prominent
May 23, 2017
Hello, I am looking to build a DAS/NAS. I have been researching the best solutions for performance and it has left me with many questions. I would like to use existing hardware if possible, which consists of: an i7-6700K, 64 GB of non-ECC Corsair DDR4, and a Z170 Deluxe (20 lanes altogether). I will use SyncBackPro for scheduled backups. I want insane read/write speeds on the main work portion (around 8 terabytes) and 2 high-speed redundancies (I’ll add more as I need it). I am looking towards software RAID and either FreeNAS or Windows Server 2016. I also need my data to be byte-perfect with no degradation over time to preserve precious data. I plan to use this as a personal DAS/NAS, not something that needs to run all the time.

My questions are:
1. ZFS or ReFS, suggestions?
2. Can I use RAM as a non-volatile super cache or lazy read-writer if it is always powered and I keep redundancies to prevent data loss?
3. What is the best setup for performance that also lets me add more storage easily if I need it; RAID 0 SSDs + RAID 10 HDDs, tiering, or something else?
4. What SSD/HDD combos do you recommend? I am leaning towards Seagate for HDDs.
5. If a RAID array fails, does that mean I must replace only the drive that failed, or all drives because the failure somehow damaged them (talking about hardware only, not data)?
6. What is the best way to connect to this DAS/NAS; direct PCIe PC-to-PC, 40/100GbE, or something else?
7. How would I set up a 40/100GbE connection and what would I need?
8. Is there anything else I may need to know or want relating to this?
 
Solution
PCIe Gen 4 is due next year. Gen 5 is due a couple of years later. 5 years from now, you won't be looking at a build like this at all. Threadripper is due in weeks. I wouldn't go for a build like this even then.

Regarding the workstation + storage server build, you don't need a crazy network. Just allocate half the drives to the storage server for backup purposes. Unless you need twice-daily backups, a simple gigabit network would suffice. All you'd need is a half-decent switch that won't destroy the rest of the network while the backup is running, and a second port for RDA.
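To put rough numbers on that, here's a back-of-envelope sketch (assuming ~118 MB/s of usable gigabit throughput and that only changed data gets copied each run; both assumptions are mine, not part of the advice above):

```python
# Back-of-envelope backup window over gigabit Ethernet.
# Assumes ~118 MB/s usable on a 1 Gb/s link and incremental copies
# (i.e. only changed data is transferred on a scheduled run).
GIGABIT_USABLE_MBPS = 118

def backup_window_hours(changed_gb: float) -> float:
    """Hours needed to push `changed_gb` gigabytes over gigabit Ethernet."""
    seconds = changed_gb * 1000 / GIGABIT_USABLE_MBPS
    return seconds / 3600

print(f"{backup_window_hours(500):.1f} h for a 500 GB delta")    # ~1.2 h
print(f"{backup_window_hours(8000):.1f} h for a full 8 TB copy")  # ~18.8 h
```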

Regarding "no bottleneck vs depends on networking", that's fairly naive. There's always a bottleneck. It's almost always a function of the workload, and usually best...
"Best solutions" is implicitly understood. "Insane speeds" not so much.

You need to quantify your requirements to guide the design and subsequently select hardware and software.

One of your requirements is to use existing hardware. That is, to a large degree, a firm requirement. You just need to document and verify that all of the NAS components (hardware, software, connections, etc.) will work together, with no one part bottlenecking the overall design and performance.

Use the following approach: provide answers to your own questions as best you can, along with the basis for each answer per your research.

Your "answers" may be right, wrong, or somewhere in between. And wrong is not meant in a negative sense. There may simply be some other factor that should be considered.

For example - Question 5: There are different types of RAID configurations. All have pros and cons. There may be trade-offs between any options that are truly available. Your hardware may limit those options.

Anyway, by providing answers to your own questions you will learn more in the process and be more likely to receive further comments and suggestions based on what you determine is "best" for your environment and plans.
 


The whole point of me going to a forum is to get a number of people to give me ideas or something I can work with. I have already done research for this and I am hoping I can find more information here. If you don't know any of the answers or don't have any ideas, then don't respond telling me to answer them myself.
 
What is your application, and what are the performance requirements of the system? Do you have specific hardware redundancy requirements?

You mention that this also needs archival capabilities. How long are we talking? 5 years? 10 years? 100 years?

The single most important factor in this type of build is what you're using it for. This will determine most design factors and performance requirements. "Insane performance" can mean very different things to different people, and different applications will radically change the hardware/software you should look at.
 
For performance requirements, I need to work with around 8 terabytes of data (renaming, zipping, unzipping, fixing, etc.) and not have it take 4 hours. That would be for the main pool of work space. For the actual storage, I would like high read/write speeds for easy access, and for the redundancy I would prefer it to last 100+ years. I know that a RAID array can provide a large boost to performance and that there are other methods to boost performance as well. I will be using this system as a DAS primarily. I am only choosing an OS designed for a server because, correct me if I am wrong, they have an improved file system for working with long file names and larger datasets, plus superior software RAID. I would like to use as many of my current PC components as possible for this build, minus the SSDs/HDDs, because I will buy those separately. The original questions still stand and it would help me if you know anything about them.
 
I'd be happy to help with this. There are some details to be aware of first, though.

The workload you're describing is rather unusual. It involves operations that tend to be limited by very different components. Each of those must be optimized to achieve "extreme performance".

The retention length you're asking for can only be achieved with a handful of options, and you won't like any of them. The two that come to mind with that level of retention are tape drives and Blu-ray discs.

Your views regarding RAID arrays are somewhat true in certain circumstances. This is not one of them at all, especially with any sort of software RAID.

Your views regarding server operating systems are somewhat true, but depending on how you want to use this system, the benefits may not overcome the penalties associated with remote storage.

How many files are we talking about, and what is your budget for new components? If I were planning this sort of system for a business, I would expect it to run $10000 or more.

Also, please describe your workflow, the software that you use, and what you're trying to accomplish in exacting detail to the best of your ability. Without details, this is not a solvable problem.

Regarding your original questions:
1) This depends greatly on what features you want to have access to. Performance is rarely involved in file system choice.
2) You're playing with fire. Considering the reliability requirements you've put forth, a RAM disk should not be on your list of considerations at all.
3) RAID can cripple this system's performance for several of the operations you've listed unless you change a number of components. SSDs do not respond well to RAID.
4) This depends on the details of the workload. Based on what you've said so far, the drives will probably not be the limiting factor.
5) The impact of a drive failing is very different depending on the RAID level (see the sketch after this list):
- RAID 0: Losing a single drive means all data is effectively lost.
- RAID 1: If you have a single drive left, you haven't lost anything.
- RAID 10: If you lose one drive, you're fine. If you lose two, it depends on which two: you either lost nothing or everything. Lose both drives of any mirror pair and everything is lost.
- RAID 5: If you lose one drive, you're fine. If you lose two, you lose everything.
- RAID 6: If you lose one drive, you're fine. If you lose two, you're fine. If you lose three, you lose everything.

6) This depends on your particular requirements and budget. I've never seen PCIe used to connect a storage server. DAS usually refers to the drives inside a computer, not a separate appliance.
7) This isn't something for the uninitiated or those without large sums of money to throw at the problem. If you do have a large sum to throw at this problem, I can think of numerous better ways to spend it. Once all of those are checked off, you may then discuss that type of networking tech.
8) Yes, lots. More than you probably want me to post in a single comment, actually. This is a complex, multi-faceted optimization problem, and the storage system is a single subsystem among many. I'd be happy to walk you through everything, but one thing at a time. This isn't nearly as cut and dried as you seem to think.
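To make the failure tolerance in answer 5 concrete, here's a small sketch (my own illustration, not something from the thread) of worst-case versus best-case survivable drive failures per level:

```python
# Worst-case / best-case number of drive failures each RAID level can
# survive without data loss. Mirrors the descriptions in answer 5 above.
def failures_tolerated(level: str, drives: int) -> tuple[int, int]:
    """Return (worst_case, best_case) survivable failures for the array."""
    if level == "RAID0":
        return (0, 0)                    # any failure kills the array
    if level == "RAID1":
        return (drives - 1, drives - 1)  # survives until one copy remains
    if level == "RAID5":
        return (1, 1)
    if level == "RAID6":
        return (2, 2)
    if level == "RAID10":
        # Worst case: the second failure hits the same mirror pair.
        # Best case: one failure per mirror pair.
        return (1, drives // 2)
    raise ValueError(f"unknown level {level!r}")

for level in ("RAID0", "RAID1", "RAID5", "RAID6", "RAID10"):
    print(level, failures_tolerated(level, 8))
```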
 
Let's say theoretically I have already built a DAS/NAS/SAN with Windows Server 2016 already installed. Let's say I am replacing all the drives in this machine with a combination of solid state and hard disk drives. Now, what configuration would give me the best read/write speeds for 8 terabytes of space while also providing me with 50 terabytes of high-speed storage plus another 50 terabytes of high-speed storage? Don't think about costs or brands of items, just tell me:
What is the fastest way to connect this to my main PC?
What type of configuration would I set up with said HDDs and SSDs to get the 8 TB + 50 TB + 50 TB speeds mentioned above?

What I don't want to hear:
"Speeds are dependent on the drive you get and no configuration can improve that." - FALSE
"What do you want to use it for?" - the description is good enough
"Raid is not a good backup solution" - It is a good backup solution because I am not running this 24/7 and I can stop operations on it to rebuild from the redundancies. Hence why raid has redundancy in the first place.

Thank you, the nerd 389, for answering a couple of my questions.
 
Without more details, this is where I'd suggest you look:

For the 50 TB blocks, consider 8x 10 TB drives in RAID 6E with hardware RAID. This is reliable. This will choke on lots of small writes. If that's not your priority, then more details are necessary. Also, these drives offer about 10 years of retention, not 100. Performance of this block is as good as you can realistically get out of a 50 TB block in a single machine.
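As a capacity check on that suggestion (assuming "RAID 6E" here means RAID 6 plus one integrated hot spare; that's my reading, not stated above):

```python
# Usable capacity of a RAID 6 array with an integrated hot spare ("6E").
# Assumption: two drives' worth of parity plus one spare are subtracted.
def raid6e_usable_tb(drives: int, drive_tb: float, spares: int = 1) -> float:
    data_drives = drives - 2 - spares   # RAID 6 burns two drives on parity
    return data_drives * drive_tb

print(raid6e_usable_tb(8, 10))   # 50.0 TB usable from 8x 10 TB drives
```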

For the 10 TB block, you would want to look at something like four x8 PCIe SSDs, such as the HGST Ultrastar SN150, if you want the best in every possible metric, assuming you had the PCIe lanes to work with, which you don't. That CPU will struggle to keep two of these running at full speed from a CPU power perspective, and simply cannot feed more than three at a time due to PCIe lane restrictions. In compression workloads, it won't even keep one fed. With those drives, you have to choose between reliability and performance. RAID will reduce performance in every variation, including RAID 0. Using these drives as a single block is non-trivial and potentially requires a custom driver. These drives offer about 5 years of retention.
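The lane restriction is easy to verify with a quick budget, assuming each of those SSDs wants a full x8 link and only the 16 CPU lanes are usable for them (my simplification):

```python
# PCIe lane budget for multiple x8 NVMe SSDs on a 16-lane consumer CPU.
CPU_LANES = 16        # Skylake i7-6700K: 16 lanes from the CPU
LANES_PER_SSD = 8     # e.g. an x8 enterprise NVMe card

for ssds in range(1, 5):
    needed = ssds * LANES_PER_SSD
    verdict = "fits" if needed <= CPU_LANES else "exceeds the CPU lanes"
    print(f"{ssds} SSD(s): {needed} lanes needed -> {verdict}")
# Two drives already consume all 16 lanes, leaving nothing for a fast NIC.
```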

For backup, you'll need either tape or Blu-ray discs to get 100 years of retention or more, period. That kind of retention requires that you do not keep the backup in the same place as the server. Ideally, they should be on separate continents.

For the interconnect, be prepared for some headaches. 40GbE is doable by your average person. 100GbE gets complicated due to cabling length requirements. If you need to run the cable for more than about 3 feet, you'll need to make your own cable and do a better job than modern cable factories, or use fiber. Unfortunately, you don't have the PCIe bandwidth to do this with the storage system connected.

If I were asked to put that system together for a business (I would have more details, to be sure), I would be looking at a cluster of dual socket Xeon E5s.
 


What infrastructure besides the Windows server do you have? SAN and NAS both assume some type of infrastructure. Your original post asked about 40Gbit and 100Gbit connectivity. 40GbE has a roughly $20K entry point and 100GbE is probably $50K (or more).
 
There isn't an infrastructure at all; I am trying to make the most of what I have more than anything. I have done a little bit more research on bandwidth and the types of bottlenecks there might be. The CPU I have now has 16 lanes available to use, which means 16 lanes × 8 GT/s × 128/130 encoding × 1 byte/8 bits ≈ 15.75 GB/s of theoretical bandwidth (see the sketch after the list below). I did not know what interface to use and I still don't. I thought that I could get away with software RAID only, but now that I have done some research, how would I connect the drives? List time:

1. How do you connect drives to a Z170 Deluxe motherboard in a way that would let them use the full x16 bandwidth, or even half if I need a network card or something? Does it involve RAID cards or some type of NVMe or SATA PCIe adapter?

2. I forgot to ask this in the other thread: say the CPU has 16 lanes and the chipset has essentially 4 lanes (DMI 3.0). Does this mean that the lanes are shared between the two, or does the chipset communicate separately, giving me 20 lanes altogether?

3. Thunderbolt 3 is also 40 Gb/s and it doesn't cost too much. How could a 40Gbit connection be any different besides flexibility?
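As a cross-check on the 15.75 GB/s figure above (and the 3.94 GB/s DMI number that comes up later), the PCIe 3.0 arithmetic is just lanes × transfer rate × encoding efficiency; a quick sketch:

```python
# Theoretical one-direction PCIe 3.0 bandwidth: lanes x 8 GT/s x 128b/130b encoding.
def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical PCIe 3.0 bandwidth in GB/s for a given lane count."""
    gigatransfers = lanes * 8.0            # 8 GT/s per lane
    gigabits = gigatransfers * 128 / 130   # 128b/130b line encoding overhead
    return gigabits / 8                    # bits -> bytes

print(f"x16:          {pcie3_bandwidth_gbs(16):.2f} GB/s")  # ~15.75 GB/s
print(f"x4 (DMI 3.0): {pcie3_bandwidth_gbs(4):.2f} GB/s")   # ~3.94 GB/s
```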
 
It's worth noting that your CPU-to-RAM connection only has about 34 GB/s of bandwidth. If you max out a 40 Gb connection, and then transfer that to disk, that's essentially all of your RAM bandwidth gone. Also, you don't have enough PCIe lanes to transfer the data from the network card and to the storage system at the same time.

You really do need more PCIe lanes and more memory channels.
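For reference, the ~34 GB/s figure is just the dual-channel DDR4 math (assuming DDR4-2133, the stock speed for that platform; the exact speed is my assumption):

```python
# Theoretical dual-channel DDR4 memory bandwidth.
# Assumption: DDR4-2133, two 64-bit (8-byte) channels.
def ddr4_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    return mt_per_s * channels * bus_bytes / 1000   # MB/s -> GB/s

print(ddr4_bandwidth_gbs(2133))   # ~34.1 GB/s
```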
 
Can you explain in more detail how the RAM would be the bottleneck? The DMI has a theoretical output of 3.94 GB/s, which is plenty fast for a redundancy and main storage. The PCIe lanes are what I am trying to turn into the main 8 TB of workspace. If I have to reallocate 4 lanes for connecting it to the network or directly to the main PC, I will try to find a way. It would also help to know exactly how many lanes I would be working with, if you could answer the last one:
Say the CPU has 16 lanes and the chipset has essentially 4 lanes (DMI 3.0). Does this mean that the lanes are shared between the two, or does the chipset communicate separately, giving me 20 lanes altogether?
 
Your CPU has 20 lanes. The particular configuration of those lanes is a function of the chipset, if I'm not mistaken. Regardless, you'd be limited to 16 lanes to share between the network card and working space. If you have PCIe switches, you'd have 16 lanes on both, but could only use one at a time. If not, you'd have 8 lanes on each, and could access them at the same time.

The DMI lanes are separate from the 16 CPU lanes.

Also, there's a reason you can't find options for NVMe RAID hardware. It's a bad idea, and you suffer significant performance penalties when you try it. We've already said that. Several times. The numbers they show on that page are the interface throughput values. You won't see anything close to that kind of throughput.

Assuming no processing is done on incoming network traffic, you have 8 GB/s coming in from the network into RAM. This then goes from RAM to the working storage area. That's your 15.7 GB/s. The problem is that there IS processing between the network and the storage device. The CPU would probably read the data one or two times before writing it into storage. That means you're going to bump right up against the 34 GB/s RAM limit with almost no processing of incoming data.

If you're trying to do anything else when the transfer starts, it's basically going to come to a full stop.

Regarding being able to work with anything at 8 GB/s throughput, it's incredibly unlikely with that CPU or RAM. Compression often involves 20-30 reads/writes to RAM per byte of input data. That's a RAM throughput bottleneck in this system.
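To make the "passes over RAM" argument concrete, here's a rough model; the pass counts are my own illustrative numbers based on the estimates above:

```python
# RAM traffic generated when every incoming byte crosses the memory bus
# several times before landing on disk. Pass counts are illustrative.
RAM_BANDWIDTH_GBS = 34.1        # dual-channel DDR4-2133

def ram_traffic_gbs(input_rate_gbs: float, memory_passes: float) -> float:
    """Total memory-bus traffic for a given input rate and pass count."""
    return input_rate_gbs * memory_passes

# Plain network-to-disk copy: NIC write, one or two CPU reads, storage read (~4 passes).
print(ram_traffic_gbs(8, 4), "GB/s of RAM traffic vs", RAM_BANDWIDTH_GBS, "available")

# Compression at ~25 RAM touches per input byte: the input rate that fits the bus.
print(RAM_BANDWIDTH_GBS / 25, "GB/s compression ceiling")   # ~1.4 GB/s
```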

Of course, that assumes the CPU can actually keep up with the data. I would be utterly shocked if it were capable of hitting the RAM throughput limit in a compression workload. It's just not fast enough.

As I've stated before, you should be looking at a dual socket Xeon E5 system, not a consumer one. I'm not sure why you seem set on using the i7, but it's seriously not cut out for this. Moving up to the E5 would let you run as much networking throughput as you could possibly want, multiple 8x PCIe SSDs, 8 channels of RAM, and enough CPU cores to do something useful with it all.

From what I can tell, you've been using theoretical numbers for your throughput estimations. Unfortunately, those numbers are normally quite a bit different than real-world performance.
 


In theory, maybe.
Since we have not had tape drives, much less Blu-ray, in existence for 100 years...
Yes, I know about accelerated aging tests and projected timelines, but still, just theory.
 
I guess I am just confused about how a client and server operate. I tried researching it, but no luck yet. If you make a request on the main PC to download, upload, or change data on a server, then what are the processes involved? I thought the main PC does all the heavy lifting while the server just manages the data. I'm not after 100 years necessarily; I will replace the drives when they fail, which shouldn't happen too often at all.
 


OK...I have a NAS.
Among other things, it holds my medium-size movie and music libraries.
As well as backups and a bunch of other stuff.

The main system where I am typing this just sees that as 'just another drive'.
Most of the main processing happens here.

Now...this NAS I have also does 4k video transcoding, on the fly.
It will output directly to the TV if requested.

If you wish your NAS (or render farm) to do something else, you'd have it set up differently.
You could have it do the bulk of the video rendering. (If you are actually doing that much, which is doubtful.)

Mine could also, if I chose, be the main torrenting box. Or an FTP server, or run a firewall box in a VM and be the main border protection device.


What, exactly, do you want yours to do?
 
I think it might be easier to have the work storage on the main PC instead, which is going to have at least 40 lanes. I don't know how memory bandwidth relates to the system bottleneck in terms of storage, but I am trying to find out as best I can. So for the NAS/DAS side of the build:

x TB in an array for high sequential read/write speed
x TB in an array for cold storage and reliability
 
Just a recap here:

I'm trying to turn my PC into a server that is only going to be accessed by 1 person at a time (me). I need it to have high sequential read/write speeds and a reliable redundancy. My system has a memory bandwidth of 34.1 GB/s, 16 free PCIe 3.0 lanes to do whatever with, 4 extra lanes from the chipset (DMI 3.0), and can support 64 GB of RAM. I am hoping to connect this NAS to the main PC as a DAS or a SAN, whichever is faster or as fast as it can support. I want to use Windows Server 2016 because of ReFS.
 
I am going to ask this one final time:

What, specifically, will you be using this for?

"I need it to have high sequential read/write speeds and a reliable redundancy." is not a "use".
'High' is not a number.
"reliable redundancy" can be done with 2 garden variety hard drives

Playing/storing movies is different than producing movies is different than building a skyscraper with AutoCAD is different than rendering the totality of Cars 6 is different than being your own personal neighborhood ISP is different than running a game server for 300 people is different than creating the next Angry Birds...

So...what do YOU want to do?
 
Taking all of this into account, I have decided to use the 40 lanes and increased memory bandwidth from the new PC build for the main work space. The server will have a cold storage array on the onboard SATA controllers (behind the 4 DMI lanes), because those drives are for redundancy and not speed. The access part of the server will have as high a sequential read/write speed as I can get up to the bottleneck, which in this case should be either the drives or the RAM bandwidth. Now I only have a couple of questions:

Which interface should I use: a 40GbE NIC or Thunderbolt 3 (32 Gb/s)?
Do I need a RAID card or a SATA adapter if I want to take advantage of software RAID?

I thought I had already mentioned what I wanted to do with it. I need more storage, plain and simple. I would also like it to act as a NAS so I can access these files remotely.
 


None of your statistics are that meaningful. Why? Unless you choose a lot of SSDs, you are limited by the bandwidth of each disk spindle. A WD Red Pro disk will give about 200 MB/s sustained sequential throughput. Even if we could get ALL of that performance, a RAID 6 of ten disks would only get about 1.6 GB/s. That is it. Period.
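That 1.6 GB/s follows directly from the spindle count; a quick sketch using the numbers above:

```python
# Best-case sequential throughput of a RAID 6 array of spinning disks.
def raid6_seq_throughput_gbs(drives: int, per_drive_mbs: float) -> float:
    data_spindles = drives - 2              # two drives' worth of parity
    return data_spindles * per_drive_mbs / 1000

print(raid6_seq_throughput_gbs(10, 200))    # ~1.6 GB/s, as stated above
```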

Your expectations are not calibrated to the physical limitations of storage. If your "budget" has six digits, then we can talk about 5 GB/s throughput.
 
