Question: How many devices on USB4?

Status
Not open for further replies.

JRRT

Prominent
Mar 25, 2022
41
0
530
I am looking to make a really large home-built file server. I am aware that there are fairly inexpensive NAS solutions that will let me put maybe a dozen hard drives in them, but I need more like 50 - 100 drives, each in the 10-20 terabyte range. Absolute top performance would be nice, but network speed is going to be the limiting factor anyway. My thought is that if the new USB standards like USB4 still allow you to connect a whole lot of devices, and if reasonably priced adapters are available, then I could hook up a rack of drives this way. Does anyone know how many devices you can hook up to USB4, and whether USB-to-hard-drive adapters (as opposed to the easy-to-find USB-to-NVMe ones) are available? Thank you.
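On the "how many devices" part: USB device addresses are 7 bits wide and address 0 is reserved for enumeration, so one host controller can address at most 127 devices, and every hub in the chain uses one of those addresses too. USB4 carries USB 3.x traffic as a tunnel, so I would expect the same budget to apply behind a USB4 port, but treat that as an assumption. The sketch below assumes identical hubs chained from a single root port and finds the hub count that leaves the most addresses for drives. It ignores the hub tier-depth limit and the fact that some host controllers run out of internal resources well before 127 devices, so the result is a ceiling, not a target.

```python
USB_ADDRESS_LIMIT = 127  # 7-bit device address; address 0 is reserved for enumeration


def max_drives(hub_ports: int, address_limit: int = USB_ADDRESS_LIMIT) -> tuple[int, int]:
    """Return (best_hub_count, max_drives) for a tree of identical hubs.

    Assumes one root port feeds the first hub and every additional hub
    plugs into a port of an earlier hub, so h hubs expose
    h * hub_ports - (h - 1) free downstream ports. Hubs and drives both
    consume device addresses. Tier depth and controller limits are ignored.
    """
    best = (0, 0)
    for hubs in range(1, address_limit + 1):
        free_ports = hubs * hub_ports - (hubs - 1)
        drives = min(free_ports, address_limit - hubs)
        if drives > best[1]:
            best = (hubs, drives)
    return best


if __name__ == "__main__":
    hubs, drives = max_drives(hub_ports=7)
    print(f"{hubs} seven-port hubs -> at most {drives} drives")
```

With 7-port hubs this works out to 18 hubs and at most 109 drives, which fits within three tiers of hubs, but all of those drives would still share one controller's bandwidth.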
 
"but I need more like 50 - 100 drives, each in the 10-20 terabyte range."

Yeah, that's not happening. You can expect at least 1 or 2 drive failures per month either from the heat of so many drives packed that close together or from the vibrations induced by that many constantly moving heads. Rebuild time on an array that size runs into the several weeks range which may induce even more failures.
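Both numbers in that reply can be sanity-checked with simple arithmetic: expected failures per month are roughly drive count × annualized failure rate (AFR) / 12, and a rebuild has to rewrite at least one whole drive, so its lower bound is drive capacity divided by sustained rebuild rate. The AFR and rebuild rate in this sketch are assumed inputs, not measurements, and real rebuilds on a busy array are often much slower than the bare lower bound.

```python
SECONDS_PER_HOUR = 3600


def expected_failures_per_month(drive_count: int, afr: float) -> float:
    """Expected drive failures per month, assuming independent failures at a constant AFR."""
    return drive_count * afr / 12


def min_rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on rebuild time: one full drive written at a sustained rate (decimal TB, MB)."""
    bytes_to_write = drive_tb * 1e12
    return bytes_to_write / (rebuild_mb_per_s * 1e6) / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Assumed inputs: 100 drives at 2% AFR; 20 TB drives rebuilt at 100 MB/s.
    print(f"{expected_failures_per_month(100, 0.02):.2f} failures/month")
    print(f"{min_rebuild_hours(20, 100):.1f} hours minimum per rebuild")
```

With these assumed inputs the sketch gives about 0.17 failures per month and at least about 56 hours per rebuild; a higher AFR (for example from the heat or vibration the reply warns about) or a slower rebuild under load scales both numbers accordingly.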
 

JRRT

To answer people's questions, and I guess to make this make more sense: I am a consulting mechanical engineer who is able to build a physical rack that dampens vibration and has enough forced ventilation to get rid of the quite obviously substantial heat. I am working with friends on an AI research project, and unless and until we get funding this is out of pocket, so we are trying to do this halfway affordably.
 

JRRT

The programming guys don't know the answer to this, and the main hardware guy is in the hospital right now, so I am asking all of you. Thanks.
 

USAFRet

Titan
Moderator
To answer people's questions, and I guess to make this make more sense: I am a consulting mechanical engineer who is able to build a physical rack that dampens vibration and has enough forced ventilation to get rid of the quite obviously substantial heat. Working with friends on an AI research project, and since this is out of pocket unless and until we get funding, we are trying to do this halfway affordably.
And still wondering, for a "home built file server"... "50 - 100 drives, each in the 10-20 terabyte range"
At the low end of that combination, 500TB.
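The arithmetic behind that figure, across the whole range quoted in the original question, is just drive count × drive capacity. This is raw capacity, before any RAID parity or filesystem overhead.

```python
def raw_capacity_tb(drive_count: int, drive_tb: int) -> int:
    """Raw capacity in decimal terabytes, before parity or filesystem overhead."""
    return drive_count * drive_tb


if __name__ == "__main__":
    low = raw_capacity_tb(50, 10)    # smallest combination in the question
    high = raw_capacity_tb(100, 20)  # largest combination in the question
    print(f"{low} TB to {high} TB raw")  # 500 TB to 2000 TB raw
```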

That is much different than "an AI research project".

Details up front are very helpful.
 

JRRT

Yeah, that's not happening. You can expect at least 1 or 2 drive failures per month either from the heat of so many drives packed that close together or from the vibrations induced by that many constantly moving heads. Rebuild time on an array that size runs into the several weeks range which may induce even more failures.
I should have been clearer. I never planned on using a commercial rack; instead I would purpose-build one that accommodates the needs of the hard drives. I made the mistake of falling into the assume trap and making myself look silly. I will not be packing the hard drives closely: there will be forced-air ventilation around the sides of each one in a 45 °F room, and cast iron is really good at not transferring vibration. (By the way, titanium is also good for dampening vibration, and prettier, but a great deal more expensive and much harder to machine.) So you use cast iron and paint it. It is brittle and has low tensile strength, but it should still be good enough to hold a few hard drives, just not an elephant. This is why they needed a mechanical engineer.
 

JRRT

By the way, I think this is my first time posting on this website. Is this even the right place for me to be putting this question? Thank you all very much!
 

JRRT

Thank you very much! I am going to have to do some research to see whether this can work for what I am asking about in this thread. It is absolutely something that I need, and at a fairly reasonable price for an individual; at the very least it is an option for local data caches. I really appreciate it. Now I just need to figure out whether there is a practical way to make it work for the truly mass data storage needs. In theory I could probably spread the training data across a few of these, and the ability to add more of them as finances allow is appealing, so there is a very good chance that we will wind up doing this.

On the other hand, I am going to have to ask the database guys whether 10 large hard drives will work. We want to go with RAID 6 and/or mirroring for everything. I imagine that ten hard drives of 20 TB each would be plenty of space, but in terms of performance my understanding was that the database server was supposed to have a "special" setup: an SSD boot drive with the software on it, the data stored on a RAID 6 hard drive array, and a mirrored and striped group of SSDs set up as a cache. If I understand this correctly, even though the network limits data transfer speeds, this is still well worth doing because the database server performs operations on the data stored on it? So does this mean that we would not want to move the data off of it onto something like one of these racks, or is this really just chasing marginal gains?
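For the capacity side of that question, a rough rule is that RAID 6 gives up two drives' worth of space to parity and a mirrored-and-striped set (RAID 10) gives up half. The sketch below applies those rules to the ten 20 TB drives mentioned above; it ignores filesystem overhead and the TB/TiB difference, and the SSD cache size in the example is a hypothetical figure for illustration.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Approximate usable capacity for two common RAID levels (decimal TB)."""
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb  # two drives' worth of parity
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives / 2 * drive_tb  # every block is mirrored once
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    print(usable_tb("raid6", 10, 20))  # 160
    print(usable_tb("raid10", 4, 2))   # hypothetical SSD cache: four 2 TB SSDs
```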
If you don't know, my apologies for bothering you, I just don't really know and I am very grateful for any help. Thanks!
 

Ralston18

Titan
Moderator
"Out of pocket"....

Startup? Business Plan?

Is there a budget?

That rack-mount NAS does not come with disks (reference Customer Q & A #3). Also no rails.

And you need rack(s), power, cooling, etc.

Not questioning RAID 6 per se, but is RAID 6 truly a requirement for the AI project? Or has someone just decided to use RAID 6 without a specific reason or requirement for doing so? Data can still be lost.

Not a RAID person (full disclosure) so I cannot really question the details.

However there are trade-offs involved.

For example:

https://techgenix.com/raid-5-vs-raid-6/
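One of those trade-offs can be put in numbers with a deliberately simplified model: while one failed drive is being rebuilt, RAID 5 loses data if any one more drive fails, whereas RAID 6 survives one more failure and loses data only on a second. Assuming independent failures with the same probability p for each remaining drive during the rebuild window (a strong assumption that ignores unrecoverable read errors and correlated failures), the chance of data loss is a binomial tail.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent drives fail), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def loss_during_rebuild(level: str, drives: int, p: float) -> float:
    """Chance of data loss while one failed drive is rebuilt (simplified model)."""
    remaining = drives - 1
    extra_failures_tolerated = {"raid5": 0, "raid6": 1}[level]
    return prob_at_least(extra_failures_tolerated + 1, remaining, p)


if __name__ == "__main__":
    p = 0.01  # assumed per-drive failure probability during the rebuild window
    for level in ("raid5", "raid6"):
        print(level, f"{loss_during_rebuild(level, 10, p):.4f}")
```

The useful output is the ratio between the two levels, not the absolute numbers, which depend entirely on the assumed p.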

I believe that I have a sense of what your role (mechanical engineer) is: basically to build/provide the rack(s) to hold everything. Correct?

You should not be concerned about which RAID level to use, for example. The OS, the AI software - not you. Disk drives and drive capacity - not you. Network requirements and performance - not you.

Accommodating the required number of disk drives, power requirements to the rack, motherboard, drives, and proper cooling, etc. - mostly you I think.

Remember access, maintenance, and physical security. All very important.

= = = =

Unfortunately this is one of those situations where a "group of friends" can quickly end up as no longer friends.

Especially as the initial startup investment and required operational expenses start to grow.

Define/quantify marginal gains... Actually for many startups immediate gains are rarely expected. Planning includes allowances for initial losses.

My recommendation is a serious, mandatory "all hands on deck" group meeting to identify the specific objectives, requirements, and assignments.

You need to design to specific written requirements that are provided to you.

If you can design and build to those requirements - then good. If not, then you provide the necessary feedback as to why not and perhaps offer options within your purview.

Just my thoughts on the matter.
 

JRRT

And still wondering, for a "home built file server"... "50 - 100 drives, each in the 10-20 terabyte range"
At the low end of that combination, 500TB.

That is much different than "an AI research project".

Details up front are very helpful.
You are of course correct; I am just too used to people telling me that I give too many details. In the future I will know that they are welcome in these forums. The ultimate goal is to have two pools of storage, one of which can probably be physically subdivided reasonably easily, but I am not sure whether this is the case for the other. We are still in the planning and budgeting stages, but we would like to be able to allocate about a petabyte of usable storage in each, with redundancy equal to or better than what you can get with RAID 6. Again, this is only a few people trying to start something, and if we don't have double redundancy for storage, somebody has to watch it all the time. While it can obviously be monitored remotely, somebody still needs to get to it to fix it, so, for example, we could not all go and meet with investors if this ever gets that far. I am sure there is more info that I should be giving you and am not; my apologies in advance. It seems I was probably incorrect and the training data can likely be spread across as many physical locations as needed, but I am still not clear about the database.
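To turn "about a petabyte of usable storage in each pool, with redundancy at least as good as RAID 6" into a drive count, one common layout is several fixed-size RAID 6 groups, each losing two drives to parity. The sketch below assumes 20 TB drives and 12-drive groups (both chosen only for illustration) and ignores filesystem overhead and hot spares.

```python
import math


def drives_for_usable(target_tb: float, drive_tb: float, group_size: int,
                      parity: int = 2) -> tuple[int, int]:
    """Return (groups, total_drives) so the groups provide at least target_tb usable."""
    usable_per_group = (group_size - parity) * drive_tb
    groups = math.ceil(target_tb / usable_per_group)
    return groups, groups * group_size


if __name__ == "__main__":
    groups, drives = drives_for_usable(target_tb=1000, drive_tb=20, group_size=12)
    print(f"one pool: {groups} groups, {drives} drives")  # one pool: 5 groups, 60 drives
    print(f"two pools: {2 * drives} drives")              # two pools: 120 drives
```

Per pool, this lands inside the 50 - 100 drive range from the original question, which is why spares and rebuild time deserve attention early.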
 

JRRT

And still wondering, for a "home built file server"... "50 - 100 drives, each in the 10-20 terabyte range"
At the low end of that combination, 500TB.

That is much different than "an AI research project".

Details up front are very helpful.

I just realized that the detailed follow-up explanation that I thought was posted... was not. Which makes what I have been saying make even less sense. The abbreviated recap is that we are working on a new approach to artificial intelligence software that, in theory, should be less computationally intensive but still requires a lot of data storage. The work is performed in parallel on networked machines that locally cache whatever they are working on at the moment, but every machine pulls its training data from one large central repository so that we can be sure they are all using the same data set. We don't know for sure how big this needs to be, but estimates run as high as about a petabyte. The work done on the data is split between local processing of the raw data cached on each machine and SQL operations, frankly over my head, that run on the database. I don't fully understand the software side. I just know that the database server apparently needs to be optimized for performance because operations are actually performed there; it is not just a matter of retrieving data from it. The problem is that no one has any idea how big this database will grow, so the desire is to "make it as large as possible so that it doesn't wind up breaking the whole thing".
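On making sure every machine trains on the same data set: one common approach (not necessarily what this project's software does) is for the central repository to publish a manifest of file names and SHA-256 hashes, and for each worker to check its local cache against it before training. A minimal sketch, where the manifest format and the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large training shards never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_or_missing(cache_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries whose local copy is missing or has a different hash.

    `manifest` maps a relative file name to its expected SHA-256 hex digest,
    as published by the central repository (hypothetical format).
    """
    problems = []
    for name, expected in sorted(manifest.items()):
        local = cache_dir / name
        if not local.is_file() or sha256_of(local) != expected:
            problems.append(name)
    return problems
```

A worker would re-fetch every name the check returns before starting, so the only data it trains on is data that matches the central copy byte for byte.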
 

JRRT

Apparently I am even more behind than I knew... I finally got ahold of a friend who is a sysop, and he told me basically: "Yes, of course there are cheap, easy adapters online to connect a hard drive to USB if you really want to do it that way, but it is almost certainly going to choke if you try to put more than about four of them on a controller." He then mentioned collisions and advised me to use something like what is in the link that you were nice enough to provide. Thanks again!
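The "about four drives per controller" rule of thumb lines up with simple bandwidth arithmetic, because every drive behind one USB controller shares its link. Assuming a 10 Gbit/s USB link that delivers roughly 80% of its signaling rate as usable data, and hard drives that each stream about 250 MB/s sequentially (both assumed figures), the link is full at four drives.

```python
import math


def drives_before_saturation(link_gbit_s: float, efficiency: float, drive_mb_s: float) -> int:
    """How many drives can stream at full speed before a shared link becomes the bottleneck.

    efficiency is the assumed fraction of the signaling rate left for payload
    after encoding and protocol overhead.
    """
    usable_mb_s = link_gbit_s * efficiency * 1000 / 8  # Gbit/s -> MB/s (decimal)
    return math.floor(usable_mb_s / drive_mb_s)


if __name__ == "__main__":
    # Assumed: 10 Gbit/s link, 80% efficiency, 250 MB/s per hard drive.
    print(drives_before_saturation(10, 0.8, 250))  # 4
```

Random I/O lowers per-drive throughput, so more drives may fit in everyday use; a RAID rebuild that reads every drive at once will hit the limit sooner.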
 