News 77TB of Research Data Lost Because of HPE Software Up

JOSHSKORN · Jan 1, 2022

If you lose data in this day and age, you probably deserve it. Hell, even 20 years ago, I would back up data on a storage medium that I always had to physically attach. The company that I was working for would back up data, send it off and keep it for at least 4 days at a time. How the hell is this still not a practice at some companies?

derekullo · Jan 1, 2022

InvalidError said:
That only works when the backup script or software responsible for creating snapshots or whatever backup strategy they were using is actually doing its job as intended instead of destroying the files it was meant to preserve.

They got screwed over by a buggy backup script. Their data would likely have been fine if they hadn't attempted to back it up with the "updated" backup script that ended up destroying two days' worth of data before they realized something went wrong.

Ahh that makes sense.

It didn't occur to me that you could have the same system for initializing the backups as well as creating the snapshots.

When I first saw the article I thought of the image in your signature!

brandonjclark · Jan 1, 2022

Who wants to bet the problem was a bad ROBOCOPY / switch?

elementalRealms · Jan 1, 2022

ThatMouse said:
Billion-dollar supercomputer and they can't afford a real backup? The excuses are astonishing and make zero sense. I don't know why HPE is taking the blame. The systems administrator is not someone you can just outsource. HPE doesn't know what files need backing up. Did the researchers just send HPE an email asking, "Are all our important files backed up?" "Yes." "That's good enough for me!"

And who told you that they did not pay HPE for the backup solution? HPE admitted its fault. But what makes me angry about this situation is the new generation of lazy people who don't write their own scripts for such HUGE research data backups!!! Or at least check the code and the scripts before applying them (test them)... Universities should be better than this. Many MANY people should be fired after this.

InvalidError · Jan 1, 2022

elementalRealms said:
Many MANY people should be fired after this.

If the data lost is hundreds of man-hours' worth of work, maybe. If most of the data loss is supercomputer sim output that can be re-created by re-running the sim from the last saved checkpoint before the backup screw-up, then there is no major harm done other than the waste of sim time.

Darkbreeze · Jan 1, 2022

scottsoapbox said:
Data isn't safe unless it's in at least 3 different physical locations using at least 2 different automated backup processes.

You're ALMOST correct, except that what it should have said is "3 different physical locations with at least 2 of them having been manually created and confirmed". Nothing "automatic" is EVER "safe" or "guaranteed". And that includes updates, which should not ever be pushed to critical systems without prior authorization, specifically to avoid these kinds of potentially catastrophic occurrences. Hell, they shouldn't even be pushed to consumer systems without an "ok, go ahead". I don't care what anybody's reasons are for it. They are likely good and valid reasons, but it still needs to be given the "sure, go ahead" first.

Geef · Jan 1, 2022

Tech from HPE "Ah darnit, we lost 77TB of data." 😱

Earlier that day...

Scientist from University "Wonderful, I just finished the genetic code for the cure to cancer and uploaded all copies of it to the system! It will be safe there!" 🤓

escksu · Jan 2, 2022

Deleted does not mean the data is lost... It can still be recovered.

USAFRet · Jan 2, 2022

escksu said:
Deleted does not mean the data is lost... It can still be recovered.

If they could, we wouldn't be having this conversation.

And 'recovering' 77TB of screwed up backups is not like retrieving a couple of accidentally deleted cat pics from your Recycle Bin.

Endymio · Jan 2, 2022

dalauder said:
It doesn't really sound like it's "CRITICAL RESEARCH data". From what I can figure, they've probably already caught up on all the data they lost.

You're interjecting logic and rational skepticism into sensationalistic reporting? Apparently it was two days' worth of data, and more than two-thirds of that was recovered. So less than one day's data was actually lost.

Endymio · Jan 2, 2022

InvalidError said:
Offline backups won't save you when it is your broken backup script that is deleting files instead of actually backing them up.

If a "broken script" has access to the backup, then it isn't an offline backup.

Endymio · Jan 2, 2022

USAFRet said:
If they could [recover it], we wouldn't be having this conversation.

More than 2/3 of the data was recovered, however.

And 'recovering' 77TB of screwed up backups is not like retrieving a couple of accidentally deleted cat pics from your Recycle Bin.

This isn't a PC. The storage subsystem(s) on this supercomputer have an aggregate bandwidth of 640GB/sec and a total capacity of nearly 1 PB.

USAFRet · Jan 2, 2022

Endymio said:
This isn't a PC. The storage subsystem(s) on this supercomputer have an aggregate bandwidth of 640GB/sec.

Yes, I know this isn't a "PC". I was just commenting on the 'ease' of recovery noted by a previous commenter.

alceryes · Jan 2, 2022

USAFRet said:
...make good passwords...

Have you seen Microsoft's latest take on 'password management'?

The passwordless future with Microsoft | Microsoft Security Blog

Beginning today, break free from your password and go passwordless with your Microsoft account.

www.microsoft.com

USAFRet · Jan 2, 2022

alceryes said:
Have you seen Microsoft's latest take on 'password management'?


Yes, I saw that.
It is still a "password" of sorts. Just using your phone, instead of typing it in a box onscreen.

Endymio · Jan 2, 2022

USAFRet said:
It is still a "password" of sorts. Just using your phone, instead of typing it in a box onscreen.

Tied to a physical device like your phone, it becomes a "key", not a password. Passwords are pure information, easy to duplicate or simply guess, whereas no one can unlock the deadbolt to your house without physically picking your pocket for the key.

alceryes · Jan 2, 2022

USAFRet said:
Yes, I saw that.
It is still a "password" of sorts. Just using your phone, instead of typing it in a box onscreen.

I'm not a fan.
What happens when these 2FA services go down? (well, it's not even 2FA at that point. It IS your login.)

USAFRet · Jan 2, 2022

Endymio said:
Tied to a physical device like your phone, it becomes a "key", not a password. Passwords are pure information, easy to duplicate or simply guess, whereas no one can unlock the deadbolt to your house without physically picking your pocket for the key.

And the phone needs to be unlocked with....fingerprint, iris scan, facial rec, a "password" or PIN...

And then you lose your "key", and what happens?

Additionally, we now have all the keypad door locks that run through AWS, etc.
Just a couple of weeks ago, people were locked out of their houses due to an outage at AWS.

alceryes · Jan 2, 2022

USAFRet said:
Just a couple of weeks ago, people were locked out of their houses due to an outage at AWS.

I was actually locked out of one of my Gmail accounts because Google wasn't sending me the 2FA (G-code) text during one of those AWS outages. Made me wonder why a service Google relies on to allow logins, relies on AWS.

USAFRet · Jan 2, 2022

alceryes said:
I was actually locked out of one of my Gmail accounts because Google wasn't sending me the 2FA (G-code) text during one of those AWS outages. Made me wonder why a service Google relies on to allow logins, relies on AWS.

Last summer on vacation, partial fail on my part:

Recently reinstalled OS on my travel laptop.
Go on vacation.
Try to log into Service X.
Oops...we need to send a verification to your email.
Can't log into the email, because gmail needs to send a verification to the other gmail...
Round and round we went....

So some would say just use the phone as part of the 2FA.
Then your phone becomes the single point of fail.

jmcgaw · Jan 2, 2022

USAFRet said:
What part of "offline backup" was unclear?

You beat me to it. I'm just a simple little private user with nothing beyond financial records and family mementos to protect, but I invariably have two generations of fully verified off-site backup in my bank deposit box and a couple more stored in house. Sure, my off-site backup drives are only 4TB each, but I'm sure that a big uni can afford bigger drives. I wonder if they are in the market for a paranoid backer-upper to take care of their data and how much they are willing to pay? (also USAF retired)

InvalidError · Jan 2, 2022

Endymio said:
If a "broken script" has access to the backup, then it isn't an offline backup.

A broken backup script that either deletes files or lets them get deleted without getting backed up doesn't need to have access to offline backups. Once the files are gone from the original source without getting backed up for whatever reason due to a backup screw-up, they never make it into a backup whatever that backup is.
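
To make that failure mode concrete, here is a minimal Python sketch of the opposite ordering: copy first, verify the copy, and only then remove the original. This is not the script involved in the incident; `archive_then_delete` and `sha256_of` are hypothetical helper names made up for illustration, and a real HPC setup would look very different.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_then_delete(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir; delete source only if the copy verifies.

    Hypothetical helper for illustration only.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        # The copy is bad: discard it and leave the original untouched.
        target.unlink(missing_ok=True)
        return False
    # The copy matches, so removing the original no longer loses data.
    source.unlink()
    return True
```

If the delete step runs before (or regardless of) the verify step, you get exactly the situation described above: the file is gone from the source and never reached any backup, online or offline.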

alceryes · Jan 2, 2022

USAFRet said:
So some would say just use the phone as part of the 2FA.
Then your phone becomes the single point of fail.

Some of these 2FA services rely on a running SaaS that is separate from the target system(s) you are trying to log in to.
That service remains another point of failure unless the 2FA component lives entirely on the target system(s) and doesn't need any extra off-system services to operate.
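
As one example of a second factor that needs no off-system service at login time, here is a rough sketch of a time-based one-time password (TOTP, RFC 6238) in Python. I'm naming TOTP as my own illustration, not as what any of the services mentioned above actually use. The function name `totp` and its parameters are assumptions for this sketch; both sides only need the shared secret and a reasonably accurate clock.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) locally, with no network call."""
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The server runs the same calculation and compares the result, so an outage at a third-party SMS or push provider doesn't block the login. Losing the device that holds the secret is still a problem, of course.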

Darkbreeze · Jan 2, 2022

Endymio said:
Tied to a physical device like your phone, it becomes a "key", not a password. Passwords are pure information, easy to duplicate or simply guess, whereas no one can unlock the deadbolt to your house without physically picking your pocket for the key.

As with anything, they can still get it in. It just takes brute force. They can pretty easily bash a window or kick in a door, and no need for the key at all. In the end, it's the same result regardless of which way they do it. But, probably not the right thread for that discussion.

GenericUser · Jan 2, 2022

USAFRet said:
Oops...we need to send a verification to your email.
Can't log into the email, because gmail needs to send a verification to the other gmail...
Round and round we went....

Not exactly the same thing, but this reminded me of a time I needed to contact person A for something. As I started to write the email, I got an out-of-office message pop up since they were on vacation, and it said to contact person B for what I needed. So I aborted that email and started writing one to person B. Well, apparently person B was out of office too, also on vacation, and their out-of-office message said to contact person A. That was fun to deal with.
