this post was submitted on 31 Jul 2024

News

Welcome to the News community!

Rules:

1. Be civil

Attack the argument, not the person. No racism/sexism/bigotry. Good faith argumentation only, which means no accusing another user of being a bot or paid actor. Trolling is uncivil and is grounds for removal and/or a community ban. Do not respond to rule-breaking content; report it and move on.

2. All posts should contain a source (url) that is as reliable and unbiased as possible and must only contain one link.

Obvious right or left wing sources will be removed at the mods' discretion. We have an actively updated blocklist, which you can see here: https://lemmy.world/post/2246130. If you feel like any website is missing, contact the mods. Supporting links can be added in comments or posted separately, but not to the post body.

3. No bots, spam or self-promotion.

Only approved bots, which follow the guidelines for bots set by the instance, are allowed.

4. Post titles should be the same as the article used as source.

Posts whose titles don't match the source won't be removed, but the autoMod will notify you, and if your title misrepresents the original article, the post will be deleted. If the site changed its headline, the bot might still contact you; just ignore it, and we won't delete your post.

5. Only recent news is allowed.

Posts must be news from the most recent 30 days.

6. All posts must be news articles.

No opinion pieces, listicles, editorials, or celebrity gossip are allowed. All posts will be judged on a case-by-case basis.

7. No duplicate posts.

If a source you used was already posted by someone else, the autoMod will leave a message. Please remove your post if the autoMod is correct. If the post that matches your post is very old, we refer you to rule 5.

8. Misinformation is prohibited.

Misinformation / propaganda is strictly prohibited. Any comment or post containing or linking to misinformation will be removed. If you feel that your post has been removed in error, credible sources must be provided.

9. No link shorteners.

The autoMod will contact you if a link shortener is detected; please delete your post if it's right.

10. Don't copy an entire article into your post body

For copyright reasons, you are not allowed to copy an entire article into your post body. This is an instance-wide rule that is strictly enforced in this community.

Delta CEO says CrowdStrike-Microsoft outage cost the airline $500 million (www.cnbc.com)

submitted 3 months ago by [email protected] to c/[email protected]

Delta Air Lines CEO Ed Bastian said the massive IT outage earlier this month that stranded thousands of customers will cost it $500 million.
The airline canceled more than 4,000 flights in the wake of the outage, which was caused by a botched CrowdStrike software update and took thousands of Microsoft systems around the world offline.
Bastian, speaking from Paris, told CNBC’s “Squawk Box” on Wednesday that the carrier would seek damages from the disruptions, adding, “We have no choice.”

[–] [email protected] 9 points 3 months ago (2 children)

I think what @[email protected] was saying is that you shouldn't have multiple mission-critical systems all using the same 3rd party services. Have a mix of at least two, so that if one 3rd party service goes down, not everything goes down with it.
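A minimal sketch of the "mix of at least two" idea for a service you call on demand. Everything here is hypothetical: the `Provider` type, the vendor names, and the `ProviderDown` exception are made up for illustration. The call only fails when every independent provider fails. It doesn't model agents like AV that run inside the OS itself, where a bad update can take the host down before any failover logic gets a chance to run.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


class ProviderDown(Exception):
    """Raised by a provider that cannot serve a request (e.g. after a bad update)."""


@dataclass
class Provider:
    # Hypothetical stand-in for one 3rd party vendor offering the same function.
    name: str
    handler: Callable[[str], str]


def call_with_failover(providers: Sequence[Provider], request: str) -> str:
    """Try each independent provider in order; fail only if every one of them fails."""
    errors: List[str] = []
    for provider in providers:
        try:
            return provider.handler(request)
        except ProviderDown as exc:
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def broken_vendor(request: str) -> str:
    raise ProviderDown("bad update")


def healthy_vendor(request: str) -> str:
    return f"handled {request}"


vendors = [Provider("vendor_a", broken_vendor), Provider("vendor_b", healthy_vendor)]
print(call_with_failover(vendors, "check-in"))  # prints: handled check-in
```

The catch, as the replies below point out, is that every function needs its own second vendor, integration, and upkeep.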

[–] [email protected] 12 points 3 months ago (1 children)

That sounds easy to say, but in execution it would be massively complicated. Modern enterprises are littered with 3rd party services. The alternative is writing and maintaining your own solution in house, which is an incredibly heavy lift when it has to cover every service the enterprise needs. Most large enterprises are resource-starved as it is, and this suggestion of having redundancy for any 3rd party service that touches mission-critical workloads would probably increase burden and costs by at least 50%. I don't see that happening at commercial companies.

[–] [email protected] 6 points 3 months ago (1 children)

As far as the companies go, their lack of resources is an entirely self-inflicted problem, because they won't invest in increasing those resources, like more IT infrastructure and staff. It's the same as the many companies that keep terrible backups of their data (if any) when they aren't required to by law, because they simply don't want to pay for it, even though it could very well save them from ruin.

The CrowdStrike incident was as bad as it was precisely because loads of companies had all their eggs in one basket. Those that didn't recovered much more quickly. Redundancy is the lesson to take from this, and none of them will learn it.

[–] [email protected] 3 points 3 months ago (4 children)

As far as the companies go, their lack of resources is an entirely self-inflicted problem, because they won’t invest in increasing those resources, like more IT infrastructure and staff.

Play that out to its logical conclusion.

Our example airline suddenly doubles or triples its IT budget.
The increased costs don't actually increase profit; they merely increase resiliency.
Other airlines don't do this.
Our example airline has to increase ticket prices or fees to cover the increased IT spending.
Other airlines don't do this.
Customers start predominantly flying the other airlines with their cheaper fares.
Our example airline goes out of business, or gets acquired by one of the other airlines.

The end result is all operating airlines are back to the prior stance.

[–] [email protected] 6 points 3 months ago (1 children)

Two big assumptions here.

First, multiple business systems are already being supported, and the OS only incidentally. Assuming double or triple IT costs is very unlikely, but feel free to post evidence to the contrary.

Second, a tight coupling between costs and prices. Anyone that's been paying attention to gouging and shrinkflation of the past few years of record profits, or the doomsaying virtually anywhere the minimum wage has increased and businesses haven't been annihilated, would know this is nonsense.

[–] [email protected] 1 points 3 months ago (3 children)

First, multiple business systems are already being supported, and the OS only incidentally. Assuming double or triple IT costs is very unlikely, but feel free to post evidence to the contrary.

The suggestion the poster made was that ALL 3rd party services need to have an additional counterpart for redundancy. So we're not just talking about a second AV vendor. We have to duplicate ALL 3rd party services running on or supporting critical workloads to meet what that poster is suggesting.

inventory agents
OS patching
security vulnerability scanning
file and DB level backup
monitoring and alerting
remote access management
PAM management
secrets management
config management

....the list goes on.

Anyone that’s been paying attention to gouging and shrinkflation of the past few years of record profits, or the doomsaying virtually anywhere the minimum wage has increased and businesses haven’t been annihilated, would know this is nonsense.

You're suggesting the companies simply take lower profits? Those companies' boards of directors would get annihilated by shareholders. The boards would be voted out along with their IT improvement plans and replaced with ones that would return to profitability.

[–] [email protected] 2 points 3 months ago (1 children)

Which of the things you listed have kernel-level access?

[–] [email protected] 1 points 3 months ago

Which of the things you listed have kernel-level access?

Kernel-level access isn't a requirement the poster @[email protected] placed on their suggestion that all 3rd party services should have at least one duplicate 3rd party service serving each function.

[–] [email protected] 2 points 3 months ago (1 children)

Even load-balancing multiple servers in a homogeneous network, where patches are only deployed in phases, is better (and a best practice) than what, to outside observers, appears to have been everything going down due to a mass update everywhere, all at once.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

Even load-balancing multiple servers in a homogeneous network, where patches are only deployed in phases, is better (and a best practice) than what, to outside observers, appears to have been everything going down due to a mass update everywhere, all at once.

This is where reason gets subjective. If you're solving for resiliency against a bad patch, then absolutely, do a small test deployment before pushing everywhere. That balance makes sense when whatever is being patched is less of a risk than the patch itself.

However, look at what is being patched in this case: AV/malware protection. Here, you're knowingly leaving large portions of your fleet open to known, documented, in-the-wild vulnerabilities. In the past 10 years, headlines have been littered with large organizations being downed by cryptolocker-style malware. Only doing a partial deployment of this AV/malware protection means you're intentionally leaving yourself open to the latest and greatest cryptolocker (among other things). In this balance, leaving systems unpatched is more of a risk than the patch itself.

Seeing as we've only rarely had this AV/malware scanner problem hit the headlines in the last 10 or 15 years, versus cryptolocker/malware nearly monthly over the same period, it would appear on the surface that pushing the patches immediately is actually the better idea.

[–] [email protected] 2 points 3 months ago (1 children)

And yes, taking less profits to distinguish your product as a prestige brand is fairly common.

[–] [email protected] 1 points 3 months ago

And yes, taking less profits to distinguish your product as a prestige brand is fairly common.

In luxury goods, absolutely. In commodity goods, not so much. The airlines that had the nationwide disruptions are most certainly commodity.

[–] [email protected] 1 points 3 months ago (1 children)

customers start predominantly flying the other airlines with cheaper fares

I was with you until this part. With the way flying is set up in this country, there's very little competition between airlines. They've essentially set themselves up around airports/hubs, so if an airline is down for a day, that's kinda it unless you want to switch to a different airport.

[–] [email protected] 1 points 3 months ago

In the USA, outside of very small cities, this isn't my experience. My flights out of my home airport are spread across 5 or 6 airlines, and my city doesn't even break into the top ten largest in the nation. For domestic destinations, there are usually 3 to 5 airlines to choose from.

[–] [email protected] 1 points 3 months ago

There is an argument to be made that the IT team and infrastructure aren't supposed to be seen as an ongoing expense or as a source of revenue; they're insurance against catastrophe. And if you wanna pivot to something profit-generating, you can reassign them to improve UX or other client-impacting things that can result in revenue gains.

For example, notification systems for flight delays are absolute garbage IMO. I land, I check my flights app and it doesn't show any changes to when my flight is departing, then I load Google and those changes are right there. Or they could add maps of every airport they operate flights from to their apps. They could streamline the process for booking a replacement flight when your incoming flight is delayed or you miss a connecting flight (I had to walk up to a desk and wait in a queue with dozens of other people for half an hour just to be stamped with a new boarding pass and moved along). They could add an actual notification system for when boarding starts (my Turkish Airlines flight at one airport didn't have an intercom, so I didn't know it was boarding and missed the flight).

All of these are just examples, but my point is that there's an inherent shortsightedness in assuming an investment in IT, especially for a company that deals primarily with interconnectivity, is wasted. This is the reason everything is so sh*tty for users. Companies prefer minimising costs to maximising value to the user, even if the latter can generate long-term revenue and increase user retention.

[–] [email protected] 0 points 3 months ago (1 children)

Our example airline has to increase ticket prices or fees to cover the increased IT spending.

Or they could just cut already excessive executive bonuses...

[–] [email protected] 1 points 3 months ago (1 children)

You know they're not going to do that, so how useful is it to suggest it? If we just want to talk about pie-in-the-sky fixes then sure, but at the end of that road we'd likely have nationalized airlines, and that isn't happening either.

So are we talking about fantasy or things that can actually happen?

[–] [email protected] 2 points 3 months ago (1 children)

No, we're talking about things that should happen and things that should be called out every time.

Not just throwing up our hands and going "welp, they won't willingly do it so there's nothing we can do" like you seem to be doing.

[–] [email protected] 1 points 3 months ago

Not just throwing up our hands and going “welp, they won’t willingly do it

This is what I'm doing.

so there’s nothing we can do” like you seem to be doing.

This is NOT what I'm doing. Just because I don't think the suggested approach is viable doesn't mean that NO approach is viable.

[–] [email protected] 6 points 3 months ago (1 children)

In this case, it's a local third-party tool, and they thought they could control the cadence of updates. There was no reason to think there was anything particularly unstable about the situation.

This is closer to saying that half of your servers should be Linux and half should be windows in case one has a bug.

CrowdStrike bypassed user controls on updates.
The normal responsible course of action is to deploy an update to a small test environment, test to make sure it doesn't break anything, and then slowly deploy it to more places while watching for unexpected errors.
CrowdStrike shotgunned it to every system at once without monitoring, with grossly inadequate testing, and entirely bypassed any user-configurable setting to avoid or opt out of the update.
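The phased deployment described above can be sketched roughly like this, assuming hypothetical `deploy` and `is_healthy` hooks and arbitrary ring sizes of 1%, 10%, 50%, then 100% of hosts. It's a generic illustration of ring-based rollout, not CrowdStrike's or any customer's actual pipeline, and per the comment, it only helps if the vendor's update actually goes through it.

```python
from typing import Callable, List, Sequence, Set


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    ring_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
) -> List[str]:
    """Deploy to progressively larger rings of hosts, halting if a ring goes unhealthy.

    Returns the hosts that received the update, in deployment order.
    """
    deployed: List[str] = []
    for fraction in ring_fractions:
        # Cumulative ring boundary: at least one host, never more than all of them.
        target = max(1, min(len(hosts), round(len(hosts) * fraction)))
        ring = list(hosts[len(deployed):target])
        for host in ring:
            deploy(host)
            deployed.append(host)
        # Check the newly updated ring before widening the blast radius.
        if not all(is_healthy(host) for host in ring):
            raise RuntimeError(f"halted after {len(deployed)} of {len(hosts)} hosts")
    return deployed


# Usage: simulate an update that breaks every host it touches.
hosts = [f"host{i:03d}" for i in range(100)]
updated: Set[str] = set()


def record_deploy(host: str) -> None:
    updated.add(host)


def always_crashes(host: str) -> bool:
    return False  # every updated host fails its health check


try:
    staged_rollout(hosts, record_deploy, always_crashes)
except RuntimeError as exc:
    print(exc)  # prints: halted after 1 of 100 hosts
print(len(updated))  # prints: 1
```

With a 1% first ring, a bad update stops at one host out of 100 instead of all of them.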

I was much more willing to put the blame on the organizations that had the outages for failing to follow best practices before I learned that the way the update was pushed would have entirely bypassed any of those safeguards.

It's unreasonable to say that an organization needs to run multiple copies of every service with different fundamental infrastructure choices for each in case one magics itself broken.

[–] [email protected] 6 points 3 months ago

CrowdStrike also bypassed Microsoft's driver signing as part of their update process, just to make the updates release faster.

That MS is getting any flak for this is just shit journalism.