saint

joined 2 years ago
 

Good slides on how to reduce risks

[–] [email protected] 1 points 3 days ago

It would be interesting to read what actually happened there and how it was handled, but it is probably not worth expecting a Cloudflare-level post-mortem analysis.

[–] [email protected] 1 points 3 days ago

What about it? ;)

 

Highlights

In analyzing 138 actively exploited vulnerabilities in 2023, Google Mandiant reported Oct. 15 that 70% of them were zero-days, indicating that threat actors are getting much better at identifying vulnerabilities in software.

It’s a worrying trend in and of itself, but what caused even more concern among security analysts was that Google Mandiant also found that the time-to-exploit (TTE) — the time it takes threat actors to exploit a flaw — was down to a mere five days in 2023 compared with 63 days in 2018-19 and 32 days in 2021-22.

 

Will be interesting to see how it works out

The Indian nonprofit People+ai wants to fix this by creating an open and interoperable marketplace of cloud providers of all sizes. The Open Cloud Compute (OCC) project plans to use open protocols and standards to allow cloud providers of all sizes to offer their services on the network. It also plans to make it easy for customers to shift between offerings depending on their needs. People+ai held a hackathon on 20 September at People’s Education Society University (PES University) in Bengaluru to test out an early prototype of the platform.

 

Highlights

Failure is an expected state in production systems, and no predictable failure of either software or hardware components should result in a negative experience for users. The exact failure mode may vary, but certain remediation steps must be taken after detection. A common example is when an error occurs on a server, rendering it unfit for production workloads, and requiring action to recover.

It can be tempting to rely on the expertise of world-class engineers to remediate these faults, but this would be manual, repetitive, unlikely to produce enduring value, and unable to scale.

The commonality of lower-priority failures makes it obvious when the response required, as defined in runbooks, is “toilsome”. To reduce this toil, we had previously implemented a plethora of solutions to automate runbook actions such as manually-invoked shell scripts, cron jobs, and ad-hoc software services. These had grown organically over time and provided solutions on a case-by-case basis, which led to duplication of work, tight coupling, and lack of context awareness across the solutions.

A good solution would not allow only the SRE team to auto-remediate, it would empower the entire company. The key to adding self-healing capability was a generic interface for all teams to self-service and quickly remediate failures at various levels: machine, service, network, or dependencies.
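One way to picture the "generic interface" described above is a small registry where any team registers a remediation handler per failure level. This is only an illustrative sketch: `Level`, `register`, `remediate`, and the sample handler are hypothetical names, not taken from the original article.

```python
from enum import Enum
from typing import Callable, Dict


class Level(Enum):
    MACHINE = "machine"
    SERVICE = "service"
    NETWORK = "network"
    DEPENDENCY = "dependency"


# Hypothetical registry: failure level -> remediation handler.
_HANDLERS: Dict[Level, Callable[[str], str]] = {}


def register(level: Level):
    """Decorator that lets any team self-register a handler for a level."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        _HANDLERS[level] = fn
        return fn
    return wrap


def remediate(level: Level, target: str) -> str:
    """Dispatch to the registered handler, or escalate to a human."""
    handler = _HANDLERS.get(level)
    if handler is None:
        return f"escalate: no handler for {level.value} failure on {target}"
    return handler(target)


@register(Level.MACHINE)
def reboot_machine(target: str) -> str:
    # Placeholder action; a real handler would call out to actual tooling.
    return f"rebooting {target}"
```

Calling `remediate(Level.MACHINE, "host-1")` runs the registered handler, while a level with no handler falls through to escalation instead of failing silently.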

Temporal is a durable execution platform which is useful to gracefully manage infrastructure failures such as network outages and transient failures in external service endpoints. This capability meant we only needed to build a way to schedule “workflow” tasks and have Temporal provide reliability guarantees.
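Temporal's own SDK is not shown here. As a rough, plain-Python illustration of the kind of reliability guarantee being delegated, the sketch below retries a step on transient failures with exponential backoff. `run_with_retries`, `TransientError`, and the parameters are hypothetical names; real durable execution also persists workflow state across crashes, which this sketch does not attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a network blip)."""


def run_with_retries(step: Callable[[], T],
                     max_attempts: int = 5,
                     base_delay: float = 0.1,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `step`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            # Delay doubles after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.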

After a workflow is validated in the staging environment, we can then do a full release to production. It seems obvious, but catching simple configuration errors before releasing has saved us many hours in development/change-related-task time.

Building a system that is maintained by several SRE teams has allowed us to iterate faster, and rapidly tackle long-standing problems. We have set ambitious goals regarding toil elimination and are on course to achieve them, which will allow us to scale faster by eliminating the human bottleneck.

 

Resurfaced in my feed. Obvious in retrospect.

2024 Conference (www.remoteworkconference.org)
 

Some interesting research

 

We, humanz, are very good at creating SPOFs

 

Researcher Christina Bodin Danielsson calls open office landscapes a “sea of slaves.”

^^ more like tin can :)

 

Highlights

Iran’s multifaceted approach in the cyber domain allows Iran to project power and influence in the Middle East while avoiding direct conventional military confrontations with stronger adversaries. Iran uses cyber operations to complement its broader geopolitical strategies, often employing cyber espionage and sabotage to gain strategic advantages or to retaliate against sanctions and military threats. As Iran increasingly incorporates AI technologies into its cyber operations, the likelihood of more disruptive and damaging activities escalates, presenting a substantial challenge not only to regional stability but also to global security.

Maj. Gen. Qassem Soleimani’s death marked a significant turning point in Iran’s cyber strategy, pushing Tehran to assert its power and influence through increased cyber activities aimed at the U.S. and its allies.

Cyber proxy groups use various tactics to create negative psychological effects among adversaries. APTs such as Mint Sandstorm use precise targeting to create unease among a specific group of people. Iran also uses “faketivists,” which are groups that commit cyberattacks for a specific cause, like hacktivists, but are borne from a specific geopolitical event and are created by a nation-state to perpetuate narratives that support their cause. Faketivists can be nation-state actors and/or proxy groups associated with the IRGC and the Ministry of Intelligence and Security (MOIS). The cyberattacks in Israel that have deployed faketivists have had mixed success, but they have garnered both local and global support. The purpose of these groups is to spread their “success” and to create disruption and attention, regardless of actual operational success.

Looking ahead, we can expect Iran to further integrate AI into its cyber strategy, escalating the frequency and sophistication of attacks, particularly on critical infrastructure and democratic processes. Additionally, the growing alignment between Iran and other global cyber powers, such as Russia and China, further increases the sophistication and reach of its cyber capabilities, presenting significant challenges for those attempting to counter these evolving threats.

[–] [email protected] 1 points 1 month ago

Not anymore. Nowadays I feel guilty reading non-fiction, and I understand the Lindy effect on books much better (be it fiction or non-fiction).

[–] [email protected] 26 points 1 month ago

They cut all such scenes and pasted them into The Boys, Mark Twain style: “Sprinkle these around as you see fit!”

 

A fellow Matrix user has reported that matrix.group.lt has stopped showing YouTube URL previews and suggested that, according to GitHub, the issue lies with the Synapse server software itself. So now I have the following in vars.yaml:

matrix_synapse_configuration_extension_yaml: |
  oembed:
    disable_default_providers: true
    additional_providers:
      - /oembedproviders.json
  # other custom config blocks for homeserver.yaml go here

matrix_synapse_container_additional_volumes:
  - {"src": "/matrix/oembedproviders.json", "dst": "/oembedproviders.json", "options": "ro"}

You need to upload oembedproviders.json to the server first (to /matrix/oembedproviders.json, the src path of the volume), then regenerate the Synapse configuration and restart it:

ansible-playbook -i inventory/hosts setup.yml --tags=setup-synapse,restart-all
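Before uploading, it can help to check that the providers file is well-formed. The sketch below assumes the file follows the layout of https://oembed.com/providers.json (a JSON list of providers, each with an `endpoints` list whose entries carry a `url`), which is, as far as I can tell, the format Synapse expects for `additional_providers`. The function name and the sample entry are illustrative only and are not the contents of the file actually used here.

```python
import json
from typing import List


def check_providers(text: str) -> List[str]:
    """Return a list of problems found in an oEmbed providers JSON document."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top level must be a list of providers"]
    problems: List[str] = []
    for i, provider in enumerate(data):
        if not isinstance(provider, dict):
            problems.append(f"provider {i}: must be an object")
            continue
        endpoints = provider.get("endpoints")
        if not isinstance(endpoints, list) or not endpoints:
            problems.append(f"provider {i}: missing 'endpoints' list")
            continue
        for j, endpoint in enumerate(endpoints):
            if not isinstance(endpoint, dict) or "url" not in endpoint:
                problems.append(f"provider {i}, endpoint {j}: missing 'url'")
    return problems


# Illustrative sample only; example.com is a placeholder provider.
SAMPLE = json.dumps([{
    "provider_name": "Example",
    "provider_url": "https://example.com/",
    "endpoints": [{
        "schemes": ["https://example.com/watch*"],
        "url": "https://example.com/oembed",
    }],
}])
```

An empty result from `check_providers` means the basic structure looks right; it does not prove that Synapse will accept every field.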

Thank you, Citizen Laszlo for not ignoring the issue.

 

Many microbes and cells are in deep sleep, waiting for the right moment to activate.

Harsh conditions like lack of food or cold weather can appear out of nowhere. In these dire straits, rather than keel over and die, many organisms have mastered the art of dormancy. They slow down their activity and metabolism. Then, w

Sitting around in a dormant state is actually the norm for the majority of life on Earth: By some estimates, 60% of all microbial cells are hibernating at any given time. Even in organisms whose entire bodies do not go dormant, like most mammals, some cellular populations within them rest and wait for the best time to activate.

“Life is mainly about being asleep.”

Because dormancy can be triggered by a variety of conditions, including starvation and drought, the scientists pursue this research with a practical goal in mind: “We can probably use this knowledge in order to engineer organisms that can tolerate warmer climates,” Melnikov said, “and therefore withstand climate change.”

Balon is notably absent from Escherichia coli and Staphylococcus aureus, the two most commonly studied bacteria and the most widely used models for cellular dormancy. By focusing on just a few lab organisms, scientists had missed a widespread hibernation tactic, Helena-Bueno said. “I tried to look into an under-studied corner of nature and happened to find something.”

“Most microbes are starving,” said Ashley Shade, a microbiologist at the University of Lyon who was not involved in the new study. “They’re existing in a state of want. They’re not doubling. They’re not living their best life.”

“This is not something that’s unique to bacteria or archaea,” Lennon said. “Every organism in the tree of life has a way of achieving this strategy. They can pause their metabolism.”

“Before the invention of hibernation, the only way to live was to keep growing without interruptions,” Melnikov said. “Putting life on pause is a luxury.”

It’s also a type of population-level insurance. Some cells pursue dormancy by detecting environmental changes and responding accordingly. However, many bacteria use a stochastic strategy. “In randomly fluctuating environments, if you don’t go into dormancy sometimes, there’s a chance that the whole population will go extinct” through random encounters with disaster, Lennon said. In even the healthiest, happiest, fastest-growing cultures of E. coli, between 5% and 10% of the cells will nevertheless be dormant. They are the designated survivors who will live should something happen to their more active, vulnerable cousins.
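The insurance argument can be illustrated with a toy model. This is an assumption-laden simplification, not the model used by the researchers: active cells double each step, a disaster kills every active cell, dormant cells neither grow nor die, and the population re-splits into active and dormant fractions at each step. `simulate` and its parameters are hypothetical.

```python
from typing import Sequence


def simulate(dormant_fraction: float,
             disasters: Sequence[bool],
             start: float = 1000.0) -> float:
    """Return the final population size under a toy bet-hedging model."""
    population = start
    for disaster in disasters:
        dormant = population * dormant_fraction
        active = population - dormant
        # A disaster wipes out active cells; otherwise they double.
        active = 0.0 if disaster else active * 2
        population = active + dormant
    return population


# One disaster in the middle of otherwise good times.
history = [False, False, True, False, False]
print(simulate(0.0, history))  # no dormancy
print(simulate(0.1, history))  # 10% dormant
```

With no dormancy the population grows fastest in good times but is wiped out by a single disaster, while the partly dormant population pays a growth cost and survives.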

More fundamentally, Melnikov and Helena-Bueno hope that the discovery of Balon and its ubiquity will help people reframe what is important in life. We all frequently go dormant, and many of us quite enjoy it. “We spend one-third of our life asleep, but we don’t talk about it at all,” Melnikov said. Instead of complaining about what we’re missing when we’re asleep, maybe we can experience it as a process that connects us to all life on Earth, including microbes sleeping deep in the Arctic permafrost.

[–] [email protected] 2 points 1 month ago

I liked the book as well. The show had some similar feeling in some ways, but also had a distinct character for itself.

[–] [email protected] 1 points 5 months ago

Reread it again today, with some highlights:

Lessons Learned from Twenty Years of Site Reliability Engineering

Highlights

The riskiness of a mitigation should scale with the severity of the outage

We, here in SRE, have had some interesting experiences in choosing a mitigation with more risks than the outage it's meant to resolve.

We learned the hard way that during an incident, we should monitor and evaluate the severity of the situation and choose a mitigation path whose riskiness is appropriate for that severity.
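A minimal sketch of that rule, under stated assumptions: severity, risk, and effect share a hypothetical 1 to 5 scale, and the candidate mitigations and their scores are invented for illustration. It picks the strongest option whose risk does not exceed the severity of the outage.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mitigation:
    name: str
    risk: int    # hypothetical scale: 1 (safe) to 5 (dangerous)
    effect: int  # hypothetical scale: 1 (weak) to 5 (strong)


def choose(severity: int,
           options: Sequence[Mitigation]) -> Optional[Mitigation]:
    """Pick the strongest mitigation whose risk does not exceed the severity."""
    allowed = [m for m in options if m.risk <= severity]
    return max(allowed, key=lambda m: m.effect, default=None)


# Invented examples, not taken from the article.
OPTIONS = [
    Mitigation("restart one replica", risk=1, effect=2),
    Mitigation("roll back last release", risk=2, effect=4),
    Mitigation("fail over whole region", risk=5, effect=5),
]
```

For a severity-2 incident this selects the rollback and never considers the regional failover, which only becomes eligible at severity 5.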

Recovery mechanisms should be fully tested before an emergency

An emergency fire evacuation in a tall city building is a terrible opportunity to use a ladder for the first time.

Testing recovery mechanisms has a fun side effect of reducing the risk of performing some of these actions. Since this messy outage, we've doubled down on testing.

We were pretty sure that it would not lead to anything bad. But pretty sure is not 100% sure.

A "Big Red Button" is a unique but highly practical safety feature: it should kick off a simple, easy-to-trigger action that reverts whatever triggered the undesirable state to (ideally) shut down whatever's happening.

Unit tests alone are not enough - integration testing is also needed

This lesson was learned during a Calendar outage in which our testing didn't follow the same path as real use, resulting in plenty of testing... that didn't help us assess how a change would perform in reality.

Teams were expecting to be able to use Google Hangouts and Google Meet to manage the incident. But when 350M users were logged out of their devices and services... relying on these Google services was, in retrospect, kind of a bad call.

It's easy to think of availability as either "fully up" or "fully down" ... but being able to offer a continuous minimum functionality with a degraded performance mode helps to offer a more consistent user experience.
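A degraded mode often comes down to a fallback path. The sketch below is illustrative only: `get_with_fallback`, `fetch_live`, and the cache are hypothetical stand-ins for a real backend call and a real cache.

```python
from typing import Callable, Dict, Tuple


def get_with_fallback(key: str,
                      fetch_live: Callable[[str], str],
                      cache: Dict[str, str]) -> Tuple[str, bool]:
    """Return (value, degraded); serve stale cache if the live call fails."""
    try:
        value = fetch_live(key)
        cache[key] = value  # refresh the cache on success
        return value, False
    except Exception:
        if key in cache:
            return cache[key], True  # degraded: stale but usable
        raise  # nothing to fall back to
```

The boolean flag lets the caller tell users that they are seeing possibly stale data, instead of showing an error page.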

This next lesson is a recommendation to ensure that your last-line-of-defense system works as expected in extreme scenarios, such as natural disasters or cyber attacks, that result in loss of productivity or service availability.

A useful activity can also be sitting your team down and working through how some of these scenarios could theoretically play out—tabletop game style. This can also be a fun opportunity to explore those terrifying "What Ifs", for example, "What if part of your network connectivity gets shut down unexpectedly?".

In such instances, you can reduce your mean time to resolution (MTTR), by automating mitigating measures done by hand. If there's a clear signal that a particular failure is occurring, then why can't that mitigation be kicked off in an automated way? Sometimes it is better to use an automated mitigation first and save the root-causing for after user impact has been avoided.

Having long delays between rollouts, especially in complex, multiple component systems, makes it extremely difficult to reason out the safety of a particular change. Frequent rollouts—with the proper testing in place— lead to fewer surprises from this class of failure.

Having only one particular model of device to perform a critical function can make for simpler operations and maintenance. However, it means that if that model turns out to have a problem, that critical function is no longer being performed.

Latent bugs in critical infrastructure can lurk undetected until a seemingly innocuous event triggers them. Maintaining a diverse infrastructure, while incurring costs of its own, can mean the difference between a troublesome outage and a total one.

[–] [email protected] 2 points 6 months ago

not a bug, but a feature :))

[–] [email protected] -2 points 7 months ago

the source code of a game ;))

[–] [email protected] 3 points 8 months ago (1 children)

Thank you! It actually seems to be The Sliced-Crosswise Only-On-Tuesday World (https://en.m.wikipedia.org/wiki/The_Sliced-Crosswise_Only-On-Tuesday_World), which inspired Dayworld :)

[–] [email protected] 2 points 8 months ago

looks interesting, but not this one.
