Three cloud management steps to keep your data safe

Mondo Technology Updated on 2024-01-29

Even some of the most obvious precautions have not received sufficient attention and priority in a time when engineering organizations are already overburdened.

Translated from "3 Cloud Governance Steps to Avoid the Next Hack" by Cindy Blake. Blake is VP of Marketing at Firefly, a seed-funded startup that aims to solve many of the challenges of using and managing multi-cloud infrastructure in a DevOps environment. She is adept at leveraging AI and focusing marketing investment and effort.

Cybercrime and hacking are always distressing, but they are sadder still when there is nothing new about them and they succeed simply because IT governance and security hygiene keep being put off.

While writing the book "10 Steps Every CISO Should Take to Secure Next-Generation Software," I learned that even some of the most obvious precautions aren't given enough attention and priority in an already overloaded engineering organization. Prioritization is often hampered by a lack of understanding across functions and roles: security teams don't understand how changes in the way cloud applications are developed, deployed, and maintained affect risk, and DevOps teams don't understand how their own actions introduce or create additional risk. Although it has been several years since the book was published, these dysfunctions still exist.

The MGM hack has taught us that some well-known tactics, such as social engineering as a means of gaining privileged access, are still the ones that work. As long as they succeed, they will regrettably continue to pave the way for malicious actors to reap significant gains.

With the benefit of hindsight, it's easy to say, "You should have known better." Easy as that is, there is always value in re-emphasizing some best practices, especially when it comes to IT hygiene and governance. Let's refresh the basics and revisit some of the more obscure to-dos that will hopefully help prevent the next hack.

We've come a long way when it comes to cloud governance, GitOps, and cloud security. Today, with a combination of automation, policy as code, and improved visibility, you can get accurate information in real time that helps you detect and remediate potential risks. As your attack surface grows and evolves, minimizing risk with simple guardrails can be the difference between a quick recovery and long, costly downtime. Below, we share some best practices that every IT, DevOps, SRE, and security engineer should adopt right now to bring the cloud in line with "as code" practices for better security and reliability.

Let's start with immutability. The concept is not new, and it has become a standard best practice through tools like Terraform and infrastructure as code, which have built immutability into our systems. Immutability provides the assurance that configuration cannot be changed without oversight or by a single entity, whether through external malice or internal ignorance.

This has been built into DevOps as a standard, primarily to prevent production incidents and downtime. As a byproduct, it also improves security: no one can break into your cloud-based system and make undetected changes, and no beginner can accidentally delete a production environment without a recovery path.

Taking this further means codifying guardrails for cost, reliability, and security as policies, and then automating their enforcement.
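As a rough illustration of guardrails expressed as policies, the Python sketch below evaluates a small inventory against three rules, one each for security, ownership, and cost. The resource shape, the policy names, and the `evaluate` helper are all assumptions made for this example; real policy engines use their own formats and languages.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical resource shape: plain dicts, as a scanner or plan export might emit.
Resource = dict


@dataclass(frozen=True)
class Policy:
    name: str
    applies_to: str                     # resource type the rule covers
    check: Callable[[Resource], bool]   # returns True when the resource complies


# Illustrative guardrails only; the names and thresholds are invented.
POLICIES = [
    Policy("storage-encrypted", "bucket", lambda r: r.get("encrypted") is True),
    Policy("owner-tag-present", "instance", lambda r: "owner" in r.get("tags", {})),
    Policy("instance-size-capped", "instance",
           lambda r: r.get("size") in {"small", "medium"}),
]


def evaluate(resources, policies=POLICIES):
    """Return a (resource id, policy name) pair for every violation found."""
    violations = []
    for res in resources:
        for pol in policies:
            if pol.applies_to == res["type"] and not pol.check(res):
                violations.append((res["id"], pol.name))
    return violations


inventory = [
    {"id": "logs", "type": "bucket", "encrypted": False},
    {"id": "web-1", "type": "instance", "size": "xlarge", "tags": {"owner": "sre"}},
]
print(evaluate(inventory))
# [('logs', 'storage-encrypted'), ('web-1', 'instance-size-capped')]
```

Because the rules are data, the same check can run in a pipeline before deployment and on a schedule against live resources, which is what makes the governance automatic rather than a manual review.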

Another painful lesson learned: we all know that if it isn't automated, it won't happen. Patching is a good example.

For years, the annual threat report from Fortify (my previous employer) cited failed patching as the single biggest threat. In its 2023 annual report, M-Trends 2023, Mandiant explains why patching and vulnerabilities still lead to global incidents that exploit common vulnerabilities: "While system administrators need time to test and validate patches, threat actors only need a basic proof-of-concept (PoC) exploit to start targeting those organizations."

At a security conference in 2018, I predicted that with cloud adoption and the use of DevOps tools, misconfigurations would come to go hand in hand with the threat of patch failures.

I think we've reached that point! In M-Trends 2023, Mandiant states: "Multi-layered identity management and application deployment brings new verticals that must be secured for customer environments." The report also says: "As the implementation and design phases of a cloud service migration encounter the realities of business operations, configuration errors are not uncommon. Organizations should consider testing their cloud architecture deployments to improve resiliency against agile, motivated adversaries."

Policies are what enable you to automate the good practices and controls that suit the standardized, repetitive nature of cloud systems. Even if something does change, you can detect cloud drift and policy violations continuously and in real time, and deal with them immediately and decisively.

While software bills of materials (SBOMs) and securing your software supply chain have been the latest buzzwords, few have really focused on the equally important infrastructure bill of materials (IBOM). Many people know that applications contain dependencies: modules for specific sub-functions, often written by third parties and/or drawn from open source.

Similarly, the cloud infrastructure that cloud-native applications rely on is made up of dependencies—sub-functions that define and configure specific resources used by a particular environment. As an example, consider an EC2 instance whose dependencies might include network interfaces and EBS volumes. Dependencies can extend several layers.
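To make the idea of layered dependencies concrete, here is a small Python sketch that walks a dependency map and lists everything an EC2 instance ultimately relies on. The resource identifiers and the `DEPENDS_ON` map are invented for illustration; a real inventory would be built from your cloud provider's APIs or your infrastructure-as-code state.

```python
# Hypothetical dependency map: resource -> resources it directly depends on.
DEPENDS_ON = {
    "ec2:web-1": ["eni:web-1-primary", "ebs:web-1-root"],
    "eni:web-1-primary": ["subnet:public-a", "sg:web"],
    "subnet:public-a": ["vpc:main"],
    "sg:web": ["vpc:main"],
    "ebs:web-1-root": [],
    "vpc:main": [],
}


def bill_of_materials(root, graph):
    """Return every resource reachable from root, depth first, listed once each."""
    seen = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared dependencies (such as the VPC) appear only once
        seen.append(node)
        # Reverse so that children are visited in the order they are listed.
        stack.extend(reversed(graph.get(node, [])))
    return seen


for resource in bill_of_materials("ec2:web-1", DEPENDS_ON):
    print(resource)
```

Even this toy instance pulls in five other resources across three layers, which is why an inventory limited to top-level assets tends to understate what actually needs to be governed.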

Now consider that I might use a Terraform module to manage it. This image depicts the actual relationship between cloud resources.

If a developer changes the HashiCorp Terraform state, or a cloud engineer changes an element within the structure of a cloud resource, we now have a disconnect between what we think is configured (Terraform) and what is actually configured (the cloud resources).

This is known as configuration drift, and it is very important to manage. In the 2023 Infrastructure as Code report, we found that most people identify this drift manually, and that it can take weeks to resolve. Returning to the point that misconfigurations go hand in hand with patching failures, this is a bit like leaving a system unpatched and vulnerable for weeks.
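The comparison at the heart of drift detection can be sketched in a few lines of Python. This is a simplified illustration, assuming both the declared configuration and the live configuration have already been loaded into plain dictionaries; the `detect_drift` function and the sample data are hypothetical and do not represent how any particular tool works.

```python
def detect_drift(declared, actual):
    """Compare two {resource_id: {attribute: value}} maps and report differences."""
    drift = {"unmanaged": [], "missing": [], "changed": {}}
    # Present in the cloud but absent from code.
    for rid in sorted(actual.keys() - declared.keys()):
        drift["unmanaged"].append(rid)
    # Declared in code but no longer present in the cloud.
    for rid in sorted(declared.keys() - actual.keys()):
        drift["missing"].append(rid)
    # Present on both sides: compare attribute by attribute.
    for rid in sorted(declared.keys() & actual.keys()):
        want, have = declared[rid], actual[rid]
        diffs = {
            key: (want.get(key), have.get(key))
            for key in sorted(want.keys() | have.keys())
            if want.get(key) != have.get(key)
        }
        if diffs:
            drift["changed"][rid] = diffs
    return drift


# Invented sample data: what the code says versus what is running.
declared = {
    "sg:web": {"ingress_port": 443, "cidr": "10.0.0.0/16"},
    "ebs:data": {"size_gb": 100},
}
actual = {
    "sg:web": {"ingress_port": 443, "cidr": "0.0.0.0/0"},
    "ec2:debug": {"size": "small"},
}
print(detect_drift(declared, actual))
```

Run continuously, a check like this turns weeks of manual discovery into an alert: here it would flag a security group that was widened by hand, a volume that disappeared, and an instance nobody declared.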

A comprehensive inventory and health check of your cloud infrastructure stack is the backbone of a good audit.

Emerging organizations that were born in the cloud can learn from those with legacy on-premises IT systems: you need to start by understanding the assets you own, and then you can gain visibility into change history, version control, and management.

For on-premises IT systems, this practice uses a CMDB (Configuration Management Database) tool such as ServiceNow to catalog your assets and IT asset management to manage changes to them.

These tools, developed back when you had to swipe your employee card to enter the data center and change hardware configurations, often struggle to maintain an accurate account of your ever-changing cloud assets.

When you codify all your cloud resources and automatically detect drift, you can apply versioning and history management to your infrastructure just as you would to an app. You can track when assets were changed, where they were changed, and by whom, and then roll them back to a previous version if necessary. Beyond letting you see changes to cloud assets managed as code and roll them back like bad commits, this provides the additional and potentially more important benefit of disaster recovery.
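The following Python sketch shows the versioning idea in miniature: every change is recorded with its author and time, and a rollback is itself a new revision so the audit trail stays intact. In practice a Git repository plays this role; the `ConfigHistory` class and the sample setting are assumptions made purely for illustration.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    timestamp: datetime
    config: dict


class ConfigHistory:
    """Append-only history for one resource's configuration."""

    def __init__(self):
        self._revisions = []

    def commit(self, author, config):
        """Record a new revision, copying the config so later edits cannot alter it."""
        rev = Revision(len(self._revisions) + 1, author,
                       datetime.now(timezone.utc), copy.deepcopy(config))
        self._revisions.append(rev)
        return rev

    def current(self):
        return copy.deepcopy(self._revisions[-1].config)

    def rollback(self, number, author):
        """Restore an earlier revision by committing it again, keeping the history."""
        if not 1 <= number <= len(self._revisions):
            raise ValueError(f"no such revision: {number}")
        return self.commit(author, self._revisions[number - 1].config)

    def log(self):
        return [(rev.number, rev.author) for rev in self._revisions]


history = ConfigHistory()
history.commit("alice", {"mfa_required": True})
history.commit("mallory", {"mfa_required": False})   # unwanted change
history.rollback(1, "alice")                          # restore the known-good state
print(history.current())   # {'mfa_required': True}
print(history.log())       # [(1, 'alice'), (2, 'mallory'), (3, 'alice')]
```

Recording the rollback as revision 3 rather than deleting revision 2 preserves the evidence of who changed what, which matters as much for incident review as the recovery itself.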

The MGM resort chain suffered a $100 million ransomware attack, and one of the toughest decisions it had to make in order to recover was to delete critical assets that were not backed up. One of the applications exploited by the hackers was Okta, which ultimately allowed them to gain access to the company's servers and launch a very painful denial-of-service attack on a number of business-critical applications.

We've long talked about the need to codify business-critical cloud assets. The same need applies to all SaaS application configurations, including identity management services like Okta.

If the Okta configuration had also been managed as code and the same versioning practices applied to this critical single sign-on service, the mean time to recovery for this breach would have been greatly reduced, and business disruption and loss would have been minimized. The same is true if an IT administrator accidentally deletes an important system configuration, or if data is corrupted or even held for ransom (as happened to Caesars Resort a week before the MGM attack). This is the inherent benefit of managing all your cloud assets as code, also known as everything as code, on top of the added benefits of automation, consistent deployment, and auditability.

Okta isn't the only SaaS application you should manage this way. All SaaS applications, from monitoring and application performance management (APM) tools to identity and access management (IAM) tools and content delivery networks (CDNs), should be part of a strategy that manages their configuration as code and so reaps the benefits of disaster recovery.

There are a number of tools (including Firefly) that scan your cloud, find those resources, and automatically import them into your infrastructure-as-code tool of choice, such as Terraform, Pulumi, or CDK. This can serve as a quick backup service for important applications like Cloudflare, Datadog, and your Git repository. If your software repository is corrupted or held for ransom, how long will it take you to recover without these backups?

A typical cloud infrastructure also includes many other security settings that need to be codified and governed. For example, consider security groups. Security groups act as firewalls that control the traffic allowed into and out of resources in your virtual private cloud (VPC). You can choose the ports and protocols that allow inbound and outbound traffic. Several cloud services rely on security groups, including:

Amazon EC2 instances

AWS Lambda

AWS Elastic Load Balancing

Containers and Kubernetes services (ECS and EKS)
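As one example of governing security groups as code, the Python sketch below flags ingress rules that expose sensitive ports to any source address. The rule format is a simplified stand-in rather than the structure returned by the AWS API, and the choice of sensitive ports is an assumption made for illustration.

```python
import ipaddress

# SSH and RDP; an illustrative choice, not an exhaustive list.
SENSITIVE_PORTS = {22, 3389}


def open_to_world(cidr):
    """True when the CIDR block covers every address (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def risky_ingress(group):
    """Return ingress rules that expose a sensitive port to any source address."""
    findings = []
    for rule in group["ingress"]:
        exposed = [port for port in sorted(SENSITIVE_PORTS)
                   if rule["from_port"] <= port <= rule["to_port"]]
        if exposed and open_to_world(rule["cidr"]):
            findings.append({"group": group["name"], "ports": exposed,
                             "cidr": rule["cidr"]})
    return findings


# Hypothetical security group in a simplified format.
web_group = {
    "name": "web",
    "ingress": [
        {"from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},  # public HTTPS: fine
        {"from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},    # SSH from anywhere
        {"from_port": 22, "to_port": 22, "cidr": "10.0.0.0/8"},   # SSH from inside only
    ],
}
print(risky_ingress(web_group))
# [{'group': 'web', 'ports': [22], 'cidr': '0.0.0.0/0'}]
```

A check like this is most useful when it runs both on proposed changes and on the live configuration, so that a rule widened by hand is caught as quickly as one introduced through code.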

If the security group settings change, you can imagine the possible consequences. Capturing this important resource as infrastructure as code, and then managing its changes and policy alignment, is an important aspect of securing your organization.

Cloud governance isn't uncharted territory. Although it does require some domain expertise, many of these practices are now well established and widely understood.

If you don't follow simple, recommended best practices for IT and cloud environments, you're jeopardizing sensitive customer information and business-critical systems.

As cloud utilization grows, it's imperative to better cover the fundamentals of IT governance and security for our cloud infrastructure. Let's close the old gaps so that our engineers can focus on more novel and emerging threats.
