Migrating from Public Cloud Object Storage Back to Private Infrastructure: Everything You Need to Know

Mondo Finance Updated on 2024-02-01

The response to our last article, "How to Repatriate from AWS S3 to MinIO," has been fantastic: we've received dozens of requests from organizations asking for repatriation advice. We've collected our responses in this article, which takes a deeper look at the costs and savings associated with repatriation so that it's easier to do your own analysis. For many, data migration is a daunting task. In practice, most organizations aim to send new data to MinIO from day one, then migrate old data from the cloud at their leisure or leave it in place without letting it grow.

To repatriate data from AWS S3, you will follow these general steps:

Review data requirements: Identify the specific buckets and objects that need to be repatriated from AWS S3. Make sure you understand your business and compliance requirements on a bucket-by-bucket basis.

Determine your repatriation destination: Now that you've decided to repatriate to MinIO, you have the option of running MinIO in an on-premises data center, in another cloud, or in a colocation facility. Using the requirements from the previous step, choose hardware or instances that meet your storage, throughput, and availability needs.

Data transfer: Plan and execute the data transfer from AWS S3 to MinIO. You can simply use MinIO's built-in batch replication or use the MinIO Client to mirror buckets (see How to Repatriate from AWS S3 to MinIO for more information). There are several other methods available, such as AWS DataSync, AWS Snowball, TD Synnex data migration services, or the AWS APIs directly.

Data access and permissions: Ensure that appropriate access controls and permissions are set for each repatriated bucket. This includes IAM and bucket policies that manage user access, authentication, and authorization to keep your data secure.

Object Lock: It's critical that Object Lock retention and legal hold policies survive the migration. The target object store must interpret these rules the same way Amazon S3 does. If you are unsure, request a Cohasset Associates compliance assessment of the target object storage implementation.

Data lifecycle management: Define and enforce data lifecycle management policies for repatriated data. This includes defining retention policies, backup and recovery procedures, and per-bucket data archiving practices.

Data validation: Verify the transferred data to ensure its integrity and completeness. Perform the necessary checks and tests to confirm that the data arrived without corruption or loss. After the transfer, match object names, ETags, metadata, checksums, and object counts between source and destination.

Update applications and workflows: The good news is that if you built your applications following cloud-native principles, all you have to do is point them at the new MinIO endpoints. However, if your applications and workflows were designed around the AWS ecosystem, make the updates needed to accommodate the repatriated data. This may involve updating configurations, reconfiguring integrations, or, in some cases, modifying code.

Monitoring and optimization: Continuously monitor and optimize the repatriated data environment to ensure optimal performance, cost-effectiveness, and adherence to data management best practices.

There are many factors to consider when budgeting and planning for cloud repatriation. Luckily, our engineers have worked with many customers, and we have made a detailed plan for you. Our customers have repatriated everything from a handful of workloads to hundreds of petabytes.

The biggest planning task is weighing your options around networking, leased bandwidth, server hardware, archiving costs for data not selected for repatriation, and the labor costs of managing and maintaining your own cloud infrastructure. Estimate these costs and plan for them. Cloud repatriation costs will include data egress charges for moving data from the cloud back to the data center. These fees are deliberately set high enough to encourage lock-in. Keep those high egress fees in mind: they confirm the economic argument for leaving the public cloud, because egress fees only grow as the amount of data you manage grows. If you are going to repatriate, it pays to act as early as possible.

We'll focus on the data and metadata that have to be moved – that's 80% of the work required for repatriation. Metadata includes bucket properties and policies: access management based on access key/secret key pairs, lifecycle management, encryption, anonymous public access, Object Lock, and versioning.

Now let's focus on the data (objects). For each namespace you want to migrate, inventory the buckets and objects to be moved. Your DevOps team probably already knows which buckets contain important current data. You can also use Amazon S3 Inventory manifests. At a high level, the inventory pass looks like this:
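
(A minimal sketch using the AWS CLI; the bucket name is a placeholder, and for very large buckets an S3 Inventory report is cheaper than a recursive listing.)

# List every bucket in the account
aws s3 ls

# Summarize object count and total size for one bucket
aws s3 ls s3://example-bucket --recursive --summarize | tail -n 2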

The next step is to list the properties of each bucket you are migrating, namespace by namespace. Note which applications store and read data in each bucket. Categorize each bucket as hot, warm, or cold tier based on usage.

In abridged form, the per-bucket survey looks like this:
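
(This sketch assumes the AWS CLI; the bucket name is a placeholder, and each call returns an error if that feature was never configured on the bucket.)

BUCKET=example-bucket

aws s3api get-bucket-versioning --bucket "$BUCKET"
aws s3api get-bucket-encryption --bucket "$BUCKET"
aws s3api get-bucket-policy --bucket "$BUCKET"
aws s3api get-bucket-tagging --bucket "$BUCKET"
aws s3api get-object-lock-configuration --bucket "$BUCKET"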

At this point, you need to make some decisions about data lifecycle management, and it deserves attention, because it's a great way to save money on AWS. Classify the objects in each bucket as hot, warm, or cold based on access frequency. One great place to save money is to transition your cold-tier buckets directly to S3 Glacier; there's no reason to pay egress fees for data you would only re-archive anyway.
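
As a sketch, a lifecycle rule that moves an entire cold bucket to Deep Archive could look like this; the bucket name is a placeholder, and you should check the rule against your retention obligations before enabling it.

# Write a lifecycle rule that transitions all objects to Deep Archive
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "cold-to-deep-archive",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [{ "Days": 0, "StorageClass": "DEEP_ARCHIVE" }]
  }]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket example-cold-bucket --lifecycle-configuration file://lifecycle.json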

Depending on how much data you want to repatriate, you have a few options for the migration. We recommend that you ingest and process new data on the new MinIO cluster from day one, while copying hot and warm data over to it over time. Of course, the time and bandwidth required to replicate objects depend on the number and size of the objects being replicated.

Here, calculating the total amount of data to be moved back from AWS S3 is very helpful. Review your inventory and add up the total size of all the buckets categorized as hot and warm.
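
A simple way to total it up, assuming your inventory produced a list of hot and warm bucket names (the names below are placeholders):

total=0
for b in app-logs user-uploads analytics-hot; do
  # --summarize prints "Total Size: <bytes>" as its last line
  size=$(aws s3 ls "s3://$b" --recursive --summarize | awk '/Total Size/ {print $3}')
  total=$((total + size))
done
echo "Total bytes to repatriate: $total"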

The data egress fee is calculated from this total. I'm using list prices, but your organization may be eligible for AWS discounts. I also assume 10 Gbps of connection bandwidth, though you may have more or less available. Finally, I assume that one-third of the S3 data will go to S3 Glacier Deep Archive instead of being repatriated.
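
A back-of-the-envelope sketch of the egress cost and transfer time; every input here is an assumption to replace with your own numbers and current AWS pricing:

TOTAL_TB=1000        # hot + warm data from your inventory (placeholder)
EGRESS_PER_GB=0.05   # assumed bulk list-price tier, USD per GB
LINK_GBPS=10         # link bandwidth
UTILIZATION=0.7      # realistic fraction of line rate

awk -v tb="$TOTAL_TB" -v rate="$EGRESS_PER_GB" -v gbps="$LINK_GBPS" -v util="$UTILIZATION" 'BEGIN {
  cost = tb * 1000 * rate                          # TB -> GB -> USD
  days = (tb * 1000 * 8) / (gbps * util * 86400)   # gigabits / usable Gbps -> days
  printf "Estimated egress: $%.0f; transfer time: %.1f days\n", cost, days
}'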

Don't forget to budget for your S3 Glacier Deep Archive usage.

For the sake of simplicity, the calculations above do not include the cost of per-object operations (about $0.40 per million requests) or of LIST requests (about $5 per million). For very large repatriation projects, we can also compress objects before sending them over the network, saving you some egress fees.

Another option is to use AWS Snowball to transfer objects. Each Snowball device holds 80TB, so we knew in advance that we would need 20 devices for this repatriation effort. The cost of each device includes 10 days of use plus 2 days of shipping; additional days are $30 per device.

AWS will charge you standard request, storage, and data transfer rates for reading from and writing to AWS services, including Amazon S3 and AWS Key Management Service (KMS). There are additional considerations for the various Amazon S3 storage classes. For S3 export jobs, data transferred from S3 to the Snow Family device is billed at standard S3 rates for operations such as LIST and GET. You will also be charged standard rates for Amazon CloudWatch Logs, Amazon CloudWatch Metrics, and Amazon CloudWatch Events.

Now we know roughly how long the transfer will take and how much it will cost to migrate this much data. Make the business decision about which method meets your needs based on the combination of time and expense.

At this point, we also know the hardware required to run MinIO on-premises or in a colocation facility. Based on the 15PB storage requirement, estimate data growth and consult our recommended hardware and configuration page to choose the best hardware for your MinIO deployment.

The first step is to recreate your S3 buckets in MinIO. You must do this regardless of how you choose to migrate objects. Although both S3 and MinIO store objects with server-side encryption, you don't have to worry about migrating encryption keys. Use MinIO KES to connect to the KMS of your choice to manage encryption keys; when you create an encrypted tenant and bucket in MinIO, a new key is generated for you automatically.
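
A sketch of recreating one bucket with mc; the alias and KES key name are placeholders, and Object Lock must be requested when the bucket is created:

mc mb --with-lock minio/bucket1                   # Object Lock is enabled at creation time
mc version enable minio/bucket1                   # match the source bucket's versioning state
mc encrypt set sse-kms my-kes-key minio/bucket1   # default SSE using a KES-managed key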

There are several options for copying objects: batch replication and mc mirror. My previous blog post, How to Repatriate from AWS S3 to MinIO, contains detailed instructions for both approaches. You can copy objects directly from S3 to your on-premises MinIO, or pull from S3 using a temporary MinIO cluster running on EC2 and then mirror to your on-premises MinIO.
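
For example, assuming s3 and minio aliases already exist (alias setup is shown later in this article), a one-bucket mirror is a single command:

# Copy bucket1 from S3 to MinIO, preserving object metadata;
# re-run (or add --watch) to pick up objects written during the migration
mc mirror --preserve s3/bucket1 minio/bucket1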

Typically, customers use tooling we've written in conjunction with AWS Snowball or TD Synnex data migration hardware and services to move large amounts of data (more than 1 petabyte).

MinIO recently partnered with Western Digital and TD Synnex to launch a Snowball alternative. Customers can arrange a window to receive Western Digital hardware and pay only for what they need during the rental period. What's more, the service is cloud-agnostic, which means businesses can use it to move data into, out of, and across clouds, all using the ubiquitous S3 protocol. Additional details about the service can be found on TD Synnex's Data Migration Services page.

You can use the S3 get-bucket* API calls to read bucket metadata, including policies and bucket properties, and then apply the equivalent settings in MinIO. When you sign up for MinIO SUBNET, our engineers will work with you to migrate the following settings from AWS S3: access management based on access key/secret key pairs, lifecycle management policies, encryption, anonymous public access, immutability, and versioning. One note about versioning: AWS version IDs are typically not preserved when migrating data, because each version ID is an internal UUID. This is rarely a problem for customers, since objects are usually addressed by name. However, if you need the AWS version IDs, we have an extension that can preserve them in MinIO, and we can help you enable it.
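
As a sketch, exporting a bucket policy and recreating it in MinIO could look like this; note that principals and resource ARNs in the document usually need editing before they make sense to MinIO:

# Dump the S3 bucket policy document to a file
aws s3api get-bucket-policy --bucket bucket1 --query Policy --output text > bucket1-policy.json

# After adapting principals and ARNs, create the policy in MinIO
mc admin policy create minio bucket1-policy bucket1-policy.json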

Pay special attention to IAM and bucket policies. S3 won't be the only part of your AWS infrastructure you're leaving behind. You will likely have a large number of service accounts that applications use to access S3 buckets. This is a good time to list and audit all of those service accounts; you can then decide whether or not to recreate them in your identity provider. If you want to automate this, Amazon Cognito can share IAM identity information with external OpenID Connect IdPs and AD/LDAP.

Pay special attention to data lifecycle management, such as object retention, Object Lock, and archive tiering. Run get-bucket-lifecycle-configuration on each bucket to get a human-readable JSON list of its lifecycle rules. You can easily recreate your AWS S3 setup using the MinIO Console or the MinIO Client (mc). Use get-object-legal-hold, get-object-lock-configuration, and similar commands to pinpoint objects that require special security and governance treatment.
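
A sketch of that survey across all buckets, assuming the AWS CLI is configured against the source account:

# Dump lifecycle and Object Lock settings for every bucket;
# errors for buckets where the feature was never configured are suppressed
for b in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
  echo "== $b =="
  aws s3api get-bucket-lifecycle-configuration --bucket "$b" 2>/dev/null
  aws s3api get-object-lock-configuration --bucket "$b" 2>/dev/null
done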

While we're on the subject of lifecycle, think about backup and disaster recovery as well: do you want to replicate to other MinIO clusters for backup and DR?

Once objects have been copied from AWS S3 to MinIO, it is important to verify data integrity. The easiest way is to use the MinIO Client to run mc diff on the old bucket in S3 and the new bucket in MinIO. This computes the differences between the buckets and returns only a list of objects that are missing or different. The command takes the source and destination buckets as parameters. For convenience, you may want to create aliases for S3 and MinIO so that you don't have to repeatedly type full addresses and credentials. For example:

mc diff s3/bucket1 minio/bucket1 
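
Creating those aliases is a one-time step; the endpoint URL and credentials below are placeholders:

# Register the source and destination endpoints under short names
mc alias set s3 https://s3.amazonaws.com AKIAEXAMPLE SECRETEXAMPLE
mc alias set minio https://minio.example.net MINIOUSER MINIOPASSWORD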

The good news is that all you have to do is point your existing applications at the new MinIO endpoint. Configurations can be switched over app by app over a period of time. Migrating data in object storage is less intrusive than migrating a file system, because you can simply change the URL your applications read and write from. Note that if you previously relied on AWS services to power your applications, those services will not exist in your data center, so you will have to replace them with open-source equivalents and rewrite some code. For example, Athena can be replaced with Spark SQL, Apache Hive, or Presto; Kinesis with Apache Kafka; and AWS Glue with Apache Airflow.

If your S3 migration is part of a larger effort to move an entire application on-premises, chances are you use S3 event notifications to call downstream services when new data arrives. If so, fear not: MinIO supports event notifications too. The most straightforward migration is to implement a custom webhook to receive notifications. However, if you need a more durable and resilient destination, use a messaging service like Kafka or RabbitMQ. We also support sending events to databases such as PostgreSQL and MySQL.
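
A sketch of the webhook route, assuming a MinIO alias named minio and a placeholder endpoint; the webhook target is first registered in the server config, then a bucket subscribes to it:

# Register a webhook notification target named "repatriation"
mc admin config set minio notify_webhook:repatriation endpoint="https://hooks.example.net/new-object"
mc admin service restart minio

# Fire the webhook whenever an object is created in bucket1
mc event add minio/bucket1 arn:minio:sqs::repatriation:webhook --event put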

Now that you've completed the repatriation, it's time to turn your attention to storage operations, monitoring, and optimization. The good news is that MinIO doesn't need tuning – we've built optimization into the software, so you know you're getting the best performance your hardware can deliver. You will want to monitor the new MinIO cluster to continuously assess resource utilization and performance. MinIO exposes metrics through a Prometheus endpoint, which you can consume in the monitoring and alerting platform of your choice. For more on monitoring, see Multi-Cloud Monitoring and Alerting with Prometheus and Grafana, and Metrics with MinIO using OpenTelemetry, Flask, and Prometheus.
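
For example, mc can emit a ready-made Prometheus scrape configuration for the cluster (the alias is a placeholder):

# Print a scrape_configs snippet, including the bearer token,
# to append to your prometheus.yml
mc admin prometheus generate minio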

It's no secret that the days of writing blank checks to cloud providers are over. Many businesses are evaluating their cloud spend in search of savings. You now have everything you need to start migrating from AWS S3 to MinIO, including the specific technical steps and a financial framework.
