You might have already seen Part I of this post. Here are the rest of the announcements:
Intelligent Tiering and Cost Optimisation for Amazon S3
For many years, we’ve been used to storage tiering. The general idea is that hot data is placed on faster, more accessible storage, while data that is infrequently accessed, or not accessed at all, is moved to cheaper storage. The concept remains the same for cloud storage, which is why every cloud provider offers different classes of storage.
For S3, the different storage classes have always been available, but one had to define lifecycle management policies to move data between them. Intelligent Tiering is now available to automatically move infrequently accessed data to, guess what, the Standard-IA (Standard Infrequent Access) S3 class, thereby costing you less.
There are some caveats, though. Currently, the automatic move is just from Standard to Standard-IA, i.e. not to One Zone-IA or Glacier, the latter of which would have been really useful. There is also a small charge for this automation, but it should still prove economical if there’s a good amount of data stored. In addition, objects smaller than 128 KB will not be moved, so it’s not really suitable for data comprised of lots of small files.
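Since new objects can also be written straight into the new class at upload time, here's a minimal boto3 sketch of what that could look like. The bucket, key and helper names are illustrative; the 128 KB threshold encodes the caveat above.

```python
MIN_TIERING_SIZE = 128 * 1024  # objects below 128 KB are never moved automatically

def choose_storage_class(size_bytes: int) -> str:
    # Small objects gain nothing from automatic tiering, so keep them in Standard.
    return "STANDARD" if size_bytes < MIN_TIERING_SIZE else "INTELLIGENT_TIERING"

def upload(bucket: str, key: str, body: bytes) -> None:
    """Upload an object, opting into Intelligent Tiering when it makes sense.

    Assumes AWS credentials are already configured; names are illustrative.
    """
    import boto3  # imported here so the sizing helper stays dependency-free
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        StorageClass=choose_storage_class(len(body)),
    )
```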
For more information, read: Automatic Cost Optimization for Amazon S3 via Intelligent Tiering
Amazon S3 and Glacier Archival Enhancements
While on the subject of storage, AWS has introduced several enhancements to its offering today. My favourite of those is Amazon S3 Object Lock, which prevents deletion of data for a defined period of time and works with any of the S3 storage classes.
This is of particular importance for compliance use cases. The lock can be enabled in “Governance” or “Compliance” mode. In the case of the former, one can define certain user accounts that can remove the lock if/when required, but in the latter, the object is stored without change access for anyone, not even the root account!
That’s perfect for long-term compliance and retention requirements and avoids a lot of the hoops one previously had to jump through to secure such data on public cloud storage.
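As a rough sketch of how the Compliance mode could be used from boto3: the `ObjectLockMode` and `ObjectLockRetainUntilDate` parameters are the real `put_object` ones, while the helper and bucket/key names are illustrative (and the bucket would need Object Lock enabled).

```python
from datetime import datetime, timedelta, timezone

def compliance_lock_params(retention_days: int) -> dict:
    """Extra parameters for s3.put_object that write the object with a
    Compliance-mode lock: nobody, not even root, can delete it until the
    retention date passes."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retention_days)
    return {
        "ObjectLockMode": "COMPLIANCE",  # use "GOVERNANCE" for the softer mode
        "ObjectLockRetainUntilDate": retain_until,
    }

# Usage (illustrative):
# s3.put_object(Bucket="audit-logs", Key="2018/11/log.gz", Body=data,
#               **compliance_lock_params(retention_days=7 * 365))
```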
In addition to this, there’s “S3 Restore Speed Upgrade”, which lets you upgrade an in-progress restore to a faster speed tier, and “S3 Restore Notifications”, which, as the name implies, notify you when your restore job has finished.
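For reference, restore speed is chosen per request via a tier. A small sketch of the payload that boto3’s `restore_object` expects; the tier names are the real ones, the helper itself is illustrative.

```python
def restore_request(days: int, tier: str = "Expedited") -> dict:
    """Build the RestoreRequest payload for s3.restore_object.

    "Expedited" is the fastest (and priciest) tier; "Standard" and "Bulk"
    trade speed for cost.
    """
    if tier not in ("Expedited", "Standard", "Bulk"):
        raise ValueError(f"unknown restore tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

# Usage (illustrative):
# s3.restore_object(Bucket="archive", Key="backup.tar",
#                   RestoreRequest=restore_request(days=2))
```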
For more information, read: Amazon S3 & Amazon S3 Glacier Launch Announcements for Archival Workloads
AWS Transfer for SFTP
We all know that SFTP is widely used to transfer data from one place to another, and especially to cloud storage. Today, AWS is launching AWS Transfer for SFTP: a fully-managed, highly-available SFTP service for you to consume for such operations.
The great thing about this is that you get fine-grained control over permissions and policies. Also, as the data goes to S3, you can take advantage of all the capabilities that S3 has to offer in terms of availability, durability and lifecycle management.
In addition, you can implement the same automation techniques that are available to S3 users, e.g. triggering a Lambda function when an object lands to transform it into something else or serve it to another process.
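As a sketch, a Lambda handler wired to the bucket’s object-created event could be as small as this; the event shape is the standard S3 notification, while the processing step is left as a stub.

```python
def handler(event, context):
    """Minimal S3-trigger Lambda: collect the objects that just landed
    (e.g. via SFTP) so they can be transformed or handed onwards."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        uploaded.append(f"s3://{bucket}/{key}")
        # ...transform the object or notify a downstream process here...
    return {"processed": uploaded}
```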
In terms of pricing, you pay a per-hour fee for each running server and a per-GB data transfer fee (both upload and download).
For more information, read: AWS Transfer for SFTP – Fully Managed SFTP Service for Amazon S3
AWS DataSync
There are instances where transferring data to AWS using Snowball or even Snowmobile isn’t appropriate, i.e. where the rate of change of the data is so high that the transfer would be stale by the time it reaches AWS. This is especially true for sectors where data is generated on-premises, and quite frequently, but processed in the cloud.
For such use cases, AWS has introduced AWS DataSync. It’s a managed service and operates using a highly-efficient, purpose-built data transfer protocol. To use it, one deploys a VM on-premises, installs the agent on it and then sends the data to AWS DataSync.
The data can go to S3 or EFS (Elastic File System) and can be transferred via the Internet or Direct Connect. It should also be noted that the reverse works too, i.e. you can send data from AWS back to on-premises in the same way.
One thing to consider is that, being highly efficient, it may need throttling (options are available for this): if data is to be sent, it will go full blast and can quite easily saturate the link.
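The throttle is exposed as a task option. Here's a small illustrative helper for computing the `BytesPerSecond` value (a real DataSync task option; the function name and usage line are assumptions):

```python
def datasync_options(limit_mib_per_s=None):
    """Options payload for a DataSync task.

    BytesPerSecond caps how much of the link a task may use;
    -1 means unlimited, i.e. full blast.
    """
    if limit_mib_per_s is None:
        return {"BytesPerSecond": -1}
    return {"BytesPerSecond": limit_mib_per_s * 1024 * 1024}

# Usage (illustrative):
# datasync.create_task(..., Options=datasync_options(limit_mib_per_s=10))
```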
For more information, read: AWS DataSync – Automated and Accelerated Data Transfer
In addition to the above, Snowball Edge is now available with more compute and an optional GPU, and Amazon EBS has doubled the maximum performance of Provisioned IOPS SSD volumes.
There were a few limited availability and preview announcements on the IoT front too that I am listing below for reference:
AWS IoT Things Graph
AWS IoT Things Graph is a service that makes it easy to connect various devices to web services. The key thing here is that the interface is simple and visual, which makes working with it a drag-and-drop affair and removes the need for complex code to perform standard connectivity tasks.
AWS is targeting Home/Industrial Automation and Energy Management as use cases. Those are the obvious ones, but there might be more going forward.
Currently, it’s only available in preview so one can sign up for it, if interested.
AWS IoT SiteWise
Talking of transferring data at a larger scale from your IoT devices, AWS IoT SiteWise is a managed service aimed at industrial-scale IoT data collection and organisation.
It works by having a gateway deployed on-premises, which collects and sends the data to AWS. One can run it on a Snowball device or install it on a third-party gateway, as those are typically designed for such functionality. Such gateways can be deployed at each on-premises location so that data can be sent to a central place, i.e. IoT SiteWise, where common industrial performance metrics can be created and acted upon if and when required.
The service is currently in “limited preview”, and I’d imagine access is restricted to environments that can make good use of it.
Dynamic Training for deep learning with Amazon EC2
As we all know, deep learning is an extremely heavy workload: training can run for days (or weeks, depending on the complexity) and requires many repetitions of learning cycles. For that reason, it’s typically run on a number of nodes, i.e. distributed training, all working together to reduce training time.
The idea behind Dynamic Training is to introduce elasticity into the process so that the training cluster can grow and shrink as required, thereby reducing costs at times when intensive runs are not needed. It can also take advantage of Spot Instances, which can potentially reduce the cost of the same job significantly.
For more information, read: Introducing Dynamic Training for deep learning with Amazon EC2
Phew! That’s again a very sizeable update, so I hope you have a mug filled with espresso shots. Hopefully, that’s enough to keep you going throughout the day, or at least until the conference starts properly.
Hope this helps and happy reading!