Cohesity has now become a veteran of Tech Field Day events and knows exactly what/how to present to those delegates. While the presentation started with some marketing slides, they were mainly for the benefit of people not familiar with Cohesity already and didn’t last long.
The thing I liked most about the sessions, apart from the great discussions, was the well-structured live demos that Jon Hildebrand did. That’s brave but also engaging for techies and generates good interaction. Please do keep doing it, Cohesity!
Cohesity is a storage company founded in 2013 by Mohit Aron, who is also a co-founder of Nutanix and, most importantly, was the lead on the Google File System. When your business is all about data, having a great file system underpinning your product can mean everything, and that experience is invaluable.
Cohesity offers secondary data storage with scale and simplicity. Their mission is “to reinvent data infrastructure so IT can focus on serving the business rather than managing complexity”. They have gained a lot of traction in the storage world in a very short space of time.
That’s because they’re one of the very few storage vendors who store various forms of data and, in turn, provide enhanced features like deduplication, search and analytics, test and dev deployments etc. Integration with the most popular on-premises hypervisors and public clouds makes it a very attractive option to start small and then grow as required.
What is Cohesity DataPlatform?
In legacy environments, data is siloed and is stored in many places and in different forms. For example, you have your file shares, object data, tapes, logs and so many more places where data is stored. Not only that, in a lot of cases, data is duplicated many times over, say for example, in file shares.
However, bringing all of that data to a single location creates many opportunities. You can deduplicate, compress, search and transform it but, most importantly, repurpose it for various use cases.
Cohesity DataPlatform is all about providing that capability by consolidating all that data into one place, backed by a great file system and accessible via APIs to provide various capabilities that I will go into shortly.
In addition to its native platform i.e. the on-premises hyper-converged appliance, DataPlatform also has a “Virtual Edition” aimed at IoT/ROBO environments and “Cloud Edition”, which provides the same functionality for AWS, Azure and Google cloud platforms (at the time of writing).
For details on all I’ve said here, see this presentation by Aaron Delp:
As DataPlatform functions the same across its editions, I’ll just talk about the important features for the rest of the post.
As I mentioned above, once you have all the data collected in one place, all sorts of opportunities become available. In addition to the standard aim of storing and protecting it, enhanced features and capabilities can be provided.
In this section, I’ll talk about some of the main features provided by Cohesity DataPlatform.
CloudArchive

Talk about great naming. Need I say more about what it does?
Of course, it’s all about long-term retention of your data and recovery when required. You need connectivity to the cloud provider of your choice; once that’s in place, the data is stored in that cloud, where you can take advantage of the cost benefits and scale that the environment provides.
These days, you wouldn’t accept any such product without encryption, deduplication and compression, and CloudArchive provides those too. Typically, you would also expect to be able to restore at a file or VM level, which is catered for as well. You can also configure throttling to control how much bandwidth CloudArchive consumes at a particular time. You can have policies that back up a set of servers corresponding to, say, a service and then send it to the cloud of your choice, or even more than one if you so wish.
Once stored, you can restore that object onto any of the connected platforms. As all stored data is indexed for metadata etc., the platform can pick exactly what is required to restore, and from the repository closest to the destination platform. It also optimises data egress, which is not only important for efficiency but also results in faster restore times.
Thanks to the extensive metadata saved, you can search extensively for what was previously backed up and needs restoring. One great capability (a frequent request) is the ability to search using vCenter tags. So, you should be able to set certain tags and have them become part of the metadata, which makes searching for the required VMs easier.
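To see why indexed tags make this cheap, here’s a toy sketch of tag-based filtering over a metadata index. The field names (`vcenter_tags`, `snapshot`) are illustrative only, not Cohesity’s actual schema:

```python
# Hypothetical metadata index of backed-up VMs with their vCenter tags.
backups = [
    {"vm": "web-01", "vcenter_tags": {"prod", "pci"}, "snapshot": "s-101"},
    {"vm": "web-02", "vcenter_tags": {"prod"},        "snapshot": "s-102"},
    {"vm": "dev-01", "vcenter_tags": {"dev"},         "snapshot": "s-103"},
]

def find_by_tags(index, required):
    """Return entries whose vCenter tags include every required tag."""
    return [b for b in index if required <= b["vcenter_tags"]]

# Find every backed-up VM tagged both "prod" and "pci".
pci_vms = find_by_tags(backups, {"prod", "pci"})  # matches only web-01
```

Because the tags are captured at backup time, the search runs against the index rather than against vCenter, so it also works for VMs that no longer exist.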
CloudSpin

If you think about it, once you have a configured repository in the cloud, migrating workloads to that cloud also becomes a possibility. All you need now is some sort of transformation capability that adjusts the VM to the cloud platform’s IaaS layer and you’re in business.
That’s exactly what CloudSpin is about. Once a VM is backed up, CloudSpin can trigger a conversion process that makes the VM’s IaaS layer compatible with the destination cloud. Having that capability right in the product also saves you the tedious work that you’d otherwise have to do manually or via an additionally purchased third-party product.
So, as part of the archival process, you can also enable a CloudSpin policy that sends the VM to a cloud with a view to making it capable of running there. That way, you can use it for service migration, as part of failover etc. You can even use this for test and dev.
App and Data Mobility
OK, so taking CloudSpin a little further. Imagine you have an on-premises application that has a few VMs to provide a particular service and it also has some associated data. You would, of course, have it backed up via DataPlatform and possibly stored in the cloud for long-term protection.
As you now have them stored in your favourite cloud, you can restore them and associated data in that environment in isolation for test and development purposes.
You may not need it for long, so it’s very cost-effective to have such a testbed in the cloud for exactly as long as you need it. That saves unnecessary spending on on-premises capacity, not to mention ordering, lead time, commissioning etc., and we all know how that story goes!
Cloud Native Backup
So far, the story has mainly revolved around backing up and restoring on-premises VMs and data, but what about your cloud workloads? Naturally, they are as important as the on-premises ones; once connected, you wouldn’t use the cloud environment just for backup purposes, would you?
Having multi-cloud connectivity enables so many opportunities that I am sure you would deploy production services there too. Those workloads and their associated data also need backing up, and that is exactly what the Cloud-Native Backup feature provides.
Why not just “Cloud Backup”? Because it’s driven the way it should be, i.e. using the cloud-native APIs for backups rather than a ported appliance. The backups are taken using snapshots and stored in object storage, e.g. S3. The snapshots can then be used to create EBS volumes and attach them to EC2 instances as required (with an equivalent process for other clouds).
One thing companies are quite concerned about is lock-in. While that’s mostly to do with the services they consume on a platform, the point is: what if a company moves to a certain cloud vendor, only to find later that it needs to move to another? They don’t want to be hindered in any way, so it’s an important concern to address.
Cohesity demonstrated in the session that the data they save is cloud-agnostic and can be moved from one cloud to another without any concerns about restores. For the same reason, you don’t have to move the backups already taken when the move happens: new backups can be sent to the new cloud platform while you retain the ability to restore in either cloud environment. Obviously, in time, all backups within the retention period will exist only in the new cloud, at which point the older storage can be cleaned up.
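The cleanup timing follows straight from the retention window. A small sketch of the reasoning, with made-up dates and cloud names purely for illustration:

```python
from datetime import date, timedelta

RETENTION = timedelta(days=30)  # illustrative retention policy

# Hypothetical backup history during a migration from "aws" to "azure":
# the last pre-move backups stay in the old cloud, new ones go to the new one.
backups = [
    {"id": "b1", "cloud": "aws",   "taken": date(2018, 9, 1)},
    {"id": "b2", "cloud": "aws",   "taken": date(2018, 9, 20)},
    {"id": "b3", "cloud": "azure", "taken": date(2018, 10, 5)},
]

def live_backups(history, today):
    """Backups still inside the retention window, whichever cloud holds them."""
    return [b for b in history if today - b["taken"] <= RETENTION]

def clouds_still_needed(history, today):
    """The set of clouds that still hold at least one live backup."""
    return {b["cloud"] for b in live_backups(history, today)}

# Shortly after the move, both clouds still hold live backups...
needed_mid = clouds_still_needed(backups, date(2018, 10, 10))   # {"aws", "azure"}
# ...but once the old backups age out, only the new cloud is needed.
needed_late = clouds_still_needed(backups, date(2018, 11, 1))   # {"azure"}
```

Once `clouds_still_needed` no longer includes the old cloud, its storage can be reclaimed without losing any restore point.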
Having learned about all the different features Cohesity presented, and after the discussion, here are a few thoughts:
Enterprise Data Management
The fact that you can have one corporate data management policy that applies to all data, backs it up, analyses it and looks for data patterns for compliance etc. is a great feature and makes the life of an enterprise admin so much easier.
You may have seen it in one of the videos but I think more can be done with CloudSpin e.g. creation of cloud-specific templates that can bring up archived instances in an isolated environment, optionally with IP addresses reset.
That would be a great use case for stored backups and solve a problem many companies face when it comes to realistic testing.
These days, any service you pick has to have REST API coverage, and Cohesity has that part covered. During the demo, you can see that Jon configured pretty much everything using PowerShell, which is also a useful example of how those tasks can be done programmatically.
I am sure Cohesity will share sample code to give customers a flying start.
Inventory Refresh with Cloud
If you’ve seen the videos, in one of them I asked whether the drop-down menus for selecting a cloud resource make a real-time query or are populated by a periodic refresh. The concern is that if it’s a periodic refresh then, depending on the interval, someone with access rights could change or delete resources directly in the cloud environment’s console and Cohesity wouldn’t know about it until the next refresh.
I was told that the refresh happens every 5 minutes but there’s a refresh button too. The idea is that whoever is using the console will refresh it manually. In the interest of time, I didn’t object, but I don’t think people naturally refresh.
I think the refresh interval should be configurable, to allow more frequent updates if people find inventory conflicts happening frequently. An auto-refresh could also work if it fetches the required data when someone switches to a form that requires it, i.e. just in time.
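The just-in-time idea is essentially a TTL cache with a manual override. A minimal sketch of what I have in mind; `InventoryCache` and its parameters are my invention, and `fetch` stands in for the real cloud API call:

```python
import time

class InventoryCache:
    """Cache cloud inventory, refetching only when stale (just-in-time refresh).

    `fetch` is any callable returning the current inventory; `ttl` is the
    configurable staleness limit in seconds (the interval I'd like exposed).
    """

    def __init__(self, fetch, ttl=300.0):
        self.fetch = fetch
        self.ttl = ttl
        self._data = None
        self._fetched_at = float("-inf")  # force a fetch on first access

    def get(self):
        """Return the inventory, refetching if it is older than ttl.

        Calling this when a form opens gives the just-in-time behaviour."""
        if time.monotonic() - self._fetched_at > self.ttl:
            self._data = self.fetch()
            self._fetched_at = time.monotonic()
        return self._data

    def refresh(self):
        """The manual 'refresh button': force a refetch regardless of age."""
        self._fetched_at = float("-inf")
        return self.get()

# Demo with a fake fetcher that counts how often the "API" is hit.
calls = []
cache = InventoryCache(lambda: calls.append(1) or ["vpc-1", "vpc-2"], ttl=300)
cache.get()
cache.get()        # served from cache: still only one fetch
cache.refresh()    # manual refresh: second fetch
```

With the TTL exposed as a setting, an admin who keeps hitting inventory conflicts can simply lower it, while everyone else keeps the cheaper 5-minute default.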
It’s human nature that people are more interested in using things that are easier to use. IT products are no different and Cohesity has really paid a good amount of attention there. The interface is sleek and very intuitive.
Emphasis on Cloud Focus
Cohesity is doing great things with their public cloud capabilities and work is going on to improve them. I’d like to see those features emphasised more in the marketing as I feel it’s currently under-represented.
The past few years have seen a few vendors doing great stuff with data, and Cohesity is definitely one of them. Hyper-converged products typically use similar, if not the same, hardware. The software running on top is what matters: whatever that intelligence is capable of, and the solutions it enables, will determine who does well in this space.
So many things were coming to mind during that interactive presentation at Cohesity, which is one of the reasons I enjoyed the session so much (as you might be able to see in the videos). It was quite funny that the answer to a lot of my questions was “we’re working on it”. So funny, actually, that Aaron Delp reemphasised it to me after the recording stopped.
It’s great to know, because it means they’re thinking about the same problems, how to solve them, and actually working on them. I am excited to see all the features that Cohesity is bringing to this space and can’t wait for the ones they say they’re working on to be released.
Disclaimer: As is customary for Tech Field Day delegates (and just in case), I would like to say that while Gestalt IT paid for my travel, accommodation etc. to attend Cloud Field Day 4, I am not being paid or being asked to write anything either good or bad about Tech Field Day and/or any of the companies that presented at Cloud Field Day 4.