Aviatrix Systems presented their product “Aviatrix” at Cloud Field Day 4. For those who don’t know, they are all about multi-cloud networking and its automation. Their aim is to simplify cloud networking for hybrid scenarios in a secure and scalable way.

Major public clouds like AWS, Azure and Google are already supported, and support for Oracle Cloud is “coming soon”. We don’t know exactly when but I can safely say that at least Nate Avery would be very interested in the release date. 🙂

It is customary for presenters to show which clients are already using the product and we’re told that Aviatrix is already in use by global organisations like Netflix, NASA, NCR, Pfizer, the University of Reading and many others. So, a broad spectrum of industries is using it already.

I’ll stick to the format I used in my post about SoftNAS Cloud i.e. some introduction, some discussion with my views added along the way and then my overall thoughts at the end.

Why Aviatrix?

When talking about cloud connectivity, one of the most common hurdles is networking complexity. This is especially true for organisations where teams might have organically skilled themselves up over time. When working with public clouds, most concepts remain the same (at least for IaaS workloads) but networking is widely different and remains one of the biggest challenges. More often than not, it’s the most feared part of a cloud migration and becomes a roadblock.

Like it or not, most cloud teams do not have networking experts. They don’t interact with network gear on a daily basis and unless they’ve transitioned from a networking background, they have a rudimentary knowledge of networking; enough to have an intelligent conversation and do a design, but not down to the protocol, routing, encryption or security level.

However, in order for the various clouds to communicate, they need network connectivity and hence the requirement for real, traditional networking. Here’s where Aviatrix comes in. After the initial setup and connectivity tasks, it takes over and caters for network connectivity requirements in a simple, automated fashion. There are some other benefits that come along the way too, which we’ll talk about in a minute.

Architecture Basics

It would be good to have a little introduction to how it’s all set up, and Aviatrix has provided us with a nice little slide which I’ll present before explaining it:

CFD4 - Cloud Routing Reference Architecture

As you can see, the corporate data center is connected to an AWS account via BGP over Direct Connect (could also be a VPN connection). That’s your cloud networking done!

OK, I oversimplified a little and I’ll explain. But first, a little chat about the components:

Centralized Controller

Near the bottom right, you can see the “Centralized Controller”. That’s the brain of the whole operation i.e. the control plane. As the name suggests, it centralises the operations and management of the whole architecture. In addition to managing the rest of the deployment, it also provides:

  • Single console for all the orchestration and automation
  • Access logging and monitoring
  • Multi-region and multi-cloud encrypted peering
  • Policy-based management of various components

This controller can be hosted either on-premises or in any of the supported public cloud environments. However, it is also offered as a hosted service (with a Free Trial available), where Aviatrix manages the controller for you and it provides the same functions for the connected cloud environments.

It’s deployed in a high-availability, active/passive configuration. As you would expect, in case of a failure the standby appliance takes over, but networking continues as normal in the meantime as this is just the control plane.

Gateway

You will also have noticed a gateway connected to every VPC. It is deployed per VPC (or equivalent) and is responsible for passing traffic in and out of the VPC. It’s an auto-scaled and load-balanced deployment that allows direct VPN access to VPCs. In addition to that, other features include:

  • Deployable on all supported cloud platforms i.e. ESX/Hyper-V/KVM/AWS/Azure/Google
  • Profile-based access and consistent security policies
  • Multifactor authentication via popular providers

Operation

For its operation, Aviatrix relies on a Transit Network mechanism. In simple terms, it creates a hub-and-spoke architecture between the main VPC (where connectivity between the public cloud and on-premises was established in the diagram above) and all other VPCs. This connectivity is provided by the gateways deployed in each VPC, with a VPN created between the hub and each spoke.

All traffic travelling through is therefore encrypted but, more importantly, the setup is simplified and handled entirely through the central console (or driven via the API). All you have to do is provide some basic details (or replicate the setup with the relevant parameters) and it all happens automagically.
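To make that concrete, here is a minimal sketch of what driving that workflow through an API could look like. The endpoint names, parameters and token are made up purely for illustration (this is not Aviatrix’s actual API); the point is simply that attaching a new spoke VPC becomes a couple of parameterised calls rather than a manual VPN and routing exercise.

```python
import requests

# Hypothetical controller endpoint and credentials, for illustration only.
CONTROLLER = "https://controller.example.com/api"

def attach_spoke(session, spoke_vpc_id, spoke_region, transit_gw="transit-hub"):
    """Deploy a gateway into a new spoke VPC and join it to the transit hub."""
    # 1. Deploy a gateway into the spoke VPC (the controller does the heavy lifting).
    session.post(f"{CONTROLLER}/gateways", json={
        "cloud": "aws",
        "vpc_id": spoke_vpc_id,
        "region": spoke_region,
        "name": f"spoke-gw-{spoke_vpc_id}",
    }).raise_for_status()

    # 2. Create the encrypted VPN attachment between the hub and the new spoke.
    session.post(f"{CONTROLLER}/transit/attachments", json={
        "transit_gateway": transit_gw,
        "spoke_gateway": f"spoke-gw-{spoke_vpc_id}",
    }).raise_for_status()

with requests.Session() as s:
    s.headers["Authorization"] = "Bearer <api-token>"  # placeholder credential
    attach_spoke(s, spoke_vpc_id="vpc-0a1b2c3d", spoke_region="eu-west-1")
```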

There are other benefits that you also get using this approach but more on that later.

Use Cases

It seems that Aviatrix is targeting what they see as the major use cases. Seems like a good enough format to me. Let’s discuss them one by one and then I’ll add my commentary.

Next-Gen Transit Network

Think of the days when virtualisation became common. Suddenly, you could control your destiny but were still handicapped by the networking guys. Then came VST (Virtual Switch Tagging) and all was good in the world. Your conversation with the networking guys went down to getting a few trunk ports off them. With a cup of coffee in your hand, you could have a smug face and say “just give me some trunk ports and add these VLAN tags. I’ll take care of the rest”.

You can do all the automation in the world and make your cloud infrastructure agile but if the environment requires connectivity back to on-premises, the network team’s involvement is required. Considering the process, at-risk periods, change control and all the rest, it could take weeks for a simple change.

Aviatrix is basically providing the public cloud equivalent of VST to you (metaphorically – not in networking terms, before anyone objects!). It allows you to set up BGP connectivity between on-premises and the first VPC in that diagram above. From that point forward, connectivity is handled by the controller by deploying the gateway infrastructure to every new VPC and setting up that VPN spoke.

BGP is configured once for the pipe and no configuration changes are required after that as you’re not connecting any new VPCs to the on-premises router.

See, that’s what I meant by “That’s your cloud networking done!” earlier. As you know now, you do have to deploy the rest of the infrastructure to make it work too.

My Thoughts: This is absolutely brilliant! I know how long that process can take in practice. I also know that a popular model is to have separate accounts/VPCs for different types of environment, which also helps with security boundaries and billing. For such environments, the number of VPCs can multiply quite quickly. Connecting all those environments back to on-premises networking is typically the bottleneck in terms of speed of deployment.

There is something to consider though. These gateways are inline and, for that reason, if something goes wrong with one, connectivity can be affected. The resilient deployment model should alleviate those concerns but they don’t exist with the native method.

Like everything else, one has to weigh the benefits against the probability of a gateway failing and accept the small risk that exists accordingly.

Egress Security

So you have many VPCs and instances that need to connect to the Internet for various reasons. Machines in one VPC might need to go out for patching, licence activation, CRLs etc., another might host users browsing educational sites, and some might need open access.

Aviatrix argues that a NAT gateway doesn’t provide that control, and that with NAT instances one would hit IP-address-based policy issues or other network ACL/security group limits. The only alternative (in their view) is to hairpin the traffic back to on-premises and out, which is not only hugely expensive but also introduces latency.

Aviatrix’s solution is to provide tag-based policies that essentially create whitelists. When those policies are attached to a gateway (which, if you remember, is the way in or out for traffic), machines in that VPC can securely connect to the allowed sites.
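Just to illustrate the idea, here is a hypothetical shape for such a tag-based whitelist; the schema and endpoints are invented for this sketch and are not Aviatrix’s actual API.

```python
import requests

# Hypothetical policy schema and endpoints, for illustration only: a named tag
# groups the allowed destinations, and the tag is then attached to a VPC's gateway.
CONTROLLER = "https://controller.example.com/api"

patching_tag = {
    "name": "patching-and-licensing",
    "allowed_fqdns": [
        "*.ubuntu.com",              # OS patching
        "*.windowsupdate.com",       # OS patching
        "licensing.vendor.example",  # licence activation
        "crl.vendor.example",        # certificate revocation lists
    ],
}

with requests.Session() as s:
    s.headers["Authorization"] = "Bearer <api-token>"  # placeholder credential
    # Create the tag, then attach it to the gateway fronting the VPC; instances
    # in that VPC can then only egress to the whitelisted destinations.
    s.post(f"{CONTROLLER}/egress/tags", json=patching_tag).raise_for_status()
    s.post(f"{CONTROLLER}/egress/attachments", json={
        "gateway": "spoke-gw-vpc-0a1b2c3d",
        "tag": "patching-and-licensing",
    }).raise_for_status()
```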

My Thoughts: While I agree that hair-pinning through the on-premises route is a bad idea for the reasons mentioned (and it also causes unnecessary congestion), I don’t think machines/people would even notice it unless it’s hideously bad.

It’s a moot point though as I would just deploy an auto-scaled Squid proxy server in each VPC where it’s required. No? Well, I could, but doing it the Aviatrix way provides the following benefits:

  • Squid whitelisting is cumbersome to maintain
  • Programmatic changes are easy using tags but not so easy when replacing whitelists on Squid
  • No appliances are required

So, I agree with the Aviatrix advantage really. I was just messing with you. That said, a proxy-based solution that automatically updates its whitelists immediately after modification will work just fine for environments without Aviatrix.
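As a rough sketch of what that automatic update could look like with Squid, assuming a file-based dstdomain ACL referenced from squid.conf (the paths and domains below are just examples):

```python
import subprocess
from pathlib import Path

# Assumes squid.conf already contains something like:
#   acl allowed_sites dstdomain "/etc/squid/whitelist.txt"
#   http_access allow allowed_sites
#   http_access deny all
# so updating the whitelist is just rewriting that file and reconfiguring.
WHITELIST = Path("/etc/squid/whitelist.txt")  # example path

def update_whitelist(domains):
    WHITELIST.write_text("\n".join(domains) + "\n")  # one dstdomain entry per line
    # Ask the running Squid to re-read its configuration without a restart.
    subprocess.run(["squid", "-k", "reconfigure"], check=True)

update_whitelist([".ubuntu.com", ".windowsupdate.com", "licensing.vendor.example"])
```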

Multi-Cloud Peering

If you are connected to multiple public clouds, it’s not easy to get them to talk to each other directly. As you are already connected to both, maybe it’s easier to bring the traffic down to your environment and back up to the other cloud but, as you can expect, that’s not very efficient (and is costly).

As Aviatrix is essentially abstracting the underlying networking from you (not replacing it as it’s talking to it on your behalf using native APIs), it becomes the common factor on both sides, making multi-cloud peering a doddle! Once you’ve brought both clouds under Aviatrix’s management, it’s a simple matter of selecting the VPCs (or equivalent) from both clouds and creating the peer.
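Purely to illustrate how simple that flow is, a peering request could look something like the sketch below; the endpoint and payload are hypothetical, not Aviatrix’s actual API.

```python
import requests

# Hypothetical endpoint and payload, for illustration only. Once both clouds are
# under the controller's management, a peer is just a pair of VPC/VNet identifiers;
# the gateways, routes and encrypted tunnel are handled underneath.
CONTROLLER = "https://controller.example.com/api"

with requests.Session() as s:
    s.headers["Authorization"] = "Bearer <api-token>"  # placeholder credential
    s.post(f"{CONTROLLER}/peering", json={
        "side_a": {"cloud": "aws", "vpc_id": "vpc-0a1b2c3d", "region": "eu-west-1"},
        "side_b": {"cloud": "azure", "vnet": "prod-vnet", "region": "westeurope"},
        "encrypted": True,
    }).raise_for_status()
```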

Aviatrix suggests that it could even allow the other public cloud to be onboarded more quickly if on-premises connectivity is taking a long time, as traffic from the second cloud can flow through to the first via the peering and travel to the on-premises environment via that route.

My Thoughts: Again, a brilliant option especially considering the speed of implementation and from the cloud networking point of view.

I wouldn’t recommend passing traffic from one cloud to another with a view to sending it on-premises unless it’s minor (due to inter-cloud costs, plus the cost out to on-premises) but if that’s the only option and other teams can move forward while on-premises connectivity is achieved, then why not!

Site to Cloud/Remote User VPNs

Aviatrix also talked about these two use cases but, while they’re very important, they’re not really groundbreaking. So, in the interest of keeping this post to a reasonable length, I won’t discuss them here but please look these up too.

Now, as always, I would recommend going through all the videos that cover Aviatrix’s sessions as they cover the product in more detail. However, you must watch Sherry Wei’s Cloud Routing Architecture Chalk-Talk, to hear about the architecture and all these use cases in her own words.

Other Benefits

Now that you know a fair bit about what Aviatrix’s infrastructure looks like and what it does, here are a few other nuggets that you should know about:

One Controller to rule them all

Do you know that you only need one controller set for all your environments? That’s right! You could have it on-premises or in one of your connected public clouds but that’s all you need. All other environments are managed through this single controller deployment.

But I want to move them

No problem there! All configuration is stored in your favourite cloud but you can export it, deploy the controllers in the desired cloud environment and restore the configuration on top. That’s it!
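As a hypothetical sketch of that export/restore flow (the endpoint names are invented for illustration and are not Aviatrix’s actual API):

```python
import requests

# Hypothetical endpoints, for illustration only: export the configuration from the
# old controller, then restore it onto a freshly deployed controller elsewhere.
OLD = "https://controller-old.example.com/api"
NEW = "https://controller-new.example.com/api"

with requests.Session() as s:
    s.headers["Authorization"] = "Bearer <api-token>"  # placeholder credential
    backup = s.get(f"{OLD}/config/export").content                  # export the config
    s.post(f"{NEW}/config/import", data=backup).raise_for_status()  # restore it on the new one
```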

Encrypted Peering

I am not sure about Azure or Google but, at least for AWS, intra-region peers are not encrypted. They are isolated from everything else, but exactly how isolated they are, and how that isolation is achieved, isn’t made public.

Due to how Aviatrix works, the peer is encrypted so you can be safe in the knowledge that all traffic between your peers is safe and you don’t have to take the cloud provider’s word for it. Great if you need to assure the security guy!

Multi-Cloud/Multi-Region Encrypted Peering

If routing between two clouds is difficult, just imagine what it would look like for encrypted traffic. Again, by the nature of how Aviatrix works, the traffic is encrypted.

Abstraction from specific cloud complexities

As the same product is handling all networking, it transforms the hardest part of the cloud into the easiest part. Operations staff are also happy, as they only need to learn one product and it’s managed centrally from one place.

All of it is also driven by the same API, regardless of the cloud platform. For example, when creating a multi-cloud peer, Aviatrix edits the routing tables of both VPCs to point at the Aviatrix gateways. For that reason, you are able to define one process/template for all the platforms (bar any cloud-specific restrictions, I assume).
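For AWS at least, the kind of change being automated on your behalf looks roughly like the boto3 call below, done once per side of the peer; the IDs are placeholders and this is purely to show what the controller saves you from scripting yourself.

```python
import boto3

# Purely illustrative: point one VPC's route table at the local gateway's ENI
# for the remote side's CIDR. All IDs below are placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",        # route table of the local VPC
    DestinationCidrBlock="10.20.0.0/16",         # CIDR of the remote VPC/VNet
    NetworkInterfaceId="eni-0123456789abcdef0",  # ENI of the local Aviatrix gateway
)
```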

Transitive Routing

To some, the lack of transitive routing is actually a desirable feature, as it allows them to sleep easy in the knowledge that spoke VPCs can’t talk to each other, or to on-premises, directly.

Aviatrix provides the functionality to enable it “by design” if required. The default position is still to have it disabled, mirroring the clouds’ native behaviour.

Freedom from configuration limits

Most cloud constructs have soft limits applied to them to protect the customer from accidentally creating too many, which can easily happen in an automated environment.

Then there are hard limits too, which can also be hit for certain objects e.g. an AWS Network ACL has a limit of 20 rules (which can be increased to 40 but that could impact network performance severely).

As this is a VPN-based environment, those limits are no longer an issue.

But wait! There’s more…

There are a few more tricks up Aviatrix’s sleeve that they are about to release in September, and here is a brief introduction to them:

Insane Mode Encryption

It’s not well-known or publicised, but cloud providers typically only provide an aggregate bandwidth of 1 – 1.25 Gbps on their gateways, so a typical transit solution will always be capped at that speed.

CFD4 - Encryption Bottleneck

As Sherry explains in the video below, this is due to just a single core working on encryption. Even bigger instances that are capable of delivering huge amounts of bandwidth are not able to deliver fast encryption, due to that single-core limitation.

Aviatrix believes that the transit network-based solution that uses end-to-end encryption is the future because of security, abstraction and other features it provides. So, they’ve been working to solve this encryption performance issue and the September release should include the update that removes this IPsec performance bottleneck.

Mixed Protocol Routing

BGP is well-known and understood in traditional networking and forms the basis of connectivity between an on-premises environment and the public cloud.

But the cloud environment can suddenly grow to hundreds of VPCs, and using BGP to discover/update the whole infrastructure would not scale well. As you are the master of your own destiny in terms of routing, you are aware of everything and consequently don’t need all that.

Aviatrix supports both mechanisms and they call it “Mixed Protocol Routing” for want of a better name. I would suggest “Flexible Protocol Routing”.

AVX Event Broker

Aviatrix controls all connectivity in all clouds via a single controller. As the cloud environment scales, you could potentially end up with hundreds of gateways.

As mentioned at the beginning, there are many functions that a controller performs, so there’s a lot of communication between the controller and every gateway in the system. The controller therefore has to not only scale its messaging to all those gateways but also send each message once and be happy in the knowledge that it “will be” delivered.

In cloud-native environments, decoupling is achieved using guaranteed message delivery via message queuing services e.g. SQS (AWS), Storage/Service Bus Queues (Azure) or Cloud Pub/Sub (Google) etc.

Aviatrix is incorporating such queues for its controller/gateway communications, which will allow the mechanism to scale its communication, in line with the growth of the infrastructure deployed.
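To illustrate the decoupling pattern itself (and not necessarily how Aviatrix has implemented it), here is a minimal producer/consumer sketch using SQS; the queue URL and message format are made up.

```python
import json
import boto3

# Illustration of the decoupling pattern, using SQS as the example queue.
# The queue URL and message format below are made up.
sqs = boto3.client("sqs", region_name="eu-west-1")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/gateway-commands"

def publish_update(gateway_name, routes):
    """Controller side: fire-and-forget a routing update for one gateway."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"gateway": gateway_name, "routes": routes}),
    )

def consume_updates():
    """Gateway side: poll for pending commands and acknowledge them once applied."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        command = json.loads(msg["Body"])  # the routing change to apply locally
        # ... apply the change on the gateway, then acknowledge ...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```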

So, I’d say please do go and watch this video too as it explains all these topics in greater detail:

Conclusion

Where do I start? I think by looking at the length of this post and my enthusiasm in the videos, you’ll be able to tell that I absolutely love the product. It might not be the answer to all the problems of the world but what it does well is to abstract all the complexities that cloud engineers/operations encounter whenever they start learning about a new cloud environment.

That abstraction also provides a single view and management of the whole environment, regardless of the backend cloud. Networking constructs and everything else can, therefore, be created programmatically and uniformly across all those different environments.

We talked about the use cases it addresses and, in addition to those, there are all the other features I mentioned along the way.

I understand that some of these features, e.g. automation, transit networking and a few other things, can also be achieved using the native mechanisms available in the respective clouds, but think about how much skill and experience that would require, and even then the implementations would differ widely from one cloud to another. Aviatrix removes all that complexity and enables you to spend your energies in the right places e.g. providing your customers with a better service.

Like I said, Aviatrix’s slogan should be “Cloud Networking Made Easy”.

Related Posts

My fellow Cloud Field Day colleagues have also written blog posts about Aviatrix and here are the links to them:

Disclaimer: As is customary for Tech Field Day delegates (and just in case), I would like to say that while Gestalt IT paid for my travel, accommodation etc. to attend Cloud Field Day 4, I am not being paid or being asked to write anything either good or bad about Tech Field Day and/or any of the companies that presented at Cloud Field Day 4.