Cohesity

Cohesity is a web-scale, appliance-based, secondary storage platform founded by Mohit Aron, formerly of Nutanix and Google. Each 2U appliance is Intel x86-based and consists of four 16-core nodes (dual 8-core processors). Each node contains three 8TB spinning disks, one 1.6TB PCIe MLC flash drive, and two 10Gb ports.

Much like the Rubrik solution, this one runs backup software natively on the platform, eliminating the need for costly licenses from the likes of Veeam or Commvault. The need for separate server hardware to run that software is eliminated as well.

Also similar to Rubrik, this solution leverages policies to take backups instead of the traditional method of specifying a data set, specifying a target, setting up a schedule, and saying, “go!”. Using policies built around your RTO/RPO, you can use the built-in cloud gateway functionality to ship the coldest (idle) data off to the cloud.
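To make the policy-driven model concrete, here is a rough sketch in Python of what that declared intent might look like. Every field name here is invented for illustration; this is not Cohesity's actual API.

```python
# Hypothetical shape of an RPO-driven protection policy. All field names
# are made up for illustration; this is not Cohesity's actual API.
policy = {
    "name": "gold-tier-vms",
    "backup_every_hours": 4,             # effectively sets your RPO
    "retain_local_days": 30,
    "archive_to_cloud_after_days": 7,    # cloud gateway tiers off cold data
    "cloud_target": "s3://example-archive-bucket",
}

# The platform turns this intent into scheduled jobs; the operator never
# builds a job/target/schedule combination by hand.
print(f"{policy['name']}: RPO <= {policy['backup_every_hours']}h, "
      f"archives after {policy['archive_to_cloud_after_days']} days")
```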

Different from most backup platforms is the ability to use the backed-up data immediately; data isn't locked up in a backup job anymore. Leveraging the journaled OASIS file system, they are able to present NFS, SMB, and S3 protocols natively. Part of the file system is something called SnapTree, which allows for a virtually unlimited number of snapshots without the performance penalties traditional arrays incur.
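Because the data is presented over standard protocols, it should be reachable with ordinary tooling. As a sketch, here is how you might browse a view over S3 with boto3; the endpoint URL, port, credentials, and view/bucket name are all assumptions for illustration.

```python
import boto3

# Point a standard S3 client at the cluster's S3-compatible endpoint instead
# of AWS. Endpoint, credentials, and the bucket/view name are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://cohesity.example.com:3000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Each view is exposed as a bucket; backed-up data reads like any object store.
for obj in s3.list_objects_v2(Bucket="backup-view").get("Contents", []):
    print(obj["Key"], obj["Size"])
```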

The solution supports variable-length global deduplication (even to the cloud!), encryption, compression, native replication to AWS (CloudReplicate), and even runs MapReduce natively. Another benefit is that all incoming data is indexed; similar to what object storage solutions provide, you get fully searchable data. Pretty neat stuff and a real game changer.
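For a feel of what variable-length deduplication buys you, here is a toy content-defined chunker in Python. Real systems use far more sophisticated fingerprinting; this only illustrates why chunk boundaries that follow the content (rather than fixed offsets) let shifted copies of the same data dedupe globally.

```python
import hashlib
import random

def chunks(data: bytes, min_size: int = 48, mask: int = 0xFFF):
    """Toy content-defined chunking: cut wherever a fingerprint of the most
    recent bytes matches a pattern, giving ~(mask + 1)-byte average chunks.
    Because cuts depend on content, not offsets, shifting data only changes
    the chunks around the edit."""
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # old bytes shift out after 32 steps
        if i - start + 1 >= min_size and (h & mask) == 0:
            out.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

def dedupe(streams):
    """Global dedupe: store each unique chunk once, keyed by its SHA-256."""
    store = {}
    for data in streams:
        for c in chunks(data):
            store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return store

base = random.Random(0).randbytes(200_000)
shifted = b"a 29-byte prefix shifts it all" + base  # same content, new offsets
store = dedupe([base, shifted])
stored = sum(len(c) for c in store.values())
total = len(base) + len(shifted)
print(f"stored {stored} of {total} bytes ({stored / total:.0%})")
```

With fixed-size blocks, the shifted copy would share almost nothing with the original; with content-defined chunks, only the chunk containing the edit is new.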

In the enterprise, most people have primary storage for the databases behind their applications; something like Pure Storage. These enterprises also have “other” data which needs a home. Cohesity (and Gartner) refer to this as dark data, and Cohesity is looking to be the platform for it.

When I was a manager at my last gig, we had a basic flash solution for our databases and an entire FlexPod environment for “everything else”. We also had an entire environment for backups, kept more for compliance/contractual reasons than anything. Those environments demanded a lot of hardware, rack space, power, and cooling, not to mention the various management interfaces I needed just to “run” things. Looking back, this would have solved the problem of our backup environment and eliminated the need for a “utility” filer. Any storage admin knows the deal: “We can't delete that 4 TB of data.” There are usually 4-5 instances (volumes) like that, at least.

I could have ripped out about 30 RU worth of gear and replaced it with 4 to 8 RU of Cohesity. Strategically, I would have immediately been able to leverage the cloud for cold data instead of buying yet another appliance for that use case (think AltaVault). Beyond that, my RTO would have been reduced from hours to minutes. That's peace of mind and a HUGE win in the efficiency column.

If you would like to know more about this solution or have any questions on the data provided, please email me or leave a comment below.

 

Rubrik

Rubrik (Swedish for “headline”) is a web-scale, appliance-based data backup and recovery platform started by Bipul Sinha and Arvind “Nitro” Nithrakashyap. The appliance is called a Brik. It consists of four dual-processor commodity compute nodes in a 2U chassis. Each Brik can provide up to 30K IOPS and 1.2 GB/s of throughput. Each node consists of 1 SSD and 3 HDDs. The most dense configuration currently provides 30 TB of raw capacity. Compression (inline), deduplication (inline and global), and encryption are supported, and they average about 75-80% data efficiency in the field. Extraction of snapshotted data is nearly instantaneous thanks to some proprietary algorithms developed around flash architecture parallelism.

Backup software that handles catalogs and data movement runs natively on the platform, saving organizations money on third-party backup software such as Commvault or Veeam. No additional servers are required to run that software, and no additional storage device either. Generally, you buy some software, some servers to run it on, and some storage to send the data to. This is all of that in one…and then some.

Another big difference with this solution is the elimination of the RTO generally associated with restoring from backup, or even the time spent making your remote replica writable. The appliance essentially lets your backup device act as primary storage until you can get your actual primary storage back online and ready for a migration of the data back onto it. Are we allowed to say hyper-converged?

Leveraging policies (instead of traditional jobs), along with a resource scheduler and algorithms developed in house that optimize data extraction, they are able to pull data using VADP and CBT with minimal impact to the production environment. Using the SLAs and policies, data can then be moved off to an object storage solution (think Amazon S3, though many are supported). They claim this virtually eliminates your RTO, since recovery simply means bringing up the VM or SQL DB in the remote location, with the data living on the Rubrik itself until it can be moved (vMotioned) back to primary storage.
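For a sense of the mechanics, the VADP/CBT pattern looks roughly like this with pyVmomi (VMware's Python SDK). The vCenter host, credentials, VM name, and disk key below are placeholders, and this is a sketch of the general technique, not Rubrik's implementation.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def changed_areas(host, user, pwd, vm_name, change_id="*"):
    """Ask vSphere which regions of a VM's first disk changed since
    change_id ("*" means everything allocated, i.e. a first full backup).
    Assumes CBT is enabled on the VM and a snapshot exists to read from."""
    ctx = ssl._create_unverified_context()  # lab use only
    si = SmartConnect(host=host, user=user, pwd=pwd, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == vm_name)
        return vm.QueryChangedDiskAreas(
            snapshot=vm.snapshot.currentSnapshot,
            deviceKey=2000,   # conventional key of the first virtual disk
            startOffset=0,
            changeId=change_id)
    finally:
        Disconnect(si)

# Each (start, length) extent is what a backup product would actually read,
# which is why incrementals barely touch the production environment.
result = changed_areas("vcenter.example.com", "user", "pass", "my-vm")
for area in result.changedArea:
    print(f"read {area.length} bytes at offset {area.start}")
```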

One other cost-saving feature is the indexing of all snapshot data. If a single file is needed for a restore (say a 50 GB MDF), but it lives on a 400 GB volume that contains other data, instead of incurring the cost of transferring the full 400 GB out of <insert cloud storage provider here>, you only incur the cost of the 50 GB MDF. The same is possible for a single-file restore out of a VMDK.
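The savings are easy to put numbers on. Assuming a hypothetical $0.09/GB egress rate (substitute your provider's actual pricing):

```python
# Back-of-the-envelope on the restore example above. The egress rate is an
# assumed placeholder, not any provider's published price.
EGRESS_PER_GB = 0.09  # $/GB, hypothetical

full_volume_gb, single_file_gb = 400, 50
print(f"whole-volume restore:  ${full_volume_gb * EGRESS_PER_GB:.2f}")   # $36.00
print(f"indexed file restore:  ${single_file_gb * EGRESS_PER_GB:.2f}")   # $4.50
```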

Hardware Summary:

2U “Brik”

4 Dual Proc Nodes per Brik

30K IOPS per Brik

1.2 GB/s throughput per Brik

30 TB RAW (most dense config)

Scales to over 40 nodes easily

Software Summary:

Web-Scale architecture

Atlas scale-out file system

Compression

Global Inline Dedupe

Replication

Single File Restore

Supports the major cloud providers and object stores

Interviews

Throughout my career I have sat through many interviews and conducted quite a few as well. I realized quickly that it's about your mind more than it is about your memory. One interview stuck in my head; I was the one being interviewed, and it was the best interview ever. I replicated that interrogation method when hiring my teams, and the folks hired as a result of that process flourished. They went from generic break/fix sysadmins to well-respected architects, directors, and VPs in our industry.

As systems engineers, we are required to have technical prowess. We should know how to navigate our way through some pretty challenging scenarios, quite often under a tremendous amount of pressure (“get the site back up, we are losing 1000s of dollars every minute!”). Any engineer who has been in the trenches knows what that feels like. They also know how many things are learned…and eventually forgotten. Nobody knows it all. We have all discovered a lot, resolved complex issues as a direct result of our better-than-average reading comprehension skills, potentially picked up some “good habits” along the way, and, again, probably forgotten a lot of the minutiae involved in resolving those issues. The more of them we fix, the more we realize the minutiae are unimportant. Do you remember the KB/TR number of an issue you fixed 3 years ago? Probably not. Yet we all remember how DNS works, right? That's because it's fundamental to computing. See my point?

I recently interviewed at a company and was mostly embarrassed by the questions I was asked. This is a very well-known company with a market cap in the billions and growing every day. The position was a middle manager of systems engineering; nothing I couldn't handle. Unfortunately, two minutes in, once the questions began, I felt like I was 25 again trying to get a job as a desktop support guy. No matter how well I did, I didn't want this job.

My point here is that the questions asked during an interview process are crucial to the level of talent that is acquired. Be smart with them. A lot of times, the question will be, “What are the 7 layers of the OSI model?” Bad question. Why? You end up with a person who knows how to memorize things; it reflects nothing about the candidate's critical thinking skills. Going back to my “best interview ever”…what was the question that gave me the ability to showcase my talent?

“Tell me everything that happens between the time you type in ‘Google.com’ in your web browser to the time you see the logo and search box”.

Bingo. Not only was he able to determine that I understood the OSI model, but also that my understanding of applications, proxy servers, DNS, HTML, TCP/IP, and web servers (“the internet”) was up to snuff. I understood the order of operations, and clearly my troubleshooting skills would allow me to isolate and pinpoint a problem. Memorizing the layers of the OSI model doesn't really help you figure out why users can't get to the application.

As an interviewer, don't play Jeopardy with your candidates. As an interviewee, if you see someone playing Jeopardy during an interview, keep faith that you are a professional with industry-standard skills and talent. Not knowing the specifics of a Google search that enlightened someone a few days prior doesn't make you a bad candidate. It just means they have an amateur hiring manager without vision or leadership skills. Keep searching for someone who values your mind.

Good luck out there.