A common Domo question is “Which transformation tool should I use?” If you’re an experienced Domo user, don’t laugh; it really is a very good question! Domo has three main transformation methods: MySQL, Magic ETL, and Redshift. (There is a fourth tool called Data Fusion, which is intended for very specific use cases; we won’t be covering it in this article.)
If you’re not familiar with these tools in Domo, take a quick detour to our Intro to Domo Dataflows post for an overview. In this article, we’re going to dive a step deeper and look at some key considerations for picking the right dataflow tool.
Making the wrong tool decision can lead to process inefficiencies, performance hits, and spaghetti code down the line. (Yes, there truly is a Domo version of “spaghetti code,” but picking the right tool goes a long way in avoiding it. At Search Discovery, we’re a “lasagna” type of company: We’ll do anything to help you stay clear of the spaghetti!) Later in this article we’ll explain what “Domo spaghetti” looks like. But first let’s start with two rules of thumb to keep in mind from the beginning:
Rule #1 – This dataflow will need to be maintained. Who will be managing it long-term, and what is their level of SQL experience?
Magic ETL uses a drag-and-drop interface and is very easy to use, which makes it excellent for less technical users. (In fact, I’ll tell you a secret: although they might never admit it, many highly technical users actually prefer Magic ETL, too!)
If you’re new to SQL or don’t consider yourself particularly “technical,” Magic ETL is a good place to start. As I’ve mentioned in my SQL Code for Non-Coders post, using a visual SQL interface helps you become familiar with SQL concepts.
In contrast, if you have a lot of experience working with MySQL databases, you’ll be a lot happier using MySQL instead of Magic ETL. Don’t underestimate the power of being familiar with a SQL flavor’s ins & outs!
Rule #2 – What’s your data size and how fast do you need this to run?
Not long ago, I was helping a client prepare 300 million rows of data in Domo. The client was already familiar with MySQL and had assumed that would be the ideal transformation method for them. After seeing the velocity and size of their data, I surprised them by recommending that they use Redshift instead of MySQL. Why?
When it comes to very large datasets with high speed requirements, Redshift is king! Under the hood, it’s using an Amazon Redshift petabyte-scale cloud data warehouse. You don’t need to have an AWS account or anything, though. Domo takes care of all that behind the scenes and just runs your code within a Domo-owned Redshift environment. It’s the same functionality as native Redshift, but with zero overhead to manage.
(Please note that Redshift is not enabled by default. If your use case warrants it, submit a ticket to Domo Support to have them turn it on).
If your datasets aren’t in the multi-gigabyte range but you still want high speed, Magic ETL is not far behind Redshift! Make sure that you have Magic ETL V2 enabled: it includes the best built-in speed optimizations available in Domo.
While MySQL often lags behind on speed, it can still run fairly fast as long as you configure your indexes correctly. If you’re a true MySQL optimization wizard, you can in some cases make MySQL even faster than Redshift.
Here’s a screenshot showing where to find the indexing options in your MySQL dataflow:
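If you prefer to define indexes in code rather than through the UI, you can also run index DDL as its own transform step in a MySQL dataflow. Here’s a minimal sketch; the table and column names are hypothetical, so substitute your own:

```sql
-- Hypothetical transform step in a Domo MySQL dataflow.
-- Indexing the columns you join or filter on in later
-- transforms can dramatically speed up the dataflow.
ALTER TABLE orders
  ADD INDEX idx_orders_customer_date (customer_id, order_date);
```

The general rule: index the columns that appear in your `JOIN` and `WHERE` clauses, and add the index in a transform that runs *before* the transforms that use it.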
Great, we’ve covered the key fundamentals for starting out with Domo transformations. Now let’s get to the meat of it: how do you avoid building spaghetti and meatballs out of your data?
What makes “spaghetti” “spaghetti”? There are three main things:
- The pasta is tightly combined and similar looking: You can’t identify what makes one piece different from the others.
- Those noodles are tangled together: You can’t pull out one piece without getting other pieces too.
- There’s a lot of it in one spot. (And if not, you’re probably going to be hungry after dinner. Just saying.)
In reality, it’s the same thing with Domo Spaghetti code:
- Your SQL logic does similar things many times. You can’t identify what’s going on at any point in the dataflow.
- The logic is intertwined with itself. You can’t remove one piece of logic without breaking the dataflow or removing other steps at the same time.
- You’re doing a lot of steps within one dataflow.
With regard to the first two points, check out my previous blog post on Best Practices with Data Transformations. Those tips will help you isolate each piece of logic and self-document your dataflows as you go.
It’s the third point that I’d like to focus on here: Doing too many things in one dataflow.
I mentioned earlier that Magic ETL is very popular, and for good reason. However, there is one major consideration to know before diving into Magic ETL: A single step in SQL can often require multiple steps in Magic ETL.
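To make that concrete, here’s a hypothetical example (table and column names are illustrative): bucketing a numeric column into labels is a single expression in SQL, while in Magic ETL the same logic typically spreads across several tiles on the canvas.

```sql
-- One SQL step. In Magic ETL, the equivalent logic would
-- usually take multiple tiles chained together.
SELECT
  order_id,
  CASE
    WHEN order_total < 50  THEN 'Small'
    WHEN order_total < 500 THEN 'Medium'
    ELSE 'Large'
  END AS order_size
FROM orders;
```

One expression in SQL, several boxes in Magic ETL: that multiplier is exactly what makes Magic ETL dataflows grow so quickly.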
I know you’re probably thinking: “C’mon, Shira. First you ruined my plan of having spaghetti for dinner, and now you’re telling me that it’s a big deal to split up a SQL command into multiple steps?!”
But the answer, unfortunately, is yes. It’s a bigger deal than you might think. Short-term, it’s fine! But as you expand your dataflows and add more logic, Magic ETL dataflows can become hard to manage in the long run.
For example, here’s a simple CASE statement that was implemented in Magic ETL:
Is this manageable? Definitely. Would this dataflow become overly long if we added in 2 or 3 more CASE statements just like it, in addition to all the other business logic? Yes. Absolutely.
Note that Magic ETL V2 now has a SQL-like filter option! It can remove some of the challenges Magic ETL generally has with CASE statements, as we saw in the screenshot above. If your Domo instance doesn’t have it enabled yet, submit a request to Domo Support or talk with your Domo AE.
Boiling it down, here’s another rule of thumb:
Rule #3 – Keep an eye out for rapidly growing Magic ETL dataflows.
If you’re using many CASE statements, or if your dataflow increases to more than 30 action boxes, consider switching over to a SQL dataflow (MySQL or Redshift).
Alternatively, split your dataflow into two separate dataflows: one can use MySQL and the other Magic ETL (or both can use the same tool). The key question to ask yourself is, “What will this dataflow look like in another six months, or to someone viewing it for the first time?”
Lastly, remember that MySQL and Redshift each have certain limitations.
Rule #4 – Match your required functions to the tool that provides that functionality.
Domo’s currently enabled version of MySQL does not support window or partition functions. You’d need to use Magic ETL’s “Rank and Window” box or a Redshift dataflow for that.
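For example, here’s the kind of window-function logic that Domo’s MySQL version can’t express but a Redshift dataflow handles in one statement (the table and column names are hypothetical):

```sql
-- Hypothetical Redshift dataflow transform: number each
-- customer's orders from most recent to oldest using a
-- window function with a PARTITION BY clause.
SELECT
  customer_id,
  order_id,
  ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY order_date DESC
  ) AS order_rank
FROM orders;
```

In Magic ETL, the “Rank and Window” box covers this same use case visually.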
On the flip side, Domo’s implementation of Redshift does not currently let you create user-defined functions (UDFs) or stored procedures. Head over to MySQL if you need that type of functionality.
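As a sketch of what that MySQL-only functionality looks like, here’s a hypothetical stored procedure defined in one transform step and called in the next (all names and logic are illustrative, not from a real dataflow):

```sql
-- Hypothetical MySQL dataflow transform #1: define a
-- procedure that flags orders above a given threshold.
CREATE PROCEDURE flag_large_orders(IN threshold DECIMAL(10,2))
BEGIN
  UPDATE orders
  SET is_large = (order_total >= threshold);
END
```

```sql
-- Hypothetical transform #2: invoke the procedure. In a
-- MySQL dataflow, each transform runs as its own step.
CALL flag_large_orders(500.00);
```

If your logic needs to be parameterized and reused across multiple steps like this, MySQL is the tool to reach for.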
Domo’s three transformation options provide a great amount of flexibility and power. Now that you’re aware of the tradeoffs between them, you’ll be able to pick the right tool for your needs. To summarize, here’s a handy decision tree to help you make the right choice from the beginning.
For more information, be sure to read the Data Processing Tools article from Domo’s Knowledge Base.
Happy Domo-ing! I’m heading off to make myself some lasagna.