SQL is the key which opens the door to data visualizations and positive business impact.
If you’re scared of spiders or of visiting the dentist, I completely empathize with you. (I’m on the same page! Shh, don’t tell my dentist 🙂 .) But if you’re afraid of SQL, then it’s time to get things straight: SQL is not scary. You don’t need to be a Data Scientist or Software Developer in order to use it, and it even reads in sentences similar to everyday English. To quote Mike Martinez from his post on Quora, “SQL is halfway to English already”.
Many business users or analysts who work with data already know how to use Excel data functions or visual interfaces for SQL e.g. Magic ETL in Domo, Tableau Prep in Tableau, Query Editor in Power BI. Those tools are great – yet the functionality can be limited. For the purpose of this article, I’ll call those tools “Fake SQL” since you’re actually using SQL – just without knowing it!
If you’ve ever wished that you could have greater control over the data manipulation process – or if you’ve considered learning SQL to expand your skill set, then you’ve reached the right spot. Meant for regular, non-technical users, this post will guide you through 5 steps to build your skills and help you use “Real SQL” to prepare your data properly.
Ready? Let’s dive right in.
Step 1: Ensure you understand basic SQL concepts.
SQL stands for “Structured Query Language” and it’s simply a way of communicating with your database. To use it properly, there are a few concepts you’ll need to understand. We’re not talking about the syntax; that comes after. At this point, we need to understand what SQL is and how it behaves.
Data in a database is organized into tables, rows, and columns. “Queries” are the statements that ask the database to find specific piece(s) of data.
Let’s say you own a grocery store. Using the Consumer_Table below, you can find out information such as: how many total customers you had on a given day, which day of the week is the most profitable, and so on.
If you want to find out information which is located in both tables, you’ll need to join these two tables together. When joining tables, there needs to be a match key, also known as a join key. That is at least one column from each table which has a match in the corresponding table. (There are many types of joins – Make sure you understand what each join type accomplishes. Keep an eye out for our next post to learn more about that! )
For example, to list the names of the fastest-selling foods, you’d need to do a left join where Consumer_Table is the left side and Food_Table is the right side. By joining on the Food_ID column in both tables, you have access to all the columns from both files, including the Food_Name column which we want to use.
In our Food_Table, the Food_ID column is called our primary key since it identifies a unique row. It matches up to the Food_ID column in the consumer_table. Notice that the Food_ID column (in Consumer_Table) repeats values (ie. Food_ID 4 is listed twice). That’s because it is simply referencing the full directory of foods that exist within the Food_Table. Here’s an article that provides a detailed explanation of primary keys and reference columns (a.k.a. foreign keys).
Once you understand the fundamental concepts, it’s time to dip your toes into the “Real SQL” syntax.
Step 2: Take an existing “Fake SQL” transform or Excel function and convert it to SQL. Test the results and see if you get the same results with both methods.
Using the same example from before, let’s say we want to find out how many unique people made purchases on 8/22/2018.
In Excel, we can filter to the specific date and count the people. Clearly, there were 2 customers that day.
When using SQL, it’s the same idea. In English, this request can be stated as: “Get the count of unique people who made a purchase on 8/22/2018”. Translating that into SQL, “Get” becomes “Select“, “Count of Unique People” becomes “Count(distinct people)“, and “who made a purchase on 8/22/2018” becomes “where date_purchased = ‘2018-08-22‘”. Put it together and you have your very own SQL statement:
SELECT count(distinct consumer_name) WHERE date_purchased = ‘2018-08-22’
After running that query, we can validate that it worked by checking that the result is still 2.
If you’ve used “Fake SQL” tools, practice converting your transformations into SQL. I recommend spending the most time on this step since this gets you accustomed to the core syntax of SQL actions.
Quick note: There are many flavors of SQL, including MySQL, Redshift SQL, and SQL Server, among others. These “flavors” are all based on the same standard SQL syntax, but there are syntax and function differences. Unless specified, the examples here apply to standard SQL syntax, but make sure that you’re practicing with the SQL flavor that you want to use.
Once you’re comfortable with this part, you’re ready to move into slightly more advanced stuff:
Step 3: Gain familiarity with the functions that weren’t available in your “Fake SQL” tool.
Switching to “Real SQL” gives you access to a lot more SQL functions. For instance, MySQL (and many of the other SQL flavors as well) provides the LEFT() function. The request to the database is fairly straightforward – Take a string and pull out a certain number of characters, starting from the left.
Let’s say our column is named “customer” and the first row has a value “Joe Smith”. Our query would look like: SELECT LEFT(customer, 2) FROM table_name , and the result would be “Jo”. Simple enough, no?
This example is just the tip of the iceberg: There are quite a lot of functions available. Once you find functions that you want to use, research them and ensure you understand how they work. When it comes to this step, Google (or your favorite search engine) is your best friend!
Quick Note: In general, don’t worry about memorizing SQL syntax. It’s more worthwhile to focus on the concepts and implementation of SQL since you usually can find the exact function syntax with a quick Google search.
Step 4: Challenge yourself to implement one of those functions.
Let’s take another example. Perhaps you’re in charge of managing pricing for a grocery store (wouldn’t that be nice? 🙂 and you’re trying to list all food items where the price is between $1.50 and $2.50.
We can use the same table as before:
Assuming our “Price” column is saved as a decimal, we can use the MySQL “BETWEEN” function to limit the data to a specific numeric range. In English, we’re asking: Please get all the data where “Price” is greater than or equal to 1.50 and less than or equal to 2.50.
In SQL, we say: SELECT * FROM food_table WHERE Price BETWEEN 1.50 AND 2.50;
The star (*) is the SQL code word for “get all the data”. It’ll retrieve all available columns instead of asking to explicitly list which columns of data you care about.
Not too hard, right?
And finally… the last step to becoming a proficient SQL user:
Step 5: Repeat steps 3 and 4 until you’re 100% comfortable with “Real SQL”
I know… I can hear your groan all the way from my computer. Yep, Step 5 is cliche: practice, practice, practice. But for better or for worse, it’s true!
Implementing SQL and learning new functions requires a decent amount of trial and error. Don’t get concerned if you hit up against errors or even the wrong result at times; The more you do it, the more you learn and the easier it gets!
Be sure to grab hold of resources along the way to help you out – whether that’s a website, article, colleague, friend, or consultant.
To summarize: You might’ve caught the trend here. My “5 steps” are really just two basic iteration steps: learn the concepts, try it out in practical terms, repeat!
SQL is fabulous. It’s easy enough for non-technical folks and yet it’s powerful enough for extremely advanced data manipulation. As you know, data is only valuable if it’s prepared properly and is accurate, which is exactly what you’re learning how to do!
At Search Discovery, we believe data holds the keys to understanding your business and its potential growth. This is the first post in a SQL series intended to help you move your data in the right direction.
I’d love to hear your feedback and comments on this post. If you need a hand with SQL manipulations or are interested in further explanations for how data can be valuable for your business, please reach out to me personally at firstname.lastname@example.org or click here and our expert team of Data and Cloud Engineering consultants would be happy to help!