These SQL Queries Will Help You  To Become a Data Analyst :

In this article, I will cover the basic SQL queries and constructs that will get you started with your data analysis needs. But before I jump into it, let me take you through what SQL is and its general applications.


What Is SQL?

SQL is a standard language for accessing and manipulating databases. In short, it helps you ‘talk’ to your database.

You can use it to retrieve information already stored in the database, create new tables, and store or update data as needed. Sounds simple, right? It is. But that is where its power lies: It is simple but produces beneficial results.

Organizations that follow a data-centric approach to their business decisions thrive. And if you present your ideas using data analysis, you are much more likely to get buy-in from your stakeholders. In fact, every company needs data analysis in one form or another to grow.

SQL helps make this process of analyzing data seamlessly. It can be used for countless data-related applications no matter the domain – tech, HR, finance, marketing, sales – or the size of the company. For instance, SQL can help you with the following:
  • Analyze customer data and generate insights related to customer behavior.
  • Analyze your PnL data to reveal your biggest cost heads.
  • Perform general financial analyses.
  • Understand which departments have the most expensive payroll/workforce.
  • Find areas in your operations with the most errors or inefficiencies.
  • Identify your best-performing salespeople, stores, locations, regions, etc.
  • Analyze bottlenecks in your website and other data-related tasks.
Basic SQL Constructs and Queries :

There are many more SQL clauses and functions than we can cover in this article, but the following are the foundation of most SQL queries.
In reality, you will work with thousands of rows and many tables. Don’t worry; the constructs remain the same irrespective of the amount of data. These basic queries will work on any database table.

Here are the basic parts of a SQL query:

1.   SELECT and FROM

The SELECT keyword is used to retrieve data from the database. You can specify the names of the columns (i.e. the type of data you want to see, like customer name or age), or you can see all the data in the table.

The FROM keyword specifies the table(s) from which the data is to be retrieved. For instance, say you want to get a list of all the cities in your database with their country code, district, and population information. This information is in the City table shown below:




Here’s the query we’d use to get a list of all the customer names with their email addresses:

SELECT Name, Population
FROM city;

And the output:


2.   WHERE

Suppose we want to know how many a particular city having how much of population then we have to use WHERE.

This is when you can use the WHERE clause.

So, if you use this query …

SELECT Name, Population

FROM city

WHERE Name = 'kabul';

And the output:




You can filter for multiple conditions if you use the AND keyword:

SELECT CountryCode,Population
FROM city
WHERE CountryCode = 'AFG' and Population= '237500';

And the output:



If you want to return the results when either (or any) of the WHERE conditions is true, use the OR keyword:

SELECT CountryCode,Population
FROM city
WHERE CountryCode = 'AFG' or Population= '237500';

And the output:



3.   ORDER BY

You want to see the top orders and their details. One way to do this is to list orders in descending order (10–1, Z–A) by value. You can use something like this query:

SELECT *
FROM city
ORDER BY population DESC;

And the output:



Using a ‘*’ after SELECT tells the SQL engine to retrieve all the columns for a table. The ORDER BY clause directs that the results be shown in a particular order. By default, the results are returned in ascending order (1–10, A–Z); however, since we explicitly mention the DESC keyword, our results are returned in descending order.

4.   CASE

Typically, a business will have thousands of rows in the table. You may not always want to see the total value of each order. Instead, maybe you want to categorize them based on value or some other logic.

If you want to categorize the population into ‘High’ or ‘Low’ values – depending on whether the population is above or below 186800 – you can use the CASE construct:

SELECT CountryCode, Population,
       CASE WHEN  Population > 186800 THEN 'HIGH'
            WHEN Population < 186800 THEN 'LOW'
            ELSE 'MEDIUM' END as Population_Category
FROM city;

And the output:


In this query, each row is evaluated by the CASE function. Rows are assigned a category based on the first condition (i.e.  WHEN .. THEN ..) that evaluates to true.

The ELSE specifies a default return value if none of the conditions evaluate to true for any row.

5.   GROUP BY

Oftentimes, to generate insights from large data sets, we may want to group similar items together. We can use this information to make informed decisions;  grouping the data helps us find patterns, behaviors, or metrics. This grouping is called aggregation; it can be achieved with the SQL GROUP BY clause.

You generally use a function to calculate the group metric. Commonly used aggregate functions are SUM(), COUNT(), AVG(), MAX(), and MIN(). With these, you can do many things, e.g. calculating the total expenses for a department or counting the number of employees in each location.

Here’s the Example query:

SELECT CountryCode, sum(Population)
FROM city
GROUP BY CountryCode
ORDER BY 2 DESC;

And the output:



Clearly, your top country is CHN 175953614.

The query works by first selecting all the unique CountryCode population and then using GROUP BY to find the aggregate total sum for each (i.e. the total population of a country). The ORDER BY 2 DESC clause displays the results in decreasing order by SUM(Population).

6.   JOIN

The SQL JOIN feature lets you select information from multiple tables in a single query. Typically, you will connect two tables based on one or more column values common to both tables.

create database db1;
use db1;

create table t1(id int,en char(20),gender int);

insert into t1 values(1,"p1",1);
insert into t1 values(2,"p2",2);
insert into t1 values(3,"p3",1);
insert into t1 values(1,"p4",1);

select * from t1 as a
join t1 as b
on a.id=b.gender;

And the output:


There are two more join queries and they are left join and right join.