To excel and refresh my skills in MySQL, I practiced this case study of 8 Weeks SQL Challenge. Big shout out to Danny Ma for making this challenge. If you interested to join this challenge, click on this link.
Because of this challenge is long enough to post, and I’m afraid you’re gonna get bored. So I write some of the questions and solutions. But, in case you want to explore more, I wrote the complete query on my Github.
Did you know that over 115 million kilograms of pizza is consumed daily worldwide??? (Well according to Wikipedia anyway…)
Danny was scrolling through his Instagram feed when something really caught his eye — “80s Retro Styling and Pizza Is The Future!”
Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire — so he had one more genius idea to combine with it — he was going to Uberize it — and so Pizza Runner was launched!
Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.
Because Danny had a few years of experience as a data scientist — he was very aware that data collection was going to be critical for his business’ growth.
He has prepared for us an entity relationship diagram of his database design but requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations.
All datasets exist within the
pizza_runner database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.
I’m using MySQL to create this schema. If you want to copy paste this query, go to my Github.
Case Study Question
This case study has LOTS of questions — they are broken up by area of focus including:
- Pizza Metrics
- Runner and Customer Experience
- Ingredient Optimisation
- Pricing and Ratings
- Bonus DML Challenges (DML = Data Manipulation Language) Each of the following case study questions can be answered using a single SQL statement. Again, there are many questions in this case study — please feel free to pick and choose which ones you’d like to try!
Clean up your data!
Before you start writing your SQL queries however — you might want to investigate the data, you may want to do something with some of those null values and data types in the customer_orders and runner_orders tables. I clean up the data by doing these:
- On the runner_orders table, I created the temporary table just in case we accidentally lost the data.
- Replace ‘null’ strings and NULL values with blank space “ ”.
- In distance column, remove “km” and replace “null” string with blank space “ ”.
- In duration column, remove “mins”, “minute”, “minutes”, and “null” strings then replace it with blank space “ ”.
- Modify the data type of pickup_time to DATETIME, distance to FLOAT, and duration to INT.
2. On the customer_orders table, I created a temporary table. This also to avoid the data loss.
- Replace “null” strings and NULL values with blank space in exclusions and extras column.
If you get stuck, you can check out my query on my Github! Now, let’s get into the questions.
A. Pizza Metrics
- How many pizzas were ordered?
Here I used the COUNT clause to count pizza_id as how many ordered of pizzas had been made and extract it from the temp_customer_orders table.
- There are 14 pizzas were ordered.
2. How many unique customer orders were made?
Because it is asked of the unique customer, I used DISTINCT statement to get return only different values and aggregate it with COUNT clause to find out how many unique customer orders were made.
- There are 10 customer unique customer orders.
3. How many successful orders were delivered by each runner?
I used COUNT clause to count the successful orders and I used GROUP BY to arrange identical data into groups of runner_id, then I used the WHERE clause to filter the cancellation is a blank space. A blank space in cancellation column indicates the order is successfully delivered.
- Runner 1 has succesfully delivered 4 pizzas.
- Runner 2 has succesfully delivered 3 pizzas.
- Runner 3 has succesfully delivered 1 pizza.
4. How many of each type of pizza was delivered?
I used COUNT clause to count how many type of pizza was delivered and GROUP BY the aggregation with pizza_name from the pizza_names table. I used INNER JOIN to select the same values in tables. I joined the pizza_names table with temp_customer_orders by the pizza_id because there is pizza_id in both tables. Then, I joined the temp_runner_orders with temp_customer_orders by order_id. Why I join the temp_runner_orders here? Because we want to include the cancellation, so we can make sure that the pizzas were delivered.
- There are 9 Meatlovers pizzas were delivered
- There are 3 Vegetarian pizzas were delivered
5. How many Vegetarian and Meatlovers were ordered by each customer?
Using SUM with IF clause which if the pizza_id is 1 or 2 it returns 1 otherwise it returns 0. I used GROUP BY to group the summary of the pizza_id by the customer_id, from that we can find out how many Meat lovers pizzas and Vegetarian pizzas were delivered.
- Customer 101 ordered 2 Meatlovers and 1 Vegetarian pizzas.
- Customer 102 ordered 2 Meatlovers and 1 Vegetarian pizzas.
- Customer 103 ordered 3 Meatlovers and 1 Vegetarian pizzas.
- Customer 104 ordered 3 Meatlovers pizzas.
- Customer 105 ordered 1 Vegetarian pizza.
6. What was the maximum number of pizzas delivered in a single order?
To know what was the maximum number of pizzas delivered in a single order, we should count how many pizzas successfully delivered by using CTE to make a temporary result within a single statement. I used COUNT clause to count the pizza_id and GROUP BY order_id, also JOIN the temp_customer_orders and temp_runner_orders by order_id, and adding condition with WHERE statement which cancellation = “ ” from the temp_runner_orders. Then we can find the maximum number with the MAX clause and extract the result from the CTE.
- Maximum number of pizzas delivered in a single order is 3 pizzas.
7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
It will summarize the delivered pizza with or without changes by implementing the CASE statements. Because we want to find out the delivered pizzas, we can JOIN the temp_runner_orders and temp_customer_orders tables, and with condition using WHERE clause which distance is not equal to 0.
- Customer 101 and 102 ordered pizzas with no changes.
- Customer 103, 104, and 105 ordered pizzas with at least 1 change either ordered their pizzas with exclusions or extras. Customer 104 also ordered pizza with no change once.
8. How many pizzas were delivered that had both exclusions and extras?
I used the COUNT clause to count the orders that had both exclusions and extras then group it by customer_id using GROUP BY clause. JOIN the temp_runner_orders and temp_customer_orders by order_id, and using WHERE clause to get the condition which exclusions and extras is not “ ” (a blank space), and cancellation from the temp_runner_orders is equal to “ ” (a blank space).
- There is just one pizza delivered with both exclusions and extras
9. What was the total volume of pizzas ordered for each hour of the day?
I used HOUR() to extract ‘hour’ from the order_date column, and COUNT the order_id to find out the total volume of pizzas ordered. Added condition with the WHERE clause that order_date is not equal to “ ” (a blank space). We can group it with the column we extracted before which is hour_of_day using GROUP BY and sorted it in ascending order using ORDER BY clause.
- The highest volume of pizzas ordered at 13.00, 18.00, 21.00, and 23.00
- The lowest volume of pizzas ordered at 11.00 and 19.00
10. What was the volume of orders for each day of the week?
I used DAYNAME clause to extract the name of day from order_date column. Then COUNT it all to find out the volume of orders, you can use GROUP BY and ORDER BY clauses to grouped and sorted it by the day_of_week (the extracted day name).
- The highest volume of orders on Saturday and Wednesday with 5 pizzas.
- There are 3 ordered of pizzas on Thursday.
- The lowest volume of orders on Friday with only 1 pizza.
You can see the rest of the query on my Github by clicking the links below:
Along the journey you will find several temporary tables to simplify the query.
Thank you so much for reading and I hope you find this post useful. :)