Hi friends! I’m sorry I haven’t posted the next challenge in a while because I have a lot of things to do in real life. But, today I push myself to write this post so we can learn SQL further.
As you know, this SQL challenges really improve my logical thinking to solve the problem through the query language. This is the third challenge of the 8 Week SQL challenge. This challenge gets harder on each week. But, if you sit down for little longer I believe your skills will improve. Check out the challenge here!
Subscription based businesses are super popular and Danny realised that there was a large gap in the market — he wanted to create a new streaming service that only had food related content — something like Netflix but with only cooking shows!
Danny finds a few smart friends to launch his new startup Foodie-Fi in 2020 and started selling monthly and annual subscriptions, giving their customers unlimited on-demand access to exclusive food videos from around the world!
Danny created Foodie-Fi with a data driven mindset and wanted to ensure all future investment decisions and new features were decided using data. This case study focuses on using subscription style digital data to answer important business questions.
Danny has shared the data design for Foodie-Fi and also short descriptions on each of the database tables — our case study focuses on only 2 tables but there will be a challenge to create a new table for the Foodie-Fi team.
All datasets exist within the
foodie_fi database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.
Entity Relationship Diagram
Table 1: plans
Customers can choose which plans to join Foodie-Fi when they first sign up. Basic plan customers have limited access and can only stream their videos and is only available monthly at $9.90. Pro plan customers have no watch time limits and are able to download videos for offline viewing. Pro plans start at $19.90 a month or $199 for an annual subscription. Customers can sign up to an initial 7 day free trial will automatically continue with the pro monthly subscription plan unless they cancel, downgrade to basic or upgrade to an annual pro plan at any point during the trial. When customers cancel their Foodie-Fi service — they will have a
churn plan record with a
null price but their plan will continue until the end of the billing period.
Table 2: subscriptions
Customer subscriptions show the exact date where their specific
If customers downgrade from a pro plan or cancel their subscription — the higher plan will remain in place until the period is over — the
start_date in the
subscriptions table will reflect the date that the actual plan changes.
When customers upgrade their account from a basic plan to a pro or annual pro plan — the higher plan will take effect straightaway.
When customers churn — they will keep their access until the end of their current billing period but the
start_date will be technically the day they decided to cancel their service.
You can check this link to create these tables.
Case Study Questions
This case study is split into an initial data understanding question before diving straight into data analysis questions before finishing with 1 single extension challenge.
Based off the sample customers provided in the sample from the
subscriptions table, write a brief description about each customer’s onboarding journey.
Try to keep it as short as possible — you may also want to run some sort of join to make your explanations a bit easier!
Use the SELECT statement to return the customer_id, plan_name, and start_date to find the customer’s journey. Here is the result of one of the customers.
As you can see, the customer number 6 started the trial plan in 2020–12–23 and upgraded the plan to basic monthly a week later, until the customer number 6 churned the plan in 2021–02–26. That’s how you read the customer’s journey.
Data Analysis Questions
- How many customers has Foodie-Fi ever had?
You can use the COUNT DISTINCT clause to return a count of number rows of customer_id with different values.
Foodie-Fi has 1000 customers.
2. What is the monthly distribution of trial plan start_date values for our dataset — use the start of the month as the GROUP BY value
- Use the MONTH clause to return the month part of the date and named it as months for the GROUP BY value
- Use COUNT clause to count the distribution of trial plan, since every customer starts with the trial plan we don’t need to filter the plan_id.
March has the biggest number of trial plan distribution.
3. What plan ‘start_date’ values occur after the year 2020 for our dataset? Show the breakdown by count of events for each ‘plan_name’.
- First, we have to JOIN the start_date which in subscriptions table and plan_name which in plans table.
- To return values after the year 2020, we can filter the values by using WHERE statement.
The number of customers who churned the plan is the biggest one after the year 2020.
4. What is the customer count and percentage of customers who have churned the rounded to 1 decimal place?
To find the percentage we can calculate the number of rows multiplied by 100 divided by the total number of customers, then we use ROUND clause and type 1 to rounded to 1 decimal place.
- It is 30.7% of customers who have churned the plans.
5. How many the customers have churned straight after their initial free trial — what the percentage is this rounded to the nearest whole number?
- We use CTE to create a temporary result that can be referred later on.
- LAG clause is used to return the value of the expression from the row that precedes the current row by offset number of rows within its partition or result set.
- Then, we named the result from the LAG clause as prev_plan so we can recall it outside the CTE set.
There are 92 customers who have churned straight after their initial free trial, which 9% of the customer base.
6. What is the number and percentage of customer plans after their initial free trial?
- First, we create CTE to look forward a number of rows and access data of that row from the current row by using LEAD clause and named it as next_plan.
- To calculate the percentage, we can multiply the number of transactions by 100 and divide it by the number of customer.
- Filter the plan_id = 0 because we want to find the percentage of the trial plan.
- 54.6% of customers choose basic monthly or next_plan = 1 after their initial trial.
- 32.5% of customers choose pro monthly or next_plan = 2 after their initial trial.
- 3.7% of customers choose pro annual or next_plan = 3 after their initial trial.
- 9.2% of customers choose churn or next_plan = 4 after their initial trial.
7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020–12–31?
To breakdown all 5 plan_name at the exact date which is 2020–12–31, there are few steps to do:
- Create CTE to find out the date everytime the customers change their subscription plans, then filter the start_date less or equal to ‘2020–12–31’. Named it as cte_next_date.
- Create other cte to count how many customers take each subscriptions.
- Last, outside the cte we call the plan_id, total customers of each plans, and the percentage of each plans.
8. How many customers have upgraded to an annual in 2020?
Use the COUNT clause to find the number of customers who take the ‘pro annual’ that we can filter using the WHERE clause of plan_id = 3.
There’s 195 customers who have upgraded to annual in 2020.
9. How many days on average does it take for a customer to an annual plan from the day they joined Foodie-Fi?
- Create two CTEs to distinguish the plan_id, first cte to filter the annual plan and the second cte is to filter the trial plan using the WHERE clause.
- Then, use DATEDIFF to return the number of days between two date values.
- Next, we find the average of days for how long the customers take the annual plan using AVERAGE clause.
On average, it takes 105 days for a customer take an annual plan from the day they joined Foodie-Fi.
10. Can you further breakdown this average value into 30 day periods? (i.e. 0–30 days, 31–60 days etc)
From the previous query we can add a few statements to breakdown the average value into 30 days period:
- Create the new cte named bins and use the FLOOR clause to return the largest integer value that is less than or equal to the difference between annual_date and trial_date (named it as diff in day_period CTE) divided by 30.
- Use CONCAT clause to create the 30 days period.
11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?
- Use LEAD clause to find out the customers’ further plans.
- Then COUNT different value of customer_id using COUNT DISTINCT.
- Last, filter the plan_name = ‘pro monthly’ and plan_id = 1 which is basic monthly, and start_date <= ‘2020–12–31’
There is none of the customers downgraded from a pro monthly to a basic monthly plan in 2020.
You can see the rest of the query on my Github by clicking this link. Let me know if you find the easier way to query this challenge. Thank you so much for reading and I hope you find this post useful. :)