Duplicate Emails

Write an SQL query to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL. Return the result table in any order.

table name: Person


Input:

id | email
1  | a@b.com
2  | c@d.com
3  | a@b.com

Expected Output:

email
a@b.com
Question Explanation:
We are given a table named Person that stores each person's id and email. The same email can be stored multiple times. Our task is to return the emails that are not unique, that is, the ones stored more than once in the table.
Concepts we are going to cover before jumping directly to the solution:

    • count() function
    • group by clause
    • having clause


    Count() Function
    count() is a built-in function that counts the number of rows matched by a query.

    Consider the queries below:
    select count(*) from Person
    select count(id) from Person
    Since we haven't provided a where clause, each query reads all 3 rows of the table, counts them, and returns 3 as its output. (count(id) counts only rows where id is not NULL, which here is every row.)
    Functions like count() are called aggregate functions.
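To try the two queries without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an in-memory SQLite table as a stand-in for the article's Person table, loaded with the same three rows as the SQL script at the end of the post.

```python
import sqlite3

# In-memory SQLite database standing in for the article's SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER NOT NULL, email VARCHAR(150))")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# count(*) counts every row; count(id) counts rows whose id is not NULL.
total_rows = conn.execute("SELECT count(*) FROM Person").fetchone()[0]
non_null_ids = conn.execute("SELECT count(id) FROM Person").fetchone()[0]

print(total_rows, non_null_ids)  # prints "3 3"
conn.close()
```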

    Group by Clause
    The group by clause collects rows that share the same value in one or more columns into groups. Aggregate functions are then applied to each group to produce one summary row per group.
    Consider a table student where we have columns like id,name,class and score. We want to count number of students in each class.
    select class, count(*) as student_count from student group by class
    It divides the records into one group per class, counts the students in each group, and returns one row per class as output.
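The grouping step can be sketched with Python's sqlite3 module. The student rows below are hypothetical, made up only for illustration; the article does not list the table's contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical student table; the rows are invented for this example.
conn.execute(
    "CREATE TABLE student (id INTEGER, name TEXT, class TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", "A", 81), (2, "Ben", "A", 74), (3, "Chen", "B", 90)],
)

# One output row per distinct class, with the number of students in it.
# ORDER BY is added only to make the row order predictable.
class_counts = conn.execute(
    "SELECT class, count(*) AS student_count FROM student "
    "GROUP BY class ORDER BY class"
).fetchall()

print(class_counts)  # prints [('A', 2), ('B', 1)]
conn.close()
```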

    Output

    We are going to apply the same concept to our main question, which is to return duplicate emails. Since id is unique for each record, grouping by the id column won't help: it would create one group per row.
    If we group by email instead, one group is created for each unique email in the table.

    select email from Person group by email
    Two groups are created internally, as there are only 2 unique emails.

    Count() with group by
    Now, let's count how many times a particular email is repeated.
    For example
    • a@b.com appears two times.
    • c@d.com appears only once.
    select count(*) as repeated, email from Person group by email
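As a quick check of the per-email counts, here is a small sketch using Python's sqlite3 module. It assumes an in-memory SQLite copy of the Person table with the three rows from the article's SQL script; the ORDER BY is an addition to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER NOT NULL, email VARCHAR(150))")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Count how many rows fall into each email group.
email_counts = conn.execute(
    "SELECT count(*) AS repeated, email FROM Person "
    "GROUP BY email ORDER BY email"
).fetchall()

print(email_counts)  # prints [(2, 'a@b.com'), (1, 'c@d.com')]
conn.close()
```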


    Did you guess the solution? Our main question was to return all the emails that are repeated, meaning emails that are present more than once.
    Can we apply a condition like "return all the emails whose count(email) is greater than 1"?

    Yes, that is the right idea. A first attempt might look like this, but it is not valid:
    select email from Person group by email
    where count(email) >1

    We can't apply a where clause after group by. The where clause filters individual rows of the table before they are grouped, and returns only the rows that satisfy its condition. By the time group by has run, the rows have already been collected into groups, so a different clause is needed: having, which filters the groups created by group by.
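The restriction on where can be observed directly. The sketch below uses Python's sqlite3 module with an in-memory copy of the Person table; is_rejected is a hypothetical helper written for this example. SQLite is only a stand-in here, and SQL Server reports its own error messages for the same mistakes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER NOT NULL, email VARCHAR(150))")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)


def is_rejected(query):
    """Return True if SQLite refuses to run the query (hypothetical helper)."""
    try:
        conn.execute(query)
    except sqlite3.OperationalError as exc:
        print("rejected:", exc)
        return True
    return False


# where placed after group by is a syntax error.
where_after_group_by = is_rejected(
    "SELECT email FROM Person GROUP BY email WHERE count(email) > 1"
)

# Moving where before group by does not help: where filters single rows,
# so it cannot contain an aggregate such as count().
aggregate_in_where = is_rejected(
    "SELECT email FROM Person WHERE count(email) > 1 GROUP BY email"
)

conn.close()
```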

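    The having clause is meant to be used together with group by and aggregate functions. A query like the one below, which uses having on a column that is neither grouped nor aggregated, is rejected by SQL Server:
    select email from Person having email=''
    A where clause can be combined with group by, but only before the group by clause, not after it.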

    This is a valid query:
    select email from Person where email ='' group by email

    Final Solution:
    select email from Person group by email
    having count(email)>1
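To confirm the final query, here is a minimal sketch that runs it with Python's sqlite3 module. It assumes an in-memory SQLite copy of the Person table holding the three rows from the article's SQL script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER NOT NULL, email VARCHAR(150))")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Group rows by email, then keep only the groups with more than one row.
duplicates = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM Person GROUP BY email HAVING count(email) > 1"
    )
]

print(duplicates)  # prints ['a@b.com']
conn.close()
```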

    Output:

    SQL Script:
    CREATE TABLE [dbo].[Person](
    [id] [int] NOT NULL,
    [email] [varchar](150) NULL
    )
    GO
    INSERT [dbo].[Person] ([id], [email]) VALUES (1, N'a@b.com')
    GO
    INSERT [dbo].[Person] ([id], [email]) VALUES (2, N'c@d.com')
    GO
    INSERT [dbo].[Person] ([id], [email]) VALUES (3, N'a@b.com')
    GO
