Full Data Analysis Project: SQL + Tableau (HR Dataset)

An HR dataset was preprocessed, and three Tableau dashboards were created to highlight key HR KPIs.

Introduction

In this project, I preprocessed an HR dataset and created three insightful Tableau dashboards to visualize the data. The objective was to clean and transform the raw data to make it suitable for analysis, and then present key insights in an interactive and user-friendly dashboard.

These dashboards and analyses are based on HR KPIs (Key Performance Indicators). For a deeper understanding, you can read my Medium articles about the most important HR KPIs, available in two parts with formulas and examples:


Project Steps

1. Data Collection : 

The dataset was sourced from HR Dataset Kaggle.
It contained 311 records and 36 features, including:

Employee_NameEmpIDMarriedIDMaritalStatusID
GenderIDEmpStatusIDDeptIDPerfScoreID
FromDiversityJobFairIDSalaryTermdPositionID
PositionStateStateDOB
SexMaritalDescCitizenDescHispanicLatino
RaceDescDateofHireDateofTerminationTermReason
EmploymentStatusDepartmentManagerNameManagerID
RecruitmentSourcePerformanceScoreEngagementSurveyEmpSatisfaction
SpecialProjectsCountLastPerformanceReview_DateDaysLateLast30Absences

2. Data Cleaning by SQL :  

Data cleaning was performed to ensure the dataset was free from errors and inconsistencies.
The following steps were taken:

  • Remove Duplicates
  • Standardizing data
  • Convert data types
  • Check Missing/NULL Values

Overview of the SQL Database (PostgreSQL)

Download SQL queries: ../Data-Cleaning-SQL-Queries.sql
SQL
CREATE TABLE employee_tbl (
    Employee_Name VARCHAR(100),
    EmpID INT PRIMARY KEY,
    MarriedID INT,
    MaritalStatusID INT,
    GenderID INT,
    EmpStatusID INT,
    DeptID INT,
    PerfScoreID INT,
    FromDiversityJobFairID INT,
    Salary NUMERIC(12, 2),
    Termd VARCHAR(10),
    PositionID INT,
    Position VARCHAR(100),
    State VARCHAR(50),
    Zip VARCHAR(20),
    DOB VARCHAR(20),
    Sex VARCHAR(10),
    MaritalDesc VARCHAR(50),
    CitizenDesc VARCHAR(50),
    HispanicLatino VARCHAR(10),
    RaceDesc VARCHAR(50),
    DateofHire VARCHAR(20),
    DateofTermination VARCHAR(20),
    TermReason VARCHAR(100),
    EmploymentStatus VARCHAR(50),
    Department VARCHAR(50),
    ManagerName VARCHAR(100),
    ManagerID INT,
    RecruitmentSource VARCHAR(100),
    PerformanceScore VARCHAR(50),
    EngagementSurvey NUMERIC(5, 2),
    EmpSatisfaction INT,
    SpecialProjectsCount INT,
    LastPerformanceReview_Date VARCHAR(20),
    DaysLateLast30 INT,
    Absences INT
);


SELECT *
FROM employee_tbl

-- Data Cleaning

-- Remove Duplicates
SELECT *,
       ROW_NUMBER() OVER(PARTITION BY employee_name, empid, salary, dob) AS ROW_NUM
FROM employee_tbl;



WITH duplicate_cte AS
(
SELECT *,
       ROW_NUMBER() OVER(PARTITION BY employee_name, empid, salary, dob) AS ROW_NUM
FROM employee_tbl
)
SELECT *
FROM duplicate_cte
WHERE ROW_NUM > 1;



-- Standardizing data

SELECT position, TRIM(position)
FROM employee_tbl


UPDATE employee_tbl
SET position = TRIM(position)


-- Convert data types

SELECT dob, TO_DATE(dob, 'MM/DD/YY')
FROM employee_tbl;


UPDATE employee_tbl
SET dob = TO_DATE(dob, 'MM/DD/YY');


SELECT dob
FROM employee_tbl;


ALTER TABLE employee_tbl
ALTER COLUMN dob TYPE DATE
USING dob::DATE;


-- Convert other Columns

SELECT dateoftermination, TO_DATE(dateoftermination, 'MM/DD/YY'),
		lastperformancereview_date, TO_DATE(lastperformancereview_date, 'MM/DD/YY'),
		dateofhire, TO_DATE(dateofhire, 'MM/DD/YY')
FROM employee_tbl;

ALTER TABLE employee_tbl
ALTER COLUMN dateoftermination TYPE DATE
USING TO_DATE(dateoftermination, 'MM/DD/YY');


ALTER TABLE employee_tbl
ALTER COLUMN lastperformancereview_date TYPE DATE
USING TO_DATE(lastperformancereview_date, 'MM/DD/YY');


ALTER TABLE employee_tbl
ALTER COLUMN dateofhire TYPE DATE
USING TO_DATE(dateofhire, 'MM/DD/YY');



SELECT dateoftermination,
		lastperformancereview_date,
		dateofhire
FROM employee_tbl;


-- check Missing/NULL Values 

SELECT *
FROM employee_tbl

SELECT DISTINCT termreason
FROM employee_tbl
WHERE termreason IS NULL


SELECT employee_name ,dateoftermination, termreason
FROM employee_tbl
WHERE  dateoftermination IS NULL
View Full Code


3. Connecting Tableau to PostgreSQL as a data source


4. Visualize the Data by Tableau

  • Or Download the PDF version
 ../People-Analytics-Dashboard.pdf


Remind that codes can be found in the corresponding GitHub repository, or you can access them below :


Preprocessing by SQL (.sql)
HR Dataset (.csv)
Tableau Dashboards file (.twbx)(.pdf)

Thank you for taking the time to read this article; your valuable feedback is warmly welcomed.
Furthermore, I would be happy to assist you in solving a puzzle in your Data journey.
pouya [at] sattari [dot] org