Leveraging Correlation and Clustering: An Exploration of Data Scientist Salaries

Chandra Agoeng; Nurul Dini Faqriah Miza Azmi; Hakimah Mat Harun; Nurzulaikha Abdullah; Wan Azani Mustafa; Fakhitah Ridzuan

doi:10.37934/arca.35.1.1020

Authors

Chandra Agoeng, Faculty of Computer Science, Universitas Mercu Buana, 11650 Jakarta, Indonesia
Nurul Dini Faqriah Miza Azmi, Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia
Hakimah Mat Harun, Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia
Nurzulaikha Abdullah, Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia
Wan Azani Mustafa, Faculty of Electrical Engineering & Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia
Fakhitah Ridzuan, Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia

Keywords:

Salary, exploratory data analysis, data scientist, clustering, correlation

Abstract

Data science is a dynamic field with ever-evolving job descriptions and salary structures. Although data science offers high earning potential, the factors that influence data scientist salaries remain unclear. This lack of clarity makes it difficult for employers to set competitive compensation packages and for employees to understand how career choices, such as experience level and job title, affect their earning potential. This study therefore explores the relationships between salary and related variables. Correlation analysis was used to measure the strength and direction of the linear relationships among the attributes in the dataset, and k-means clustering was used to group data scientists with similar characteristics so that potential salary segments within the data science field could be explored. Employee residence and company location were very strongly correlated (r = 0.90). Salary showed a significant moderate positive correlation with company location (r = 0.46), employee residence (r = 0.48) and experience level (r = 0.41). The clustering analysis divided the data into four salary groups associated with four popular data science roles. Employers can therefore use these findings to design salary packages that take location and experience level into account.
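The two techniques named in the abstract can be illustrated with a minimal, standard-library-only sketch. It assumes that categorical attributes such as experience level have already been encoded as numbers, and it uses Pearson's r as the correlation coefficient and Lloyd's algorithm for k-means. These are common choices, not confirmed details of the study. The function names `pearson_r` and `kmeans`, the 0-3 experience encoding and the toy salary figures are all hypothetical and do not reproduce the authors' data or results.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation: strength and direction of a linear relationship."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data (NOT from the study): experience level encoded 0-3
# (entry, mid, senior, executive) and annual salary in thousands of USD.
experience = [0, 0, 1, 1, 2, 2, 3, 3]
salary_k = [50, 60, 80, 90, 130, 140, 200, 210]

r = pearson_r(experience, salary_k)
print(f"r(experience, salary) = {r:.2f}")

centroids, labels = kmeans(list(zip(experience, salary_k)), k=4)
print("cluster labels:", labels)
```

Because k-means relies on Euclidean distance, features on very different scales (here, salary in thousands versus a 0-3 experience code) are usually standardized before clustering. The sketch skips that step for brevity. Random initialization also means that different seeds can converge to different local optima.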