Leveraging Correlation and Clustering: An Exploration of Data Scientist Salaries

Authors

  • Chandra Agoeng Faculty of Computer Science, Universitas Mercu Buana, 11650 Jakarta, Indonesia
  • Nurul Dini Faqriah Miza Azmi Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia
  • Hakimah Mat Harun Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia
  • Nurzulaikha Abdullah Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia
  • Wan Azani Mustafa Faculty of Electrical Engineering & Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis
  • Fakhitah Ridzuan Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia

DOI:

https://doi.org/10.37934/arca.35.1.1020

Keywords:

Salary, exploratory data analysis, data scientist, clustering, correlation

Abstract

Data science is a dynamic field with ever-evolving job descriptions and salary structures. Although the field offers high earning potential, the factors that influence data scientist salaries remain unclear. This lack of clarity makes it difficult for employers to set competitive compensation packages and for employees to understand how career factors such as experience level and job title affect their earning potential. This study therefore explores the relationships between salary and related variables. Correlation analysis was used to identify the strength and direction of linear relationships between the attributes in the dataset. In addition, k-means clustering was used to group data scientists with similar characteristics, allowing potential salary segments within the data science field to be explored. A very strong correlation was found between employee residence and company location (r = 0.90). Salary showed a significant, moderate positive correlation with company location (r = 0.46), employee residence (r = 0.48), and experience level (r = 0.41). The clustering analysis divided the data into four salary groups, each associated with a popular data science role. Employers can use these findings to design salary packages that take location and experience into account.
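The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the synthetic records, the integer encoding of the categorical attributes, the column order, and the helper names `pearson_matrix` and `kmeans` are all assumptions made for illustration. Only the choice of four clusters is taken from the abstract.

```python
import numpy as np

def pearson_matrix(X):
    """Pearson correlation between the columns of X (rows are observations)."""
    return np.corrcoef(X, rowvar=False)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical synthetic records; categorical attributes are assumed to be
# integer-encoded (an illustration only, not the study's dataset).
rng = np.random.default_rng(42)
n = 200
experience = rng.integers(0, 4, size=n).astype(float)   # assumed level codes
residence = rng.integers(0, 5, size=n).astype(float)    # assumed country codes
company_location = residence.copy()
remote = rng.random(n) < 0.1                             # a few work abroad
company_location[remote] = rng.integers(0, 5, size=remote.sum())
salary = (40000 + 15000 * experience + 8000 * company_location
          + rng.normal(0, 10000, size=n))

X = np.column_stack([experience, residence, company_location, salary])

# Correlation analysis: strength and direction of linear relationships.
corr = pearson_matrix(X)

# Standardize so salary does not dominate the distances, then cluster.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
labels, centroids = kmeans(Z, k=4)  # four groups, as in the abstract
```

Here `corr[1, 2]` holds the residence and company location coefficient for the synthetic data, and `labels` assigns each record to one of the four clusters. Standardizing before clustering is a design choice of this sketch; the abstract does not state how the attributes were scaled.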

Author Biography

Fakhitah Ridzuan, Faculty of Data Science and Computing, Universiti Malaysia Kelantan, 16100 Kota Bharu, Kelantan, Malaysia

fakhitah.r@umk.edu.my

Published

2024-05-17

Issue

Section

Articles