Site icon Research

Data Analytics using Python

Setup your environment

Installed Microsoft VS Code
Get a Github education account
Installed extensions like Github Copilot, Data Wrangler on VS Code.
Use Data Diagram tools like SQLDraw

Visually scan datasets

Examine each dataset visually to identify patterns. E.g. What is EmpCode? Is it EmployeeCode?

Change easily readable variable names by using a widely accepted naming convention.

Using a tool like Excel or even NotePad

Rename Empcode to employeeCode or EmployeeCode or employee_code, so you understand it better.

Using DataFrames in Python

# Rename columns in the DataFrames for a better naming convention
df_employee.rename(columns={'EmpCode': 'employee_Code', 'Gender': 'gender', 'DateofJoin': 'joined_Date', 'Estate': 'estate_Name'}, inplace=True)

Check for data quality (null, wrongly input data, incomplete records)

Treat missing values

Here you can use a ML classifier like K-Nearest Neighbor

Identify relationships between datasets

Here you can run matching commands in python to identify unique values and match them in different dataframes.

Merge DataFrames

# merge the two dataframes on the basis of EmpCode and ID columns
merged_employee_fields_df = pd.merge(employee_df, fields_df, left_on='employee_Code', right_on='field_Id')
Exit mobile version