My Experience with Machine Learning: From Data to Insights

3 min read · Aug 5, 2024

Through a series of structured and challenging tasks, I got the chance to dig deeper into what machine learning is all about. This learning path strengthened my understanding of fundamental ML concepts and let me explore how different models and techniques are implemented. Much like a month-long A-Z creative writing challenge, I completed several tasks and learned something from each one. This article covers some of the highlights.

Problem Solving by Using Linear Regression for Housing Prices

The most revealing task was applying a linear regression model to make predictions of housing prices with the Boston Housing dataset. There were a few crucial steps to this task:

1. Dataset: I loaded the Boston Housing dataset with scikit-learn. Its features include the number of rooms, the crime rate, and the property tax rate.

2. Model Training: I trained a linear regression model on the data.

3. Model Coefficients and Intercept: I printed the model's coefficients and intercept in a Jupyter notebook to show the impact each feature has on housing prices.

4. Predictions and Evaluation: I predicted house prices on a test set and calculated the mean squared error to evaluate the model.

5. Visualization: I drew a scatter plot with a regression line to better understand how well the model fits.

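# Imports used by this example
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the dataset
boston = load_boston()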
X = boston.data
y = boston.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Visualization
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Housing Prices')
plt.show()
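
Since `load_boston` is no longer available in recent scikit-learn releases, here is a minimal sketch of the same steps on synthetic data, including the coefficient and intercept printout from step 3. The data comes from `make_regression`, and the feature names are hypothetical labels I picked for illustration, not the real Boston columns.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a housing table: 200 rows, 3 numeric features.
# The feature names are hypothetical labels, not columns of the Boston dataset.
feature_names = ["rooms", "crime_rate", "tax_rate"]
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# One coefficient per feature: the change in the prediction for a one-unit
# increase in that feature, holding the other features fixed.
for name, coef in zip(feature_names, model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")

# Mean squared error is the average of the squared residuals,
# so the library value should match the manual computation.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
manual_mse = np.mean((y_test - y_pred) ** 2)
print(f"MSE: {mse:.3f}")
```

The exact numbers depend on the generated data, so treat the printout as a shape to expect rather than a result about housing prices.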

Iris Dataset Decision Tree Classifier

Another exciting project was building a decision tree classifier to classify species of iris flowers. The steps were loading the Iris dataset, training the classifier, testing its performance by printing a classification report and confusion matrix, and finally visualizing the decision tree to see how its decisions are made.

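# Predict and evaluate (clf is the trained decision tree classifier)
from sklearn.metrics import classification_report, confusion_matrix

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))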
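
For context, here is a minimal end-to-end sketch of the steps described above, from loading the data to inspecting the tree. The split ratio, `max_depth=3`, and the `random_state` values are my own illustrative assumptions, and I print the tree as text with `export_text` instead of drawing it, so it may differ from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the Iris dataset and hold out 20% for testing, keeping class balance
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# max_depth=3 is an illustrative choice that keeps the printed tree short
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Per-class precision, recall, and F1 on the held-out set
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Text rendering of the learned splits, one line per node
tree_rules = export_text(clf, feature_names=list(iris.feature_names))
print(tree_rules)
```

Reading the printed rules from the top down shows which feature thresholds the tree checks first, which is the quickest way to see how a prediction is reached.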

Exploratory Data Analysis and Basic Statistics

This series of tasks built an in-depth understanding of data exploration and simple statistics. Common activities were loading and viewing datasets, wrapping the final preprocessing steps in a reusable function or class, counting the samples and features in a dataset, computing basic statistics (mean, median, standard deviation) for every feature, and drawing bar plots and histograms of specific columns to see their distributions.

df.describe()
df.hist(column='feature_name')
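
To make these activities concrete, here is a small sketch that uses the Iris measurements as a stand-in dataset (an assumption; the original tasks may have used other data). The `summarize` helper is a hypothetical name of my own, and the histogram is computed as bin counts with NumPy rather than plotted.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median and standard deviation for each numeric column."""
    return df.agg(["mean", "median", "std"]).T


# as_frame=True returns the features as a pandas DataFrame
df = load_iris(as_frame=True).data

# Rows are samples, columns are features
n_samples, n_features = df.shape
print(f"{n_samples} samples, {n_features} features")
print(df.head())

# One row per feature, one column per statistic
summary = summarize(df)
print(summary)

# Bin counts for one column: the numbers a histogram plot would display
counts, bin_edges = np.histogram(df["sepal length (cm)"], bins=5)
print(counts)
```

Keeping the statistics in a function like this makes it easy to rerun the same summary on any new DataFrame.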

Simple Linear Regression

I also did an exercise in which I built a simple linear regression model using scikit-learn, which helped me understand how it works in practice and which steps to consider when evaluating it.

model.fit(X_train, y_train)
print("Coefficient:", model.coef_)
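
As a rough illustration of that exercise, the sketch below fits a one-feature model on synthetic data generated from a known line, so the learned coefficient can be checked against the truth. The slope of 3, the intercept of 2, the noise level, and the use of R² as the evaluation metric are all my assumptions, not details from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus Gaussian noise (assumed values for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# With a single feature, coef_ holds one slope value
print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)

# R^2 measures the share of variance in y_test explained by the model
r2 = r2_score(y_test, model.predict(X_test))
print("R^2 on test set:", r2)
```

Because the data was generated from a known line, the fitted slope and intercept should land close to 3 and 2, which is a handy sanity check before moving to real data.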

My experience as a fellow at Bytewise has been awesome and full of learning. These tasks covered the practical side of implementing and evaluating machine learning models, exploring data, and understanding the ideas behind different types of algorithms. Besides increasing my technical knowledge, this journey also gave me the confidence to apply machine learning to real-world problems.

Feel free to check out my GitHub repository 100DaysOfML-DLBytewise to see the detailed implementations and more.


Written by Mohammad Yaqoob

👋 Hi there! I’m a Computer Science student at Sukkur IBA University and a Machine Learning Engineer. Follow for insights on tech, coding, and data science!