<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[What is Artificial Intelligence? A Beginner’s Guide]]></title><description><![CDATA[A step-by-step journey through AI, Machine Learning, and Deep Learning, covering concepts, applications, and practical insights.]]></description><link>https://ai-beginners-journey.hashnode.dev</link><image><url>https://cdn.hashnode.com/uploads/logos/6942cafec9d5320a12aa01b1/8bfb40bf-90b6-42f2-9ad6-2b1a10e2e1a1.png</url><title>What is Artificial Intelligence? A Beginner’s Guide</title><link>https://ai-beginners-journey.hashnode.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 18 Jun 2026 02:08:14 GMT</lastBuildDate><atom:link href="https://ai-beginners-journey.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Part 10: Introduction to Linear Regression]]></title><description><![CDATA[So far in this series, we've explored datasets, preprocessing, train-test splitting, overfitting, underfitting, bias, and variance. 
Now it's time to dive into the algorithms that actually make predict]]></description><link>https://ai-beginners-journey.hashnode.dev/understanding-linear-regression</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/understanding-linear-regression</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Beginner Developers]]></category><category><![CDATA[linearregression]]></category><category><![CDATA[journey into tech]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Thu, 11 Jun 2026 17:03:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/f1671c20-2413-456b-adb9-21d63d0a85f6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So far in this series, we've explored datasets, preprocessing, train-test splitting, overfitting, underfitting, bias, and variance. Now it's time to dive into the algorithms that actually make predictions.</p>
<p>We'll begin with one of the simplest and most important supervised learning algorithms: <strong>Linear Regression</strong>.</p>
<p>Despite its simplicity, Linear Regression is widely used in real-world applications and serves as the foundation for many advanced machine learning techniques.</p>
<p>Let's understand how it works.</p>
<h2>🤔<strong>What is Linear Regression?</strong></h2>
<p>Linear Regression is a supervised learning algorithm used to predict <strong>continuous values</strong> by finding the relationship between input features and an output variable.</p>
<p>A continuous value is a number that can take any value within a range.</p>
<p>In simple terms, it helps answer questions like:</p>
<ul>
<li><p>What will the house price be?</p>
</li>
<li><p>What will next month's sales be?</p>
</li>
<li><p>What score might a student get in an exam?</p>
</li>
</ul>
<p>Unlike classification algorithms that predict categories such as "Spam" or "Not Spam," Linear Regression predicts numerical values.</p>
<h2>📈<strong>Prediction of Continuous Values</strong></h2>
<p>Imagine you're trying to predict a student's exam score based on the number of hours they study.</p>
<p>You might notice a pattern:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/45c457a8-8cf5-4dd1-9fef-3c7d8c5296e6.png" alt="" style="display:block;margin:0 auto" />

<p>As study hours increase, exam scores tend to increase as well.</p>
<p>Linear Regression learns this relationship from the data and uses it to make predictions.</p>
<p>For example:</p>
<p>If a student studies for 7 hours, the model can estimate their expected score.</p>
<p>This ability to predict numerical values is what makes Linear Regression useful for many real-world problems.</p>
<h2>🎯<strong>Understanding the Line Fitting Intuition</strong></h2>
<p>Suppose we plot the study hours and exam scores on a graph.</p>
<p>The points may not form a perfect straight line, but we can still see a trend.</p>
<p>Linear Regression tries to draw a line that best represents this trend.</p>
<p>The goal is not to pass through every data point.</p>
<p>Instead, the goal is to find a line that stays as close as possible to most of the points.</p>
<p>Think of it like this:</p>
<p>Imagine placing a ruler through a scattered set of points on paper. You would position it where it represents the overall pattern rather than trying to touch every point.</p>
<p>Linear Regression does the same thing mathematically.</p>
<p>This line is often called the <strong>Best Fit Line</strong>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/66ffcd2d-db01-435f-8324-26ee370140ae.png" alt="" style="display:block;margin:0 auto" />
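<p>To make the best-fit line concrete, here is a minimal sketch in plain Python. The study-hours data is made up for illustration, and <code>fit_line</code> is a hypothetical helper name. It uses the least-squares formulas, which are the usual way of choosing the line for a single input feature.</p>
<pre><code class="language-python"># Hypothetical data: hours studied and the exam score each student got.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 73]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how x and y vary together, divided by how much x varies.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 7 + intercept
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"Predicted score for 7 hours: {predicted_score:.1f}")
</code></pre>
<p>The line does not pass through every point, but it gives an estimate for a value such as 7 hours that was never in the data.</p>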

<h2>🧮<strong>How Does the Model Know If the Line Is Good?</strong></h2>
<p>Once a line is drawn, the model compares its predictions with the actual values.</p>
<p>For every data point:</p>
<p><strong>Error = Actual Value − Predicted Value</strong></p>
<p>A good line produces smaller errors.</p>
<p>A poor line produces larger errors.</p>
<p>The model's goal is to reduce these errors as much as possible.</p>
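<p>As a small sketch of the error formula above, the snippet below computes the error for each data point. Both the data and the candidate line (<code>score = 4 * hours + 50</code>) are made-up values chosen only for illustration.</p>
<pre><code class="language-python"># Hypothetical data and a hypothetical candidate line: score = 4 * hours + 50.
hours = [1, 2, 3, 4, 5, 6]
actual_scores = [52, 55, 61, 64, 70, 73]
slope, intercept = 4.0, 50.0

errors = []
for h, actual in zip(hours, actual_scores):
    predicted = slope * h + intercept
    error = actual - predicted  # Error = Actual Value - Predicted Value
    errors.append(error)
    print(f"hours={h}  actual={actual}  predicted={predicted:.1f}  error={error:+.1f}")
</code></pre>
<p>A negative error means the line predicted too high for that point, and a positive error means it predicted too low.</p>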
<p>This is where the concept of a <strong>Cost Function</strong> comes in.</p>
<h2>📉<strong>Cost Function (High-Level Overview)</strong></h2>
<p>A Cost Function is a measure of how well or poorly the model is performing.</p>
<p>It calculates the overall error made by the model across all data points.</p>
<p>In simple words, it tells us <strong>how far the model's predictions are from the actual values</strong>.</p>
<h3><strong>Simple Idea</strong></h3>
<ul>
<li><p>Smaller cost = Better predictions</p>
</li>
<li><p>Larger cost = Worse predictions</p>
</li>
</ul>
<p>During training, Linear Regression continuously adjusts the line to reduce the cost.</p>
<p>You can think of it as a student improving their answers after checking mistakes on a practice test.</p>
<p>The fewer mistakes they make, the better their performance.</p>
<p>Similarly, the model keeps improving its line until the overall error becomes as small as possible.</p>
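<p>The article does not name a specific cost function, so the sketch below assumes one common choice, the mean squared error (MSE). It compares two hypothetical candidate lines on the same made-up data to show that the better line gets the smaller cost.</p>
<pre><code class="language-python">hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 73]

def mean_squared_error(slope, intercept, xs, ys):
    """Average of the squared errors for the line y = slope * x + intercept."""
    # Squaring stops positive and negative errors from cancelling out.
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / len(squared)

# Two hypothetical candidate lines: one close to the trend, one far too flat.
good_line_cost = mean_squared_error(4.0, 50.0, hours, scores)
poor_line_cost = mean_squared_error(1.0, 50.0, hours, scores)

print(f"Cost of the closer line: {good_line_cost:.2f}")
print(f"Cost of the flatter line: {poor_line_cost:.2f}")
</code></pre>
<p>Training amounts to searching for the slope and intercept that make this number as small as possible.</p>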
<h2>🌍<strong>Real-World Use Cases</strong></h2>
<p>Linear Regression is used in many industries because of its simplicity and effectiveness.</p>
<h3>1. House Price Prediction</h3>
<p>Predict the price of a house based on features such as size, location, and number of bedrooms.</p>
<h3>2. Sales Forecasting</h3>
<p>Estimate future sales using historical sales data.</p>
<h3>3. Salary Prediction</h3>
<p>Predict a person's salary based on factors such as experience and education.</p>
<h3>4. Stock Market Analysis</h3>
<p>Estimate future trends using historical numerical data.</p>
<h3>5. Energy Consumption Prediction</h3>
<p>Forecast electricity usage based on previous consumption patterns.</p>
<h2>📝<strong>Conclusion</strong></h2>
<p>Linear Regression is a simple yet powerful algorithm for predicting continuous values. By learning patterns from data and fitting the best possible line, it helps make accurate predictions for a wide range of real-world problems.</p>
<p>Understanding Linear Regression is an important first step because many advanced machine learning algorithms build upon the same fundamental ideas.</p>
<h2>🚀Coming Up Next</h2>
<p>We've learned how Linear Regression makes predictions using a best-fit line. In the next blog, we'll explore <strong>Gradient Descent</strong> and understand how the model finds that line by continuously reducing errors.</p>
]]></content:encoded></item><item><title><![CDATA[Part 9: Bias–Variance Tradeoff in Machine Learning]]></title><description><![CDATA[Imagine two students preparing for an exam.

One student studies only a few topics and performs poorly because they don't understand enough concepts.

Another student memorizes every question from pre]]></description><link>https://ai-beginners-journey.hashnode.dev/part-9-bias-variance-tradeoff-in-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/part-9-bias-variance-tradeoff-in-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[AI]]></category><category><![CDATA[bias variance]]></category><category><![CDATA[beginnersguide]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Thu, 04 Jun 2026 09:33:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/63d04c3c-e559-4011-a607-9c24fc00da65.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine two students preparing for an exam.</p>
<ul>
<li><p>One student studies only a few topics and performs poorly because they don't understand enough concepts.</p>
</li>
<li><p>Another student memorizes every question from previous exams but struggles when new questions appear.</p>
</li>
</ul>
<p>Both students have problems, but for different reasons.</p>
<p>This is exactly what happens in Machine Learning.</p>
<p>Some models learn too little and fail to capture important patterns, while others learn too much and end up memorizing the training data.</p>
<p>That's why understanding the role of <strong>Bias</strong> and <strong>Variance</strong> is essential. These two concepts help us understand why models underfit, why they overfit, and how we can build models that generalize well to unseen data.</p>
<p>To see how these concepts influence a model's performance, let's first understand them individually.</p>
<h2>🎯<strong>What is Bias?</strong></h2>
<p>Bias is the error that occurs when a machine learning model is too simple to understand and capture the actual patterns in the data. Because the model makes too many assumptions, it misses important relationships and fails to learn effectively.</p>
<p>Think of it as a student who only studies chapter summaries instead of understanding the full concepts.</p>
<p><strong>Example</strong></p>
<p>Suppose we want to predict house prices.</p>
<p>The relationship between house size and price may look like a curve.</p>
<p>But if we use a simple straight-line model to represent this relationship, the model may miss important patterns.</p>
<p>As a result:</p>
<ul>
<li><p>Training accuracy is low</p>
</li>
<li><p>Testing accuracy is also low</p>
</li>
<li><p>The model performs poorly everywhere</p>
</li>
</ul>
<p>This is called <strong>underfitting</strong>.</p>
<p><strong>Signs of High Bias</strong></p>
<p>✅Model is too simple</p>
<p>✅Misses important patterns</p>
<p>✅High training error</p>
<p>✅High testing error</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/4061789c-cac2-4b27-80d4-8a6b13d57c89.png" alt="" style="display:block;margin:0 auto" />

<h2>📈What is Variance?</h2>
<p><strong>Variance</strong> is the error that occurs when a model becomes <strong>too complex</strong> and learns not only the actual patterns but also the noise and random details in the training data.</p>
<p>A high-variance model becomes <strong>overly sensitive</strong> to the training data. As a result, it performs very well on the training data but struggles when it sees new data.</p>
<p>Think of it as a student who memorizes every practice question instead of understanding the concepts.</p>
<p><strong>Example</strong></p>
<p>Again, consider house price prediction.</p>
<p>If we create a very complex model that tries to fit every single training data point perfectly, it may capture noise instead of actual patterns.</p>
<p>The model performs extremely well on training data but struggles with new data.</p>
<p>This is called <strong>overfitting</strong>.</p>
<p><strong>Signs of High Variance</strong></p>
<p>✅Model is too complex</p>
<p>✅Memorizes training data</p>
<p>✅Very low training error</p>
<p>✅High testing error</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/7e13db34-6823-44bd-9913-1705ed822a5f.png" alt="" style="display:block;margin:0 auto" />

<h2>⚖️<strong>Understanding Underfitting and Overfitting</strong></h2>
<p>The concepts of bias and variance are closely connected to underfitting and overfitting.</p>
<h3>Underfitting = High Bias + Low Variance</h3>
<p>The model is too simple.</p>
<p>It cannot capture the true relationship in the data.</p>
<p><strong>Example</strong></p>
<p>A model predicting student marks using only the number of study hours while ignoring attendance, assignments, and previous performance.</p>
<p>The model misses important information and makes poor predictions.</p>
<h3><strong>Overfitting = Low Bias + High Variance</strong></h3>
<p>The model is too complex.</p>
<p>It learns both useful patterns and unnecessary noise.</p>
<p><strong>Example</strong></p>
<p>A model that memorizes every student's exact exam score from past years but fails when new students take the exam.</p>
<p>The model performs brilliantly during training but poorly in real-world situations.</p>
<h2>🤔<strong>Why Does the Bias–Variance Tradeoff Exist?</strong></h2>
<p>Increasing model complexity usually:</p>
<ul>
<li><p>Reduces bias</p>
</li>
<li><p>Increases variance</p>
</li>
</ul>
<p>Reducing model complexity usually:</p>
<ul>
<li><p>Reduces variance</p>
</li>
<li><p>Increases bias</p>
</li>
</ul>
<p>This creates a tradeoff.</p>
<p>The goal is to find the right balance where both bias and variance are reasonably low.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/7cc9c0e3-0370-4380-8996-300f1d451096.png" alt="" style="display:block;margin:0 auto" />
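<p>The two extremes can be sketched with two deliberately bad models in plain Python. Everything here is an illustrative assumption: the curved data is randomly generated, <code>too_simple</code> always predicts the training average (high bias), and <code>memorizer</code> returns the stored answer of the closest training point (high variance).</p>
<pre><code class="language-python">import random

random.seed(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Hypothetical curved data: y = x^2 plus random noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [x ** 2 + random.gauss(0, 10) for x in xs]
    return xs, ys

train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# High bias: ignores x completely and always predicts the training average.
train_mean = sum(train_y) / len(train_y)

def too_simple(x):
    return train_mean

# High variance: memorizes the training set and returns the y of the closest stored x.
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("too simple", too_simple), ("memorizer", memorizer)]:
    print(name,
          "| train MSE:", round(mse(model, train_x, train_y), 1),
          "| test MSE:", round(mse(model, test_x, test_y), 1))
</code></pre>
<p>The memorizer's training error is exactly zero because it has stored every training answer, yet its test error is not zero. The too-simple model should show a large error on both sets. A balanced model sits between these two behaviours.</p>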

<h2>🛠<strong>How to Balance Bias and Variance</strong></h2>
<h3><strong>1. Choose the Right Model Complexity</strong></h3>
<p>Avoid models that are too simple or too complex.</p>
<p>Start with a reasonable model and evaluate its performance.</p>
<p><strong>Example</strong></p>
<p>For a classification task:</p>
<ul>
<li><p>A very shallow decision tree may underfit.</p>
</li>
<li><p>A very deep decision tree may overfit.</p>
</li>
</ul>
<p>A moderately deep tree often performs best.</p>
<h3><strong>2. Use More Training Data</strong></h3>
<p>More data helps the model focus on genuine patterns rather than memorizing noise, and reduces the chance of overfitting.</p>
<p><strong>Example</strong></p>
<p>A facial recognition system trained on only 100 images may overfit.</p>
<p>Training it on thousands of images helps it generalize better.</p>
<h3><strong>3. Apply Regularization</strong></h3>
<p>Regularization prevents models from becoming excessively complex and helps them focus on the important patterns in the data.</p>
<p>Common techniques include:</p>
<ul>
<li><p>L1 Regularization (Lasso)</p>
</li>
<li><p>L2 Regularization (Ridge)</p>
</li>
</ul>
<p>These methods help reduce overfitting.</p>
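<p>As a rough sketch of the idea behind L2 regularization, the snippet below fits a line through the origin with one feature, where the penalized slope has a simple closed form. The data, the function name <code>ridge_slope</code>, and the penalty values are all illustrative assumptions, not a full Ridge implementation.</p>
<pre><code class="language-python">xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def ridge_slope(xs, ys, penalty):
    """Slope w minimizing sum((y - w*x)^2) + penalty * w^2 for a line through the origin."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # The penalty is added to the denominator, which pulls the slope toward zero.
    return sum_xy / (sum_xx + penalty)

for penalty in [0.0, 1.0, 10.0, 100.0]:
    print(f"penalty={penalty:6.1f}  slope={ridge_slope(xs, ys, penalty):.3f}")
</code></pre>
<p>With a penalty of zero this is ordinary least squares. As the penalty grows, the slope shrinks, which is how regularization keeps a model from reacting too strongly to the training data.</p>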
<h3><strong>4. Use Cross-Validation</strong></h3>
<p>Instead of evaluating a model on a single train-test split, cross-validation tests it on multiple portions of the dataset.</p>
<p>This provides a more reliable estimate of how the model will perform on unseen data.</p>
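<p>A minimal sketch of k-fold cross-validation in plain Python is shown below. It assumes a one-feature least-squares line as the model and made-up study-hours data. The names <code>fit_line</code> and <code>k_fold_mse</code> are hypothetical helpers written for this example.</p>
<pre><code class="language-python">def fit_line(xs, ys):
    """Least-squares slope and intercept for one input feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def k_fold_mse(xs, ys, k):
    """Return one test MSE per fold; each point is used for testing exactly once."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        # Every k-th point goes into this fold's test set.
        # (Real data is usually shuffled first; that step is skipped here.)
        test_idx = list(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_scores.append(sum(errors) / len(errors))
    return fold_scores

hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scores = [50, 56, 59, 66, 68, 75, 77, 84, 86]
fold_scores = k_fold_mse(hours, scores, k=3)
print("MSE per fold:", [round(s, 2) for s in fold_scores])
print("Average MSE:", round(sum(fold_scores) / len(fold_scores), 2))
</code></pre>
<p>Averaging over several folds makes the estimate less dependent on which points happened to land in a single test set.</p>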
<h3><strong>5. Feature Selection</strong></h3>
<p>Not every feature is useful.</p>
<p>Removing irrelevant features can reduce variance and improve model performance.</p>
<p><strong>Example</strong></p>
<p>When predicting house prices, features like location, size, and number of bedrooms are usually more useful than wall color.</p>
<h2>🌍Real-World Example</h2>
<p>Imagine you're building a machine learning model to predict whether an email is spam or not.</p>
<h3><strong>High Bias Model</strong></h3>
<p>The model only checks whether the email contains the word "free."</p>
<p>Because of this, many spam emails are missed.</p>
<p>Result: Underfitting.</p>
<h3><strong>High Variance Model</strong></h3>
<p>The model memorizes every word pattern from the training emails.</p>
<p>It performs extremely well on emails it has already seen, but struggles when new types of spam appear.</p>
<p>Result: Overfitting.</p>
<h3><strong>Balanced Model</strong></h3>
<p>The model learns meaningful patterns that commonly appear in spam emails without memorizing every detail.</p>
<p>As a result, it can accurately classify both familiar and new emails.</p>
<p>Result: Better performance on new emails.</p>
<h2>📝Conclusion</h2>
<p>Bias and variance represent two common challenges in machine learning.</p>
<ul>
<li><p><strong>High Bias</strong> leads to <strong>underfitting</strong>, where the model fails to learn important patterns.</p>
</li>
<li><p><strong>High Variance</strong> leads to <strong>overfitting</strong>, where the model learns the training data too closely and struggles with new data.</p>
</li>
</ul>
<p>The goal is to find the right balance between the two.</p>
<p>The best machine learning models don't memorize the data—they learn meaningful patterns and perform well on unseen data. That's the essence of the <strong>Bias–Variance Tradeoff</strong>.</p>
<h2>🚀Coming Up Next</h2>
<p>So far, we've covered datasets, preprocessing, train-test splitting, overfitting, underfitting, bias, and variance—the building blocks of machine learning.</p>
<p>In the next phase of this series, we'll start exploring <strong>Supervised Learning Algorithms</strong> and see how these concepts come together to build predictive models. We'll kick things off with <strong>Linear Regression</strong>, one of the simplest yet most powerful algorithms in machine learning.</p>
<p>Stay tuned!</p>
]]></content:encoded></item><item><title><![CDATA[Part 8: Overfitting vs Underfitting in Machine Learning]]></title><description><![CDATA[Imagine a student preparing for an exam.
One student memorizes every single question and answer from previous papers without actually understanding the concepts.
Another student barely studies and onl]]></description><link>https://ai-beginners-journey.hashnode.dev/overfitting-vs-underfitting-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/overfitting-vs-underfitting-machine-learning</guid><category><![CDATA[MachineLearning]]></category><category><![CDATA[AI]]></category><category><![CDATA[beginnersguide]]></category><category><![CDATA[Underfitting and Overfitting in Machine Learning]]></category><category><![CDATA[machine learning basics]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Thu, 28 May 2026 05:26:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/bc2e482a-1a81-4a82-b6a6-71d3f82f1b31.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine a student preparing for an exam.</p>
<p>One student memorizes every single question and answer from previous papers without actually understanding the concepts.</p>
<p>Another student barely studies and only understands a few basic ideas.</p>
<p>Now imagine both students facing completely new questions in the final exam.</p>
<p>What happens?</p>
<ul>
<li><p>The first student struggles because the questions are slightly different.</p>
</li>
<li><p>The second student struggles because they never learned enough in the first place.</p>
</li>
</ul>
<p>Machine Learning models face the exact same problem.</p>
<p>Some models learn <strong>too much</strong>, while others learn <strong>too little</strong>. These situations are called:</p>
<ul>
<li><p><strong>Overfitting</strong></p>
</li>
<li><p><strong>Underfitting</strong></p>
</li>
</ul>
<p>Finding the right balance between them is one of the most important goals in Machine Learning.</p>
<h2>📉<strong>What is Underfitting?</strong></h2>
<p>Underfitting happens when a model is too simple to understand the patterns in the data.</p>
<p>The model:</p>
<ul>
<li><p>Fails to learn properly</p>
</li>
<li><p>Performs poorly on training data</p>
</li>
<li><p>Also performs poorly on unseen data</p>
</li>
</ul>
<p>This means:</p>
<blockquote>
<p>The model has not learned enough.</p>
</blockquote>
<h2>🎓<strong>Real-Life Analogy for Underfitting</strong></h2>
<p>Imagine trying to pass a math exam after only reading the chapter titles.</p>
<p>You studied something, but not enough to truly understand the subject.</p>
<p>That is underfitting.</p>
<h2>⚠️<strong>Signs of Underfitting</strong></h2>
<p>A model may be underfitting if:</p>
<ul>
<li><p>Training accuracy is low</p>
</li>
<li><p>Testing accuracy is also low</p>
</li>
<li><p>Predictions are overly simple or inaccurate</p>
</li>
</ul>
<h2>📈<strong>What is Overfitting?</strong></h2>
<p>Overfitting happens when a model learns the training data too well—including noise and unnecessary details.</p>
<p>Instead of learning general patterns, the model starts memorizing the dataset.</p>
<p>As a result:</p>
<ul>
<li><p>Training accuracy becomes very high</p>
</li>
<li><p>Testing accuracy becomes poor</p>
</li>
</ul>
<p>This means:</p>
<blockquote>
<p>The model performs well only on data it has already seen.</p>
</blockquote>
<h2>🎓<strong>Real-Life Analogy for Overfitting</strong></h2>
<p>Imagine a student memorizing answers to previous exam papers word for word.</p>
<p>If the final exam contains the exact same questions, the student performs perfectly.</p>
<p>But if the questions are slightly changed, the student struggles because they never truly understood the concepts.</p>
<p>That is overfitting.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/dbb7f81f-443e-40ae-880f-9563bd894d0f.png" alt="" style="display:block;margin:0 auto" />

<h2>⚖️<strong>Model Complexity</strong></h2>
<p>One major reason behind overfitting and underfitting is <strong>model complexity</strong>.</p>
<h3>📌<strong>Simple Models</strong></h3>
<p>Very simple models may:</p>
<ul>
<li><p>Miss important patterns</p>
</li>
<li><p>Make overly basic predictions</p>
</li>
</ul>
<p>This often leads to:</p>
<p>Underfitting</p>
<h3>📌<strong>Complex Models</strong></h3>
<p>Highly complex models may:</p>
<ul>
<li><p>Learn every tiny detail</p>
</li>
<li><p>Capture random noise</p>
</li>
<li><p>Memorize training data</p>
</li>
</ul>
<p>This often leads to:</p>
<p>Overfitting</p>
<h2>🎯<strong>The Goal: Balanced Learning</strong></h2>
<p>The ideal model should:</p>
<ul>
<li><p>Learn meaningful patterns</p>
</li>
<li><p>Ignore unnecessary noise</p>
</li>
<li><p>Perform well on unseen data</p>
</li>
</ul>
<p>In simple terms:</p>
<blockquote>
<p>We want a model that understands the subject instead of memorizing answers.</p>
</blockquote>
<h2>🔁<strong>Summary Table</strong></h2>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/32599d03-61f5-477d-a441-6a2312e3aaa2.png" alt="" style="display:block;margin:0 auto" />

<h2>🛠<strong>How to Reduce Underfitting</strong></h2>
<p>Some common solutions include:</p>
<ul>
<li><p>Using a more complex model</p>
</li>
<li><p>Training for longer</p>
</li>
<li><p>Adding more useful features</p>
</li>
<li><p>Reducing excessive regularization</p>
</li>
</ul>
<h2>🛠<strong>How to Reduce Overfitting</strong></h2>
<p>Common solutions include:</p>
<ul>
<li><p>Using more training data</p>
</li>
<li><p>Simplifying the model</p>
</li>
<li><p>Applying regularization</p>
</li>
<li><p>Using dropout (in deep learning)</p>
</li>
<li><p>Cross-validation</p>
</li>
</ul>
<h2>🎯Conclusion</h2>
<p>A good Machine Learning model should learn meaningful patterns without memorizing noise.<br />In simple terms:</p>
<blockquote>
<p>Underfitting means learning too little, while overfitting means learning too much.</p>
</blockquote>
<p>The goal is to find the right balance between the two.</p>
<h2>🔎Coming Up Next</h2>
<p>In the next part, we’ll explore the <strong>Bias–Variance Tradeoff</strong> and understand how bias and variance are connected to underfitting and overfitting, and how to balance them effectively.</p>
]]></content:encoded></item><item><title><![CDATA[Part 7: Why We Split Data in Machine Learning?
]]></title><description><![CDATA[Imagine preparing for an important exam.
You don’t just:

Read the textbook

Memorize every question

And walk into the final exam directly


Instead, you usually:

Study concepts

Practice with mock ]]></description><link>https://ai-beginners-journey.hashnode.dev/train-validation-test-data-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/train-validation-test-data-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[#traintestsplit]]></category><category><![CDATA[Data Leakage]]></category><category><![CDATA[beginners guide]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Wed, 27 May 2026 16:39:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/2b84b570-63f7-42bf-aab4-895a3051a0d4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine preparing for an important exam.</p>
<p>You don’t just:</p>
<ul>
<li><p>Read the textbook</p>
</li>
<li><p>Memorize every question</p>
</li>
<li><p>And walk into the final exam directly</p>
</li>
</ul>
<p>Instead, you usually:</p>
<ul>
<li><p>Study concepts</p>
</li>
<li><p>Practice with mock tests</p>
</li>
<li><p>And finally attempt the actual exam</p>
</li>
</ul>
<p>Models learn in a very similar way.</p>
<p>If we train a model using all the available data without proper evaluation, the model may simply memorize patterns instead of actually learning them. To avoid this, we split the dataset into different parts:</p>
<ul>
<li><p>Training data</p>
</li>
<li><p>Validation data</p>
</li>
<li><p>Test data</p>
</li>
</ul>
<h2>❓<strong>Why Do We Split Data?</strong></h2>
<p>The goal of Machine Learning is not just to perform well on known data, but to make accurate predictions on new, unseen data.</p>
<p>If we train and test the model on the same dataset, the model may appear accurate because it has already seen the answers before.</p>
<p>This is similar to:</p>
<blockquote>
<p>Practicing the exact same questions that appear in the final exam.</p>
</blockquote>
<p>The student may score well, but that does not necessarily mean they truly understand the subject.</p>
<p>Data splitting helps us evaluate whether the model can generalize to new situations.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/c73dd1c7-1518-4842-bf57-8c9118d2446e.png" alt="" style="display:block;margin:0 auto" />

<h2>📚<strong>Training Data — The Learning Phase</strong></h2>
<p>The <strong>training dataset</strong> is the portion of data used to teach the model.</p>
<p>During this phase, the model:</p>
<ul>
<li><p>Learns patterns</p>
</li>
<li><p>Finds relationships between features</p>
</li>
<li><p>Adjusts its internal parameters</p>
</li>
</ul>
<p>Think of it as:</p>
<blockquote>
<p>A student studying concepts from textbooks and classroom notes.</p>
</blockquote>
<p>This is where the actual learning happens.</p>
<p>Typically, the largest portion of data is used for training.</p>
<h2>🧪<strong>Validation Data — The Practice Test</strong></h2>
<p>The <strong>validation dataset</strong> is used to evaluate the model while it is still learning.</p>
<p>It helps us:</p>
<ul>
<li><p>Tune hyperparameters</p>
</li>
<li><p>Compare different models</p>
</li>
<li><p>Prevent overfitting</p>
</li>
</ul>
<p>This is similar to:</p>
<blockquote>
<p>Taking mock tests before the final exam.</p>
</blockquote>
<p>The validation set allows us to check:</p>
<ul>
<li><p>Is the model improving?</p>
</li>
<li><p>Or is it just memorizing the training data?</p>
</li>
</ul>
<p>Without validation data, we may unknowingly build a model that performs well only on training data.</p>
<h2>🎯<strong>Test Data — The Final Exam</strong></h2>
<p>The <strong>test dataset</strong> is used only after training is completely finished.</p>
<p>It provides:</p>
<ul>
<li><p>The final evaluation of model performance</p>
</li>
<li><p>An unbiased estimate of how the model performs on unseen data</p>
</li>
</ul>
<p>This is similar to:</p>
<blockquote>
<p>The actual final exam.</p>
</blockquote>
<p>The model has never seen this data before.</p>
<p>If the model performs well here, it means:</p>
<p>✔ It has learned useful patterns</p>
<p>✔ It can generalize effectively</p>
<h2>⚠️<strong>What is Data Leakage?</strong></h2>
<p>Imagine a teacher accidentally gives students the exact final exam questions while they are practicing.</p>
<p>During the exam, the students score very high—not because they truly understood the concepts, but because they had already seen the answers earlier.</p>
<p>This is exactly what happens in Machine Learning when <strong>data leakage</strong> occurs.</p>
<p>Data leakage happens when a model gets access to information during training that it should not normally have. As a result, the model appears extremely accurate during testing, but performs poorly in real-world situations.</p>
<p>In short:</p>
<ul>
<li><p>The model gets unfair hints</p>
</li>
<li><p>Performance appears artificially high</p>
</li>
<li><p>Real-world predictions become unreliable</p>
</li>
</ul>
<h2>🚫<strong>Common Causes of Data Leakage</strong></h2>
<p>Some common causes include:</p>
<h3>1. <strong>Applying Preprocessing Before Splitting Data</strong></h3>
<p>If we scale or normalize the entire dataset before splitting it, the model indirectly gets information from the test data.</p>
<p>In this case:</p>
<ul>
<li>The model learns patterns it should not know yet</li>
</ul>
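<p>The difference can be sketched with a single made-up feature in plain Python. The values and the split are illustrative assumptions, and the test rows are deliberately unusual so the effect is easy to see. The safer pattern is to compute the scaling statistics on the training rows only and then reuse them for the test rows.</p>
<pre><code class="language-python"># Hypothetical single feature; the last two rows are held out as test data.
values = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 95.0, 100.0]
train, test = values[:6], values[6:]

def mean_and_std(data):
    """Mean and (population) standard deviation of a list of numbers."""
    mean = sum(data) / len(data)
    variance = sum((v - mean) ** 2 for v in data) / len(data)
    return mean, variance ** 0.5

# Leaky: the statistics are computed on ALL rows, so test values influence them.
leaky_mean, leaky_std = mean_and_std(values)

# Safer: compute the statistics on the training rows only...
train_mean, train_std = mean_and_std(train)
# ...then reuse those same numbers to scale both the training and the test rows.
scaled_train = [(v - train_mean) / train_std for v in train]
scaled_test = [(v - train_mean) / train_std for v in test]

print(f"Mean using all rows (leaky): {leaky_mean:.2f}")
print(f"Mean using training rows only: {train_mean:.2f}")
</code></pre>
<p>The two means differ because the leaky version has already "seen" the test rows, which is exactly the information the model should not have during training.</p>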
<h3>2. <strong>Using Future Information</strong></h3>
<p>Sometimes the dataset contains information that would not actually be available at prediction time.</p>
<p>For example:</p>
<ul>
<li><p>Predicting whether a customer will cancel a subscription</p>
</li>
<li><p>But using data collected after the customer already left</p>
</li>
</ul>
<p>The model gets future information, making predictions unrealistically accurate.</p>
<h3>3. <strong>Including Target-Related Features</strong></h3>
<p>Sometimes a feature is directly connected to the answer we want to predict.</p>
<p>For example:</p>
<ul>
<li><p>Predicting whether a student will pass</p>
</li>
<li><p>Including “final result” as an input feature</p>
</li>
</ul>
<p>The model can easily guess the answer instead of learning real patterns.</p>
<h2>⚖️<strong>Typical Data Split Ratios</strong></h2>
<p>There is no single perfect ratio, but some commonly used splits are:</p>
<table style="min-width:75px"><colgroup><col style="min-width:25px"></col><col style="min-width:25px"></col><col style="min-width:25px"></col></colgroup><tbody><tr><td><p><strong>Training</strong></p></td><td><p><strong>Validation</strong></p></td><td><p><strong>Test</strong></p></td></tr><tr><td><p>70%</p></td><td><p>15%</p></td><td><p>15%</p></td></tr><tr><td><p>80%</p></td><td><p>10%</p></td><td><p>10%</p></td></tr></tbody></table>

<p>The choice depends on:</p>
<ul>
<li><p>Dataset size</p>
</li>
<li><p>Problem complexity</p>
</li>
<li><p>Available data</p>
</li>
</ul>
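<p>Here is a minimal sketch of a 70% / 15% / 15% split in plain Python. The function name <code>split_dataset</code>, its parameters, and the seed are hypothetical choices for this example, and the rows are just the numbers 0 to 99 standing in for real data.</p>
<pre><code class="language-python">import random

def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle a copy of the rows, then cut it into train / validation / test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, validation, test

rows = list(range(100))  # stand-in for 100 data rows
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
</code></pre>
<p>Shuffling before cutting matters when the rows are stored in some order, for example sorted by date or by label. Every row ends up in exactly one of the three parts.</p>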
<h2>⚠️<strong>Common Mistakes</strong></h2>
<p>Beginners often:</p>
<ul>
<li><p>Train and test on the same dataset</p>
</li>
<li><p>Skip validation data completely</p>
</li>
<li><p>Apply preprocessing before splitting data</p>
</li>
</ul>
<p>These mistakes can lead to misleading model performance.</p>
<h2>🎯<strong>Conclusion</strong></h2>
<p>Splitting data helps models learn, improve, and perform well on unseen data.<br />In simple terms:</p>
<blockquote>
<p>Training data teaches the model, validation data improves it, and test data evaluates it fairly.</p>
</blockquote>
<p>Without proper data splitting, even a highly accurate model may fail in the real world.</p>
<h2>🔮<strong>Coming Up Next</strong></h2>
<p>In the next part of this series, we’ll explore <strong>Overfitting and Underfitting</strong> and understand why some models memorize too much while others fail to learn enough.</p>
]]></content:encoded></item><item><title><![CDATA[Part 6: Data Preprocessing: The Foundation of Every ML Model
]]></title><description><![CDATA[Imagine trying to cook a great meal with spoiled vegetables, missing ingredients, and random measurements.
No matter how skilled the chef is, the result probably won't be good.
Machine Learning works ]]></description><link>https://ai-beginners-journey.hashnode.dev/data-preprocessing-basics-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/data-preprocessing-basics-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[DataPreprocessing]]></category><category><![CDATA[datacleaning]]></category><category><![CDATA[Python]]></category><category><![CDATA[beginnersguide]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Thu, 14 May 2026 07:40:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/11463c59-4cb5-4355-85b0-872f874e1e91.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine trying to cook a great meal with spoiled vegetables, missing ingredients, and random measurements.</p>
<p>No matter how skilled the chef is, the result probably won't be good.</p>
<p>Machine Learning works in a similar way.</p>
<p>In real-world scenarios, data is rarely clean. It is incomplete, inconsistent, and in formats that machines cannot understand directly.</p>
<p>That's why before training any model, we need to prepare it properly. This preparation step is called <strong>Data Preprocessing</strong>.</p>
<p>Data Preprocessing is the process of cleaning, transforming, and organizing raw data so that it can be effectively used for Machine Learning models.</p>
<h3>🍳Data Preprocessing = Kitchen Preparation</h3>
<p>Think of a Machine Learning project like preparing a dish.</p>
<ul>
<li><p>The <strong>data</strong> is ingredients</p>
</li>
<li><p>The <strong>model</strong> is your chef</p>
</li>
<li><p>The <strong>prediction</strong> is the final dish</p>
</li>
</ul>
<p>Now ask yourself:</p>
<p>👉 What happens if your ingredients are spoiled, unmeasured, or missing?</p>
<p>Even the best chef (algorithm) cannot fix bad ingredients.</p>
<p>That’s exactly why preprocessing matters.</p>
<h3>❗️Why Data Preprocessing Matters</h3>
<p>Raw data from the real world contains:</p>
<ul>
<li><p>Missing values</p>
</li>
<li><p>Text data instead of numbers</p>
</li>
<li><p>Different scales for different features</p>
</li>
<li><p>Noise and inconsistencies</p>
</li>
</ul>
<p>Without preprocessing, models may:</p>
<ul>
<li><p>Learn incorrect patterns</p>
</li>
<li><p>Give biased predictions</p>
</li>
<li><p>Perform poorly on real data</p>
</li>
</ul>
<p>So preprocessing ensures:</p>
<p>✔️Clean and usable data</p>
<p>✔️Better model performance</p>
<p>✔️More reliable predictions</p>
<p>✔️Improved model accuracy</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/cd614e28-aabd-451b-afaf-8c02015626d7.png" alt="" style="display:block;margin:0 auto" />

<h3>🧹Handling Missing Values (Fixing Missing Ingredients)</h3>
<p>Imagine you are cooking pasta, but you realize:</p>
<ul>
<li><p>You don’t know how much salt to use</p>
</li>
<li><p>Half the ingredients list is missing</p>
</li>
</ul>
<p>What do you do?</p>
<p>You either:</p>
<ul>
<li><p>Estimate missing quantities</p>
</li>
<li><p>Or remove the missing items entirely</p>
</li>
</ul>
<p>The same thing happens in datasets.</p>
<p>In real-world data, missing values are very common, and most Machine Learning models cannot work with them directly.</p>
<p><strong>In Machine Learning, we solve this problem in a similar way:</strong></p>
<ul>
<li><p>Removing incomplete rows or columns (only if necessary)</p>
</li>
<li><p>Filling missing values using:</p>
<ul>
<li><p>Mean (average)</p>
</li>
<li><p>Median (middle value)</p>
</li>
<li><p>Mode (most frequent value)</p>
</li>
</ul>
</li>
</ul>
<p>This ensures the dataset is complete and usable.</p>
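<p>Here is a small sketch of the three filling strategies using only Python's built-in <code>statistics</code> module. The helper name <code>fill_missing</code> and the sample <code>ages</code> column are made up for illustration, and missing entries are assumed to be stored as <code>None</code>.</p>

```python
from statistics import mean, median, mode

def fill_missing(values, strategy="mean"):
    """Replace None entries using the mean, median, or mode of the known values."""
    known = [v for v in values if v is not None]
    fillers = {"mean": mean, "median": median, "mode": mode}
    fill_value = fillers[strategy](known)
    return [fill_value if v is None else v for v in values]

ages = [22, None, 30, 22, None, 46]        # hypothetical column with two gaps

print(fill_missing(ages, "mean"))          # gaps filled with 30
print(fill_missing(ages, "median"))        # gaps filled with 26
print(fill_missing(ages, "mode"))          # gaps filled with 22
```

<p>Notice that the three strategies give three different answers for the same column. The median is often preferred when the data has extreme values, and the mode is the usual choice for categories.</p>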
<h3>🔤Encoding Categorical Data (Converting Recipes into Machine Language)</h3>
<p>Now imagine your recipe book says:</p>
<ul>
<li><p>“Add some salt”</p>
</li>
<li><p>“Use a pinch of sugar”</p>
</li>
<li><p>“Add a bit of spice”</p>
</li>
</ul>
<p>A machine cannot understand “some” or “a pinch”.</p>
<p>It needs exact numbers.</p>
<p>Similarly, datasets often contain categorical data like:</p>
<ul>
<li><p>City names</p>
</li>
<li><p>Colors</p>
</li>
<li><p>Product types</p>
</li>
<li><p>Gender</p>
</li>
</ul>
<p>Machines cannot understand text directly, so we need to convert it into numbers using encoding.</p>
<p><strong>Common techniques:</strong></p>
<ul>
<li><p>Label Encoding → assigning a unique number to each category (best suited to categories with a natural order)</p>
</li>
<li><p>One-Hot Encoding → creating binary columns for each category (0/1 format)</p>
</li>
</ul>
<p>Now the machine can properly “read” the data.</p>
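<p>Both techniques can be sketched without any library. The function names and the <code>colors</code> column below are assumptions for the example, and categories are numbered in alphabetical order simply to keep the output predictable.</p>

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Build one 0/1 column per category; each row has exactly one 1."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories

colors = ["red", "green", "blue", "green"]   # hypothetical categorical column

codes, mapping = label_encode(colors)
onehot, categories = one_hot_encode(colors)

print(codes)        # [2, 1, 0, 1]
print(categories)   # ['blue', 'green', 'red']
print(onehot)       # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

<p>Label encoding gives "red" the value 2 and "blue" the value 0, which a model could misread as "red is bigger than blue". One-hot encoding avoids that implied order at the cost of extra columns.</p>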
<h3>⚖️Feature Scaling (Balancing Ingredients)</h3>
<p>Imagine a recipe where:</p>
<ul>
<li><p>One ingredient is measured in grams</p>
</li>
<li><p>Another is measured in kilograms</p>
</li>
</ul>
<p>If you don’t convert them properly, your dish will be ruined because one ingredient will dominate the rest.</p>
<p>In datasets, this happens when features have different scales. Example:</p>
<ul>
<li><p>Age → range from 0 to 100 (small values)</p>
</li>
<li><p>Salary → range from thousands to lakhs (very large values)</p>
</li>
</ul>
<p>Without scaling, many models (especially those based on distances or gradient descent) may treat features with larger numbers as more important.</p>
<p><strong>Solution: Feature Scaling</strong></p>
<p>We use methods like:</p>
<ul>
<li><p>Normalization (scaling values between 0 and 1)</p>
</li>
<li><p>Standardization (rescaling values to have a mean of 0 and a standard deviation of 1)</p>
</li>
</ul>
<p>This puts all features on a comparable scale, so no feature dominates just because of its units.</p>
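<p>The two methods come down to two short formulas. The sketch below writes them out by hand; the function names and the sample <code>ages</code> list are illustrative, and the code assumes the values are not all identical (otherwise it would divide by zero).</p>

```python
from statistics import mean, pstdev

def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Z-score scaling: subtract the mean, then divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]      # hypothetical feature values

print(normalize(ages))           # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(ages))         # values centered on 0, roughly -1.41 to 1.41
```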
<h3>🔁Data Preprocessing Pipeline (Step-by-Step Cooking Process)</h3>
<p>Just like cooking has stages, preprocessing also follows a flow:</p>
<ul>
<li><p>Collect raw data</p>
</li>
<li><p>Handle missing values</p>
</li>
<li><p>Encode categorical data</p>
</li>
<li><p>Scale features</p>
</li>
<li><p>Remove duplicates</p>
</li>
<li><p>Clean and finalize dataset</p>
</li>
</ul>
<p>Only after this is complete do we train a Machine Learning model.</p>
<h3>🌍Real-World Example: Bank Loan Approval System</h3>
<p>Imagine a <strong>bank loan approval system</strong> that decides whether a person is eligible for a loan.</p>
<p>The dataset includes features like:</p>
<ul>
<li><p>Age of the applicant</p>
</li>
<li><p>Annual income</p>
</li>
<li><p>Loan amount requested</p>
</li>
<li><p>Credit score</p>
</li>
</ul>
<p>Now notice the problem:</p>
<ul>
<li><p>Age ranges from <strong>18 to 70</strong></p>
</li>
<li><p>Income ranges from <strong>₹1,00,000 to ₹20,00,000</strong></p>
</li>
<li><p>Credit score ranges from <strong>300 to 900</strong></p>
</li>
</ul>
<p>If we don’t apply <strong>feature scaling</strong>, a scale-sensitive model may behave as if:</p>
<p>👉 Income is the most important feature</p>
<p>👉 Because its values are much larger than others</p>
<p>Even if credit score is actually more important for loan decisions, it gets “ignored” due to scale differences.</p>
<p><strong>After Feature Scaling:</strong></p>
<p>All features are brought to a similar range.</p>
<p>Now the model:</p>
<p>✔️Treats all features fairly</p>
<p>✔️Learns real patterns (like credit score impact)</p>
<p>✔️Makes more accurate loan approval predictions</p>
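<p>To see the effect with numbers, the sketch below measures how "far apart" three applicants are, before and after min-max scaling. The three applicants are invented for illustration; the feature ranges are the ones listed above. Distance is used here because many models (for example, nearest-neighbour methods) compare samples this way.</p>

```python
import math

# Feature ranges from the example above: (min, max)
RANGES = [(18, 70), (100_000, 2_000_000), (300, 900)]   # age, income, credit score

def scale(applicant):
    """Min-max scale each feature to the 0..1 range using the known ranges."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(applicant, RANGES)]

# Hypothetical applicants: (age, annual income, credit score)
a = (25, 500_000, 800)
b = (60, 510_000, 350)   # similar income, very different age and credit score
c = (26, 600_000, 790)   # similar age and credit score, somewhat higher income

raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)
scaled_ab = math.dist(scale(a), scale(b))
scaled_ac = math.dist(scale(a), scale(c))

print(round(raw_ab), round(raw_ac))               # about 10010 vs 100000
print(round(scaled_ab, 2), round(scaled_ac, 2))   # about 1.01 vs 0.06
```

<p>On raw values, applicant <code>b</code> looks closest to <code>a</code> only because their incomes are similar. After scaling, <code>c</code> is clearly the closer match, since age and credit score finally count.</p>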
<h3>⚠️Common Mistakes (Burning the Dish)</h3>
<p>Even good cooks can make mistakes. Similarly, beginners often make mistakes such as:</p>
<ul>
<li><p>Applying scaling before splitting data into train/test sets -&gt; can lead to <strong>data leakage</strong>.</p>
</li>
<li><p>Using wrong encoding techniques -&gt; may cause the model to misinterpret categorical information.</p>
</li>
<li><p>Ignoring missing values -&gt; can result in errors or unreliable predictions.</p>
</li>
<li><p>Over-cleaning data and removing important information</p>
</li>
</ul>
<p>Avoiding these mistakes can significantly improve model performance.</p>
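<p>The first mistake is easier to avoid once you see the safe order in code: split first, learn the scaling range from the training data only, then reuse that range everywhere. The helper names and income values below are assumptions made for this sketch.</p>

```python
def fit_min_max(train_values):
    """Learn the scaling range from the training data only."""
    return min(train_values), max(train_values)

def apply_min_max(values, lo, hi):
    """Scale any data using the range learned from training."""
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20, 30, 40, 50, 200]          # hypothetical incomes (in thousands)
train, test = incomes[:4], incomes[4:]   # 1. split first

lo, hi = fit_min_max(train)              # 2. the test value 200 never influences lo/hi
scaled_train = apply_min_max(train, lo, hi)
scaled_test = apply_min_max(test, lo, hi)

print(lo, hi)          # 20 50
print(scaled_test)     # [6.0]
```

<p>The test value lands outside the 0 to 1 range, and that is fine. It honestly reflects that the model is seeing something larger than anything in its training data, which is exactly what can happen in the real world.</p>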
<h3>🔚Conclusion</h3>
<p>Data Preprocessing is not just a technical step—it is the foundation of Machine Learning.</p>
<p>If Machine Learning is cooking, then preprocessing is washing, cutting, measuring, and preparing everything before the fire is even turned on.</p>
<p>Without it, even the best algorithm cannot produce good results.</p>
<p>With it, even simple models can perform surprisingly well.</p>
<h3>🔮What's Next?</h3>
<p>In the next blog, we’ll explore how to split data into training, validation, and test sets so we can evaluate models properly and avoid data leakage.</p>
]]></content:encoded></item><item><title><![CDATA[Part 5: Understanding Datasets: The Building Blocks of Machine Learning]]></title><description><![CDATA[Imagine you're trying to predict whether a student will pass an exam or not.
You look at things like study hours, attendance, and marks. Based on this, you make a guess: pass or fail.
Now imagine doi]]></description><link>https://ai-beginners-journey.hashnode.dev/understanding-datasets-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/understanding-datasets-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[dataset]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[AI]]></category><category><![CDATA[beginnersguide]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Wed, 06 May 2026 16:56:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/2afd6618-4e5c-4941-8209-abb5a1572610.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you're trying to predict whether a student will pass an exam or not.</p>
<p>You look at things like study hours, attendance, and marks. Based on this, you make a guess: pass or fail.</p>
<p>Now imagine doing this for hundreds of students.</p>
<p>This collection of information is called a <strong>dataset.</strong></p>
<p>In Machine Learning, datasets are the <strong>foundation</strong>. Every model learns patterns from data, and that data comes from datasets.</p>
<p>Let's break this down in the simplest way possible.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/51d4db73-dc47-4aff-90a8-db581c013f2a.png" alt="" style="display:block;margin:0 auto" />

<p>The image above shows the basic structure of a dataset.</p>
<p>Now, let’s understand each part step by step.</p>
<h2>📊What is a Dataset?</h2>
<p>A <strong>dataset</strong> is simply a collection of data.</p>
<p>In Machine Learning, the model is trained on a dataset to learn patterns and make predictions.</p>
<p>Example:</p>
<p>This entire table is a dataset.</p>
<h2>📋How is a Dataset Organized?</h2>
<p>In Machine Learning, data is usually organized in a <strong>table format</strong>, similar to an Excel sheet.</p>
<ul>
<li><p>Each <strong>row</strong> represents a single record (called a sample)</p>
</li>
<li><p>Each <strong>column</strong> represents a type of information (called a feature)</p>
</li>
<li><p>The final column often represents the <strong>output</strong> (called the label)</p>
</li>
</ul>
<p>This structured format helps machines easily read and learn from data.</p>
<h2>🧩What are Samples?</h2>
<p>Each row in a dataset is called a <strong>sample.</strong></p>
<p>A sample represents one record or observation.</p>
<p>In our example:</p>
<p>Each row = one student</p>
<p>So:</p>
<ul>
<li><p>1 row = 1 sample</p>
</li>
<li><p>3 rows = 3 samples</p>
</li>
</ul>
<p>A sample contains all the information (features + label) for that one record.</p>
<p>In simple terms, a sample is a single entry in a dataset.</p>
<h2>⚙️What are Features?</h2>
<p><strong>Features</strong> are the input variables. They contain the information used to train a model and make predictions.</p>
<p>In our dataset, features are:</p>
<ul>
<li><p>Study hours</p>
</li>
<li><p>Attendance</p>
</li>
<li><p>Marks</p>
</li>
</ul>
<p>These are the factors that help us decide the result.</p>
<p>Think of features as clues to solve a problem.</p>
<h2>🎯What is a Label?</h2>
<p>A <strong>label</strong> is the final result or output that we want to predict.</p>
<p>In our example:</p>
<p>Result = Label (whether a student passed or failed)</p>
<p>In simple terms, the label is what the model is trying to learn.</p>
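<p>Here is how samples, features, and labels might look in Python. The three student records are invented for illustration (they are not real data), and the names <code>X</code> and <code>y</code> follow a common convention for features and labels.</p>

```python
# Hypothetical student records; the numbers are made up for illustration.
dataset = [
    {"study_hours": 8, "attendance": 90, "marks": 85, "result": "Pass"},
    {"study_hours": 2, "attendance": 50, "marks": 35, "result": "Fail"},
    {"study_hours": 6, "attendance": 75, "marks": 70, "result": "Pass"},
]

feature_names = ["study_hours", "attendance", "marks"]

# X holds the features of every sample, y holds the matching labels
X = [[row[name] for name in feature_names] for row in dataset]
y = [row["result"] for row in dataset]

print(len(dataset), "samples,", len(feature_names), "features")
print(X[0], y[0])   # [8, 90, 85] Pass
```

<p>Each dictionary is one sample, each key except <code>result</code> is a feature, and <code>result</code> is the label.</p>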
<h2>🌍Real-Life Analogy</h2>
<p>Now that we understand the structure of a dataset, let's look at it through a real-world example.</p>
<h3>🩺Doctor Diagnosing a Patient</h3>
<p>Imagine a doctor trying to diagnose a patient:</p>
<ul>
<li><p>Symptoms (fever, cough, age) -&gt; <strong>Features</strong></p>
</li>
<li><p>Diagnosis (disease) -&gt; <strong>Label</strong></p>
</li>
<li><p>Each patient -&gt; <strong>Sample</strong></p>
</li>
<li><p>All patient records -&gt; <strong>Dataset</strong></p>
</li>
</ul>
<p>The doctor uses symptoms (features) to predict the disease (label).</p>
<p>This is exactly how Machine Learning works.</p>
<h2>🔚Conclusion</h2>
<p>Understanding datasets is the first step in Machine Learning.</p>
<p>Once you know how data is organized into samples, features, and labels, everything else becomes much easier to understand.</p>
<h2>🔮What's Next?</h2>
<p>In the next blog, we'll explore <strong>data preprocessing</strong>: how datasets are cleaned and prepared before training a machine learning model, an essential step for building accurate and reliable models.</p>
]]></content:encoded></item><item><title><![CDATA[Part 4: Regression vs Classification: How Machines Predict & Decide]]></title><description><![CDATA[In the previous blog, we explored what Supervised Learning is and how models learn from data.
Now, let's understand how Supervised Learning is used to solve real-world problems using two key approache]]></description><link>https://ai-beginners-journey.hashnode.dev/supervised-learning-regression-vs-classification</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/supervised-learning-regression-vs-classification</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Supervised learning]]></category><category><![CDATA[#Regression]]></category><category><![CDATA[classification]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Thu, 30 Apr 2026 03:26:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/c313a151-cde7-4127-b078-e548e3f6eecb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous blog, we explored what Supervised Learning is and how models learn from data.</p>
<p>Now, let's understand how Supervised Learning is used to solve real-world problems using two key approaches: <strong>Regression</strong> &amp; <strong>Classification.</strong></p>
<h2>❓How Machines Actually Learn</h2>
<p>As we discussed earlier, Supervised Learning is a technique where the model learns from labeled data.</p>
<p>But at its core, Supervised Learning is all about finding patterns in data.</p>
<p>Think of it as:</p>
<p>You give your model data about the songs you listen to, and it starts noticing patterns. If you prefer rock or pop songs, it will recommend similar tracks.</p>
<p>Over time, it learns your preferences and makes better predictions.</p>
<p>It doesn't "understand" the way humans do. Instead, it:</p>
<ul>
<li><p>Observes relationships between input (features) and output (labels)</p>
</li>
<li><p>Learns patterns from past data</p>
</li>
<li><p>Uses those patterns to make predictions on new, unseen data</p>
</li>
</ul>
<p>This is what makes Machine Learning powerful.</p>
<p>Now, based on the type of problem we want to solve, supervised learning can be divided into two main approaches:</p>
<h2>📈Regression: Predicting Continuous Values</h2>
<p>Regression is a Machine Learning technique used to predict numerical (continuous) values.</p>
<p>In other words, it is used when the output you want is a number rather than a category.</p>
<h3>💡What does that mean?</h3>
<p>Regression answers questions like:</p>
<ul>
<li><p>"What will be the price?"</p>
</li>
<li><p>"What will be the temperature?"</p>
</li>
<li><p>"How many sales will we make?"</p>
</li>
</ul>
<h3>🏡Example</h3>
<p>Imagine predicting the price of a house based on:</p>
<ul>
<li><p>Location</p>
</li>
<li><p>Number of rooms</p>
</li>
<li><p>Condition</p>
</li>
</ul>
<p>The model learns from past data and predicts a numeric value, like ₹45,00,000.</p>
<h3>🧠Intuition</h3>
<p>Think of it as finding a relationship between things by drawing a best-fit line through data points to predict values. E.g., more study hours -&gt; higher marks.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/6cc1ecb4-a053-475e-a065-c1a3b9da037a.png" alt="" style="display:block;margin:0 auto" />
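<p>The "best-fit line" idea can be tried out with the standard least-squares formulas for slope and intercept. The function name <code>fit_line</code> and the study-hours data are assumptions for this sketch, and the data is deliberately perfectly linear so the result is easy to check by hand.</p>

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
marks = [35, 45, 55, 65, 75]     # hypothetical marks (exactly linear here)

slope, intercept = fit_line(hours, marks)
print(slope, intercept)          # 10.0 25.0
print(slope * 6 + intercept)     # predicted marks for 6 hours: 85.0
```

<p>The slope says each extra study hour adds about 10 marks in this toy data. Real data is noisier, so the line will pass near the points rather than through them.</p>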

<h3>📌Where is it used?</h3>
<ul>
<li><p>Real estate price prediction</p>
</li>
<li><p>Sales forecasting</p>
</li>
<li><p>Weather prediction</p>
</li>
</ul>
<h2>🎯Classification: Predicting Categories</h2>
<p>Classification is a Machine Learning technique used to categorize data.</p>
<p>In simple words, it is used when the output you want belongs to a specific group or class.</p>
<h3>💡What does that mean?</h3>
<p>Classification answers questions like:</p>
<ul>
<li><p>Yes or No</p>
</li>
<li><p>"Is this spam or not?"</p>
</li>
<li><p>"Which category does this belong to?"</p>
</li>
</ul>
<h3>📧Example</h3>
<p>Imagine predicting whether an email is spam or not.</p>
<p>You provide the model with:</p>
<ul>
<li><p>Email content</p>
</li>
<li><p>Sender details</p>
</li>
<li><p>Keywords</p>
</li>
</ul>
<p>The model learns patterns and predicts: Spam or Not Spam.</p>
<h3>🧠Intuition</h3>
<p>Instead of predicting a number, classification predicts which category something belongs to.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/7479e6cf-9696-4417-aae9-e19299d9237a.png" alt="" style="display:block;margin:0 auto" />
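<p>One of the simplest ways to predict a category is to copy the label of the most similar known example (a 1-nearest-neighbour classifier). The sketch below assumes each email has already been turned into two numbers; both the features and the sample data are invented for illustration.</p>

```python
import math

def predict_nearest(train_points, train_labels, new_point):
    """1-nearest-neighbour: copy the label of the closest training sample."""
    distances = [math.dist(p, new_point) for p in train_points]
    closest = distances.index(min(distances))
    return train_labels[closest]

# Hypothetical emails described by (number of links, number of "free" mentions)
emails = [(0, 0), (1, 0), (7, 5), (9, 4)]
labels = ["Not Spam", "Not Spam", "Spam", "Spam"]

print(predict_nearest(emails, labels, (8, 6)))   # Spam
print(predict_nearest(emails, labels, (0, 1)))   # Not Spam
```

<p>Unlike regression, the output here is never a number in between; it is always one of the known categories.</p>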

<h3>📌Types of Classification</h3>
<ul>
<li><p><strong>Binary Classification:</strong> Two classes (e.g., Yes/No, Spam/Not Spam)</p>
</li>
<li><p><strong>Multi-class Classification:</strong> More than two classes (e.g., classifying types of fruits or different animal species)</p>
</li>
</ul>
<h3>📌Where is it used?</h3>
<ul>
<li><p>Email filtering</p>
</li>
<li><p>Medical diagnosis</p>
</li>
<li><p>Image recognition</p>
</li>
</ul>
<h2>⚖️Difference Between Both</h2>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/43a3c811-b78e-40a5-ac28-58dff4099805.png" alt="" style="display:block;margin:0 auto" />

<h2>⚠️Challenges in Supervised Learning</h2>
<p>While Supervised Learning is powerful, it's not perfect.</p>
<ul>
<li><p><strong>Overfitting:</strong> Instead of learning patterns, the model memorizes the training data, so it performs well during training but poorly on new, unseen data.</p>
</li>
<li><p><strong>Underfitting:</strong> The model is too simple and fails to capture patterns, so it performs poorly on both training and testing data.</p>
</li>
<li><p><strong>Data Quality:</strong> Poor or insufficient data leads to poor predictions.</p>
</li>
</ul>
<h2>🧩Where is This Used in Real Life?</h2>
<ul>
<li><p><strong>Healthcare:</strong> Used for detecting and diagnosing diseases.</p>
</li>
<li><p><strong>E-commerce platforms:</strong> Recommend products based on user behavior.</p>
</li>
<li><p><strong>Finance:</strong> Used to detect fraudulent transactions.</p>
</li>
</ul>
<h2>🔚Conclusion</h2>
<p>Supervised Learning is one of the most practical and widely used approaches that helps models learn from data and make predictions.</p>
<p>By using <strong>Regression</strong> and <strong>Classification</strong>, it can:</p>
<ul>
<li><p>Predict outcomes</p>
</li>
<li><p>Identify patterns</p>
</li>
<li><p>Solve real-world problems efficiently</p>
</li>
</ul>
<p>Understanding these concepts is a key step toward building practical Machine Learning systems.</p>
<h2>🔮What's Next?</h2>
<p>In the next blog, we'll break down datasets, features, and labels: the building blocks of every machine learning system.</p>
]]></content:encoded></item><item><title><![CDATA[Part 3: Types of Machine Learning]]></title><description><![CDATA[Think about how you learned things growing up-being taught by someone, figuring things out on your own, or learning from mistakes.
Machine Learning follows these same patterns
In this blog, we'll brea]]></description><link>https://ai-beginners-journey.hashnode.dev/understanding-types-of-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/understanding-types-of-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[types of machine learning]]></category><category><![CDATA[#AIForBeginners ]]></category><category><![CDATA[Supervised learning]]></category><category><![CDATA[Unsupervised learning]]></category><category><![CDATA[Reinforcement Learning]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Wed, 22 Apr 2026 18:10:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/ff1b9b9b-d9c5-4621-a5c8-864cc33b146e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Think about how you learned things growing up-being taught by someone, figuring things out on your own, or learning from mistakes.</p>
<p>Machine Learning follows these same patterns.</p>
<p>In this blog, we'll break down the types of Machine Learning and explore how machines learn from data.</p>
<h2>⚙️Types of Machine Learning</h2>
<p>Machine Learning can be divided into different types based on how models learn from data.</p>
<p>Let's explore each of these types in detail.</p>
<h2>1. Supervised Learning</h2>
<p>Supervised Learning is a type of Machine Learning where the model learns from labeled data, i.e., the correct output is already given.</p>
<h3>How it works</h3>
<p>Imagine a student learning from a teacher. The teacher gives the questions along with their answers and points out mistakes. Over time, the student starts recognizing patterns and improves gradually.</p>
<p>In Supervised Learning, the model learns the same way.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/3df1c230-9b9a-4ef9-b181-9d9c3c3ad09d.png" alt="" style="display:block;margin:0 auto" />

<h3>Example</h3>
<p>Predicting house prices based on features like location, condition, and number of rooms.</p>
<p>Here the model is trained on labeled examples and then tested on new, unseen data.</p>
<h3>Types of problems</h3>
<ol>
<li><p>Regression (predicting continuous values, e.g., house price prediction)</p>
</li>
<li><p>Classification (predicting categories, e.g., spam vs. not-spam emails)</p>
</li>
</ol>
<h3>Common Algorithms</h3>
<ol>
<li><p>Linear Regression</p>
</li>
<li><p>Logistic Regression</p>
</li>
<li><p>KNN</p>
</li>
<li><p>Decision Trees</p>
</li>
<li><p>Support Vector Machines (SVM)</p>
</li>
</ol>
<h3>Use Cases</h3>
<ol>
<li><p>Email Spam Detection</p>
</li>
<li><p>Stock price prediction</p>
</li>
<li><p>Medical diagnosis</p>
</li>
</ol>
<h2>2. Unsupervised Learning</h2>
<p>Unsupervised Learning is a technique where models are trained on unlabeled data. It is used to find hidden patterns and relationships by grouping similar data together.</p>
<h3>How it works</h3>
<p>Imagine you walk into a party where you don't know anyone. As you observe, you start noticing groups forming: people with similar interests, professions, or personalities naturally gather together.</p>
<p>No one tells you who belongs where, but patterns still emerge.</p>
<p>Unsupervised Learning works the same way: it identifies hidden patterns and groups in data without any predefined labels.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/1f5b0993-d3d3-44b1-809f-4a6ddfa5d0ae.png" alt="" style="display:block;margin:0 auto" />

<h3>Example</h3>
<p>Grouping customers based on their shopping behavior.</p>
<h3>Types of tasks</h3>
<ol>
<li><p>Clustering (Grouping similar data points)</p>
</li>
<li><p>Dimensionality Reduction (simplifying data while keeping important information)</p>
</li>
</ol>
<h3>Common Algorithms</h3>
<ol>
<li><p>K-Means Clustering</p>
</li>
<li><p>Hierarchical Clustering</p>
</li>
<li><p>Principal Component Analysis (PCA)</p>
</li>
</ol>
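<p>To make clustering concrete, here is a tiny one-dimensional version of K-Means. It is a simplified sketch: the function name <code>k_means_1d</code>, the starting centers, and the customer spending values are all assumptions, and real K-Means usually works on many features at once.</p>

```python
from statistics import mean

def k_means_1d(values, centers, rounds=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then move
    each center to the mean of its group, and repeat."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # keep the old center if a group ends up empty
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend of six customers (no labels given)
spend = [10, 12, 14, 100, 105, 110]

centers, groups = k_means_1d(spend, centers=[0, 50])
print(centers)   # [12, 105]
print(groups)    # [[10, 12, 14], [100, 105, 110]]
```

<p>Nobody told the algorithm which customers are "low spenders" or "high spenders". The two groups emerge from the data alone, just like the groups at the party.</p>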
<h3>Use Cases</h3>
<ol>
<li><p>Customer Segmentation</p>
</li>
<li><p>Anomaly Detection (fraud detection)</p>
</li>
<li><p>Market Research</p>
</li>
</ol>
<h2>3. Semi-Supervised Learning</h2>
<p>Semi-Supervised Learning combines supervised and unsupervised learning: a small amount of labeled data and a large amount of unlabeled data are used to train the model.</p>
<p>Here the model first learns from the labeled data and then improves using the unlabeled data.</p>
<h3>How it works</h3>
<p>Imagine you are shown how to solve a particular set of problems, and then are given many similar ones to solve.</p>
<p>You start to recognize and learn patterns by solving the problems and then complete the rest accordingly.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/03c10a9c-7c5b-42da-b596-cb7e467b0571.png" alt="" style="display:block;margin:0 auto" />

<h3>Example</h3>
<p>Image classification where a few images are labeled, but thousands are not.</p>
<h3>Use Cases</h3>
<ol>
<li><p>Medical Imaging</p>
</li>
<li><p>Speech Recognition</p>
</li>
<li><p>Large-scale classification problems</p>
</li>
</ol>
<h2>4. Reinforcement Learning</h2>
<p>Reinforcement Learning is a technique where an agent learns by interacting with an environment and receiving rewards and penalties.</p>
<p>Here the model learns through trial and error, aiming to maximize rewards over time.</p>
<h3>How it works</h3>
<p>Think about the first time you rode a bicycle. There were no specific instructions for every movement; you tried, fell, adjusted, and improved with practice.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/1013cad6-1bc9-42d8-9298-4a525a902a8c.png" alt="" style="display:block;margin:0 auto" />

<h3>Example</h3>
<p>Training an AI to play chess: points are awarded for good moves and deducted for bad ones.</p>
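<p>Chess is far too large for a short example, so the sketch below uses a much smaller made-up "game": an agent on a line of five cells that earns a reward of +1 for reaching the last cell. It applies the Q-Learning update rule. To keep the result repeatable, the agent tries every action in every cell in turn instead of exploring randomly, and the learning rate and discount factor are assumed values.</p>

```python
N_STATES = 5                 # cells 0..4 on a line; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}
ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount factor (assumed values)

def step(state, action):
    """Move one cell, staying inside the line; reaching the goal pays +1."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

# Q[state][action] starts at 0: the agent knows nothing yet
Q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]

for _ in range(200):                         # repeat the "practice" many times
    for state in range(N_STATES - 1):        # the goal itself needs no action
        for action in ACTIONS:
            next_state, reward = step(state, action)
            # Q-Learning update: nudge the estimate toward reward + future value
            target = reward + GAMMA * max(Q[next_state].values())
            Q[state][action] += ALPHA * (target - Q[state][action])

policy = [max(Q[s], key=Q[s].get) for s in range(N_STATES - 1)]
print(policy)   # ['right', 'right', 'right', 'right']
```

<p>The agent was never told that "right" is correct. It only saw rewards, and the learned values ended up favoring the moves that lead to the goal.</p>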
<h3>Common Algorithms</h3>
<ol>
<li><p>Q-Learning</p>
</li>
<li><p>Deep Q-Learning (DQN, Deep Q-Network)</p>
</li>
</ol>
<h3>Use Cases</h3>
<ol>
<li><p>Game AI</p>
</li>
<li><p>Robotics</p>
</li>
<li><p>Self-driving cars</p>
</li>
</ol>
<h2>🚀Final Thoughts</h2>
<p>Each type of Machine Learning represents a different way of learning, just like humans. Whether it's learning from examples, exploring patterns, or improving through trial and error, these approaches shape how machines make decisions.</p>
<p>Understanding these different learning approaches helps build a strong foundation for seeing how machines learn and make decisions.</p>
<h2>🔎Coming Up Next</h2>
<p>In the next blog, we'll dive deeper into Supervised Learning and understand how Regression and Classification help in solving real-world problems.</p>
]]></content:encoded></item><item><title><![CDATA[Part 2: Introduction to Machine Learning]]></title><description><![CDATA[Imagine teaching a child to recognize fruits. Instead of giving strict instructions like "apples are round and red", you show them pictures of apples. This is repeated until they start recognizing the]]></description><link>https://ai-beginners-journey.hashnode.dev/part-2-introduction-to-machine-learning</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/part-2-introduction-to-machine-learning</guid><category><![CDATA[MachineLearning]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[ml basics]]></category><category><![CDATA[machine learning for beginners]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Wed, 15 Apr 2026 17:55:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/7c3dee9c-0397-4d6d-ad12-dcac20f2a3d5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine teaching a child to recognize fruits. Instead of giving strict instructions like "apples are round and red", you show them pictures of apples. This is repeated until they start recognizing the pattern.</p>
<p>Machine Learning works in the same way: it allows systems to learn from data instead of relying on fixed rules.</p>
<h3>❓So, What Exactly is Machine Learning?</h3>
<p>Machine Learning is a branch of Artificial Intelligence that enables machines to learn from data, just as humans learn from experience. In machine learning, models are trained on data to recognize patterns and make decisions and predictions.</p>
<p>Machine Learning can be defined as:</p>
<blockquote>
<p>"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed."</p>
<p>-Arthur Samuel, 1959</p>
</blockquote>
<h3>⚙️How Do Machines Actually Learn?</h3>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/8208422b-7d6f-47c2-9f5f-fbf0cf99ffae.png" alt="" style="display:block;margin:0 auto" />

<p>Now you might be wondering how machines learn from data. Machines don't just read and understand information like humans; instead, they learn through a series of steps.</p>
<p>For example, imagine building a system that can identify whether an email is spam or not.</p>
<p>Let's understand how this works step by step:</p>
<ol>
<li><p><strong>Collecting Data:</strong> The model is given data, such as a dataset of spam and non-spam emails.</p>
</li>
<li><p><strong>Training the Model:</strong> The model is trained on this data to learn the difference between spam and not spam emails.</p>
</li>
<li><p><strong>Finding Patterns:</strong> It identifies patterns, like words or phrases commonly found in spam emails (such as "free" or "won a prize").</p>
</li>
<li><p><strong>Making Predictions:</strong> Once trained, the model predicts whether a new email is spam or not.</p>
</li>
<li><p><strong>Improving the Model:</strong> With more data and tuning, the model becomes better at detecting spam emails.</p>
</li>
</ol>
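<p>The steps above can be sketched as a toy spam filter that simply counts words. This is a deliberately simple illustration, not how production spam filters work; the function names, the "ham" label for non-spam, and the four sample emails are all assumptions.</p>

```python
from collections import Counter

def train(emails):
    """Count how often each word appears in spam and in non-spam (ham) emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label an email by which class has seen its words more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

# Step 1: collect labeled data (hypothetical emails)
emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday", "ham"),
]

counts = train(emails)                             # Steps 2 and 3: train, find patterns
print(predict(counts, "claim your free prize"))    # Step 4: prints spam
print(predict(counts, "notes for lunch"))          # prints ham
```

<p>Step 5 (improving the model) would mean adding more labeled emails to the list and training again, so the word counts reflect more examples.</p>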
<h3>🧠Key Concepts in Machine Learning</h3>
<p>Let's look at some basic concepts that form the foundation of machine learning:</p>
<ol>
<li><p><strong>Features:</strong> The input data or variables used by the model. These can come from images, text, audio, video, or numerical data.</p>
</li>
<li><p><strong>Labels:</strong> The correct output or answer the model is trained on (e.g., spam or not spam).</p>
</li>
<li><p><strong>Model:</strong> A system or algorithm that learns patterns from data to make predictions.</p>
</li>
<li><p><strong>Training:</strong> The process of teaching the model using data so it can learn patterns.</p>
</li>
</ol>
<h3>🧩Types of Machine Learning</h3>
<p>Machine Learning is divided into different types based on how models learn from data. Each type uses a different approach to find patterns and make predictions.</p>
<ol>
<li><p><strong>Supervised Learning:</strong> The model is trained on a set of labeled data. Here the output is already known.</p>
</li>
<li><p><strong>Unsupervised Learning:</strong> The model is trained on an unlabeled dataset. It is used to find hidden patterns and structure.</p>
</li>
<li><p><strong>Semi-Supervised Learning:</strong> The model is trained on a combination of labeled and unlabeled data.</p>
</li>
<li><p><strong>Reinforcement Learning:</strong> The model learns by interacting with the environment and improves through rewards and penalties.</p>
</li>
</ol>
<h3>🌎Where is Machine Learning Used in Real Life?</h3>
<p>Now that we understand how machines learn, you might be wondering - is machine learning used in real life? The answer is yes!</p>
<p>Let's look at some everyday applications that you use without even realizing it.</p>
<ol>
<li><p><strong>Recommendation Systems:</strong> Platforms like Netflix and YouTube use ML to suggest movies and videos you are likely to be interested in.</p>
</li>
<li><p><strong>Virtual Assistants:</strong> Assistants like Siri and Google Assistant recognize your voice and respond to your queries.</p>
</li>
<li><p><strong>Image Recognition:</strong> ML is used in apps like Google Photos to recognize faces, objects, and even scanned documents.</p>
</li>
</ol>
<h3>⚠️Limitations and Challenges of Machine Learning</h3>
<ol>
<li><p><strong>Data Dependency:</strong> ML relies heavily on data, and poor quality data can lead to wrong predictions.</p>
</li>
<li><p><strong>Needs Lots of Data:</strong> ML often requires a large amount of data. Without enough data, accuracy may be low.</p>
</li>
<li><p><strong>Overfitting &amp; Underfitting:</strong> Models can either memorize the training data, including its noise (overfitting), or fail to capture the underlying pattern (underfitting). Both hurt performance on new data.</p>
</li>
</ol>
<h3>🚀Final Thoughts</h3>
<p>Machine Learning may seem complex at first, but at its core, it's all about learning from data and making smarter decisions. With its growing use in everyday life, understanding ML is becoming more important than ever.</p>
<h3>🔍Coming Up Next</h3>
<p>In my next blog, we will explore different types of Machine Learning and understand how each approach works.</p>
]]></content:encoded></item><item><title><![CDATA[AI, ML & Deep Learning: A Beginner’s Journey (Part 1: Introduction to AI)]]></title><description><![CDATA[🧠The Beginning of Intelligent Machines
Have you ever wondered how Netflix knows what you like to watch and recommends it to you?
Or how self-driving cars work, or how your phone recognizes your voice]]></description><link>https://ai-beginners-journey.hashnode.dev/ai-ml-deep-learning-a-beginner-s-journey-part-1-introduction-to-ai</link><guid isPermaLink="true">https://ai-beginners-journey.hashnode.dev/ai-ml-deep-learning-a-beginner-s-journey-part-1-introduction-to-ai</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Beginner Developers]]></category><dc:creator><![CDATA[Divyajot Kaur]]></dc:creator><pubDate>Mon, 06 Apr 2026 16:57:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/1246d4b5-f8fc-415f-b5d3-4d9353cfe25d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>🧠The Beginning of Intelligent Machines</h3>
<p>Have you ever wondered how Netflix knows what you like to watch and recommends it to you?</p>
<p>Or how self-driving cars work, or how your phone recognizes your voice?</p>
<p>Interesting, right?</p>
<p>Some think it's magic - but is it?</p>
<p>This is the power of Artificial Intelligence.</p>
<p>This blog marks the beginning of a series where we will explore Artificial Intelligence, Machine Learning, and Deep Learning in depth - from basic concepts to real-world projects.</p>
<h3>🤖What is Artificial Intelligence?</h3>
<p>Artificial Intelligence (AI) refers to the ability of machines to perform tasks that usually require human intelligence. It allows machines to learn from data, recognize patterns, understand language, and make decisions.</p>
<p>AI can also be defined as:</p>
<blockquote>
<p>Artificial Intelligence is the science and engineering of making intelligent machines.</p>
<p>- John McCarthy (Father of AI)</p>
</blockquote>
<p>In simple terms, AI is about teaching machines to think, learn and make decisions just like humans.</p>
<h3>⚖️AI vs Human Intelligence</h3>
<img src="https://cdn.hashnode.com/uploads/covers/6942cafec9d5320a12aa01b1/66c12497-5b1a-41d9-a2cd-93778b347035.png" alt="" style="display:block;margin:0 auto" />

<p>Before going further, let's look at the difference between AI and human intelligence:</p>
<ul>
<li><p><strong>Learning:</strong> AI learns from data and algorithms, whereas humans learn from experience, emotions and reasoning.</p>
</li>
<li><p><strong>Creativity:</strong> AI can generate content (like art and music), but humans possess originality, emotion, and creativity.</p>
</li>
<li><p><strong>Speed &amp; Accuracy:</strong> AI is extremely fast and consistent at repetitive tasks, while humans are slower and more prone to errors.</p>
</li>
<li><p><strong>Emotional Intelligence:</strong> AI does not understand emotions and feelings while humans have deep understanding and empathy.</p>
</li>
<li><p><strong>Decision Making:</strong> AI makes decisions based entirely on data and logic, whereas humans make decisions based on intuition, emotions, and ethics.</p>
</li>
</ul>
<p>From the above, it is clear that the real power lies in the collaboration between humans and AI.</p>
<p>While AI handles speed, accuracy, and data processing, humans provide the meaning, creativity, and ethical judgement that give purpose and direction to that data.</p>
<h3>🌎AI in Action: Real-World Applications</h3>
<p>AI is used in various industries all around the world:</p>
<ul>
<li><p><strong>Healthcare:</strong> Helps in early disease detection, diagnosis, and personalized treatment.</p>
</li>
<li><p><strong>Finance:</strong> Detects fraud, manages risk, and supports smarter investment decisions.</p>
</li>
<li><p><strong>E-commerce:</strong> Recommends products based on past shopping behavior and user preferences.</p>
</li>
<li><p><strong>Entertainment:</strong> Recommends shows and movies based on user interests.</p>
</li>
<li><p><strong>Manufacturing:</strong> Automates repetitive tasks, improving efficiency and accuracy.</p>
</li>
</ul>
<h3>⚠️Limitations and Challenges of AI</h3>
<p>Although AI improves speed and accuracy and automates tasks, it also has its own limitations:</p>
<ul>
<li><p><strong>Bias in decisions:</strong> AI models can become biased if trained on biased data.</p>
</li>
<li><p><strong>Data Dependency:</strong> It relies heavily on data, and poor quality data can lead to wrong predictions.</p>
</li>
<li><p><strong>Privacy concerns:</strong> AI systems often store users' personal data, which makes them prone to data leaks and misuse of personal information.</p>
</li>
<li><p><strong>Job displacement:</strong> AI is transforming the job market by replacing repetitive work and reshaping career paths.</p>
</li>
<li><p><strong>Lack of emotion:</strong> It lacks feelings and does not understand human emotions.</p>
</li>
</ul>
<p>Understanding the limitations of AI is important to ensure that it is used responsibly.</p>
<h3>🚀Final Thoughts</h3>
<p>AI is no longer a concept of the future - it is a part of everyday life, shaping the way we interact with technology and the world.</p>
<p>The true power of AI does not lie in replacing humans, but in enhancing human potential and solving real-world problems.</p>
<h3>🔍Coming Up Next</h3>
<p>In the next part of this series, we will explore Machine Learning and understand how machines actually learn from data.</p>
]]></content:encoded></item></channel></rss>