Denis Jakus

making tech less cryptic

CTO x Unique People

CTO Note

Exploring the basics of Machine Learning (5 min read)

22 October 2023 AI/ML
🤖 Machine learning is a groundbreaking field that’s revolutionizing the world of technology. Whether you’re an aspiring data scientist, a developer, or just a tech enthusiast, understanding the fundamentals of machine learning is a crucial step in unleashing its immense potential. In this blog post, we’ll explore the core concepts that underpin machine learning and set you on a path to mastering this exciting field.

What Is Machine Learning?

Machine learning is a cutting-edge technology that empowers computers and applications to learn from data without explicit programming. Instead of writing code to instruct the system on performing specific tasks, machine learning allows the software to analyze and understand patterns within the provided data. This data-driven approach enables applications to make predictions and classifications based on their learned information.

It is like teaching a computer to think for itself, akin to how a child’s brain develops through experiences and exposure to various information. Just as parents teach their children about the world, data scientists play a similar role by training machine learning models to comprehend and categorize information. These models are exposed to a diverse range of data, enabling them to recognize patterns and confidently make predictions.

Machine learning has transformative potential across various fields, extending beyond simple applications that classify photos. For example, it can assist doctors in diagnosing diseases, help in relationship matching, and optimize manufacturing processes. It operates through different techniques, such as supervised machine learning, where the model learns from labeled examples, and unsupervised machine learning, where it identifies patterns independently.

Machine learning models themselves can take different forms, such as decision trees, which resemble branching logic, and artificial neural networks, which are loosely inspired by the connections in the human brain. A common goal of using machine learning in apps is to personalize the user experience. By integrating machine learning, apps can adapt to user behavior and offer tailored recommendations and predictions, enhancing user engagement and satisfaction.
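To make the "branching logic" comparison concrete, here is a minimal sketch of a decision tree written by hand as nested conditions. The function name, the flower labels, and the thresholds are all assumptions invented for illustration; a real decision tree would learn its thresholds from training data rather than have them typed in.

```python
# A hand-written "decision tree": each if-statement is one branch of the tree.
# The thresholds and labels are hypothetical, not learned from real data.
def classify_flower(petal_length_cm: float, petal_width_cm: float) -> str:
    if petal_length_cm < 2.5:
        return "setosa"
    if petal_width_cm < 1.8:
        return "versicolor"
    return "virginica"


print(classify_flower(1.4, 0.2))  # short petals take the first branch
print(classify_flower(4.5, 1.4))  # longer, narrow petals take the second
print(classify_flower(6.0, 2.3))  # everything else falls through
```

Training a decision tree amounts to letting an algorithm choose these questions and thresholds automatically.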

Machine learning is becoming increasingly accessible and easy to implement, making it an exciting field to explore for developers.

Implementation Steps

Here are the fundamental steps to successfully implement machine learning into your projects:

Step 1: Get Data

The first essential step in machine learning is acquiring data. The beauty of machine learning is that you can train models using freely available datasets from the internet. These datasets are the foundation on which your machine-learning models will be built.

Here are some popular online resources for finding datasets to use in machine learning:

  • Kaggle: Kaggle provides a vast collection of datasets, suitable for everyone from the enthusiast to the expert.
  • UCI Machine Learning Repository: The Machine Learning Repository at UCI provides an up-to-date resource for open-source datasets.
  • VisualData: Discover computer vision datasets by category; it supports search queries.
  • CMU Libraries: Discover high-quality datasets thanks to the collection curated by Huajin Wang at CMU.
  • The Big Bad NLP Database: This cool dataset list contains datasets for various natural language processing tasks, created and curated by Quantum Stat.
  • MNIST Dataset: This is a database of handwritten digits. It contains 60,000 training images and 10,000 testing images. This is a perfect dataset to start implementing image classification where you can classify a digit from 0 to 9.
  • Credit Card Fraud Detection Dataset: The dataset contains transactions made by credit cards; they are labeled as fraudulent or genuine. This is important for companies that have transaction systems to build a model for detecting fraudulent activities.
  • Google’s Open Images: A vast dataset from Google AI containing roughly 9 million images.
  • Cityscapes Dataset: This is an open dataset for Computer Vision projects. It contains high-quality pixel-level annotations of frames from video sequences recorded in street scenes across 50 different cities. The dataset is useful in semantic segmentation and training deep neural networks to understand the urban scene.
  • Color Detection Dataset: The dataset contains a CSV file that has 865 color names with their corresponding RGB (red, green, and blue) values. It also has the hexadecimal value of each color.
  • Lexicoder Sentiment Dictionary: This dataset is specific to sentiment analysis. It contains roughly 2,900 negative and 1,700 positive sentiment words.
  • IMDB reviews: An interesting dataset with over 50,000 movie reviews from Kaggle.
  • Stanford Sentiment Treebank: Standard sentiment dataset with sentiment annotations.
  • … and many more, which are available from the web source this list was taken from.

Step 2: Prepare Data

Once you have your data, it’s crucial to clean and organize it. This involves removing irrelevant columns and ensuring that the data is structured in a way that your machine-learning model can work with. Proper data preparation is essential for the next steps, as you’ll need it in a specific format when building your model.
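As a rough illustration of this step, the sketch below removes irrelevant columns and drops incomplete rows using only Python's standard library. The CSV content, the column names (`id`, `petal_length`, `petal_width`, `notes`, `label`), and the `prepare` helper are all hypothetical; real datasets usually need more cleaning than this.

```python
import csv
import io

# Hypothetical raw data: "id" and "notes" are irrelevant to the model,
# and the second row is missing a measurement.
RAW_CSV = """id,petal_length,petal_width,notes,label
1,1.4,0.2,ok,tulip
2,,0.3,sensor error,tulip
3,4.7,1.4,ok,rose
4,5.1,1.9,ok,rose
"""

FEATURES = ["petal_length", "petal_width"]


def prepare(raw_text):
    """Keep only the feature columns and the label; drop incomplete rows."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        values = [row[name] for name in FEATURES]
        if any(v.strip() == "" for v in values) or not row["label"].strip():
            continue  # skip rows with missing data
        features.append([float(v) for v in values])  # numbers, not strings
        labels.append(row["label"].strip())
    return features, labels


X, y = prepare(RAW_CSV)
print(X)  # [[1.4, 0.2], [4.7, 1.4], [5.1, 1.9]]
print(y)  # ['tulip', 'rose', 'rose']
```

The result is a list of numeric feature rows and a matching list of labels, which is the kind of shape most training code expects.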

Step 3: Split Data

With your data ready, it’s time to divide it into two subsets: the training data and the testing data. The majority of your data will be used for training your model, allowing it to learn patterns and associations. A smaller portion, usually around 10-20%, will be reserved for testing.
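A minimal sketch of such a split, assuming the data fits in a Python list. The `split_data` helper, the 20% test fraction, and the fixed seed are illustrative choices and not part of any particular library.

```python
import random


def split_data(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then reserve a fraction of them for testing."""
    rows = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # fixed seed makes the split repeatable
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (training data, testing data)


data = [(i, "label") for i in range(10)]  # ten placeholder examples
train, test = split_data(data)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters when the file is sorted, for example by label; otherwise the test set could end up containing only one kind of example.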

Step 4: Train the Model

Now, it’s time to expose your model to the labeled training data. For example, if you were building a model to classify flowers, you’d provide data about various flowers and tell the model their labels (e.g., “This is a tulip”). Machine learning algorithms will start identifying patterns and relationships based on this labeled data.
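To show what "learning from labeled data" can look like in code, here is a sketch of one very simple algorithm, a nearest-centroid classifier: training computes the average point for each label, and prediction picks the label whose average is closest. The measurements, labels, and function names are made up for this example, and real projects would more likely rely on an established library.

```python
import math
from collections import defaultdict


def train(features, labels):
    """Learn one centroid (average point) per label."""
    grouped = defaultdict(list)
    for x, label in zip(features, labels):
        grouped[label].append(x)
    return {
        label: [sum(column) / len(column) for column in zip(*points)]
        for label, points in grouped.items()
    }


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(model[label], x))


# Hypothetical labeled training data: [petal length, petal width] in cm.
features = [[5.0, 3.0], [5.5, 3.2], [2.0, 1.0], [2.2, 1.1]]
labels = ["tulip", "tulip", "daisy", "daisy"]

model = train(features, labels)
print(predict(model, [5.2, 3.1]))  # tulip
print(predict(model, [2.0, 1.2]))  # daisy
```

The "model" here is just a dictionary of averages, but the pattern is the same as in larger systems: a training function turns labeled examples into stored parameters, and a prediction function uses them on new inputs.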

Step 5: Test and Improve

Once your model is trained, it’s time to put it to the test. The test data is fed into the model with its labels withheld, and the model tries to make predictions. The results are compared to the actual labels to evaluate the accuracy of your model. During this stage, you’ll also assess the loss, which measures how far the model’s predictions are from the correct answers. The closer the loss is to zero, the better your model is.
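The two measurements can be sketched as follows. Accuracy is the share of correct predictions; for the loss, this example assumes cross-entropy (log loss), which is one common choice among several. The predicted probabilities and labels are invented for illustration.

```python
import math


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def log_loss(probabilities, actual):
    """Average cross-entropy: low when the true label gets high probability."""
    eps = 1e-15  # avoids log(0) when a probability is exactly 0
    total = 0.0
    for probs, label in zip(probabilities, actual):
        p = min(max(probs.get(label, 0.0), eps), 1 - eps)
        total += -math.log(p)
    return total / len(actual)


# Hypothetical model output: a probability for each label, per test example.
actual = ["tulip", "rose", "tulip", "rose"]
probabilities = [
    {"tulip": 0.9, "rose": 0.1},
    {"tulip": 0.2, "rose": 0.8},
    {"tulip": 0.4, "rose": 0.6},  # the model gets this one wrong
    {"tulip": 0.3, "rose": 0.7},
]
predicted = [max(p, key=p.get) for p in probabilities]

print(accuracy(predicted, actual))  # 0.75
print(round(log_loss(probabilities, actual), 3))
```

Accuracy only counts right and wrong answers, while the loss also reflects how confident the model was, so a confident mistake is penalized more than a hesitant one.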

If you’re satisfied with the accuracy, you can use your model. However, the journey doesn’t end here. The final step is continuous improvement. There are various ways to enhance your model, from tweaking the data to adjusting neural network complexity. The beauty of machine learning is that there’s always room for improvement, just like our own learning and growth.


Machine learning is not only a powerful tool but also an exciting journey. Just like our brains constantly adapt and grow, machine learning models can always be improved. This dynamism is what makes machine learning so fascinating.

✨ Cheers,
Denis J.
