Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):
1. First, you would examine what spam typically looks like. You might notice that certain words or phrases (such as "4U," "credit card," "free," and "amazing") tend to come up a lot in the subject line. You might also notice a few other patterns in the sender's name, the email's body, and other parts of the email.
2. You would write a detection algorithm for each pattern you noticed, and your program would flag an email as spam if more than one of these patterns was detected.
3. You would test your program and repeat steps 1 and 2 until it was good enough to launch.
Since the problem is difficult, your program will likely become a long list of complex rules that is hard to maintain.
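The traditional approach can be sketched as follows. This is only an illustration under stated assumptions: the function names `count_matches` and `is_spam`, the regular expressions, and the threshold of two matching patterns are hypothetical choices, built from the example words mentioned in step 1.

```python
import re

# Hypothetical hand-written rules: each one encodes a pattern a developer noticed.
SUBJECT_PATTERNS = [
    re.compile(r"\b4u\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]


def count_matches(subject):
    """Return how many hand-written patterns appear in the subject line."""
    return sum(1 for pattern in SUBJECT_PATTERNS if pattern.search(subject))


def is_spam(subject, threshold=2):
    """Flag the email when at least `threshold` patterns are detected."""
    return count_matches(subject) >= threshold


if __name__ == "__main__":
    print(is_spam("Amazing FREE credit card 4U"))     # True: all four rules fire
    print(is_spam("Minutes from Tuesday's meeting"))  # False: no rule fires
    print(is_spam("Amazing offer For U"))             # False: only one rule fires
```

Every new spam pattern means another entry in `SUBJECT_PATTERNS` (and usually more special cases), which is how such a program grows into a long list of rules.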
In contrast, a spam filter based on machine learning automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is shorter, easier to maintain, and most likely more accurate.
What if spammers notice that all their emails containing "4U" are blocked? They might start writing "For U" instead. A spam filter using traditional programming techniques would need to be updated to flag "For U" emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.
In contrast, a spam filter based on machine learning automatically notices that "For U" has become unusually frequent in spam flagged by users, and it starts flagging such emails without your intervention (Figure 1-3).
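A minimal sketch of this learning behavior, assuming a toy word-count model rather than a real spam filter. The class name `WordFrequencySpamFilter`, the whitespace tokenizer, the add-one smoothing, and all example emails are hypothetical choices made for illustration; the only idea taken from the text is comparing how frequent words are in spam versus ham.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class WordFrequencySpamFilter:
    """Hypothetical toy filter: scores words by how much more frequent they are in spam than in ham."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def learn(self, text, is_spam):
        """Update the word counts with one labeled email (for example, one flagged by a user)."""
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(tokenize(text))

    def word_score(self, word):
        """Log ratio of the word's smoothed frequency in spam versus ham."""
        # Add-one smoothing keeps unseen words from causing a division by zero.
        vocab_size = len(set(self.spam_counts) | set(self.ham_counts)) + 1
        p_spam = (self.spam_counts[word] + 1) / (sum(self.spam_counts.values()) + vocab_size)
        p_ham = (self.ham_counts[word] + 1) / (sum(self.ham_counts.values()) + vocab_size)
        return math.log(p_spam / p_ham)

    def is_spam(self, text):
        """Flag the email when its words are, on balance, more typical of spam."""
        return sum(self.word_score(word) for word in tokenize(text)) > 0


def build_demo_filter():
    """Train on a few made-up emails in which spammers still write '4U'."""
    spam_filter = WordFrequencySpamFilter()
    for subject in ("amazing offer 4u", "free credit card 4u", "amazing free gift 4u"):
        spam_filter.learn(subject, is_spam=True)
    for subject in ("meeting notes for monday", "lunch on friday", "project report for review"):
        spam_filter.learn(subject, is_spam=False)
    return spam_filter


if __name__ == "__main__":
    spam_filter = build_demo_filter()
    new_email = "great prices for u"
    print(spam_filter.is_spam(new_email))  # False: "u" has never been seen in spam

    # Users flag a wave of "For U" spam; the filter only needs the new examples.
    for subject in ("special deal for u", "cheap pills for u", "win prizes for u"):
        spam_filter.learn(subject, is_spam=True)
    print(spam_filter.is_spam(new_email))  # True: "u" is now unusually frequent in spam
```

No rule was rewritten between the two calls to `is_spam`; only the counts changed. A real filter would use far more data and a better model, but the maintenance story is the same.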
Another area where machine learning shines is problems that are either too complex for traditional approaches or have no known algorithm. For example, consider speech recognition. Suppose you want to start simple and write a program capable of distinguishing the words "one" and "two." You might notice that the word "two" starts with a high-pitched sound ("T"), so you could hardcode an algorithm that measures the intensity of high-pitched sound and use that to distinguish "one" from "two." Obviously, though, this technique will not scale to thousands of words spoken in dozens of languages by millions of different people in noisy environments. The best solution (at least today) is to write an algorithm that learns by itself, given many example recordings of each word.
Finally, machine learning can help humans learn (Figure 1-4). ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For example, once a spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this reveals unsuspected correlations or new trends, and thereby leads to a better understanding of the problem. Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
All in all, machine learning is great for:
Problems where existing solutions require a lot of fine-tuning or long lists of rules: A machine learning algorithm can often simplify code and perform better than traditional methods.
Complex problems that cannot be solved using traditional methods: The best machine learning techniques may be able to find solutions.
Fluctuating environments: A machine learning system can adapt to new data.
Getting insights about complex problems and large amounts of data.
Let's look at some concrete examples of machine learning tasks, along with the techniques that can tackle them:
Analyze images of products on a production line to sort them automatically
This is image classification, typically performed using convolutional neural networks (CNNs; see Chapter 14).
Detect tumors in brain scans
This is semantic segmentation, where each pixel in the image is classified (since we want to determine the exact location and shape of the tumor), usually using a CNN as well.
Automatically categorize news articles
This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs), CNNs, or Transformers (see Chapter 16).
Automatically flag offensive comments on forums
This is also text classification, using the same NLP tools.
Automatically summarize long documents
This is a branch of NLP called text summarization that also uses the same tools.
Create a chatbot or personal assistant
This involves many NLP components, including natural language understanding (NLU) and question answering modules.
Predict your company's revenue for the next year based on many performance metrics
This is a regression task (i.e., predicting values) that can be tackled using any regression model, such as a linear regression or polynomial regression model (see Chapter 4), a regression SVM (see Chapter 5), a regression random forest (see Chapter 7), or an artificial neural network (see Chapter 10). If you want to take sequences of past performance metrics into account, you may want to use RNNs, CNNs, or Transformers (see Chapters 15 and 16).
Make your app react to voice commands
This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or Transformers (see Chapters 15 and 16).
Detect credit card fraud
This is anomaly detection (see Chapter 9).
Segment customers based on their purchases so you can design a different marketing strategy for each segment
This is clustering (see Chapter 9).
Represent complex, high-dimensional datasets in clear and insightful diagrams
This is data visualization, often involving dimensionality reduction techniques (see Chapter 8).
Recommend products that customers may be interested in based on past purchases
This is a recommender system. One approach is to feed past purchases (and other information about the customer) to an artificial neural network (see Chapter 10) and have it output the most likely next purchase. This neural network would typically be trained on past sequences of purchases across all customers.
Build smart bots for games
This is often tackled using reinforcement learning (RL; see Chapter 18), a branch of machine learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time (e.g., a bot may get a reward every time the player loses some life points), within a given environment (such as the game). The famous AlphaGo program that beat the world champion at the game of Go was built using RL.
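The agent, reward, and environment vocabulary can be made concrete with a small sketch of tabular Q-learning, one classic RL algorithm. The text does not say which algorithm a game bot would use, so this is a swapped-in illustration: the five-cell corridor environment, the function names, and all hyperparameters are hypothetical, and a real game would need a far richer setup.

```python
import random

N_STATES = 5          # positions 0..4 in a hypothetical corridor; the reward sits at position 4
ACTIONS = (-1, +1)    # step left or step right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: move within the corridor; reaching the goal pays a reward of 1."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy agent; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)                       # explore, or break a tie at random
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit the current estimates
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward the reward plus the discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-goal state."""
    return [ACTIONS[0] if left > right else ACTIONS[1] for left, right in q[:GOAL]]


if __name__ == "__main__":
    print(greedy_policy(train()))  # the agent learns to step right (+1) in every state
```

Nobody tells the agent that walking right is the goal; it discovers this from the rewards it collects while acting in the environment.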
This list could go on and on, but hopefully it gives you an idea of the breadth and complexity of tasks that machine learning can handle, and the types of techniques you'll use for each.
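As one small illustration of the list above, here is a sketch of the revenue forecasting task treated as a linear regression problem. It assumes a single performance metric and fits a straight line by ordinary least squares; the function names and the figures in the usage example are made up, and a real forecast would use many metrics and a model chosen by evaluation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a value for a new input using the fitted line."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: one performance metric per year and the revenue that followed.
    metric = [1.0, 2.0, 3.0, 4.0]
    revenue = [3.1, 4.9, 7.2, 8.8]
    slope, intercept = fit_line(metric, revenue)
    print(round(predict(5.0, slope, intercept), 2))  # 10.85
```

The model here has just two learned parameters, `slope` and `intercept`; the other techniques named in the list differ mainly in how many parameters they learn and what shapes they can represent.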