Machine Learning and Text Classification

Words can be deceiving. Irony, double negation: often a single word can change the meaning of an entire sentence. Only the kind of “intellectual delicacy” typical of the human brain can grasp them.

But what if a machine could overcome this hurdle, through text classifiers capable of working at higher semantic levels without human intervention? Yes, that’s what machine learning does!

Machine Learning and Text Classification: two sides of the same coin

Machine learning is a data analysis technique that automates the construction of analytical models. It rests on the principle that computers can learn to perform specific tasks without being explicitly programmed for them, by recognizing patterns in data.

A branch of artificial intelligence, machine learning uses algorithms that learn from data iteratively. For example, it allows computers to find information, even previously unknown information, without being explicitly told where to look for it.

You might think this is a mystery, but it is not! The most important aspect of machine learning is iteration: the more the models are exposed to data, the better they adapt on their own and produce reliable, repeatable results and decisions.

The ability to apply complex mathematical calculations to big data is a more recent development. Today, machine learning is used in many sectors and we are surrounded by examples:

  • Self-driving cars? A product of artificial intelligence and machine learning.
  • Online recommendations such as those from Amazon or Netflix? Machine learning applied to daily life.
  • Fraud detection? One of its less obvious but increasingly frequent uses.
  • Knowing what customers say about your company on Twitter? Machine learning combined with the creation of linguistic rules.

Machine learning for every type of industry

Many industries that work with large volumes of data have recognized the value of machine learning technology. By extracting information from data, even in real time, organizations can work more efficiently and gain a competitive advantage.

Financial services

Banks and other financial institutions use machine learning technologies for two main purposes: to identify important information in the data and to prevent fraud. That information can reveal investment opportunities and help investors know when to act.

Public administration

Public bodies responsible, for example, for public safety or public services have a particular need for machine learning, because they have multiple data sources available that can help, for instance, to reduce identity theft.


Healthcare

In healthcare, wearable devices and sensors use data to check a patient’s health status in real time.


Transport

Machine learning tools for data analysis and modeling are useful to delivery companies, public transport and other transport operators, for example to create more efficient routes and to predict potential problems.

Marketing and sales

Retailers rely on machine learning to store, analyze and use data to personalize the shopping experience or marketing campaigns.

What are the most used methods of machine learning?

There are various machine learning methods, but the best known and most widely adopted are supervised learning and unsupervised learning.

Supervised learning

Supervised learning consists of providing the system with a set of specific, labelled notions, such as models and examples, which build up a knowledge base of information and experience. In this case, the classification logic is given to the machine as input.

In this way, when the machine faces a problem, it draws on the experiences stored in its system, analyzes them, and decides which answer to give based on those already-coded experiences.

This type of learning is supplied ready-made: the machine only has to choose the best response to the stimulus it is given.

Algorithms based on supervised learning are used in many areas, including medicine. For example, if a doctor wants to know whether a patient has a certain disease, the machine is given a series of previous cases in which the sick/healthy outcome has already been established. Based on these cases, the system is trained to recognize the outcome in new cases that have not yet been classified.
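The medical example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two features (body temperature and white cell count), the invented case values and the choice of a nearest-neighbour vote are all hypothetical, not a method described in this article or a clinical tool.

```python
from math import dist

# Hypothetical labelled cases: (temperature in °C, white cell count) -> known outcome.
# All numbers are invented for illustration only.
TRAINING_CASES = [
    ((36.6, 6.0), "healthy"),
    ((36.9, 7.5), "healthy"),
    ((37.0, 5.5), "healthy"),
    ((38.7, 13.0), "sick"),
    ((39.2, 15.5), "sick"),
    ((38.4, 12.0), "sick"),
]

def predict(case, training_cases=TRAINING_CASES, k=3):
    """Label a new case by majority vote among its k nearest labelled cases."""
    neighbours = sorted(training_cases, key=lambda item: dist(item[0], case))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)

print(predict((38.9, 14.0)))  # a new, not yet classified patient
print(predict((36.7, 6.5)))
```

The "right answers" are part of the input here: the system never invents the sick/healthy categories, it only learns to assign them to new cases. A realistic system would also rescale the features so that no single measurement dominates the distance.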

Unsupervised learning

In this type of learning, the machine itself is responsible for finding a classification logic. The system is therefore not given the “right answer”: the algorithm must work out for itself what it is being shown. The goal is to explore the data and identify some internal structure.

For example, it can identify consumers with similar characteristics, to whom specific marketing campaigns can be addressed. Or it can discover the main features that differentiate consumer segments from one another.

These algorithms are also used to segment text by topic, recommend products or identify outliers. For example, if a population has to be divided into groups according to defined characteristics, the groupings are defined by heuristics or algorithms designed by humans.
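As a rough sketch of consumer segmentation without labels, the snippet below groups customers with a basic k-means loop. The customer data (age, monthly spend), the choice of k-means and the deterministic starting centroids are assumptions made for this illustration, not something the article prescribes.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters; no labels are supplied."""
    centroids = list(points[:k])  # deterministic start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical customers: (age, monthly spend). Values are invented.
CUSTOMERS = [(22, 30), (61, 210), (25, 45), (58, 190), (24, 35), (63, 220)]
clusters, centroids = k_means(CUSTOMERS, k=2)
for segment in clusters:
    print(segment)
```

Nobody tells the algorithm which customers belong together; the two segments emerge from the distances between the data points alone.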

Text Classification

When working with text classifiers, a comparison can be made between dictionary-based methods and neural-network-based methods.

For example, a lexicon approach assigns each word a pre-calculated weight indicating its sentiment (10 = very positive; 1 = very negative) and averages the weights of the words to obtain the sentiment of the whole sentence. This approach can run into problems with negation, especially when the negation is far from the weighted word. Example: “It’s not true that football is fantastic.” A lexicon-based approach would assign a positive sentiment to this sentence, because the “not” comes several words before “fantastic” and so cannot negate it. Furthermore, this approach cannot recognize irony.
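A minimal sketch of the lexicon approach shows the negation problem in practice. The lexicon entries and the neutral midpoint of 5.5 are assumptions made for this example; real sentiment lexicons are far larger.

```python
import re

# Hypothetical lexicon on the 1-10 scale (10 = very positive, 1 = very negative).
LEXICON = {"fantastic": 10, "good": 8, "bad": 3, "terrible": 1}
NEUTRAL = 5.5  # midpoint of the scale, an assumption for this sketch

def lexicon_sentiment(sentence):
    """Average the weights of the words that appear in the lexicon."""
    words = re.findall(r"[a-z']+", sentence.lower())
    weights = [LEXICON[w] for w in words if w in LEXICON]
    return sum(weights) / len(weights) if weights else NEUTRAL

score = lexicon_sentiment("It's not true that football is fantastic.")
print(score, "positive" if score > NEUTRAL else "negative")
```

Because “not” carries no weight of its own and the words are scored independently, the sentence is rated as strongly positive even though its meaning is negative.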

A neural network approach, such as a machine learning classifier, can overcome these problems because it works at a more abstract semantic level. With machine learning, the classifier can be trained on hand-labelled examples so that it learns that this type of sentence has a negative sentiment. Accuracy in semantic text analysis tends to be higher, because a classifier of this type reads the sentence as a whole rather than each weighted word in isolation.
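To give a feel for learning from labelled sentences, here is a deliberately small stand-in: a single-layer perceptron over words and word pairs, not the deep neural network a production classifier would use. The four training sentences, their labels and the pair features are assumptions made for this sketch.

```python
import re
from collections import defaultdict
from itertools import combinations

def features(sentence):
    """Single words plus every in-order word pair, so distant words can interact."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return set(tokens) | set(combinations(tokens, 2))

def train(examples, epochs=10):
    """Perceptron rule: adjust weights only when an example is misclassified."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence, label in examples:  # label is +1 (positive) or -1 (negative)
            feats = features(sentence)
            score = sum(weights[f] for f in feats)
            if score * label <= 0:
                for f in feats:
                    weights[f] += label
    return weights

def predict(weights, sentence):
    """Return 'positive' for a score above zero, otherwise 'negative'."""
    score = sum(weights.get(f, 0.0) for f in features(sentence))
    return "positive" if score > 0 else "negative"

# Hypothetical hand-labelled training set.
EXAMPLES = [
    ("The film is fantastic", +1),
    ("It's not true that the film is fantastic", -1),
    ("The food is terrible", -1),
    ("It's not true that the food is terrible", +1),
]
weights = train(EXAMPLES)
print(predict(weights, "It's not true that football is fantastic."))
print(predict(weights, "Football is fantastic"))
```

The model is never given a rule about negation. It learns from the labelled examples that “not” together with “fantastic” signals a negative sentence, even when the two words are several positions apart. A real neural classifier learns far richer interactions from far more data.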

This is an important step toward a classifier that acts as an extension of the human mind and hand, able to grasp irony, double negation and sentiment, just as a human being would. Such a classifier can be useful in many situations, above all in identifying and classifying social media content, where the semantic meaning of a sentence can often deceive a machine not trained with a neural approach.
