Natural language processing (NLP) is a branch of AI that powers a number of everyday applications, such as digital assistants like Siri or Alexa, GPS navigation systems, and predictive text on smartphones.
Earlier versions of NLP combined rule-based computational linguistics with statistical methods and machine learning to understand and draw conclusions from social messages, reviews and other data. More recent approaches leverage neural networks and large language models (LLMs) to accomplish the tasks below.
To facilitate NLP, a number of sub-tasks are often conducted, including:
- Tokenization: Text is broken down into smaller units called tokens, typically sentences, words or sub-word pieces.
- Stemming: Words are reduced to a stem by stripping affixes with heuristic rules, so the result is not always a dictionary word. For example, reading and reads are both stemmed to “read”.
- Lemmatization: Inflected forms are reduced to their dictionary form (lemma), using vocabulary and grammatical context. For example, better and best are reduced to “good”.
- Stop word removal: Words such as prepositions and articles are removed.
- Part-of-speech tagging: Each word is tagged with its grammatical role, such as noun, verb, adjective, adverb or pronoun.
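To make the sub-tasks above concrete, here is a minimal Python sketch of tokenization, stop word removal, stemming and lemmatization. It is an illustration under stated assumptions rather than how any production library works: the stop word list, the suffix list and the lemma table are tiny hypothetical examples, and the stemmer is a naive suffix stripper, not the Porter algorithm.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "are", "and"}
SUFFIXES = ("ing", "ed", "s")
LEMMAS = {"better": "good", "best": "good", "mice": "mouse"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Naive stemmer: strip the first matching suffix, keeping >= 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Look the token up in the lemma table; unknown tokens are returned as-is."""
    return LEMMAS.get(token, token)


def preprocess(text, normalize=stem):
    """Tokenize, remove stop words, then normalize each remaining token."""
    return [normalize(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    sentence = "The reader is reading the best of the books"
    print(preprocess(sentence))             # stemming
    print(preprocess(sentence, lemmatize))  # lemmatization
```

Note how the two normalizers differ: the stemmer trims "reading" and "books" by rule but knows nothing about "best", while the lemma lookup maps "best" to "good" and leaves words outside its table untouched. Real systems typically rely on much larger rule sets and dictionaries.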
To facilitate conversational communication with a human, NLP employs two sub-branches called natural language understanding (NLU) and natural language generation (NLG). NLU uses algorithms that analyze text to understand words in context, while NLG generates meaningful text as a human would. Together, they power intelligent chatbots such as ChatGPT.
Here are the main NLP techniques used in business and B2C environments.
- Text summarization: NLP algorithms scan vast amounts of data and condense the information to provide a summary with key insights.
- Speech recognition: This technique analyzes audio data to translate it into text or map it to known words. It’s used to make closed captions and has been pivotal in improving accessibility for people with hearing impairments.
- Machine translation: NLP can automatically translate text between languages so that users can absorb information written in a non-native language with minimal effort. Google Translate is a good example.
- Question answering systems: NLP algorithms scan data and search for relevant information to provide answers to a user. These systems can be rules-based or based on generative pre-trained models, such as those behind ChatGPT, which draw on patterns learned from large volumes of training text, much of it publicly available on the internet.
- Named entity recognition: Named entity recognition (NER) is an NLP technique that identifies and extracts entities such as people, locations, brands, objects and currencies.
- Semantic search: Semantic search is a search technique that retrieves information by understanding the intent behind a query rather than just matching keywords.
- Sentiment analysis: NLP algorithms categorize the emotions in a text to show whether it is positive, negative or neutral, and to what extent.
- Aspect-based sentiment: This advanced technique analyzes sentiment toward the individual aspects mentioned in a text, rather than assigning one score to the whole text. This fine-grained view of market sentiment can give brands insight into exactly where they need to improve and what aspects are going well.
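The difference between overall sentiment analysis and aspect-based sentiment can be sketched in a few lines of Python. This is a simplified, lexicon-based illustration: the word scores in `LEXICON` and the keyword-to-aspect mapping in `ASPECTS` are hypothetical, and splitting a review into clauses on punctuation and the word "but" is an assumption made for brevity. Production systems generally use trained models instead.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score.
LEXICON = {"great": 1, "good": 1, "love": 1, "fast": 1,
           "bad": -1, "slow": -1, "terrible": -1, "poor": -1}

# Hypothetical mapping from keywords to the aspect they refer to.
ASPECTS = {"screen": "screen", "battery": "battery",
           "delivery": "delivery", "shipping": "delivery"}


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text):
    """Sum the polarity of every known word; unknown words count as 0."""
    return sum(LEXICON.get(w, 0) for w in words(text))


def label(value):
    """Map a numeric score to a sentiment label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def aspect_sentiment(review):
    """Score each clause and credit that score to the aspects it mentions."""
    totals = {}
    for clause in re.split(r"[.,;!?]|\bbut\b", review.lower()):
        clause_score = score(clause)
        for aspect in {ASPECTS[w] for w in words(clause) if w in ASPECTS}:
            totals[aspect] = totals.get(aspect, 0) + clause_score
    return {aspect: label(value) for aspect, value in totals.items()}


if __name__ == "__main__":
    review = "The screen is great, but the battery is terrible and delivery was slow."
    print(label(score(review)))      # one label for the whole review
    print(aspect_sentiment(review))  # one label per aspect
```

For the sample review, the overall label hides the fact that the customer liked the screen, while the per-aspect view separates the praise from the complaints.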