N-grams

What are N-grams?

An n-gram is a contiguous sequence of n items drawn from a sample of text or speech. In natural language processing, the items are usually words or characters. Because each n-gram records which items occur next to one another, n-grams capture local structure in a text, such as word or character dependencies. They are used in NLP tasks such as language modeling, text classification, and information retrieval.

Examples of N-grams:

Unigrams (n = 1): Single words or characters, e.g., “the”, “cat”, “sat”.
Bigrams (n = 2): Sequences of two words or characters, e.g., “the cat”, “cat sat”, “sat on”.
Trigrams (n = 3): Sequences of three words or characters, e.g., “the cat sat”, “cat sat on”, “sat on the”.
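
The examples above can be reproduced with a sliding window of width n over a list of tokens. The following is a minimal Python sketch, not a reference implementation: the helper name ngrams, the whitespace tokenization, and the final word “mat” in the sample sentence are assumptions made for illustration.

```python
def ngrams(items, n):
    """Return every contiguous run of n items from `items` as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length m has m - n + 1 windows of width n
    # (none at all when the sequence is shorter than n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Assumed sample sentence; whitespace splitting stands in for a real tokenizer.
sentence = "the cat sat on the mat"
words = sentence.split()

unigrams = ngrams(words, 1)   # one-word tuples
bigrams = ngrams(words, 2)    # starts with ("the", "cat"), ("cat", "sat"), ("sat", "on")
trigrams = ngrams(words, 3)   # starts with ("the", "cat", "sat"), ("cat", "sat", "on")

# The same helper works on characters, because a string is a sequence too.
char_trigrams = ngrams("cat sat", 3)

print(bigrams)
print(trigrams)
```

A sequence of m items yields m - n + 1 n-grams, so the six-word sentence here produces five bigrams and four trigrams. Returning tuples keeps each n-gram hashable, which is convenient if the n-grams are later counted in a dictionary.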

Resources to learn more about N-grams:

“N-grams and how to implement it in python”, a tutorial on N-grams and their implementation in Python.
“What is N-grams” by Kavita Ganesan, an article explaining the concept of N-grams.
“Understanding word N-grams and N-grams probability”, an article discussing word N-grams and their probabilities in natural language processing.

Saturn Cloud