Named entity recognition

Named entity recognition (NER) is the task of identifying and categorizing named entities in text. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.

The super.AI named entity recognition (NER) project type features granular design options for defining your entities, explaining how they are labeled, and deciding what is excluded from labeling. On this page, you will find details on each of the following:

  • Entity classes
    • Entity pre-labeling
  • Custom entity recognition
  • Parts of speech to exclude
  • Task granularity

Entity classes

When you define an entity, you have to provide instructions to our human labelers on how to identify it within your text. There are four parts to creating entity classes:

  1. Entity name
  2. Explanation and examples
    • Provide clear, simple information that will help our human labelers identify the entity in your text. For example, “A dog is any domesticated canine descended from the gray wolf, e.g., labrador, chihuahua, or rottweiler.”
  3. Parent entity
    • If you select Yes on Has a parent entity?, you can then define the parent entity name. For example, a DOG entity might have an ANIMAL parent entity.
  4. Model class your entity class maps to
    • If you want to use a class supported by our entity pre-labeling but under a custom class name, you can map your custom name to the entity name supported by the pre-labeling model. For example, to use the PERSON entity under the class name Human, enter Human as the Entity name and select PERSON from the dropdown labeled The model class your entity class maps to. We'll map it for you.
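
The four parts above can be pictured as a small record. The following is a minimal sketch in Python, not the super.AI API: the `EntityClass` type and its field names are hypothetical and only mirror the form fields described above, and the explanation text for `Human` is an illustrative placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityClass:
    """Hypothetical record mirroring the four parts of an entity class."""
    name: str                          # 1. Entity name shown to labelers
    explanation: str                   # 2. Explanation and examples
    parent: Optional[str] = None       # 3. Parent entity name, if any
    model_class: Optional[str] = None  # 4. Pre-labeling model class it maps to


dog = EntityClass(
    name="DOG",
    explanation="Any domesticated canine descended from the gray wolf, "
                "e.g., labrador, chihuahua, or rottweiler.",
    parent="ANIMAL",
)

# A custom class name mapped onto a class the pre-labeling model supports.
human = EntityClass(
    name="Human",
    explanation="Any named person, real or fictional.",  # placeholder wording
    model_class="PERSON",
)
```

Keeping the parent and the model class optional reflects that both are choices you make per entity rather than required fields.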

Entity pre-labeling

The super.AI NER project type supports pre-labeling of entities. The entities in the table below are recognized by our pre-labeling model. You can set any of these as the model class that your entity maps to when designing your project, and matching entities will be labeled automatically before your text is sent to our human labelers. The names and descriptions come from OntoNotes 5.0.

Entity name   Description
PERSON        People, including fictional characters
NORP          Nationalities or religious or political groups
FAC           Buildings, airports, highways, bridges, etc.
ORG           Companies, agencies, institutions, etc.
GPE           Countries, cities, states, etc.
LOC           Non-GPE locations, e.g., mountain ranges and bodies of water
PRODUCT       Objects, vehicles, foods, etc. (not services)
EVENT         Named battles, wars, sports events, hurricanes, etc.
WORK_OF_ART   Titles of books, songs, etc.
LAW           Named documents made into laws
LANGUAGE      Any named language
DATE          Absolute (e.g., July 4, 2010) or relative (e.g., two weeks ago) dates or periods
TIME          Times shorter than a day
PERCENT       Percentage, including %
MONEY         Monetary values, including unit
QUANTITY      Measurements, as of weight or distance
ORDINAL       Numbers of order, e.g., first, second, etc.
CARDINAL      Numerals that do not fall under another type
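
To illustrate how a model class mapping might turn model output into pre-labels under your own class names, here is a minimal sketch. It is not the super.AI implementation: the `remap_prelabels` function, the span format, and the sample model output are all hypothetical, and dropping spans whose model class is not mapped is an assumption.

```python
from typing import Dict, List, Tuple

# (start, end, label) character spans, as a pre-labeling model might emit them.
Span = Tuple[int, int, str]


def remap_prelabels(spans: List[Span], class_map: Dict[str, str]) -> List[Span]:
    """Keep spans whose model class is mapped, renamed to the custom class."""
    return [
        (start, end, class_map[label])
        for start, end, label in spans
        if label in class_map  # assumption: unmapped model classes are ignored
    ]


text = "George Orwell was born in Motihari."
model_spans = [(0, 13, "PERSON"), (26, 34, "GPE")]  # made-up model output
class_map = {"PERSON": "Human"}  # model class -> custom entity name

prelabels = remap_prelabels(model_spans, class_map)
```

Here only the PERSON span survives, relabeled as Human, because GPE has no custom class mapped to it.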

Custom entity recognition

Define a search pattern (a text string or regex pattern) that you want consistently labeled with a specified entity name throughout any text you submit to us for labeling. You must also state whether the search pattern is an exact string or a regex pattern.

For example, if your text frequently features the name George Orwell, you can set that as a search pattern along with the entity name PERSON.

If you don’t have any custom rules to apply, click the ‘x’ in the top right to remove the field.
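
The difference between an exact string and a regex pattern can be sketched with Python's standard `re` module. This is an illustration under assumptions, not the platform's matching logic: the `apply_custom_rule` function and its `is_regex` flag are hypothetical names, and the regex dialect super.AI accepts may differ from Python's.

```python
import re
from typing import List, Tuple


def apply_custom_rule(text: str, pattern: str, entity: str,
                      is_regex: bool) -> List[Tuple[int, int, str]]:
    """Return (start, end, entity) for every match of the search pattern."""
    # An exact string is escaped so that characters like "." match literally.
    compiled = re.compile(pattern if is_regex else re.escape(pattern))
    return [(m.start(), m.end(), entity) for m in compiled.finditer(text)]


text = "George Orwell wrote 1984. Orwell was born Eric Blair."

# Exact string: only the full name is labeled.
exact = apply_custom_rule(text, "George Orwell", "PERSON", is_regex=False)

# Regex: the first name is optional, so the bare surname is labeled too.
regex = apply_custom_rule(text, r"(?:George )?Orwell", "PERSON", is_regex=True)
```

The exact rule finds one span, while the regex rule also catches the second mention, which is the kind of case where a regex pattern is worth the extra care.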

Parts of speech to exclude

You can exclude from labeling any of the part-of-speech (POS) tags in the table below. You can find detailed explanations of each on the Universal Dependencies website.

Entities that contain masked parts of speech can still be labeled when at least one of the words that comprise the entity is not a masked part of speech. For example, if you enable POS tag masking for ADJ, you can still label Del the Funky Homosapien as PERSON, even though Funky is an adjective.

Tag     Part of speech              Examples
ADJ     Adjective                   big, old, green, African, incomprehensible, first
ADP     Adposition                  in, to, during
ADV     Adverb                      very, well, exactly, tomorrow, up, down, how, now
AUX     Auxiliary                   has (done), will (do), was (done), should (do), is (a teacher)
CCONJ   Coordinating conjunction    and, or, but
DET     Determiner                  a, an, the, this
INTJ    Interjection                psst, ouch, bravo, hello
NOUN    Noun                        girl, cat, tree, air, beauty
NUM     Numeral                     1, 2020, one, seventy-seven, II, MMXIV
PART    Particle                    ’s, not, let’s, may you
PRON    Pronoun                     I, you, he, myself, themselves, who, nobody
PROPN   Proper noun                 Mary, John, London, NATO, HBO
PUNCT   Punctuation                 ., (, ), ?
SCONJ   Subordinating conjunction   that, if, while
SYM     Symbol                      $, %, §, ©, +, −, ×, ÷, =, <, >, :), ♥‿♥, 😝
VERB    Verb                        run, eat, runs, ate, running, eating
X       Other                       xfgh pdl jklw
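
The masking rule described above (an entity can still be labeled as long as at least one of its words is not a masked part of speech) can be expressed as a short check. This is a sketch only: the `can_label` function is a hypothetical name, and the POS tags on the sample tokens are assigned by hand for illustration rather than produced by a real tagger.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def can_label(entity_tokens: List[Token], masked_tags: Set[str]) -> bool:
    """An entity stays labelable if any of its words has an unmasked tag."""
    return any(tag not in masked_tags for _, tag in entity_tokens)


# Hand-assigned tags for illustration; a real tagger may differ.
del_tokens = [("Del", "PROPN"), ("the", "DET"),
              ("Funky", "ADJ"), ("Homosapien", "PROPN")]

labelable = can_label(del_tokens, masked_tags={"ADJ"})
```

With ADJ masked, the name is still labelable because Del and Homosapien are not adjectives; a span consisting only of Funky would not be.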

Task granularity

These two settings let you define how labelers receive your input data and how they label it.

Entity classes per task

Define whether each labeler labels all entity classes in the text in a single task or only one entity class per task. If your text is dense with entities, it's better to select Separate tasks for each top-level entity.
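
One way to picture the two options is as a function that either emits a single task covering every entity class or one task per top-level entity. The sketch below is an assumption-heavy illustration, not how super.AI builds tasks: `top_level`, `build_tasks`, and the task dictionary layout are hypothetical, and grouping child entities under their top-level parent's task is a guess at the intended behavior. It also assumes parent links contain no cycles.

```python
from typing import Dict, List, Optional


def top_level(entity: str, parents: Dict[str, Optional[str]]) -> str:
    """Follow parent links until reaching an entity with no parent."""
    parent = parents.get(entity)
    while parent is not None:
        entity = parent
        parent = parents.get(entity)
    return entity


def build_tasks(text: str, parents: Dict[str, Optional[str]],
                separate: bool) -> List[dict]:
    """One task for everything, or one task per top-level entity."""
    if not separate:
        return [{"text": text, "entities": sorted(parents)}]
    groups: Dict[str, List[str]] = {}
    for entity in sorted(parents):
        groups.setdefault(top_level(entity, parents), []).append(entity)
    return [{"text": text, "entities": members} for members in groups.values()]


# Entity name -> parent entity name (None for top-level entities).
parents = {"ANIMAL": None, "DOG": "ANIMAL", "PERSON": None}

single = build_tasks("My dog met Mary.", parents, separate=False)
split = build_tasks("My dog met Mary.", parents, separate=True)
```

In this example `single` holds one task listing all three classes, while `split` holds two tasks: one for ANIMAL and DOG, and one for PERSON.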

Utterances per task

This lets you set the length of the dialogue that labelers work on by limiting each task to a certain number of utterances. We break the input text into chunks of this size to send to labelers.
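
The chunking can be sketched as follows, assuming the dialogue has already been split into a list of utterances. How super.AI delimits utterances is not specified here, and the `chunk_utterances` function and sample dialogue are hypothetical.

```python
from typing import List


def chunk_utterances(utterances: List[str], per_task: int) -> List[List[str]]:
    """Split a dialogue into consecutive chunks of at most per_task utterances."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [utterances[i:i + per_task]
            for i in range(0, len(utterances), per_task)]


dialogue = ["Hi, is this super.AI?", "Yes, how can I help?",
            "I have a labeling question.", "Sure, go ahead.", "Thanks!"]

tasks = chunk_utterances(dialogue, per_task=2)
```

With five utterances and two per task, this yields three chunks, the last of which holds the single leftover utterance.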
