Named entity recognition (NER) is the task of identifying and categorizing named entities in text. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.

The super.AI named entity recognition (NER) project type features granular design options for defining your entities, explaining how they are labeled, and deciding what is excluded from labeling. On this page, you will find details on each of the following:

Entity classes
Entity pre-labeling
Custom entity recognition
Parts of speech to exclude
Task granularity

Entity classes

When you define an entity, you have to provide instructions to our human labelers on how to identify it within your text. There are four parts to creating entity classes:

Entity name
- This can be anything you like, but it helps to choose a name that our labelers can identify quickly, e.g., DOG. Below, you can find information on some common entity names that our pre-labeling model accepts.
Explanation and examples
- Provide clear, simple information that will help our human labelers identify the entity in your text. For example, “A dog is any domesticated canine descended from the gray wolf, e.g., labrador, chihuahua, or rottweiler.”
Parent entity
- If you select Yes on Has a parent entity?, you can then define the parent entity name. For example, a DOG entity might have an ANIMAL parent entity.
Model class your entity class maps to
- If you want to use a class supported by our entity pre-labeling under a custom class name, you can map your custom name to the entity name that the pre-labeling model supports. For example, to use the PERSON entity with Human as the class name, enter Human as the Entity name and select PERSON from the Model class your entity class maps to dropdown. We'll map it for you.
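
As a rough illustration of how the parts of an entity class fit together, here is a minimal Python sketch. The `EntityClass` container, its field names, and the `to_custom_label` helper are hypothetical and are not part of any super.AI API; only the model class names come from the pre-labeling table on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Model classes listed in the pre-labeling table on this page (OntoNotes 5.0 names).
PRELABEL_CLASSES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
}


@dataclass
class EntityClass:
    """Hypothetical container for one entity class definition."""
    name: str                          # e.g. "DOG" or "Human"
    explanation: str                   # instructions and examples for labelers
    parent: Optional[str] = None       # e.g. "ANIMAL"
    model_class: Optional[str] = None  # pre-labeling class this name maps to

    def __post_init__(self) -> None:
        # Reject mappings to classes the pre-labeling table does not list.
        if self.model_class is not None and self.model_class not in PRELABEL_CLASSES:
            raise ValueError(f"unsupported model class: {self.model_class}")


def to_custom_label(model_label: str, classes: List[EntityClass]) -> Optional[str]:
    """Return the custom class name mapped to a pre-labeling label, if any."""
    for entity_class in classes:
        if entity_class.model_class == model_label:
            return entity_class.name
    return None
```

With a `Human` class mapped to `PERSON`, `to_custom_label("PERSON", classes)` returns `"Human"`, which mirrors the mapping described in the last item above.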

Entity pre-labeling

The super.AI NER project type supports pre-labeling of entities. The entities in the table below are recognized by our pre-labeling model. You can set any of these as the model class that your entity maps to when designing your project and they will automatically be labeled before your text is sent to our human labelers. The names and descriptions come from OntoNotes 5.0.

Entity	Description
PERSON	People, including fictional characters
NORP	Nationalities or religious or political groups
FAC	Buildings, airports, highways, bridges, etc.
ORG	Companies, agencies, institutions, etc.
GPE	Countries, cities, states, etc.
LOC	Non-GPE locations, e.g., mountain ranges and bodies of water
PRODUCT	Objects, vehicles, foods, etc. (not services)
EVENT	Named battles, wars, sports events, hurricanes, etc.
WORK_OF_ART	Titles of books, songs, etc.
LAW	Named documents made into laws
LANGUAGE	Any named language
DATE	Absolute (e.g., `July 4, 2010`) or relative (e.g., `two weeks ago`) dates or periods
TIME	Times shorter than a day
PERCENT	Percentage, including `%`
MONEY	Monetary values, including unit
QUANTITY	Measurements, as of weight or distance
ORDINAL	Numbers of order, e.g., `first`, `second`, etc.
CARDINAL	Numerals that do not fall under another type

Custom entity recognition

Define a search pattern (a text string or regex pattern) that you want consistently labeled with a specified entity name throughout any text you submit to us for labeling. You must also state whether the search pattern is an exact string or a regex pattern.

For example, if your text frequently features the name George Orwell, you can set that as a search pattern along with the entity name PERSON.

If you don’t have any custom rules to apply, click the ‘x’ in the top right to remove the field.
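
To make the difference between an exact string and a regex pattern concrete, the following sketch applies one custom rule to a piece of text using Python's standard `re` module. The `apply_custom_rule` function and its return shape are assumptions made for illustration; how super.AI applies these rules internally is not described here.

```python
import re
from typing import List, Tuple

# (start offset, end offset, matched text, entity name)
Span = Tuple[int, int, str, str]


def apply_custom_rule(text: str, pattern: str, entity: str, is_regex: bool) -> List[Span]:
    """Label every occurrence of a search pattern with the given entity name."""
    # Exact strings are escaped so that characters such as "." match literally.
    compiled = re.compile(pattern if is_regex else re.escape(pattern))
    return [
        (m.start(), m.end(), m.group(0), entity)
        for m in compiled.finditer(text)
        if m.end() > m.start()  # skip empty matches a regex might produce
    ]


text = "George Orwell wrote 1984. Orwell was born in 1903."
people = apply_custom_rule(text, "George Orwell", "PERSON", is_regex=False)
years = apply_custom_rule(text, r"\b\d{4}\b", "DATE", is_regex=True)
```

Here `people` holds a single span covering the full name, and `years` holds one span for each four-digit number. Treating an exact string as a regex would change its meaning whenever it contains special characters, which is why the rule has to state which kind of pattern it is.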

Parts of speech to exclude

You can exclude any of the part-of-speech (POS) tags in the table below from labeling. You can find detailed explanations of each on the Universal Dependencies website.

Entities that contain excluded (masked) parts of speech can still be labeled when at least one of the words that make up the entity is not a masked part of speech. For example, if you enable POS tag masking for ADJ, you can still label Del the Funky Homosapien as PERSON, even though Funky is an adjective.

Part of speech	Examples
Adjective	big, old, green, African, incomprehensible, first
Adposition	in, to, during
Adverb	very, well, exactly, tomorrow, up, down, how, now
Auxiliary	has (done), will (do), was (done), should (do), is (a teacher)
Coordinating conjunction	and, or, but
Determiner	a, an, the, this
Interjection	psst, ouch, bravo, hello
Noun	girl, cat, tree, air, beauty
Numeral	1, 2020, one, seventy-seven, II, MMXIV
Particle	’s, not, let’s, may you
Pronoun	I, you, he, myself, themselves, who, nobody
Proper noun	Mary, John, London, NATO, HBO
Punctuation	., (, ), ?
Subordinating conjunction	that, if, while
Symbol	$, %, §, ©, +, −, ×, ÷, =, <, >, :), ♥‿♥, 😝
Verb	run, eat, runs, ate, running, eating
Other	xfgh pdl jklw
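
The masking rule described above (an entity stays labelable as long as at least one of its words is not a masked part of speech) can be sketched in a few lines of Python. The `can_label` function is hypothetical, and the POS tags in the example are assigned by hand using Universal Dependencies tag names rather than produced by a tagger.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, Universal POS tag such as "ADJ" or "PROPN")


def can_label(entity_tokens: List[Token], masked_tags: Set[str]) -> bool:
    """Return True when at least one token has a POS tag that is not masked."""
    return any(tag not in masked_tags for _, tag in entity_tokens)


# Hand-tagged example from the text above.
del_the_funky = [
    ("Del", "PROPN"),
    ("the", "DET"),
    ("Funky", "ADJ"),
    ("Homosapien", "PROPN"),
]

still_labelable = can_label(del_the_funky, masked_tags={"ADJ"})
adjective_only = can_label([("green", "ADJ")], masked_tags={"ADJ"})
```

In this sketch `still_labelable` is `True` because the proper nouns and the determiner are not masked, while `adjective_only` is `False` because every word in the span is a masked adjective.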

Task granularity

These two settings let you define how labelers receive your input data and how they label it.

Entity classes per task

Define whether labelers label all entity classes in the text as a single task or whether each labeler labels only one entity class per task. If your text input is dense with entities, it's better to select Separate tasks for each top-level entity.
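
One way to picture this setting is as a choice between a single task that covers every entity class and one task per top-level entity. The sketch below is an interpretation, not the platform's implementation: it assumes that a class with a parent entity is grouped under that parent and that a class without a parent is its own top-level entity. The `plan_tasks` function is hypothetical.

```python
from typing import Dict, List, Optional


def plan_tasks(classes: Dict[str, Optional[str]], separate: bool) -> List[List[str]]:
    """Group entity classes into tasks.

    `classes` maps each entity class name to its parent entity name, or to
    None when it has no parent. Each inner list holds the classes that would
    be labeled together in one task.
    """
    if not separate:
        # Single task: every entity class is labeled in one pass.
        return [sorted(classes)]
    groups: Dict[str, List[str]] = {}
    for name, parent in classes.items():
        top_level = parent if parent is not None else name
        groups.setdefault(top_level, []).append(name)
    return [sorted(groups[top]) for top in sorted(groups)]


classes = {"DOG": "ANIMAL", "CAT": "ANIMAL", "PERSON": None}
single = plan_tasks(classes, separate=False)
per_top_level = plan_tasks(classes, separate=True)
```

Under these assumptions, `single` contains one task with all three classes, while `per_top_level` contains one task for the ANIMAL group and one for PERSON.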

Utterances per task

This lets you set the length of the dialogue that labelers will label by limiting it to a certain number of utterances. We will break the input text up into chunks of this size to send to labelers.
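
The chunking described above can be sketched as follows. This assumes the input has already been split into a list of utterances; how that split is performed is not covered here, and the `chunk_utterances` function name is hypothetical.

```python
from typing import List


def chunk_utterances(utterances: List[str], per_task: int) -> List[List[str]]:
    """Split a dialogue into consecutive chunks of at most `per_task` utterances."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [
        utterances[i:i + per_task]
        for i in range(0, len(utterances), per_task)
    ]


dialogue = ["Hi there.", "Hello!", "Where is Mary?", "In London.", "Thanks."]
tasks = chunk_utterances(dialogue, per_task=2)
```

With five utterances and two utterances per task, the sketch yields three chunks, the last of which holds the single remaining utterance.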