Named entity recognition

Named entity recognition (NER) is the task of identifying and categorizing named entities in text. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.

The super.AI named entity recognition (NER) project type features granular design options for defining your entities, explaining how they are labeled, and deciding what is excluded from labeling. On this page, you will find details on each of the following:

  • Entity classes
    • Entity pre-labeling
  • Custom entity recognition
  • Parts of speech to exclude
  • Task granularity

Entity classes

When you define an entity, you have to provide instructions to our human labelers on how to identify it within your text. There are four parts to creating entity classes:

  1. Entity name
  2. Explanation and examples
    • Provide clear, simple information that will help our human labelers identify the entity in your text. For example, “A dog is any domesticated canine descended from the gray wolf, e.g., labrador, chihuahua, or rottweiler.”
  3. Parent entity
    • If you select Yes on Has a parent entity?, you can then define the parent entity name. For example, a DOG entity might have an ANIMAL parent entity.
  4. Model class your entity class maps to
    • If you want to use a class supported by our entity pre-labeling but under a custom class name, you can map your custom name to the entity name supported by the pre-labeling model. For example, if you want to use the PERSON entity but have Human as the class name, enter Human as the Entity name and select PERSON from the “The model class your entity class maps to” dropdown. We'll map it for you.
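The four parts of an entity class can be pictured as a small record. The following is a minimal sketch only: the EntityClass type, its field names, and the validation are hypothetical illustrations and not part of the super.AI API. The set of model classes is copied from the Entity pre-labeling section of this page.

```python
from dataclasses import dataclass
from typing import Optional

# Model classes listed in the "Entity pre-labeling" section of this page.
PRELABEL_CLASSES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
}


@dataclass(frozen=True)
class EntityClass:
    """Hypothetical record mirroring the four parts of an entity class."""
    name: str                          # 1. Entity name
    explanation: str                   # 2. Explanation and examples
    parent: Optional[str] = None       # 3. Parent entity (if any)
    model_class: Optional[str] = None  # 4. Model class the entity maps to

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("An entity class needs a name.")
        if self.model_class is not None and self.model_class not in PRELABEL_CLASSES:
            raise ValueError(f"Unknown pre-labeling class: {self.model_class}")


# A custom class name (Human) mapped to the PERSON model class.
human = EntityClass(
    name="Human",
    explanation="Any named person, real or fictional.",
    model_class="PERSON",
)

# A class with a parent entity and no pre-labeling.
dog = EntityClass(
    name="DOG",
    explanation="Any domesticated canine, e.g., labrador or chihuahua.",
    parent="ANIMAL",
)
```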

Entity pre-labeling

The super.AI NER project type supports pre-labeling of entities. The entities in the table below are recognized by our pre-labeling model. You can set any of these as the model class that your entity maps to when designing your project and they will automatically be labeled before your text is sent to our human labelers. The names and descriptions come from OntoNotes 5.0.

Entity | Description
PERSON | People, including fictional characters
NORP | Nationalities or religious or political groups
FAC | Buildings, airports, highways, bridges, etc.
ORG | Companies, agencies, institutions, etc.
GPE | Countries, cities, states, etc.
LOC | Non-GPE locations, e.g., mountain ranges and bodies of water
PRODUCT | Objects, vehicles, foods, etc. (not services)
EVENT | Named battles, wars, sports events, hurricanes, etc.
WORK_OF_ART | Titles of books, songs, etc.
LAW | Named documents made into laws
LANGUAGE | Any named language
DATE | Absolute (e.g., July 4, 2010) or relative (e.g., two weeks ago) dates or periods
TIME | Times shorter than a day
PERCENT | Percentage, including %
MONEY | Monetary values, including unit
QUANTITY | Measurements, as of weight or distance
ORDINAL | Numbers of order, e.g., first, second, etc.
CARDINAL | Numerals that do not fall under another type
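To illustrate how a model class mapping might affect pre-labels, here is a minimal sketch. It assumes pre-labels arrive as (start, end, model class) character spans and that spans with no mapped entity class are left unlabeled; both the span format and the rename_prelabels helper are hypothetical, and the example spans are written by hand rather than produced by a model.

```python
from typing import Dict, List, Tuple

# (start offset, end offset, label); an assumed span format for illustration.
Span = Tuple[int, int, str]


def rename_prelabels(spans: List[Span], class_map: Dict[str, str]) -> List[Span]:
    """Rename model classes to custom entity names and drop unmapped spans."""
    return [
        (start, end, class_map[label])
        for start, end, label in spans
        if label in class_map
    ]


text = "George Orwell wrote 1984 in London."

# Hand-written example of what a pre-labeling model could return.
model_spans = [(0, 13, "PERSON"), (20, 24, "DATE"), (28, 34, "GPE")]

# Custom entity names chosen in the project, keyed by model class.
class_map = {"PERSON": "Human", "GPE": "Place"}

prelabels = rename_prelabels(model_spans, class_map)
for start, end, label in prelabels:
    print(text[start:end], label)
```

In this sketch the DATE span is dropped because no entity class in the project maps to DATE.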

Custom entity recognition

Define a search pattern (a text string or regex pattern) that you want consistently labeled with a specified entity name throughout any text you submit to us for labeling. You must also state whether the search pattern is an exact string or a regex pattern.

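For example, if your text frequently features the name George Orwell, you can set that as a search pattern along with the entity name PERSON, as in this screenshot:

[Screenshot: custom entity recognition field with George Orwell as the search pattern and PERSON as the entity name]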

If you don’t have any custom rules to apply, click the ‘x’ in the top right to remove the field.
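The difference between an exact string and a regex search pattern can be sketched with Python's standard re module. This is an illustration under assumptions, not the platform's implementation: the CustomRule type, the apply_rules function, and the year regex are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CustomRule:
    """Hypothetical custom rule: a search pattern plus the entity to assign."""
    pattern: str
    entity: str
    is_regex: bool = False  # False means the pattern is an exact string


def apply_rules(text: str, rules: List[CustomRule]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, entity) spans for every rule match in text."""
    spans = []
    for rule in rules:
        # Exact strings are escaped so characters like "." match literally.
        source = rule.pattern if rule.is_regex else re.escape(rule.pattern)
        for match in re.finditer(source, text):
            if match.end() > match.start():  # ignore empty matches
                spans.append((match.start(), match.end(), rule.entity))
    return sorted(spans)


text = "George Orwell wrote 1984. George Orwell was born in 1903."
rules = [
    CustomRule("George Orwell", "PERSON"),                       # exact string
    CustomRule(r"\b(?:19|20)\d{2}\b", "DATE", is_regex=True),    # regex pattern
]

for start, end, entity in apply_rules(text, rules):
    print(text[start:end], entity)
```

Escaping exact strings is the reason the two modes need to be declared separately: an unescaped pattern such as super.AI would also match superXAI if treated as a regex.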

Parts of speech to exclude

You can exclude from labeling any of the part-of-speech (POS) tags in the table below. You can find detailed explanations of each on the Universal Dependencies website.

Entities that contain masked parts of speech can still be labeled when at least one of the words that comprise the entity is not a masked part of speech. For example, if you enable POS tag masking for ADJ, you can still label Del the Funky Homosapien as PERSON, even though Funky is an adjective.

Part of speech | Examples
Adjective | big, old, green, African, incomprehensible, first
Adposition | in, to, during
Adverb | very, well, exactly, tomorrow, up, down, how, now
Auxiliary | has (done), will (do), was (done), should (do), is (a teacher)
Coordinating conjunction | and, or, but
Determiner | a, an, the, this
Interjection | psst, ouch, bravo, hello
Noun | girl, cat, tree, air, beauty
Numeral | 1, 2020, one, seventy-seven, II, MMXIV
Particle | ’s, not, let’s, may you
Pronoun | I, you, he, myself, themselves, who, nobody
Proper noun | Mary, John, London, NATO, HBO
Punctuation | ., (, ), ?
Subordinating conjunction | that, if, while
Symbol | $, %, §, ©, +, −, ×, ÷, =, <, >, :), ♥‿♥, 😝
Verb | run, eat, runs, ate, running, eating
Other | xfgh pdl jklw
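The masking rule described above (an entity can still be labeled when at least one of its words is not a masked part of speech) can be expressed in a few lines. This is a minimal sketch: the can_label function is hypothetical, and the POS tags in the example are assigned by hand using Universal Dependencies tag names rather than produced by a tagger.

```python
from typing import List, Set, Tuple

# (word, Universal Dependencies POS tag), e.g., ("Funky", "ADJ").
Token = Tuple[str, str]


def can_label(entity_tokens: List[Token], masked_tags: Set[str]) -> bool:
    """True when at least one word in the entity is not a masked part of speech."""
    return any(tag not in masked_tags for _, tag in entity_tokens)


masked = {"ADJ"}  # POS tag masking enabled for adjectives

# Hand-tagged tokens for the example entity from this page.
del_the_funky = [
    ("Del", "PROPN"),
    ("the", "DET"),
    ("Funky", "ADJ"),
    ("Homosapien", "PROPN"),
]

print(can_label(del_the_funky, masked))     # some words are not adjectives
print(can_label([("green", "ADJ")], masked))  # every word is masked
```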

Task granularity

These two settings let you define how labelers receive your input data and how they label it.

Entity classes per task

Define whether labelers label all entity classes in the text as a single task, or whether each task covers only one entity class. If your text input is dense with entities, it's better to select Separate tasks for each top-level entity.
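As a rough illustration of the two options, the sketch below builds either one task covering every entity class or one task per top-level entity. It assumes that a top-level entity is a class with no parent and that child classes travel with their parent's task; the build_tasks function and the task dictionaries are hypothetical and do not reflect the platform's internal format.

```python
from typing import Dict, List, Optional


def build_tasks(
    text: str,
    entity_parents: Dict[str, Optional[str]],
    separate: bool,
) -> List[dict]:
    """Create labeling tasks from a mapping of entity name -> parent (or None)."""
    if not separate:
        # One task in which labelers mark every entity class.
        return [{"text": text, "classes": list(entity_parents)}]

    tasks = []
    for name, parent in entity_parents.items():
        if parent is not None:
            continue  # child classes are grouped under their parent below
        children = [n for n, p in entity_parents.items() if p == name]
        tasks.append({"text": text, "classes": [name] + children})
    return tasks


entity_parents = {"ANIMAL": None, "DOG": "ANIMAL", "PERSON": None}
text = "Mary walked her labrador past the zoo."

single = build_tasks(text, entity_parents, separate=False)
per_entity = build_tasks(text, entity_parents, separate=True)
print(len(single), len(per_entity))
```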

Utterances per task

This setting limits the length of the dialogue in each task to a set number of utterances. We break the input text into chunks of this size and send each chunk to labelers.
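The chunking can be sketched as follows. This assumes the input has already been split into a list of utterances and that the final chunk may be shorter than the others; the chunk_utterances function is a hypothetical illustration.

```python
from typing import List


def chunk_utterances(utterances: List[str], per_task: int) -> List[List[str]]:
    """Split a dialogue into consecutive chunks of at most per_task utterances."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [
        utterances[i:i + per_task]
        for i in range(0, len(utterances), per_task)
    ]


dialogue = [
    "Hi, I'd like to book a flight.",
    "Sure, where are you flying to?",
    "London, next Friday.",
    "One moment, please.",
    "Thanks.",
]

chunks = chunk_utterances(dialogue, per_task=2)
for chunk in chunks:
    print(chunk)
```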

