Named Entity Recognition (NER)
Learn how to leverage the NER project types within the super.AI platform.
Named entity recognition (NER) is the task of identifying and categorizing named entities in text. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.
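To make the idea of "detect and classify" concrete, here is a minimal sketch of what NER output can look like. It is not how super.AI or any ML model works internally: the `GAZETTEER` lookup table, the `Entity` type, and the `City` entry are hypothetical stand-ins for a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


# Hypothetical lookup table standing in for a trained ML model.
GAZETTEER = {"super.AI": "Company", "Berlin": "City"}


def find_entities(text: str) -> list[Entity]:
    """Return every gazetteer match in `text`, ordered by position."""
    found = []
    for phrase, label in GAZETTEER.items():
        start = text.find(phrase)
        while start != -1:
            found.append(Entity(phrase, label, start, start + len(phrase)))
            start = text.find(phrase, start + 1)
    return sorted(found, key=lambda entity: entity.start)


entities = find_entities("I work at super.AI in Berlin.")
for entity in entities:
    print(entity.label, repr(entity.text), entity.start, entity.end)
```

Each result pairs a span of the input text with exactly one category, which is the shape of output that labelers and pre-labeling models produce for this project type.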
The super.AI named entity recognition (NER) project type features granular design options for defining your entities, explaining how they are labeled, and deciding what is excluded from labeling. On this page, you will find details on each of the following:
- Entity classes
- Entity pre-labeling
- Custom entity recognition
- Parts of speech to exclude
- Task granularity
Entity classes
When you define an entity, you have to provide instructions to our human labelers on how to identify it within your text. There are four parts to creating entity classes:
- Entity name
  - This can be anything you like, but it helps if it is something our labelers can identify quickly, e.g., `DOG`. Below, you can find information on some common entity names that our pre-labeling model accepts.
- Explanation and examples
  - Provide clear, simple information that will help our human labelers identify the entity in your text. For example, “A dog is any domesticated canine descended from the gray wolf, e.g., labrador, chihuahua, or rottweiler.”
- Parent entity
  - If you select Yes on Has a parent entity?, you can then define the parent entity name. For example, a `DOG` entity might have an `ANIMAL` parent entity.
- Model class your entity class maps to
  - If you want to use a class supported by our entity pre-labeling but under a custom class name, you can map your custom name to the entity name supported by the pre-labeling model. For example, if you want to use the `PERSON` entity but have `Human` as the class name, you can enter `Human` as the Entity name and select `PERSON` from the dropdown labeled The model class your entity class maps to, and we'll map it for you.
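
The parts of an entity class described above can be pictured as a small record. This is only a sketch for reasoning about your project design: `EntityClass`, its field names, and `top_level_name` are hypothetical and do not reflect the super.AI API or its storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityClass:
    name: str                          # shown to labelers, e.g. "DOG"
    explanation: str                   # guidance and examples for labelers
    parent: Optional[str] = None       # set only when "Has a parent entity?" is Yes
    model_class: Optional[str] = None  # pre-labeling class this name maps to, if any


dog = EntityClass(
    name="DOG",
    explanation="A dog is any domesticated canine descended from the gray wolf, "
                "e.g., labrador, chihuahua, or rottweiler.",
    parent="ANIMAL",
)
human = EntityClass(
    name="Human",
    explanation="Any person, including fictional characters.",
    model_class="PERSON",
)


def top_level_name(entity: EntityClass) -> str:
    """Return the parent name if one is defined, otherwise the entity's own name."""
    return entity.parent or entity.name


print(top_level_name(dog))    # ANIMAL
print(top_level_name(human))  # Human
```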
Entity pre-labeling
The super.AI NER project type supports pre-labeling of entities. The entities in the table below are recognized by our pre-labeling model. You can set any of these as the model class that your entity maps to when designing your project and they will automatically be labeled before your text is sent to our human labelers. The names and descriptions come from OntoNotes 5.0.
Entity | Description |
---|---|
PERSON | People, including fictional characters |
NORP | Nationalities or religious or political groups |
FAC | Buildings, airports, highways, bridges, etc. |
ORG | Companies, agencies, institutions, etc. |
GPE | Countries, cities, states, etc. |
LOC | Non-GPE locations, e.g., mountain ranges and bodies of water |
PRODUCT | Objects, vehicles, foods, etc. (not services) |
EVENT | Named battles, wars, sports events, hurricanes, etc. |
WORK_OF_ART | Titles of books, songs, etc. |
LAW | Named documents made into laws |
LANGUAGE | Any named language |
DATE | Absolute (e.g., July 4, 2010) or relative (e.g., two weeks ago) dates or periods |
TIME | Times shorter than a day |
PERCENT | Percentage, including % |
MONEY | Monetary values, including unit |
QUANTITY | Measurements, as of weight or distance |
ORDINAL | Numbers of order, e.g., first, second, etc. |
CARDINAL | Numerals that do not fall under another type |
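
If you keep your own custom class names, it can help to check the mapping against the table above before setting up a project. The sketch below assumes pre-labels arrive as `(text, label)` pairs and that each model class is mapped to at most one custom name; both assumptions, and the function names, are illustrative rather than part of the platform.

```python
# The 18 entity names from the table above (OntoNotes 5.0).
PRELABEL_CLASSES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
}


def invert_mapping(mapping: dict[str, str]) -> dict[str, str]:
    """Validate a custom-name -> model-class mapping and return its inverse."""
    unknown = sorted(set(mapping.values()) - PRELABEL_CLASSES)
    if unknown:
        raise ValueError(f"Not supported by pre-labeling: {unknown}")
    return {model_class: custom for custom, model_class in mapping.items()}


def rename_prelabels(prelabels, mapping):
    """Keep only pre-labels whose class is mapped, renamed to the custom name."""
    by_model_class = invert_mapping(mapping)
    return [
        (text, by_model_class[label])
        for text, label in prelabels
        if label in by_model_class
    ]


prelabels = [("George Orwell", "PERSON"), ("two weeks ago", "DATE")]
print(rename_prelabels(prelabels, {"Human": "PERSON"}))
```

Here only the `PERSON` pre-label survives, renamed to `Human`, because `DATE` was not mapped to any custom class.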
Custom entity recognition
Define a search pattern (a text string or regex pattern) that you want consistently labeled with a specified entity name throughout any text you submit to us for labeling. You must also state whether the search pattern is an exact string or a regex pattern.
For example, if your text frequently features the name `George Orwell`, you can set that as a search pattern along with the entity name `PERSON`.
If you don’t have any custom rules to apply, click the ‘x’ in the top right to remove the field.
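The difference between an exact string and a regex pattern can be sketched with Python's standard `re` module. The `CustomRule` type, the `apply_rule` helper, and the four-digit-year pattern are hypothetical; how the platform matches patterns (for example, case sensitivity or regex dialect) is not specified here, so treat this as an approximation.

```python
import re
from dataclasses import dataclass


@dataclass
class CustomRule:
    pattern: str
    entity: str
    is_regex: bool = False  # False means `pattern` is an exact string


def apply_rule(rule: CustomRule, text: str):
    """Return (start, end, matched_text, entity) for each match in `text`."""
    # Escape exact strings so characters such as "." are matched literally.
    source = rule.pattern if rule.is_regex else re.escape(rule.pattern)
    return [
        (match.start(), match.end(), match.group(0), rule.entity)
        for match in re.finditer(source, text)
    ]


text = "George Orwell published Animal Farm in 1945."
exact_rule = CustomRule("George Orwell", "PERSON")
regex_rule = CustomRule(r"\b(?:19|20)\d{2}\b", "DATE", is_regex=True)

print(apply_rule(exact_rule, text))
print(apply_rule(regex_rule, text))
```

Stating whether a pattern is exact or regex matters because a string such as `super.AI` contains characters that mean something different when interpreted as a regex.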
Parts of speech to exclude
You can exclude any part-of-speech (POS) tag from labeling out of the options in the table below. You can find detailed explanations of each on the Universal Dependencies website.
Entities that contain masked parts of speech can still be labeled when at least one of the words that comprise the entity is not a masked part of speech. For example, if you enable POS tag masking for `ADJ`, you can still label `Del the Funky Homosapien` as `PERSON`, even though `Funky` is an adjective.
Part of speech | Examples |
---|---|
Adjective | big, old, green, African, incomprehensible, first |
Adposition | in, to, during |
Adverb | very, well, exactly, tomorrow, up, down, how, now |
Auxiliary | has (done), will (do), was (done), should (do), is (a teacher) |
Coordinating conjunction | and, or, but |
Determiner | a, an, the, this |
Interjection | psst, ouch, bravo, hello |
Noun | girl, cat, tree, air, beauty |
Numeral | 1, 2020, one, seventy-seven, II, MMXIV |
Particle | ’s, not, let’s, may you |
Pronoun | I, you, he, myself, themselves, who, nobody |
Proper noun | Mary, John, London, NATO, HBO |
Punctuation | ., (, ), ? |
Subordinating conjunction | that, if, while |
Symbol | $, %, §, ©, +, −, ×, ÷, =, <, >, :), ♥‿♥, 😝 |
Verb | run, eat, runs, ate, running, eating |
Other | xfgh pdl jklw |
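
The masking rule described above (an entity stays labelable as long as at least one of its words is not a masked part of speech) can be expressed in a few lines. The tags below are hand-assigned Universal Dependencies tags for illustration, and `can_label` is a hypothetical helper, not a platform function.

```python
def can_label(tokens, masked_tags):
    """Return True if at least one token's POS tag is not masked.

    `tokens` is a list of (word, pos_tag) pairs; `masked_tags` is a set of
    POS tags excluded from labeling.
    """
    return any(tag not in masked_tags for _, tag in tokens)


# Hand-assigned tags: PROPN = proper noun, DET = determiner, ADJ = adjective.
del_the_funky = [
    ("Del", "PROPN"),
    ("the", "DET"),
    ("Funky", "ADJ"),
    ("Homosapien", "PROPN"),
]

print(can_label(del_the_funky, {"ADJ"}))       # True: other words are not masked
print(can_label([("Funky", "ADJ")], {"ADJ"}))  # False: the only word is masked
```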
Task granularity
These two settings let you define how labelers receive your input data and how they label it.
Entity classes per task
Define whether labelers have to label all entities in the text as a single task or each labeler only labels one entity class per task. If your text input is quite dense with entities, it's better to select Separate tasks for each top-level entity.
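One way to picture the two options is as a function that turns a text into either a single task or one task per top-level entity. This is a rough sketch under stated assumptions: the task dictionaries, the `build_tasks` name, the `CAT` class, and the grouping of child classes under their parent are all hypothetical, not the platform's actual task format.

```python
def build_tasks(text, entity_classes, separate_per_top_level):
    """Build labeling tasks for `text`.

    `entity_classes` maps each class name to its parent name, or None if it
    has no parent.
    """
    if not separate_per_top_level:
        # A single task in which the labeler handles every entity class.
        return [{"text": text, "classes": sorted(entity_classes)}]

    # Group classes under their top-level entity (the parent, or the class itself).
    groups = {}
    for name, parent in entity_classes.items():
        groups.setdefault(parent or name, []).append(name)
    return [
        {"text": text, "top_level": top, "classes": sorted(names)}
        for top, names in sorted(groups.items())
    ]


classes = {"DOG": "ANIMAL", "CAT": "ANIMAL", "PERSON": None}
for task in build_tasks("Del walked his labrador.", classes, True):
    print(task["top_level"], task["classes"])
```

With dense text, splitting this way means each labeler only has to keep one top-level entity in mind per task.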
Utterances per task
This lets you set the length of the dialogue that labelers work on in each task by limiting it to a set number of utterances. We break the input text up into chunks of this size to send to labelers.
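The chunking can be sketched as follows, assuming the dialogue has already been split into a list of utterances. How the platform detects utterance boundaries is not described here, and `chunk_utterances` is a hypothetical helper.

```python
def chunk_utterances(utterances, per_task):
    """Split a list of utterances into consecutive chunks of at most `per_task`."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [
        utterances[i:i + per_task]
        for i in range(0, len(utterances), per_task)
    ]


dialogue = [
    "Hi, is this super.AI support?",
    "Yes, how can I help?",
    "I have a question about NER projects.",
    "Sure, go ahead.",
    "How do I add an entity class?",
]

for chunk in chunk_utterances(dialogue, 2):
    print(chunk)
```

With five utterances and two utterances per task, the final chunk holds the single leftover utterance.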