Examples
Input
The deadline for submission is 2023-07-15.
The event is scheduled for 07/15/2023.
The conference is on July 15th, 2023.
The meeting is scheduled for 2:30 PM. Please arrive by 9:00 AM.
The meeting starts at 14:30.
Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.
It is available at €149.99 in Europe.
Simple Matching
Simple patterns are regular expressions that match entities with a fixed format, such as dates, times, phone numbers, and prices.
Patterns:
classes:
  Date:
    - "\d{4}-\d{2}-\d{2}"
    - "\d{2}/\d{2}/\d{4}"
    - "(?:January|February|March|April|May|June|July|August|September|October|November|December)\s\d{1,2}(?:st|nd|rd|th)?,\s\d{4}"
  Time:
    - "\d{1,2}:\d{2}\s?[APM]{2}"
    - "\d{2}:\d{2}"
  Phone:
    - "(\+\d{1,3}\s)?\(?\d{3}\)?[-\.\s]?\d{3}[-.\s]?\d{4}"
  Money:
    - "[$€£]\d+(?:\.\d+)?"
Structured Text Output:
The deadline for submission is {@Date: 2023-07-15}.
The event is scheduled for {@Date: 07/15/2023}.
The conference is on {@Date: July 15th, 2023}.
The meeting is scheduled for {@Time: 2:30 PM}.
Please arrive by {@Time: 9:00 AM}.
The meeting starts at {@Time: 14:30}.
Contact us at {@Phone: +1 (555) 123-4567} or {@Phone: 123-456-7890} for assistance.
It is available at {@Money: €149.99} in Europe.
JSON Output
[
{
"text": "The deadline for submission is 2023-07-15.",
"entities": [
{
"class": "Date",
"text": "2023-07-15"
}
]
},
{
"text": "The event is scheduled for 07/15/2023.",
"entities": [
{
"class": "Date",
"text": "07/15/2023"
}
]
},
{
"text": "The conference is on July 15th, 2023.",
"entities": [
{
"class": "Date",
"text": "July 15th, 2023"
}
]
},
{
"text": "The meeting is scheduled for 2:30 PM. ",
"entities": [
{
"class": "Time",
"text": "2:30 PM"
}
]
},
{
"text": "Please arrive by 9:00 AM.",
"entities": [
{
"class": "Time",
"text": "9:00 AM"
}
]
},
{
"text": "The meeting starts at 14:30.",
"entities": [
{
"class": "Time",
"text": "14:30"
}
]
},
{
"text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.",
"entities": [
{
"class": "Phone",
"text": "+1 (555) 123-4567"
},
{
"class": "Phone",
"text": "123-456-7890"
}
]
},
{
"text": "It is available at €149.99 in Europe.",
"entities": [
{
"class": "Money",
"text": "€149.99"
}
]
}
]
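The Simple Matching behaviour above can be approximated with ordinary regular expressions. The following is a minimal Python sketch under stated assumptions, not the library's implementation: it copies the four classes' patterns into a hypothetical PATTERNS dictionary, joins them into one alternation with a named group per class, and wraps each match in {@Class: text} markup. The names PATTERNS, COMBINED, and annotate are illustrative only.

```python
import re

# Patterns copied from the Simple Matching example; the dict layout is an assumption.
PATTERNS = {
    "Date": [
        r"\d{4}-\d{2}-\d{2}",
        r"\d{2}/\d{2}/\d{4}",
        r"(?:January|February|March|April|May|June|July|August|September"
        r"|October|November|December)\s\d{1,2}(?:st|nd|rd|th)?,\s\d{4}",
    ],
    "Time": [r"\d{1,2}:\d{2}\s?[APM]{2}", r"\d{2}:\d{2}"],
    "Phone": [r"(\+\d{1,3}\s)?\(?\d{3}\)?[-\.\s]?\d{3}[-.\s]?\d{4}"],
    "Money": [r"[$€£]\d+(?:\.\d+)?"],
}

# One named group per class; alternatives inside a class keep their listed order.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{'|'.join(alts)})" for name, alts in PATTERNS.items())
)


def annotate(text: str) -> str:
    """Wrap every match in {@Class: match} markup."""
    return COMBINED.sub(lambda m: "{@%s: %s}" % (m.lastgroup, m.group()), text)


if __name__ == "__main__":
    print(annotate("The meeting starts at 14:30."))
    print(annotate("Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance."))
```

Because the classes are tried in dictionary order at each position, an earlier class wins when two classes could match at the same place. Whether the library resolves such conflicts the same way is not stated in the source.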
Fields
Fields extract named attributes of an entity, such as the year, month, and day of a date. A field is written as {#name: pattern} inside a class pattern.
Patterns:
classes:
  Date:
    - "{#year: \d{4}}-{#month: \d{2}}-{#day: \d{2}}"
    - "{#month: \d{2}}/{#day: \d{2}}/{#year: \d{4}}"
    - "{#month: (?:January|February|March|April|May|June|July|August|September|October|November|December)}\s{#day: \d{1,2}}(?:st|nd|rd|th)?,\s{#year: \d{4}}"
  Time:
    - "{#hours: \d{1,2}}:{#minutes: \d{2}}\s?{#period: [APM]{2}}"
    - "{#hours: \d{2}}:{#minutes: \d{2}}"
  Money:
    - "{#currency: [$€£]}{#value: \d+(?:\.\d+)?}"
Structured Text Output:
The deadline for submission is {@Date: {#year: 2023}-{#month: 07}-{#day: 15}}.
The event is scheduled for {@Date: {#month: 07}/{#day: 15}/{#year: 2023}}.
The conference is on {@Date: {#month: July} {#day: 15}th, {#year: 2023}}.
The meeting is scheduled for {@Time: {#hours: 2}:{#minutes: 30} {#period: PM}}.
Please arrive by {@Time: {#hours: 9}:{#minutes: 00} {#period: AM}}.
The meeting starts at {@Time: {#hours: 14}:{#minutes: 30}}.
Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.
It is available at {@Money: {#currency: €}{#value: 149.99}} in Europe.
JSON Output
[
{
"text": "The deadline for submission is 2023-07-15.",
"entities": [
{
"class": "Date",
"text": "2023-07-15",
"entities": [
{
"field": "year",
"text": "2023"
},
{
"field": "month",
"text": "07"
},
{
"field": "day",
"text": "15"
}
]
}
]
},
{
"text": "The event is scheduled for 07/15/2023.",
"entities": [
{
"class": "Date",
"text": "07/15/2023",
"entities": [
{
"field": "month",
"text": "07"
},
{
"field": "day",
"text": "15"
},
{
"field": "year",
"text": "2023"
}
]
}
]
},
{
"text": "The conference is on July 15th, 2023.",
"entities": [
{
"class": "Date",
"text": "July 15th, 2023",
"entities": [
{
"field": "month",
"text": "July"
},
{
"field": "day",
"text": "15"
},
{
"field": "year",
"text": "2023"
}
]
}
]
},
{
"text": "The meeting is scheduled for 2:30 PM. ",
"entities": [
{
"class": "Time",
"text": "2:30 PM",
"entities": [
{
"field": "hours",
"text": "2"
},
{
"field": "minutes",
"text": "30"
},
{
"field": "period",
"text": "PM"
}
]
}
]
},
{
"text": "Please arrive by 9:00 AM.",
"entities": [
{
"class": "Time",
"text": "9:00 AM",
"entities": [
{
"field": "hours",
"text": "9"
},
{
"field": "minutes",
"text": "00"
},
{
"field": "period",
"text": "AM"
}
]
}
]
},
{
"text": "The meeting starts at 14:30.",
"entities": [
{
"class": "Time",
"text": "14:30",
"entities": [
{
"field": "hours",
"text": "14"
},
{
"field": "minutes",
"text": "30"
}
]
}
]
},
{
"text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance."
},
{
"text": "It is available at €149.99 in Europe.",
"entities": [
{
"class": "Money",
"text": "€149.99",
"entities": [
{
"field": "currency",
"text": "€"
},
{
"field": "value",
"text": "149.99"
}
]
}
]
}
]
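The field syntax in this section closely resembles named groups in regular expressions. As a rough illustration only, the Python sketch below rewrites each {#name: pattern} into (?P<name>pattern) and builds an entity shaped like the JSON above. The helper names fields_to_regex and extract are hypothetical, the sketch does not handle fields nested inside other fields or escaped braces, and nothing here is taken from the library's code.

```python
import re

# Opening of a field: "{#name: " followed by the field's pattern.
FIELD_OPEN = re.compile(r"\{#(\w+):\s")


def fields_to_regex(pattern: str) -> str:
    """Rewrite every {#name: body} as the named group (?P<name>body)."""
    out, i = [], 0
    while i < len(pattern):
        m = FIELD_OPEN.match(pattern, i)
        if m is None:
            out.append(pattern[i])
            i += 1
            continue
        # Find the brace that closes this field, skipping quantifiers such as \d{4}.
        depth, j = 1, m.end()
        while depth:
            depth += {"{": 1, "}": -1}.get(pattern[j], 0)
            j += 1
        out.append(f"(?P<{m.group(1)}>{pattern[m.end():j - 1]})")
        i = j
    return "".join(out)


def extract(class_name: str, pattern: str, text: str):
    """Return one entity dict for the first match of pattern in text, or None."""
    m = re.search(fields_to_regex(pattern), text)
    if m is None:
        return None
    return {
        "class": class_name,
        "text": m.group(),
        "entities": [{"field": k, "text": v} for k, v in m.groupdict().items()],
    }


if __name__ == "__main__":
    date = r"{#year: \d{4}}-{#month: \d{2}}-{#day: \d{2}}"
    print(fields_to_regex(date))
    print(extract("Date", date, "The deadline for submission is 2023-07-15."))
```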
Classes
Classes can reference other classes in their patterns using {@ClassName}. A reference to a class matches any instance of that class or of its child classes; for example, {@Number} matches both Number.Real and Number.Integer.
Patterns:
classes:
  Number:
    Real:
      - "\d+\.\d+"
    Integer:
      - "\d+"
  Month:
    1:
      - January
    2:
      - February
    3:
      - March
    4:
      - April
    5:
      - May
    6:
      - June
    7:
      - July
    8:
      - August
    9:
      - September
    10:
      - October
    11:
      - November
    12:
      - December
  Date:
    - "{#year: {@Number.Integer}}-{#month: {@Number.Integer}}-{#day: {@Number.Integer}}"
    - "{#month: {@Number.Integer}}/{#day: {@Number.Integer}}/{#year: {@Number.Integer}}"
    - "{#month: {@Month}}\s{#day: {@Number.Integer}}(?:st|nd|rd|th)?,\s{#year: {@Number.Integer}}"
  TimePeriod:
    AM:
      - "(?-i)AM"
    PM:
      - "(?-i)PM"
  Time:
    - "{#hours: {@Number.Integer}}:{#minutes: {@Number.Integer}}\s?{#period: {@TimePeriod}}"
    - "{#hours: {@Number.Integer}}:{#minutes: {@Number.Integer}}"
  Currency:
    USD:
      - $
    EUR:
      - €
    POUND:
      - £
  Money:
    - "{@Currency}{@Number}"
Structured Text Output:
The deadline for submission is {@Date: {#year: {@Number.Integer: 2023}}-{#month: {@Number.Integer: 07}}-{#day: {@Number.Integer: 15}}}.
The event is scheduled for {@Date: {#month: {@Number.Integer: 07}}/{#day: {@Number.Integer: 15}}/{#year: {@Number.Integer: 2023}}}.
The conference is on {@Date: {#month: {@Month.7: July}} {#day: {@Number.Integer: 15}}th, {#year: {@Number.Integer: 2023}}}.
The meeting is scheduled for {@Time: {#hours: {@Number.Integer: 2}}:{#minutes: {@Number.Integer: 30}} {#period: {@TimePeriod.PM: PM}}}.
Please arrive by {@Time: {#hours: {@Number.Integer: 9}}:{#minutes: {@Number.Integer: 00}} {#period: {@TimePeriod.AM: AM}}}.
The meeting starts at {@Time: {#hours: {@Number.Integer: 14}}:{#minutes: {@Number.Integer: 30}}}.
Contact us at +{@Number.Integer: 1} ({@Number.Integer: 555}) {@Number.Integer: 123}-{@Number.Integer: 4567} or {@Date: {#year: {@Number.Integer: 123}}-{#month: {@Number.Integer: 456}}-{#day: {@Number.Integer: 7890}}} for assistance.
It is available at {@Money: {@Currency.EUR: €}{@Number.Real: 149.99}} in Europe.
JSON Output
[
{
"text": "The deadline for submission is 2023-07-15.",
"entities": [
{
"class": "Date",
"text": "2023-07-15",
"entities": [
{
"field": "year",
"class": "Number.Integer",
"text": "2023"
},
{
"field": "month",
"class": "Number.Integer",
"text": "07"
},
{
"field": "day",
"class": "Number.Integer",
"text": "15"
}
]
}
]
},
{
"text": "The event is scheduled for 07/15/2023.",
"entities": [
{
"class": "Date",
"text": "07/15/2023",
"entities": [
{
"field": "month",
"class": "Number.Integer",
"text": "07"
},
{
"field": "day",
"class": "Number.Integer",
"text": "15"
},
{
"field": "year",
"class": "Number.Integer",
"text": "2023"
}
]
}
]
},
{
"text": "The conference is on July 15th, 2023.",
"entities": [
{
"class": "Date",
"text": "July 15th, 2023",
"entities": [
{
"field": "month",
"class": "Month.7",
"text": "July"
},
{
"field": "day",
"class": "Number.Integer",
"text": "15"
},
{
"field": "year",
"class": "Number.Integer",
"text": "2023"
}
]
}
]
},
{
"text": "The meeting is scheduled for 2:30 PM. ",
"entities": [
{
"class": "Time",
"text": "2:30 PM",
"entities": [
{
"field": "hours",
"class": "Number.Integer",
"text": "2"
},
{
"field": "minutes",
"class": "Number.Integer",
"text": "30"
},
{
"field": "period",
"class": "TimePeriod.PM",
"text": "PM"
}
]
}
]
},
{
"text": "Please arrive by 9:00 AM.",
"entities": [
{
"class": "Time",
"text": "9:00 AM",
"entities": [
{
"field": "hours",
"class": "Number.Integer",
"text": "9"
},
{
"field": "minutes",
"class": "Number.Integer",
"text": "00"
},
{
"field": "period",
"class": "TimePeriod.AM",
"text": "AM"
}
]
}
]
},
{
"text": "The meeting starts at 14:30.",
"entities": [
{
"class": "Time",
"text": "14:30",
"entities": [
{
"field": "hours",
"class": "Number.Integer",
"text": "14"
},
{
"field": "minutes",
"class": "Number.Integer",
"text": "30"
}
]
}
]
},
{
"text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.",
"entities": [
{
"class": "Number.Integer",
"text": "1"
},
{
"class": "Number.Integer",
"text": "555"
},
{
"class": "Number.Integer",
"text": "123"
},
{
"class": "Number.Integer",
"text": "4567"
},
{
"class": "Date",
"text": "123-456-7890",
"entities": [
{
"field": "year",
"class": "Number.Integer",
"text": "123"
},
{
"field": "month",
"class": "Number.Integer",
"text": "456"
},
{
"field": "day",
"class": "Number.Integer",
"text": "7890"
}
]
}
]
},
{
"text": "It is available at €149.99 in Europe.",
"entities": [
{
"class": "Money",
"text": "€149.99",
"entities": [
{
"class": "Currency.EUR",
"text": "€"
},
{
"class": "Number.Real",
"text": "149.99"
}
]
}
]
}
]
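To make the parent and child relationship concrete, here is a small Python sketch of how a class reference might be resolved. It is an assumption-laden illustration, not the library's algorithm: the CLASSES dictionary, leaves, match_class, and match_money are hypothetical names, and child classes are simply tried in declaration order. It also shows why 123-456-7890 is tagged as a Date in the output above: Number.Integer is defined as \d+, so it accepts a digit run of any length.

```python
import re

# Hypothetical in-memory form of part of the class tree from the example above.
# "$" is escaped because it is a regex anchor in Python; how the library itself
# treats it is not stated in the source.
CLASSES = {
    "Number": {"Real": [r"\d+\.\d+"], "Integer": [r"\d+"]},
    "Currency": {"USD": [r"\$"], "EUR": ["€"], "POUND": ["£"]},
}


def leaves(name, node=None):
    """Yield (qualified class name, regex) for a class and all of its child classes."""
    node = CLASSES[name] if node is None else node
    if isinstance(node, list):
        for regex in node:
            yield name, regex
    else:
        for child, sub in node.items():
            yield from leaves(f"{name}.{child}", sub)


def match_class(name, text, pos=0):
    """Return (qualified name, matched text) for the first leaf pattern matching at pos."""
    for qualified, regex in leaves(name):
        m = re.compile(regex).match(text, pos)
        if m:
            return qualified, m.group()
    return None


def match_money(text, pos=0):
    """Money is "{@Currency}{@Number}": a Currency instance followed by a Number instance."""
    currency = match_class("Currency", text, pos)
    if currency is None:
        return None
    number = match_class("Number", text, pos + len(currency[1]))
    if number is None:
        return None
    return {
        "class": "Money",
        "text": currency[1] + number[1],
        "entities": [
            {"class": currency[0], "text": currency[1]},
            {"class": number[0], "text": number[1]},
        ],
    }


if __name__ == "__main__":
    print(match_money("€149.99"))
    print(match_class("Number", "7890"))  # any digit run counts as an Integer
```

Reporting the most specific child class (Number.Real rather than Number) mirrors the "class" values in the JSON output above.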
Sample-based Patterns
Users can define classes with associated patterns and provide sample instances, written with a leading ~ in the patterns below, which the system uses to recognize and tag similar occurrences in the text.
Patterns:
classes:
  Number:
    Real:
      - "\d+\.\d+"
    Integer:
      - "\d+"
  Month:
    1:
      - January
    2:
      - February
    3:
      - March
    4:
      - April
    5:
      - May
    6:
      - June
    7:
      - July
    8:
      - August
    9:
      - September
    10:
      - October
    11:
      - November
    12:
      - December
  Date:
    - "~{#year: 2000}-{#month: 05}-{#day: 10}"
    - "~{#month: 07}/{#day: 12}/{#year: 2020}"
    - "~{#month: November} {#day: 1}(?:st|nd|rd|th)?, {#year: 1991}"
  TimePeriod:
    AM:
      - "(?-i)AM"
    PM:
      - "(?-i)PM"
  Time:
    - "~{#hours: 1}:{#minutes: 30}\s?{#period: AM}"
    - "~{#hours: 13}:{#minutes: 00}"
  Currency:
    USD:
      - $
    EUR:
      - €
    POUND:
      - £
  Money:
    - "~€1"
  MeasurementUnit:
    British:
      Foot:
        - foot
        - feet
        - ft
      Inch:
        - inch(es)?
    SI:
      Meter:
        - meters?
        - metres?
      Centimeter:
        - centimeters?
        - centimetres?
  Measurement:
    - "~1 meter"
Structured Text Output:
The deadline for submission is {@Date: {#year: {@Number.Integer: 2023}}-{#month: {@Number.Integer: 07}}-{#day: {@Number.Integer: 15}}}.
The event is scheduled for {@Date: {#month: {@Number.Integer: 07}}/{#day: {@Number.Integer: 15}}/{#year: {@Number.Integer: 2023}}}.
The conference is on {@Date: {#month: {@Month.7: July}} {#day: {@Number.Integer: 15}}th, {#year: {@Number.Integer: 2023}}}.
The meeting is scheduled for {@Time: {#hours: {@Number.Integer: 2}}:{#minutes: {@Number.Integer: 30}} {#period: {@TimePeriod.PM: PM}}}.
Please arrive by {@Time: {#hours: {@Number.Integer: 9}}:{#minutes: {@Number.Integer: 00}} {#period: {@TimePeriod.AM: AM}}}.
The meeting starts at {@Time: {#hours: {@Number.Integer: 14}}:{#minutes: {@Number.Integer: 30}}}.
Contact us at +{@Number.Integer: 1} ({@Number.Integer: 555}) {@Number.Integer: 123}-{@Number.Integer: 4567} or {@Date: {#year: {@Number.Integer: 123}}-{#month: {@Number.Integer: 456}}-{#day: {@Number.Integer: 7890}}} for assistance.
It is available at {@Money: {@Currency.EUR: €}{@Number.Real: 149.99}} in Europe.
JSON Output
[
{
"text": "The deadline for submission is 2023-07-15.",
"entities": [
{
"class": "Date",
"text": "2023-07-15",
"entities": [
{
"field": "year",
"class": "Number.Integer",
"text": "2023"
},
{
"field": "month",
"class": "Number.Integer",
"text": "07"
},
{
"field": "day",
"class": "Number.Integer",
"text": "15"
}
]
}
]
},
{
"text": "The event is scheduled for 07/15/2023.",
"entities": [
{
"class": "Date",
"text": "07/15/2023",
"entities": [
{
"field": "month",
"class": "Number.Integer",
"text": "07"
},
{
"field": "day",
"class": "Number.Integer",
"text": "15"
},
{
"field": "year",
"class": "Number.Integer",
"text": "2023"
}
]
}
]
},
{
"text": "The conference is on July 15th, 2023.",
"entities": [
{
"class": "Date",
"text": "July 15th, 2023",
"entities": [
{
"field": "month",
"class": "Month.7",
"text": "July"
},
{
"field": "day",
"class": "Number.Integer",
"text": "15"
},
{
"field": "year",
"class": "Number.Integer",
"text": "2023"
}
]
}
]
},
{
"text": "The meeting is scheduled for 2:30 PM. ",
"entities": [
{
"class": "Time",
"text": "2:30 PM",
"entities": [
{
"field": "hours",
"class": "Number.Integer",
"text": "2"
},
{
"field": "minutes",
"class": "Number.Integer",
"text": "30"
},
{
"field": "period",
"class": "TimePeriod.PM",
"text": "PM"
}
]
}
]
},
{
"text": "Please arrive by 9:00 AM.",
"entities": [
{
"class": "Time",
"text": "9:00 AM",
"entities": [
{
"field": "hours",
"class": "Number.Integer",
"text": "9"
},
{
"field": "minutes",
"class": "Number.Integer",
"text": "00"
},
{
"field": "period",
"class": "TimePeriod.AM",
"text": "AM"
}
]
}
]
},
{
"text": "The meeting starts at 14:30.",
"entities": [
{
"class": "Time",
"text": "14:30",
"entities": [
{
"field": "hours",
"class": "Number.Integer",
"text": "14"
},
{
"field": "minutes",
"class": "Number.Integer",
"text": "30"
}
]
}
]
},
{
"text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.",
"entities": [
{
"class": "Number.Integer",
"text": "1"
},
{
"class": "Number.Integer",
"text": "555"
},
{
"class": "Number.Integer",
"text": "123"
},
{
"class": "Number.Integer",
"text": "4567"
},
{
"class": "Date",
"text": "123-456-7890",
"entities": [
{
"field": "year",
"class": "Number.Integer",
"text": "123"
},
{
"field": "month",
"class": "Number.Integer",
"text": "456"
},
{
"field": "day",
"class": "Number.Integer",
"text": "7890"
}
]
}
]
},
{
"text": "It is available at €149.99 in Europe.",
"entities": [
{
"class": "Money",
"text": "€149.99",
"entities": [
{
"class": "Currency.EUR",
"text": "€"
},
{
"class": "Number.Real",
"text": "149.99"
}
]
}
]
}
]
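The source does not describe how a sample is turned into a pattern. One plausible reading, consistent with the output above, is that each recognizable value in a sample is replaced by a reference to the class that matches it, so that the sample "~€1" behaves like the explicit pattern "{@Currency}{@Number}" from the Classes section. The Python sketch below illustrates only that reading: the flattened CLASSES table and the generalize function are hypothetical, and the real library may work differently.

```python
import re

# Hypothetical, flattened subset of the class tree above: top-level class -> regexes.
# "$" is escaped here because it is a regex anchor in Python.
CLASSES = {
    "Number": [r"\d+\.\d+", r"\d+"],
    "Currency": [r"\$", "€", "£"],
    "Month": [
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December",
    ],
    "MeasurementUnit": [
        "foot", "feet", "ft", "inch(es)?",
        "meters?", "metres?", "centimeters?", "centimetres?",
    ],
}


def generalize(sample: str) -> str:
    """Replace each recognized instance in a sample with a {@Class} reference."""
    out, pos = [], 0
    while pos < len(sample):
        for name, regexes in CLASSES.items():
            m = re.compile("|".join(regexes)).match(sample, pos)
            if m:
                out.append("{@%s}" % name)
                pos = m.end()
                break
        else:
            # No class matches here: keep the character as literal pattern text.
            out.append(sample[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    for sample in ("€1", "1 meter", "{#year: 2000}-{#month: 05}-{#day: 10}"):
        print(sample, "->", generalize(sample))
```

Under this reading, a single sample such as "~1 meter" would cover every combination of a Number and a MeasurementUnit, which is the practical appeal of samples over hand-written patterns.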
More examples on demo.cariochi.com