Examples

PATTERNS

Input

The deadline for submission is 2023-07-15.
The event is scheduled for 07/15/2023.
The conference is on July 15th, 2023.
The meeting is scheduled for 2:30 PM. Please arrive by 9:00 AM.
The meeting starts at 14:30.
Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.
It is available at €149.99 in Europe.

Simple Matching

A simple pattern is a regular expression listed under a class name. Every substring of the input that matches the expression is tagged as an entity of that class.

Patterns:
classes:
 
  Date:
    - "\d{4}-\d{2}-\d{2}"
    - "\d{2}/\d{2}/\d{4}"
    - "(?:January|February|March|April|May|June|July|August|September|October|November|December)\s\d{1,2}(?:st|nd|rd|th)?,\s\d{4}"

  Time:
    - "\d{1,2}:\d{2}\s?[AP]M"
    - "\d{2}:\d{2}"

  Phone:
    - "(\+\d{1,3}\s)?\(?\d{3}\)?[-\.\s]?\d{3}[-.\s]?\d{4}"
    
  Money:
    - "[$€£]\d+(?:\.\d+)?"
Structured Text Output:
The deadline for submission is {@Date: 2023-07-15}.
The event is scheduled for {@Date: 07/15/2023}.
The conference is on {@Date: July 15th, 2023}.
The meeting is scheduled for {@Time: 2:30 PM}.
Please arrive by {@Time: 9:00 AM}.
The meeting starts at {@Time: 14:30}.
Contact us at {@Phone: +1 (555) 123-4567} or {@Phone: 123-456-7890} for assistance.
It is available at {@Money: €149.99} in Europe.
JSON Output
[
  {
    "text": "The deadline for submission is 2023-07-15.",
    "entities": [
      {
        "class": "Date",
        "text": "2023-07-15"
      }
    ]
  },
  {
    "text": "The event is scheduled for 07/15/2023.",
    "entities": [
      {
        "class": "Date",
        "text": "07/15/2023"
      }
    ]
  },
  {
    "text": "The conference is on July 15th, 2023.",
    "entities": [
      {
        "class": "Date",
        "text": "July 15th, 2023"
      }
    ]
  },
  {
    "text": "The meeting is scheduled for 2:30 PM. ",
    "entities": [
      {
        "class": "Time",
        "text": "2:30 PM"
      }
    ]
  },
  {
    "text": "Please arrive by 9:00 AM.",
    "entities": [
      {
        "class": "Time",
        "text": "9:00 AM"
      }
    ]
  },
  {
    "text": "The meeting starts at 14:30.",
    "entities": [
      {
        "class": "Time",
        "text": "14:30"
      }
    ]
  },
  {
    "text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.",
    "entities": [
      {
        "class": "Phone",
        "text": "+1 (555) 123-4567"
      },
      {
        "class": "Phone",
        "text": "123-456-7890"
      }
    ]
  },
  {
    "text": "It is available at €149.99 in Europe.",
    "entities": [
      {
        "class": "Money",
        "text": "€149.99"
      }
    ]
  }
]
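
As a rough illustration of what simple matching does, the sketch below applies regexes adapted from the Simple Matching patterns using Python's standard `re` module and rebuilds the structured text output. The `annotate` helper and its overlap rule (earliest match first, longest on ties) are assumptions made for this example and are not part of the library.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Regexes adapted from the Simple Matching patterns, grouped by class.
PATTERNS = {
    "Date": [
        r"\d{4}-\d{2}-\d{2}",
        r"\d{2}/\d{2}/\d{4}",
        r"(?:" + MONTHS + r")\s\d{1,2}(?:st|nd|rd|th)?,\s\d{4}",
    ],
    "Time": [r"\d{1,2}:\d{2}\s?[AP]M", r"\d{2}:\d{2}"],
    "Phone": [r"(\+\d{1,3}\s)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"],
    "Money": [r"[$€£]\d+(?:\.\d+)?"],
}


def annotate(text):
    """Wrap every match in {@Class: ...} (hypothetical helper)."""
    matches = []
    for cls, regexes in PATTERNS.items():
        for rx in regexes:
            for m in re.finditer(rx, text):
                matches.append((m.start(), m.end(), cls))
    # Assumed overlap rule: earliest match first, longest match on ties.
    matches.sort(key=lambda t: (t[0], -(t[1] - t[0])))
    out, pos = [], 0
    for start, end, cls in matches:
        if start < pos:
            continue  # overlaps a match that was already accepted
        out.append(text[pos:start])
        out.append("{@%s: %s}" % (cls, text[start:end]))
        pos = end
    out.append(text[pos:])
    return "".join(out)


print(annotate("The meeting starts at 14:30."))
```

Running `annotate` over each input sentence should reproduce the structured text shown above for this section.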

Fields

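Fields capture named parts of an entity. Wrapping part of a pattern in {#name: ...} reports the text it matches as a field with that name, nested inside the entity.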

Patterns:
classes:

  Date:
    - "{#year: \d{4}}-{#month: \d{2}}-{#day: \d{2}}"
    - "{#month: \d{2}}/{#day: \d{2}}/{#year: \d{4}}"
    - "{#month: (?:January|February|March|April|May|June|July|August|September|October|November|December)}\s{#day: \d{1,2}}(?:st|nd|rd|th)?,\s{#year: \d{4}}"

  Time:
    - "{#hours: \d{1,2}}:{#minutes: \d{2}}\s?{#period: [AP]M}"
    - "{#hours: \d{2}}:{#minutes: \d{2}}"

  Money:
    - "{#currency: [$€£]}{#value: \d+(?:\.\d+)?}"
Structured Text Output:
The deadline for submission is {@Date: {#year: 2023}-{#month: 07}-{#day: 15}}.
The event is scheduled for {@Date: {#month: 07}/{#day: 15}/{#year: 2023}}.
The conference is on {@Date: {#month: July} {#day: 15}th, {#year: 2023}}.
The meeting is scheduled for {@Time: {#hours: 2}:{#minutes: 30} {#period: PM}}.
Please arrive by {@Time: {#hours: 9}:{#minutes: 00} {#period: AM}}.
The meeting starts at {@Time: {#hours: 14}:{#minutes: 30}}.
Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.
It is available at {@Money: {#currency: €}{#value: 149.99}} in Europe.
JSON Output
[
  {
    "text": "The deadline for submission is 2023-07-15.",
    "entities": [
      {
        "class": "Date",
        "text": "2023-07-15",
        "entities": [
          {
            "field": "year",
            "text": "2023"
          },
          {
            "field": "month",
            "text": "07"
          },
          {
            "field": "day",
            "text": "15"
          }
        ]
      }
    ]
  },
  {
    "text": "The event is scheduled for 07/15/2023.",
    "entities": [
      {
        "class": "Date",
        "text": "07/15/2023",
        "entities": [
          {
            "field": "month",
            "text": "07"
          },
          {
            "field": "day",
            "text": "15"
          },
          {
            "field": "year",
            "text": "2023"
          }
        ]
      }
    ]
  },
  {
    "text": "The conference is on July 15th, 2023.",
    "entities": [
      {
        "class": "Date",
        "text": "July 15th, 2023",
        "entities": [
          {
            "field": "month",
            "text": "July"
          },
          {
            "field": "day",
            "text": "15"
          },
          {
            "field": "year",
            "text": "2023"
          }
        ]
      }
    ]
  },
  {
    "text": "The meeting is scheduled for 2:30 PM. ",
    "entities": [
      {
        "class": "Time",
        "text": "2:30 PM",
        "entities": [
          {
            "field": "hours",
            "text": "2"
          },
          {
            "field": "minutes",
            "text": "30"
          },
          {
            "field": "period",
            "text": "PM"
          }
        ]
      }
    ]
  },
  {
    "text": "Please arrive by 9:00 AM.",
    "entities": [
      {
        "class": "Time",
        "text": "9:00 AM",
        "entities": [
          {
            "field": "hours",
            "text": "9"
          },
          {
            "field": "minutes",
            "text": "00"
          },
          {
            "field": "period",
            "text": "AM"
          }
        ]
      }
    ]
  },
  {
    "text": "The meeting starts at 14:30.",
    "entities": [
      {
        "class": "Time",
        "text": "14:30",
        "entities": [
          {
            "field": "hours",
            "text": "14"
          },
          {
            "field": "minutes",
            "text": "30"
          }
        ]
      }
    ]
  },
  {
    "text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance."
  },
  {
    "text": "It is available at €149.99 in Europe.",
    "entities": [
      {
        "class": "Money",
        "text": "€149.99",
        "entities": [
          {
            "field": "currency",
            "text": "€"
          },
          {
            "field": "value",
            "text": "149.99"
          }
        ]
      }
    ]
  }
]
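
Fields behave much like named capture groups in an ordinary regex engine. The sketch below is an analogy in Python's `re` module, not the library's implementation: `(?P<name>...)` stands in for `{#name: ...}`, and the `extract` helper and its output shape are assumptions modelled on the JSON above.

```python
import re

# (?P<name>...) plays the role of {#name: ...} in this sketch.
DATE_ISO = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
TIME_12H = re.compile(
    r"(?P<hours>\d{1,2}):(?P<minutes>\d{2})\s?(?P<period>[AP]M)"
)


def extract(cls, regex, text):
    """Return the first match as an entity with nested field entries."""
    m = regex.search(text)
    if m is None:
        return None
    fields = [
        {"field": name, "text": m.group(name)}
        # List the fields in the order they appear in the matched text.
        for name in sorted(regex.groupindex, key=m.start)
        if m.group(name) is not None
    ]
    return {"class": cls, "text": m.group(0), "entities": fields}


print(extract("Date", DATE_ISO, "The deadline for submission is 2023-07-15."))
print(extract("Time", TIME_12H, "Please arrive by 9:00 AM."))
```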

Classes

A pattern can reference another class with {@ClassName}. A reference to a parent class, such as {@Number}, matches an instance of any of its child classes (here Number.Real or Number.Integer), and the output records which child class matched.

Patterns:
classes:

  Number:
    Real:
      - "\d+\.\d+"
    Integer:
      - "\d+"

  Month:
    1:
      - January
    2:
      - February
    3:
      - March
    4:
      - April
    5:
      - May
    6:
      - June
    7:
      - July
    8:
      - August
    9:
      - September
    10:
      - October
    11:
      - November
    12:
      - December

  Date:
    - "{#year: {@Number.Integer}}-{#month: {@Number.Integer}}-{#day: {@Number.Integer}}"
    - "{#month: {@Number.Integer}}/{#day: {@Number.Integer}}/{#year: {@Number.Integer}}"
    - "{#month: {@Month}}\s{#day: {@Number.Integer}}(?:st|nd|rd|th)?,\s{#year: {@Number.Integer}}"

  TimePeriod:
    AM:
      - "(?-i)AM"
    PM:
      - "(?-i)PM"

  Time:
    - "{#hours: {@Number.Integer}}:{#minutes: {@Number.Integer}}\s?{#period: {@TimePeriod}}"
    - "{#hours: {@Number.Integer}}:{#minutes: {@Number.Integer}}"

  Currency:
    USD:
      - $
    EUR:
      - €
    POUND:
      - £

  Money:
    - "{@Currency}{@Number}"
Structured Text Output:
The deadline for submission is {@Date: {#year: {@Number.Integer: 2023}}-{#month: {@Number.Integer: 07}}-{#day: {@Number.Integer: 15}}}.
The event is scheduled for {@Date: {#month: {@Number.Integer: 07}}/{#day: {@Number.Integer: 15}}/{#year: {@Number.Integer: 2023}}}.
The conference is on {@Date: {#month: {@Month.7: July}} {#day: {@Number.Integer: 15}}th, {#year: {@Number.Integer: 2023}}}.
The meeting is scheduled for {@Time: {#hours: {@Number.Integer: 2}}:{#minutes: {@Number.Integer: 30}} {#period: {@TimePeriod.PM: PM}}}.
Please arrive by {@Time: {#hours: {@Number.Integer: 9}}:{#minutes: {@Number.Integer: 00}} {#period: {@TimePeriod.AM: AM}}}.
The meeting starts at {@Time: {#hours: {@Number.Integer: 14}}:{#minutes: {@Number.Integer: 30}}}.
Contact us at +{@Number.Integer: 1} ({@Number.Integer: 555}) {@Number.Integer: 123}-{@Number.Integer: 4567} or {@Date: {#year: {@Number.Integer: 123}}-{#month: {@Number.Integer: 456}}-{#day: {@Number.Integer: 7890}}} for assistance.
It is available at {@Money: {@Currency.EUR: €}{@Number.Real: 149.99}} in Europe.
JSON Output
[
  {
    "text": "The deadline for submission is 2023-07-15.",
    "entities": [
      {
        "class": "Date",
        "text": "2023-07-15",
        "entities": [
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "2023"
          },
          {
            "field": "month",
            "class": "Number.Integer",
            "text": "07"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "15"
          }
        ]
      }
    ]
  },
  {
    "text": "The event is scheduled for 07/15/2023.",
    "entities": [
      {
        "class": "Date",
        "text": "07/15/2023",
        "entities": [
          {
            "field": "month",
            "class": "Number.Integer",
            "text": "07"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "15"
          },
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "2023"
          }
        ]
      }
    ]
  },
  {
    "text": "The conference is on July 15th, 2023.",
    "entities": [
      {
        "class": "Date",
        "text": "July 15th, 2023",
        "entities": [
          {
            "field": "month",
            "class": "Month.7",
            "text": "July"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "15"
          },
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "2023"
          }
        ]
      }
    ]
  },
  {
    "text": "The meeting is scheduled for 2:30 PM. ",
    "entities": [
      {
        "class": "Time",
        "text": "2:30 PM",
        "entities": [
          {
            "field": "hours",
            "class": "Number.Integer",
            "text": "2"
          },
          {
            "field": "minutes",
            "class": "Number.Integer",
            "text": "30"
          },
          {
            "field": "period",
            "class": "TimePeriod.PM",
            "text": "PM"
          }
        ]
      }
    ]
  },
  {
    "text": "Please arrive by 9:00 AM.",
    "entities": [
      {
        "class": "Time",
        "text": "9:00 AM",
        "entities": [
          {
            "field": "hours",
            "class": "Number.Integer",
            "text": "9"
          },
          {
            "field": "minutes",
            "class": "Number.Integer",
            "text": "00"
          },
          {
            "field": "period",
            "class": "TimePeriod.AM",
            "text": "AM"
          }
        ]
      }
    ]
  },
  {
    "text": "The meeting starts at 14:30.",
    "entities": [
      {
        "class": "Time",
        "text": "14:30",
        "entities": [
          {
            "field": "hours",
            "class": "Number.Integer",
            "text": "14"
          },
          {
            "field": "minutes",
            "class": "Number.Integer",
            "text": "30"
          }
        ]
      }
    ]
  },
  {
    "text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.",
    "entities": [
      {
        "class": "Number.Integer",
        "text": "1"
      },
      {
        "class": "Number.Integer",
        "text": "555"
      },
      {
        "class": "Number.Integer",
        "text": "123"
      },
      {
        "class": "Number.Integer",
        "text": "4567"
      },
      {
        "class": "Date",
        "text": "123-456-7890",
        "entities": [
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "123"
          },
          {
            "field": "month",
            "class": "Number.Integer",
            "text": "456"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "7890"
          }
        ]
      }
    ]
  },
  {
    "text": "It is available at €149.99 in Europe.",
    "entities": [
      {
        "class": "Money",
        "text": "€149.99",
        "entities": [
          {
            "class": "Currency.EUR",
            "text": "€"
          },
          {
            "class": "Number.Real",
            "text": "149.99"
          }
        ]
      }
    ]
  }
]
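
To make the parent/child behaviour concrete, the sketch below resolves a reference such as `{@Number}` by trying every child pattern and reporting the child that matched. It is a simplified model in Python, not the library's code: the `CLASSES` dictionary, the helper names, and the longest-match tie-break are all assumptions made for illustration.

```python
import re

# Two of the class trees from the patterns above, as nested dicts.
CLASSES = {
    "Number": {"Real": [r"\d+\.\d+"], "Integer": [r"\d+"]},
    "Currency": {"USD": [r"\$"], "EUR": ["€"], "POUND": ["£"]},
}


def leaves(name, node):
    """Yield (qualified_class_name, regex) for a class and its children."""
    if isinstance(node, list):
        for rx in node:
            yield name, rx
    else:
        for child, sub in node.items():
            yield from leaves(name + "." + child, sub)


def match_class(name, text, pos):
    """Match a class reference at pos; assumed rule: longest child wins."""
    node = CLASSES
    for part in name.split("."):
        node = node[part]
    best = None
    for leaf, rx in leaves(name, node):
        m = re.compile(rx).match(text, pos)
        if m and (best is None or m.end() > best[1]):
            best = (leaf, m.end())
    if best is None:
        return None
    return {"class": best[0], "text": text[pos:best[1]]}


def find_money(text):
    """Model of the Money pattern "{@Currency}{@Number}"."""
    for start in range(len(text)):
        pos, parts = start, []
        for ref in ("Currency", "Number"):
            hit = match_class(ref, text, pos)
            if hit is None:
                break
            parts.append(hit)
            pos += len(hit["text"])
        else:  # both references matched back to back
            return {"class": "Money", "text": text[start:pos], "entities": parts}
    return None


print(find_money("It is available at €149.99 in Europe."))
```

Here the reference to `Number` resolves to `Number.Real` for `149.99`, mirroring the Money entity in the JSON output above.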

Sample-based Patterns

Instead of writing a regular expression, you can give a concrete sample of the entity, prefixed with ~. The system uses the sample, together with the classes already defined, to recognize and tag similar occurrences in the text.

Patterns:
classes:

  Number:
    Real:
      - "\d+\.\d+"
    Integer:
      - "\d+"

  Month:
    1:
      - January
    2:
      - February
    3:
      - March
    4:
      - April
    5:
      - May
    6:
      - June
    7:
      - July
    8:
      - August
    9:
      - September
    10:
      - October
    11:
      - November
    12:
      - December

  Date:
    - "~{#year: 2000}-{#month: 05}-{#day: 10}"
    - "~{#month: 07}/{#day: 12}/{#year: 2020}"
    - "~{#month: November} {#day: 1}(?:st|nd|rd|th)?, {#year: 1991}"

  TimePeriod:
    AM:
      - "(?-i)AM"
    PM:
      - "(?-i)PM"

  Time:
    - "~{#hours: 1}:{#minutes: 30}\s?{#period: AM}"
    - "~{#hours: 13}:{#minutes: 00}"

  Currency:
    USD:
      - $
    EUR:
      - €
    POUND:
      - £

  Money:
    - "~€1"

  MeasurementUnit:
    British:
      Foot:
        - foot
        - feet
        - ft
      Inch:
        - inch(es)?
    SI:
      Meter:
        - meters?
        - metres?
      Centimeter:
        - centimeters?
        - centimetres?

  Measurement:
    - "~1 meter"
Structured Text Output:
The deadline for submission is {@Date: {#year: {@Number.Integer: 2023}}-{#month: {@Number.Integer: 07}}-{#day: {@Number.Integer: 15}}}.
The event is scheduled for {@Date: {#month: {@Number.Integer: 07}}/{#day: {@Number.Integer: 15}}/{#year: {@Number.Integer: 2023}}}.
The conference is on {@Date: {#month: {@Month.7: July}} {#day: {@Number.Integer: 15}}th, {#year: {@Number.Integer: 2023}}}.
The meeting is scheduled for {@Time: {#hours: {@Number.Integer: 2}}:{#minutes: {@Number.Integer: 30}} {#period: {@TimePeriod.PM: PM}}}.
Please arrive by {@Time: {#hours: {@Number.Integer: 9}}:{#minutes: {@Number.Integer: 00}} {#period: {@TimePeriod.AM: AM}}}.
The meeting starts at {@Time: {#hours: {@Number.Integer: 14}}:{#minutes: {@Number.Integer: 30}}}.
Contact us at +{@Number.Integer: 1} ({@Number.Integer: 555}) {@Number.Integer: 123}-{@Number.Integer: 4567} or {@Date: {#year: {@Number.Integer: 123}}-{#month: {@Number.Integer: 456}}-{#day: {@Number.Integer: 7890}}} for assistance.
It is available at {@Money: {@Currency.EUR: €}{@Number.Real: 149.99}} in Europe.
JSON Output
[
  {
    "text": "The deadline for submission is 2023-07-15.",
    "entities": [
      {
        "class": "Date",
        "text": "2023-07-15",
        "entities": [
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "2023"
          },
          {
            "field": "month",
            "class": "Number.Integer",
            "text": "07"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "15"
          }
        ]
      }
    ]
  },
  {
    "text": "The event is scheduled for 07/15/2023.",
    "entities": [
      {
        "class": "Date",
        "text": "07/15/2023",
        "entities": [
          {
            "field": "month",
            "class": "Number.Integer",
            "text": "07"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "15"
          },
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "2023"
          }
        ]
      }
    ]
  },
  {
    "text": "The conference is on July 15th, 2023.",
    "entities": [
      {
        "class": "Date",
        "text": "July 15th, 2023",
        "entities": [
          {
            "field": "month",
            "class": "Month.7",
            "text": "July"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "15"
          },
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "2023"
          }
        ]
      }
    ]
  },
  {
    "text": "The meeting is scheduled for 2:30 PM. ",
    "entities": [
      {
        "class": "Time",
        "text": "2:30 PM",
        "entities": [
          {
            "field": "hours",
            "class": "Number.Integer",
            "text": "2"
          },
          {
            "field": "minutes",
            "class": "Number.Integer",
            "text": "30"
          },
          {
            "field": "period",
            "class": "TimePeriod.PM",
            "text": "PM"
          }
        ]
      }
    ]
  },
  {
    "text": "Please arrive by 9:00 AM.",
    "entities": [
      {
        "class": "Time",
        "text": "9:00 AM",
        "entities": [
          {
            "field": "hours",
            "class": "Number.Integer",
            "text": "9"
          },
          {
            "field": "minutes",
            "class": "Number.Integer",
            "text": "00"
          },
          {
            "field": "period",
            "class": "TimePeriod.AM",
            "text": "AM"
          }
        ]
      }
    ]
  },
  {
    "text": "The meeting starts at 14:30.",
    "entities": [
      {
        "class": "Time",
        "text": "14:30",
        "entities": [
          {
            "field": "hours",
            "class": "Number.Integer",
            "text": "14"
          },
          {
            "field": "minutes",
            "class": "Number.Integer",
            "text": "30"
          }
        ]
      }
    ]
  },
  {
    "text": "Contact us at +1 (555) 123-4567 or 123-456-7890 for assistance.",
    "entities": [
      {
        "class": "Number.Integer",
        "text": "1"
      },
      {
        "class": "Number.Integer",
        "text": "555"
      },
      {
        "class": "Number.Integer",
        "text": "123"
      },
      {
        "class": "Number.Integer",
        "text": "4567"
      },
      {
        "class": "Date",
        "text": "123-456-7890",
        "entities": [
          {
            "field": "year",
            "class": "Number.Integer",
            "text": "123"
          },
          {
            "field": "month",
            "class": "Number.Integer",
            "text": "456"
          },
          {
            "field": "day",
            "class": "Number.Integer",
            "text": "7890"
          }
        ]
      }
    ]
  },
  {
    "text": "It is available at €149.99 in Europe.",
    "entities": [
      {
        "class": "Money",
        "text": "€149.99",
        "entities": [
          {
            "class": "Currency.EUR",
            "text": "€"
          },
          {
            "class": "Number.Real",
            "text": "149.99"
          }
        ]
      }
    ]
  }
]
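
One way to picture sample-based patterns is as a rewrite step: each part of the sample that a known class recognizes is replaced by a reference to that class. The sketch below shows that idea in Python. How the library actually generalizes samples is not described here, so the `generalize` function and its left-to-right, first-class-wins strategy are purely assumptions.

```python
import re

# Flattened leaf patterns for two of the classes defined above.
CLASSES = {
    "Number": [r"\d+\.\d+", r"\d+"],
    "Currency": [r"\$", "€", "£"],
}


def class_at(sample, pos):
    """Return (class_name, end) for the first class matching at pos."""
    for cls, regexes in CLASSES.items():
        for rx in regexes:
            m = re.compile(rx).match(sample, pos)
            if m:
                return cls, m.end()
    return None


def generalize(sample):
    """Rewrite a sample such as "€1" into a class-reference pattern."""
    out, pos = [], 0
    while pos < len(sample):
        hit = class_at(sample, pos)
        if hit:
            out.append("{@%s}" % hit[0])
            pos = hit[1]
        else:
            # Text no class recognizes is kept as a literal.
            out.append(re.escape(sample[pos]))
            pos += 1
    return "".join(out)


print(generalize("€1"))
```

Under these assumptions the sample `~€1` becomes `{@Currency}{@Number}`, which is the Money pattern written by hand in the Classes section and is consistent with the Money output above.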

More examples are available at demo.cariochi.com.
