In Cariochi Patterns, the syntax is built around three elements, "classes," "fields," and "patterns," which together identify and extract specific information from unstructured text. Each element is described below:
Patterns
Regular expressions form the basis of patterns in Cariochi Patterns. These expressions are used to match specific entities within the text. Patterns can include both classes and fields to extract relevant information accurately.
Fields
Fields are specific attributes or values within a class that help define the structure of the extracted information. In Cariochi Patterns, fields are indicated by the "#" symbol followed by the field identifier. They provide a way to extract and organize different components of an entity.
For instance, consider the @Date class with the following pattern:
In this example, the pattern specifies the structure of a date with fields for the year, month, and day components. When the pattern matches the text, the year, month, and day values are extracted as separate fields within the @Date entity.
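As a rough analogy only, the idea of fields can be sketched with named groups in Python's standard `re` module. This is not Cariochi Patterns syntax; the regular expression `DATE_RE` and the helper `extract_date_fields` are assumptions made for illustration.

```python
import re

# Hypothetical date regex: each named group plays the role of a field
# (#year, #month, #day) inside a @Date entity.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def extract_date_fields(text):
    """Return one dict of field values per date found in the text."""
    return [match.groupdict() for match in DATE_RE.finditer(text)]


if __name__ == "__main__":
    print(extract_date_fields("The deadline for submission is 2023-07-15."))
```

Each match yields its components as separately named values, which is the role fields play inside an entity.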
Inner Classes
Cariochi Patterns also introduces the concept of inner classes. Inner classes are classes defined within a pattern and are represented using the "@" symbol followed by the class name. They allow you to specify that text matched by the pattern should be interpreted as an instance of the inner class.
For instance, inner classes can be applied to each component of a date. An example of this is the inner class @Int, which indicates that extracted values should be treated as integers, thus enhancing the semantic interpretation of the matched data:
In this case, the inner class @Int is used with the #year, #month, and #day fields to define that the matched text should be treated as integer values.
Input
The deadline for submission is 2023-07-15.
Structured Text Output
The deadline for submission is {@Date: {#year: {@Int: 2023}}-{#month: {@Int: 07}}-{#day: {@Int: 15}}}.
JSON Output
[
  {
    "text": "The deadline for submission is 2023-07-15.",
    "entities": [
      {
        "class": "Date",
        "text": "2023-07-15",
        "entities": [
          { "field": "year", "class": "Int", "text": "2023" },
          { "field": "month", "class": "Int", "text": "07" },
          { "field": "day", "class": "Int", "text": "15" }
        ]
      }
    ]
  }
]
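Note that the JSON output keeps every value as text ("07", not 7) and only labels it with the class Int. A consumer that wants real integers could convert them itself. The sketch below assumes the JSON shape shown in this example; `fields_of` is a hypothetical helper, not part of Cariochi Patterns.

```python
import json

# JSON output copied from the date example in this section.
OUTPUT = json.loads("""
[
  {
    "text": "The deadline for submission is 2023-07-15.",
    "entities": [
      {
        "class": "Date",
        "text": "2023-07-15",
        "entities": [
          {"field": "year", "class": "Int", "text": "2023"},
          {"field": "month", "class": "Int", "text": "07"},
          {"field": "day", "class": "Int", "text": "15"}
        ]
      }
    ]
  }
]
""")


def fields_of(entity):
    """Map each field name of an entity to its value, converting Int-class text to int."""
    result = {}
    for child in entity.get("entities", []):
        name = child.get("field")
        if name is None:
            continue  # child entity without a field identifier
        value = child["text"]
        if child.get("class") == "Int":
            value = int(value)
        result[name] = value
    return result


if __name__ == "__main__":
    date_entity = OUTPUT[0]["entities"][0]
    print(fields_of(date_entity))
```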
Classes
Classes are represented by names preceded by the "@" symbol. They act as categories for different entities you want to recognize in the text. These classes can be used as building blocks within patterns to create more sophisticated and accurate data extraction rules.
Input
The price of the product is $199.99.
It is available at €149.99 in Europe.
Structured Text Output
The price of the product is {@Money: {@Currency: $}{@Number: 199.99}}.
It is available at {@Money: {@Currency: €}{@Number: 149.99}} in Europe.
JSON Output
[
  {
    "text": "The price of the product is $199.99. ",
    "entities": [
      {
        "class": "Money",
        "text": "$199.99",
        "entities": [
          { "class": "Currency", "text": "$" },
          { "class": "Number", "text": "199.99" }
        ]
      }
    ]
  },
  {
    "text": "It is available at €149.99 in Europe.",
    "entities": [
      {
        "class": "Money",
        "text": "€149.99",
        "entities": [
          { "class": "Currency", "text": "€" },
          { "class": "Number", "text": "149.99" }
        ]
      }
    ]
  }
]
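Because entities nest (a Money entity contains a Currency and a Number), code that consumes the JSON output usually needs a recursive walk. The following is a minimal sketch that assumes the JSON shape shown in this example; `walk` is a hypothetical helper name.

```python
# First sentence of the JSON output from this example, as a Python literal.
SENTENCE = {
    "text": "The price of the product is $199.99. ",
    "entities": [
        {
            "class": "Money",
            "text": "$199.99",
            "entities": [
                {"class": "Currency", "text": "$"},
                {"class": "Number", "text": "199.99"},
            ],
        }
    ],
}


def walk(entities, depth=0):
    """Yield (depth, class, text) for every entity, parents before children."""
    for entity in entities:
        yield depth, entity.get("class"), entity["text"]
        yield from walk(entity.get("entities", []), depth + 1)


if __name__ == "__main__":
    for depth, cls, text in walk(SENTENCE["entities"]):
        print("  " * depth + f"{cls}: {text}")
```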
Abstract Classes
Cariochi Patterns supports a hierarchical structure for classes, allowing you to organize and group related entities more efficiently. You can create abstract classes, which serve as parent classes without specific patterns, and child classes with their own unique patterns.
Example:
Patterns
classes:
  Number: # abstract class {@Number}
    Real: # class {@Number.Real}
      - "\d+\.\d+"
    Int: # class {@Number.Int}
      - "\d+"
  Percent:
    - "{@Number}%" # pattern with an abstract class
Input
The company's profit margin significantly increased from 0.1% to 3%.
Structured Text Output
The company's profit margin significantly increased from {@Percent: {@Number.Real: 0.1}%} to {@Percent: {@Number.Int: 3}%}.
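The dotted names in the output (Number.Real, Number.Int) suggest that code consuming the results can test whether an entity belongs to an abstract class by comparing name prefixes. The sketch below works under that assumption; `is_instance_of` is a hypothetical helper and does not describe how Cariochi Patterns resolves classes internally.

```python
def is_instance_of(class_name, ancestor):
    """True if class_name is the ancestor itself or is nested under it by dotted path."""
    return class_name == ancestor or class_name.startswith(ancestor + ".")


if __name__ == "__main__":
    for name in ["Number.Real", "Number.Int", "Percent"]:
        print(name, is_instance_of(name, "Number"))
```

Comparing against `ancestor + "."` rather than the bare prefix keeps an unrelated class such as a hypothetical `Numbers` from being treated as a child of `Number`.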
Private Classes
Private classes let you define intermediate classes that are used inside patterns but omitted from the final output. Their names begin with the "_" symbol.
Example:
Patterns
classes:
  Number:
    - "\d+(\.\d+)?"
  Percent:
    - "{@Number}%"
  FinIndicator:
    - "EPS"
    - "RPS"
    - "DPS"
  _action: # private class
    - "grew by"
    - "growing by"
    - "rose to"
    - "surged to"
    - "remained stable at"
  FinData:
    - "{@FinIndicator} {@_action} {@Percent}" # pattern with a private class
Input
EPS grew by 15% to $2.50 per share, reflecting strong financial performance.
Structured Text Output
{@FinData: {@FinIndicator: EPS} grew by {@Percent: {@Number: 15}%}} to ${@Number: 2.50} per share, reflecting strong financial performance.
JSON Output
[
  {
    "text": "EPS grew by 15% to $2.50 per share, reflecting strong financial performance.",
    "entities": [
      {
        "class": "FinData",
        "text": "EPS grew by 15%",
        "entities": [
          { "class": "FinIndicator", "text": "EPS" },
          {
            "class": "Percent",
            "text": "15%",
            "entities": [
              { "class": "Number", "text": "15" }
            ]
          }
        ]
      },
      { "class": "Number", "text": "2.50" }
    ]
  }
]
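In the output, "grew by" is matched by `_action` but produces no entity of its own. One way to picture this is as a filtering step over a full match tree. The sketch below is illustrative only: the `RAW` tree containing an `_action` node and the `strip_private` helper are assumptions, not the library's documented internals.

```python
# Hypothetical intermediate tree in which the private class still appears.
RAW = [
    {
        "class": "FinData",
        "text": "EPS grew by 15%",
        "entities": [
            {"class": "FinIndicator", "text": "EPS"},
            {"class": "_action", "text": "grew by"},
            {
                "class": "Percent",
                "text": "15%",
                "entities": [{"class": "Number", "text": "15"}],
            },
        ],
    }
]


def strip_private(entities):
    """Drop entities whose class starts with "_", hoisting any children they have."""
    result = []
    for entity in entities:
        children = strip_private(entity.get("entities", []))
        if entity.get("class", "").startswith("_"):
            result.extend(children)  # keep public descendants, drop the private node
            continue
        cleaned = {key: value for key, value in entity.items() if key != "entities"}
        if children:
            cleaned["entities"] = children
        result.append(cleaned)
    return result


if __name__ == "__main__":
    print(strip_private(RAW))
```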
Sample-based Patterns
Sample-based patterns let users define custom patterns from concrete examples. These patterns start with the "~" symbol and act as templates for recognizing similar occurrences in the text.
By combining classes, fields, and patterns, Cariochi Patterns lets users extract structured data from unstructured text, making textual information easier to analyze.