Expresso: Object-Oriented Regular Expressions
Cariochi Expresso is a Java library that extends traditional regular expressions with object-oriented features, enabling hierarchical, modular, and reusable patterns. With Expresso, you can define and combine named patterns in YAML, apply them to text, and generate structured outputs like JSON. It’s perfect for parsing structured data, log files, or domain-specific texts while keeping your patterns maintainable and modular.
Defining Patterns in Expresso
Expresso provides a flexible and intuitive way to define reusable regular expression patterns using a YAML format. Patterns are organized into classes to enable object-oriented capabilities like inheritance and modularity. There are two primary notations for defining classes in Expresso: Verbose and Compact.
Verbose Notation
The verbose notation is explicit and provides additional clarity for defining classes. Each class specifies its name, an optional parent class, and a list of patterns. This format is suitable when detailed documentation or advanced configurations are required.
Syntax:
Example:
Compact Notation
The compact notation is a more concise way to define patterns, grouping child classes under their parent class. This format is ideal for simpler use cases or when defining multiple related classes in a streamlined manner.
Syntax:
Example:
Choosing the Right Notation
Use Verbose Notation when you need to provide additional details for each class or work with complex class hierarchies.
Use Compact Notation for brevity when defining simpler patterns or related classes.
Both notations offer the same functionality and can be used interchangeably within your YAML configuration, allowing you to choose the format that best suits your needs.
Advanced Features
1. Expresso Captures Only Named Groups
Expresso differs from traditional regular expressions by focusing solely on named groups. This approach aligns with its object-oriented design, treating named groups as fields of a class. Any unnamed groups present in the pattern are ignored during matching, which ensures that only meaningful, structured data is captured.
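The idea of keeping only named groups can be approximated with the JDK's own java.util.regex. The sketch below is an illustration in plain Java, not Expresso's API: the class name NamedGroupsOnly and the pattern are assumptions chosen to match the 2023-11-13 input used in this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain java.util.regex illustration; this is NOT Expresso's API.
final class NamedGroupsOnly {

    // The separators sit in unnamed groups; only year, month, and day are named.
    private static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})(-)(?<month>\\d{2})(-)(?<day>\\d{2})");

    // Returns the named groups as fields, in declaration order.
    // The two unnamed "(-)" groups are never copied into the result.
    static Map<String, String> extract(String input) {
        Matcher m = ISO_DATE.matcher(input);
        Map<String, String> fields = new LinkedHashMap<>();
        if (m.matches()) {
            for (String name : new String[] {"year", "month", "day"}) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract("2023-11-13")); // {year=2023, month=11, day=13}
    }
}
```

Treating the result as a map of field names to values mirrors the "named groups as fields of a class" idea: the separators still take part in matching, but they carry no data worth keeping.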
Example:
For the input: 2023-11-13
the output will include:
2. Flexible Naming for Groups
Expresso is flexible about group names: they may contain special characters such as # and @, and a dot (.) can be used as a separator to define a hierarchy. This extends what regular expression group names can express, making it easier to define meaningful, hierarchical structures.
With this pattern:
• The group <Numbers.Digits.#3> will match within both matcher.within("Numbers") and matcher.within("Digits"), as well as specifically matcher.within("#3").
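For comparison, java.util.regex only accepts letters and digits in group names, so a name like Numbers.Digits.#3 cannot be used there directly. The lookup behavior described above can still be modeled in a few lines of plain Java. This is a minimal sketch under the assumption that a dotted name answers to each of its segments; HierarchicalName and isWithin are hypothetical names, not part of Expresso.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Expresso: models how a dotted group
// name could answer to each of its segments.
final class HierarchicalName {

    // Splits "Numbers.Digits.#3" into ["Numbers", "Digits", "#3"].
    static List<String> segments(String groupName) {
        return Arrays.asList(groupName.split("\\."));
    }

    // True when the requested scope is one of the name's segments.
    static boolean isWithin(String groupName, String scope) {
        return segments(groupName).contains(scope);
    }

    public static void main(String[] args) {
        String name = "Numbers.Digits.#3";
        System.out.println(isWithin(name, "Numbers")); // true
        System.out.println(isWithin(name, "Digits"));  // true
        System.out.println(isWithin(name, "#3"));      // true
        System.out.println(isWithin(name, "Letters")); // false
    }
}
```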
3. Support for Repeated Group Names
Expresso enables the use of repeated group names within a single pattern, allowing complex data structures to be captured with clarity and precision.
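To see why this matters, note that java.util.regex rejects a pattern that declares the same group name twice. The sketch below, in plain Java rather than Expresso's API, first demonstrates that rejection and then shows a common workaround: applying a single named group repeatedly and collecting every value. The class name RepeatedGroups and the group name n are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Plain java.util.regex illustration; this is NOT Expresso's API.
final class RepeatedGroups {

    // java.util.regex refuses a pattern that declares the same group name twice.
    static boolean jdkAcceptsDuplicateNames() {
        try {
            Pattern.compile("(?<n>\\d+),(?<n>\\d+)");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Workaround: apply one named group repeatedly and collect each value.
    static List<String> collect(String input) {
        Matcher m = Pattern.compile("(?<n>\\d+)").matcher(input);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group("n"));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(jdkAcceptsDuplicateNames()); // false
        System.out.println(collect("10,20,30"));        // [10, 20, 30]
    }
}
```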
Example Output for Patterns
Expresso can generate structured outputs that visualize the hierarchical structure of matched groups in a regex-like format. This output format compactly represents the matched groups and their nested structure, making it ideal for documentation and debugging.
For the provided examples, the output demonstrates how patterns defined in the YAML configuration match input strings and organize them into named groups.
Input: "2023-11-13"
Input: "11/13/2023"
This regex-like output format clearly shows:
The top-level group corresponding to the class name (e.g., IsoDate, UsDate).
The nested subgroups and their matched values (e.g., year, month, day).
Why Expresso is Object-Oriented
Expresso’s approach to regular expressions is inherently object-oriented, allowing you to structure patterns into classes, define inheritance, and reuse patterns modularly. Here’s how Expresso leverages object-oriented principles:
1. Classes Can Be Reused in Other Patterns
Classes in Expresso can be referenced and reused within other patterns, making your regexes modular and maintainable. This enables the creation of complex patterns by combining simpler ones.
Explanation:
Here, the LogEntry class references the IsoDate class with ${IsoDate}. Any text matching the IsoDate pattern will be embedded into the LogEntry pattern, allowing for structured and modular regex composition.
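The composition described above can be pictured as template expansion. The following is a minimal sketch in plain Java, assuming that a ${Name} reference expands to the referenced class's regex wrapped in a group named after the class. PatternComposer, its methods, the LogEntry template, and the message group are all hypothetical; Expresso's real implementation and YAML syntax may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${Name} expansion; NOT Expresso's implementation.
final class PatternComposer {

    // Matches references such as ${IsoDate}.
    private static final Pattern REFERENCE =
            Pattern.compile("\\$\\{([A-Za-z][A-Za-z0-9]*)\\}");

    private final Map<String, String> classes = new LinkedHashMap<>();

    void define(String name, String template) {
        classes.put(name, template);
    }

    // Replaces each ${Name} with the referenced class's expanded regex,
    // wrapped in a group named after that class. Assumes no cyclic references.
    String expand(String name) {
        String template = classes.get(name);
        if (template == null) {
            throw new IllegalArgumentException("Unknown class: " + name);
        }
        Matcher m = REFERENCE.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(template, last, m.start());
            out.append("(?<").append(m.group(1)).append(">")
               .append(expand(m.group(1))).append(")");
            last = m.end();
        }
        out.append(template.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        PatternComposer composer = new PatternComposer();
        composer.define("IsoDate", "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
        composer.define("LogEntry", "${IsoDate} (?<message>.+)"); // illustrative template

        Matcher m = Pattern.compile(composer.expand("LogEntry"))
                .matcher("2023-11-13 Server started");
        if (m.matches()) {
            System.out.println(m.group("IsoDate")); // 2023-11-13
            System.out.println(m.group("year"));    // 2023
            System.out.println(m.group("message")); // Server started
        }
    }
}
```

Because the referenced class becomes a named group of its own, the embedded date stays addressable as a unit while its year, month, and day fields remain available.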
2. Parent Classes Include All Descendant Classes
When you use a parent class (e.g., ${Date}), it automatically includes all its descendant classes (e.g., IsoDate, UsDate, etc.). This inheritance-based approach ensures flexibility and avoids redundant definitions.
Explanation:
In this example, the LogEntry class references ${Date}. Since Date is the parent class of IsoDate and UsDate, both formats (2023-11-13 and 11/13/2023) will be matched within the LogEntry pattern.
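One way to picture how a parent class stands in for its descendants is a small registry that records each class's parent and tries every descendant's pattern in turn. The sketch below is plain Java under that assumption, not Expresso's API: ClassHierarchy, isA, and matchWithin are hypothetical names, and the two date regexes are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical registry; NOT Expresso's API.
final class ClassHierarchy {

    private final Map<String, String> parentOf = new LinkedHashMap<>();
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    // Registers a class. The parent may be null for a root class,
    // and the regex may be null for a class that only groups children.
    void define(String name, String parent, String regex) {
        parentOf.put(name, parent);
        if (regex != null) {
            patterns.put(name, Pattern.compile(regex));
        }
    }

    // True when name equals ancestor or inherits from it.
    boolean isA(String name, String ancestor) {
        for (String c = name; c != null; c = parentOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Returns the first class under the given ancestor whose pattern
    // matches the whole input, in definition order.
    Optional<String> matchWithin(String ancestor, String input) {
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (isA(e.getKey(), ancestor) && e.getValue().matcher(input).matches()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ClassHierarchy h = new ClassHierarchy();
        h.define("Date", null, null);
        h.define("IsoDate", "Date", "\\d{4}-\\d{2}-\\d{2}");
        h.define("UsDate", "Date", "\\d{2}/\\d{2}/\\d{4}");

        System.out.println(h.matchWithin("Date", "2023-11-13"));    // Optional[IsoDate]
        System.out.println(h.matchWithin("Date", "11/13/2023"));    // Optional[UsDate]
        System.out.println(h.matchWithin("IsoDate", "11/13/2023")); // Optional.empty
    }
}
```

Trying each descendant separately, rather than joining them into one alternation, sidesteps java.util.regex's ban on duplicate group names when several child classes share fields such as year, month, and day.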
3. Leveraging Class Hierarchies in Java Matcher API
Expresso’s Java Matcher API seamlessly integrates with the class hierarchy defined in your patterns, enabling flexible and precise matching. By using the .within() method, you can focus on a specific class (or group) and include all its descendants in the search, ensuring powerful and intuitive handling of hierarchical patterns.
Example: Matching All Date Variants
Key Benefits of the Object-Oriented Approach:
Reusability: Patterns like ${IsoDate} can be reused across multiple templates, reducing redundancy and improving consistency.
Hierarchy: Using a parent class (e.g., ${Date}) ensures that all child patterns are automatically included, simplifying pattern management.
Modularity: You can break down complex patterns into smaller, maintainable components (classes) and compose them as needed.
Readability: The use of named classes makes your patterns more understandable and easier to maintain over time.
This object-oriented design sets Expresso apart, allowing you to manage regex patterns with the same principles as object-oriented programming.
Documentation is under construction, but in the meantime, you can explore the library features using the interactive demo.