Defining Patterns


Expresso patterns are designed with modularity, readability, and reusability in mind. Unlike traditional regex engines, Expresso organizes patterns into classes defined in YAML. This structure lets you create, extend, and reuse patterns much like classes in an object-oriented programming language.

Expresso supports two formats for defining patterns: Verbose Notation and Compact Notation. Each format is suited for different use cases depending on the complexity and size of your pattern definitions.

Verbose Notation

Verbose notation provides an explicit and detailed way to define patterns, making it ideal for more complex structures or when documentation clarity is important. Each class entry specifies its name, an optional parent class, and a list of patterns.

Syntax:

- class: <class name>
  extends: <parent class name>
  patterns: [ <list of patterns> ]

Example:

- class: IsoDate
  extends: Date
  patterns: [ '(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})' ]
  
- class: UsDate
  extends: Date
  patterns: [ '(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{4})' ]
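To see what these two patterns capture, the regular expressions can be tried outside Expresso. The sketch below uses Python's standard re module, which spells named groups (?P<name>...) rather than (?<name>...). The parse_date helper is hypothetical and is not part of Expresso.

```python
import re

# The two patterns from the example, with named groups rewritten in
# Python's (?P<name>...) spelling. Expresso itself is not used here.
ISO_DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
US_DATE = re.compile(r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})")


def parse_date(text):
    """Return the named groups of the first pattern that fully matches, or None."""
    for pattern in (ISO_DATE, US_DATE):
        match = pattern.fullmatch(text)
        if match:
            return match.groupdict()
    return None


iso = parse_date("2024-03-15")
us = parse_date("03/15/2024")
print(iso)
print(us)
```

Both formats yield the same three group names (year, month, day), which is what makes it reasonable to place them under a common Date parent.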

Compact Notation

Compact notation offers a concise way to define patterns, making it suitable for smaller or simpler pattern structures. Patterns are defined directly under their parent classes without additional fields.

Syntax:

<parent class name>:
  <class name>: [ <list of patterns> ]

Example:

Date:
  IsoDate: [ '(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})' ]
  UsDate: [ '(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{4})' ]
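The compact example carries the same information as the verbose one. As a rough illustration, assuming the YAML has already been loaded into plain Python dicts and lists, a hypothetical compact_to_verbose helper (not an Expresso function) can rewrite one form into the other. It covers only the nested parent/class layout shown above.

```python
def compact_to_verbose(compact):
    """Convert {parent: {class: [patterns]}} into verbose-style entries."""
    entries = []
    for parent, children in compact.items():
        for name, patterns in children.items():
            entries.append(
                {"class": name, "extends": parent, "patterns": list(patterns)}
            )
    return entries


# The compact example, written as the Python structure a YAML loader would produce.
compact = {
    "Date": {
        "IsoDate": [r"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})"],
        "UsDate": [r"(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{4})"],
    }
}

verbose = compact_to_verbose(compact)
for entry in verbose:
    print(entry["class"], "extends", entry["extends"])
```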

Choosing the Right Notation

The choice between verbose and compact notation depends on your project’s needs:

  • Verbose Notation is ideal for complex patterns, as it provides a clear and structured way to define details such as inheritance and multiple patterns.

  • Compact Notation is best for quick and straightforward definitions, where brevity and simplicity are prioritized.

Expresso allows you to mix and match both notations within the same project, enabling flexibility based on the complexity of each pattern.

Using Classes in Patterns

Expresso lets you reference defined classes directly within other patterns. Embedding a class name in a pattern with the ${ClassName} syntax makes pattern definitions modular and reusable.

This approach simplifies complex pattern creation by breaking them into smaller, reusable components.

Syntax

When defining a pattern, you can reference an existing class by enclosing its name in ${...}. The referenced class’s pattern is expanded in place, so a change to that class carries over to every pattern that references it.

Example

Number:
  Integer: [ '\d+' ]
  Decimal: [ '${Integer}\.${Integer}' ]

Measurement:
  Weight: [ '${Decimal}\s*(kg|lbs)' ]
  Height: [ '${Decimal}\s*(cm|in)' ]

In this example:

  • The Decimal pattern reuses the Integer pattern to define decimal numbers.

  • The Weight and Height patterns reuse the Decimal class to match measurements with units such as kilograms (kg), pounds (lbs), centimeters (cm), or inches (in).
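The in-place expansion described above can be pictured as recursive text substitution. The following is a minimal sketch under that assumption, not Expresso's actual implementation: the expand function, the PATTERNS table, and the choice to wrap each expansion in a non-capturing group are all illustrative. It does not detect circular references.

```python
import re

# Class table mirroring the example: parent -> {class: [patterns]}.
CLASSES = {
    "Number": {
        "Integer": [r"\d+"],
        "Decimal": [r"${Integer}\.${Integer}"],
    },
    "Measurement": {
        "Weight": [r"${Decimal}\s*(kg|lbs)"],
        "Height": [r"${Decimal}\s*(cm|in)"],
    },
}

# Flatten to class name -> patterns so references can be looked up by name.
PATTERNS = {name: pats for group in CLASSES.values() for name, pats in group.items()}

# Matches a ${ClassName} reference and captures the class name.
REFERENCE = re.compile(r"\$\{(\w+)\}")


def expand(name):
    """Return a plain regex for a class, expanding ${Name} references recursively."""
    alternatives = [
        REFERENCE.sub(lambda m: expand(m.group(1)), pattern)
        for pattern in PATTERNS[name]
    ]
    return "(?:" + "|".join(alternatives) + ")"


print(expand("Decimal"))
print(bool(re.fullmatch(expand("Weight"), "72.5 kg")))
```

Because Weight is built on Decimal, an input such as 72 kg is not matched by this sketch: Decimal requires digits on both sides of the point.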

Real-World Use Case

For instance, if you need to define monetary values and reuse numeric formats:

Currency:
  USD: [ '\$${Decimal}' ]
  EUR: [ '€${Decimal}' ]

Price:
  Range: [ '${Currency}\s*-\s*${Currency}' ]

Here:

  • USD and EUR reuse the Decimal class to represent amounts in dollars and euros.

  • Range uses Currency to define a price range, like $10.99 - $20.49.
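Written out by hand, the Range pattern amounts to the regular expression built below. This is only a sketch: it assumes that a reference to the parent class Currency matches any of its subclasses (USD or EUR) and that Decimal is defined as in the earlier Number example. The is_price_range helper is hypothetical.

```python
import re

DECIMAL = r"\d+\.\d+"   # Decimal: digits, a point, digits
USD = r"\$" + DECIMAL   # USD: literal dollar sign, then a Decimal
EUR = "€" + DECIMAL     # EUR: euro sign, then a Decimal

# Assumption: ${Currency} stands for "any subclass of Currency".
CURRENCY = f"(?:{USD}|{EUR})"
RANGE = re.compile(rf"{CURRENCY}\s*-\s*{CURRENCY}")


def is_price_range(text):
    """True if the whole text is a price range such as '$10.99 - $20.49'."""
    return RANGE.fullmatch(text) is not None


print(is_price_range("$10.99 - $20.49"))
print(is_price_range("$10 - $20"))
```

Under these definitions, whole-number amounts such as $10 - $20 are rejected, because USD and EUR are built on Decimal rather than on Integer.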

Class references let Expresso users build scalable and maintainable solutions for complex text-parsing requirements.

Defining Pattern Hierarchies

Expresso’s true strength lies in its ability to use classes and inheritance for pattern definitions. By defining patterns in a hierarchy, you can build modular, reusable, and scalable solutions. This approach simplifies pattern management, as changes in base classes automatically propagate to derived patterns.

Example

Number:
  Integer: [ '\d+' ]
  Decimal: [ '${?Integer}\.${?Integer}' ]

Currency:
  USD: [ '\$' ]
  EUR: [ '€' ]
  GBP: [ '£' ]

Money: [ '${Currency}${Number}' ]

Percent: [ '${Number}%' ]

In this example:

  • The Number class defines reusable numeric patterns:

    • Integer matches whole numbers (123).

    • Decimal reuses the Integer class to match decimal numbers (123.45).

  • The Currency class defines various currency symbols: $, €, and £.

  • The Money pattern combines Currency and Number, allowing for monetary expressions like $123.45 or €456.

  • The Percent pattern combines Number with % to match percentages, e.g., 25%.
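One way to picture how a reference to a parent class such as ${Number} or ${Currency} resolves is as an alternation over the patterns of its subclasses. The sketch below makes that assumption explicit. It is not Expresso's implementation, the patterns_for and expand helpers are hypothetical, and the ? marker in ${?Integer} is not explained in this section, so the sketch simply treats it as a plain reference.

```python
import re

# Definitions mirroring the example. A dict value is a parent with subclasses;
# a list value is a class with its own patterns.
DEFINITIONS = {
    "Number": {
        "Integer": [r"\d+"],
        "Decimal": [r"${?Integer}\.${?Integer}"],
    },
    "Currency": {"USD": [r"\$"], "EUR": ["€"], "GBP": ["£"]},
    "Money": [r"${Currency}${Number}"],
    "Percent": [r"${Number}%"],
}

# Assumption: the optional "?" after "${" is ignored by this sketch.
REFERENCE = re.compile(r"\$\{\??(\w+)\}")


def patterns_for(name):
    """Patterns of a class, or of all its subclasses if it is a parent."""
    for key, value in DEFINITIONS.items():
        if key == name:
            if isinstance(value, list):
                return value
            return [p for pats in value.values() for p in pats]
        if isinstance(value, dict) and name in value:
            return value[name]
    raise KeyError(name)


def expand(name):
    """Expand references recursively into one non-capturing alternation."""
    alternatives = [
        REFERENCE.sub(lambda m: expand(m.group(1)), p) for p in patterns_for(name)
    ]
    return "(?:" + "|".join(alternatives) + ")"


MONEY = re.compile(expand("Money"))
PERCENT = re.compile(expand("Percent"))
print(bool(MONEY.fullmatch("$123.45")), bool(PERCENT.fullmatch("25%")))
```

One design point this exposes: in an ordered alternation, Integer is tried before Decimal, so a prefix match of $123.45 stops at $123 while a full match still succeeds by backtracking. How Expresso itself orders or resolves such alternatives is not stated here.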

Advantages

  1. Modularity: Individual patterns are small and focused, making them easier to manage and extend.

  2. Reusability: Base classes like Number and Currency can be reused across multiple higher-level patterns.

  3. Flexibility: Patterns can adapt to new formats by updating only the classes they reference.

  4. Maintainability: Updates to base patterns automatically reflect in all dependent patterns.

With these powerful notations, you can design robust, maintainable, and modular patterns that are easily extendable and adaptable to different use cases.
