Content Embeddings / Smart Search

Flow ID: IN-16 | Module(s): src/ContentEmbeddings/ | Complexity: Medium

Business Overview

Content Embeddings is an internal Ecommercen feature that enables rich text content (blog posts, page builder blocks, CMS pages) to embed live product displays and product list sliders directly within their HTML content. Editors insert special reference tokens into rich text fields (via TinyMCE), and the system extracts, resolves, and renders those references into interactive product components at render time.

This is not AI-based search or vector embeddings -- it is a regex-based content reference extraction and hydration system that replaces shortcodes with rendered product HTML.

Reference Token Format

[[ref_product:42]]           -> Single product display
[[ref_product:42,55,88]]     -> Multiple products (slider/grid)
[[ref_product_list:7]]       -> Product list (dynamic collection)
[[ref_product_list:7,12]]    -> Multiple product lists

Pattern: [[ref_{type}:{id(s)}]] where IDs are comma-separated integers.
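As a rough illustration of the token shape, the sketch below formats a token from a type and a list of IDs. The function name buildReferenceToken is hypothetical and not part of the module; in practice the TinyMCE picker inserts the token.

```php
<?php

/**
 * Hypothetical helper (not part of the module): formats a reference token
 * in the documented [[ref_{type}:{id(s)}]] shape.
 *
 * @param int[] $ids
 */
function buildReferenceToken(string $type, array $ids): string
{
    if ($ids === []) {
        throw new InvalidArgumentException('At least one ID is required.');
    }
    foreach ($ids as $id) {
        if (!is_int($id) || $id < 0) {
            throw new InvalidArgumentException('IDs must be non-negative integers.');
        }
    }

    // IDs are joined with commas and no surrounding whitespace.
    return sprintf('[[ref_%s:%s]]', $type, implode(',', $ids));
}

echo buildReferenceToken('product', [42, 55, 88]), PHP_EOL;   // [[ref_product:42,55,88]]
echo buildReferenceToken('product_list', [7]), PHP_EOL;       // [[ref_product_list:7]]
```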

API Reference

Extraction API

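Class                      | Method                    | Parameters            | Returns
---------------------------+---------------------------+-----------------------+-------------------------
ContentEmbeddingsExtractor | extract($content)         | string (HTML content) | ContentEmbeddings object
ContentEmbeddingsExtractor | getPattern($type, $value) | ?string, mixed        | Regex pattern string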

ContentEmbeddings Value Object

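Method                              | Parameters     | Returns | Description
------------------------------------+----------------+---------+---------------------------------------
addEmbedding($type, $idOrIds)       | string, string | void    | Adds parsed embedding (type-validated)
getEmbeddingsOfType($type, $unique) | string, bool   | array   | Gets embeddings by type
isEmpty()                           | (none)         | bool    | True if no embeddings found
isEmbeddingTypeAvailable($type)     | string         | bool    | Validates embedding type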
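To make the value object's contract concrete, here is a minimal stand-in. Method names and parameter types follow the table above; the class name ContentEmbeddingsSketch, the method bodies, and the meaning of $unique (dropping repeated ID sets) are assumptions, not the module's actual code.

```php
<?php

// Illustrative stand-in for the ContentEmbeddings value object.
final class ContentEmbeddingsSketch
{
    public const EMBEDDING_TYPE_PRODUCT = 'product';
    public const EMBEDDING_TYPE_PRODUCT_LIST = 'product_list';

    /** @var array<string, array<int, int[]>> ID sets grouped by type */
    private array $embeddings = [];

    public function isEmbeddingTypeAvailable(string $type): bool
    {
        $allowed = [self::EMBEDDING_TYPE_PRODUCT, self::EMBEDDING_TYPE_PRODUCT_LIST];

        return in_array($type, $allowed, true);
    }

    public function addEmbedding(string $type, string $idOrIds): void
    {
        if (!$this->isEmbeddingTypeAvailable($type)) {
            return; // unknown types are ignored
        }
        // "42,55,88" becomes [42, 55, 88]
        $this->embeddings[$type][] = array_map('intval', explode(',', $idOrIds));
    }

    public function getEmbeddingsOfType(string $type, bool $unique = false): array
    {
        $sets = $this->embeddings[$type] ?? [];
        if (!$unique) {
            return $sets;
        }
        // Assumption: "unique" means each distinct ID set is returned once.
        $seen = [];
        foreach ($sets as $ids) {
            $seen[implode(',', $ids)] = $ids;
        }

        return array_values($seen);
    }

    public function isEmpty(): bool
    {
        return $this->embeddings === [];
    }
}

$embeddings = new ContentEmbeddingsSketch();
$embeddings->addEmbedding('product', '42,55');
$embeddings->addEmbedding('product', '42,55');
$embeddings->addEmbedding('banner', '5'); // not whitelisted, so nothing is stored
```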

Hydration API

Class                     | Method                                           | Parameters                              | Returns
--------------------------+--------------------------------------------------+-----------------------------------------+-----------------------------------
ContentEmbeddingsHydrator | hydrate($content, $options, $extra)              | string, array, array                    | Hydrated HTML string
ProductHydrator           | hydrate($content, $embeddings, $options, $extra) | string, ContentEmbeddings, array, array | Content with products rendered
ProductListHydrator       | hydrate($content, $embeddings, $options, $extra) | string, ContentEmbeddings, array, array | Content with product lists rendered

Supported Embedding Types

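Constant                    | Value        | Description
----------------------------+--------------+-------------------------------------
EMBEDDING_TYPE_PRODUCT      | product      | Individual product reference(s)
EMBEDDING_TYPE_PRODUCT_LIST | product_list | Product list/collection reference(s)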

Code Flow

Extraction Phase

ContentEmbeddingsExtractor::extract($content)
  -> mb_ereg_search_init($content, regex_pattern)
  -> Pattern: \[\[ref_([a-zA-Z][a-zA-Z_-]+[a-zA-Z]):(\d+(,\d+)*)\]\]
  -> For each match:
     -> $type = match[1]  (e.g., "product", "product_list")
     -> $ids = match[2]   (e.g., "42" or "42,55,88")
     -> ContentEmbeddings::addEmbedding($type, $ids)
        -> Validates type against whitelist (product, product_list)
        -> Splits IDs by comma, converts to int array
        -> Groups by type: embeddings[type][] = [id, id, ...]
  -> Returns ContentEmbeddings object
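The extraction flow above can be approximated in a few lines. This sketch swaps mb_ereg_search_init for preg_match_all (the pattern is the one shown in the flow) and returns a plain array grouped by type instead of a ContentEmbeddings object; the function name extractEmbeddings and the $allowedTypes parameter are illustrative assumptions.

```php
<?php

// Pattern copied from the flow above, wrapped in PCRE delimiters.
const REFERENCE_PATTERN = '/\[\[ref_([a-zA-Z][a-zA-Z_-]+[a-zA-Z]):(\d+(,\d+)*)\]\]/';

/**
 * @param string[] $allowedTypes type whitelist
 * @return array<string, array<int, int[]>> ID sets grouped by type
 */
function extractEmbeddings(string $content, array $allowedTypes = ['product', 'product_list']): array
{
    $embeddings = [];
    preg_match_all(REFERENCE_PATTERN, $content, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        [, $type, $ids] = $match; // $match[1] is the type, $match[2] the ID list
        if (!in_array($type, $allowedTypes, true)) {
            continue; // non-whitelisted types are skipped
        }
        $embeddings[$type][] = array_map('intval', explode(',', $ids));
    }

    return $embeddings;
}

$html = '<p>Top picks: [[ref_product:42,55,88]]</p>[[ref_product_list:7]]<p>[[ref_banner:5]]</p>';
$found = extractEmbeddings($html);
// $found is ['product' => [[42, 55, 88]], 'product_list' => [[7]]]
```

No database access happens here, which matches the description of extraction as a pure regex pass.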

Hydration Phase

ContentEmbeddingsHydrator::hydrate($content, $options, $extraData)
  -> Extract embeddings from content
  -> If empty, return content unchanged
  -> For each registered hydrator type:
     -> ProductHydrator or ProductListHydrator
     -> hydrator->hydrate($content, $embeddings, $options, $extraData)
        -> Extract type-specific embeddings from content
        -> For each embedding set:
           -> Look up cached data from $options[type]
           -> Build HydratedContentData (bidirectional state container)
           -> Render view template via CI loader
           -> Replace regex pattern in content with rendered HTML
        -> Strip any unresolved embeddings of this type
     -> Return modified content
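The dispatch loop above is sketched below under simplifying assumptions: HydratorSketch stands in for HydratorInterface (a plain array replaces the ContentEmbeddings argument), StaticHtmlHydrator is a hypothetical hydrator that renders fixed HTML instead of a view, and hydrateContent plays the orchestrator's role.

```php
<?php

// Stand-in for HydratorInterface; the signature follows the Hydration API table.
interface HydratorSketch
{
    public function hydrate(string $content, array $embeddings, array $options, array $extraData): string;
}

// Hypothetical hydrator that swaps every token of one type for fixed HTML.
final class StaticHtmlHydrator implements HydratorSketch
{
    private string $type;
    private string $html;

    public function __construct(string $type, string $html)
    {
        $this->type = $type;
        $this->html = $html;
    }

    public function hydrate(string $content, array $embeddings, array $options, array $extraData): string
    {
        $pattern = '/\[\[ref_' . preg_quote($this->type, '/') . ':\d+(,\d+)*\]\]/';

        return preg_replace_callback($pattern, fn (array $m): string => $this->html, $content) ?? $content;
    }
}

// Orchestrator: extract once, return early when nothing is embedded,
// then let each hydrator rewrite the content in registration order.
function hydrateContent(string $content, array $hydrators, array $options = [], array $extraData = []): string
{
    preg_match_all('/\[\[ref_[a-zA-Z_-]+:\d+(,\d+)*\]\]/', $content, $matches);
    $embeddings = $matches[0];
    if ($embeddings === []) {
        return $content;
    }

    foreach ($hydrators as $hydrator) {
        $content = $hydrator->hydrate($content, $embeddings, $options, $extraData);
    }

    return $content;
}

$hydrators = [
    new StaticHtmlHydrator('product', '<div class="product"></div>'),
    new StaticHtmlHydrator('product_list', '<div class="product-list"></div>'),
];

echo hydrateContent('<p>[[ref_product:42]]</p>[[ref_product_list:7]]', $hydrators), PHP_EOL;
```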

Product Hydration Detail

ProductHydrator::hydrate(...)
  -> extractEmbeddings($content)  (re-extracts from current content state)
  -> For each product embedding set (e.g., [42, 55]):
     -> Look up products in $options['product'] (pre-cached indexed data)
     -> Create HydratedContentData with:
        - productDataSet, liveData, productCodes, productCodeImages
        - productIds, allPageProducts reference
     -> Render view: {template}/components/content_embeddings/product
     -> Merge rendered product data back into allPageProducts
     -> Replace [[ref_product:42,55]] with rendered HTML
  -> Strip any remaining unresolved product embeddings
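A simplified version of the replace-and-strip behaviour is sketched below. It is not the module's implementation: it renders a plain list instead of a view template, handles replacement and cleanup in one pass rather than two, and assumes that a token with some unknown IDs still renders the products that were found. The function name and the 'name' field are hypothetical.

```php
<?php

/**
 * @param array<int, array{name: string}> $productsById stands in for $options['product']
 */
function hydrateProductTokens(string $content, array $productsById): string
{
    $pattern = '/\[\[ref_product:(\d+(?:,\d+)*)\]\]/';

    $render = static function (array $match) use ($productsById): string {
        $items = [];
        foreach (array_map('intval', explode(',', $match[1])) as $id) {
            if (isset($productsById[$id])) {
                $items[] = '<li>' . htmlspecialchars($productsById[$id]['name']) . '</li>';
            }
        }

        // Nothing resolvable: strip the token instead of leaving it in the page.
        return $items === [] ? '' : '<ul class="embedded-products">' . implode('', $items) . '</ul>';
    };

    return preg_replace_callback($pattern, $render, $content) ?? $content;
}

$products = [42 => ['name' => 'Mug']];
$result = hydrateProductTokens('<p>[[ref_product:42,99]]</p><p>[[ref_product:7]]</p>', $products);
// $result is '<p><ul class="embedded-products"><li>Mug</li></ul></p><p></p>'
```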

Product List Hydration Detail

ProductListHydrator::hydrate(...)
  -> extractEmbeddings($content)
  -> For each product list embedding set (e.g., [7]):
     -> Look up product list data in $options['product_list']
     -> Extract product IDs from each list's products
     -> Create HydratedContentData with:
        - productListDataSet, productIds by list
        - liveData, productCodes, productCodeImages, allPageProducts
     -> Render view: {template}/components/content_embeddings/product_list
     -> Replace [[ref_product_list:7]] with rendered HTML
  -> Strip remaining unresolved product_list embeddings
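The "extract product IDs from each list's products" step might look like the sketch below. The shape of the list data (a 'products' array whose rows carry an 'id' key) and the function name collectProductIdsByList are assumptions made for illustration.

```php
<?php

/**
 * @param int[] $listIds    IDs taken from one [[ref_product_list:...]] token
 * @param array $listsById  stands in for $options['product_list']; shape is assumed
 * @return array<int, int[]> product IDs keyed by list ID
 */
function collectProductIdsByList(array $listIds, array $listsById): array
{
    $productIdsByList = [];
    foreach ($listIds as $listId) {
        if (!isset($listsById[$listId])) {
            continue; // unknown list: nothing to render for it
        }
        $ids = array_column($listsById[$listId]['products'], 'id');
        $productIdsByList[$listId] = array_map('intval', $ids);
    }

    return $productIdsByList;
}

$lists = [
    7  => ['products' => [['id' => 42], ['id' => '55']]],
    12 => ['products' => []],
];
$idsByList = collectProductIdsByList([7, 12, 99], $lists);
// $idsByList is [7 => [42, 55], 12 => []]
```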

Architecture

src/ContentEmbeddings/
  Extraction/
    ContentEmbeddingsExtractor.php   # Regex-based token extraction
    ContentEmbeddings.php            # Value object holding parsed embeddings

  Hydration/
    ContentEmbeddingsHydrator.php    # Orchestrator: extraction + hydrator dispatch
    AbstractHydrator.php             # Base class: pattern matching, replacement, cleanup
    HydratorInterface.php           # Contract for hydrators
    ProductHydrator.php              # Renders product embeddings
    ProductListHydrator.php          # Renders product list embeddings
    HydratorData.php                 # Shared state container (liveData, productCodes, template, CI instance)
    HydratedContentData.php         # Bidirectional state for view rendering

application/views/main/components/content_embeddings/
    product.php                      # Product embedding view template
    product_list.php                 # Product list embedding view template

assets/main/scss/components/
    _content_embeddings.scss         # Styling for embedded components

tests/Unit/ContentEmbeddings/Extraction/
    ContentEmbeddingsExtractorTest.php  # Extraction unit tests
    ContentEmbeddingsTest.php           # Value object unit tests

Class Hierarchy

HydratorInterface
  <- AbstractHydrator (base: pattern matching, replacement, cleanup)
       <- ProductHydrator (template: content_embeddings/product)
       <- ProductListHydrator (template: content_embeddings/product_list)

Key Design Decisions

  1. Two-phase architecture: Extraction and hydration are separated. Extraction is pure regex (no DB calls), while hydration requires product data lookups and template rendering.
  2. Pre-cached data: Product and product list data is passed in through the $options parameter, so the calling code (typically the front controller) pre-fetches everything it needs in a single batch query and the hydrators issue no queries of their own.
  3. Bidirectional state: HydratedContentData allows views to modify state (especially allPageProducts) that flows back to the hydrator, enabling product tracking across multiple embedded components on the same page.
  4. Template-aware: Views are resolved via the active template's component directory, supporting per-theme rendering.
  5. Graceful degradation: Unresolved embeddings (referencing deleted products or invalid IDs) are stripped from the content rather than displayed as raw tokens.
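The bidirectional state idea in point 3 can be pictured with a small sketch. HydratedContentDataSketch and renderEmbeddedProducts are hypothetical stand-ins for HydratedContentData and a view template; only the allPageProducts and productIds names come from this document.

```php
<?php

// Stand-in for HydratedContentData: the view both reads from it and writes to it.
final class HydratedContentDataSketch
{
    /** @var int[] */
    public array $productIds;

    /** @var array<int, array> products seen so far on the page, keyed by ID */
    public array $allPageProducts;

    public function __construct(array $productIds, array $allPageProducts)
    {
        $this->productIds = $productIds;
        $this->allPageProducts = $allPageProducts;
    }
}

// Stand-in for a view template: renders markup and records what it rendered.
function renderEmbeddedProducts(HydratedContentDataSketch $data): string
{
    foreach ($data->productIds as $id) {
        $data->allPageProducts[$id] = ['id' => $id];
    }

    return '<div data-products="' . implode(',', $data->productIds) . '"></div>';
}

$allPageProducts = [];
foreach ([[42, 55], [55, 88]] as $ids) {
    $data = new HydratedContentDataSketch($ids, $allPageProducts);
    renderEmbeddedProducts($data);
    // State written by the "view" flows back to the caller for the next embedding.
    $allPageProducts = $data->allPageProducts;
}
// array_keys($allPageProducts) is [42, 55, 88]
```

Because objects are passed by handle in PHP, the view's writes to $data are visible to the caller without an explicit return value.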

Data Model

No Dedicated Tables

Content embeddings do not have their own database tables. The reference tokens live inline within existing content fields (e.g., blog_mui.description, builder block content). Product and product list data is loaded from the standard shop_product and product list tables.

Integration Points

Content Source      | Where Tokens Appear
--------------------+---------------------------------
Blog posts          | blog_mui.description (rich text)
Page builder blocks | Block content fields
CMS pages           | Rich text content areas

Configuration

TinyMCE Integration

The TinyMCE rich text editor (application/views/admin/utils/tinyMCE.php) includes a custom button/plugin for inserting content embedding tokens. Administrators select products or product lists from a picker, and the editor inserts the [[ref_product:...]] or [[ref_product_list:...]] token.

Template Views

Each storefront template provides its own component views:

  • {template}/components/content_embeddings/product.php
  • {template}/components/content_embeddings/product_list.php
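A view path following this convention could be assembled as in the sketch below. The function resolveEmbeddingView is hypothetical; how the module actually builds the path before handing it to the CI loader is not shown in this document.

```php
<?php

// Hypothetical helper: builds the view name for an embedding type
// under the active template's component directory (without the .php suffix).
function resolveEmbeddingView(string $template, string $type): string
{
    return sprintf('%s/components/content_embeddings/%s', trim($template, '/'), $type);
}

echo resolveEmbeddingView('main', 'product'), PHP_EOL;       // main/components/content_embeddings/product
echo resolveEmbeddingView('main', 'product_list'), PHP_EOL;  // main/components/content_embeddings/product_list
```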

PurgeCSS

The _content_embeddings.scss styles are included in the main PurgeCSS profile (build/purgecss_profiles/main/_commons.js) to ensure styles are not stripped during production builds.

Client Extension Points

Custom Embedding Types

The system currently supports two types (product, product_list). To add a custom type:

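  1. Add a constant for the new type to ContentEmbeddings and accept it in isEmbeddingTypeAvailable(). The type name must fit the extraction pattern: letters, underscores, and hyphens only, starting and ending with a letter, and at least three characters long.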
  2. Create a hydrator class extending AbstractHydrator
  3. Register the hydrator in ContentEmbeddingsHydrator::setupHydrators()
  4. Create the view template in the storefront template directory

In a client repo, this would require overriding ContentEmbeddingsHydrator via DI or creating a Custom\ContentEmbeddings\ namespace.

Custom Templates

Override the product/product list view templates in the client's storefront template to customize the rendering of embedded products (layout, styling, additional data).

Custom Data Resolution

The calling code controls what data is passed in $options. Client repos can modify the front controller to include additional pre-fetched data (e.g., inventory status, promotional flags) that views can then access.
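One way calling code might assemble $options is sketched below: collect every referenced product ID, fetch them in one batch, and index the rows by ID. The function buildProductOptions, the $fetchProducts loader, and the in_stock field are hypothetical; only the $options['product'] key comes from this document.

```php
<?php

/**
 * @param array<string, array<int, int[]>> $embeddings   ID sets grouped by type
 * @param callable $fetchProducts hypothetical batch loader: int[] in, rows with an 'id' key out
 */
function buildProductOptions(array $embeddings, callable $fetchProducts): array
{
    // De-duplicate IDs across all product tokens on the page.
    $ids = [];
    foreach ($embeddings['product'] ?? [] as $idSet) {
        foreach ($idSet as $id) {
            $ids[$id] = true;
        }
    }
    if ($ids === []) {
        return ['product' => []];
    }

    // One batch call, then index the rows by product ID for fast lookup.
    $rows = $fetchProducts(array_keys($ids));

    return ['product' => array_column($rows, null, 'id')];
}

$calls = 0;
$loader = function (array $ids) use (&$calls): array {
    $calls++;

    return array_map(fn (int $id): array => ['id' => $id, 'in_stock' => $id !== 55], $ids);
};

$options = buildProductOptions(['product' => [[42, 55], [55, 88]]], $loader);
// The loader ran once; $options['product'] is keyed by 42, 55, and 88.
```

Extra fields such as in_stock travel with each row, so a view can read them without further lookups.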

Business Rules

  1. Type whitelist: Only product and product_list types are accepted. Unknown types (e.g., [[ref_banner:5]]) are silently ignored during extraction.
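  2. ID validation: IDs must be non-negative integers written as plain digits (the pattern uses \d+). Signed, fractional, or non-numeric values in the ID position will not match the regex pattern, so such tokens are never extracted.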
  3. Multiple IDs: A single token can reference multiple items (comma-separated). For products, this typically renders a slider or grid. For product lists, multiple lists are rendered sequentially.
  4. Graceful cleanup: After hydration, any remaining unresolved tokens of processed types are stripped from the output. This handles cases where referenced products have been deleted or deactivated.
  5. Page product tracking: All embedded products are tracked in allPageProducts for analytics purposes (e.g., product impression tracking). This accumulates across all embedding instances on a page.
  6. Rendering order: Product embeddings are processed before product list embeddings (based on hydrator registration order in setupHydrators()).
  7. Template resolution: View paths are resolved dynamically based on the active storefront template, allowing different visual presentations per theme.
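Rules 1 and 2 can be checked against sample tokens with a short script. The classifier below is an illustration only: it reuses the extraction pattern from the Code Flow section, anchors it with ^ and $ so that a single token is tested, and applies the two-type whitelist; the function name classifyToken is hypothetical.

```php
<?php

function classifyToken(string $token): string
{
    $pattern = '/^\[\[ref_([a-zA-Z][a-zA-Z_-]+[a-zA-Z]):(\d+(,\d+)*)\]\]$/';
    if (preg_match($pattern, $token, $match) !== 1) {
        return 'no match'; // rule 2: the ID part is not a digit list
    }

    // Rule 1: the pattern matches any type name, the whitelist decides.
    return in_array($match[1], ['product', 'product_list'], true)
        ? 'accepted'
        : 'ignored: unknown type';
}

$samples = [
    '[[ref_product:42]]',
    '[[ref_product_list:7,12]]',
    '[[ref_banner:5]]',
    '[[ref_product:abc]]',
    '[[ref_product:-1]]',
    '[[ref_product:42,]]',
];

foreach ($samples as $token) {
    printf("%-28s %s\n", $token, classifyToken($token));
}
```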