Solr Search Indexing

Flow ID: SY-06 | Module(s): job, search | Complexity: High | Last Updated: 2026-04-04

Business Overview

The platform uses Apache Solr for full-text product (and blog) search with Greek language support. The AdvSolrIndex job rebuilds the Solr index on a daily schedule, pushing all active products (and blogs in v2) as structured documents. The system supports two schema versions:

  • v1: Product-only indexing with Greek-to-Latin transliteration fields for cross-script search.
  • v2: Products plus blog articles, ICU tokenizer, custom Greek-Latin filter plugin (GreekLatinTokenFilterFactory), n-gram suggestion fields, and a per-run document modification stamp that lets stale documents be removed without wiping the core.

Schema creation is handled separately via a CLI command (php cli.php solr/createSolrSchema).

Architecture

AdvSolrIndex (job, daily)
  |
  +--> Check solrSearch config enabled
  +--> Dispatch to version handler:
        |
        +--> v1: solr_model::indexData()
        |     +--> productsToIndex()           SQL join query
        |     +--> formatData()                document transform
        |     +--> solr_client::delete('*:*')  wipe entire core
        |     +--> solr_client::update()       push all documents
        |
        +--> v2: solr_model_v2::indexData()
              +--> productsToIndex()           product SQL query
              +--> formatProductData()         product document transform
              +--> blogsToIndex()              blog SQL query
              +--> formatBlogData()            blog document transform
              +--> assignModificationStamp()   tag all docs with hash
              +--> solr_client::update()       push all documents (upsert)
              +--> solr_client::delete()       remove stale docs by stamp
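
The gate-and-dispatch step in the diagram can be sketched as follows. This is a minimal illustration, not the actual AdvSolrIndex code: the function name runSolrIndex, the callable-based handlers, and the fallback to v1 when no version is set are all assumptions. Only the enabled and version config keys come from the configuration documented on this page.

```php
<?php
// Hypothetical sketch of the job's feature gate and version dispatch.
// $handlers maps a schema version to a callable standing in for
// solr_model::indexData() (v1) or solr_model_v2::indexData() (v2).
function runSolrIndex(array $config, array $handlers): string
{
    // Feature gate: skip the run unless solrSearch.enabled is truthy.
    // (How strictly the real job compares this value is not documented.)
    if (empty($config['enabled'])) {
        return 'skipped';
    }

    // ASSUMPTION: fall back to v1 when no version is configured.
    $version = $config['version'] ?? 'v1';
    if (!isset($handlers[$version])) {
        throw new InvalidArgumentException("Unknown Solr schema version: {$version}");
    }

    $handlers[$version]();   // run the version-specific indexer
    return $version;
}

// Usage with stand-in handlers that only record which one ran.
$ran = [];
$handlers = [
    'v1' => function () use (&$ran) { $ran[] = 'v1'; },
    'v2' => function () use (&$ran) { $ran[] = 'v2'; },
];
$resultDisabled = runSolrIndex(['enabled' => false, 'version' => 'v2'], $handlers);
$resultEnabled  = runSolrIndex(['enabled' => true,  'version' => 'v2'], $handlers);
```

With the flag off, no handler runs at all; with it on, exactly one handler is invoked.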

Key Files

| File | Role |
| --- | --- |
| ecommercen/job/libraries/AdvSolrIndex.php | Job implementation |
| application/modules/job/libraries/SolrIndex.php | Client-overridable subclass |
| ecommercen/search/models/Adv_solr_model.php | v1 indexing model |
| ecommercen/search/models/Adv_solr_model_v2.php | v2 indexing model (products + blogs) |
| ecommercen/libraries/AdvSolrClient.php | HTTP client for Solr REST API |
| ecommercen/search/controllers/Adv_solr.php | Schema creation CLI controller |
| ecommercen/search/traits/WithSolrDocumentTypeTrait.php | Document ID generation/extraction |
| ecommercen/libraries/SolrDocType.php | Document type enum (Product, Blog) |
| application/config/app.php | Solr connection configuration |

Code Flow

v1 Indexing (Adv_solr_model::indexData)

  1. Collect products via productsToIndex():

    • Joins shop_product with MUI tables, vendor MUI, barcodes, product codes, and category MUI.
    • Filters: soft_delete = false, active = true, price > 0.
    • Aggregates multi-language names, descriptions, barcodes, product codes, and category slugs using GROUP_CONCAT with custom separators.
    • Enriches with sales count from products_in_cart_model->getProductsWithNumberOfSales().
  2. Format documents via formatData():

    • Each product becomes a Solr document with fields: id, soft_delete, active, price, vendor_id, vendor_name, product_hits, product_sales, barcode[], product_code[], product_category[], product_name[], product_description[].
    • Multi-valued fields are exploded from concatenated strings.
  3. Full re-index:

    • solr_client->delete('*:*') -- wipe all existing documents.
    • solr_client->update(formattedProducts) -- push all documents with softCommit=true.
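
The "explode concatenated strings" step of formatData() might look roughly like the sketch below. It is an illustration under stated assumptions: the '||' separator, the row keys names and barcodes, and the helper names explodeMulti and formatRow are hypothetical, since the source only says that GROUP_CONCAT uses custom separators. Only a subset of the v1 fields is shown.

```php
<?php
// ASSUMPTION: '||' stands in for the custom GROUP_CONCAT separator,
// which the source does not specify.
const SEPARATOR = '||';

// Split a GROUP_CONCAT result into a clean list of values.
function explodeMulti(?string $joined): array
{
    if ($joined === null || $joined === '') {
        return [];
    }
    // Drop empty fragments and reindex so the result encodes as a JSON array.
    return array_values(array_filter(
        array_map('trim', explode(SEPARATOR, $joined)),
        static fn(string $v): bool => $v !== ''
    ));
}

// Turn one SQL row into a Solr document (subset of the v1 fields).
// The row keys 'names' and 'barcodes' are hypothetical column aliases.
function formatRow(array $row): array
{
    return [
        'id'           => (string) $row['id'],
        'price'        => (float) $row['price'],
        'product_name' => explodeMulti($row['names'] ?? null),
        'barcode'      => explodeMulti($row['barcodes'] ?? null),
    ];
}

$doc = formatRow([
    'id'       => 42,
    'price'    => '19.90',
    'names'    => 'Καφές φίλτρου||Filter coffee',
    'barcodes' => '5201234567890||',
]);
```

Filtering out empty fragments matters because a trailing separator would otherwise produce an empty value in a multi-valued field.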

v2 Indexing (Adv_solr_model_v2::indexData)

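  1. Generate modification stamp: md5(time()) -- a hash of the current Unix timestamp that identifies this indexing run. Because time() has one-second resolution, the stamp is distinct only as long as two runs do not start within the same second.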

  2. Collect and format products (same SQL as v1, plus category_slugs and product_category_slug field).

    • Document IDs are prefixed with type: product:{id} via generateID(SolrDocType::Product, id).
    • Includes doc_type: 'product' field.
  3. Collect and format blogs via blogsToIndex():

    • Joins blog with blog_mui for multi-language titles, descriptions.
    • Document IDs: blog:{id}.
    • Fields: blog_title[], blog_small_description[], blog_description[], blog_date, blog_hits.
  4. Stamp all documents with doc_modification_stamp.

  5. Atomic update:

    • solr_client->update(products + blogs) -- upsert all documents.
    • solr_client->delete("-doc_modification_stamp:{stamp}") -- delete any document NOT stamped with the current run (removes stale entries without a full wipe).
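
The stamp-then-prune flow above can be illustrated with an in-memory stand-in for a Solr core. This is a sketch, not the Adv_solr_model_v2 or AdvSolrClient code: FakeCore, deleteNotStamped, and indexRun are hypothetical names, and fixed strings replace md5(time()) so the two runs are deterministic.

```php
<?php
// In-memory stand-in for a Solr core, used only to illustrate the flow.
// It is NOT the AdvSolrClient API; the method names are hypothetical.
final class FakeCore
{
    /** @var array<string, array> documents keyed by id */
    public array $docs = [];

    // Upsert: a document with an existing id replaces the old one.
    public function update(array $docs): void
    {
        foreach ($docs as $doc) {
            $this->docs[$doc['id']] = $doc;
        }
    }

    // Emulates the delete-by-query "-doc_modification_stamp:{stamp}":
    // keep only documents that carry the current run's stamp.
    public function deleteNotStamped(string $stamp): void
    {
        $this->docs = array_filter(
            $this->docs,
            static fn(array $d): bool => ($d['doc_modification_stamp'] ?? null) === $stamp
        );
    }
}

// Prefix the database id with the document type, e.g. "product:7".
function generateId(string $type, int $id): string
{
    return $type . ':' . $id;
}

function indexRun(FakeCore $core, array $productIds, array $blogIds, string $stamp): void
{
    $docs = [];
    foreach ($productIds as $id) {
        $docs[] = ['id' => generateId('product', $id), 'doc_type' => 'product'];
    }
    foreach ($blogIds as $id) {
        $docs[] = ['id' => generateId('blog', $id), 'doc_type' => 'blog'];
    }
    // Tag every document with this run's stamp, then upsert and prune.
    foreach ($docs as &$doc) {
        $doc['doc_modification_stamp'] = $stamp;
    }
    unset($doc);

    $core->update($docs);
    $core->deleteNotStamped($stamp);
}

$core = new FakeCore();
indexRun($core, [1, 2], [1], md5('run-1'));   // first run
indexRun($core, [2, 3], [1], md5('run-2'));   // product 1 is no longer active
$ids = array_keys($core->docs);
sort($ids);
```

After the second run, product:1 is gone while product:2 was never absent from the core, which is the advantage over the v1 wipe. The type prefix also keeps product 1 and blog 1 from colliding on the same document ID.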

Schema Creation (CLI)

The schema is created/updated via a separate CLI command, not during indexing:

```bash
php cli.php solr/createSolrSchema
```

This calls solr_model->createSchema() (v1) or solr_model_v2->createSchema() (v2), which POSTs the schema definition to Solr's Schema API.

The controller at ecommercen/search/controllers/Adv_solr.php is access-restricted. Its guard is is_cli() || isAdvisableUser(), so it runs either from the CLI or for a caller that passes isAdvisableUser().
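
As a rough idea of what a Schema API request body looks like, the sketch below expands a few field specs into an add-field command. add-field is a standard Solr Schema API command, but the helper name addFieldPayload and the compact spec format are hypothetical, and the payload that createSchema() actually sends is not shown in this document.

```php
<?php
// Compact field specs [name, type, multiValued], taken from the v2 field
// table in this document. Only a few fields are listed for brevity.
$specs = [
    ['doc_type', 'string', false],
    ['doc_modification_stamp', 'string', false],
    ['blog_title', 'text_el', true],
];

// Expand the specs into a Schema API "add-field" command body.
function addFieldPayload(array $specs): string
{
    $fields = [];
    foreach ($specs as [$name, $type, $multiValued]) {
        $fields[] = ['name' => $name, 'type' => $type, 'multiValued' => $multiValued];
    }
    return json_encode(['add-field' => $fields], JSON_THROW_ON_ERROR);
}

$body    = addFieldPayload($specs);
$decoded = json_decode($body, true);
```

The resulting JSON string is what would be POSTed to the schema endpoint; sending it is left out so the example runs without a Solr server.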

Data Model

Solr Document Fields (v1)

| Field | Solr Type | Multi-valued | Source |
| --- | --- | --- | --- |
| id | string (auto) | No | shop_product.id |
| soft_delete | boolean | No | shop_product.soft_delete |
| active | boolean | No | shop_product.active |
| price | pfloat | No | shop_product.price |
| product_name | text_el | Yes | shop_product_mui.name (copy target) |
| product_name_original | text_el | No | Original name |
| product_name_greektolatin | text_general | No | Transliterated name |
| product_name_latintogreek | text_el | No | Reverse transliteration |
| product_description | text_el | Yes | shop_product_mui.description (copy target) |
| vendor_id | string | No | shop_product.vendor_id |
| vendor_name | text_general | No | shop_vendor_mui.name |
| product_hits | pint | No | shop_product.hits |
| product_sales | pint | No | Aggregated from tmp_shop_order_basket |
| barcode | text_en_splitting_tight | Yes | shop_product_barcodes.barcode |
| product_code | text_en_splitting_tight | Yes | product_codes.product_code |
| product_category | text_el | Yes | shop_product_category_mui.slug |
| _text_ | text_el | Yes | Copy-field aggregate for full-text search |

Additional v2 Fields

| Field | Solr Type | Multi-valued | Source |
| --- | --- | --- | --- |
| doc_type | string | No | product or blog |
| doc_modification_stamp | string | No | MD5 hash per indexing run |
| product_category_slug | text_el | Yes | Category slugs |
| suggestions | text_el_suggest | Yes | Copy-field for autocomplete |
| blog_title | text_el | Yes | blog_mui.title |
| blog_small_description | text_el | Yes | blog_mui.small_description |
| blog_description | text_el | Yes | blog_mui.description |
| blog_date | pdate | No | blog.blog_date |
| blog_hits | pint | No | blog.hits |

v2 Custom Field Types

| Type Name | Purpose |
| --- | --- |
| text_el | ICU tokenizer + Greek lowercase + Greek-Latin filter + Greek stemmer |
| text_el_suggest | N-gram tokenizer (3-15) for autocomplete suggestions |
| text_product_code | Whitespace tokenizer + pattern replace + word delimiter + n-gram for barcode/SKU search |
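
To show why a 3-15 n-gram field helps typeahead, the sketch below lists the character n-grams of a single term. It only approximates what an n-gram tokenizer does: the function name charNgrams is hypothetical, the order in which Solr emits grams may differ, and the example assumes the mbstring extension is available.

```php
<?php
// Illustration only: character n-grams of a term, similar in spirit to what
// an n-gram tokenizer with a minimum size of 3 and a maximum of 15 produces.
// The order of grams emitted by Solr may differ from this sketch.
function charNgrams(string $term, int $min = 3, int $max = 15): array
{
    $length = mb_strlen($term, 'UTF-8');
    $grams  = [];
    for ($start = 0; $start < $length; $start++) {
        // Grow the gram from $min up to $max, stopping at the end of the term.
        for ($size = $min; $size <= $max && $start + $size <= $length; $size++) {
            $grams[] = mb_substr($term, $start, $size, 'UTF-8');
        }
    }
    return $grams;
}

$latin = charNgrams('solr');
$greek = charNgrams('καφές');
```

Because every substring of three or more characters is indexed, a partial query such as "καφ" can match the full term. Terms shorter than three characters yield no grams in this sketch.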

v2 Copy Fields

| Source | Destination | Purpose |
| --- | --- | --- |
| vendor_name, product_name, product_category, blog_title | suggestions | Autocomplete |
| vendor_name, product_name, product_description, product_category, product_category_slug, blog_title, blog_small_description, blog_description | _text_ | Full-text search |
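
The copy-field table can be expressed as one source/dest rule per pair, which is the shape the Schema API add-copy-field command works with. The sketch below is an assumption about how such rules could be generated; the helper name copyFieldRules is hypothetical, and sending all rules as a single array in one request relies on the Schema API's batching syntax rather than on anything shown in this document.

```php
<?php
// Destination => sources, transcribed from the copy-field table.
$copyMap = [
    'suggestions' => ['vendor_name', 'product_name', 'product_category', 'blog_title'],
    '_text_'      => [
        'vendor_name', 'product_name', 'product_description', 'product_category',
        'product_category_slug', 'blog_title', 'blog_small_description', 'blog_description',
    ],
];

// Expand the map into one {source, dest} rule per pair.
function copyFieldRules(array $copyMap): array
{
    $rules = [];
    foreach ($copyMap as $dest => $sources) {
        foreach ($sources as $source) {
            $rules[] = ['source' => $source, 'dest' => $dest];
        }
    }
    return $rules;
}

$rules = copyFieldRules($copyMap);
$body  = json_encode(['add-copy-field' => $rules], JSON_THROW_ON_ERROR);
```

Keeping the map keyed by destination makes it easy to see at a glance which fields feed autocomplete and which feed full-text search.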

Source MySQL Tables

| Table | Content |
| --- | --- |
| shop_product | Products (filtered: active, not deleted, price > 0) |
| shop_product_mui | Product multi-language names and descriptions |
| shop_vendor_mui | Vendor names |
| shop_product_barcodes | Product barcodes |
| product_codes | Product SKU codes |
| shop_product_category_lp | Product-to-category relationships |
| shop_product_category_mui | Category slugs and names |
| blog | Blog articles (v2 only) |
| blog_mui | Blog multi-language content (v2 only) |
| tmp_shop_order_basket | Product sales aggregation |

Configuration

Job Scheduling (application/config/jobs.php)

```php
['command' => 'SolrIndex', 'schedule' => '30 5 * * *', 'graceTime' => 300, 'retryTimes' => 3]
```

Runs daily at 05:30 in the core queue.

Solr Connection (application/config/app.php)

```php
$config['solrSearch'] = [
    'enabled'         => env('APP_SOLR_ENABLED', false),
    'protocol'        => env('APP_SOLR_PROTOCOL', 'http'),
    'timeout'         => env('APP_SOLR_TIMEOUT', 5),
    'connect_timeout' => env('APP_SOLR_CONNECT_TIMEOUT', 5),
    'host'            => env('APP_SOLR_HOST', 'localhost'),
    'port'            => env('APP_SOLR_PORT', '8983'),
    'core'            => env('APP_SOLR_CORE', 'eshop'),
    'auth' => [
        'enabled' => env('APP_SOLR_AUTH_ENABLED', false),
        'user'    => env('APP_SOLR_AUTH_USER', 'user'),
        'pass'    => env('APP_SOLR_AUTH_PASS', 'pass')
    ],
    'version' => 'v2'
];
```

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| APP_SOLR_ENABLED | false | Master toggle for Solr integration |
| APP_SOLR_PROTOCOL | http | Connection protocol |
| APP_SOLR_HOST | localhost | Solr server hostname |
| APP_SOLR_PORT | 8983 | Solr server port |
| APP_SOLR_CORE | eshop | Solr core name |
| APP_SOLR_TIMEOUT | 5 | HTTP request timeout (seconds) |
| APP_SOLR_CONNECT_TIMEOUT | 5 | HTTP connection timeout (seconds) |
| APP_SOLR_AUTH_ENABLED | false | Enable Basic auth |
| APP_SOLR_AUTH_USER | user | Basic auth username |
| APP_SOLR_AUTH_PASS | pass | Basic auth password |

Solr API Endpoints Used

| Operation | Endpoint |
| --- | --- |
| Schema update | api/cores/{core}/schema |
| Delete documents | solr/{core}/update?commit=true |
| Index documents | solr/{core}/update?softCommit=true |
| Core status | solr/admin/cores?action=STATUS&core={core} |
| Search | solr/{core}/select |
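
Combining the connection settings with the endpoint table, full request URLs could be assembled along these lines. This is a sketch only: solrUrl and basicAuthHeader are hypothetical helpers, not AdvSolrClient methods, and it assumes the endpoint paths are relative to the protocol://host:port root.

```php
<?php
// Hypothetical helper: builds endpoint URLs from a solrSearch-style config.
// It mirrors the endpoint table and is not the AdvSolrClient code.
// ASSUMPTION: endpoint paths hang directly off protocol://host:port/.
function solrUrl(array $config, string $operation): string
{
    $base = sprintf('%s://%s:%s', $config['protocol'], $config['host'], $config['port']);
    $core = rawurlencode($config['core']);

    $paths = [
        'schema' => "api/cores/{$core}/schema",
        'delete' => "solr/{$core}/update?commit=true",
        'index'  => "solr/{$core}/update?softCommit=true",
        'status' => "solr/admin/cores?action=STATUS&core={$core}",
        'search' => "solr/{$core}/select",
    ];
    if (!isset($paths[$operation])) {
        throw new InvalidArgumentException("Unknown operation: {$operation}");
    }
    return $base . '/' . $paths[$operation];
}

// Basic auth header line, or null when auth.enabled is falsy.
function basicAuthHeader(array $auth): ?string
{
    if (empty($auth['enabled'])) {
        return null;
    }
    return 'Authorization: Basic ' . base64_encode($auth['user'] . ':' . $auth['pass']);
}

$config = [
    'protocol' => 'http', 'host' => 'localhost', 'port' => '8983', 'core' => 'eshop',
    'auth' => ['enabled' => true, 'user' => 'user', 'pass' => 'pass'],
];
$indexUrl = solrUrl($config, 'index');
$header   = basicAuthHeader($config['auth']);
```

Note the different commit parameters: deletes use commit=true while indexing uses softCommit=true, as listed in the endpoint table.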

Client Extension Points

  1. Override the job class: Extend AdvSolrIndex in application/modules/job/libraries/SolrIndex.php to add custom document types or modify the indexing strategy.

  2. Override the Solr model: Extend Adv_solr_model or Adv_solr_model_v2 in application/modules/search/models/ to:

    • Add custom fields to the schema
    • Modify productsToIndex() to include additional product attributes
    • Add new document type methods (e.g., events, landing pages)
    • Change the formatData() / formatProductData() / formatBlogData() document structure
  3. Override the schema: Pass a custom schema array to createSchema($overrideSchema) from a controller override.

  4. Override the Solr client: Extend AdvSolrClient to add custom HTTP middleware, logging, or error handling.

  5. Switch schema versions: Set 'version' to 'v1' in app.php (the configuration shown above uses 'v2') to use the simpler v1 schema (products only, no blogs, no modification stamp).
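
A model override of the kind described in point 2 might be shaped like the sketch below. Everything here is illustrative: the base class is a stub so the example runs on its own, the real Adv_solr_model_v2 method signatures and visibility may differ, and the subclass name Solr_model_v2, the buildProductDocs method, and the product_brand_line field are hypothetical.

```php
<?php
// Stub standing in for the platform's Adv_solr_model_v2 so the example runs
// on its own; the real class and its method signatures may differ.
class Adv_solr_model_v2
{
    protected function formatProductData(array $rows): array
    {
        return array_map(
            static fn(array $row): array => [
                'id'       => 'product:' . $row['id'],
                'doc_type' => 'product',
            ],
            $rows
        );
    }

    // Hypothetical public entry point used only by this example.
    public function buildProductDocs(array $rows): array
    {
        return $this->formatProductData($rows);
    }
}

// Client override: adds a hypothetical "product_brand_line" field
// on top of whatever the parent produces.
class Solr_model_v2 extends Adv_solr_model_v2
{
    protected function formatProductData(array $rows): array
    {
        $docs = parent::formatProductData($rows);
        foreach ($docs as $i => $doc) {
            $docs[$i]['product_brand_line'] = $rows[$i]['brand_line'] ?? '';
        }
        return $docs;
    }
}

$docs = (new Solr_model_v2())->buildProductDocs([
    ['id' => 7, 'brand_line' => 'Premium'],
]);
```

Calling the parent first keeps the override small and lets core changes to the document structure flow through. Any field added this way would presumably also need a matching definition in the Solr schema.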

Business Rules

| Rule | Description |
| --- | --- |
| Feature gated | Job exits immediately if solrSearch.enabled is not true |
| Full re-index (v1) | v1 wipes the entire core before re-indexing (brief search downtime) |
| Atomic update (v2) | v2 uses modification stamp to remove stale docs without full wipe |
| Active products only | Only products with soft_delete=false, active=true, price>0 are indexed |
| Multi-language support | All language variants are indexed as multi-valued fields |
| Greek-Latin cross-script | v1 uses explicit transliteration fields; v2 uses the GreekLatinTokenFilterFactory plugin |
| Autocomplete (v2) | Copy fields feed n-gram suggestions for typeahead search |
| Blog indexing (v2) | Blog articles are indexed alongside products with doc_type discrimination |
| Sales ranking | Product sales counts are enriched from the order basket analytics table |
| Schema managed separately | Schema creation is a manual CLI step, not part of the indexing job |

Wiki Guide: Solr Setup Guide -- full Solr installation, configuration, and schema management guide