Customer Data Jobs (GDPR / Sanitize / Tags / Audience)

Flow ID: SY-09 | Module(s): job, eshop, audience | Complexity: High

Business Overview

Three automated jobs manage customer data lifecycle and segmentation:

  • AdvSanitizeGuestCustomers (GDPR compliance): Replaces personally identifiable information for inactive guest accounts with UUIDs after a configurable interval, ensuring compliance with data protection regulations.
  • AdvAddTagsToCustomers: Generates behavioral and demographic tags from order history, enabling customer segmentation for marketing and audience targeting.
  • AdvAddCustomersToAudience: Orchestrates audience membership refresh by spawning staggered sub-jobs that evaluate each audience's criteria against customer tags.

Together, these jobs form the data pipeline for customer personalization: sanitization removes stale data, tagging classifies active customers, and audience assignment groups them for campaigns.

Architecture

AdvSanitizeGuestCustomers         AdvAddTagsToCustomers         AdvAddCustomersToAudience
         |                                |                              |
         v                                v                              v
  customer_model                 customer_tag_model              audience_model
  getCustomersToAutoSanitize()   saveCustomersTags()             getRecords()
         |                                |                              |
         v                                v                              v
  sanitize_model                 8 tag generation methods         For each audience:
  sanitizeAll()                  (INSERT IGNORE / REPLACE)        create AddCustomersTo
         |                                                        SpecificAudience job
         v                                                        (5-min stagger)
  6 cascade sanitizations:
  userSanitize()
  orderSanitize()
  wishListSanitize()
  customersTagsSanitize()
  reviewsSanitize()
  waitingListSanitize()

Key Components

Component | Path | Role
AdvSanitizeGuestCustomers | ecommercen/job/libraries/AdvSanitizeGuestCustomers.php | Job: GDPR guest data sanitization
AdvAddTagsToCustomers | ecommercen/job/libraries/AdvAddTagsToCustomers.php | Job: daily customer tag generation
AdvAddCustomersToAudience | ecommercen/job/libraries/AdvAddCustomersToAudience.php | Job: orchestrates audience refresh
AdvAddCustomersToSpecificAudience | ecommercen/job/libraries/AdvAddCustomersToSpecificAudience.php | Sub-job: evaluates single audience criteria
Adv_sanitize_model | ecommercen/eshop/models/Adv_sanitize_model.php | Model: PII replacement and cascade deletion
AdvCustomerTagModel | ecommercen/audience/models/AdvCustomerTagModel.php | Model: tag computation and storage
Adv_customer_model | ecommercen/eshop/models/Adv_customer_model.php | Model: customer queries for sanitization targets
Client stubs | application/modules/job/libraries/SanitizeGuestCustomers.php, etc. | Empty extension points

Code Flow

AdvSanitizeGuestCustomers (GDPR)

  1. Entry: Reads SANITIZE_INTERVAL from registry (ESHOP group). If zero or negative, exits immediately.
  2. Target selection: customer_model->getCustomersToAutoSanitize($interval) queries shop_customer for:
    • is_guest = true
    • is_sanitized = false
    • date_registered < (now - $interval months) (unix timestamp comparison)
  3. Cascade sanitization via sanitize_model->sanitizeAll($userIds):
    • userSanitize: For each customer, replaces name, surname, email, address, city, phone numbers, company details with UUIDs. Sets birthdate = null, active_token = null, has_access = 0, is_sanitized = 1. Email becomes {uuid}@invalid.email.
    • orderSanitize: For each order belonging to the customer, replaces pricing/shipping names, addresses, phones, company details, AFM/DOY with UUIDs. Nullifies customer/courier comments.
    • wishListSanitize: Deletes all wishlist records for the customer IDs.
    • customersTagsSanitize: Deletes records from shop_customer_tag and shop_customer_campaign for the customer IDs.
    • reviewsSanitize: Deletes records from shop_customer_reviews matching the original email addresses (collected before email was replaced).
    • waitingListSanitize: Replaces email addresses in shop_waiting_list with UUID-based invalid emails.
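
The userSanitize step can be pictured with a minimal sketch. It is written in Python for illustration only (the real model is PHP), and it assumes a hypothetical subset of PII columns and a hard-coded invalid.email domain. It follows the rule from the Business Rules section that only non-empty fields are replaced, and it returns the original email so that a later step such as reviewsSanitize could still match on it.

```python
import uuid

# Assumption: domain taken from the {uuid}@invalid.email pattern described above.
INVALID_EMAIL_DOMAIN = "invalid.email"

# Hypothetical subset of the PII columns; the real model covers more fields.
PII_FIELDS = ("name", "surname", "address", "city", "phone")


def sanitize_customer(row):
    """Return (sanitized copy of row, original email) without mutating row."""
    original_email = row.get("email")
    clean = dict(row)
    for field in PII_FIELDS:
        if clean.get(field):  # empty or missing values are left untouched
            clean[field] = str(uuid.uuid4())
    if original_email:
        clean["email"] = f"{uuid.uuid4()}@{INVALID_EMAIL_DOMAIN}"
    # Flags set on every sanitized customer, as listed in the step above.
    clean.update(birthdate=None, active_token=None, has_access=0, is_sanitized=1)
    return clean, original_email
```

Collecting the original email before it is overwritten is what allows the review cleanup to run after the customer row has already been sanitized.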

AdvAddTagsToCustomers

  1. Entry: Computes fromDate as yesterday's date (-1 days).
  2. Delegation: Calls customer_tag_model->saveCustomersTags($fromDate).
  3. Tag generation (8 types, each gated by CUSTOMER_TAGS_GROUPS constant):
    Tag Type | Group ID | SQL Strategy | Source Data
    Category | 2 | INSERT IGNORE | Order baskets -> product categories
    Vendor | 1 | INSERT IGNORE | Order baskets -> product vendors
    Registered | 3 | INSERT IGNORE | Customer is_guest=0 status
    Last Referral | 4 | DELETE + insert_batch | Latest order's skroutz_referer
    Country | 7 | REPLACE INTO | Order billing country (alpha_2)
    County | 8 | REPLACE INTO | Order billing county
    Turnover | 5 | REPLACE INTO | SUM of total_vat across fulfilled orders
    Avg Cart | 6 | REPLACE INTO | AVG of total_vat across fulfilled orders
  4. Filtering: All queries select only non-guest (is_guest=0) and non-sanitized (is_sanitized=0) customers.
  5. Date filtering: When fromDate is provided, only orders/registrations from that date forward are processed. Turnover and Avg Cart are the exception: they always recalculate over all orders.
  6. Idempotency: INSERT IGNORE prevents duplicate tags for Category/Vendor/Registered. REPLACE INTO overwrites the existing row for Country/County/Turnover/Avg Cart. Last Referral explicitly deletes old records before inserting.
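
The difference between the two main write strategies can be seen in a small, self-contained sketch. It uses Python with an in-memory SQLite database purely for illustration: SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE, and the unique key on (customer_id, tag_id, group_id) is an assumption, since the source does not state the actual key of shop_customer_tag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_customer_tag ("
    " customer_id INTEGER, tag_id INTEGER, group_id INTEGER, custom_value TEXT,"
    " PRIMARY KEY (customer_id, tag_id, group_id))"  # assumed unique key
)


def add_incremental_tag(customer_id, tag_id, group_id):
    # Category/Vendor/Registered style: a repeated insert is silently skipped.
    conn.execute(
        "INSERT OR IGNORE INTO shop_customer_tag VALUES (?, ?, ?, NULL)",
        (customer_id, tag_id, group_id),
    )


def set_value_tag(customer_id, group_id, value):
    # Turnover/Avg Cart/Country/County style: tag_id is 0 and the value is overwritten.
    conn.execute(
        "REPLACE INTO shop_customer_tag VALUES (?, 0, ?, ?)",
        (customer_id, group_id, value),
    )


add_incremental_tag(1, 42, 2)   # category tag
add_incremental_tag(1, 42, 2)   # re-run: no duplicate row
set_value_tag(1, 5, "100.00")   # turnover
set_value_tag(1, 5, "250.00")   # re-run: value replaced

rows = conn.execute(
    "SELECT customer_id, tag_id, group_id, custom_value"
    " FROM shop_customer_tag ORDER BY group_id"
).fetchall()
```

After both re-runs the table holds exactly two rows, which is why the job can run daily without accumulating duplicates.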

AdvAddCustomersToAudience

  1. Entry: Fetches all audience records via audience_model->getRecords().
  2. Job creation: For each audience, creates an AddCustomersToSpecificAudience job with:
    • Queue: personalization
    • Status: JOB_NOT_STARTED
    • Run time: staggered at 5-minute intervals from current time
    • Grace time: 60 seconds
    • Retries: 1
    • Arguments: {"audience": <audience_id>}
  3. Sub-job execution: AdvAddCustomersToSpecificAudience validates the audience ID and calls audience_model->saveCustomersToAudience($audienceId).
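
The job-creation step can be sketched as follows. This is Python for illustration only, not the PHP job class. The dictionary keys are hypothetical names for the job attributes listed above, and the sketch assumes the first sub-job runs at the current time with each later one offset by a further 5 minutes; the source does not say whether the first offset is 0 or 5 minutes.

```python
import json
from datetime import datetime, timedelta

STAGGER = timedelta(minutes=5)


def build_audience_jobs(audience_ids, now):
    """Build one sub-job description per audience, spaced 5 minutes apart."""
    jobs = []
    for index, audience_id in enumerate(audience_ids):
        jobs.append({
            "job": "AddCustomersToSpecificAudience",
            "queue": "personalization",
            "status": "JOB_NOT_STARTED",
            "run_at": now + index * STAGGER,  # assumption: first job has no offset
            "grace_seconds": 60,
            "retries": 1,
            "arguments": json.dumps({"audience": audience_id}),
        })
    return jobs


jobs = build_audience_jobs([7, 9, 12], datetime(2024, 1, 1, 3, 0))
```

With three audiences the last sub-job is scheduled 10 minutes after the first, so the total spread grows linearly with the number of audiences.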

Data Model

Tables

Table | Role
shop_customer | Customer records; is_guest, is_sanitized, date_registered fields control sanitization
shop_order | Order records; sanitized fields include pricing/shipping PII
wishlist | Deleted entirely for sanitized customers
shop_customer_tag | Customer-tag associations (customer_id, tag_id, group_id, custom_value)
shop_customer_campaign | Campaign tracking (prevents duplicate birthday/points emails)
shop_customer_reviews | Review action tracking; deleted for sanitized email addresses
shop_waiting_list | Waiting list entries; email replaced for sanitized customers
audience | Audience definitions
audience_criteria | Criteria rules for audience membership
shop_customer_audience | Customer-audience membership
shop_order_basket | Order line items; joined for tag computation
product_codes | Product codes; joined for tag computation
shop_product_category_lp | Product-category links; joined for category tags
country | Country reference table; validates country codes
county | County reference table; validates county codes

Tag Group IDs (CUSTOMER_TAGS_GROUPS constant)

Group | ID | tag_id Meaning | custom_value Meaning
VENDOR | 1 | vendor_id | null
CATEGORY | 2 | category_id | null
REGISTERED | 3 | 1 (registered) / 0 (guest) | null
LAST.REFERRAL | 4 | Referrer channel ID (0 = none) | null
TURNOVER | 5 | 0 | Total order value (decimal)
AVG.CART | 6 | 0 | Average cart value (decimal)
COUNTRY | 7 | 0 | Country alpha_2 code
COUNTY | 8 | 0 | County alpha code
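
The table splits the groups into two shapes: groups 1 to 4 carry their meaning in tag_id and leave custom_value null, while groups 5 to 8 fix tag_id at 0 and carry their meaning in custom_value. A short Python sketch of that convention follows. It is illustrative only: tag_row and VALUE_GROUPS are hypothetical names, and the dictionary mirrors the PHP constant rather than reproducing it.

```python
# Mirror of the group IDs listed in the table above.
CUSTOMER_TAGS_GROUPS = {
    "VENDOR": 1, "CATEGORY": 2, "REGISTERED": 3, "LAST.REFERRAL": 4,
    "TURNOVER": 5, "AVG.CART": 6, "COUNTRY": 7, "COUNTY": 8,
}

# Hypothetical helper set: groups that store their value in custom_value.
VALUE_GROUPS = {"TURNOVER", "AVG.CART", "COUNTRY", "COUNTY"}


def tag_row(customer_id, group, value):
    """Build a shop_customer_tag row following the tag_id/custom_value split."""
    group_id = CUSTOMER_TAGS_GROUPS[group]
    if group in VALUE_GROUPS:
        return {"customer_id": customer_id, "tag_id": 0,
                "group_id": group_id, "custom_value": value}
    return {"customer_id": customer_id, "tag_id": value,
            "group_id": group_id, "custom_value": None}
```

For example, tag_row(10, "CATEGORY", 42) places 42 in tag_id, whereas tag_row(10, "TURNOVER", "199.90") places the amount in custom_value.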

Configuration

Registry Settings

Group | Key | Description
ESHOP | SANITIZE_INTERVAL | Months after which guest accounts are sanitized (0 = disabled)

Application Constants

Constant | Path | Description
CUSTOMER_TAGS_GROUPS | application/config/constants.php | Maps tag type names to group IDs
CUSTOMER_CAMPAIGN_TYPE | application/config/constants.php | Maps campaign types: POINTS_REMAINING=1, BIRTHDAY=2

Job Options

Job | Option | Type | Required | Description
AdvSanitizeGuestCustomers | (none) | -- | -- | Interval from registry
AdvAddTagsToCustomers | (none) | -- | -- | Uses yesterday as fromDate
AdvAddCustomersToAudience | (none) | -- | -- | Processes all audiences
AdvAddCustomersToSpecificAudience | audience | int | Yes | Audience ID to evaluate

Client Extension Points

  • Job override: Create SanitizeGuestCustomers, AddTagsToCustomers, or AddCustomersToAudience in application/modules/job/libraries/ extending the Adv* base class.
  • Sanitize model override: Override Adv_sanitize_model in application/models/ to add custom sanitization targets (e.g., client-specific tables with PII).
  • Tag model override: Override AdvCustomerTagModel in application/modules/audience/models/ to add custom tag types or modify tag computation logic.
  • Invalid email domain: The sanitized email pattern uses config_item('order.invalid.email'), which can be configured per client.
  • Tag groups: The CUSTOMER_TAGS_GROUPS constant can be extended with additional tag types if the tag model is overridden.

Business Rules

Sanitization (GDPR)

  1. Guest-only: Only guest accounts (is_guest=true) are targeted. Registered customers are never auto-sanitized.
  2. Interval-based: The SANITIZE_INTERVAL registry value controls the grace period in months. Setting it to 0 disables sanitization.
  3. UUID replacement: PII fields are replaced with random v4 UUIDs, not deleted, to preserve referential integrity.
  4. Conditional replacement: Fields are only replaced if they had a non-empty value originally. Empty fields remain empty.
  5. Cascade order: Customer data is sanitized first (collecting original emails), then orders, wishlists, tags, reviews (using collected emails), and waiting lists.
  6. Irreversible: Once is_sanitized=1 is set, the customer is permanently excluded from future tag generation and email campaigns.
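
Rules 1 and 2 amount to a simple eligibility predicate, sketched here in Python for illustration. The real selection is a database query in the PHP customer model. months_ago and should_sanitize are hypothetical helpers, and clamping the day to the end of a shorter month is a choice made by this sketch, not something the source specifies.

```python
import calendar
from datetime import datetime


def months_ago(moment, months):
    """Subtract whole months, clamping the day to the target month's length."""
    index = moment.year * 12 + (moment.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def should_sanitize(customer, interval_months, now):
    """True for unsanitized guests registered before the cutoff."""
    if interval_months <= 0:  # zero or negative interval disables the job
        return False
    cutoff = months_ago(now, interval_months).timestamp()
    return (
        bool(customer["is_guest"])
        and not customer["is_sanitized"]
        and customer["date_registered"] < cutoff  # unix timestamp comparison
    )
```

A guest registered in January 2023 is eligible under a 12-month interval when evaluated in March 2024, while the same record with is_guest set to false never is.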

Tag Generation

  1. Daily execution: Tags are typically regenerated daily, processing only the previous day's data for incremental tags.
  2. Non-guest, non-sanitized only: All tag queries filter out guests and sanitized customers.
  3. Fulfilled orders: Turnover and Avg Cart tags only consider orders with status SENT, PAID_SENT, or INVOICED.
  4. Gift exclusion: Category and Vendor tags ignore gift items; only order basket rows with gift_id IS NULL are considered.
  5. Last Referral replace: Unlike other incremental tags, Last Referral deletes existing records before inserting (in 1000-record chunks to avoid large DELETE queries).
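
The chunked delete in rule 5 can be sketched as below, in Python for illustration only. It assumes the chunks are formed over customer IDs and that Last Referral rows are identified by group_id 4; the function names and the exact SQL are hypothetical, since the source only states that deletion happens in 1000-record chunks.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_statements(customer_ids, group_id=4):
    """Build one parameterized DELETE per chunk of customer IDs."""
    statements = []
    for chunk in chunked(customer_ids):
        placeholders = ", ".join("?" for _ in chunk)
        sql = (
            "DELETE FROM shop_customer_tag "
            f"WHERE group_id = ? AND customer_id IN ({placeholders})"
        )
        statements.append((sql, [group_id, *chunk]))
    return statements


statements = delete_statements(list(range(2500)))
```

For 2500 customers this produces three statements covering 1000, 1000, and 500 IDs, which keeps each IN list bounded.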

Audience Orchestration

  1. Staggered execution: Sub-jobs are spaced 5 minutes apart to avoid overwhelming the database with concurrent criteria evaluations.
  2. Queue isolation: Audience jobs use the personalization queue, separate from other job queues.
  3. Single retry: Each audience sub-job allows one retry on failure.