Customer Data Jobs (GDPR / Sanitize / Tags / Audience)
Flow ID: SY-09 | Module(s): job, eshop, audience | Complexity: High
Business Overview
Three automated jobs manage the customer data lifecycle and segmentation:
- AdvSanitizeGuestCustomers (GDPR compliance): Replaces the personally identifiable information (PII) of guest accounts registered longer ago than a configurable interval with UUIDs, supporting compliance with data protection regulations.
- AdvAddTagsToCustomers: Generates behavioral and demographic tags from order history, enabling customer segmentation for marketing and audience targeting.
- AdvAddCustomersToAudience: Orchestrates audience membership refresh by spawning staggered sub-jobs that evaluate each audience's criteria against customer tags.
Together, these jobs form the data pipeline for customer personalization: sanitization removes stale data, tagging classifies active customers, and audience assignment groups them for campaigns.
Architecture
- AdvSanitizeGuestCustomers -> `customer_model->getCustomersToAutoSanitize()` -> `sanitize_model->sanitizeAll()` -> 6 cascade sanitizations: `userSanitize()`, `orderSanitize()`, `wishListSanitize()`, `customersTagsSanitize()`, `reviewsSanitize()`, `waitingListSanitize()`
- AdvAddTagsToCustomers -> `customer_tag_model->saveCustomersTags()` -> 8 tag generation methods (`INSERT IGNORE` / `REPLACE`)
- AdvAddCustomersToAudience -> `audience_model->getRecords()` -> for each audience, create an `AddCustomersToSpecificAudience` job (5-min stagger)
Key Components
| Component | Path | Role |
|---|---|---|
| AdvSanitizeGuestCustomers | ecommercen/job/libraries/AdvSanitizeGuestCustomers.php | Job: GDPR guest data sanitization |
| AdvAddTagsToCustomers | ecommercen/job/libraries/AdvAddTagsToCustomers.php | Job: daily customer tag generation |
| AdvAddCustomersToAudience | ecommercen/job/libraries/AdvAddCustomersToAudience.php | Job: orchestrates audience refresh |
| AdvAddCustomersToSpecificAudience | ecommercen/job/libraries/AdvAddCustomersToSpecificAudience.php | Sub-job: evaluates single audience criteria |
| Adv_sanitize_model | ecommercen/eshop/models/Adv_sanitize_model.php | Model: PII replacement and cascade deletion |
| AdvCustomerTagModel | ecommercen/audience/models/AdvCustomerTagModel.php | Model: tag computation and storage |
| Adv_customer_model | ecommercen/eshop/models/Adv_customer_model.php | Model: customer queries for sanitization targets |
| Client stubs | application/modules/job/libraries/SanitizeGuestCustomers.php, etc. | Empty extension points |
Code Flow
AdvSanitizeGuestCustomers (GDPR)
- Entry: Reads `SANITIZE_INTERVAL` from the registry (`ESHOP` group). If the value is zero or negative, the job exits immediately.
- Target selection: `customer_model->getCustomersToAutoSanitize($interval)` queries `shop_customer` for rows where:
  - `is_guest = true`
  - `is_sanitized = false`
  - `date_registered < (now - $interval months)` (unix timestamp comparison)
- Cascade sanitization via `sanitize_model->sanitizeAll($userIds)`:
  - userSanitize: For each customer, replaces name, surname, email, address, city, phone numbers, and company details with UUIDs. Sets `birthdate = null`, `active_token = null`, `has_access = 0`, `is_sanitized = 1`. The email becomes `{uuid}@invalid.email`.
  - orderSanitize: For each order belonging to the customer, replaces pricing/shipping names, addresses, phones, company details, and AFM/DOY (Greek tax ID / tax office) with UUIDs. Nullifies customer/courier comments.
  - wishListSanitize: Deletes all wishlist records for the customer IDs.
  - customersTagsSanitize: Deletes records from `shop_customer_tag` and `shop_customer_campaign` for the customer IDs.
  - reviewsSanitize: Deletes records from `shop_customer_reviews` matching the original email addresses (collected before the emails were replaced).
  - waitingListSanitize: Replaces email addresses in `shop_waiting_list` with UUID-based invalid emails.
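The target-selection rule can be sketched in Python as below. This is an illustrative model, not the PHP implementation: `Customer`, `months_ago`, and `customers_to_auto_sanitize` are hypothetical names, and clamping the day to the end of a shorter month is an assumption (PHP's relative date arithmetic may roll over into the next month instead).

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Customer:
    # Hypothetical in-memory stand-in for a shop_customer row.
    id: int
    is_guest: bool
    is_sanitized: bool
    date_registered: int  # unix timestamp


def months_ago(now: datetime, months: int) -> datetime:
    """Return `now` shifted back by `months` calendar months.

    Assumption: the day is clamped to the last day of the target month
    (e.g. 31 March minus 1 month -> 29 February in a leap year).
    """
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


def customers_to_auto_sanitize(customers, interval_months: int, now: datetime) -> list:
    """Mirror the documented filter: guest, not yet sanitized, registered before the cutoff."""
    if interval_months <= 0:
        # A SANITIZE_INTERVAL of zero or less disables the job.
        return []
    cutoff = int(months_ago(now, interval_months).timestamp())
    return [
        c.id
        for c in customers
        if c.is_guest and not c.is_sanitized and c.date_registered < cutoff
    ]
```

With an interval of 6 and a run date of 15 June 2024, only guests registered before 15 December 2023 who are not yet sanitized are returned.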
AdvAddTagsToCustomers
- Entry: Computes `fromDate` as yesterday's date (`-1 days`).
- Delegation: Calls `customer_tag_model->saveCustomersTags($fromDate)`.
- Tag generation (8 types, each gated by the `CUSTOMER_TAGS_GROUPS` constant):
| Tag Type | Group ID | SQL Strategy | Source Data |
|---|---|---|---|
| Category | 2 | INSERT IGNORE | Order baskets -> product categories |
| Vendor | 1 | INSERT IGNORE | Order baskets -> product vendors |
| Registered | 3 | INSERT IGNORE | Customer is_guest=0 status |
| Last Referral | 4 | DELETE + insert_batch | Latest order's skroutz_referer |
| Country | 7 | REPLACE INTO | Order billing country (alpha_2) |
| County | 8 | REPLACE INTO | Order billing county |
| Turnover | 5 | REPLACE INTO | SUM of total_vat across fulfilled orders |
| Avg Cart | 6 | REPLACE INTO | AVG of total_vat across fulfilled orders |
- Filtering: All queries are restricted to non-guest (`is_guest=0`), non-sanitized (`is_sanitized=0`) customers.
- Date filtering: When `fromDate` is provided, only orders/registrations from that date forward are processed. Turnover and Avg Cart are the exception: they are always recalculated without the date filter.
- Idempotency: `INSERT IGNORE` prevents duplicate tags for Category/Vendor/Registered. `REPLACE INTO` overwrites existing rows for Country/County/Turnover/Avg Cart. Last Referral explicitly deletes old records before inserting.
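The three write strategies can be modeled with an in-memory dictionary to show how each behaves when the job re-runs. This is a hypothetical sketch: it assumes `shop_customer_tag` has a unique key on `(customer_id, tag_id, group_id)`, which this document does not state, and it assumes the 1000-record chunks of the Last Referral delete are chunks of customer IDs. The helper names are illustrative.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed unique key of shop_customer_tag: (customer_id, tag_id, group_id).
Key = Tuple[int, int, int]
Table = Dict[Key, Optional[str]]  # value = custom_value
Row = Tuple[int, int, int, Optional[str]]  # (customer_id, tag_id, group_id, custom_value)


def insert_ignore(table: Table, rows: Iterable[Row]) -> None:
    """INSERT IGNORE: keep the existing row when the key already exists."""
    for customer_id, tag_id, group_id, value in rows:
        table.setdefault((customer_id, tag_id, group_id), value)


def replace_into(table: Table, rows: Iterable[Row]) -> None:
    """REPLACE INTO: overwrite the existing row when the key already exists."""
    for customer_id, tag_id, group_id, value in rows:
        table[(customer_id, tag_id, group_id)] = value


def delete_then_insert(table: Table, group_id: int, rows: list, chunk_size: int = 1000) -> None:
    """Last Referral: drop the customers' old tags in this group, then insert the new ones."""
    customer_ids = sorted({row[0] for row in rows})
    for start in range(0, len(customer_ids), chunk_size):
        chunk = set(customer_ids[start:start + chunk_size])
        # Models one DELETE ... WHERE customer_id IN (chunk) AND group_id = ? per chunk.
        for key in [k for k in table if k[0] in chunk and k[2] == group_id]:
            del table[key]
    # After the delete, a plain insert and a replace are equivalent here.
    replace_into(table, rows)
```

Because Country and County keep `tag_id = 0`, `REPLACE INTO` lets a customer's country change in place, whereas `INSERT IGNORE` would keep the first value forever.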
AdvAddCustomersToAudience
- Entry: Fetches all audience records via `audience_model->getRecords()`.
- Job creation: For each audience, creates an `AddCustomersToSpecificAudience` job with:
  - Queue: `personalization`
  - Status: `JOB_NOT_STARTED`
  - Run time: staggered at 5-minute intervals from the current time
  - Grace time: 60 seconds
  - Retries: 1
  - Arguments: `{"audience": <audience_id>}`
- Sub-job execution: `AdvAddCustomersToSpecificAudience` validates the audience ID and calls `audience_model->saveCustomersToAudience($audienceId)`.
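The orchestration step can be sketched as a pure planning function. The `JobSpec` record, `plan_audience_jobs`, and `parse_audience_argument` are hypothetical names; that the first sub-job runs at the current time (offset 0) and that validation means "a positive integer" are assumptions, since the source only says the runs are staggered and the ID is validated.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class JobSpec:
    # Hypothetical representation of a queued job row.
    name: str
    queue: str
    status: str
    run_at: datetime
    grace_seconds: int
    retries: int
    arguments: str  # JSON payload


def plan_audience_jobs(audience_ids, now: datetime, stagger: timedelta = timedelta(minutes=5)):
    """One sub-job per audience, each scheduled `stagger` after the previous one."""
    return [
        JobSpec(
            name="AddCustomersToSpecificAudience",
            queue="personalization",
            status="JOB_NOT_STARTED",
            run_at=now + index * stagger,  # assumption: the first job runs at `now`
            grace_seconds=60,
            retries=1,
            arguments=json.dumps({"audience": audience_id}),
        )
        for index, audience_id in enumerate(audience_ids)
    ]


def parse_audience_argument(arguments: str) -> int:
    """Hypothetical sub-job validation: require a positive integer audience ID."""
    value = json.loads(arguments).get("audience")
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"invalid audience id: {value!r}")
    return value
```

Planning the jobs up front keeps the orchestrator cheap: it only writes queue rows, and the expensive criteria evaluation happens later, one audience at a time.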
Data Model
Tables
| Table | Role |
|---|---|
| shop_customer | Customer records; is_guest, is_sanitized, date_registered fields control sanitization |
| shop_order | Order records; sanitized fields include pricing/shipping PII |
| wishlist | Deleted entirely for sanitized customers |
| shop_customer_tag | Customer-tag associations (customer_id, tag_id, group_id, custom_value) |
| shop_customer_campaign | Campaign tracking (prevents duplicate birthday/points emails) |
| shop_customer_reviews | Review action tracking; deleted for sanitized email addresses |
| shop_waiting_list | Waiting list entries; email replaced for sanitized customers |
| audience | Audience definitions |
| audience_criteria | Criteria rules for audience membership |
| shop_customer_audience | Customer-audience membership |
| shop_order_basket | Order line items; joined for tag computation |
| product_codes | Product codes; joined for tag computation |
| shop_product_category_lp | Product-category links; joined for category tags |
| country | Country reference table; validates country codes |
| county | County reference table; validates county codes |
Tag Group IDs (CUSTOMER_TAGS_GROUPS constant)
| Group | ID | tag_id Meaning | custom_value Meaning |
|---|---|---|---|
| VENDOR | 1 | vendor_id | null |
| CATEGORY | 2 | category_id | null |
| REGISTERED | 3 | 1 (registered) / 0 (guest) | null |
| LAST.REFERRAL | 4 | Referrer channel ID (0 = none) | null |
| TURNOVER | 5 | 0 | Total order value (decimal) |
| AVG.CART | 6 | 0 | Average cart value (decimal) |
| COUNTRY | 7 | 0 | Country alpha_2 code |
| COUNTY | 8 | 0 | County alpha code |
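The table's conventions (ID-style groups carry meaning in `tag_id`, value-style groups keep `tag_id = 0` and use `custom_value`) can be captured in a small row builder. This is a sketch only: the Python dictionary mirrors the table rather than the real PHP constant, whose exact key spelling is assumed, and `make_tag_row` is a hypothetical helper.

```python
from typing import Optional

# Python mirror of the CUSTOMER_TAGS_GROUPS table; the real constant lives in
# application/config/constants.php and its exact key spelling is assumed here.
CUSTOMER_TAGS_GROUPS = {
    "VENDOR": 1,
    "CATEGORY": 2,
    "REGISTERED": 3,
    "LAST.REFERRAL": 4,
    "TURNOVER": 5,
    "AVG.CART": 6,
    "COUNTRY": 7,
    "COUNTY": 8,
}

# Groups whose meaning is carried by custom_value while tag_id stays 0.
VALUE_GROUPS = {"TURNOVER", "AVG.CART", "COUNTRY", "COUNTY"}


def make_tag_row(customer_id: int, group: str, *, tag_id: int = 0,
                 custom_value: Optional[str] = None) -> dict:
    """Build a shop_customer_tag row that follows the per-group conventions."""
    group_id = CUSTOMER_TAGS_GROUPS[group]
    if group in VALUE_GROUPS:
        if tag_id != 0 or custom_value is None:
            raise ValueError(f"{group} rows need tag_id=0 and a custom_value")
    elif custom_value is not None:
        raise ValueError(f"{group} rows carry their meaning in tag_id; custom_value must be null")
    return {"customer_id": customer_id, "tag_id": tag_id,
            "group_id": group_id, "custom_value": custom_value}
```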
Configuration
Registry Settings
| Group | Key | Description |
|---|---|---|
| ESHOP | SANITIZE_INTERVAL | Months after which guest accounts are sanitized (0 = disabled) |
Application Constants
| Constant | Path | Description |
|---|---|---|
| CUSTOMER_TAGS_GROUPS | application/config/constants.php | Maps tag type names to group IDs |
| CUSTOMER_CAMPAIGN_TYPE | application/config/constants.php | Maps campaign types: POINTS_REMAINING=1, BIRTHDAY=2 |
Job Options
| Job | Option | Type | Required | Description |
|---|---|---|---|---|
| AdvSanitizeGuestCustomers | (none) | -- | -- | Interval from registry |
| AdvAddTagsToCustomers | (none) | -- | -- | Uses yesterday as fromDate |
| AdvAddCustomersToAudience | (none) | -- | -- | Processes all audiences |
| AdvAddCustomersToSpecificAudience | audience | int | Yes | Audience ID to evaluate |
Client Extension Points
- Job override: Create `SanitizeGuestCustomers`, `AddTagsToCustomers`, or `AddCustomersToAudience` in `application/modules/job/libraries/`, extending the corresponding `Adv*` base class.
- Sanitize model override: Override `Adv_sanitize_model` in `application/models/` to add custom sanitization targets (e.g., client-specific tables containing PII).
- Tag model override: Override `AdvCustomerTagModel` in `application/modules/audience/models/` to add custom tag types or modify the tag computation logic.
- Invalid email domain: The sanitized email pattern uses `config_item('order.invalid.email')`, which can be configured per client.
- Tag groups: The `CUSTOMER_TAGS_GROUPS` constant can be extended with additional tag types if the tag model is also overridden.
Business Rules
Sanitization (GDPR)
- Guest-only: Only guest accounts (`is_guest=true`) are targeted. Registered customers are never auto-sanitized.
- Interval-based: The `SANITIZE_INTERVAL` registry value sets the grace period in months. Setting it to 0 disables sanitization.
- UUID replacement: PII fields are overwritten with random v4 UUIDs rather than deleted, so the rows and the references to them stay intact (referential integrity is preserved).
- Conditional replacement: A field is replaced only if it originally held a non-empty value; empty fields stay empty.
- Cascade order: Customer records are sanitized first (collecting the original emails), then orders, wishlists, tags, reviews (matched by the collected emails), and waiting lists.
- Irreversible: Once `is_sanitized=1` is set, the customer is permanently excluded from future tag generation and email campaigns.
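The per-record rules (conditional v4 UUID replacement, invalid email, flag updates) and the email hand-off to review deletion can be sketched as follows. The column names in `PII_FIELDS`, the helper names, and the list-of-dicts storage are hypothetical; the `invalid.email` default is taken from the `{uuid}@invalid.email` pattern above, while the real domain comes from `config_item('order.invalid.email')`.

```python
import uuid
from typing import Optional, Tuple

# Hypothetical column names; the real shop_customer schema may differ.
PII_FIELDS = ("name", "surname", "address", "city", "phone", "mobile", "company")
# Assumed default; per client this comes from config_item('order.invalid.email').
INVALID_EMAIL_DOMAIN = "invalid.email"


def _replace_if_present(value):
    """Conditional replacement: only non-empty values become a random v4 UUID."""
    return str(uuid.uuid4()) if value else value


def sanitize_customer(record: dict, domain: str = INVALID_EMAIL_DOMAIN) -> Tuple[dict, Optional[str]]:
    """Return a sanitized copy plus the original email (needed later for reviews)."""
    original_email = record.get("email")
    clean = dict(record)
    for field in PII_FIELDS:
        if field in clean:
            clean[field] = _replace_if_present(clean[field])
    if original_email:
        clean["email"] = f"{uuid.uuid4()}@{domain}"
    clean.update(birthdate=None, active_token=None, has_access=0, is_sanitized=1)
    return clean, original_email


def sanitize_all(customers: list, reviews: list) -> Tuple[list, list]:
    """Cascade order: customers first (collecting emails), then reviews matched by those emails."""
    sanitized, emails = [], set()
    for record in customers:
        clean, email = sanitize_customer(record)
        sanitized.append(clean)
        if email:
            emails.add(email)
    remaining_reviews = [r for r in reviews if r.get("email") not in emails]
    return sanitized, remaining_reviews
```

Collecting the emails before overwriting them is what makes the later review cleanup possible: once the customer row holds a UUID address, nothing links it back to the original reviews.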
Tag Generation
- Daily execution: Tags are typically regenerated daily; incremental tags process only the previous day's data.
- Non-guest, non-sanitized only: All tag queries filter out guests and sanitized customers.
- Fulfilled orders: Turnover and Avg Cart tags only consider orders with status `SENT`, `PAID_SENT`, or `INVOICED`.
- Gift exclusion: Category and Vendor tags count only order-basket rows with `gift_id IS NULL`, so gift items are excluded.
- Last Referral replace: Unlike the other incremental tags, Last Referral deletes existing records before inserting, in 1000-record chunks to avoid large DELETE queries.
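The Turnover and Avg Cart values amount to a per-customer `SUM` and `AVG` of `total_vat` over fulfilled orders. A minimal Python sketch follows; the `(customer_id, status, total_vat)` tuple shape and plain string status names are assumptions (the real status constants are not listed here), and `Decimal` is used so money values are not distorted by floating point.

```python
from collections import defaultdict
from decimal import Decimal

# Status names from the rule above; the real status constants/values are not shown here.
FULFILLED_STATUSES = {"SENT", "PAID_SENT", "INVOICED"}


def turnover_and_avg_cart(orders):
    """orders: iterable of (customer_id, status, total_vat) tuples.

    Returns {customer_id: (turnover, avg_cart)} over fulfilled orders only,
    mirroring SUM(total_vat) and AVG(total_vat) grouped by customer.
    """
    totals = defaultdict(list)
    for customer_id, status, total_vat in orders:
        if status in FULFILLED_STATUSES:
            totals[customer_id].append(Decimal(total_vat))
    result = {}
    for customer_id, values in totals.items():
        turnover = sum(values, Decimal("0"))
        result[customer_id] = (turnover, turnover / len(values))
    return result
```

Customers with no fulfilled orders simply get no entry, which matches an SQL `GROUP BY` over the filtered orders.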
Audience Orchestration
- Staggered execution: Sub-jobs are spaced 5 minutes apart to avoid overloading the database with concurrent criteria evaluations.
- Queue isolation: Audience jobs use the `personalization` queue, separate from other job queues.
- Single retry: Each audience sub-job allows one retry on failure.
Related Flows
- SY-01 Cron Framework -- Job registration and scheduling infrastructure
- AD-04 Customer Management -- Admin customer management
- AD-22 Audience & Campaigns -- Audience definition and campaign management
- AD-46 Audience REST -- Modern REST API for audience management
- SY-12 Birthday & Points Emails -- Uses `shop_customer_campaign` for tracking
- CF-32 Loyalty Points -- Points system that interacts with tag generation