Trend Research Automation User Manual

This manual explains how the trend-research-automation system works end-to-end: data collection, storage, normalization, scoring, dashboards, exports, reports, competitor analysis, configuration, maintenance, and future extension.

Project: trend-research-automation
Primary stack: Python, SQLite, pandas, pydantic, Typer, Streamlit, Jinja2
Primary use case: Fashion and homewear trend discovery for organic and paid testing
Default timezone: Africa/Cairo

1. System Purpose

The system is designed to help a pajamas, lingerie, homewear, loungewear, and nightwear brand decide what to create and what to test. It combines trend demand, competitor advertising signals, internal fit heuristics, and future-ready placeholders for owned performance data.

The output is not a generic trend feed. It is a decision-support system intended to answer concrete questions: what to test organically, what justifies paid testing, and what belongs on the watchlist.

2. Architecture

Core Layers

  • config/: editable YAML business configuration
  • src/providers/: external source adapters
  • src/pipelines/: executable workflows
  • src/services/: scoring, classification, reporting, alerts, config loading
  • src/models/: schemas and SQLite management
  • dashboards/: Streamlit app plus dashboard export CSV
  • reports/: rendered HTML reports and alert files

Execution Modes

  • Mock: deterministic test data
  • Manual: CSV imports
  • Live: Google Trends live collection
  • Apify: competitor Meta Ad Library scraping

3. Data Flow

  1. Providers fetch or import raw source data.
  2. Raw rows are stored in source-specific SQLite tables.
  3. Normalization maps different sources into one standard trend-signal shape.
  4. Scoring calculates source scores, fit score, historical fit, and competitor proxy score.
  5. Thresholds convert final scores into an operational decision.
  6. Exports, reports, alerts, and dashboard views are generated from the scored data.
The system is intentionally modular: if one provider fails, the rest of the pipeline remains usable with the available sources.
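Step 3 of the flow can be illustrated with a minimal normalization sketch. The field names here (`term`, `interest`, `date`, and the `TrendSignal` attributes) are assumptions for illustration; the real unified shape lives in src/models/.

```python
from dataclasses import dataclass

@dataclass
class TrendSignal:
    # Unified cross-source trend-signal shape (illustrative field names).
    source: str
    topic: str
    signal_strength: float  # normalized to a 0-100 scale
    observed_at: str

def normalize_google_row(row: dict) -> TrendSignal:
    # Google Trends interest is already on a 0-100 scale, so it maps directly.
    return TrendSignal(
        source="google_trends",
        topic=row["term"],
        signal_strength=float(row["interest"]),
        observed_at=row["date"],
    )

signal = normalize_google_row({"term": "satin pajamas", "interest": 64, "date": "2024-05-01"})
```

Each provider gets its own adapter like this, so adding a source never changes the downstream scoring code.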

4. Data Providers

Google Trends

Google Trends is the strongest direct demand signal in the current setup. The project uses Egypt-focused tracked terms and collects interest data with locale-aware handling.

Stored in: google_trends_raw

Meta Competitor Monitoring

Competitor ads are collected through Apify using the Facebook Ads Library scraper actor. Exact Facebook page URLs are preferred where available because they produce better results than broad keyword search.

Stored in: meta_competitor_raw

Competitor scrape quality depends on the actor, the page URL quality, timeouts, and what Meta exposes publicly. Not all brands will return the same richness of data.

TikTok

TikTok is currently supported through mock and manual workflows. The architecture is ready for a live integration later, but a stable live collector is not yet part of the current system.

Owned Social Performance

Internal Instagram, Facebook, and TikTok performance data can be imported manually. This is the future source for true historical performance weighting, including CTR, orders, revenue, and conversion-derived fit.

5. Database Model

The project uses SQLite. The main operational tables are:

  • tracked_terms: Keyword inventory used by live and manual collection.
  • google_trends_raw: Raw Google Trends rows.
  • tiktok_trends_raw: Raw TikTok trend rows.
  • meta_competitor_raw: Raw competitor ad rows from Meta/Apify.
  • social_performance_raw: Owned content/ad performance inputs.
  • normalized_trends: Unified cross-source trend signal table.
  • trend_scores: Final scored trend decisions.
  • weekly_shortlist: Action-oriented shortlist for testing.
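A minimal sketch of how two of these tables might be created and queried with the standard sqlite3 module. The column names are illustrative assumptions; the authoritative schema lives in src/models/.

```python
import sqlite3

# In-memory database for illustration; the project persists to a file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS tracked_terms (
    term   TEXT PRIMARY KEY,
    source TEXT NOT NULL          -- e.g. 'google_trends'
);
CREATE TABLE IF NOT EXISTS trend_scores (
    trend_topic TEXT PRIMARY KEY,
    final_score REAL NOT NULL,
    decision    TEXT NOT NULL     -- paid_test / organic_test / watchlist / ignore
);
""")
conn.execute("INSERT INTO tracked_terms VALUES (?, ?)", ("satin pajamas", "google_trends"))
count = conn.execute("SELECT COUNT(*) FROM tracked_terms").fetchone()[0]
```

Parameterized inserts (the `?` placeholders) keep imported keyword data from breaking the SQL.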

6. Scoring Logic

Final Formula

final_score =
(google_score * 0.30) +
(tiktok_score * 0.25) +
(competitor_score * 0.20) +
(fit_score * 0.15) +
(historical_conversion_fit_score * 0.10)

Decision Thresholds

  • paid_test: final_score >= 75. Strong enough to justify paid testing.
  • organic_test: final_score >= 55 and < 75. Strong enough for content testing.
  • watchlist: final_score >= 35 and < 55. Worth monitoring, not a top priority.
  • ignore: final_score < 35. Low priority right now.
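The formula and thresholds above can be sketched directly in Python. This is an illustrative restatement of the documented weights and cutoffs, not the project's actual scoring module; each component score is assumed to be on a 0-100 scale.

```python
# Weights from the final-score formula above.
WEIGHTS = {
    "google_score": 0.30,
    "tiktok_score": 0.25,
    "competitor_score": 0.20,
    "fit_score": 0.15,
    "historical_conversion_fit_score": 0.10,
}

def final_score(scores: dict) -> float:
    # Weighted sum of the five component scores.
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())

def decide(score: float) -> str:
    # Thresholds mirror the decision table above.
    if score >= 75:
        return "paid_test"
    if score >= 55:
        return "organic_test"
    if score >= 35:
        return "watchlist"
    return "ignore"

score = final_score({
    "google_score": 80, "tiktok_score": 70, "competitor_score": 60,
    "fit_score": 90, "historical_conversion_fit_score": 50,
})
# 80*0.30 + 70*0.25 + 60*0.20 + 90*0.15 + 50*0.10 = 72.0 -> organic_test
```

Because the weights sum to 1.0, the final score stays on the same 0-100 scale as its inputs.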

Fit Score

Fit score is a heuristic measure of how relevant a topic is to the brand. It rewards direct product terms such as pajamas, lingerie, homewear, satin, cotton, and Egyptian Arabic equivalents.
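A toy version of the fit heuristic, using the product terms named above. The base value, per-hit bonus, and cap are invented for illustration; the real keyword lists and weighting live in the business configuration.

```python
# Illustrative product-term list; the real inventory lives in config/.
PRODUCT_TERMS = {"pajamas", "lingerie", "homewear", "loungewear", "nightwear", "satin", "cotton"}

def fit_score(topic: str) -> float:
    # Count direct product-term hits in the topic.
    hits = len(set(topic.lower().split()) & PRODUCT_TERMS)
    # Assumed weighting: low base plus 25 points per hit, capped at 100.
    return min(100.0, 20.0 + 25.0 * hits)
```

So "satin pajamas" scores well above an off-brand topic like "garden tools", which stays at the base value.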

Historical Conversion Fit Score

If owned performance data exists, the system can infer fit from revenue, orders, and CTR. If not, it defaults to a neutral score of 50.
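The neutral-default behavior can be sketched as follows. The blend of CTR and orders here is an assumption for illustration; only the inputs (revenue, orders, CTR) and the neutral default of 50 come from this manual.

```python
from typing import Optional

def historical_conversion_fit(perf: Optional[dict]) -> float:
    # Without owned performance data, fall back to the neutral default.
    if not perf:
        return 50.0
    # Illustrative blend; the real weighting would combine revenue, orders, and CTR.
    ctr_part = min(perf.get("ctr", 0.0) * 1000, 100.0)   # e.g. 0.05 CTR -> 50
    orders_part = min(perf.get("orders", 0) * 2.0, 100.0)
    return (ctr_part + orders_part) / 2
```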

Competitor Score

Competitor score is the strongest of:

Competitor Effectiveness Proxy

Because competitor conversion rate is not available from Ad Library scraping, the system uses a transparent proxy. It rewards:

The system does not claim competitor conversion rate. The proxy is a directional signal only.

Keyword Matching Rules

Competitor relevance is intentionally constrained:

7. Dashboard Guide

Overview

Executive summary of the current system state: top score, top trend, current shortlist size, the top curated trend table, and basic charts.

Trend Scores

Detailed trend ranking view with the full planning-ready fields: scores, decision, platform recommendation, hook, format, paid-vs-organic recommendation, and shortlist rationale.

Competitors

Raw competitor ad rows. This is where scraped ad text, offer text, and media type can be inspected directly.

Keyword Support

Audit layer for Google keyword support. It shows, for each Google keyword, whether any competitor ad supports it, and whether that reinforcement is an exact keyword match or a broader category-based match.

Raw Signals

Data quality and debugging section. Use it to inspect what was ingested from each source and how it was normalized.

Weekly Shortlist

Action layer for execution teams. Shows why a trend matters now and what to make first.

Exports

Download center for CSV and HTML outputs.

Important Dashboard Columns

  • trend_topic: Trend being ranked.
  • final_score: Weighted final priority score.
  • decision: Operational recommendation: paid, organic, watchlist, ignore.
  • platform_priority: Recommended primary platform.
  • recommended_hook: Suggested opening angle.
  • recommended_format: Recommended creative format.
  • why_now: Human-readable rationale for acting now.
  • content_hook: Shortlist-ready execution angle.

8. Pipelines and Commands

Main commands:

python -m src.main ingest-google-trends
python -m src.main ingest-meta-competitors
python -m src.main normalize-signals
python -m src.main score-trends
python -m src.main build-weekly-shortlist
python -m src.main export-dashboard
python -m src.main generate-daily-report
python -m src.main generate-weekly-report
python -m src.main run-daily-pipeline
python -m src.main run-weekly-pipeline
python -m src.main run-dashboard
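Since the stack uses Typer, the commands above are presumably registered on a Typer app in src/main.py. A minimal sketch of that pattern, with placeholder bodies (the echoed messages and docstrings are assumptions):

```python
import typer

app = typer.Typer(help="Trend research automation CLI (illustrative sketch).")

@app.command("score-trends")
def score_trends() -> None:
    """Recompute final scores and decisions from normalized signals."""
    typer.echo("scoring trends...")

@app.command("run-daily-pipeline")
def run_daily_pipeline() -> None:
    """Ingest, normalize, score, export, report, and write alerts."""
    typer.echo("running daily pipeline...")

if __name__ == "__main__":
    app()
```

Registering each pipeline step as its own command is what lets operators run a single stage (for example, re-exporting the dashboard) without rerunning the whole pipeline.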

Daily Pipeline

  1. Ingest enabled sources
  2. Normalize signals
  3. Score trends
  4. Export dashboard data
  5. Generate daily report
  6. Write alert output

Weekly Pipeline

  1. Refresh scoring
  2. Build weekly shortlist
  3. Export dashboard data
  4. Generate weekly report
  5. Write weekly alert output

9. Configuration Files

Main editable files:

10. Daily Operations

Recommended Daily Process

  1. Run the daily pipeline.
  2. Open the dashboard.
  3. Check Overview and Trend Scores.
  4. Check Keyword Support to see whether competitor reinforcement is exact or category-based.
  5. Review Weekly Shortlist for action-ready ideas.
  6. Export CSVs if reporting or BI tools need the latest output.

Recommended Weekly Process

  1. Run the weekly pipeline.
  2. Review shortlist quality and rationale.
  3. Decide what becomes organic content, what becomes paid testing, and what remains on watchlist.

11. Maintenance and Troubleshooting

Common Issues

  • Competitor scrape returns weak data. Likely cause: generic keyword search instead of an exact page URL. Action: add or improve page_url in competitors.yaml.
  • No competitor data. Likely cause: actor timeout or empty page results. Action: retry the scrape, reduce scope, or improve competitor source URLs.
  • Scores look inflated. Likely cause: matching rules too broad. Action: inspect the Keyword Support tab and tighten taxonomy rules.
  • Dashboard looks outdated. Likely cause: exports not regenerated after scoring. Action: run score-trends, build-weekly-shortlist, and export-dashboard.
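For the first issue above, a competitors.yaml entry with an exact page URL might look like the sketch below. Only the filename and the page_url key appear in this manual; the surrounding structure and other keys are assumptions.

```yaml
# Illustrative competitors.yaml entry (structure assumed; only page_url is documented).
competitors:
  - name: example-brand
    page_url: https://www.facebook.com/example-brand   # exact page URL beats keyword search
    enabled: true
```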

Security Practices

12. Future-Proofing Guidance

The system has been structured so it can evolve without a full rewrite. Recommended next upgrades:

The most important future-proofing rule is to keep the system honest. If a signal is proxy-based, label it as a proxy. If a provider is weak, expose that weakness rather than hiding it inside a score.

Appendix: Key File Paths

This manual is designed to be maintained with the codebase. When major provider, scoring, or dashboard behavior changes, update this file at the same time so the operational model stays aligned with the implementation.