New Participant
May 22, 2025
Solved

Build Data pipeline with Adobe data feeds

  • May 22, 2025
  • 2 replies
  • 661 views

Hi Everyone,

 

I would like to check if anyone has come across this type of request for data analysis with Adobe data feeds. To support it, I am building a data pipeline and have a medallion architecture in mind. The goal is to have one table at the visit level, plus aggregated tables per business requirements. Please share your thoughts, challenges, or experience if you have any.

 

Thanks

Best answer by pradnya_balvir

Hi @sasikalaes ,

 

Key Design Considerations:

  1. Data Ingestion
  • Format: Adobe feeds are TSV files with thousands of columns.
  • Delivery: Often daily, partitioned by date/hour.
  • Tools: Use Spark, Databricks, or Snowflake for scalable parsing and ingestion.
  2. Visit-Level Aggregation
  • Adobe doesn’t deliver a ready-made visit grain in data feeds; you must:
    • Use visit_num, visit_start_time_gmt, and post_visid_high/low to group hits.
    • Apply sessionization logic (handling visit timeouts and cross-day visits).
  3. Identity Resolution
  • post_visid_high/low or mcvisid/mid fields are used for the visitor ID.
  • Cross-device stitching is not out of the box; consider integrating ECID/CRM IDs if available.
  4. Medallion Architecture
  • Bronze: Raw ingestion + minimal parsing (e.g., data types, partitioning).
  • Silver: Normalize fields, resolve sessions, de-duplicate hits.
  • Gold: Create dimension tables; aggregate for metrics such as conversion rate and funnel analysis.
  5. Aggregation Examples
  • Sessions by traffic source.
  • Page views by product category.
  • Time spent on site by user cohort.
  • Custom attribution models for conversions.
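The visit-level grouping described above can be sketched in plain Python (the column names follow the data feed schema, but the sample hits are made up for illustration; in practice this would run in Spark over the full feed):

```python
from collections import defaultdict

# Each hit is a dict of data-feed columns; in practice these rows
# come from the TSV files. Sample values below are illustrative.
hits = [
    {"post_visid_high": "123", "post_visid_low": "456", "visit_num": "1",
     "hit_time_gmt": "1716336000", "post_pagename": "home"},
    {"post_visid_high": "123", "post_visid_low": "456", "visit_num": "1",
     "hit_time_gmt": "1716336060", "post_pagename": "product"},
    {"post_visid_high": "123", "post_visid_low": "456", "visit_num": "2",
     "hit_time_gmt": "1716426000", "post_pagename": "home"},
]

def visit_key(hit):
    # A visit is identified by the visitor ID (high/low pair) plus visit_num.
    return (hit["post_visid_high"], hit["post_visid_low"], hit["visit_num"])

visits = defaultdict(list)
for hit in hits:
    visits[visit_key(hit)].append(hit)

# Roll up to one row per visit: hit count, start time, entry page.
visit_table = []
for key, visit_hits in visits.items():
    visit_hits.sort(key=lambda h: int(h["hit_time_gmt"]))
    visit_table.append({
        "visitor_id": f"{key[0]}_{key[1]}",
        "visit_num": key[2],
        "hits": len(visit_hits),
        "entry_page": visit_hits[0]["post_pagename"],
        "start_time": int(visit_hits[0]["hit_time_gmt"]),
    })

print(len(visit_table))  # 2 visits
```

Note that a visit can span a midnight file boundary, so real sessionization must be able to pick up hits for the same visit from two consecutive daily files before rolling up.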

Suggested Tools:

 

  • Data Lakehouse: Databricks (Delta Lake), Snowflake, BigQuery

  • Orchestration: Airflow, Azure Data Factory, dbt

  • Storage: S3 / ADLS Gen2 (Bronze/Silver/Gold folders)

  • Analytics: Power BI, Tableau, Looker

  • Schema Evolution: Apache Iceberg or Delta for handling schema changes
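For the ingestion step, data feed files arrive as headerless TSV alongside a column_headers.tsv file listing the column names. A minimal sketch of applying that schema at the bronze layer (the rows and column subset here are made up for illustration; a Spark job would do the same with a supplied schema):

```python
import csv
import io

# Data feeds deliver hit data as headerless TSV plus a separate
# column_headers.tsv with the column names; both are simulated here
# as strings for illustration.
column_headers = "post_visid_high\tpost_visid_low\tvisit_num\texclude_hit\tpost_pagename"
hit_data = (
    "123\t456\t1\t0\thome\n"
    "123\t456\t1\t1\thome\n"  # an excluded hit (e.g., bot traffic)
)

headers = column_headers.strip().split("\t")
reader = csv.DictReader(io.StringIO(hit_data), fieldnames=headers, delimiter="\t")

# Bronze layer: land the raw rows with the schema applied, no filtering yet.
bronze = list(reader)
print(len(bronze), bronze[0]["post_pagename"])  # 2 home
```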

 

2 replies

Jennifer_Dungan
New Participant
May 24, 2025

One more thing to keep in mind.

 

Raw data feeds contain every row of data collected, including rows that have been excluded (bots, internal traffic, malformed data, etc.).

 

When processing your raw data, don't forget to check the exclude_hit column and make sure you don't include these rows, or your data will be inflated.
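A minimal sketch of that filter in plain Python (the rows are made up; with Spark it would be a simple `filter` on the same column):

```python
# Rows from the raw feed; exclude_hit is "0" for countable hits and a
# non-zero reason code for excluded rows (bots, internal traffic, etc.).
raw_rows = [
    {"exclude_hit": "0", "post_pagename": "home"},
    {"exclude_hit": "1", "post_pagename": "home"},      # excluded
    {"exclude_hit": "0", "post_pagename": "checkout"},
]

countable = [row for row in raw_rows if row["exclude_hit"] == "0"]
print(len(countable))  # 2
```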

 

 

Also, make sure you are using the "post" version of each column wherever possible; this is the post-processed version of the data (with your processing rules, VISTA rules, etc. applied).
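One way to apply that rule when selecting columns (a sketch; the column list is illustrative, and not every column has a post_ twin):

```python
# Given both pre- and post-processed versions of a field, prefer the
# post_ column, which reflects processing rules, VISTA rules, etc.
available = ["pagename", "post_pagename", "evar1", "post_evar1", "visit_num"]

def prefer_post(columns):
    chosen = []
    for col in columns:
        if col.startswith("post_"):
            chosen.append(col)                # always keep post_ columns
        elif "post_" + col not in columns:
            chosen.append(col)                # keep only if no post_ twin exists
    return chosen

print(prefer_post(available))  # ['post_pagename', 'post_evar1', 'visit_num']
```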
