At Data Summit 2022 in Boston, Sudha Viswanathan, staff engineer, Wayfair, presented a talk titled, “Gaining Insights From Clickstream Data.”
As Viswanathan explained, Wayfair’s massive, petabyte-scale clickstream data environment consists of processes and data sources designed to capture and represent external customers’ activity as they browse one of Wayfair’s storefront sites or apps.
To gain important insights to inform its strategic direction, the company processes clickstream datasets daily using Google BigQuery. Its intent is to capture all activity from legitimate external customers and to create actionable site data for marketing and storefront analytics. Taking a deep dive into the data processing architecture, Viswanathan discussed the methods, technology, and processes used at Wayfair to build data processing and analytics at scale. She showcased the next-gen data modeling practices, with special emphasis on how data processing has advanced with the advent of cloud computing.
Viswanathan explained that Wayfair’s clickstream data refers to data that contains information about customer actions on the Wayfair site, such as what pages were viewed, the products that were clicked, what was added to the cart, and which URL brought the customer to Wayfair. This helps Wayfair make data-driven decisions regarding revenue attribution for different marketing channels and improves traffic and test analysis and ad bidding.
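The kind of channel-level analysis described above can be illustrated with a small sketch. This is not Wayfair’s actual schema or pipeline; the event fields and the counting logic are invented for illustration, assuming clickstream events arrive as simple records tagged with the referring marketing channel.

```python
from collections import defaultdict

# Hypothetical clickstream events; field names are illustrative,
# not Wayfair's actual schema.
events = [
    {"session": "s1", "action": "page_view", "referrer_channel": "paid_search"},
    {"session": "s1", "action": "add_to_cart", "referrer_channel": "paid_search"},
    {"session": "s2", "action": "page_view", "referrer_channel": "email"},
    {"session": "s2", "action": "product_click", "referrer_channel": "email"},
    {"session": "s2", "action": "add_to_cart", "referrer_channel": "email"},
]

def add_to_cart_by_channel(events):
    """Count add-to-cart actions per marketing channel -- a toy
    stand-in for attributing outcomes to the channel that brought
    the customer to the site."""
    counts = defaultdict(int)
    for event in events:
        if event["action"] == "add_to_cart":
            counts[event["referrer_channel"]] += 1
    return dict(counts)

print(add_to_cart_by_channel(events))  # {'paid_search': 1, 'email': 1}
```

At production scale this grouping would of course run as a query over the daily clickstream tables rather than in application code.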
Wayfair uses Looker, an enterprise platform for BI and analysis, to explore, share, and visualize data so that it can make better decisions. According to Viswanathan, you can connect Looker to any relational database, such as Amazon Redshift, SQL Server, MySQL, or Google BigQuery. Looker automatically generates a data model from your schema; you can then redefine the model to reflect your company’s unique metrics and business logic, build the basics such as KPI dashboards and departmental reports, and invite users to self-serve.
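Looker’s model generation is done through its own LookML tooling; as a rough stand-in for the idea of deriving a model from a schema, this sketch introspects a SQLite table and produces a minimal dictionary of dimensions. The table name and columns are invented for illustration.

```python
import sqlite3

# Toy table standing in for a storefront schema; not Wayfair's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (visitor_id TEXT, page_url TEXT, viewed_at TEXT)"
)

def generate_model(conn, table):
    """Derive a minimal 'model' (dimension name -> column type) from a
    table's schema -- loosely analogous to Looker auto-generating a
    data model that analysts then refine with business logic."""
    # PRAGMA table_info returns (cid, name, type, notnull, default, pk).
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name: ctype for (_, name, ctype, *_rest) in cols}

model = generate_model(conn, "page_views")
print(model)  # {'visitor_id': 'TEXT', 'page_url': 'TEXT', 'viewed_at': 'TEXT'}
```

The generated dictionary plays the role of the auto-generated starting model, which in Looker would then be hand-edited to encode company-specific metrics.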
Wayfair has implemented a hub-and-spoke model with Looker, which helps it maintain one source of truth for data accuracy and accountability. It also allows other projects to get off the ground quickly by reusing existing code, and enables business logic to immediately propagate to the rest of the company via a code change in the master project.
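The propagation benefit of a hub-and-spoke setup can be sketched with plain inheritance. This is an analogy, not LookML: the class and metric names are invented, but the structure mirrors the idea that shared business logic lives in one master project and every dependent project picks up changes automatically.

```python
class HubModel:
    """The single source of truth for shared metric definitions."""

    @staticmethod
    def conversion_rate(orders, sessions):
        # Changing this one definition updates every spoke project
        # that reuses it -- the hub-and-spoke propagation idea.
        return orders / sessions if sessions else 0.0


class MarketingSpoke(HubModel):
    """A spoke project that reuses the hub's metric definitions."""


class StorefrontSpoke(HubModel):
    """Another spoke, getting off the ground with the same shared code."""


# Both spokes compute the metric identically from the shared hub logic.
print(MarketingSpoke.conversion_rate(5, 100))   # 0.05
print(StorefrontSpoke.conversion_rate(5, 100))  # 0.05
```

In Looker the same effect is achieved by having spoke projects import the master project’s definitions rather than redefining them.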
Viswanathan said visualization on big data was always a hard problem, but there are now tools that provide both the scalability and the simplicity to enable high-quality analytics. And, finally, said Viswanathan, data processing using cloud technologies is a game-changer for data engineers.
Many Data Summit 2022 presentations are available for review at https://www.dbta.com/DataSummit/2022/Presentations.aspx.