AI Report: Analytics & Business Intelligence Stack at Meta and Google
Executive Summary
Both companies have spent more than 15 years building custom analytics infrastructure that processes petabytes of user data every day. Unlike their ML stacks, whose core frameworks (PyTorch, TensorFlow) are open source, much of this tooling is proprietary and built for data volumes beyond what commercial BI platforms are designed to handle.
Meta's Analytics Infrastructure
Core Data Processing Stack
| Component | Purpose | Notes |
|---|---|---|
| Presto | Distributed SQL query engine | Created at Facebook; open-sourced in 2013 |
| Hive | Data warehousing & ETL | Created at Facebook; heavily modified internally |
| Apache Kafka | Real-time data streaming | Created at LinkedIn; Meta's published work describes Scribe as its main message bus, so the extent of Kafka use is unclear |
| Scribe/Scribe 2.0 | Logging infrastructure | Custom-built for scale |
Query & Analytics Tools
Presto (Open Source)
- Developed at Facebook so analysts could run interactive SQL over the data warehouse instead of waiting on Hive batch jobs
- Designed to scan billions of rows in seconds to minutes, depending on the query and cluster size
- Presto Documentation
Internal BI Dashboards
- Custom-built visualization platforms for product metrics, ad performance, user engagement
- Tightly integrated with their data pipelines
- Not publicly documented (proprietary)
Ad Measurement & Attribution Systems
- Proprietary tools for tracking campaign effectiveness
- Cross-platform attribution (Instagram, Facebook, WhatsApp)
- Conversion lift measurement
Engineering Documentation
- “How we process petabytes of data at Facebook” — Meta Engineering Blog
- Presto architecture paper: “Presto: SQL on Everything”, ICDE 2019
- “Facebook's Data Processing Infrastructure” — QCon talks (various years)
Google's Analytics Infrastructure
Core Data Processing Stack
| Component | Purpose | Notes |
|---|---|---|
| BigQuery | Serverless data warehouse | Public product built on Dremel, the internal query engine that powers much of Google's analytics |
| Colossus | Distributed file system (GFS successor) | Proprietary, not public |
| MapReduce / FlumeJava | Batch processing framework | Internal tools |
| MillWheel / Dataflow | Stream processing | MillWheel is internal; the Dataflow programming model was open-sourced as Apache Beam, and Cloud Dataflow is the managed service |
Query & Analytics Tools
BigQuery (Internal + External)
- Much of Google's internal analytics runs on Dremel, the query engine that BigQuery exposes to external customers
- Built for interactive SQL over terabyte- to petabyte-scale tables, with typical response times of seconds to minutes
- Google Cloud BigQuery
Search Quality & Google Ads (formerly AdWords) Analytics
- Highly proprietary tools for understanding search behavior, ad relevance
- Click-through rate analysis, quality scoring systems
- Not publicly documented in detail
YouTube Analytics Infrastructure
- Separate but integrated with core Google analytics
- Viewer retention, engagement, recommendation effectiveness
Engineering Documentation
- “The Google File System”, SOSP 2003 (foundational)
- BigQuery Technical Papers
- “Processing Petabytes of Data at Google” — Various conference talks
Common Patterns in Their Analytics Stacks
Architectural Principles Shared by Both
| Pattern | Description |
|---|---|
| Lambda Architecture | Separate batch and real-time processing layers that merge for analysis |
| Data Lakehouse Model | Raw data stored cheaply; processed on demand with SQL-like interfaces |
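| Columnar Storage | Column-oriented formats optimized for analytical queries: ORC and its DWRF variant at Meta, Capacitor in BigQuery; Parquet is the common open-source counterpart |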
| Tiered Storage | Hot/warm/cold data placement based on access patterns |
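
As a rough illustration of the Lambda Architecture row above, the following Python sketch keeps a batch view and a speed view of page-view counts and merges them at query time. The event shape, function names, and sample data are hypothetical; this is not either company's pipeline, only a minimal model of the pattern.

```python
from collections import Counter

def batch_view(events):
    """Batch layer: recompute complete counts from the full historical log."""
    return Counter(e["page"] for e in events)

def speed_view(recent_events):
    """Speed layer: incremental counts for events the last batch run has not seen."""
    view = Counter()
    for e in recent_events:
        view[e["page"]] += 1
    return view

def serve(batch, speed):
    """Serving layer: merge both views when a query arrives."""
    return batch + speed

# Hypothetical events, for illustration only.
historical = [{"page": "home"}, {"page": "home"}, {"page": "search"}]
recent = [{"page": "home"}, {"page": "profile"}]

merged = serve(batch_view(historical), speed_view(recent))
print(dict(merged))
```

The cost of the pattern is visible even at this size: the counting logic exists twice, once per layer, and the two copies have to stay consistent.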
What They Measure (Common Metrics)
Both companies track similar categories of actionable information:
User Engagement
- Time spent, sessions per user, retention cohorts
- Feature adoption rates
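
Retention cohorts can be made concrete with a small calculation. The sketch below computes day-N retention for one signup cohort, assuming "retained" means active exactly N days after signup; other definitions (active on or after day N, rolling windows) are equally common. The input shapes and sample data are hypothetical.

```python
from datetime import date, timedelta

def day_n_retention(signups, activity, n):
    """Fraction of a signup cohort that was active exactly n days after signup.

    signups:  dict of user_id -> signup date (hypothetical input shape)
    activity: set of (user_id, date) pairs, one per active user-day
    """
    if not signups:
        return 0.0
    retained = sum(
        1
        for user_id, signed_up in signups.items()
        if (user_id, signed_up + timedelta(days=n)) in activity
    )
    return retained / len(signups)

# Four users who all signed up on the same day.
cohort = {u: date(2024, 1, 1) for u in ("u1", "u2", "u3", "u4")}
active_days = {
    ("u1", date(2024, 1, 8)),
    ("u2", date(2024, 1, 8)),
    ("u3", date(2024, 1, 5)),  # active, but not on day 7
}

d7 = day_n_retention(cohort, active_days, 7)
print(f"Day-7 retention: {d7:.0%}")
```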
Revenue & Monetization
- Ad impressions, CTR, conversion rates
- ARPU (Average Revenue Per User)
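
The monetization metrics above are simple ratios. A minimal sketch, assuming conversions are counted per click and ARPU is computed over active users in the same period; the daily totals are made-up numbers, not figures from either company.

```python
def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (for example, no traffic yet)."""
    return numerator / denominator if denominator else 0.0

def click_through_rate(clicks, impressions):
    return ratio(clicks, impressions)

def conversion_rate(conversions, clicks):
    # Assumption: conversions are measured per click; some teams measure per session.
    return ratio(conversions, clicks)

def arpu(revenue, active_users):
    return ratio(revenue, active_users)

# Hypothetical daily totals, for illustration only.
day = {
    "impressions": 200_000,
    "clicks": 3_000,
    "conversions": 150,
    "revenue": 12_000.0,
    "active_users": 48_000,
}

ctr_value = click_through_rate(day["clicks"], day["impressions"])
cvr_value = conversion_rate(day["conversions"], day["clicks"])
arpu_value = arpu(day["revenue"], day["active_users"])
print(f"CTR {ctr_value:.2%}, conversion rate {cvr_value:.2%}, ARPU {arpu_value:.2f}")
```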
Product Health
- Error rates, latency percentiles
- System uptime and availability
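
Latency percentiles are usually reported as p50, p90, and p99 rather than averages, because a few slow requests dominate the user experience. The sketch below uses the nearest-rank definition, one of several in common use; the sample latencies and request counts are invented.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("cannot take a percentile of an empty sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def error_rate(failed_requests, total_requests):
    return failed_requests / total_requests if total_requests else 0.0

# Hypothetical request latencies in milliseconds.
latencies_ms = [12, 15, 11, 250, 14, 13, 16, 900, 12, 14]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
print(p50, p90, p99, error_rate(3, 10_000))
```

In this sample the mean is about 126 ms while the median is 14 ms, which is why dashboards tend to show the tail percentiles separately.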
Attribution & Lift
- Campaign effectiveness
- Incrementality testing
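
Incrementality testing compares a group exposed to a campaign with a randomly held-out control group. The sketch below shows only the basic arithmetic, under the assumption of a clean randomized holdout; it omits the significance testing a real lift study would need, and all names and numbers are hypothetical.

```python
def conversion_lift(test_conversions, test_size, control_conversions, control_size):
    """Return (incremental conversions, relative lift) for a holdout experiment.

    Relative lift is None when the control group had no conversions.
    """
    if test_size <= 0 or control_size <= 0:
        raise ValueError("both groups must be non-empty")
    test_rate = test_conversions / test_size
    control_rate = control_conversions / control_size
    # Conversions in the test group beyond what the control rate predicts.
    incremental = (test_rate - control_rate) * test_size
    relative = None if control_rate == 0 else (test_rate - control_rate) / control_rate
    return incremental, relative

# Hypothetical campaign: 100,000 exposed users and 100,000 held out.
incremental, relative = conversion_lift(2_400, 100_000, 2_000, 100_000)
print(f"Incremental conversions: {incremental:.0f}, relative lift: {relative:.1%}")
```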
Key Differences: Meta vs Google Analytics Approach
| Dimension | Meta | Google |
|---|---|---|
| Open Source Strategy | Very aggressive (Presto, Hive contributions) | Selective; much remains internal |
| Primary Focus | Social graph analysis, ad targeting | Search quality, ranking optimization |
| Real-time Emphasis | Heavy focus on live feed/personalization | Strong in Ads, more batch for Search |
| External Exposure | Limited (mostly via ads platform) | BigQuery available to customers |
What Remains Opaque / Proprietary
⚠️ Important Caveats:
- Internal dashboards are not public: neither company publishes the names or screenshots of its internal BI tools
- Custom-built metrics engines: their attribution and measurement systems are closely held trade secrets
- Real-time recommendation analytics: the systems powering “People You May Know” (Meta) or search ranking adjustments (Google) are not documented in detail
- A/B testing infrastructure: both run thousands of experiments daily, and their experimentation platforms are internal
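
Although the internal experimentation platforms are not public, one widely used building block is deterministic, hash-based assignment of users to variants. The sketch below illustrates that general technique only; it is not a description of either company's system, and the experiment name, user IDs, and function name are hypothetical.

```python
import hashlib
from collections import Counter

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Map a user to a variant deterministically, with no stored assignment table."""
    # Salting with the experiment name keeps assignments independent across experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# The same user always lands in the same variant of a given experiment.
first = assign_variant("user-42", "new_feed_ranking")
second = assign_variant("user-42", "new_feed_ranking")

# Across many users the split should be close to even.
counts = Counter(
    assign_variant(f"user-{i}", "new_feed_ranking") for i in range(10_000)
)
print(first == second, dict(counts))
```

Hashing rather than random assignment means any service can recompute a user's variant without a lookup, which matters when many experiments run at once.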
Sources & Further Reading
Meta
- Engineering.fb.com — Official engineering blog
- Presto Documentation
- Facebook's Data Engineering talks at QCon, Strata conferences (archived)
Google
- Google Research Publications
- BigQuery Architecture Papers
- “The Datacenter as a Computer” — Book on Google infrastructure