Question from Reddit user:
Is there a way to get a report on 404 errors that my own domain is linking to?
I can’t find a way to do it with the built in reports nor a custom explore report.
I’m thinking there is some way to do it with tag manager and creating a custom event with a parameter that has the previous page in it or something?
The site has 100k+ pages and it’s not feasible to crawl it for a number of reasons.
Answer from Nabil:
The short answer is:
You are correct that the built-in GA4 reports and Explorations cannot easily report on internal 404s and their source pages. GA4’s default data model does not automatically link a 404 page view to the click that led to it, and the Page referrer dimension records the URL of the previous page, not the specific link that was clicked.
The immediate solution is a two-step process using Google Tag Manager (GTM): first, create a Custom Event that fires only on your site’s 404 page; second, add a parameter to that event that captures the document referrer, which is the last page visited and therefore the page containing the broken internal link.
The stronger long-term solution is a data pipeline built on the GA4 BigQuery export, the Google Analytics Data API, and Looker Studio, optionally enhanced by server-side tagging (for example, hosted on Stape or Google Cloud Platform), to run the lookups and historical analysis a site of your scale needs.
The long answer is:
The reason this is difficult in standard GA4 reports is that a 404 page is just another page view event, typically using the ‘Page not found’ page title, and the Page referrer dimension will capture the page the user was on before they clicked the bad link.
To reliably identify broken internal links and their source pages, you need a custom implementation with Google Tag Manager.
The process you suggested is the right direction: you need to create a GA4 Event that specifically captures the fact that a 404 occurred and, most importantly, includes the page referrer URL as a custom parameter.
To implement this, create a new GA4 Event tag in GTM that fires only on the 404 page. The 404 page is often identified by a page title that contains ‘Page not found’ or, where your server or page template exposes it to the data layer, by the HTTP status code.
This event would send the page referrer (available in GTM as the built-in Referrer variable) as a custom event parameter, which you then need to register as a Custom Dimension in GA4.
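To make the tag logic above concrete, here is a minimal sketch of what the 404 event amounts to. The event name `page_not_found`, the parameter names `referrer_url` and `broken_url`, and the title marker are all assumptions for illustration, not GA4 or GTM built-ins. In practice a GTM trigger on Page Title plus the built-in Referrer variable can achieve the same thing without custom code.

```typescript
// Hypothetical names throughout: "page_not_found", "referrer_url" and
// "broken_url" are illustrative choices, not GA4 or GTM built-ins.
interface PageSnapshot {
  title: string;    // e.g. document.title
  referrer: string; // e.g. document.referrer
  location: string; // e.g. window.location.href
}

interface NotFoundEvent {
  event: string;
  referrer_url: string;
  broken_url: string;
}

// Assumption: the site's 404 template puts this text in its <title>.
const NOT_FOUND_MARKER = "page not found";

// Returns the data layer payload for a 404 page, or null for any other page.
function build404Event(page: PageSnapshot): NotFoundEvent | null {
  if (page.title.toLowerCase().indexOf(NOT_FOUND_MARKER) === -1) {
    return null;
  }
  return {
    event: "page_not_found",
    // Empty string when the visitor arrived directly or the referrer was stripped.
    referrer_url: page.referrer,
    broken_url: page.location,
  };
}

// Pushes the payload onto a data layer array; returns true if something was pushed.
function pushIfNotFound(dataLayer: object[], page: PageSnapshot): boolean {
  const payload = build404Event(page);
  if (payload === null) {
    return false;
  }
  dataLayer.push(payload);
  return true;
}
```

In a browser you would pass `window.dataLayer` and the live `document.title`, `document.referrer`, and `window.location.href` values. GTM Custom HTML tags take plain JavaScript, so a sketch like this would need to be compiled first.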
Once this is set up, you can build an Exploration report where the row is your custom page referrer dimension, filtered by your 404 event name.
For a large site that cannot be easily crawled, an even better, scalable, and historical solution involves leveraging the power of Google Analytics Data API, BigQuery, and Looker Studio.
The GTM setup is still necessary to tag the 404 events with the page referrer and the 404 page’s URL.
Once this data is flowing, you can export all your raw GA4 event data to BigQuery.
In BigQuery, you can write unsampled SQL queries that analyze this data over long time periods, isolating the 404 events and linking each one directly to the preceding page’s URL from the page referrer parameter.
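As a rough sketch of the kind of query described above, the helper below builds a BigQuery SQL string against the GA4 export tables. It assumes the custom event is named `page_not_found` and the custom parameter `referrer_url` (both hypothetical), and that `page_location` is present on the event, as GA4 normally collects it on web events. The table name in the usage note is a placeholder.

```typescript
// Builds a query that counts 404 hits per (source page, broken URL) pair.
// Assumptions: event name "page_not_found" and parameter "referrer_url" are
// hypothetical names chosen for this sketch; dates use the YYYYMMDD format of
// the export's _TABLE_SUFFIX.
function buildBrokenLinkQuery(
  tableWildcard: string,
  startDate: string,
  endDate: string
): string {
  // Reject anything that is not a plain table reference, since it is
  // interpolated into the SQL text.
  if (!/^[A-Za-z0-9_.*-]+$/.test(tableWildcard)) {
    throw new Error("Unexpected characters in table name");
  }
  if (!/^\d{8}$/.test(startDate) || !/^\d{8}$/.test(endDate)) {
    throw new Error("Dates must be formatted as YYYYMMDD");
  }
  // GA4 export stores event parameters as a repeated key/value record.
  const param = (key: string): string =>
    "(SELECT value.string_value FROM UNNEST(event_params) WHERE key = '" + key + "')";
  return [
    "SELECT",
    "  " + param("referrer_url") + " AS source_page,",
    "  " + param("page_location") + " AS broken_url,",
    "  COUNT(*) AS hits",
    "FROM `" + tableWildcard + "`",
    "WHERE _TABLE_SUFFIX BETWEEN '" + startDate + "' AND '" + endDate + "'",
    "  AND event_name = 'page_not_found'",
    "GROUP BY source_page, broken_url",
    "ORDER BY hits DESC",
  ].join("\n");
}
```

For example, `buildBrokenLinkQuery("my-project.analytics_123456.events_*", "20240101", "20240131")` would produce a query for January 2024, where the project and dataset IDs are placeholders for your own.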
This allows you to report on trends and aggregate data across your 100k+ pages without any performance issues.
Finally, you can use the Google Analytics Data API, or connect BigQuery directly to Looker Studio through its BigQuery connector, to create a custom dashboard that visualizes the top source pages linking to 404s, which you can filter by date and prioritize for fixing.
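Since the question is about links from your own domain, it can help to drop rows whose source page is external or empty before prioritizing. The sketch below assumes rows shaped as `source_page`, `broken_url`, and `hits` (hypothetical column names for aggregated 404 data) and uses a deliberately simple host match rather than full URL parsing.

```typescript
// Row shape is an assumption: one aggregated row per (source page, broken URL).
interface BrokenLinkRow {
  source_page: string;
  broken_url: string;
  hits: number;
}

// Extracts the lowercase host from an http(s) URL, or null if it does not
// look like one. Simplified on purpose: no userinfo or IDN handling.
function hostOf(url: string): string | null {
  const match = /^https?:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Keeps only rows whose source page is on siteHost, highest hit count first.
// filter() returns a new array, so the caller's input is not reordered.
function rankInternalSources(rows: BrokenLinkRow[], siteHost: string): BrokenLinkRow[] {
  const wanted = siteHost.toLowerCase();
  return rows
    .filter((row) => hostOf(row.source_page) === wanted)
    .sort((a, b) => b.hits - a.hits);
}
```

Calling `rankInternalSources(rows, "example.com")` would return only the pages on `example.com` that link to 404s, ordered so the most-hit broken links come first. Note that `www.example.com` and `example.com` are treated as different hosts here.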
This pipeline (GTM or server-side tagging for collection, BigQuery for processing, and Looker Studio for visualization) is the most dependable way to perform this kind of scalable, granular, historical analysis.