This guest post was written by Daniel Peter, Senior Programmer Analyst at Safari Books Online.

Cross-posted from the Google Cloud Platform Blog

Safari Books Online is a subscription service for individuals and organizations to access a growing library of over 30,000 technology and business books and videos. Our customers browse and search the library from web browsers and mobile devices, generating usage data that we can use to improve our service and increase profitability. We wanted to quickly and easily build dashboards, improve the effectiveness of our sales teams and enable ad-hoc queries to answer specific business questions. With billions of records, we found it challenging to get answers to our questions fast enough with our existing MySQL databases.

Looking for alternative solutions to build our dashboards and enable interactive ad-hoc querying, we evaluated several technologies, including Hadoop. In the end, we decided to use Google BigQuery.

Here’s how we pipe data into BigQuery:



Our data starts in our CDN and server logs, gets packaged up into compressed files, and runs through our ETL server before finishing in BigQuery.
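
To make the packaging step concrete, here is a minimal sketch of what an ETL stage like this might do: parse raw log lines and emit gzip-compressed, newline-delimited JSON, which is one of the formats BigQuery load jobs accept. The tab-separated log layout, the field names and the function names are all assumptions made for illustration, not our actual log format or code.

```python
import gzip
import io
import json

# Hypothetical log layout (an assumption for this sketch):
# timestamp<TAB>user_id<TAB>book_id<TAB>device
FIELDS = ("timestamp", "user_id", "book_id", "device")


def parse_line(line):
    """Turn one tab-separated log line into a dict, or None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def package_logs(lines):
    """Return gzip-compressed newline-delimited JSON for the valid lines."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for line in lines:
            record = parse_line(line)
            if record is not None:  # silently skip malformed lines
                gz.write((json.dumps(record) + "\n").encode("utf-8"))
    return buffer.getvalue()
```

The resulting bytes could be written to a file and handed to a load job. A real pipeline would probably log or quarantine malformed lines rather than drop them.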

Here’s one of the dashboards we built using the data:



With the help of BigQuery, we can easily categorize our books. This dashboard shows popular books on desktop and mobile, and BigQuery lets us run quick queries to dig into other usage patterns as well.
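
The aggregation behind a dashboard like this is essentially a grouped count. In BigQuery that would be a query over the usage table; the sketch below shows the same logic in plain Python on a handful of in-memory records, purely to illustrate the idea. The `device` and `book_id` field names and the function name are hypothetical.

```python
from collections import Counter


def top_books_by_device(events, limit=3):
    """Count views per book within each device type and keep the top few."""
    counts = {}
    for event in events:
        # One Counter per device type, keyed by book identifier.
        counts.setdefault(event["device"], Counter())[event["book_id"]] += 1
    return {device: c.most_common(limit) for device, c in counts.items()}
```

Given a list of view events, this returns a dictionary mapping each device type to its most-viewed books and their counts.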

BigQuery has been very valuable for our company, and we’re just scratching the surface of what is possible.

Check out the article for more details on how we manage our import jobs, transform our data, build our dashboards, detect abuse and improve our sales team's effectiveness.


Posted by Scott Knaster, Editor