Using Protocol Buffers with Google APIs

16 Jul 2014

Disclaimer: This is hastily researched and poorly authored. I’d love to have any of the below assertions corrected by wiser folks!

Right up front, you almost certainly don’t want to use protocol buffers to access Google APIs. This is what every smart person told me when I asked them for help. As far as I can tell, very few people actually try this anyway, as evidenced by the complete lack of any documentation on the topic.

I wanted to understand the mechanism a bit better so I went spelunking into the deep, undocumented recesses of Google’s APIs anyway, mostly because it’s Hackweek at Dropbox!

If you’ve never heard of Protocol Buffers (or protobufs for short) before, here’s the tagline: “a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.”

At a high level, it’s a data serialization format not unlike XML and JSON. It also includes a structure definition language, similar to struct declarations in C/C++. In theory, you can use this definition to generate code for working with the data. Protocol Buffers were created by Google and open sourced in 2008. And as far as I can tell, no one else uses them.

Now because Google’s APIs all use shared infrastructure (which enables shared auth and a single SDK), it turns out that they all support protobufs. By default, they return JSON, but every now and then, the marketing will mention protobufs as well. In fact, the Gmail API folks did just that, causing me to follow them all the way down the rabbit hole.

All in all, it’s a rough ride but it can be done. And again, to be clear, you almost certainly don’t want to.

Talking to Google’s REST API endpoints

First up, I wanted to get a sense of how the Gmail API worked. I used my new favorite HTTP exploration tool, Paw, to start exploring endpoints. I’m going to skip right over the pains of setting up apps using the Google developer console, and skip OAuth as well (the guide on OAuth 2 from a webserver worked well).

It’s obvious that Google would rather we all just use their SDKs. The docs and tutorials are entirely set up around one of the individual languages. For the Gmail API, only the very first page of the API reference mentions anything about the REST endpoints. This is what I used to test that I could at least get responses in JSON.

GET https://www.googleapis.com/gmail/v1/users/me/threads

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8

{
  "threads": [
    {
      "id": "1473cacbd4d2f10d",
      "snippet": "",
      "historyId": "8487586"
    },
  ...
  ],
  "nextPageToken": "09120543318741434532",
  "resultSizeEstimate": 226
}

Getting protobufs from the API

This isn’t documented anywhere at all. In fact, I only found this buried in the source of the Python SDK. In order to get protobufs from the API, you need to add this to your API calls:

?alt=proto

Yep, that’s it. With that, the above call transformed into something less familiar.

GET https://www.googleapis.com/gmail/v1/users/me/threads?alt=proto

HTTP/1.1 200 OK
Content-Disposition: attachment
Content-Type: application/x-protobuf

 ...binary...

Cool!
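For anyone who would rather script this than click around in Paw, here is a minimal Python sketch of the same request using only the standard library. The helper names (with_alt_proto, build_request, fetch_proto) are made up for illustration, and the access token is assumed to come from whatever OAuth 2 flow you already have working.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def with_alt_proto(url):
    """Return url with alt=proto set in its query string."""
    parts = urlsplit(url)
    # Drop any existing alt= parameter so we don't send two of them.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "alt"]
    query.append(("alt", "proto"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_request(url, access_token):
    """Build (but do not send) an authorized GET request for protobuf output."""
    return Request(
        with_alt_proto(url),
        headers={"Authorization": "Bearer " + access_token},
    )


def fetch_proto(url, access_token):
    """Send the request and return the raw, still-encoded response bytes."""
    with urlopen(build_request(url, access_token)) as response:
        return response.read()
```

Calling fetch_proto("https://www.googleapis.com/gmail/v1/users/me/threads", token) should hand back the same binary blob shown above, assuming the token is valid.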

Doing something useful with protobufs

Herein lies the rub. In order to actually handle protobuf data, you’re supposed to have access to the .proto file that defines how the data is laid out. Gmail doesn’t publish the .proto definition for their API. In fact, I couldn’t find a single Google API that does publish a .proto file. The only Google .proto file I could find was for unofficially accessing the Google Play store. Remember when I said no one uses protobufs?

But I’d come too far to be turned away. At this point, with some advice from a fellow Dropboxer, I set to trying to write a .proto file by hand. Turns out there are a few resources that make this possible, though it’s not something I’d ever want to have to do regularly.

The Google API Discovery API

It’s probably not surprising, but Google has an API just for discovering the details of its other APIs, called the Discovery API. I was hoping this would actually programmatically return .proto files but no such luck. Instead, it returns a JSON schema for every Google API. This is actually what Google uses to generate code for all of their SDKs.

For my list threads example, it gave a reasonably detailed response that looked almost like what I needed, but not quite.

GET https://www.googleapis.com/discovery/v1/apis/gmail/v1/rest

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8

...

  "ListThreadsResponse": {
   "id": "ListThreadsResponse",
   "type": "object",
   "externalTypeName": "caribou.api.proto.ListThreadsResponse",
   "properties": {
    "nextPageToken": {
     "type": "string",
     "description": "Page token to retrieve the next page of results in the list."
    },
    "resultSizeEstimate": {
     "type": "integer",
     "description": "Estimated total number of results.",
     "format": "uint32"
    },
    "threads": {
     "type": "array",
     "description": "List of threads.",
     "items": {
      "$ref": "Thread"
     }
    }
   }
  }

...

Sadly, .proto files have a catch: every field must be assigned a unique numbered tag, and it’s that number, not the field name, that identifies the field in the encoded data. The Discovery service does not return the tag numbers, and it turns out the properties aren’t listed in tag order either. I needed to try another route.

Decoding protobufs

To decode protobufs on the command line, I installed protoc. The protocol buffers docs link to a Windows binary, but I used brew to install protoc on my Mac.

We discovered (thanks, Kannan!) that protoc has a handy flag, --decode_raw, which lets protoc unbundle a protobuf stream and identify each field by its tag number. I used curl to pipe in the response:

$ curl -X GET "https://www.googleapis.com/gmail/v1/users/me/threads?alt=proto" \
-H "Authorization: Bearer ..." | protoc --decode_raw

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2725    0  2725    0     0   5745      0 --:--:-- --:--:-- --:--:--  5736

1 {
  1: "1473d91500a2b818"
  2: ""
  3: 8488797
}
...
2: "11516987623344274775"
3: 225

Behold, I had my tag numbers.
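To make it a bit clearer what --decode_raw is working with, here is a rough Python sketch of the same idea. It is not protoc’s implementation, just a minimal reader for the protobuf wire format: each field starts with a varint key holding the tag number and a wire type, followed by the value. The function names are my own, and the sample bytes are hand-assembled to resemble the response above rather than captured from the API.

```python
def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_raw(data):
    """Split a serialized message into (tag_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        tag, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint (ints, bools, enums)
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((tag, wire_type, value))
    return fields


# Hand-assembled sample: one thread (tags 1, 2, 3) plus a top-level tag 3.
thread = b"\x0a\x10" + b"1473d91500a2b818" + b"\x12\x00" + b"\x18\xdd\x8e\x86\x04"
message = b"\x0a" + bytes([len(thread)]) + thread + b"\x18\xe1\x01"
```

Running decode_raw(message) yields the outer fields, and running decode_raw on the bytes of field 1 yields the thread’s own tags. Unlike protoc, this sketch does not guess whether a length-delimited value is a string or a nested message; you have to recurse yourself.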

Putting it together

With that, I could build the simple .proto file needed to parse the response to this request.

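// gmail.proto: tag numbers come from the --decode_raw output above.
syntax = "proto2";

message ListThreadsResponse {
  repeated Thread threads = 1;
  optional string nextPageToken = 2;
  optional uint32 resultSizeEstimate = 3;  // uint32 per the Discovery schema
}

message Thread {
  optional string id = 1;
  optional string snippet = 2;
  optional int64 historyId = 3;
}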

With that, the fields are now labeled properly.

$ curl -X GET "https://www.googleapis.com/gmail/v1/users/me/threads?alt=proto" \
-H "Authorization: Bearer ..." | protoc --decode=ListThreadsResponse gmail.proto 

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2725    0  2725    0     0   5745      0 --:--:-- --:--:-- --:--:--  5736

threads {
  id: "1473d91500a2b818"
  snippet: ""
  historyId: 8488797
}
...
nextPageToken: "11516987623344274775"
resultSizeEstimate: 225

We experimented with writing a hacky script to contact the Discovery API and use the schema and properties it returned to lay out the framework for a .proto file, but there’s still a manual step to fill in the tag numbers.
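This is not the script we wrote, but a minimal Python sketch of the idea might look like the following. It takes one schema object from the Discovery response and emits a message skeleton with a TAG_TODO placeholder where each tag number has to be filled in by hand. The type mapping covers only the handful of types seen in this post and is my own guess at a sensible mapping; all names here are hypothetical.

```python
# Assumed mapping from Discovery (type, format) pairs to .proto scalar types.
SCALAR_TYPES = {
    ("string", None): "string",
    ("boolean", None): "bool",
    ("integer", "int32"): "int32",
    ("integer", "uint32"): "uint32",
}


def proto_type(prop):
    """Pick a .proto type name for a single Discovery property."""
    if "$ref" in prop:  # reference to another schema, i.e. a nested message
        return prop["$ref"]
    key = (prop.get("type"), prop.get("format"))
    if key not in SCALAR_TYPES:
        raise ValueError("no mapping for %r" % (key,))
    return SCALAR_TYPES[key]


def field_line(name, prop):
    """Render one field; arrays become repeated fields of their item type."""
    if prop.get("type") == "array":
        label, item = "repeated", prop["items"]
    else:
        label, item = "optional", prop
    return "  %s %s %s = TAG_TODO;" % (label, proto_type(item), name)


def message_skeleton(schema):
    """Render a message block with placeholder tags for one schema object."""
    lines = ["message %s {" % schema["id"]]
    for name, prop in sorted(schema["properties"].items()):
        lines.append(field_line(name, prop))
    lines.append("}")
    return "\n".join(lines)


# The ListThreadsResponse schema from the Discovery excerpt earlier in the post.
list_threads_schema = {
    "id": "ListThreadsResponse",
    "type": "object",
    "properties": {
        "nextPageToken": {"type": "string"},
        "resultSizeEstimate": {"type": "integer", "format": "uint32"},
        "threads": {"type": "array", "items": {"$ref": "Thread"}},
    },
}
```

Printing message_skeleton(list_threads_schema) gives a message block with the three fields in alphabetical order. The output deliberately won’t compile until each TAG_TODO is replaced with a number read off --decode_raw.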

That’s a wrap

All in all, it took longer than expected but was certainly educational. The next logical step would be to use Protocol Buffers’ built-in code generation to produce client SDK code. Instead, I took my own advice from the beginning. JSON works just fine.

2013 was the year of subscriptions

28 Dec 2013

Looking back at my purchases this year, I realized 2013 appears to be the year when subscription services finally crossed my personal tipping point.

What would I do without Amazon Prime?
I finally converted to Pandora One last year, but I’ve added Rdio to the mix as well. As a result, I’ve effectively stopped buying albums with the exception of the odd indie band that hasn’t made it onto Rdio yet. There, I’m using Google Music as my music locker, which thankfully added an iOS app to their stable this year.
My roommates and I split subscriptions to Netflix, Hulu Plus, Amazon Prime Instant Video, and HBO GO covering any TV content we want perfectly and movies relatively well. As a result, there’s no need to even consider any less reputable channels for video. And Google Chromecast has been a quantum leap in viewing experience; my media center Mac Mini and Roku are collecting dust outside of Amazon content. They’re so cheap, they make awesome gifts too. I’ve bought a half dozen at this point.
I effectively have a Dropbox subscription, though being employed by them makes me strongly biased. I did cancel my Backblaze subscription, though, and went back to an external hard drive.
I fell in love with IRL subscriptions too. Sock Panda makes me happy and I’ve added Mistobox for coffee to the list as well. I experimented with Trunk Club but wasn’t really happy with the experience (same thing happened with Dollar Shave Club last year). I’m excited to find a similar service that solves the experience concerns in the not-too-distant future though.
Conspicuously missing is any subscription to pay for a mobile app. IAP subscriptions are available, so I’m sure it’s just a matter of time.

So what changed? I suspect a mixture of factors rather than one primary cause. A wider range of services with more complete offerings, and more confidence in their ability to deliver, definitely helped. A personal premium on time this year almost certainly contributed as well. Still, I’m very surprised by how quickly my attitude and adoption changed. Anyone know if there’s any macro trend data on the adoption of subscription services in general?

Unattributed thoughts on company culture

16 Aug 2013

Can’t attribute as it came to me off-the-record, second-hand, and not verbatim, but too pithy not to capture:

“All company cultures suck. As the CEO, you get to choose the ways it sucks.”

“Your company’s values are trade-offs. The best indicator of a really great value is that it could be reasonably argued against.”

A smartphone vacation in Indonesia

30 Jul 2013

I just wrapped up a two-week vacation to Indonesia. I took a factory-reset Nexus S along with me as my internet lifeline. It’s been a few years since I last vacationed internationally and the entire experience was radically different this time, thanks substantially to massive penetration of high-speed internet and smartphones. A few notes and observations on using both from my trip:

Internet

I purchased an Indosat SIM card with 6GB of 3G data for the equivalent of $7.50 USD (it also included a few SMS and minutes for coordinating with drivers, the original purpose)
Free wifi blanketed us everywhere we went: airports, train stations, every hotel/hostel, and most restaurants. I have 19 wifi networks remembered.

Apps installed

Facebook
Twitter
Dropbox – Used favorites feature to cache itineraries, tickets, and receipts offline
Rdio – Offline caching worked great
Kindle
Instagram
Hipmunk
FlightAware – Not as good as FlightTracker Pro on iOS but helpful for our Eva Air flights
Indonesia Flights – Really great flight aggregator for a lot of the domestic Indonesian airlines
foursquare – Surprisingly popular in Indonesia (Jakarta airport was “swarming” when I landed). Great for finding restaurants
TripAdvisor – Good content but app is just a website wrapper and sucks
Tap & Say – Helpful phrase book, but despite what the description says, still requires an internet connection. Wouldn’t buy again.
Google Sky – Cool toy when spending evenings outside civilization
Skype WiFi – Useless. Need to research a better wifi SIP app

Android

Used 4.3 on a relatively old phone (picked it because it had a traditional SIM slot). The experience was slow and got slower with more apps, battery life was short, and many apps performed poorly in low-connectivity conditions.

Technology in Indonesia

BlackBerry and candy bar phones still popular
Tons of tablet use among tourists
Nearly every restaurant had its own Facebook, Twitter, and email address (Yahoo popular, occasionally Gmail)
WhatsApp, Line, and other messaging apps heavily advertised/bundled with telco packages

On Google Apps scrapping the free offering

09 Dec 2012

From ReadWrite:

In a bid to make things “very straightforward,” Google axed the basic Apps plan, which offered free email, calendaring and documents, plus 5GB of generic Google Drive storage, to up to 10 users per month. All companies will now have to pay the $50 per year cost of Google Apps for Business.

The change garnered a lot of coverage, but it isn’t that surprising. Google’s already convinced everyone that the service is worth (at least) the $50/year. While not surprising, it is exciting. I’m excited to see whether Google’s shift away can unblock some innovation in the email space.

When Gmail launched, it dominated a lot of crappy incumbents. Sadly though, now Gmail is the old king that isn’t innovating. While there’s a fair amount of innovation on the email client side (Mailbox, Mail Pilot, zeromail, and the late Sparrow), they’re all still relying on Gmail as a backend because the free Google Apps offering essentially sucked the oxygen out of the room for any company building a simple, hosted email service for personal domains. Now that Google isn’t offering custom domain email for free, there’s a much more attractive business model available for one of these clients to jump on. Bonus points if a startup could build a rev share model that would financially support any of the clients.


CS and the City Sean Lynch