Hello There, Guest!
View New Posts  |  View Today's Posts
[C] Google Protocol Buffers - Decoding VarInt

  • 0 Vote(s) - 0 Average

09-05-2015, 11:58 PM #1
Posts: 9,733 Threads:1,026 Joined: Jun 2011 Reputation: 76

Google Protocol Buffers - Decoding VarInt
Here's a bit of C code that I wrote to demonstrate decoding a binary stream encoded as a varint based on the google protocol buffers documentation.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BUF 2048

#ifdef _MSC_VER
typedef __int8  int8;
typedef __int16 int16;
typedef __int32 int32;
typedef __int64 int64;
typedef unsigned __int8  uint8;
typedef unsigned __int16 uint16;
typedef unsigned __int32 uint32;
typedef unsigned __int64 uint64;
#else /* !_MSC_VER */
typedef int8_t  int8;
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;
typedef uint8_t  uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;

#define encode_sint32(n) (((n) << 1) ^ ((n) >> 31))
#define encode_sint64(n) (((n) << 1) ^ ((n) >> 63))
#define decode_sint(n) (((n) >> 1) ^ (-((n) & 1)))

#define decode_varint(T) decode_varint_f##T
#define def_decode_varint(T)                                       \
  void *decode_varint_f##T(const unsigned char *pbytes,             \
                          void *pbuf, int sint_flag)               \
  {                                                                \
    int e = 0;                                                     \
    T *varint = (T *)pbuf;                                         \
    *varint = 0;                                                   \
    for (;;)                                                       \
    {                                                              \
      *varint |= (*pbytes & 0x7F) << e;                            \
      if (!((*pbytes >> 7) & 1)) break;                            \
      e += 7;                                                      \
      ++pbytes;                                                    \
    }                                                              \
    if (sint_flag)                                                 \
    {                                                              \
      *((T *)pbuf) = decode_sint(*((T *)pbuf));                    \
    }                                                              \
    return pbuf;                                                   \


#define decode_varint_int32(pbytes, pbuf) decode_varint(int32)(pbytes, pbuf, 0)
#define decode_varint_int64(pbytes, pbuf) decode_varint(int64)(pbytes, pbuf, 0)
#define decode_varint_sint32(pbytes, pbuf) decode_varint(int32)(pbytes, pbuf, 1)
#define decode_varint_sint64(pbytes, pbuf) decode_varint(int64)(pbytes, pbuf, 1)

#define WIRE_TYPE_INVALID              -1
#define WIRE_TYPE_VARINT              0x0 /* int32, int64, uing32, uint64, sint32, sint64, bool, enum */
#define WIRE_TYPE_64BIT               0x1 /* fixed64, sfixed64, double */
#define WIRE_TYPE_LENGTH_DELIMITED    0x2 /* string, bytes, embedded messages, packed repeated fields */
#define WIRE_TYPE_START_GROUP         0x3 /* groups (deprecated) */
#define WIRE_TYPE_END_GROUP           0x4 /* groups (deprecated) */
#define WIRE_TYPE_32BIT               0x5 /* fixed32, sfixed32, floag */

#define GET_WIRE_TYPE(v) ((v) & 0x7)
#define GET_FIELD_NUM(v) ((v) >> 0x3)

int main(void)
  unsigned char bytes[] = { 0x08, 0x96, 0x01 },
                          *pbyte = bytes;
  int wire_type, field_number;
  int key = *pbyte++;

  unsigned char buf[MAX_BUF] = { 0 };
  printf("wire_type: %d\n", (wire_type = GET_WIRE_TYPE(key)));
  printf("field_number: %d\n", (field_number = GET_FIELD_NUM(key)));

  if (wire_type == WIRE_TYPE_INVALID)
    fputs("ERROR: Invalid wire_type\n", stderr);

  /* testing */
  decode_varint_int32(pbyte, buf);
  printf("int32 varint: %d\n", *((int *)buf));
  decode_varint_int64(pbyte, buf);
  printf("int64 varint: %d\n", *((int *)buf));
  decode_varint_sint32(pbyte, buf);
  printf("sint32 varint: %d\n", *((int *)buf));
  decode_varint_sint64(pbyte, buf);
  printf("sint64 varint: %d\n", *((int *)buf));


There currently is no parsing for .proto file message formats, and I'm also missing the encoding/decoding of a few other wire types, and probably the last couple subtypes of the varint wire type. The problem here without parsing the .proto file is that from a binary stream there is no way to tell whether a varint was originally intended to be an int32/64 or an sint32/64. This is an issue in this case because you'll never know whether you have to decode the zigzag encoding or not, and you will end up with a completely different decoded value than what the original value was if you determine it to be the wrong type. This is mainly the case with similar datatypes, but between entirely different wire types, it is sometimes easy to determine the exact type based on the byte stream, as long as there is no ambiguity when it comes down to a possibility of more than a single data type that matches the byte stream structure. However, I only wrote this bit of code to demonstrate the process of decoding the byte stream, and specifically for varints: int32, int64, sint32, and sint64.

More information here: https://developers.google.com/protocol-b...s/encoding
This post was last modified: 12-01-2015, 09:22 PM by AceInfinity. Edit Reason: Updated code as per Richard

Microsoft MVP .NET Programming - (2012 - Present)
®Crestron DMC-T Certified Automation Programmer

Development Site: aceinfinity.net

 ▲ ▲

11-28-2015, 03:21 PM #2
Junior Member
Posts: 1 Threads:0 Joined: Nov 2015 Reputation: 0

RE: Google Protocol Buffers - Decoding VarInt
I believe the statement:
e |= 7;
should be
e += 7;

The reason for this is that in the "for" loop the statement:
*varint |= (*pbytes & 0x7F) << e;
is supposed to place each of the extracted 7 bits from each encoded byte higher and higher in varint. With e |= 7 this will work for the first two pbytes through the loop since e=0 the first time and e=7 the second time, but for the third and subsequent pbytes e will be stuck at 7 (so varint will only get filled up to 14 bits) since sequential e |=7 keep e at 7 when instead e should be incrementing by 7 each time in the loop.

As a reference to similar code that uses this kind of approach, there is the following:
that uses the following macro for decoding a varint:

#define CHNM_DCDE_VARINT(shifter, buf, val, idx) \
shifter = 0, val = 0; \
do \
{ \
idx++; \
val |= (buf[idx] & 0x7F) << shifter; \
shifter += 7; \
} \
while (buf[idx] & 0x80); \

where the variable "shifter" is used in the same way as the variable "e". The original macro in this thread is more efficient than the above macro since the original with its "for" loop and "if" check avoid the extra shift increment (or or'ing in the original code).
This post was last modified: 11-28-2015, 07:41 PM by Richard. Edit Reason: changed "of" to "from"

12-01-2015, 09:19 PM #3
Posts: 9,733 Threads:1,026 Joined: Jun 2011 Reputation: 76

RE: Google Protocol Buffers - Decoding VarInt
That is correct. With my initial intention e |= 7 should be changed to e += 7. I was tired when you initially PM'ed me about this but you are right.

edit: That link was cool to read. I was unaware that anybody else actually did this.
Quote:Finally, CHNM_DCDE_VARINT() is declared as a macro in order to ensure the code is placed inline and obviate the overhead a function call would require

That was my thought as well. Otherwise I would have used a function. I guess my structure was just more thought out. I try to avoid loops with redundant arithmetic operations as much as possible which is why I wrote it that way originally.

Well done!
This post was last modified: 12-01-2015, 09:30 PM by AceInfinity.

Microsoft MVP .NET Programming - (2012 - Present)
®Crestron DMC-T Certified Automation Programmer

Development Site: aceinfinity.net

 ▲ ▲

12-02-2015, 04:42 AM #4
Global Moderators
Posts: 4,863 Threads:495 Joined: Jun 2011 Reputation: 67

RE: Google Protocol Buffers - Decoding VarInt
Nice work Ace, again your C/C++ level is so far above mine. I haven't touched this stuff in a few years. I'd like to write lower level stuff, but again.. time is my enemy.

Forum Jump:

Possibly Related Threads...
Thread Author Replies Views Last Post
   Google CodeJam 2010 Round 1C - Problem A. Rope Intranet AceInfinity 1 2,949 03-24-2014, 12:00 AM
Last Post: AceInfinity
   C Style String Encoding & Decoding AceInfinity 0 2,180 05-07-2013, 12:10 AM
Last Post: AceInfinity

Users browsing this thread: 1 Guest(s)