View on GitHub

Quorten Blog 1

First blog for all Quorten's blog-like writings

Long integer arithmetic: When you compute with integers larger than a machine word, you typically need to use assembly language so that you can access the carry bits generated by the machine arithmetic instructions. But, RISC-V does not a have a carry bit? What, that changes everything. Okay, okay, if we do not need to make assumptions that the underlying hardware generates carry bits, then we might as well just do it all in C.

So, how do we generate a carry bit in software? All I’ve heard thus far is “it’s easy,” so let me try my own derivation, then I’ll see what I can find on the Internet more specifically.

Here is your function table:

A = operand A
B = operand B
R = result
C = carry flag
V = overflow flag

A B R C V
0 0 0 0 0
0 0 1 0 1
0 1 0 1 0
0 1 1 0 0
1 0 0 1 0
1 0 1 0 0
1 1 0 1 1
1 1 1 1 0

V = ~(a ^ b) ^ r
C = ((a | b) & ~r) | (a & b)

Okay, overflow looks quite easy to compute, but I don’t know about carry. Let’s try to understand this better.

C = (a & ~r) | (b & ~r) | (a & b & r)
C = (a & ~r) | (b & ~r) | (a & b & r)

Okay, let’s take a step backwards. How is r fundamentally computed?

r = a ^ b ^ carry-in

So:

n = carry-in
n = a ^ b ^ r

C = (a & b) | (a & n) | (b & n)
C = (a & b) | ((a | b) & n)

C = (a & b) | ((a | b) & (a ^ b ^ r))

a ^ b = (a | b) & ~(a & b)
a ^ b = (a | b) & (~a | ~b)
a ^ b = ~(~a & ~b) & ~(a & b)
a ^ b = ~((~a & ~b) | (a & b))
a ^ b = ~a ^ ~b

(a | b) & (a ^ b ^ r)
= ((a | b) & (a ^ b)) ^ ((a | b) & r)
= (a ^ b) ^ ((a | b) & r)

Alright, let's think about this in two cases.  r = 0:
=> (a ^ b)

r = 1:
=> (a ^ b) ^ (a | b) = a & b

Okay, so after looking at that, all I can say is my final simplifications come from looking directly at the truth table rather than manipulating the Boolean equations.

One final trick for simplifying the computation in software. Rather than needing to compute an actual logical OR, you can store the bits in a register. If the register is “not zero,” then one of the bits is set, so that’s the same as a logical OR.

So, what does an add with carry look like in C?

struct ADC32Op_tag
{
  int32 a;
  uint8 c;
};
typedef struct ADC32Op_tag ADC32Op;

/* N.B.: If we can't shift right by one bit because the CPU doesn't
   have the instruction, we can subtract by one and mask to obtain the
   same result, so long as the bit is not zero.  */
ADC32Op *adc(ADC32Op *a, ADC32Op *b)
{
  a->c = (a->a & b->a) >> 1;
  a->c |= (a->a | b->a) & 0x80000000;
  a->a += b->a + b->c;
  a->c &= ~(a->a & 0x80000000);
  a->c = (a->c != 0);
  return a;
}

Although it’s only bit-wise operations, it still looks pretty complicated in terms of the number of steps and the fact that we need a whole nother register to compute the carry. And the fact that we need to be able to return two machine integer results, if you’re using an old C compiler implementing the older C standard you cannot pass data structures as first-class objects.

And whoah, wait, what did I also just mention there? A software equivalent for shifting right when your CPU doesn’t have the instruction. Problem solved in a restricted context: if you only have a single bit in a known location to shift right, do this.

/* Shift the most significant bit right by one in software when CPU
   shift right instructions are lacking.  */
int32 shr32_msb_1(int32 a)
{
  int32 r = a & 0x80000000;
  r = (r - (r != 0)) & 0x40000000;
  return r;
}

/* Another method: */
int32 shr32_msb_1(int32 a)
{
  int32 r = a & 0x80000000;
  /* Use this trick to generate a register-wide bit mask:
     mask = ((int)logic << 31) >> 31;
     mask = ((int)!logic) - 1;
  */
  int32 mask = (r == 0) - 1;
  r -= 0x40000000 & mask;
  return r;
}

With that primitive, we just need to team that together multiple times to accumulate the result of shifting an integer right by one bit. Alternatively, since we can only test in a single position, we could just use conditionals and constants to return the result.

/* Shift the most significant bit right by one in software when CPU
   shift right instructions are lacking.  */
int32 shr32_msb_1(int32 a)
{
  if ((a & 0x80000000) == 0)
    return 0;
  return 0x40000000;
}
/* Table-based method: */
int32 shr32_msb_1(int32 a)
{
  static int32 results[2] = { 0, 0x40000000 };
  return results[(a & 0x80000000) != 0];
}

Let’s improve on the table method, see what we can get.

int32 shr32_msb_1(int32 a)
{
  static int32 results[2] = { 0, 0x40000000 };
  return results[(a & (results[1] << 1)) != 0];
}

/* Shift `a` right by `n` bits.  */
int32 shr32(int32 a, uint8 n)
{
  static int32 results[32] =
    { 0x80000000, 0x40000000, 0x20000000, 0x10000000,
      0x08000000, 0x04000000, 0x02000000, 0x01000000,
      0x00800000, 0x00400000, 0x00200000, 0x00100000,
      0x00080000, 0x00040000, 0x00020000, 0x00010000,
      0x00008000, 0x00004000, 0x00002000, 0x00001000,
      0x00000800, 0x00000400, 0x00000200, 0x00000100,
      0x00000080, 0x00000040, 0x00000020, 0x00000010,
      0x00000008, 0x00000004, 0x00000002, 0x00000001,
    };
  int32 r = 0;
  uint8 i, i_n;
  for (i = 0, i_n = n; i_n < 32; i++, i_n++) {
    /* Use this trick to generate a register-wide bit mask:
       mask = ((int)logic << 31) >> 31;
       mask = ((int)!logic) - 1;
    */
    int32 test_mask = ((a & results[i]) == 0) - 1;
    r |= results[i_n] & test_mask;
  }
  /* Most significant bit is set from carry flag, if applicable.  */
  /* If applicable, final bit (a & 1) goes in carry flag when shifting
     right by one.  */
  return r;
}

As you can see, once you start with the table method, it is really easy to generalize it to shift right by n bits, and even shift left by n bits. It is, of course, a lot slower than shifting left by one bit, but at least it can operate on wide integers without an unwieldy-sized table.

Now, for my Internet search on computing the carry bit in software? What did I find? The results of the searching process were dismal. It turns out my manual derivation was just as well the fastest means for me to answer this question.

20200413/DuckDuckGo compute carry bit in software
20200413/DuckDuckGo “compute” carry bit in software
20200413/https://en.wikipedia.org/wiki/Carry_flag
20200413/DuckDuckGo risc-v carry bit
20200413/https://en.wikipedia.org/wiki/RISC-V

Somewhere on Quora there was a note that RISC-V doesn’t have an instruction to count leading zeros, making a lot of math code more difficult. This article deosn’t seem to mention that, but I’m putting it here since it’s an article from Quora nonetheless.

20200413/https://www.quora.com/What-is-RISC-V?share=1

Not only that, but RISC-V also lacks bit scanning instructions? What the heck, I’m forced to use my software alternative for bit scanning even on modern RISC-V CPUs? Well, looks like it. It doesn’t really make sense either, since modern spell checking dictionaries pretty much require bit scanning instructions for optimal performance, and that’s pretty well a mass market use of that feature. Were it specific to advanced mathematical code, I see how it could be omitted, advanced math needs to do many things that aren’t typical in operating system and commodity application software computations.

Important! This Wikipedia article on “find first set” lists lots of good information on support for this function in machine instructions for different architectures. Also, the POSIX library has the function ffs.

20200415/https://en.wikipedia.org/wiki/Find_first_set