Proposal to extend binary operators in MySQL

In order to make it easier to work with data stored as binary strings (BINARY/VARBINARY) we are considering extending the &,|,^,~,<<,>> operators, to accept any-size binary strings and return binary string data as response. This can be useful for several complex data types not covered by the basic SQL data types (e.g. working with IPV6 addresses, manipulating UUID, etc).

Motivation

Let’s say we’re interested in getting all the networks that contain the given IP address. With ipv4 the common practice is to store the IP addresses as INT and execute:

SELECT inet_ntoa(network) AS network, inet_ntoa(netmask) AS netmask FROM network_table WHERE (inet_aton('192.168.0.30') & netmask) = network;

At the moment you are not able to do the same with ipv6 because inet6_aton('2001:0db8:85a3:0000:0000:8a2e:0370:7334') & netmask converts both operands from VARBINARY to BIGINT resulting in data truncation and when the & operation gets executed, the result is incorrect. But, if the & operator could have worked directly on BINARY/VARBINARY data, this would have been possible:

SELECT inet6_ntoa(network) AS network, inet6_ntoa(netmask) AS netmask FROM network_table WHERE (inet6_aton('2001:0db8:85a3:0000:0000:8a2e:0370:7334') & netmask) = network;

The SQL standard does not define bitwise operations over any-size binary string data but it does define binary string comparison:

All binary string values are comparable. When binary large object string values are compared, they shall have exactly the same length (in octets) to be considered equal. Binary large object string values can be compared only for equality. For binary string values other than binary large object string values, it is implementation-defined whether trailing X’00’s are considered significant when comparing two binary string values that are otherwise equivalent.

Thus, the standard allows binary strings to be zero-padded in the least significant part for comparisons (right-side). If you’re interpreting binary data as a hexadecimal unsigned integer, you would expect the operand with smaller size to be zero-padded to the left side. So an easy approach to avoid confusion would be to allow the operators(^,&,|) to only work with same size operands, thus avoiding any confusion over whether padding occurs in most or least significant part.

Another aspect to be mentioned is that the old behavior with INT arguments would be preserved. Example SELECT 23 & 4 would still return a numeric BIGINT response: 4.

MySQL has two ways of representing hexadecimal string literals: x'val' and 0xval (where val contains hexadecimal digits 0-9, a-f, A-F). The difference between the two is that the first is SQL standard and has a constraint: the number of hexadecimal digits must be even, the second version is not standard and does not require an even number of characters (it will be zero-padded on the left side in case of even number of characters). But there is one issue, in numeric contexts, hexadecimal values act like integers (64-bit precision) and in string contexts, they act like binary strings. So currently when executing SELECT x'10000001' & x'00000001' the operands get converted from VARBINARY to BIGINT(int64), with loss of any parts beyond 64 bits, and this returns BIGINT; with our change, this would return BINARY(4), breaking existing applications. That’s something that can be solved with a new sql_mode named BINARY_BIT_OPS_YIELD_BINARY, off by default for backward-compatibility; if turned on, bit operation on binary strings will yield BINARY result (or an error if operators are not of the same size).

The alternative would be to introduce new functions. Example bin_and(x,y), bin_or(x,y), bin_xor(x,y) that can take two binary string arguments of same length and return binary string. We are considering also other names like binary_and(x,y), binary_or(x,y), binary_xor(x,y).

Pros:

  • No compatibility issues
  • Bitwise operators (&, |, ^) would remain operators that yield integer results (preserving existing functionality), whereas the new functions would yield binary string results

Cons:

  • It’s longer to type bin_and()/binary_and() than &
  • This creates new syntax for users to learn
  • bit_and(x) already exists, it’s an aggregate function; we fear this could confuse users, though, fortunately both functions do a similar thing (they AND bits).

Please let us know in a comment below what are your opinions on this:

  • Is it a good idea to implement bitwise operations for binary strings?
  • Is the BINARY_BIT_OPS_YIELD_BINARY sql_mode necessary?
  • Can you think of other use cases where this can be useful (UUID handling is one such case)?

Thanks to Catalin Besleaga on the optimizer team for ghost writing this post.