Here's how to get the same result from both:
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
assertEquals("79267961763742113019008347020647561319",
new BigInteger(mm3_be).toString());
If anyone is interested in the reverse answer, converting the python output to the Java output:
import mmh3
import string
char_array = '0123456789abcdef'
mumrmur = mmh3.hash_bytes('abc')
result = [f '{string.hexdigits[(char >> 4) & 0xf]}{string.hexdigits[char & 0xf]}'
for char in mumrmur
]
print(''.join(result))
I've tried three version of murmurhash in java(jedis and guava), go and python. The result of java(guava),go and python version output same hash code but different with java(jedis). All the murmurhash code are shown as follow. I'm confused about the result. I've seen this issue and use Long.reverseBytes in java, and still different with others. So what should I do to make all the output of murmurhash keep same. Thanks~,The difference is due to different versions of murmur. Jedis uses, up to this day, Murmur2 (see source code and documentation), while the other above mentioned libraries use Murmur3.,Besides looking at the comments/code, I also verified this by using the Murmur2 reference implementation. Using the same seed and key leads to the exact same results as your Jedis example.,java gradle compile group: 'redis.clients', name: 'jedis', version: '3.1.0'
java gradle compile group: 'redis.clients', name: 'jedis', version: '3.1.0'
import redis.clients.jedis.util.MurmurHash;
MurmurHash murmurhash = new MurmurHash();
long h = murmurhash.hash("foo");
System.out.println(h);
System.out.println(Long.reverseBytes(h));
output:
-7063922479176959649 6897758107479832477
2. golang version
import "github.com/spaolacci/murmur3"
foo: = int64(murmur3.Sum64WithSeed([] byte("foo"), 0x1234ABCD))
fmt.Println(foo)
java gradle compile group: 'com.google.guava', name: 'guava', version: '28.0-jre'
import com.google.common.hash.Hashing
long foo = Hashing.murmur3_128(0x1234ABCD).hashString("foo", charset.forName("UTF-8")).asLong();
System.out.println(foo);
Code Fragment:
const char * key = "foo";
uint64_t result = MurmurHash64A(key, std::strlen(key), 0x1234ABCD);
std::cout << " result (unsigned): " << result << std::endl;
std::cout << " result (signed): " << (long) result << std::endl;
std::cout << "reversed byte order: " << __builtin_bswap64(result) << std::endl;
Output:
result(unsigned): 11382821594532591967
result(signed): -7063922479176959649
reversed byte order: 6897758107479832477
MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1][2][3] It was created by Austin Appleby in 2008[4] and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants,[5] all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop. ,Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes. ,MurmurHash2[8] yields a 32-bit or 64-bit value. It came in multiple variants, including some that allowed incremental hashing and aligned or neutral versions. ,The current version is MurmurHash3,[6][7] which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher—a hash function test suite.
Algorithm[edit]
algorithm Murmur3_32 is
// Note: In this version, all arithmetic is performed with unsigned 32-bit integers.
// In the case of overflow, the result is reduced modulo 232.
input: key, len, seed
c1← 0xcc9e2d51
c2← 0x1b873593
r1← 15
r2← 13
m← 5
n← 0xe6546b64
hash← seed
for each fourByteChunk of key do
k← fourByteChunk
k← k× c1
k← k ROL r1
k← k× c2
hash← hash XOR k
hash← hash ROL r2
hash←(hash× m) + n
with any remainingBytesInKey do
remainingBytes← SwapToLittleEndian(remainingBytesInKey)
// Note: Endian swapping is only necessary on big-endian machines.
// The purpose is to place the meaningful digits towards the low end of the value,
// so that these digits have the greatest potential to affect the low range digits
// in the subsequent multiplication. Consider that locating the meaningful digits
// in the high range would produce a greater effect upon the high digits of the
// multiplication, and notably, that such high digits are likely to be discarded
// by the modulo arithmetic under overflow. We don't want that.
remainingBytes← remainingBytes× c1
remainingBytes← remainingBytes ROL r1
remainingBytes← remainingBytes× c2
hash← hash XOR remainingBytes
hash← hash XOR len
hash← hash XOR(hash >> 16)
hash← hash× 0x85ebca6b
hash← hash XOR(hash >> 13)
hash← hash× 0xc2b2ae35
hash← hash XOR(hash >> 16)
static inline uint32_t murmur_32_scramble(uint32_t k) {
k *= 0xcc9e2d51;
k = (k << 15) | (k >> 17);
k *= 0x1b873593;
return k;
}
uint32_t murmur3_32(const uint8_t * key, size_t len, uint32_t seed) {
uint32_t h = seed;
uint32_t k;
/* Read in groups of 4. */
for (size_t i = len >> 2; i; i--) {
// Here is a source of differing results across endiannesses.
// A swap here has no effects on hash properties though.
memcpy( & k, key, sizeof(uint32_t));
key += sizeof(uint32_t);
h ^= murmur_32_scramble(k);
h = (h << 13) | (h >> 19);
h = h * 5 + 0xe6546b64;
}
/* Read the rest. */
k = 0;
for (size_t i = len & 3; i; i--) {
k <<= 8;
k |= key[i - 1];
}
// A swap is *not* necessary here because the preceding loop already
// places the low bytes in the low places according to whatever endianness
// we use. Swaps only apply when the memory is copied in a chunk.
h ^= murmur_32_scramble(k);
/* Finalize. */
h ^= len;
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
static inline uint32_t murmur_32_scramble(uint32_t k) {
k *= 0xcc9e2d51;
k = (k << 15) | (k >> 17);
k *= 0x1b873593;
return k;
}
uint32_t murmur3_32(const uint8_t * key, size_t len, uint32_t seed) {
uint32_t h = seed;
uint32_t k;
/* Read in groups of 4. */
for (size_t i = len >> 2; i; i--) {
// Here is a source of differing results across endiannesses.
// A swap here has no effects on hash properties though.
memcpy( & k, key, sizeof(uint32_t));
key += sizeof(uint32_t);
h ^= murmur_32_scramble(k);
h = (h << 13) | (h >> 19);
h = h * 5 + 0xe6546b64;
}
/* Read the rest. */
k = 0;
for (size_t i = len & 3; i; i--) {
k <<= 8;
k |= key[i - 1];
}
// A swap is *not* necessary here because the preceding loop already
// places the low bytes in the low places according to whatever endianness
// we use. Swaps only apply when the memory is copied in a chunk.
h ^= murmur_32_scramble(k);
/* Finalize. */
h ^= len;
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
Python wrapper for MurmurHash (MurmurHash3), a set of fast and robust hash functions.,mmh3 is a Python wrapper for MurmurHash (MurmurHash3), a set of fast and robust non-cryptographic hash functions invented by Austin Appleby.,The results of hash64 and hash_bytes remain unchanged. Austin Appleby, the author of Murmurhash, ensured this revision was the final modification to MurmurHash3's results and any future changes would be to improve performance only.,Beware that hash64 returns two values, because it uses the 128-bit version of MurmurHash3 as its backend.
Install:
pip install mmh3 # for macOS, use "pip3 install mmh3" and python3
Quickstart:
>>> import mmh3 >>> mmh3.hash("foo") # returns a 32 - bit signed int - 156908512 >>> mmh3.hash("foo", 42) # uses 42 as a seed - 1322301282 >>> mmh3.hash("foo", signed = False) # returns a 32 - bit unsigned int 4138058784
Other functions:
>>> mmh3.hash64("foo") # two 64 bit signed ints(by using the 128 - bit algorithm as its backend) (-2129773440516405919, 9128664383759220103) >>> mmh3.hash64("foo", signed = False) # two 64 bit unsigned ints(16316970633193145697, 9128664383759220103) >>> mmh3.hash128("foo", 42) # 128 bit unsigned int 215966891540331383248189432718888555506 >>> mmh3.hash128("foo", 42, signed = True) # 128 bit signed int - 124315475380607080215185174712879655950 >>> mmh3.hash_bytes("foo") # 128 bit value as bytes 'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~' >>> import numpy as np >>> a = np.zeros(2 ** 32, dtype = np.int8) >>> mmh3.hash_bytes(a) b 'V\x8f}\xad\x8eNM\xa84\x07FU\x9c\xc4\xcc\x8e'
hash64
, hash128
, and hash_bytes
have the third argument for architecture optimization. Use True for x64 and False for x86 (default: True):
>>> mmh3.hash64("foo", 42, True)
(-840311307571801102, -6739155424061121879)
Beware that due to this revision, the result of 32-bit version of 2.1 is NOT the same as that of 2.0. E.g.,:
>>> mmh3.hash("foo") # in mmh3 2.0 - 292180858 >>> mmh3.hash("foo") # in mmh3 2.1 - 156908512
Last Updated : 28 Jun, 2022
We need k number of hash functions to calculate the hashes for a given input. When we want to add an item in the filter, the bits at k indices h1(x), h2(x), … hk(x) are set, where indices are calculated using hash functions.
Example – Suppose we want to enter “geeks” in the filter, we are using 3 hash functions and a bit array of length 10, all set to 0 initially. First we’ll calculate the hashes as follows:
h1(“geeks”) % 10 = 1 h2(“geeks”) % 10 = 4 h3(“geeks”) % 10 = 7
Again we want to enter “nerd”, similarly, we’ll calculate hashes
h1(“nerd”) % 10 = 3 h2(“nerd”) % 10 = 5 h3(“nerd”) % 10 = 4
The question is why we said “probably present”, why this uncertainty. Let’s understand this with an example. Suppose we want to check whether “cat” is present or not. We’ll calculate hashes using h1, h2 and h3
h1(“cat”) % 10 = 1 h2(“cat”) % 10 = 3 h3(“cat”) % 10 = 7
abound inserted abounds inserted abundance inserted abundant inserted accessible inserted bloom inserted blossom inserted bolster inserted bonny inserted bonus inserted bonuses inserted coherent inserted cohesive inserted colorful inserted comely inserted comfort inserted gems inserted generosity inserted generous inserted generously inserted genial inserted bluff is Probably already present cheater inserted hate inserted war is Probably already present humanity inserted racism inserted hurt inserted nuke is Probably already present gloomy is Probably already present facebook inserted geeksforgeeks inserted twitter inserted
abound inserted abounds inserted abundance inserted abundant inserted accessible inserted bloom inserted blossom inserted bolster inserted bonny inserted bonus inserted bonuses inserted coherent inserted cohesive inserted colorful inserted comely inserted comfort inserted gems inserted generosity inserted generous inserted generously inserted genial inserted bluff is Probably already present cheater inserted hate inserted war is Probably already present humanity inserted racism inserted hurt inserted nuke is Probably already present gloomy is Probably already present facebook inserted geeksforgeeks inserted twitter inserted