1193: Externalities

This forum is for the individual discussion thread that goes with each new comic.

Moderators: Moderators General, Prelates, Magistrates

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Wed Apr 03, 2013 5:36 pm UTC

Not a forum regular, but I wrote one of the multi-threaded C brute forcers. You can find it on github under my username 'cemeyer', repository 'xkcd-skein-brute' (the forum is flagging direct links to it as spam? wtf).

With -O3 on my local Core i7 it can get about 1 million hashes/sec on a single core (with TurboBoost; I think it's more like 700k/s per core if all cores are running). I could never get the x64 assembly skein implementation working, so it just uses the "Optimized 64-bit" version. (The asm implementation was more like 1.3 Mhash/s single-core, but it generated incorrect hashes for me. Go figure.)

It starts a compile-time constant number of threads, each of which generates a random starting string based on /dev/urandom and the a-zA-Z0-9 character set, iterates for a few million hashes by incrementing the string (base 62, ASCII order), and then after a few million hashes jumps to a new random string. By jumping around the random space, multiple instances can divide the keyspace with low probability of duplicating work. By not jumping very often, we avoid the overhead of frequently querying the PRNG. By only using A-Za-z0-9, we avoid any url-encoded characters.

When a thread finds a result it thinks is better than the current best, it takes the global lock, replaces current best if it is better, and updates its local view of the best score (so that threads don't contest the global lock very often). If it beat the global best, it wakes up the CURL thread.

The CURL thread sleeps on the global best, and is notified by worker threads when they've beaten the best score. When it wakes and sees a new best score, it automatically posts the new best string to XKCD (uw.edu!).

This doesn't require any coordination of workers, and would have been suitable for an internet-connected compute farm (e.g. ec2), but I never deployed it beyond my local desktop (4-core i7 w/ HT), where it found a 398 scoring string in a few hours.

User avatar
htom
Posts: 357
Joined: Wed Mar 27, 2013 6:13 am UTC
Location: Twin Cities, MN, USA

Re: 1193: Externalities

Postby htom » Wed Apr 03, 2013 5:40 pm UTC

Did someone try "four common word password"?
cmyk wrote:How can I be so riveted to the Internet equivalent of watching grass grow?

Questions? Spoilers! TimeWiki, geekwagon, The Book of Aubron, ExplainXKCD
mscha wrote:We can ignore reality; in fact, we'll have to, since only Randall knows what it is.

ssam
Posts: 2
Joined: Wed Apr 03, 2013 5:18 pm UTC

Re: 1193: Externalities

Postby ssam » Wed Apr 03, 2013 5:41 pm UTC

GCC has a handy builtin for counting the number of 1s in a byte

Code: Select all

for (int i=0; i< 128; i++){
    bit_err += __builtin_popcount ( hashout[i] ^ hashtarget[i]);
}

might save some nanoseconds on the check, though i image most of the time is taken up with the hashing.

ssam
Posts: 2
Joined: Wed Apr 03, 2013 5:18 pm UTC

Re: 1193: Externalities

Postby ssam » Wed Apr 03, 2013 5:48 pm UTC

personally with about 20 cpu cores, my best was a 398. it is only 6 mixed case letters, so i think it is uncharacteristically easy to find. this got my uni to 3rd place for a short while, but anyone able to check a few 1e10 hashes could find it, so 398 was a common score for a while. they folks who beat that must have put some serious cpu time in.

I doubt that the original was a random string. it would be boring to set a competition that was just 'who can get the most cpu time'. i'm sure that the use of an obscure hash, was so that a smart programmer could get an advantage by optimising the reference algorithm or porting it to a gpu/xeon phi/parallella, or actually cracking the hash algorithm (likely impossible). but i suspect that he would also want to reward more creative solutions. my suspicion is that the plaintext is something meaningful. a joke, webaddress, physical constant, rick astley lyric.

of course the quickest easiest way to find the original is with some drugs and a $5 wrench. so i can only assume that randall has some bolts that need tightening and is too lazy to walk to the hardware store.

Amarantamin
Posts: 14
Joined: Mon Apr 01, 2013 9:27 am UTC

Re: 1193: Externalities

Postby Amarantamin » Wed Apr 03, 2013 5:49 pm UTC

I can't seem to find the page to test them anymore...

"play dog with gods"... =D

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Wed Apr 03, 2013 6:00 pm UTC

ssam wrote:GCC has a handy builtin for counting the number of 1s in a byte

Code: Select all

for (int i=0; i< 128; i++){
    bit_err += __builtin_popcount ( hashout[i] ^ hashtarget[i]);
}

might save some nanoseconds on the check, though i image most of the time is taken up with the hashing.


Right, I did that, except in chunks of 8 bytes at a time with:

Code: Select all

__builtin_popcountll()

=)

keithjgrant
Posts: 6
Joined: Mon Apr 01, 2013 2:58 pm UTC

Re: 1193: Externalities

Postby keithjgrant » Wed Apr 03, 2013 6:03 pm UTC

cm_ wrote:Not a forum regular, but I wrote one of the multi-threaded C brute forcers. You can find it on github under my username 'cemeyer', repository 'xkcd-skein-brute' (the forum is flagging direct links to it as spam? wtf).

With -O3 on my local Core i7 it can get about 1 million hashes/sec on a single core (with TurboBoost; I think it's more like 700k/s per core if all cores are running). I could never get the x64 assembly skein implementation working, so it just uses the "Optimized 64-bit" version. (The asm implementation was more like 1.3 Mhash/s single-core, but it generated incorrect hashes for me. Go figure.)

It starts a compile-time constant number of threads, each of which generates a random starting string based on /dev/urandom and the a-zA-Z0-9 character set, iterates for a few million hashes by incrementing the string (base 62, ASCII order), and then after a few million hashes jumps to a new random string. By jumping around the random space, multiple instances can divide the keyspace with low probability of duplicating work. By not jumping very often, we avoid the overhead of frequently querying the PRNG. By only using A-Za-z0-9, we avoid any url-encoded characters.

When a thread finds a result it thinks is better than the current best, it takes the global lock, replaces current best if it is better, and updates its local view of the best score (so that threads don't contest the global lock very often). If it beat the global best, it wakes up the CURL thread.

The CURL thread sleeps on the global best, and is notified by worker threads when they've beaten the best score. When it wakes and sees a new best score, it automatically posts the new best string to XKCD (uw.edu!).

This doesn't require any coordination of workers, and would have been suitable for an internet-connected compute farm (e.g. ec2), but I never deployed it beyond my local desktop (4-core i7 w/ HT), where it found a 398 scoring string in a few hours.


I'm no expert in hashes and cryptography, but this seems like a solid approach--a good compromise between randomness and the faster computation of simple iteration. Also, my wife is a Husky, so bonus points for that :)

I was just doing what I could in python on a macbook pro... I had a couple processes randomly generating strings and a couple systematically iterating alphanum permutations of various lengths. The best I got overnight was only 410 from a process using the latter method on strings of length 8 (0000D8Uu)... no telling how far through the permutations these ones got before I killed them this morning.

rmsgrey
Posts: 3483
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1193: Externalities

Postby rmsgrey » Wed Apr 03, 2013 6:30 pm UTC

ssam wrote:of course the quickest easiest way to find the original is with some drugs and a $5 wrench. so i can only assume that randall has some bolts that need tightening and is too lazy to walk to the hardware store.


That does assume that it's a recorded string, and that Randall has access to the record...

brandnamebob
Posts: 2
Joined: Wed Apr 03, 2013 6:15 pm UTC

Re: 1193: Externalities

Postby brandnamebob » Wed Apr 03, 2013 6:33 pm UTC

cm_ wrote:Not a forum regular, but I wrote one of the multi-threaded C brute forcers. You can find it on github under my username 'cemeyer', repository 'xkcd-skein-brute' (the forum is flagging direct links to it as spam? wtf).

With -O3 on my local Core i7 it can get about 1 million hashes/sec on a single core (with TurboBoost; I think it's more like 700k/s per core if all cores are running). I could never get the x64 assembly skein implementation working, so it just uses the "Optimized 64-bit" version. (The asm implementation was more like 1.3 Mhash/s single-core, but it generated incorrect hashes for me. Go figure.)

It starts a compile-time constant number of threads, each of which generates a random starting string based on /dev/urandom and the a-zA-Z0-9 character set, iterates for a few million hashes by incrementing the string (base 62, ASCII order), and then after a few million hashes jumps to a new random string. By jumping around the random space, multiple instances can divide the keyspace with low probability of duplicating work. By not jumping very often, we avoid the overhead of frequently querying the PRNG. By only using A-Za-z0-9, we avoid any url-encoded characters.

When a thread finds a result it thinks is better than the current best, it takes the global lock, replaces current best if it is better, and updates its local view of the best score (so that threads don't contest the global lock very often). If it beat the global best, it wakes up the CURL thread.

The CURL thread sleeps on the global best, and is notified by worker threads when they've beaten the best score. When it wakes and sees a new best score, it automatically posts the new best string to XKCD (uw.edu!).

This doesn't require any coordination of workers, and would have been suitable for an internet-connected compute farm (e.g. ec2), but I never deployed it beyond my local desktop (4-core i7 w/ HT), where it found a 398 scoring string in a few hours.


I wanted to thank you! I found your branch on github and modified the code a bit. I found that your block_bench program was getting similar numbers for me. I was getting around 1.1M hashes per second per core on my 4 processor quad-core Opteron 8360SE. But, when I added the timing calculations to the main program I was only getting 65K hashes per second per core. After a bunch of fiddling I found that the issue seemed to be the locking between the threads. So I modifed the program to be singlethreaded and pick a random starting place and then increment the string from there. I made the random starting string long enough that colssions would not be an issue, so I ran 16 different processes. With that I was getting the 1.1M hashes per second. Unfortunately I only got a score of 404. Although I did get a 403 and found the 398 once I started one thread with the 'A' starting place, but those were found while I was out, and I didn't bother to use an automatic submitter, so I was too late with those.

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Wed Apr 03, 2013 7:01 pm UTC

brandnamebob wrote:I wanted to thank you! I found your branch on github and modified the code a bit. I found that your block_bench program was getting similar numbers for me. I was getting around 1.1M hashes per second per core on my 4 processor quad-core Opteron 8360SE. But, when I added the timing calculations to the main program I was only getting 65K hashes per second per core. After a bunch of fiddling I found that the issue seemed to be the locking between the threads. So I modifed the program to be singlethreaded and pick a random starting place and then increment the string from there. I made the random starting string long enough that colssions would not be an issue, so I ran 16 different processes. With that I was getting the 1.1M hashes per second. Unfortunately I only got a score of 404. Although I did get a 403 and found the 398 once I started one thread with the 'A' starting place, but those were found while I was out, and I didn't bother to use an automatic submitter, so I was too late with those.

No problem! I haven't measured it, but I'm really surprised you found locking an issue. The workers should only take the lock if they think they've beaten the global score, which they keep track of locally... I should look into this =).

Yeah, I just looked into it... the lock gets taken a really insignificant number of times after "some time has passed." E.g. after we've found a 421-bit score string:

Code: Select all

Found 'KOXEI9wcSxS' with distance 421
lock taken 109 times (0.535 locks/sec)
lock taken 110 times (0.417 locks/sec)
lock taken 111 times (0.412 locks/sec)
lock taken 112 times (0.385 locks/sec)
lock taken 113 times (0.261 locks/sec)
Found 'Q1kDrQuEOf6' with distance 414
lock taken 113 times (0.261 locks/sec)

So that really shouldn't be impacting performance at all. Maybe someone else's fork is locking more? Also, it's possible you forked an earlier copy of the program, the latest has lots of improvements over the first cut =).

burpen
Posts: 13
Joined: Wed Apr 03, 2013 12:20 am UTC

Re: 1193: Externalities

Postby burpen » Wed Apr 03, 2013 7:39 pm UTC

htom wrote:Did someone try "four common word password"?

Code: Select all

hash: 71f8bd2fb10b55de688c4c758d8a25ea6a14608a52c3d834bf1b57aabbf83f57cbbffc4a6a55365bc1e3f8a7c5c792f3fe2a7fe089a8f961f94045f9767e3713d04d9a12dfe074cfb7c9d9b2a3ded47dbe2a0c9a0666919e880ef6406aa365c10e9691113c61eba7d4e9b0a89b8c066db5da806d2a58d53a8bfc681590ed2882
distance: 508

Amarantamin wrote:I can't seem to find the page to test them anymore...

"play dog with gods"... =D

Code: Select all

hash: 091c5c4d7341554b13afbc20cbe958749a147865757ca0ac3e0955dfb7c687729a8233c61d71d107597ea34d125fa0371b539bc854088f604af5479a2d8c7f18c239347e7499cbfffa8ea69689e6d5f1420aa1114cfeb0eb00f5c1fba5e3add2b9f2562c437c4c7772e0f85d62cf8403a0d6fcc134331ce8c4a4e29c9921e90e
distance: 509


edit: formatting

christandiono
Posts: 2
Joined: Wed Apr 03, 2013 6:38 am UTC

Re: 1193: Externalities

Postby christandiono » Wed Apr 03, 2013 8:04 pm UTC

slicedtoad wrote:So how hard would it be to implement a gpu-compute program for solving the hash?
I've never touched GPGPU programming before and my understanding of cryptography is a little shaky but would it be as simple as assigning the gpu a bunch of random strings to hash, checking them, then permeating (right word?) them a small amount before hashing again and repeating?

Also, if anyone has a tutorial or something for creating a similar GPGPU program, could you link it? Preferably in C++ and for AMD's architecture (since I don't have any CUDA cards to play with atm).


I didn't do it, but someone working with me ported the reference implementation to CUDA. It was partly just slapping __device__ on all the functions, then writing some supporting code to make it actually run. We got about 7M hashes/second on a GTX 560 Ti, 6M hashes/second on a GTX 660, 8M hashes/second on a GTX 690, 3M hashes/second on a Tesla C1060, 7M hashes/second on a Tesla M2050 (EC2). We tried to get an OpenCL version running but we gave up.

User avatar
Latent22
Posts: 622
Joined: Mon Apr 01, 2013 11:57 pm UTC
Location: NZ

Re: 1193: Externalities

Postby Latent22 » Wed Apr 03, 2013 9:17 pm UTC

"xkcd.com","0"
"cmu.edu","384"
"uiuc.edu","386"
"uic.edu","388"
"ox.ac.uk","389"
"kit.edu","389"
"mit.edu","390"

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Thu Apr 04, 2013 12:05 am UTC

christandiono wrote:
slicedtoad wrote:So how hard would it be to implement a gpu-compute program for solving the hash?
I've never touched GPGPU programming before and my understanding of cryptography is a little shaky but would it be as simple as assigning the gpu a bunch of random strings to hash, checking them, then permeating (right word?) them a small amount before hashing again and repeating?

Also, if anyone has a tutorial or something for creating a similar GPGPU program, could you link it? Preferably in C++ and for AMD's architecture (since I don't have any CUDA cards to play with atm).


I didn't do it, but someone working with me ported the reference implementation to CUDA. It was partly just slapping __device__ on all the functions, then writing some supporting code to make it actually run. We got about 7M hashes/second on a GTX 560 Ti, 6M hashes/second on a GTX 660, 8M hashes/second on a GTX 690, 3M hashes/second on a Tesla C1060, 7M hashes/second on a Tesla M2050 (EC2). We tried to get an OpenCL version running but we gave up.


Awesome! Care to put the CUDA implementation up on Github? On the performance side, 7 Mhashes/sec doesn't seem too different from an 8-core Xeon. I would expect an optimized implementation could see much higher performance.

bsittler
Posts: 7
Joined: Tue Apr 02, 2013 6:07 pm UTC

Re: 1193: Externalities

Postby bsittler » Thu Apr 04, 2013 1:51 am UTC

jpvlsmv wrote:On a completely theoretical note, does anyone know of an eigenvalue for any common hash algorithm?

i.e. a 64-hexdigit string whose md5sum is that string?


Not quite the same, but I think I just found a string whose xkcd-style Skein-1024-1024 hash includes its own score:
Spoiler:
*/^>}[k)H&%/:d!9`7v404_ ]oE3I.tyM7o!6*NrgO4Tf],Ah&FCXjz

... this differs from the target hash by 404 bit positions (and note the 404 in the middle.) Too bad I didn't find that one while the contest was running. Of course the bit count depends on the particular URL-encoding chosen as the step immediately preceding hashing. The one I use matches what I observed with manual testing of the almamater site (when it still worked) in Chrome on a Mac:

*%2F%5E%3E%7D%5Bk%29H%26%25%2F%3Ad%219%607v404_+%5DoE3I.tyM7o%216*NrgO4Tf%5D%2CAh%26FCXjz

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Thu Apr 04, 2013 6:14 am UTC

jpvlsmv wrote:On a completely theoretical note, does anyone know of an eigenvalue for any common hash algorithm?

i.e. a 64-hexdigit string whose md5sum is that string?


As long as we're talking about "common hash algorithms", it is pretty trivial to find a CRC32 value that sums to itself ;-). The longer the hash output and stronger the hash, the harder it becomes...

User avatar
XonqNopp
Posts: 163
Joined: Thu Mar 28, 2013 9:10 am UTC
Location: http://xkcd.com/1190#verbose

Re: 1193: Externalities

Postby XonqNopp » Thu Apr 04, 2013 6:15 am UTC

Has anyone wondered if Randall hacked in a webserver to find a skein hashed password of someone (that maybe he dislikes) and made us fight to crack it??? Maybe Randall waits for us to give him the original string, himself having no clue about it...
RandallsWhimsyist - WWRD (What Would Randall Draw

Wait well, all of you blitzers!

I guess that it will be the End of Time. Wait for it.

edenist
Posts: 23
Joined: Wed May 16, 2012 5:10 am UTC

Re: 1193: Externalities

Postby edenist » Thu Apr 04, 2013 6:23 am UTC

XonqNopp wrote:Has anyone wondered if Randall hacked in a webserver to find a skein hashed password of someone (that maybe he dislikes) and made us fight to crack it??? Maybe Randall waits for us to give him the original string, himself having no clue about it...


Or he, himself, is in a competition to crack a hash and the most efficient way for him to do so was to run his own "competition"?

Sort of a hash-cracking ponzi scheme :p

User avatar
XonqNopp
Posts: 163
Joined: Thu Mar 28, 2013 9:10 am UTC
Location: http://xkcd.com/1190#verbose

Re: 1193: Externalities

Postby XonqNopp » Thu Apr 04, 2013 6:53 am UTC

edenist wrote:
XonqNopp wrote:Has anyone wondered if Randall hacked in a webserver to find a skein hashed password of someone (that maybe he dislikes) and made us fight to crack it??? Maybe Randall waits for us to give him the original string, himself having no clue about it...


Or he, himself, is in a competition to crack a hash and the most efficient way for him to do so was to run his own "competition"?

Sort of a hash-cracking ponzi scheme :p

Or a profzi scheme??

Image
RandallsWhimsyist - WWRD (What Would Randall Draw

Wait well, all of you blitzers!

I guess that it will be the End of Time. Wait for it.

User avatar
Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

Re: 1193: Externalities

Postby Klear » Thu Apr 04, 2013 9:13 am UTC

blowfishhootie wrote:
martin34 wrote:That was a rather anti-climactic ending. Just the winning school announced on the new comic for a few hours, and then nothing.


What were you expecting, and what indication did you receive that led you to expect it? Did you expect a donation to the winning school's endowment or something?


I was hoping there would be a joke.

trakof
Posts: 12
Joined: Sat Jul 17, 2010 8:33 pm UTC

Re: 1193: Externalities

Postby trakof » Thu Apr 04, 2013 9:34 pm UTC

Here's a pretty crappy OpenCL implementation. I know there's some spots it could improve so if anyone would like to, go for it.

kernel.txt
Spoiler:

Code: Select all

//const short modifier_words = 2;
//const short state_words = 16;
//const short state_bytes = 128;
//const short state_bits = 1024;
//const short block_bytes = 128;

typedef unsigned char u08b_t;   
typedef unsigned long u64b_t; 

typedef struct
{
   uint  hashBitLen;
   uint  bCnt;
   u64b_t  T[2];   
} Skein_Ctxt_Hdr_t;


typedef struct
{
   Skein_Ctxt_Hdr_t h;
   u64b_t  X[16];
    u08b_t  b[128];   
} Skein1024_Ctxt_t;

#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
                               
#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
                               
/* twerk word T[1]: flag bit definition(s) */
#define SKEIN_T1_FLAG_FIRST     (((u64b_t)  1 ) << SKEIN_T1_POS_FIRST)
#define SKEIN_T1_FLAG_FINAL     (((u64b_t)  1 ) << SKEIN_T1_POS_FINAL)
#define SKEIN_T1_FLAG_BIT_PAD   (((u64b_t)  1 ) << SKEIN_T1_POS_BIT_PAD)
                               
/* twerk word T[1]: tree level bit field mask */
#define SKEIN_T1_TREE_LVL_MASK  (((u64b_t)0x7F) << SKEIN_T1_POS_TREE_LVL)
#define SKEIN_T1_TREE_LEVEL(n)  (((u64b_t) (n)) << SKEIN_T1_POS_TREE_LVL)

/* twerk word T[1]: block type field */
#define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
#define SKEIN_BLK_TYPE_CFG      ( 4)                    /* configuration block */
#define SKEIN_BLK_TYPE_PERS     ( 8)                    /* personalization string */
#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */

#define SKEIN_T1_BLK_TYPE(T)   (((u64b_t) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */

#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)

#define SKEIN_VERSION           (1)

#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
#endif

#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64b_t) (hi32)) << 32))
#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)

#define SKEIN_CFG_STR_LEN       (4*8)

/* bit field definitions in config block treeInfo word */
#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)

#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)

#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
    ( (((u64b_t)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
      (((u64b_t)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
      (((u64b_t)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )

#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0)

#define Skein_Start_New_Type(ctxPtr,BLK_TYPE) { Skein_Set_T0_T1(ctxPtr,0,SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt=0; }
#define Skein_Get_Tweak(ctxPtr,TWK_NUM)         ((ctxPtr)->h.T[TWK_NUM])
#define Skein_Set_Tweak(ctxPtr,TWK_NUM,tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal);}

#define Skein_Get_T0(ctxPtr)    Skein_Get_Tweak(ctxPtr,0)
#define Skein_Get_T1(ctxPtr)    Skein_Get_Tweak(ctxPtr,1)
#define Skein_Set_T0(ctxPtr,T0) Skein_Set_Tweak(ctxPtr,0,T0)
#define Skein_Set_T1(ctxPtr,T1) Skein_Set_Tweak(ctxPtr,1,T1)

/* set both twerk words at once */
#define Skein_Set_T0_T1(ctxPtr,T0,T1)           \
    {                                           \
    Skein_Set_T0(ctxPtr,(T0));                  \
    Skein_Set_T1(ctxPtr,(T1));                  \
    }

#define Skein_Set_Type(ctxPtr,BLK_TYPE)         \
    Skein_Set_T1(ctxPtr,SKEIN_T1_BLK_TYPE_##BLK_TYPE)
#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }

#define Skein_Set_Tree_Level(hdr,height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height);}

void Skein1024_Init(Skein1024_Ctxt_t* ctx, unsigned int hashBitLen)
{
   union
   {
      u08b_t b[128];
      u64b_t w[16];
   }cfg;

   ctx->h.hashBitLen = hashBitLen;

    //case 1024: memcpy(ctx->X,SKEIN1024_IV_1024,sizeof(ctx->X)); break;
    //((lo32) + (((u64b_t) (hi32)) << 32))
    ctx->X[0] = (0x41E72355) + (((u64b_t)(0xD593DA07)) << 32);
    ctx->X[1] = (((u64b_t)(0x15B5E511)) << 32)+(0xAC73E00C);
    ctx->X[2] = (((u64b_t)(0x5180E5AE)) << 32)+(0xBAF2C4F0);
    ctx->X[3] = (((u64b_t)(0x03BD41D3)) << 32)+(0xFCBCAFAF);
    ctx->X[4] = (((u64b_t)(0x1CAEC6FD)) << 32)+(0x1983A898);
    ctx->X[5] = (((u64b_t)(0x6E510B8B)) << 32)+(0xCDD0589F);
    ctx->X[6] = (((u64b_t)(0x77E2BDFD)) << 32)+(0xC6394ADA);
    ctx->X[7] = (((u64b_t)(0xC11E1DB5)) << 32)+(0x24DCB0A3);
    ctx->X[8] = (((u64b_t)(0xD6D14AF9)) << 32)+(0xC6329AB5);
    ctx->X[9] = (((u64b_t)(0x6A9B0BFC)) << 32)+(0x6EB67E0D);
    ctx->X[10] = (((u64b_t)(0x9243C60D)) << 32)+(0xCCFF1332);
    ctx->X[11] = (((u64b_t)(0x1A1F1DDE)) << 32)+(0x743F02D4);
    ctx->X[12] = (((u64b_t)(0x0996753C)) << 32)+(0x10ED0BB8);
    ctx->X[13] = (((u64b_t)(0x6572DD22)) << 32)+(0xF2B4969A);
    ctx->X[14] = (((u64b_t)(0x61FD3062)) << 32)+(0xD00A579A);
    ctx->X[15] = (((u64b_t)(0x1DE0536E)) << 32)+(0x8682E539);

   Skein_Start_New_Type(ctx,MSG);
}

void* memcpy(void* dest, const void* src, uint count) {
    char* dst8 = (char*)dest;
    char* src8 = (char*)src;

    while (count--) {
        *dst8++ = *src8++;
    }
   return dest;
}

void *memset(char *s, char c, size_t n)
{
uint i;
for (i = 0; i < n; i++, s++)
{
*s = c;
}
return s;
}

#define BLK_BITS        (WCNT*64)               
#define KW_TWK_BASE     (0)
#define KW_KEY_BASE     (3)
#define ks              (kw + KW_KEY_BASE)               
#define ts              (kw + KW_TWK_BASE)
#define WCNT 16
#define RCNT 10
#define SKEIN_UNROLL_1024 1
#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
enum   
    { 
    R1024_0_0=24, R1024_0_1=13, R1024_0_2= 8, R1024_0_3=47, R1024_0_4= 8, R1024_0_5=17, R1024_0_6=22, R1024_0_7=37,
    R1024_1_0=38, R1024_1_1=19, R1024_1_2=10, R1024_1_3=55, R1024_1_4=49, R1024_1_5=18, R1024_1_6=23, R1024_1_7=52,
    R1024_2_0=33, R1024_2_1= 4, R1024_2_2=51, R1024_2_3=13, R1024_2_4=34, R1024_2_5=41, R1024_2_6=59, R1024_2_7=17,
    R1024_3_0= 5, R1024_3_1=20, R1024_3_2=48, R1024_3_3=41, R1024_3_4=47, R1024_3_5=28, R1024_3_6=16, R1024_3_7=25,
    R1024_4_0=41, R1024_4_1= 9, R1024_4_2=37, R1024_4_3=31, R1024_4_4=12, R1024_4_5=47, R1024_4_6=44, R1024_4_7=30,
    R1024_5_0=16, R1024_5_1=34, R1024_5_2=56, R1024_5_3=51, R1024_5_4= 4, R1024_5_5=53, R1024_5_6=42, R1024_5_7=41,
    R1024_6_0=31, R1024_6_1=44, R1024_6_2=47, R1024_6_3=46, R1024_6_4=19, R1024_6_5=42, R1024_6_6=44, R1024_6_7=25,
    R1024_7_0= 9, R1024_7_1=48, R1024_7_2=35, R1024_7_3=52, R1024_7_4=23, R1024_7_5=31, R1024_7_6=37, R1024_7_7=20
    };

void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,uint blkCnt,uint byteCntAdd)
{
   uint r;
   u64b_t kw[WCNT+4+RCNT*2];

    u64b_t  X00,X01,X02,X03,X04,X05,X06,X07,X08,X09,X10,X11,X12,X13,X14,X15;
    u64b_t  w [WCNT];   

   ts[0] = ctx->h.T[0];
    ts[1] = ctx->h.T[1];
   
   do  {
        /* this implementation only supports 2**64 input bytes (no carry out here) */
        ts[0] += byteCntAdd;                    /* update processed length */

        /* precompute the key schedule for this block */
        ks[ 0] = ctx->X[ 0];
        ks[ 1] = ctx->X[ 1];
        ks[ 2] = ctx->X[ 2];
        ks[ 3] = ctx->X[ 3];
        ks[ 4] = ctx->X[ 4];
        ks[ 5] = ctx->X[ 5];
        ks[ 6] = ctx->X[ 6];
        ks[ 7] = ctx->X[ 7];
        ks[ 8] = ctx->X[ 8];
        ks[ 9] = ctx->X[ 9];
        ks[10] = ctx->X[10];
        ks[11] = ctx->X[11];
        ks[12] = ctx->X[12];
        ks[13] = ctx->X[13];
        ks[14] = ctx->X[14];
        ks[15] = ctx->X[15];
      ks[16] = ks[ 0] ^ ks[ 1] ^ ks[ 2] ^ ks[ 3] ^
                 ks[ 4] ^ ks[ 5] ^ ks[ 6] ^ ks[ 7] ^
                 ks[ 8] ^ ks[ 9] ^ ks[10] ^ ks[11] ^
                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
      
      ts[2]  = ts[0] ^ ts[1];

        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */////////////////////////////////////

        X00    = w[ 0] + ks[ 0];                 /* do the first full key injection */
        X01    = w[ 1] + ks[ 1];
        X02    = w[ 2] + ks[ 2];
        X03    = w[ 3] + ks[ 3];
        X04    = w[ 4] + ks[ 4];
        X05    = w[ 5] + ks[ 5];
        X06    = w[ 6] + ks[ 6];
        X07    = w[ 7] + ks[ 7];
        X08    = w[ 8] + ks[ 8];
        X09    = w[ 9] + ks[ 9];
        X10    = w[10] + ks[10];
        X11    = w[11] + ks[11];
        X12    = w[12] + ks[12];
        X13    = w[13] + ks[13] + ts[0];
        X14    = w[14] + ks[14] + ts[1];
        X15    = w[15] + ks[15];

      #define Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rNum) \
    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0;   \
    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2;   \
    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4;   \
    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6;   \
    X##p8 += X##p9; X##p9 = RotL_64(X##p9,ROT##_4); X##p9 ^= X##p8;   \
    X##pA += X##pB; X##pB = RotL_64(X##pB,ROT##_5); X##pB ^= X##pA;   \
    X##pC += X##pD; X##pD = RotL_64(X##pD,ROT##_6); X##pD ^= X##pC;   \
    X##pE += X##pF; X##pF = RotL_64(X##pF,ROT##_7); X##pF ^= X##pE;   \
   
   #define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
      Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn)

   #define I1024(R)                                                      \
    X00   += ks[r+(R)+ 0];    /* inject the key schedule value */     \
    X01   += ks[r+(R)+ 1];                                            \
    X02   += ks[r+(R)+ 2];                                            \
    X03   += ks[r+(R)+ 3];                                            \
    X04   += ks[r+(R)+ 4];                                            \
    X05   += ks[r+(R)+ 5];                                            \
    X06   += ks[r+(R)+ 6];                                            \
    X07   += ks[r+(R)+ 7];                                            \
    X08   += ks[r+(R)+ 8];                                            \
    X09   += ks[r+(R)+ 9];                                            \
    X10   += ks[r+(R)+10];                                            \
    X11   += ks[r+(R)+11];                                            \
    X12   += ks[r+(R)+12];                                            \
    X13   += ks[r+(R)+13] + ts[r+(R)+0];                              \
    X14   += ks[r+(R)+14] + ts[r+(R)+1];                              \
    X15   += ks[r+(R)+15] +    r+(R)   ;                              \
    ks[r  +       (R)+16] = ks[r+(R)-1];  /* rotate key schedule */   \
    ts[r  +       (R)+ 2] = ts[r+(R)-1];                             
   
   for (r=1;r <= 2*RCNT;r+=2*SKEIN_UNROLL_1024)
   {
      #define R1024_8_rounds(R) \
         R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_0,8*(R) + 1); \
         R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_1,8*(R) + 2); \
         R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_2,8*(R) + 3); \
         R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_3,8*(R) + 4); \
         I1024(2*(R));                                                             \
         R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_4,8*(R) + 5); \
         R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_5,8*(R) + 6); \
         R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_6,8*(R) + 7); \
         R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_7,8*(R) + 8); \
         I1024(2*(R)+1);

        R1024_8_rounds(0);
   }
      ctx->X[ 0] = X00 ^ w[ 0];
        ctx->X[ 1] = X01 ^ w[ 1];
        ctx->X[ 2] = X02 ^ w[ 2];
        ctx->X[ 3] = X03 ^ w[ 3];
        ctx->X[ 4] = X04 ^ w[ 4];
        ctx->X[ 5] = X05 ^ w[ 5];
        ctx->X[ 6] = X06 ^ w[ 6];
        ctx->X[ 7] = X07 ^ w[ 7];
        ctx->X[ 8] = X08 ^ w[ 8];
        ctx->X[ 9] = X09 ^ w[ 9];
        ctx->X[10] = X10 ^ w[10];
        ctx->X[11] = X11 ^ w[11];
        ctx->X[12] = X12 ^ w[12];
        ctx->X[13] = X13 ^ w[13];
        ctx->X[14] = X14 ^ w[14];
        ctx->X[15] = X15 ^ w[15];

      ts[1] &= ~SKEIN_T1_FLAG_FIRST;
        blkPtr += 128;
    } while (--blkCnt);
    ctx->h.T[0] = ts[0];
    ctx->h.T[1] = ts[1];
}

void Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, uint msgByteCnt)
{
   uint n;

    /* process full blocks, if any */
    if (msgByteCnt + ctx->h.bCnt > 128)
        {
        if (ctx->h.bCnt)                              /* finish up any buffered message data */
            {
            n = 128 - ctx->h.bCnt;  /* # bytes free in buffer b[] */
            if (n)
                {
                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
                msgByteCnt  -= n;
                msg         += n;
                ctx->h.bCnt += n;
                }
            Skein1024_Process_Block(ctx,ctx->b,1,128);//////////////////////////////////////////////////////////
            ctx->h.bCnt = 0;
            }
        /* now process any remaining full blocks, directly from input message data */
        if (msgByteCnt > 128)
            {
            n = (msgByteCnt-1) / 128;   /* number of full blocks to process */
            Skein1024_Process_Block(ctx,msg,n,128);///////////////////////////////////////////////////////////
            msgByteCnt -= n * 128;
            msg        += n * 128;
            }
        }

    /* copy any remaining source message data bytes into b[] */
    if (msgByteCnt)
    {
    memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
    ctx->h.bCnt += msgByteCnt;
    }
}


void Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
{
   uint i,n,byteCnt;
   u64b_t X[16];

   ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;           
   if (ctx->h.bCnt < 128)   
   {
      memset(&ctx->b[ctx->h.bCnt],0,128 - ctx->h.bCnt);
   }     
   Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt); 
   
   byteCnt = (ctx->h.hashBitLen + 7) >> 3;   

   
   memset(ctx->b,0,sizeof(ctx->b));
   memcpy(X,ctx->X,sizeof(X));   
   for (i=0;i*128 < byteCnt;i++)
   {
      Skein_Start_New_Type(ctx,OUT_FINAL);
      Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t));
      n = byteCnt - i*128; 
      if (n >= 128)
      {
         n  = 128;
      }
      Skein_Put64_LSB_First(hashVal+i*128,ctx->X,n);

      //memcpy(ctx->X,X,sizeof(X));
      ctx->X[0] = X[0];
      ctx->X[1] = X[1];
      ctx->X[2] = X[2];
      ctx->X[3] = X[3];
      ctx->X[4] = X[4];
      ctx->X[5] = X[5];
      ctx->X[6] = X[6];
      ctx->X[7] = X[7];
      ctx->X[8] = X[8];
      ctx->X[9] = X[9];
      ctx->X[10] = X[10];
      ctx->X[11] = X[11];
      ctx->X[12] = X[12];
      ctx->X[13] = X[13];
      ctx->X[14] = X[14];
      ctx->X[15] = X[15];
   }
}

const int bits[256] = {
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8};

const char target[] = "5b4da95f5fa08280fc9879df44f418c8f9f12ba424b7757de02bbdfbae0d4c4fdf9317c80cc5fe04c6429073466cf29706b8c25999ddd2f6540d4475cc977b87f4757be023f19b8f4035d7722886b78869826de916a79cf9c94cc79cd4347d24b567aa3e2390a573a373a48a5e676640c79cc70197e1c5e7f902fb53ca1858b6";
const char goal[] = {0x5b, 0x4d, 0xa9, 0x5f, 0x5f, 0xa0, 0x82, 0x80, 0xfc, 0x98, 0x79, 0xdf, 0x44, 0xf4, 0x18, 0xc8, 0xf9, 0xf1, 0x2b, 0xa4, 0x24, 0xb7, 0x75, 0x7d, 0xe0, 0x2b, 0xbd, 0xfb, 0xae, 0x0d, 0x4c, 0x4f, 0xdf, 0x93, 0x17, 0xc8, 0x0c, 0xc5, 0xfe, 0x04, 0xc6, 0x42, 0x90, 0x73, 0x46, 0x6c, 0xf2, 0x97, 0x06, 0xb8, 0xc2, 0x59, 0x99, 0xdd, 0xd2, 0xf6, 0x54, 0x0d, 0x44, 0x75, 0xcc, 0x97, 0x7b, 0x87, 0xf4, 0x75, 0x7b, 0xe0, 0x23, 0xf1, 0x9b, 0x8f, 0x40, 0x35, 0xd7, 0x72, 0x28, 0x86, 0xb7, 0x88, 0x69, 0x82, 0x6d, 0xe9, 0x16, 0xa7, 0x9c, 0xf9, 0xc9, 0x4c, 0xc7, 0x9c, 0xd4, 0x34, 0x7d, 0x24, 0xb5, 0x67, 0xaa, 0x3e, 0x23, 0x90, 0xa5, 0x73, 0xa3, 0x73, 0xa4, 0x8a, 0x5e, 0x67, 0x66, 0x40, 0xc7, 0x9c, 0xc7, 0x01, 0x97, 0xe1, 0xc5, 0xe7, 0xf9, 0x02, 0xfb, 0x53, 0xca, 0x18, 0x58, 0xb6};

int countbits(char* c1, char* c2)
{
   int count = 0;
   uchar x;
   for(int iu=0;iu<128;++iu)
   {
      x = c1[iu] ^ c2[iu];
      count += bits[x];
   }
   return count;
}

__kernel void skein(__global char* input,
   __global int* scores,
   const unsigned int INPUT_SIZE)           
{                                       
    int i = get_global_id(0);

   char word[64] = {};
   int isi = INPUT_SIZE * i;
   for(int j=0;j<INPUT_SIZE;j++)
   {
      word[j] = *(input + isi + j);
   }
   
   
   u08b_t hashval[128] = {0};
   Skein1024_Ctxt_t ctx;
   Skein1024_Init(&ctx, 1024);
   Skein1024_Update(&ctx, (const u08b_t*)word, INPUT_SIZE);
   Skein1024_Final(&ctx, hashval);

   int cc = countbits(&hashval,&goal);
   scores[i] = cc;
}
 


main.cpp
Spoiler:

Code: Select all

#include <iostream>
#include <fstream>
#include <string>
#include <time.h>
using namespace std;

#define __NO_STD_VECTOR
#include <CL/cl.h>

#define DATA_SIZE (1<<16)
#define INPUT_SIZE 20


int main(int argc, char* argv[])
{
   srand(time(NULL));
  int devType=CL_DEVICE_TYPE_GPU;

 
  cl_int err; 
 
  size_t global; 
  size_t local;   
 
  cl_platform_id cpPlatform;
  cl_device_id device_id; 
  cl_context context;   
  cl_command_queue commands;
  cl_program program;   
  cl_kernel kernel;
 

  err = clGetPlatformIDs(1, &cpPlatform, NULL);
  if (err != CL_SUCCESS) {
    cerr << "Error: Failed to find a platform!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  err = clGetDeviceIDs(cpPlatform, devType, 1, &device_id, NULL);
  if (err != CL_SUCCESS) {
    cerr << "Error: Failed to create a device group!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
  if (!context) {
    cerr << "Error: Failed to create a compute context!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  commands = clCreateCommandQueue(context, device_id, 0, &err);
  if (!commands) {
    cerr << "Error: Failed to create a command commands!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  string source;
  std::ifstream in("kernel.txt", std::ios::in | std::ios::binary);
  if (in)
  {
    in.seekg(0, std::ios::end);
    source.resize(in.tellg());
    in.seekg(0, std::ios::beg);
    in.read(&source[0], source.size());
    in.close();
  }
  else
  {
   cerr << "Error: Could not load kernel.txt!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
  const char* ks = source.c_str();

  program = clCreateProgramWithSource(context, 1,
                (const char**)&ks,
                  NULL, &err);
  if (!program) {
    cerr << "Error: Failed to create compute program!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
  if (err != CL_SUCCESS) {
    size_t len;
    char buffer[2048];
   
    cerr << "Error: Failed to build program executable!" << endl;
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG,
           sizeof(buffer), buffer, &len);
    cerr << buffer << endl;
   getchar();
    exit(1);
  }
 
  kernel = clCreateKernel(program, "skein", &err);
  if (!kernel || err != CL_SUCCESS) {
    cerr << "Error: Failed to create compute kernel!" << endl;
   getchar();
    exit(1);
  }


  char* databuff1 = new char[DATA_SIZE * INPUT_SIZE];
  char* databuff2 = new char[DATA_SIZE * INPUT_SIZE]; 
  char** data = &databuff1;// = new char[DATA_SIZE * INPUT_SIZE];
  char** otherdata = &databuff2;
  int* score = new int[DATA_SIZE];
  int bestScore = 10000;
  char* bestWord = new char[INPUT_SIZE + 1];
  bestWord[0] = 0;
  cl_mem input;                   
  cl_mem outputscores;
 
  unsigned int count = DATA_SIZE;
  memset(score,0,DATA_SIZE * sizeof(int));

  input = clCreateBuffer(context,  CL_MEM_READ_ONLY, 
          sizeof(char) * 128 * count, NULL, NULL);
  outputscores = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
         sizeof(int) * count, NULL, NULL);
  if (!input || !outputscores) {
    cerr << "Error: Failed to allocate device memory!" << endl;
   getchar();
    exit(1);
  }   

  char* chars = "0123456789qwertyuioplkjhgfdsazxcvbnmQWERTYUIOPLKJHGFDSAZXCVBNM";
  int charslen = strlen(chars);

  unsigned int is = INPUT_SIZE;

  int loops = 0;

  char* b = *data;
   for(int i=0;i<DATA_SIZE * INPUT_SIZE;i++){
       b[i] = chars[(rand() % charslen)];
   }

  while(1)
  {
     err = clEnqueueWriteBuffer(commands, input,
                CL_TRUE, 0, sizeof(char) * INPUT_SIZE * count,
                *data, 0, NULL, NULL);
   //  if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to write to source array!" << endl;
   //   getchar();
   //   exit(1);
   //  }
 
     err = 0;
     err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
     err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &outputscores);
     err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &is);
   //  if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to set kernel arguments! " << err << endl;
   //   getchar();
   //   exit(1);
   //  }
 
     err = clGetKernelWorkGroupInfo(kernel, device_id,
                CL_KERNEL_WORK_GROUP_SIZE,
                sizeof(local), &local, NULL);
   //  if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to retrieve kernel work group info! "
   //    <<  err << endl;
   //   getchar();
   //   exit(1);
   //  }
 

     global = count;
     err = clEnqueueNDRangeKernel(commands, kernel,
                  1, NULL, &global, &local,
                  0, NULL, NULL);
     //if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to execute kernel!" << endl;
   //   getchar();
   //   return EXIT_FAILURE;
    // }
 
     clFlush(commands);

     if(loops > 0)
     {
        for(int i=0;i<count;++i)
        {
           if(score[i] < bestScore)
           {
              bestScore = score[i];
              char* w = &(*otherdata)[i * INPUT_SIZE];
              strncpy(bestWord, w, INPUT_SIZE);
              bestWord[INPUT_SIZE] = 0;

              printf("%d %s\n",bestScore, bestWord);
           }
        }
     }


   char* b = *otherdata;
   for(int i=0;i<DATA_SIZE * INPUT_SIZE;i++){
       b[i] = chars[(rand() % charslen)];
   }



     clFinish(commands);
 

     err = clEnqueueReadBuffer( commands, outputscores,
               CL_TRUE, 0, sizeof(int) * count,
               score, 0 ,NULL, NULL);
     //if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to read output array! " <<  err << endl;
   //   getchar();
   //   exit(1);
    // }



     char* t = *data;
     *data = *otherdata;
     *otherdata = t;
    
     loops++;
     if(loops % 1000 == 0)
     {
        printf("%d\n",loops);
     }
  }

  delete [] data;
  //delete [] results;
  delete [] score;
 
  clReleaseMemObject(input);
  //clReleaseMemObject(output);
  clReleaseMemObject(outputscores);
  clReleaseProgram(program);
  clReleaseKernel(kernel);
  clReleaseCommandQueue(commands);
  clReleaseContext(context);
 
  getchar();

  return 0;
}

User avatar
macraw83
Posts: 623
Joined: Wed Mar 27, 2013 7:06 pm UTC
Location: Present (L:13)

Re: 1193: Externalities

Postby macraw83 » Fri Apr 05, 2013 4:16 am UTC

Klear wrote:
blowfishhootie wrote:
martin34 wrote:That was a rather anti-climactic ending. Just the winning school announced on the new comic for a few hours, and then nothing.


What were you expecting, and what indication did you receive that led you to expect it? Did you expect a donation to the winning school's endowment or something?


I was hoping there would be a joke.


The joke was all the time (both human and CPU) invested into the comic. I think.
In a comically tragic turn of events, Addams didn't die. You can help her here.

Does the basement ever end?

Good luck to all the Blitzers out there!

Blitzgirl wrote:suddenly all the following pagepopes become goated from that point onward, mustarding up those who are trying to blitz

dburden
Posts: 1
Joined: Fri Apr 05, 2013 5:56 am UTC

Re: 1193: Externalities

Postby dburden » Fri Apr 05, 2013 6:02 am UTC

I'm still hoping the plaintext was something other than a random string.

Has anyone set up a website for those of us interested in hashing whimsical strings?

I'm wondering now if a binary version of U+202e might be involved. I mostly spent my time swapping D's and G's during the "competition."
Last edited by dburden on Fri Apr 05, 2013 2:36 pm UTC, edited 1 time in total.

User avatar
higgs-boson
Posts: 519
Joined: Tue Mar 26, 2013 12:00 pm UTC
Location: Europe (UTC + 4 newpix)

Re: 1193: Externalities

Postby higgs-boson » Fri Apr 05, 2013 6:21 am UTC

macraw83 wrote:
Klear wrote:
blowfishhootie wrote:
martin34 wrote:That was a rather anti-climactic ending. Just the winning school announced on the new comic for a few hours, and then nothing.


What were you expecting, and what indication did you receive that led you to expect it? Did you expect a donation to the winning school's endowment or something?


I was hoping there would be a joke.


The joke was all the time (both human and CPU) invested into the comic. I think.

Not unheared of. Being member of a chess club does not pay well, but you can invest a hell of a time and still have fun.
And other sports demand extra-human resources as well (e.g. car races - try that without cars. Or new motors any other race).
Just drop the commercial advertising part (I don't think Randall got paid by the Nasdaq TOP 100). :-)
Apostolic Visitator, Holiest of Holy Fun-Havers
You have questions about XKCD: "Time"? There's a whole Wiki dedicated to it!

ekim
Posts: 109
Joined: Mon Dec 18, 2006 12:40 pm UTC
Location: Seattle

Re: 1193: Externalities

Postby ekim » Fri Apr 05, 2013 6:26 am UTC

At least all the time spent wasn't for nothing. I learned Kernighan's nice trick for counting bits:

Code: Select all

// use c to count bits set in v
for (c = 0; v; c++)
  v &= v - 1; // clear the least significant bit set

And this much faster trick, apparently essentially from HAKMEM:

Code: Select all

c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

User avatar
pitareio
Posts: 128
Joined: Wed Sep 19, 2012 7:03 pm UTC
Location: the land of smelly cheese

Re: 1193: Externalities

Postby pitareio » Fri Apr 05, 2013 6:56 am UTC

ealloc wrote:
skeptical scientist wrote:Did anyone come up with a better method for finding approximate hash matches than just brute force hashing of a bunch of random crap?


There is no other way - that's the point of a cryptographic hash function!

I guess you could use some psychoology and a dictionary attack, but that wouldn't increase your hash score it would only increase the already negligible probability of getting the '0' score. And for all we know Randall just generated 1024 random bits and didn't even have a message.


The funny part is : if there is actually a message, it's far easier to get the 0 score than any other score <100 (the upper limit is probably much higher, but you get the idea and I won't do the maths on this one). You might guess the message if it's not random enough, hack into Randall's computer, bribe him or send him to Guantanamo to confess the message with a much (much) higher chance of success than brute forcing to get a low non-zero score.

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Fri Apr 05, 2013 7:20 am UTC

ekim wrote:At least all the time spent wasn't for nothing. I learned Kernighan's nice trick for counting bits:

Code: Select all

// use c to count bits set in v
for (c = 0; v; c++)
  v &= v - 1; // clear the least significant bit set

And this much faster trick, apparently essentially from HAKMEM:

Code: Select all

c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

Alternatively, GCC has a trick. It's called __popcountll() (64-bit variant) and it reduces to a single x86 instruction with no multiplies.

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Fri Apr 05, 2013 8:26 am UTC

trakof wrote:Here's a pretty crappy OpenCL implementation. I know there's some spots it could improve so if anyone would like to, go for it.

I'm going to take that as license to integrate this into my brute forcer. Please let me know if that's not fine with you =).

rmsgrey
Posts: 3483
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1193: Externalities

Postby rmsgrey » Fri Apr 05, 2013 8:55 am UTC

cm_ wrote:Alternatively, GCC has a trick. It's called __popcountll() (64-bit variant) and it reduces to a single x86 instruction with no multiplies.

There are reports that it performs badly in testing - though without knowing the methodology used, it's not clear whether the speed of the look-up table relies on locality of reference in the table to avoid caching costs.

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Fri Apr 05, 2013 9:41 am UTC

rmsgrey wrote:
cm_ wrote:Alternatively, GCC has a trick. It's called __popcountll() (64-bit variant) and it reduces to a single x86 instruction with no multiplies.

There are reports that it performs badly in testing - though without knowing the methodology used, it's not clear whether the speed of the look-up table relies on locality of reference in the table to avoid caching costs.

Sure, but the OP wasn't using a LUT, (s)he was using some multiplication and bit-masking trick.

Ardaglash
Posts: 10
Joined: Wed Mar 27, 2013 11:36 am UTC

Re: 1193: Externalities

Postby Ardaglash » Fri Apr 05, 2013 11:59 am UTC

A likely dumb question, but that hasn't stopped me before...

I was Googling for info on the Time comic, and stumbled over this odd set of results:

/search?q=There+were+some+surprise+overnight+upsets+in+the+hash+breaking

A webcomic of romance, sarcasm, math, and language. There were some surprise overnight upsets in the hash breaking! The official winner will be the leader at ...


I can't find the text above in the forum, in the blag, or whatnot. So where was (or is) it?

rmsgrey
Posts: 3483
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1193: Externalities

Postby rmsgrey » Fri Apr 05, 2013 12:09 pm UTC

Ardaglash wrote:A likely dumb question, but that hasn't stopped me before...

I was Googling for info on the Time comic, and stumbled over this odd set of results:

/search?q=There+were+some+surprise+overnight+upsets+in+the+hash+breaking

A webcomic of romance, sarcasm, math, and language. There were some surprise overnight upsets in the hash breaking! The official winner will be the leader at ...


I can't find the text above in the forum, in the blag, or whatnot. So where was (or is) it?

It was where the xkcd Store banner is in the header

User avatar
macraw83
Posts: 623
Joined: Wed Mar 27, 2013 7:06 pm UTC
Location: Present (L:13)

Re: 1193: Externalities

Postby macraw83 » Fri Apr 05, 2013 12:11 pm UTC

Ardaglash wrote:A likely dumb question, but that hasn't stopped me before...

I was Googling for info on the Time comic, and stumbled over this odd set of results:

/search?q=There+were+some+surprise+overnight+upsets+in+the+hash+breaking

A webcomic of romance, sarcasm, math, and language. There were some surprise overnight upsets in the hash breaking! The official winner will be the leader at ...


I can't find the text above in the forum, in the blag, or whatnot. So where was (or is) it?


It was on the front page, for a time. The text up there would change periodically depending on the status of the contest.

Edit: ninja'd
In a comically tragic turn of events, Addams didn't die. You can help her here.

Does the basement ever end?

Good luck to all the Blitzers out there!

Blitzgirl wrote:suddenly all the following pagepopes become goated from that point onward, mustarding up those who are trying to blitz

brandnamebob
Posts: 2
Joined: Wed Apr 03, 2013 6:15 pm UTC

Re: 1193: Externalities

Postby brandnamebob » Fri Apr 05, 2013 3:14 pm UTC

Ardaglash wrote:A likely dumb question, but that hasn't stopped me before...

I was Googling for info on the Time comic, and stumbled over this odd set of results:

/search?q=There+were+some+surprise+overnight+upsets+in+the+hash+breaking

A webcomic of romance, sarcasm, math, and language. There were some surprise overnight upsets in the hash breaking! The official winner will be the leader at ...


I can't find the text above in the forum, in the blag, or whatnot. So where was (or is) it?


The text was:

There were some surprise overnight upsets in the hash breaking!
The official winner will be the leader at midnight EST when a new comic goes up.
KIT leads (389 bits wrong), followed by CMU (390 bits wrong).


I happened to still have it in a open tab.

speising
Posts: 2289
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1193: Externalities

Postby speising » Fri Apr 05, 2013 6:16 pm UTC

btw, does anybody know why this comic's font doesn't render correctly in IE10? it just shows the text in times new roman.
i thought ie10 was supposed to be standards compliant.

gorn
Posts: 18
Joined: Wed Mar 27, 2013 11:39 am UTC

Re: 1193: Externalities

Postby gorn » Fri Apr 05, 2013 6:24 pm UTC

Is it deliberate humour that the league table of results shows lots of non-existent domains with a distance of 404?

User avatar
pitareio
Posts: 128
Joined: Wed Sep 19, 2012 7:03 pm UTC
Location: the land of smelly cheese

Re: 1193: Externalities

Postby pitareio » Fri Apr 05, 2013 6:37 pm UTC

gorn wrote:Is it deliberate humour that the league table of results shows lots of non-existent domains with a distance of 404?


They're certainly not non-existent, the script checked for a valid dns reply. They may or may not have a website running, though.

I'd love to know who's behind "totallynotabotnet.org", scoring 390, at par with nsa.gov and google.com.

gorn
Posts: 18
Joined: Wed Mar 27, 2013 11:39 am UTC

Re: 1193: Externalities

Postby gorn » Fri Apr 05, 2013 6:47 pm UTC

I doubt the following are really universities.

Code: Select all

"v.co","404"
"r.to","404"
"xn--p1b6ci4b4b3a.xn--11b5bs3a9aj6g","404"
"g.to","404"
"k.to","404"
"t.st","404"
"w.cc","404"
"q.to","404"
"b.st","404"
"ad.ag","404"
"xn--fdbk5d8ap9b8a8d.xn--deba0ad","404"
"c.cc","404"
"d.co","404"
"ba.gs","404"
"b.co","404"
"s.to","404"
"xn--fsqu00a.xn--0zwm56d","404"
"l.st","404"
"ab.gs","404"
"e.to","404"
"xn--hxajbheg2az3al.xn--jxalpdlp","404"
"s.co","404"
"e.co","404"
"o.co","404"

User avatar
pitareio
Posts: 128
Joined: Wed Sep 19, 2012 7:03 pm UTC
Location: the land of smelly cheese

Re: 1193: Externalities

Postby pitareio » Fri Apr 05, 2013 7:00 pm UTC

gorn wrote:I doubt the following are really universities.

Code: Select all

"v.co","404"
"r.to","404"
"xn--p1b6ci4b4b3a.xn--11b5bs3a9aj6g","404"
"g.to","404"
"k.to","404"
"t.st","404"
"w.cc","404"
"q.to","404"
"b.st","404"
"ad.ag","404"
"xn--fdbk5d8ap9b8a8d.xn--deba0ad","404"
"c.cc","404"
"d.co","404"
"ba.gs","404"
"b.co","404"
"s.to","404"
"xn--fsqu00a.xn--0zwm56d","404"
"l.st","404"
"ab.gs","404"
"e.to","404"
"xn--hxajbheg2az3al.xn--jxalpdlp","404"
"s.co","404"
"e.co","404"
"o.co","404"


In the last hours of the contest, the constraint on the domain name (that had to be .edu, .ac.uk or .edu.au) was lifted, and you could submit anything as long as it had a dns entry.

The xn-- are internationalized domain names that point to the icann wiki in a variety of languages, namely greek, hebrew, chinese and one I'm not sure about (maybe hindi). If your browser supports it, it will replace the url with chinese/hebrew/greek characters in the address bar.

gorn
Posts: 18
Joined: Wed Mar 27, 2013 11:39 am UTC

Re: 1193: Externalities

Postby gorn » Fri Apr 05, 2013 8:09 pm UTC

As you wish.


Return to “Individual XKCD Comic Threads”

Who is online

Users browsing this forum: No registered users and 35 guests