MD5 Hash with single quote = grief in dao.findfirst

  • Thread starter Thread starter Stephen Rasey
  • Start date Start date
FWIW, Access programmers tend to not use BINARY fields, because the
Access interface does not allow you to create them. VB programmers,
who use the Access interface less, have no such inhabitations.
You need to use SQL/DAO/ADOX to create a BINARY field in a table,
but I never heard that VB programmers had any problems with them.
BINARY fields are non-case sensitive text fields.

(david)
 
What function would I use to convert the string? is there a CHex() in VBA?
I do not think so. The Hex() functions require a numeric input, not a
string.

This is rocket science, admittedly:

Public Function StringToHex(S As String) As String
Dim j As Long, T As String

For j = 1 To Len(S)
T = T & Right("0" & Hex(Asc(Mid(S, j, 1))), 2)
Next
StringToHex = T
End Function
On a practical matter, The HASH is a unique key key field. A 16 character
primary key is smaller than a 32 character primary key. So it would
increase the size of some of my tables.

Yes, though this isn't much of an issue unless you have millions of
records.
On the other hand, I was originally looking for a Hash function that
returned a base 10, base 36 or something less than 256. Stephen Bullen had
an MD5 Hash on the newsgroup in February and I used it after modifying it to
fix a bug and change its handeling of empty cells.

Stephen Rasey
WiserWays
Houston
http://excelsig.org
 
Re StringToHex: Thanks. My first reaction was it would be slow.
Compared to a C# function, it would be. But it is a small penalty compared
to everything elso going on. I had to think twice about that RIGHT("0"&
step. Clever.

StringToHex on a 16 character hash creates a 32 character "Hex" string in 80
microseconds.
The H70CheckADO on a 16 character hash (searching a 1800 record table)
takes 5600 microsec.
So the crucial question is how the CheckADO speed would be with a 32
character key.
I am not going to test it right away. Things work now and I cannot believe
a 32 character key will be a faster search than a 16 character key.

Stephen Rasey.
 
Tim Ferguson said:
Okay:- I obviously misunderstood the problem (not a rare occurrence!).


Now there was definitely an article recently that cited a couple of MSDN
articles about searching in SQL Server and jet, where special characters
are ignored such as "'" and "-". This is apparently to improve handling of
names like O'Connor and Handley-Page. Sorry I cannot find the reference to
it now... :-(


Your original post:-


All the best


Tim F
 
Stephen Rasey said:
The database looks like it correctly checks case on insert.

The MD5 Hash is a 16 character string containing any of 256 characters. So
it is a 16 digit base 256 "number"

The only problem would be if I generated a HashB that differed from an
already stored HashA in different case. The H70CheckADO function would
INCORRECTLY report a hit and I would use an incorrect record insted of store
a new one. The odd of that happening once in a billion record table are
in the magnitude of 1 in 10^24.

Let's compare that with using a Long Integer Autonumber Random primary key.
2^32 is 4.3*10^9. The chances of a random collision in a billion record
table is not 1 in 4, but it is virtually guaranteed.

It is the old "same birthday in a room ful of people" problem. The
birthday paradox statement is:
You have N number of people in a room randomly selected. At what is the
lowest value N is there a >50% chance there will be two or more people that
share a birthday. I think N is 29. [edit: nope it is 23]
http://www.people.virginia.edu/~rjh9u/birthday.html
http://mathforum.org/dr.math/faq/faq.birthdayprob.html)
http://science.howstuffworks.com/question261.htm
http://science.howstuffworks.com/framed.htm?parent=question261.htm&url=http:
//burtleburtle.net/bob/crypto/exchange.html

With the long integer case, the billionth record has a 1:4 of picking a PK
allready used. The record before that is also 1:4. In fact the last
million records also has a 1:4.

Even a million record table has a VERY High probability of a collision. I
would guess there is a 50% probability of a collision at SQRT(2^32) = about
64,000 records.

So I literally did not need to loose sleep over the case insensitivity of
the ADO recordset Find method.

I will check the performance of the H70CheckBin routine against the
H70CheckADO. If there is a big penalty on performance for case sensitive,
then I'll switch back to the ADO method.

I've learned something along the way and I have that d***ed single quote
problem solved.

Stephen Rasey
WiserWays
Houston
http://excelsig.org



david epsom dot com dot au said:
I think what you just said was that, given the odds, you
won't worry about hash collisions on inserts. But given
the odds, you could just as easily UCASE the hash value???
If you are using a Text field, I think that UCASEing the
hash value would be a much clearer/cleaner idea ???? Or
using a Binary field if you want to retain case ????

(david)
 
Stephen Rasey said:
What function would I use to convert the string? is there a CHex() in VBA?
I do not think so. The Hex() functions require a numeric input, not a
string.

On a practical matter, The HASH is a unique key key field. A 16 character
primary key is smaller than a 32 character primary key. So it would
increase the size of some of my tables.

On the other hand, I was originally looking for a Hash function that
returned a base 10, base 36 or something less than 256. Stephen Bullen had
an MD5 Hash on the newsgroup in February and I used it after modifying it to
fix a bug and change its handeling of empty cells.

Stephen Rasey
WiserWays
Houston
http://excelsig.org
 
Back
Top