47 Comments
- ipearx, on 10/12/2007, -0/+11(SSN is Social Security Number for the 95% of the world population not living in the USA)
- HeroreV, on 10/12/2007, -0/+5Digg needs to get its act together and prevent comments from containing hundreds of s in a row.
Until then, you can deal with it by adding something like
@-moz-document domain(digg.com) {
.comment-body{ max-height: 20em; overflow: auto !important; }
}
to userContent.css if you're using Firefox.
- angedinoir, on 10/12/2007, -1/+5This is a pointless article; you can save yourself the trouble of reading it by following these basic rules anyhow:
1. If you're planning on doing math with what appears to be a number, store it as an int (or appropriate floating-point data type).
2. If the data is a serial number (SSN, Serial Number, Zip / Postal Code), store it as a character data type.
- diescheisse, on 10/12/2007, -0/+2if you need an article to tell you to not store SSNs (or any ID type of "number", CC#, ISBN, DL#, etc) as an int, perhaps you need to retake database 101.
- r©ain, on 10/12/2007, -0/+1Well duhh
like nnn-nn-nnnn is a numeric value anyway, it's a freakin string.
to remove the hyphens and store as a numeric would require processing to both store and retrieve since 2 functions must be built: one to strip hyphens and one to add hyphens.
I've always just stored the data as a hash string as I would a CC number or other piece of sensitive data. Since a hash contains no information, even if the data is compromised, the risk of obtaining the sensitive data is far lower than if the data was encrypted. Of course, if you're like me and use AES, you only have to worry about the Gov't or other such large organizations being able to break your algorithms in less than a human lifespan. -- But I digress.
Fact is, you should never be storing a SSN or CC number in plain form.
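As a rough illustration of the hashed-storage approach this comment describes, here is a minimal Python sketch that produces a base64 text value suitable for a string column. It assumes a keyed hash (HMAC-SHA256) rather than a bare hash, since there are only a billion possible 9-digit values and an unkeyed digest could simply be enumerated. The key value and the function name `ssn_digest` are made up for the example, and key management is out of scope.

```python
import base64
import hashlib
import hmac

# Hypothetical secret; in practice it would live outside the database.
SECRET_KEY = b"example-key-not-for-production"

def ssn_digest(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Return a base64 text digest of an SSN, ignoring dashes and spaces."""
    digits = ssn.replace("-", "").replace(" ", "")
    mac = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")

stored = ssn_digest("012-34-5678")
# A later lookup hashes the candidate the same way and compares digests.
print(hmac.compare_digest(stored, ssn_digest("012345678")))  # True
```

A digest like this can be compared but not turned back into the SSN, so it only fits cases where lookup by value is enough.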
Hash or encrypt and then base64 encode the binary string for safe entry into the database as a string or text value. It's not the hackers you have to be worried the most about, it's trusted internal users who are the most likely candidates for malicious behavior. So why leave such sensitive data out for them to see?
- arizonagroove, on 10/12/2007, -0/+1"SSN are 9 characters long if there are 8 characters then it's easy fix add a zero at the front."
But what if for some reason people start getting issued with 10 character SSNs? Some people will have 9 character SSNs, some people 10. If an SSN stored as an integer is 9 digits long, should you add a leading zero?
- drn666, on 10/12/2007, -0/+1"It's not like most hashes aren't reversible these days with rather trivial methods (MD5/4, SHA-4 are all broken). "
This is pretty loaded - while there are examples of "reversible" MD5, it takes enormous amounts of CPU power to reverse even a 5 character password. Hardly "trivial".
- whatever43, on 10/12/2007, -0/+1Most of the commentators, while perhaps excellent programmers, DBAs and the like, obviously don't understand the data about which they're commenting, which is most likely the source of all of the derision.
An SSN is not a number. It's a string made up of three distinct numbers, and only the third one (comprised of the last four digits) is serial. It can be treated like a number, but in most if not all cases it shouldn't be.
It's not a question of mathematical operations, storage space or anything else that's been mentioned here; it's a matter of data integrity.
Though not perfect by a long shot, SSNs are much more sophisticated than most people think. If you do any type of database programming and deal with SSNs, please go to www.ssa.gov and learn a little about them.
(No, I don't work for the Social Security Administration, but I have a lot of experience in data storage, validation and integrity, and I've encountered almost every question one can think of about SSNs.) - SteveDeGroof, on 10/12/2007, -0/+1I've run across this sort of thing on nearly every project I've worked on. People assume that any string that contains only numeric characters is a number. PINs, phone numbers, ZIP codes, account numbers - they try stuffing them into integer fields then look surprised when leading zeroes disappear or, worse, a value rolls over into negative. Like angedinoir said, if you're not going to do math with it, it's not a number.
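A quick Python sketch of the two failure modes mentioned above: the leading zero vanishing on an integer round trip, and a large value wrapping negative in a signed 32-bit field. `ctypes.c_int32` stands in for a 32-bit integer column here, and the sample ZIP code and phone number are made up.

```python
import ctypes

zip_code = "02134"
as_int = int(zip_code)          # the leading zero is gone
print(as_int)                   # 2134
print(str(as_int) == zip_code)  # False

phone = "4155550123"            # ten digits, larger than 2**31 - 1
wrapped = ctypes.c_int32(int(phone)).value  # ctypes wraps without an error
print(wrapped < 0)              # True
```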
- SethX9, on 10/12/2007, -1/+2"If an SSN stored as an integer is 9 digits long, should you add a leading zero?"
Integers have an infinite number of leading zeros already; we just don't print them. - Phil246, on 10/12/2007, -0/+1"It's not like most hashes aren't reversible these days with rather trivial methods (MD5/4, SHA-4 are all broken). "
hashes are not 100% reversible. Sure you can generate as many items as you wish which have an equal hash value (theoretically), but you have no way of knowing which one was used to generate the hash.
not to mention the abhorrent CPU power required to find hash collisions, as drn666 posted
- HeroreV, on 10/12/2007, -0/+1That should say "hundreds of [br]s in a row". Digg also needs a preview feature. >_<
- ToeCheese, on 10/12/2007, -0/+1Don't store Credit Cards and SSNs in a database. If you really need to store a SSN then encrypt it! Store the key in a file with root-only access and then allow the web server process read access. If you are in a shared environment then don't take the SSN and store it.
Credit Cards: I wouldn't store this even if I had firewalls, encryptions and a watchdog! Just use the Credit Card companies' Recurring Transaction or something similar. - inactive, on 10/12/2007, -0/+1"if you need an article to tell you to not store SSNs (or any ID type of "number", CC#, ISBN, DL#, etc) as an int, perhaps you need to retake database 101."
What he said.
http://coffeetornado.com
Coffee
- tankko, on 10/12/2007, -0/+1OK, here's the real question: Why the BLEEP are you storing my SSN? Really. This is a serious question. You should NEVER EVER give anyone your SSN unless it is related to taxes. EVER. Anyone writing a database to store SSN that is not working for a bank is an idiot. And, as others have stated, it should be encrypted.
My god, no wonder identity theft is such a big problem.
- LiquidPenguin, on 10/12/2007, -0/+0Dang, wrote this nice post and lost it. Oh well :-
launchpad90210 you really need to look at the problem and not go around attacking people on some principle of primary keys. Which, incidentally, while *you* may not do math operations on, the database does.
Like whatever43 (and others) explained, you don't store things like SSN's as numbers. Other examples include phone numbers and zip codes.
From a number of postings here, I say most people don't even understand the data set in question or even how a database functions internally. Sorry to pick on you launchpad90210, but you seem to be the most vocal on this. If you want to present your argument logically, don't be an ***** about it.
- soimless, on 10/12/2007, -0/+0yeah but is that leading zero really that important? SSN are 9 characters long if there are 8 characters then it's easy fix add a zero at the front.
- Rhomboid, on 10/12/2007, -0/+0While it may be true that you should not treat a SSN as an integer, the justification given is completely 100% bogus. "What if the user left off a digit?" That is why you have to validate all data before putting it in the database. You should NEVER let garbage like that anywhere near your DB, and if you do you will probably have much more serious errors in your program logic.
And anyway, even if you were too ignorant to validate your data before inserting it, in this case it really doesn't matter one bit. The SSN-with-missing-digit is completely unusable anyway. It doesn't matter if it's mishandled by erroneously prepending a leading zero or not - it's not going to match in a query that is requesting the proper 9 digit SSN, because it's invalid data. If you somehow end up with such a case it doesn't matter at all what you do because the data is already *****. Garbage in, garbage out.
- xerratus, on 10/12/2007, -0/+0My point was that I'm currently dealing with a project that is storing SSN's as ints, NON-ENCRYPTED which I agree is a BIG PROBLEM. But in this case, it's what I inherited - so I'm dealing with it. Yes, in a perfect world it would be encrypted. I can't hash it because it needs to be decrypted.
Knowing full well that I can't & don't want to rewrite logic written by somebody else that isn't here (unless you're going to get the bill) I have to deal with it and yes, we are having problems with zeros. I wrote the article to vent my frustration with what I inherited.
Many good points in some of the comments above. Remember, the point of the article is to THINK before you code. If a validator had been put in place by the original programmer, this wouldn't have gotten under my skin so much. The original programmer never thought that a SSN could start with a zero. That's it.
Just remember that if you ever run into a situation like this. Solve it however you want. - Rhinehold, on 10/12/2007, -0/+0should we have to add that you should be following the privacy act of 74 when dealing with SSNs as well?
- Launchpad90210, on 10/12/2007, -0/+0LiquidPenguin,
Ah, the old vague "other people don't know what they're talking about, but I do". Are you Tom Cruise hinting at your grand knowledge of psychology?
As to knowledge of database engines - if this wasn't so hilarious. See, the problem is that I work on the engine team for a large database vendor, which happened to be why this story caught my interest (and why the ridiculous ignorance in here is so offensive). People like yourself, who clearly don't have a clue and simply manufacture justifications, giving their "wisdom" on databases.
For the people actually trying to learn - ignore all of the nonsense in here. This article, and the "Experts" chiming in have offered very little in the way of legitimate advice, and instead they're exposing their ignorance.
- Launchpad90210, on 10/12/2007, -0/+0Oh, and the observation that storing it as a number precludes length integrity checks - that's a flipping data entry business rule in the GUI (along with at the data layer, if that's one's desire). It is a completely ridiculous reason to justify the string method.
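For what a length check "at the data layer" might look like, here is a small sketch using Python's built-in sqlite3 module. The table and column names are invented for the example, and it assumes the value is kept as nine-character text. A CHECK constraint is only one of several ways to enforce such a rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id  INTEGER PRIMARY KEY,
        -- exactly nine characters, none of them a non-digit
        ssn TEXT NOT NULL
            CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
    )
    """
)

def try_insert(ssn: str) -> bool:
    """Return True if the database accepted the value."""
    try:
        with conn:
            conn.execute("INSERT INTO person (ssn) VALUES (?)", (ssn,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("012345678"))  # True
print(try_insert("12345678"))   # False, one digit short
print(try_insert("12345678x"))  # False, not all digits
```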
- grennis, on 10/12/2007, -0/+0What a stupid article.
SSN's should ALWAYS be ENCRYPTED. Encrypted values are not stored as ints or varchars, so what is the point of this?
Again... stupid article with a bad message that is so wrong it's almost funny. I just hope this guy is not actually running some e-commerce site out there with MY information in it. - Launchpad90210, on 10/12/2007, -0/+0neo,
I don't think he wanted to reverse his own Digg. I think the comment was more the ability to add a -1 to a story as your "vote", rather than being limited to Digging or not digging at all.
There have been several terrible articles that have earned hundreds of votes lately, and I suspect it's the Digg Exploit guy running his script. There is no way that even groupthink could be this stupid.
- phr0ze, on 10/12/2007, -0/+0Well as someone with some learning experience with using SSN in a database, there are two things to mention. Don't limit yourself to only allowing or relying on SSN. Never say never but you should keep in mind foreign ID numbers may be all that are available. The other thing I want to mention is after storing SSN with dashes for a while, it is better to filter out dashes and spaces prior to storing the value.
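The "filter out dashes and spaces prior to storing" step, plus the matching step of putting the hyphens back for display, could look something like this in Python. The function names are invented for the example, and the only rule checked is "exactly nine digits".

```python
def normalize_ssn(raw: str) -> str:
    """Strip dashes and spaces; reject anything that is not nine digits."""
    digits = raw.replace("-", "").replace(" ", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a nine-digit SSN: {raw!r}")
    return digits

def format_ssn(digits: str) -> str:
    """Re-insert hyphens for display: nnn-nn-nnnn."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

stored = normalize_ssn("012-34 5678")
print(stored)              # 012345678
print(format_ssn(stored))  # 012-34-5678
```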
- inactive, on 10/12/2007, -0/+0>"SSN are 9 characters long if there are 8 characters then it's easy fix add a zero at the front."
>But what if for some reason people start getting issued with 10 character SSNs? Some people will have 9 character SSNs, some people 10. If an SSN stored as an integer is 9 digits long, should you add a leading zero?
if it is that big a deal then:
123-56-7890
022-xx-xxxx
if ($ssn < 100000000) { $stringssn = "0" . $ssn; }
else { $stringssn = $ssn; }
- inactive, on 10/12/2007, -0/+0OMG how hard is it to convert int fields to string and add 0 to all that is missing?
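The same padding fix, sketched in Python with a format specifier instead of an if/else. It also shows the catch raised elsewhere in the thread: padding cannot tell a real leading zero from a digit the user left out. The function name and sample values are made up.

```python
def pad_ssn(value: int) -> str:
    """Render an integer SSN as nine characters, zero-filled on the left."""
    return f"{value:09d}"

print(pad_ssn(22334455))  # 022334455
print(pad_ssn(1234567))   # 001234567

# An 8-digit typo of "123456789" is padded into something that looks valid.
print(pad_ssn(int("12345678")))  # 012345678
```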
- ph713, on 10/12/2007, -0/+0
Here's a whole book that does the topic much more justice in a much more advanced light:
http://www.wayner.org/books/td/ - SavannahLion, on 10/12/2007, -0/+0For those of you suggesting that it's OK to store a SSN as an integer (nevermind the security of such a practice) need to retake basic Data handling. Like angedinoir writes in the very first post, the basic rules are very simple. If you plan on doing math, store it as an integer or appropriate float. If you're not doing math, store it as a char.
Though I don't necessarily agree with the logic that hardware is an issue nor do I agree that using varchar is ideal, quite a few of you have missed the point of the article entirely. Pretty sad for those Diggers who claim to be technically savvy. - geminitojanus, on 10/12/2007, -1/+1"Hash or encrypt and then base64 encode the binary string for safe entry into the database as a string or text value. "
It's not like most hashes aren't reversible these days with rather trivial methods (MD5/4, SHA-4 are all broken). Secondly, a social security NUMBER is a NUMBER. If they do go to having 10 digits (which is wholly likely... in 100+ years (10-digits would mean 9,999,999,999 is the highest storable ssn, which is currently roughly 4/3rds the planet's population)), then you simply extend the freaking data field, and zero-pad the entry like you should have been doing in the first place. Every, Single, Commercial, Database has an option for zeropadding an integer, not using it is like using your teeth to unbolt a nut on your bike; both might get the job done, but three guesses on which one's more painful.
For this reason alone, no digg.
- Stopher, on 10/12/2007, -0/+0I think one point of the article is that not everyone out there is adhering to what we call "common sense". Half the people in this post don't agree with it and don't think it's a big deal. I just think it's one of those things that makes your life easier later on.
- Mr.Scientist, on 10/12/2007, -1/+1If you really want, you can still store SSNs as 32bit-integers, with leading zeros. You need 30 bits to store a 9-digit SSN. If the SSN is shorter, you need fewer bits to encode the number, so you get more "spare" bits to encode the actual length.
For 9-digit SSNs, just store the SSN (which uses at most 30 bits, so the two most significant bits will be zero).
Make the two most significant bits 01 if the SSN is 8 digits long, add the SSN (which uses only 27 bits).
Make the three most significant bits 011 if the SSN is 7 digits long, add the SSN (which uses only 24 bits).
...
Make the nine most significant bits 011111111 if the SSN is 1 digit long, add the SSN.
Store 0111111111 if the SSN is 0 digits long (not entered).
9 minus the number of bits which are 1 directly after the most significant bit is the length of the SSN, remove these bits to get the SSN, add leading zeros as indicated by the stored length.
Yeah, that's what some programmers call "job security".
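For the curious, the encoding described above can be written out in a few lines of Python. This is only a sketch of the scheme as the comment states it (a run of 1 bits after the most significant bit records how many digits are missing), and the function names are invented for the example.

```python
def encode(digits: str) -> int:
    """Pack a string of 0 to 9 digits into a 32-bit word, keeping its length."""
    if len(digits) > 9 or (digits and not (digits.isascii() and digits.isdigit())):
        raise ValueError("expected at most nine digits")
    missing = 9 - len(digits)
    # 'missing' 1 bits placed directly below the most significant bit (bit 31)
    prefix = ((1 << missing) - 1) << (31 - missing)
    return prefix | (int(digits) if digits else 0)

def decode(word: int) -> str:
    """Recover the original digit string, leading zeros included."""
    missing = 0
    while missing < 9 and word & (1 << (30 - missing)):
        missing += 1
    value = word & ((1 << (31 - missing)) - 1)  # drop the prefix bits
    length = 9 - missing
    return str(value).zfill(length) if length else ""

for s in ("012345678", "12345678", "0042", ""):
    assert decode(encode(s)) == s
print(encode("123456789") == 123456789)  # True
```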
(Welcome Slashdot trolls. Switching to greener pastures?)
- Midnightbrewer, on 10/12/2007, -0/+0"yeah but is that leading zero really that important? SSN are 9 characters long if there are 8 characters then it's easy fix add a zero at the front."
If you RTFA, you'll see that this doesn't account for the error case where a user accidentally leaves a number out. You might end up accidentally associating them with somebody else's SSN, and then you've got real problems on your hands. - waiwai, on 10/12/2007, -1/+1Damn, that was the most ignorant article I've read yet. He goes on about how hard drives and RAM are bigger so the size doesn't matter, LOL. The overall size probably has nothing to do with the decision.
A number is a fixed size unlike the varchar he recommends. So the size of the database record is constant instead of variable making the database much faster. Even if you used fixed length char, a number is a lot fewer comparisons to work with and index. So there's a huge performance difference that just flew over this guy's head.
He seems to be having trouble validating his input data as well. Idiotic crap like he was complaining about should never get anywhere near the database anyway, but checked for immediately. - ateoto, on 10/12/2007, -0/+0"if you need an article to tell you to not store SSNs (or any ID type of "number", CC#, ISBN, DL#, etc) as an int, perhaps you need to retake database 101."
Yeah pretty much just stop working on databases at that point. - SweetsGreen, on 10/12/2007, -1/+0digg because people should know this
- Launchpad90210, on 10/12/2007, -1/+0"For those of you suggesting that it's OK to store a SSN as an integer (nevermind the security of such a practice) need to retake basic Data handling. Like angedinoir writes in the very first post, the basic rules are very simple. If you plan on doing math, store it as an integer or appropriate float. If you're not doing math, store it as a char."
This is _completely_retarded_, and a couple of code-monkeys that know absolutely nothing about database theory are giving everyone their brilliant expertise. The more correct rule is "if it's a number that you don't need to do string operations upon, store it as a number. If it's a string, store it as a string". If you're doing math...fk'n idiotic. The question of encrypting or not is a COMPLETELY different problem, and what we're talking about here is storing it unencrypted, so save the red-herrings.
The originator of this is obviously running a script to digg this, because there is NO WAY that this idiotic piece of tripe has enamored so many ignorant followers.
OMG...a lot of my identity primary keys I'll never, ever do math upon. I guess I'd better go convert them to strings, given that it's the "point" of this moronic "article". - billflu, on 10/12/2007, -1/+0Any good programmer will know this. If you aren't doing mathematical operations on it, don't store it as an integer (or other number format). The same is true for zip code and phone number.
- SkydiveMike, on 10/12/2007, -1/+0Where is the "undigg" button to remove some of the moronic diggs this has already received.
- Launchpad90210, on 10/12/2007, -1/+0"Any good programmer will know this. If you aren't doing mathematical operations on it, don't store it as an integer (or other number format)."
If you have no clue what you're talking about, don't talk. What do you think identity primary keys are? My god, they're NUMBERS. And strangely I never do mathematical operations on them.
Idiots propping up their own lameness are pathetic. - zubov, on 10/12/2007, -1/+0I can almost guarantee that anyone that actually dugg this article didn't read it
- Adam0431, on 10/12/2007, -1/+0they need an UnDigg.... Common sense stuff people
- inactive, on 10/12/2007, -3/+0SSN shouldn't even be stored online :(
Kiltak
http://geeksaresexy.blogspot.com - snowboarder, on 10/12/2007, -3/+0you learn something new every day, at least this is a better digg post than this:
http://digg.com/links/Changes_to_the_Digg_Story_Posting_Screen - angedinoir, on 10/12/2007, -3/+0Why did this get dugg? Why was this posted in the first place?
- neofactor, on 10/12/2007, -3/+0@adam0431:
There IS an UNDIGG... click on your history and there it is, plain as the ***** on your face.

