Archive for March 2007
JSON schemas/validation with CFJSON
Sunday, March 25th, 2007

I’ve made it no secret that I’m not a fan of XML. However, one advantage of XML that is constantly brought up is the ability to validate XML documents using an XML Schema or DTD. I haven’t had much use for them myself, but seeing as many people consider this important, I figured there was no reason the same couldn’t be done in JSON. So I came up with a format for a JSON schema file (written in JSON, of course) and added a function to the CFJSON library to validate a JSON document based on a schema.

While I don’t have docs written for this, it’s pretty self-explanatory. Here’s an example of a schema that shows pretty much all the features I’ve put in it so far:

{
   type: "struct",
   keys: ["title","body","categories","start_date","active"],
   items: {
      title: {
         type: "string", maxlength: 20, minlength: 8
      },
      body: {
         type: "string", minlength: 1
      },
      categories: {
         type: "array",
         minlength: 1,
         maxlength: 4,
         items: {
            type: "struct",
            items: {
               id: {
                  type: "number", min:6, max:10
               },
               name: {
                  type: "string"
               }
            }
         }
      },
      start_date: {
         type: "date", mask: "mm-dd-yyyy"
      },
      active: {
         type: "boolean"
      }
   }
}

In a nutshell, your schema is always an object/struct and it must always have a “type” key set. Then you go on nesting more definitions under the “items” key if you have more complex data types like structs or arrays. The valid types right now are “struct”, “array”, “date”, “number”, “boolean”, and “string”. Each of these data types has some additional options, which are explained below.

struct
For the “struct” type, you have two additional keys you can add. The first is “keys”, in which you can provide an array of the keys that this structure MUST have. The second is “items”, which is a structure with a key for each structure key you want to add validation rules for. The “keys” and “items” values are both optional. You can specify an array of keys without providing “items” and vice versa.

array
An “array” type can take 3 additional parameters: “minlength”, “maxlength”, and “items”. The first two are pretty self-explanatory: they check that the array length falls within the given bounds. The last is similar to the “struct” type’s “items” key, except that inside that structure you do not list validation rules for individual keys, since array indexes are just numeric. You simply put in one definition that applies to every item in the array.

date
The only option for the “date” type is “mask”. In it you can specify a date mask that you want the date to conform to. The possible masks are the same as the ones used by ColdFusion’s DateFormat function, although this could change because JSON is a universal format not restricted to CF.
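
To give an idea of what a mask check involves outside of CF, here is a rough Python sketch. It is not CFJSON’s code: the function names are made up for illustration, and it assumes only the four lowercase tokens yyyy, yy, mm and dd, not the full set of DateFormat masks.

```python
from datetime import datetime

# Assumption: only these four mask tokens are handled. Longer tokens come
# first so that "yyyy" is not consumed as two "yy" tokens.
TOKENS = [("yyyy", "%Y"), ("yy", "%y"), ("mm", "%m"), ("dd", "%d")]


def mask_to_strptime(mask: str) -> str:
    """Translate a mask such as "mm-dd-yyyy" into a strptime format string."""
    out = []
    i = 0
    while i < len(mask):
        for token, directive in TOKENS:
            if mask.startswith(token, i):
                out.append(directive)
                i += len(token)
                break
        else:
            # Anything else is a literal character; "%" must be escaped.
            out.append("%%" if mask[i] == "%" else mask[i])
            i += 1
    return "".join(out)


def matches_mask(value: str, mask: str) -> bool:
    """Return True if value is a real date written exactly as the mask says."""
    fmt = mask_to_strptime(mask)
    try:
        parsed = datetime.strptime(value, fmt)
    except ValueError:
        return False  # wrong layout or an impossible date such as 02-30
    # strptime tolerates missing zero padding, so format the date back
    # and compare to insist on the exact layout.
    return parsed.strftime(fmt) == value


print(matches_mask("03-25-2007", "mm-dd-yyyy"))
print(matches_mask("2007-03-25", "mm-dd-yyyy"))
```

The round trip through strftime is there because strptime alone would also accept a value like 3-25-2007 for this mask.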

number
A “number” type provides the additional options “min” and “max”, which allow you to specify minimum and maximum values for a number. Note that right now the “number” type allows floats too. This will eventually be updated to allow validating integers, floats, unsigned numbers, etc.

boolean
The “boolean” type has no additional options, although eventually it’ll probably allow for choosing what can be considered a valid boolean.

string
The “string” type can specify a “maxlength” and “minlength” option that will make sure the string is not longer or shorter than the values specified. Note that the “string” type will actually accept booleans and numbers. Right now I’m not seeing a way around it given the way I implemented things, and I’m not sure it’s really a big deal.

After defining the JSON schema format I added a validate() function to CFJSON that validates a JSON document based on a schema conforming to the format described above. I also made a small example, creating a JSON document that conforms to the schema above and running the validate function against it. There’s a link to download the example below. All you have to do is change some values in the document or in the schema to see the validation working. There are a few more options that are probably easy to figure out by looking at the code, such as the “errorVar” and “stopOnError” arguments. If anybody has any feedback on this I’d love to hear it.
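
For readers who don’t use ColdFusion, the rules above are simple enough to sketch in a few lines of Python. This is only an illustration of the schema format, not CFJSON’s validate(): the function names, the error messages and the list-of-strings return value are all assumptions, the “date” type is left out to keep it short, and the schema is written as strict JSON with quoted keys.

```python
import json


def check_length(length, rule, path):
    """Apply the optional "minlength" and "maxlength" options."""
    errors = []
    if "minlength" in rule and length < rule["minlength"]:
        errors.append(f"{path}: shorter than {rule['minlength']}")
    if "maxlength" in rule and length > rule["maxlength"]:
        errors.append(f"{path}: longer than {rule['maxlength']}")
    return errors


def validate(data, rule, path="root"):
    """Return a list of error messages; an empty list means valid."""
    kind = rule["type"]
    errors = []
    if kind == "struct":
        if not isinstance(data, dict):
            return [f"{path}: expected a struct"]
        for key in rule.get("keys", []):  # keys the struct MUST have
            if key not in data:
                errors.append(f"{path}: missing key '{key}'")
        for key, sub in rule.get("items", {}).items():
            if key in data:  # rules only apply to keys that are present
                errors += validate(data[key], sub, f"{path}.{key}")
    elif kind == "array":
        if not isinstance(data, list):
            return [f"{path}: expected an array"]
        errors += check_length(len(data), rule, path)
        if "items" in rule:  # one definition shared by every element
            for i, item in enumerate(data):
                errors += validate(item, rule["items"], f"{path}[{i}]")
    elif kind == "string":
        # Mirrors the behaviour described above: numbers and booleans pass.
        if not isinstance(data, (str, int, float, bool)):
            return [f"{path}: expected a string"]
        errors += check_length(len(str(data)), rule, path)
    elif kind == "number":
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(data, bool) or not isinstance(data, (int, float)):
            return [f"{path}: expected a number"]
        if "min" in rule and data < rule["min"]:
            errors.append(f"{path}: below minimum {rule['min']}")
        if "max" in rule and data > rule["max"]:
            errors.append(f"{path}: above maximum {rule['max']}")
    elif kind == "boolean":
        if not isinstance(data, bool):
            errors.append(f"{path}: expected a boolean")
    else:
        errors.append(f"{path}: type '{kind}' is not handled in this sketch")
    return errors


schema = json.loads("""
{"type": "struct", "keys": ["title", "categories"],
 "items": {"title": {"type": "string", "minlength": 8, "maxlength": 20},
           "categories": {"type": "array", "minlength": 1,
                          "items": {"type": "struct",
                                    "items": {"id": {"type": "number",
                                                     "min": 6, "max": 10}}}}}}
""")
document = json.loads('{"title": "Hello", "categories": [{"id": 11}]}')
for message in validate(document, schema):
    print(message)
```

Run against this document, the sketch reports that the title is too short and that the first category id is above the maximum. Collecting every error instead of stopping at the first one is just a design choice made here.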

DOWNLOAD CFJSON DOCUMENT VALIDATION EXAMPLE (includes the latest CFJSON)
FIND OUT MORE ABOUT CFJSON

Finding Similar Text and Words
Friday, March 23rd, 2007

This is one of those times when I discover something that I probably should have known for a long time, and I certainly wish I had. And if you’re one of my friends and you knew about this and didn’t tell me, may you rot in hell lying in a bed of nails covered with black flies. Now that I got this out of the way, on with the article…

I decided to look for some kind of algorithm that would allow for matching words that are similar. The goal was to retrieve records from a database based on some search criteria. So I searched Google and found something about a Soundex algorithm. The algorithm goes as follows (copied straight from Wikipedia):

  1. Retain the first letter of the string
  2. Remove all occurrences of the following letters, unless it is the first letter: a, e, h, i, o, u, w, y
  3. Assign numbers to the remaining letters (after the first) as follows:

    • b, f, p, v = 1
    • c, g, j, k, q, s, x, z = 2
    • d, t = 3
    • l = 4
    • m, n = 5
    • r = 6
  4. If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except for any intervening h and w (American census only), then omit all but the first.
  5. Return the first four characters, right-padding with zeroes if there are fewer than four.
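
To see the steps in action outside of a database, here is a rough Python sketch of the list above, using the American census handling of h and w. The function name is my own, and the versions built into database servers may differ in details, so treat it as an illustration only.

```python
# Letter-to-digit table from step 3 of the list above.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code (American census variant)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()       # step 1: keep the first letter
    prev = CODES.get(first, "")  # code of the previous letter, if any
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y have no code (step 2)
        if code and code != prev:
            result += code        # step 4: skip adjacent repeats
        prev = code  # a vowel resets prev, so a later repeat is kept
    return (result + "000")[:4]   # step 5: pad or cut to four characters


# A misspelling that drops the "i" still gets the same code.
print(soundex("Washington"), soundex("Washngton"))
# Two different names that sound alike.
print(soundex("Robert"), soundex("Rupert"))
```

Because only the first letter and three digits survive, fairly different spellings can collapse to the same code, which is both the strength and the weakness of the approach.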

But that’s not the good part… The good part is that this algorithm is implemented in some database systems. And apparently, you guessed it, it’s implemented in the most popular ones: SQL Server, Oracle, and MySQL. How does it work? It’s oh so difficult… Check out the code below:

SELECT *
FROM address
WHERE SOUNDEX(city) = SOUNDEX('Washngton')

If you have any records in the database for Washington (note in the query it’s missing the “i”) they will be returned. Wonderful! I could probably have used this before. And for those who have the possibility of adding UDFs to their DB server, there are implementations of other algorithms such as Metaphone and Similar_text.

Note that Soundex is a phonetic algorithm, so it looks for words that sound similar, and you might not always get the results you want. When I searched my person table using SOUNDEX(first_name) = SOUNDEX('Tomas') I did get a bunch of “Thomas” records back, but when I used SOUNDEX('Thomas') I did not; I got a bunch of “Tom”, “Tommy”, and even “Tony” records, but no “Thomas”. Oh well, still better than nothing. I bet that by combining different algorithms you can get some good results. More research to be done…
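
As a taste of the non-phonetic side, here is a small Python sketch that ranks names by how many characters they share with the search term, using the standard difflib module. It is not a database feature and not the Similar_text UDF mentioned above. The helper name, the sample names and the 0.6 cutoff are all just assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical sample data standing in for a first_name column.
first_names = ["Tom", "Tommy", "Tony", "Thomas", "Tomas"]


def similar_names(query, candidates, cutoff=0.6):
    """Return the candidates that resemble query, best match first."""
    # n must be positive, so guard against an empty candidate list.
    limit = max(1, len(candidates))
    return get_close_matches(query, candidates, n=limit, cutoff=cutoff)


print(similar_names("Thomas", first_names))
```

A character-based score like this could be used to re-rank whatever a phonetic match returns, which is one way the two approaches might be combined.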

Suppressing whitespace in ColdFusion
Friday, March 16th, 2007

One of the things about ColdFusion that I don’t like so much is its inability to conveniently suppress whitespace. It’s ridiculous: there are like 3 million tags/attributes/options related to whitespace, I seem to discover a new one all the time, and it invariably fails to do what I want. First, there’s <cfsilent>. The result is pretty straightforward: it kills all output generated between its opening and closing tags. Then there’s <cfsetting enablecfoutputonly="Yes">. That supposedly suppresses anything that isn’t inside <cfoutput> tags, but in my experience it doesn’t catch everything. Then there’s <cfprocessingdirective suppresswhitespace="Yes">, which I never use, but I tried it recently for something and it was useless. Plus, from what I’ve read, it’s the worst performance-wise.

Besides the tags, you can set output="no" on both your <cfcomponent> and <cffunction> tags, and it’s often necessary to do so. Finally, there’s an option in the Settings section of the CF Administrator that you can use to let CF manage whitespace suppression.

So I had a whitespace problem the other day and I was trying to get rid of it by any means necessary. I have all these options, right? Something’s gonna work, right? Wrong! None of these things did it for me. I had a function in a component that takes some arguments and outputs a <select> field populated with <option> tags. I was calling the function something like this:

<cfsetting enablecfoutputonly="yes" />

<cfoutput>Label: #selectBox(options)#</cfoutput>

<cfsetting enablecfoutputonly="no" />

This was part of a larger framework, but everything was wrapped in a <cfsetting enablecfoutputonly="Yes"> and all components and functions had output="no" where possible. I was really starting to get pissed when I had one of my Einstein moments. I took the code out of the <cfoutput> and changed it to the following and all was well:

<cfsetting enablecfoutputonly="yes" />

<cfoutput>Label: </cfoutput><cfset selectBox(options) />

<cfsetting enablecfoutputonly="no" />

Presto! Like magic it worked. I’m glad I found a workaround, but I still think it’s pathetic that CF can’t handle this for me.