Monday, December 29, 2008

Single Responsibility Principle

I really do like what Robert C. Martin has to say about OO, but I just don't agree with his position on the Single Responsibility Principle. It sounds like a good principle, but when I try to apply it in my mind, problems arise.

Let's look at a few of the classes he ends up with in his paper on SRP. The Connection interface has two methods: dial (which takes a string) and hangup. Does this class really have only one reason to change? No, there are still a number of reasons why this class might need to change. There might be different ways of hanging up ("politely" for valid connections and abruptly for would-be hackers). There might be different ways of dialing (with or without the speaker, supplying credentials, selecting a protocol, specifying minimum modem speed, etc.). There might be modems that never hang up.
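
For reference, here is roughly what that interface looks like in Java; this is my own sketch based on the description above, not code copied from the paper:

    public interface Connection {
        void dial(String phoneNumber);  // establish a connection to the given number
        void hangup();                  // terminate the connection
    }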

I would posit that most classes have more than one reason for change. Here are a few examples.

Rectangle - the data types for the coordinates may change (16-bit to 32-bit), we may want polar coordinates instead of Cartesian, and we may even want non-Euclidean rectangles.

String - The character set may need to change. The storage mechanism may change (e.g. list, array, linked blocks of memory). The allocation scheme may change.

Just as easily as someone can say there is only one reason for a class to change, someone else can find multiple reasons for that same class to change (with very few exceptions). It's quite subjective.

If you could split all your classes into single responsibility objects, you'd have a huge number of classes that each did very little. To do anything useful, you would need to enlist the help of many classes. This makes for poor comprehension of the system and poor usability.

I want there to be a one to one mapping between concepts and classes. If I have a concept of a file, and I want to read and write from a file, I will expect there to be a file object with read and write methods. If I have to create input streams and output streams and file-opener objects, I will not enjoy the experience. I want a file object. I don't care if, behind the scenes, the file is using an input stream and an output stream and a file-opener. I just don't want to see them or need to know about them.

With this one concept per class approach, there is a greater chance of complexity if precautions aren't taken. To combat that complexity, I would consider breaking up the main concept into sub concepts (e.g. input, output, seeking, etc.), coding each sub concept as a separate class, and then hiding those classes behind the facade of the main concept's class.

This approach also helps with testing. If each of the sub concept classes can be mocked and injected into the main concept's class, the class is easy to use and easy to test.
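
To make that concrete, here is a rough Java sketch of the kind of facade I have in mind; the sub concept names (FileInput, FileOutput, FileSeeker) are hypothetical, not any particular library's API:

    // Hypothetical sub-concept interfaces; each is small enough to test or mock on its own.
    interface FileInput  { byte[] read(int count); }
    interface FileOutput { void write(byte[] data); }
    interface FileSeeker { void seek(long position); }

    // The facade that users actually see: one class for the one concept of "a file".
    class DataFile {
        private final FileInput input;
        private final FileOutput output;
        private final FileSeeker seeker;

        // Sub concepts are injected, so tests can pass in mocks.
        DataFile(FileInput input, FileOutput output, FileSeeker seeker) {
            this.input = input;
            this.output = output;
            this.seeker = seeker;
        }

        byte[] read(int count)   { return input.read(count); }
        void write(byte[] data)  { output.write(data); }
        void seek(long position) { seeker.seek(position); }
    }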

You still need to guard against combining incompatible concepts into one class (e.g. file management with searching directories) mostly to avoid confusing the user. Remember that class size and complexity can be managed by decomposing the main concept's class into other classes.

Monday, May 12, 2008

Measuring Talkativeness With Big O Notation

In addition to its commonly known uses, Big O Notation can also be useful for describing a person's talkativeness. Given the number of words you say to a person, their Big O factor tells you how long their response will probably be.

The stereotypical male's Big O factor is O(1). Regardless of what you say to him, you get a constant response (which may or may not be a grunt).

A cooperative conversation would be somewhere in the neighborhood of O(n), meaning that both parties are contributing equal amounts.

Then we get to the chatterbox who has a Big O factor of O(n^3). I get a little nervous when I encounter a person with this kind of Big O factor. I know that even if I grunt, that will spawn a long and drawn out discussion on some topic.

I used to know a guy who was somewhere around O(n^n!). I really dreaded seeing him.

What gets dangerous is when you have two O(n^n) (or worse) folks talking with each other. Each one's response is amplified by the others until they're both talking over each other and they pass out from buffer overflows.

I've been too general up to this point. I know that based on the topic of conversation, I have different Big O factors. Here is a list of topics and my Big O factor for each topic.

  • Sports - O(1) (yes, I will grunt).
  • Politics - O(n/2)
  • Computers - O(n^2)
  • Clothing, Shoes or Purses - O(1)

Thursday, May 1, 2008

SharePoint : The Perfect Storm

Wikipedia defines a perfect storm as "the simultaneous occurrence of events which, taken individually, would be far less powerful than the result of their chance combination."

Rather than events, in the case of SharePoint, I see the elements of the perfect storm being conditions. Those conditions are:

  • SharePoint is widely accepted by CEOs, CTOs, architects, and users all around. SharePoint's user experience is very nice for the most part. I think that the product brings a lot of value to users. This is a good thing.
  • SharePoint does a lot just out of the box. It has all sorts of useful templates for sites and it has common workflows. However, beyond a certain point (as with most if not all software) SharePoint cannot do what you want without customizing it with managed code (C#, VB.NET, etc.). This too is a good thing.
  • SharePoint's developer experience is terrible. I won't go into details.
So on the one hand you have CEOs, CTOs, and architects pushing for the use of SharePoint in the company, and on the other hand you've got a terrible developer experience. The eventual result is a shortage of SharePoint developers and an increased demand for them.

Will enough SharePoint developers leave SharePoint that Microsoft will be pressured into improving the developer experience? Or will there always be enough SharePoint developers to squeak by?

CTOs and CEOs who select SharePoint and need to customize it might start feeling the effects of this perfect storm when they have a harder time hiring and retaining developers, and then have to pay them more.

If you are in the software development business for the money, SharePoint development could be a great cash cow. The only problem is that you have to develop in SharePoint.

The next time you hear a technology is hot, you may want to find out why the technology is hot.

Tuesday, April 29, 2008

The Forces of Characterization and Cohesion

Oftentimes, the first stage of enlightenment is when an original and revolutionary idea comes to me and grabs my attention. In most of those cases, the second stage of enlightenment is when I realize that I had actually heard that idea from someone else.

My latest "enlightenment" is in the area of class names. I can't remember all of the sources of inspiration that came together, but I experienced it while I was reading Kent Beck's book "Implementation Patterns". The book is a collection of principles that promote improved code readability.

While reading about the importance of selecting names for classes, a couple of principles that I had heard before came together into a single picture.

That picture is made up of two forces.

The Force of Characterization

The first is the Force of Characterization. It exists between a class' name and the class' code. If, and only if, the class' name properly characterizes what the class does, the Force of Characterization is in harmony.

For example, if the class' name is Account and the code for the Account class does currency conversion, the name Account is a mis-characterization of the class' code.

The Force of Characterization moves you to examine the fit between the class' name and what the methods do and, where the fit is bad, it causes you to adjust either the name or what the methods do to bring the Force of Characterization into balance.

The Force of Cohesion

The second is the Force of Cohesion. Cohesion exists in a class' methods to the extent that the class' methods work together for a single purpose.

If some of Account's methods provide account related operations (e.g. deposits, withdrawals), and other methods perform currency conversion, the methods are not cohesive.

The Force of Cohesion moves you to examine the class' methods and how well they work together to a single end. Bringing harmony to the Force of Cohesion is done by moving methods and perhaps data from one class to another. It may also involve splitting up a single method into multiple methods.
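
As a rough illustration of what restoring harmony might look like (the class and method names below are mine, chosen only for the example), the currency conversion code moves out of Account and into a class whose name characterizes it:

    // The conversion methods now live in a class whose name describes them.
    class CurrencyConverter {
        double convert(double amount, String fromCurrency, String toCurrency) {
            // conversion rates omitted; this sketch only shows where the code lives
            return amount;
        }
    }

    // Account is left with methods that all work toward a single, account-related purpose.
    class Account {
        private double balance;

        void deposit(double amount)  { balance += amount; }
        void withdraw(double amount) { balance -= amount; }
        double balance()             { return balance; }
    }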

Interplay Between the Forces

Where this gets interesting is when these two forces start interacting with each other.

As you change a class' name, you may be changing its purpose. For example, changing the name of a class from CustomerInfo to Account significantly changes the expectations that a user would have for the class' code.

Because the class' methods should be acting towards one purpose (the class' single purpose), changing the name of the class can affect the cohesiveness of the class' methods. The single purpose that the class' methods were working towards before the class was renamed might have been redefined into two or more purposes.

In other words, when you change the name of a class in an attempt to harmonize the Force of Characterization you may temporarily introduce disharmony with respect to the Force of Cohesion. The converse of this is also true.

Sometimes a disharmony in one force can be resolved without introducing disharmony in the other force. However, I imagine these two forces occasionally going through cycles of harmony and disharmony, in and out of phase with one another, until both forces are in harmony, and a stable state has been reached.

Conclusion
  1. Changing a class' name can move you to change the class' methods to restore harmony in one of the two forces.
  2. A class' name is important when adding features; it helps you answer the question, "Does this method/data member belong in this class?" If you achieve harmony between the two forces, the class name will be a helpful guide as you read and maintain the class' code.
  3. Changing a class' name may prompt you to rename methods to be more consistent with the name.
  4. A class name that is a good characterization of the class' methods can actually assist you in maintaining the cohesiveness of the class by making un-cohesive code more obvious.
Question(s):
  1. Are there other forces that interact with class names and cohesion in interesting ways?

Thursday, April 24, 2008

A Different Perspective on Web Applications

Notice that I said this perspective is different. I intentionally left out words like: smart, useful, good, and many other positive adjectives.

My goal for taking this perspective is to see if it fits, what problems it solves (or creates), if it makes reasoning about or building web applications any easier. The good thing about perspectives is that you can try them on, look around, and then take them off (or even discard them).

The perspective that I'm taking off for this post is the one that states that a web application extends from the user in front of their browser all the way back to the data layer. That perspective is nice and has been around for a while now. But it has some rough edges that I'm trying to smooth out.

The Rough Edges

1) Data Validation is spread throughout all three layers of a three-tiered web application architecture. I wrote about this here.

2) Presentation Layer activities are performed in two places: in the browser via JavaScript, CSS, HTML, etc. and on the server when the HTML is dynamically generated (e.g. JSP, PHP, etc.).

Perhaps these rough edges cannot be avoided and all my attempts to do so will fail. Maybe I'm the rough edge and I just need to accept that the nature of the web requires that similar things need to happen in different places. Or maybe I can just change the way I look at the problem so that the rough edges disappear.

Here is the different perspective. What if we equate a web application to a standalone desktop application, like so:

Selecting the Application to Run

To start a desktop application, you run a program at a certain location (its path) with optional arguments.

To start a web application, you enter a certain location (its URL) into your browser with optional arguments.

Application Start Up

When you start a desktop application, one or more resources are loaded from the disk into memory, a user interface is presented, and an event loop is entered.

When you start a web application, one or more resources are loaded from the web server into the browser, a user interface is presented, and an event loop is entered.

Additional Resources

As the desktop application is asked to do different things, it may load additional resources (e.g. user interface elements, XML files, DLLs, etc.) from the disk.

As the web application is asked to do different things, it may load additional resources (e.g. HTML, JavaScript, CSS, images, etc.) from the web server.

Changing State

The desktop application user wants to save their changes, so the application writes the changes to disk.

The web application user wants to save their changes so the application sends the changes to the web server.


After drawing these parallels, would you say that a Desktop Application extends from the user in front of the screen all the way to the disk? Probably not. Yes, the disk is involved, but that's not part of your application proper.

So could we say that a web application starts at the user in front of the browser and ends inside the browser?

Is that too much of a stretch? Can we really equate the Internet, our web servers, and back end databases all to a storage device? I'm going to try. Maybe the analogy will work, maybe not.

Implications

A number of interesting things happen once we say that a web application is just what is going on inside the browser:

1) There are no more tiers to think about on the browser side (see also Tautology). Perhaps we can simply adopt a Model/View/Controller approach in the browser. It's perfectly okay to have data validation in an MVC application, so I don't need to worry about spreading data validation throughout all three tiers (yes, I am totally ignoring the web server and database layer for now). The browser is not self-sufficient; just as the desktop application loads resources from disk, the browser application will load resources from the web server.

2) The web server becomes less user-serving and more application-serving. A page served by the web server is not so much a page that the user will see as it is some user interface resource that is loaded by the application. The difference is subtle but I'm going to note it anyway. Perhaps it's important.

3) The web server gives up much of the control it had once we start putting more focus on the browser as the web application. Many web server applications of old carefully created their HTML pages so that every link and button would initiate the proper request back to the server. If the browser is the web application, the JavaScript code can play a much larger role in deciding what resources to request.

I have really tightened my focus to the browser client and have forgotten all about the web servers and database servers. In doing so, I've seen some nice properties emerge on the client side. For one, I can reason about the client in isolation. But have I created a terrible mess on the server side? Let's see.

What Does the Server Look Like Now?

As I start this section, I've isolated the browser client and abstracted away the rest of the application. Now I've got to deal with the Application Layer and the Data Layer. I'm starting to wonder if my attempt to gain a different perspective on three-tiered architecture will land me right back at a three-tiered architecture.

Let me do a roll call to see what we have left on the server.

Presentation Layer. Check. Granted it's not really presenting things to the user (as I pointed out earlier).

Application Layer. Check. But couldn't we put all of the Application Layer on the client? Yes, we could, but should we? Browsers with good debuggers allow users to modify the DOM and the JavaScript environment. As a safety precaution, we need some code that the user can't tamper with.

Data Layer. Check. We haven't gotten rid of that either.

Conclusion

I tried on a different perspective. At first it seemed promising, and then the three tiers appeared again on the server side.

In addition, the rough edges remain although the presentation rough edge is slightly different as we have two different parties we're presenting to (i.e. the browser application and the user).

I like the way that the client looked in this perspective. The only way to see if it really works though is through using this perspective to develop some web applications.

Monday, April 21, 2008

Objects and Data Validation

I've written a lot of objects that do data validation but I have yet to come up with a standard approach to data validation for objects.

When should the validation occur?

I have usually taken the approach that as soon as something goes wrong, I want to know about it. If you call a setter method passing in invalid data, I want to catch it immediately and reject it by throwing an exception.

Ken Pugh, in his book Prefactoring, mentions the use of specific data types. He would say that your setPhoneNumber() method should not take a string; it should take a PhoneNumber object. The PhoneNumber class' constructor would parse whatever string you tried to initialize it with and throw an exception if you didn't pass it a proper phone number. In this way, you can't even pass an invalid phone number to setPhoneNumber().
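
A minimal Java sketch of that idea follows; the parsing rule is a placeholder, not a real phone number grammar:

    class PhoneNumber {
        private final String digits;

        PhoneNumber(String raw) {
            // Placeholder rule: keep only digits and require exactly ten of them.
            String cleaned = raw.replaceAll("\\D", "");
            if (cleaned.length() != 10) {
                throw new IllegalArgumentException("Not a valid phone number: " + raw);
            }
            this.digits = cleaned;
        }

        String digits() { return digits; }
    }

    class Customer {
        private PhoneNumber phoneNumber;

        // The setter can no longer receive an invalid phone number;
        // the PhoneNumber constructor has already rejected it.
        void setPhoneNumber(PhoneNumber phoneNumber) { this.phoneNumber = phoneNumber; }
    }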

Delayed Data Validation

There are some cases where delayed data validation is the only way to go. Consider a very contrived but simple class whose "lower" attribute must be the lower case version of its "upper" attribute. If "lower" is 'a' then "upper" must be 'A'. If the class provides only setLower(char c) and setUpper(char c), these methods cannot do data validation. Either the class must supply a setBoth(char upperC, char lowerC) method or there must be a way to delay the data validation.
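
A quick sketch of the setBoth() variant, as contrived as the example itself:

    class CasePair {
        private char upper = 'A';
        private char lower = 'a';

        // Individual setters could not validate without forbidding legitimate
        // intermediate states, so validation happens when both values arrive together.
        void setBoth(char upperC, char lowerC) {
            if (Character.toLowerCase(upperC) != lowerC) {
                throw new IllegalArgumentException(lowerC + " is not the lower case of " + upperC);
            }
            this.upper = upperC;
            this.lower = lowerC;
        }
    }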

A less contrived example would be a Location object that has a City and a State property. If the object tries to make sure that the City value is always a city within the State value, and that the State value always contains a city with the City value's name, users will find the object difficult to use.

Not A City

Let's try something. If you had a special City value (e.g. NaCity resembling NaN for not a number) and a special State value (NaState) and setting the City would always set the State to NaState and vice versa, the validation could be simplified. If the City value is NaCity don't validate it. Hmm... but now you can have locations that don't contain City or State information. That doesn't sound very valid.

Try To Be More Accepting

Let me make things a little more complicated. Alan Cooper in his book "About Face" talks about data validation in the user interface. He thinks that a user should be able to enter incomplete information. Why should you have to discard your work if you don't have every required field filled in? Perhaps data validation should only be done in select situations? For example, I should be able to set attributes in my Customer object to whatever the data types will allow and save that invalid data. But when I want to send out some invoices to my customers, I should only send out invoices to customers that pass data validation. That way, I'm not sending mail to Fooville, or to Mars. This approach would probably require some report or view that showed all customers who were in an invalid state. That way folks can keep an eye on all of the Customers they can't yet bill.

Reporting what is actually wrong might involve a wee challenge. A Customer object might fail validation due to one of the Customer's aggregated objects. It's up to the programmer to make sure that the source of the validation failure is reported correctly.
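
Here is a rough sketch of what I mean by on demand validation; the class and method names are only illustrative:

    import java.util.ArrayList;
    import java.util.List;

    class Address {
        String city;
        String state;

        // Each object reports its own problems so the source of a failure is clear.
        List<String> validate() {
            List<String> errors = new ArrayList<>();
            if (city == null || city.isEmpty()) errors.add("Address: city is missing");
            if (state == null || state.isEmpty()) errors.add("Address: state is missing");
            return errors;
        }
    }

    class Customer {
        String name;
        Address billingAddress = new Address();

        List<String> validate() {
            List<String> errors = new ArrayList<>();
            if (name == null || name.isEmpty()) errors.add("Customer: name is missing");
            errors.addAll(billingAddress.validate());  // aggregate failures bubble up with their source
            return errors;
        }
    }

    // Only the operations that truly need valid data ask for validation:
    //   if (customer.validate().isEmpty()) { sendInvoice(customer); }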

Taking this on demand approach to validation in the business logic tier and not in the data tier puts you in a bind. Now you've got all of these invalid objects in memory, and you can't save any of them because of some database constraint. The moral of the story is to synchronize your data validation approaches throughout all of the tiers of your application.

Changing My Mind

The more I go on about data validation, the more I like the on demand approach. The nicest thing about it is that it separates data modification from data validation. You're always performing validation on stationary data.

Now when I put on my user hat, I like to know if I've made a mess of something even if I can't fix it at the moment, so flagging a Customer as invalid in the user interface is probably a good idea.

There are some situations where this on demand approach to data validation won't work. For example, if we were writing a code generator for Eclipse, and we allowed the user to create a class named "<:^)" we would be doing them a disservice as the code would not compile and there may be many references to "<:^)" that they would need to change.

One Way of Data Validation

I'm sure there's not One True Way of Data Validation. You may have a legacy database that has bullet proof constraints and is guarded by German Shepherd attack stored procedures. You may have an object model that must be 100% valid at all times (e.g. Air Traffic Control systems, SDI, a super safe lethal injection device, etc.).

For future projects, I think that I will start off with the on demand approach and see if that works.

Exercises Left to the Reader

1. Consider mixed models (e.g. strict data validation mixed with an on demand validation approach). Would the mixture of data validation models be too confusing?

In a Three-Tiered Architecture Where Should Data Validation Go?

I have not had much experience with three-tiered architectures so I'm working through some of the basic concepts that are tripping me up.

Right now I'm considering where data validation should take place. That sounds like something that would fit in the business logic tier. But there are benefits of putting some form of data validation in each of the three tiers.

Presentation Tier

When writing a web application, if you can avoid a round trip, you might want to. You might not want to avoid the round trip if:
  • the cost (development or throughput) of avoiding the round trip is prohibitive.
  • the latency is expected to be low (e.g. intranet applications).
One way of avoiding that round trip is by performing some of the data validation on the client-side.

Most of the data validation I've encountered is very narrowly focused. For example, the user's name is validated separately from the user's age. I don't think I've ever seen cross-field validation (e.g. "A customer with status X must have an account balance below Y."). Does this suggest that the client-side validation should just be a first line of defense against invalid data?

It seems like data validation is at home in the Presentation Tier.

Business Logic Tier

Here is where data validation naturally belongs (right?). You've got a bunch of Customer objects that are being manipulated in memory. When do you want to know that your Customer data is invalid? When you try to save the Customer data to the database? Or would you rather know that the Customer data is invalid the moment you attempt to modify a Customer object in an invalid way? Personally, I like knowing that something went wrong as soon as it goes wrong.

It seems like data validation is at home in the Business Logic Tier too.

Data Tier

What good DBA would ever trust all clients of his/her data to keep that data in a valid state at all times? Enter database constraints designed to maintain the data's validity. The database constraints are the last line of defense for data validation.

It seems like data validation is at home in the Data Tier too.

Questions

The validation performed in the presentation tier is probably different from the validation performed in the data tier and the business logic tier. How does one get a good idea of what valid data is if the validation is spread over three tiers?

It would be useful if I could specify what makes the data valid in one place in the form of multiple validation rules and then for each validation rule specify the tiers where they should be enforced. It seems like this would be a fairly challenging problem to solve without some sort of framework.
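
I don't have a framework in mind, but a sketch of the kind of declaration I'm picturing might look something like this (the rules themselves are made up for the example):

    import java.util.EnumSet;
    import java.util.Set;
    import java.util.function.Predicate;

    enum Tier { PRESENTATION, BUSINESS_LOGIC, DATA }

    // One place to say what "valid" means, and where each rule should be enforced.
    class ValidationRule<T> {
        final String description;
        final Predicate<T> check;
        final Set<Tier> enforcedIn;

        ValidationRule(String description, Predicate<T> check, Set<Tier> enforcedIn) {
            this.description = description;
            this.check = check;
            this.enforcedIn = enforcedIn;
        }
    }

    class CustomerRules {
        static final ValidationRule<String> NAME_NOT_BLANK =
            new ValidationRule<>("Customer name cannot be blank",
                                 name -> name != null && !name.trim().isEmpty(),
                                 EnumSet.allOf(Tier.class));

        static final ValidationRule<Double> BALANCE_LIMIT =
            new ValidationRule<>("Account balance must be below the status limit",
                                 balance -> balance < 10000.0,
                                 EnumSet.of(Tier.BUSINESS_LOGIC, Tier.DATA));
    }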

Saturday, April 19, 2008

How JavaScript Stirs Up The Three-Tier Architecture

A Place for Everything and Everything in its Place

When it comes to designing web applications I thought I knew where everything should go. I thought that business logic should go in the "logic tier", and that user interface stuff should go in the "presentation tier", and that the data would be stored in the "data tier" (see three-tier Architecture). If I had been asked, "Where does JavaScript fit into all this?", I would have said that it is useful for providing basic client-side validation a la "Customer name cannot be blank."

Enter JavaScript

Within the last six months or so, I've discovered two things that have shaken up my three-tiered perspective.

First, JavaScript (or at least the way it is being used) has matured quite a bit. Now those who have been following JavaScript carefully for a while will be able to calculate how deep my head has been in the sand from this last statement.

Second, a number of powerful JavaScript libraries/toolkits have been produced (e.g. JQuery, Dojo, Script.aculo.us, Prototype, GWT, etc.). I'm sure that some of these libraries have been around for a long time, but I am only discovering them now. And I'm having a great time playing with them.

The Power of JavaScript

These JavaScript libraries go a lot further than just telling the user that the "Customer name cannot be blank." They provide serious event handling and cool effects on top of JavaScript's ability to create objects with methods and properties. With all of these features, you could potentially move all three tiers of the three-tier architecture into the browser client. At least you could if you don't mind the total lack of persistence. If you want persistence, then you do have to communicate with a server (unless you find that storing your data in cookies works for your application).

Where Does Everything Go Now?

So now I'm confused. I like being able to split things up into pieces and then put the pieces somewhere. If I try to split up my web application according to its tiers, I find that JavaScript can play in all three tiers although it's typically only used in the browser. So we have some dissonance between the physical tiers and how client-side JavaScript can be used to augment each of the three tiers:
  • JavaScript can create and manipulate objects just like the data tier.
  • JavaScript can run business logic on the client side. This can reduce round trips to the server.
  • JavaScript can present the data to the user in all sorts of ways.
When JavaScript deals with all three parts of the three-tier architecture some interesting results occur:
  • The logical tiers are now split across the physical tiers (the client and the various types of servers).
  • Code may be duplicated in different places in different languages. Customer validation in the client would be in JavaScript while Customer validation in the business logic tier would be done in Java or C#. Copy/Paste Programming is bad but try to imagine Copy/Translate/Paste Programming.
  • Thanks to JavaScript debuggers that can be embedded in a browser, users can inspect the DOM and make changes to it. This has a number of security implications. A JavaScript application can't always trust its own DOM.

Simplifying the Question

Let me back up for a minute. Should JavaScript be used in all three tiers (on the client side)? I think that it's safe to say JavaScript should not generally be used in the client-side data tier. The data tier is all about persistence and JavaScript has some significant limitations in the persistence department. It's very clear that JavaScript is a very natural fit for use in the presentation tier. This leaves only the question of whether or not JavaScript should be used in the logic tier, and if so, how.

I think that JavaScript brings a lot of benefits to the client-side logic tier. Unfortunately it can only do so safely if the server mirrors that business logic and distrusts the client-side logic. Since JavaScript is so dynamic users can tamper with client-side code and make the business logic do anything they want.

Code Generation to the Rescue?

I wonder if code generation is a partial solution to this. The book, "The Pragmatic Programmer" suggested that if code/data does need to be duplicated, then there should be one source representation of that code or data and all duplicates should be generated from it.

The wonderful thing about JavaScript is that even if client-side code and server side code are generated from a single source, they will always be in synch because the client is always downloading the most recent version of the JavaScript.
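
Here is a hedged sketch of that single-source idea; the rule format and the generated JavaScript are purely illustrative:

    // One source representation of a rule...
    class LengthRule {
        final String fieldName;
        final int minLength;

        LengthRule(String fieldName, int minLength) {
            this.fieldName = fieldName;
            this.minLength = minLength;
        }

        // ...enforced directly on the server...
        boolean isSatisfiedBy(String value) {
            return value != null && value.length() >= minLength;
        }

        // ...and emitted as equivalent JavaScript for the browser.
        String toJavaScript() {
            return "function validate_" + fieldName + "(value) {\n"
                 + "  return value != null && value.length >= " + minLength + ";\n"
                 + "}";
        }
    }

    // Example: new LengthRule("customerName", 1).toJavaScript() produces a
    // client-side check that always matches the server-side isSatisfiedBy().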

Conclusion

I'll spend some more time thinking about this and then write up some thoughts. I'd be very interested in hearing how you approach this.

Thanks.

Friday, April 18, 2008

The Free Stuff You Get From {Gr|R}ails

I'm learning Grails (before that it was Rails). Up until recently, I've been really impressed by what these frameworks give you "for free": ORM, pages for performing CRUD operations on the data, one or more application servers, etc.

But as I consider the usability of the software that these frameworks give you for free, I start wondering if some of this free stuff is worth it.

The more practical side of me has been confronting the side of me that was initially so giddy over the power and RAD-ness offered by these frameworks.

Here's one recent exchange between my practical side and my giddy side...

Giddy Tim: So with just a few commands and some editing of a Customer.rb (or .groovy), I can show users a page that lists every Customer in the database!

Practical Tim: What kind of user wants to see every Customer in the database? Sure, if you're interested in one of the first Customers in the list, you might be happy with the list, but what about everyone else? Everyone else has to somehow get to the Customer they're interested in. What your users probably want is a search feature so that they can search for a particular Customer. They probably would like a list of the Customers that they deal with the most, you know, their favorites. Or maybe they want to narrow the field by listing all Customers in a particular Category.

Giddy Tim: You've got some good points there. At least I have an edit page that shows the Customer and a place where I can briefly list, but not edit or view the details of, the Customer's Orders.

Practical Tim: So you have to go to another screen to edit or view the Orders?

Giddy Tim: Yes, at least one screen. There's the "View Order" page, then I click the "Edit" link and I'm taken to the "Edit Order" page. Then when I'm done editing, I click "Save" and then I'm redirected to the "View Order" Page again. Then, if I customize the Controller, I can have it redirect me back to the "Edit Customer" Page of the Customer whose Order I wanted to edit. Even though the system knows that there is a one to many relationship between Customers and Orders there is still a one to one relationship when it comes to pages. Each "Edit" page edits a single record in the database. Simple, eh?

Practical Tim: With a powerful JavaScript library and Ajax couldn't you easily allow the user to edit Customers and their Orders all from the same page? That seems like a better user experience to me. Alan Cooper in his book "About Face" says that it's a good idea to allow users to edit data where it's displayed. It sounds like these frameworks create an unnecessary distance between editing and viewing, and part/whole relationships.

Giddy Tim: So to offer the users a good user experience, it sounds like I'll need to modify the automatically generated code extensively.

Practical Tim: Yep, see you around Giddy Tim!

Giddy Tim: Uh, just call me Tim.

I will continue to use Grails, and Rails, and maybe even Merb, but I will use them with open eyes knowing that once the frameworks have generated code for me, I've probably still got a lot of work ahead of me.

P.S. I'm sure that the code generation engines can be improved to generate more usable interfaces (although they may need some hints and guidance from the programmers).

Thursday, April 10, 2008

A Very Sad Book Title

"Managing Catastrophic Loss of Sensitive Data"

When I hear this title and think of who would buy the book, I imagine some guy with tear-stained cheeks, who has only recently arrived at the final stage of grief (acceptance), sheepishly approaching the checkout at the bookstore, getting ready to say, “It’s for a friend” to the cashier.

I feel for this guy.

Wednesday, April 2, 2008

The SharePoint One-Question Personality Test

Do you like developing in SharePoint?

If you answered "yes"...
  • You are not bothered by frequently performing repetitive tasks.
  • You do not mind expending large amounts of energy in order to receive small rewards.
  • You can handle large amounts of cognitive dissonance.
  • You like swimming in uncharted, undocumented waters.
  • You don't need constant reassurance that your code works (e.g. unit tests).
If you answered "no"...
  • You prefer to automate any repetitive tasks.
  • You want your energy investment to be close to the expected rewards for that investment.
  • You like it when things are logical, orderly, and just make sense.
  • You like navigating waters that have been mapped and documented.
  • You like being constantly assured that your code works (e.g. continuous integration and unit tests).
Just a theory but I think that these two sets of answers align with two categories of people. Those who answer "yes" seem to fit into the thrill-seekers category. Those who answer "no" seem to fit into the more cautious category.

Monday, March 3, 2008

Microsoft SharePoint Development and Tractor Pulls

If you don't know what a tractor pull is, you might want to check out Wikipedia. If you're a SharePoint developer, you are probably more familiar with tractor pulls than you might expect.

The basic concept of a tractor pull is that you attach a big weight on wheels to a vehicle and the vehicle starts moving forward. As time progresses the weight being towed shifts in such a way that it gets harder for the vehicle to tow it forward. After a while, the vehicle cannot even move.

So how is that like SharePoint development? If you do all your development in the production environment, then you probably aren't experiencing the "Tractor Pull Effect". There are other effects that you're more likely to encounter (e.g. the "Who deleted all of the Customer data?" effect, or the "Users keep asking why the system is down" effect).

But if you've carefully set up a Development, Test, and Production environment, you probably already know the "Tractor Pull Effect".

When you need to take the changes that you made in Development and promote them to the Test environment, SharePoint tools only take you so far. Where I work, we've got some "extra special" tools written by a SharePoint genius that automate much of the promotion process. These tools are not perfect or fully automated (yet).

Even after these tools run, there is much work to be done to the Test environment to make it mirror the Development environment. Web Parts need to be reconnected, connection strings in charts need to be set to point to the Test database servers. Then the site needs to be manually checked to make sure that the promotion process didn't hiccup (which it does on occasion).

Then when you want to deploy from Test to Production a different set of rules apply. The promotion process has a tendency to delete data (which is fine in a Test environment, right?). These different rules add complexity and additional effort because now you're redoing a lot of work in production that you did in Development.
A Short Digression. Maybe every single list, site definition, and page you deploy to SharePoint is done via a SharePoint "feature" so you're not experiencing any of the problems I'm talking about. I've also done that style of development and it appears to be much cleaner and more reproducible. My frustration there is that if I want to add a link to a navigation area, I have to create a feature folder with two XML files in it and then install and activate that feature. That's a simple task. Creating a custom list took me days the first time I tried it. That is nowhere near agile. I would imagine that most SharePoint shops are somewhere between the do-everything-in-the-user-interface approach and the do-everything-via-a-feature approach... Digression over.
So the more features you add to your product, the more setup, configuration, and rework you end up doing as you promote your work from Development onward.

In a tractor pull, the weight eventually ceases all forward motion. I'm not looking forward to the point where we have so many features, that all of our forward motion ceases. Hopefully before that point, the "extra special" tools will work 100% of the time and they will be automated and I'll have nothing to complain about.

Thursday, January 31, 2008

Database Constraints and Usability

I've been playing with Grails and have not been satisfied with the user experience of creating an object that is on the one side of a one-to-many relationship.  You must first create and save the object on the one side of the relationship, then edit it to add objects on the many side of the relationship.  Almost certainly, that is because a relational database cannot save the objects on the many side until the object on the one side has a row ID.

I don't like making the user suffer just to appease the relational model.  So what if I create and save the object on the one side at the moment the user requests to create one.  That way it has its row ID and the user can add objects on the many side.  Unfortunately, database constraints may prevent me from saving my object on the one side of the relationship.

I've read Alan Cooper's book, "About Face" and it has changed the way I view a number of things in the world of software.  It's been a while since I've read the book so maybe this post's topic was covered in "About Face".

The facts:
  1. Databases can have constraints to make sure that the data is "valid".  For example, in a customer billing application, a constraint on the customers table might say that all customers must have a mailing address.  This constraint would prevent you from entering a customer in the system that did not have an address.
  2. Sometimes users don't have all the data and it would be inconvenient to force them to enter all the data or none of it.  Imagine filling out 80 fields in a form and then trying to save the data to the database only to be told that one field cannot be empty.  At that moment, you can either discard all of your work (potentially inconveniencing your customer who is providing the data to you) or enter some made-up value into the form (possibly confusing the system or other people who interface with the system as in the case of a made-up phone number or street address).
At the moment, I'm considering having a boolean/bit column in some of my tables whose name is 'valid'.  Then whenever a business object tries to persist to a row in a table, it performs validation and sets the 'valid' field to the result of the validation.  All other fields are also saved at that time. 

The user can be notified that the data they entered was invalid and will be ignored by the system until one or more errors are fixed.  If you put a 'saved_by' field in the row that contains the user name of the last person that saved the row, you could periodically ask the user to correct their data.

Most of the queries in the system would only seek rows where the 'valid' field was true.
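
Here is a sketch of what the persistence code might look like; the table and column names are just whatever your schema uses, and this isn't tied to any particular ORM:

    class CustomerRecord {
        long id;
        String name;
        String mailingAddress;

        boolean isValid() {
            return name != null && !name.isEmpty()
                && mailingAddress != null && !mailingAddress.isEmpty();
        }

        // Save everything the user entered, but record whether it passed validation
        // and who saved it.
        void save(java.sql.Connection db, String currentUser) throws java.sql.SQLException {
            String sql = "UPDATE customers SET name = ?, mailing_address = ?, " +
                         "valid = ?, saved_by = ? WHERE id = ?";
            try (java.sql.PreparedStatement stmt = db.prepareStatement(sql)) {
                stmt.setString(1, name);
                stmt.setString(2, mailingAddress);
                stmt.setBoolean(3, isValid());
                stmt.setString(4, currentUser);
                stmt.setLong(5, id);
                stmt.executeUpdate();
            }
        }
    }

    // Routine queries then filter on the flag:
    //   SELECT * FROM customers WHERE valid = TRUE;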

Database constraints are useful because they actually prevent the data from being invalid (as much as a constraint is able to anyway) and that is their strength.  Removing all constraints and using the 'valid' column does leave the database open to containing invalid rows whose 'valid' field is true, but only if you go around the business objects.

Perhaps database vendors will see this problem and introduce constraints that can allow invalid data to be stored as long as it's marked as invalid.  Maybe then my mail will stop going to 101 Nowhere Lane.

Saturday, January 19, 2008

Random Thoughts and Questions

Some random thoughts and questions....

If two objects of the same class have different rules, how did they get those different rules? Was there some object factory that gave them different rules? If other objects modify an object's rules, is that a violation of encapsulation? If an object is the only entity allowed to modify its rules, that means that it must know about all possible rules it can have within the domain. For some domain types (e.g. account types) where there is a closed set of kinds, this might not be a bad idea. For open ended domain types (e.g. widgets) this could not possibly work.

In the case of the Customer class, what would be involved when giving a Customer a discount? We certainly need to specify and store the amount of the discount. But do we also add a rule to the class? Or was the rule there all along waiting for a discount attribute to be set? Since rules may operate on attributes, it's important to keep them in sync. If you have a rule that expects a discount attribute but no discount attribute is in your object's Description Object, your rule will fail or misbehave.

Also it seems like a Description Object might encounter name collision issues. Perhaps there should be namespaces within a Description Object?

Rules vs. Overriding

In "Object Thinking" by David West, it appears that there are two main ways of making two different objects that respond to the same message react differently to those messages.

The first is using polymorphism but that necessitates two different classes. The other is making sure that the two objects will execute different rules when responding to the message.

There is a third way that I'm not going to go over in this post; it's when two different objects have different state and react differently because of their differing state.

The author discusses good and bad specialization on page 79. I can't say that I am sure I know what the author is talking about. If I take his words in their most literal sense, I get the following statements.

1. Any time one class derives from another, it is assumed that the subclass is a specialization of the superclass.

2. Specializing by extension means only adding methods but never overriding methods. This is the good kind of specialization because it preserves the substitutability of subclasses for superclasses.

3. Specializing by constraint (overriding a superclass' method) almost always results in a "bad" object because now you have to be concerned about how an object does something not just what it does.

4. The exception in the footnote is that there will be some methods introduced high in the inheritance hierarchy that are designed to be overridden by subclasses and how the method does what it does is unimportant.

Given that there are two ways of making two different objects react differently to a message, how do you pick between the two ways? Is one way preferred in general?

Let's say I'm doing some Object Discovery and definition of a Customer class. From the domain experts I've found that there are two kinds of customers: customers with discounts and customers without discounts. Having a discount makes the payment calculation different but no additional behavior is added (for argument's sake). If a Customer object can be given a discount or their discount can be revoked, then modifying the Customer's rules would be better than recreating the Customer as a CustomerWithDiscount or vice versa. Perhaps the GoF's State Pattern would also be a good solution here.

Let's say that I'm doing some Object Discovery and definition on some classes that represent various shapes in a drawing software package. In this case it seems most fitting to use polymorphism for differing the way that the shapes draw themselves. The reason it fits better sounds a lot like the exception listed in the author's foot note.

Perhaps the reason that rules sound more natural for the Customer object is because it avoids us creating a subclass for every different kind of payment calculation (e.g. CustomerWithDiscountAndGoldMembershipAndOutstandingBalance). In this case the rules approach is superior to even the State Pattern because you'd have to have a separate state class for each different method of payment.
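
Here is a loose Java sketch of the rules idea as I understand it (my own interpretation, not code from the book):

    import java.util.ArrayList;
    import java.util.List;

    interface PaymentRule {
        double apply(double amount);
    }

    class Customer {
        private final List<PaymentRule> paymentRules = new ArrayList<>();

        // Rules can be granted or revoked at run time; no CustomerWithDiscount subclass needed.
        void addRule(PaymentRule rule)    { paymentRules.add(rule); }
        void removeRule(PaymentRule rule) { paymentRules.remove(rule); }

        double calculatePayment(double baseAmount) {
            double amount = baseAmount;
            for (PaymentRule rule : paymentRules) {
                amount = rule.apply(amount);
            }
            return amount;
        }
    }

    // Example rules; combinations compose instead of multiplying subclasses.
    class DiscountRule implements PaymentRule {
        private final double percent;
        DiscountRule(double percent) { this.percent = percent; }
        public double apply(double amount) { return amount * (1.0 - percent / 100.0); }
    }

    class OutstandingBalanceRule implements PaymentRule {
        private final double balance;
        OutstandingBalanceRule(double balance) { this.balance = balance; }
        public double apply(double amount) { return amount + balance; }
    }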

It looks like I need to read up on the author's thoughts on inheritance...

Friday, January 18, 2008

Object Thinking and Description Objects

I'm reading about Description Objects in "Object Thinking" by David West, and I'm having a hard time wrapping my mind around them. Their primary mentions are on pages 125 and 217. The index in the book is a bit thin. I've had to use O'Reilly's Safari book search feature to find things in the book.

At its simplest, an Object Description is a map from labels (strings? atoms? symbols?) to primitives (e.g. string, int, etc.).

Rather than fill up a CRC card with lots of attribute based responsibilities (e.g. know age, know first name, etc.), your object would have a responsibility called describe self. This responsibility is met by returning a Description Object that contains attributes (e.g. age, first name, etc.).

Description Objects create at least one benefit. Let's say that you have a customer class that has the following attributes: age, first name, and last name. Your domain analysis shows that some customers also have a discount. In a non-Object Thinking world, you would have two classes: Customer and CustomerWithDiscount. The only need for CustomerWithDiscount is the fact that it adds a single attribute. In an Object Thinking world, however, you would have a single Customer class but some Customers would return a Description Object that had a Discount label in it and others would not.

This raises some questions:

How does one Customer object get a Discount label in its Description Object?

I see that we have two options:

The first is that the Customer object itself adds the Discount label. Taken to its somewhat logical conclusion, that implies that a Customer object would have to know about all labels that a Customer could possibly have, when they should be added, and what values they should have.

The other option is that some other object adds the Discount label, and that sounds like a breach of encapsulation to me.

Of the two options, I dislike the first one the most (but the second option comes in close behind). The first option implies that potentially a lot of intelligence would be placed in one class (what labels to add in different circumstances). The second option might spread that intelligence all around the place.

Perhaps we can use factory methods to encapsulate the population of the Description object. For example:

Customer createCustomer(age, firstName, ...)
Customer createCustomerWithDiscount(age, firstName, ..., discount, ...)

In this case the intelligence is all in one place and all of the encapsulation violations are done in one place. Perhaps that's not too evil...
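
To make the idea more concrete, here is one way the Description Object and those factory methods might be fleshed out; the types and labels are my guesses, not the book's:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    // At its simplest, a Description Object is a map from labels to values.
    class DescriptionObject {
        private final Map<String, Object> entries = new HashMap<>();

        void put(String label, Object value) { entries.put(label, value); }
        Optional<Object> get(String label)   { return Optional.ofNullable(entries.get(label)); }
        boolean has(String label)            { return entries.containsKey(label); }
    }

    class Customer {
        private final DescriptionObject description;

        private Customer(DescriptionObject description) { this.description = description; }

        DescriptionObject describeSelf() { return description; }

        // Factory methods keep the "which labels go in" intelligence in one place.
        static Customer createCustomer(int age, String firstName) {
            DescriptionObject d = new DescriptionObject();
            d.put("age", age);
            d.put("firstName", firstName);
            return new Customer(d);
        }

        static Customer createCustomerWithDiscount(int age, String firstName, double discount) {
            Customer customer = createCustomer(age, firstName);
            customer.describeSelf().put("discount", discount);
            return customer;
        }
    }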

If the presence of a label in a Description Object, or its value, influences the behavior of an object, how is that difference in behavior implemented?

If the Customer has a responsibility to calculate cost, and a Customer with a Discount label in their Description Object calculates a different amount than a Customer without the Discount label, where does that decision logic go?

You might say that this is a job for polymorphism but if that is the case, then we need two Customer objects: one with a discount and one without. Hmmm, we introduced the Description Object to reduce the number of classes in the system but now we're back to two classes.

So does this mean that if the presence of a label in a Description Object or its value can influence behavior of the object, that entry doesn't need to be in the Description Object (and probably should not be there)?

I'll have to read up on Rule Objects but perhaps the cost calculation is a job for one or more rule objects (if inheritance and polymorphism are not appropriate).

What happens when you need to change a value in the Object Description?

If the Object Description is implemented as a mutable map and another object just goes to the map and makes a change, this is a definite violation of encapsulation.

The author states that Description Objects might have their own behaviors. If the author considers setting values of entries in the Description Object as one of the Description Object's behaviors, then we have created a new problem. Now we need different kinds of Description Objects for different kinds of customers. One Customer Description Object knows how to set the discount, while others know how to set other attributes.

Perhaps rule objects would come to the rescue again. When you create a Customer with a discount, you attach different rules to the Description Object than when you create other kinds of Customers.

It seems that in Object Thinking rule objects might be used in many places where polymorphic methods would be used in non-Object Thinking.

All this to say, I think I'll reexamine my CRC cards...

Thursday, January 17, 2008

Object Discovery on the Bill Payment System

I'm developing a bill payment system based on the principles of "Object Thinking" by David West.

I drew up a semantic net based on the domain and covered almost an entire page. I then started writing out CRC cards for the classes in the system and found that many of the classes I had drawn on the semantic net weren't classes at all. Some became part of an object's description (payment schedule, payment rule, website, user name, password). Several were outside of the system (e.g. website to pay bills on, check to mail the creditor).

Since I'm practicing anthropomorphization, I listed each class' motivation on the CRC card. For example, a bill wants to pay itself. A creditor wants to be paid the full amount on time. A fund (that keeps track of how much money I have allocated for bill paying) wants to stay in the black.

I was surprised by how my complex semantic net shrunk down to four objects.

Object Thinking and Confusion over Collaboration

David West, the author of "Object Thinking", has a different definition of collaboration than I had ever seen.

The main difference (found on page 130) is noted in a scenario where object A is collaborating with object B. He states that the object being collaborated with (B) is "not an object occupying one of object A's instance variables, a temporary variable declared in the method that object A is executing in order to satisfy the original request, or an object supplied to object A as an argument".

Then on page 254, the author states, "Collaborating objects are very tightly coupled. For this reason, collaborations should occur inside the encapsulation barrier, with objects occupying the instance variables, objects being received along with messages, and objects occupying temporary variables".

At this point, I was confused, thinking I had found a contradiction. Let's see if other references to collaboration clear things up...

On page 226, the author talks about the airplane collaborating with an instrumentCluster which on page 227 is shown as being held in an instance variable (Figure 8-2). Okay, maybe that first definition just had a typo...

On the previous page, page 225, the author restates the definition of collaboration as found on page 130, the definition where collaborations don't take place between arguments, instance and temporary variables.

If you're keeping score, that's two references saying collaborations do take place with arguments, temporary variables, and instance variables, and two references saying they do not.

My thinking on the matter is that any time one object uses the services of another object, there is collaboration. If collaboration is not occurring between instance variables, temporary variables, and arguments, then what is it called when you use the services of these other objects?

After some more thought, I think we have a catch-22 situation here. When you are filling out a CRC card, and you are about to write down a collaborator, you must first think whether or not that collaborator will be stored in an instance variable, temporary variable, or argument. If the collaborator will be referenced via an instance variable, then it's not a collaborator at all and you should not write it down. That sounds like jumping to implementation details during the "Object Discovery" process.

If collaboration is something that is to be discovered during the "Object Discovery" process, then you cannot make restrictions on what is and what isn't a collaboration based on implementation details.

Thursday, January 3, 2008

The Secret (and Guilt-Ridden) Life of Bills

I'd like to write a bill paying system for Mac OS X in Objective-C. I'm looking at the system from an "Object Thinking" (by David West) perspective.

So if beer can want to hop on a truck to be taken to a customer, a bill (the kind you pay) can want to be paid, right?

In fact, as I see it, a bill knows that it is a debt to society (or at least to me). It feels terrible about its mere existence. All it wants is to secure funds so that it can pay itself, record the fact that it paid its debt, and then shuffle off this mortal coil.

I almost start feeling bad for the little bill.

I don't think I'd use the words guilt or despondent anywhere in the code, but it's a fun way of looking at the bill.

Before reading "Object Thinking" the sad little bill would have been an inanimate, vegetable of a bill, with no motivation or smarts (or guilt).

Beer and Motivation

This post is not about my drinking habits. It's about a couple of paragraphs from "Object Thinking" by David West. Since "beer" does not appear in the index, the paragraphs are on page 298.

The author states that the beer is trying to get into a bottle, get capped, and into a truck. One obvious (and wrong) way of implementing this would be to make the beer a controlling object ("Vat, pour me into a bottle", "Capper, cap me", etc.). This approach is wrong because the control becomes too centralized.

So how would you implement this? Could you use events?

For example, the beer registers with the vat so that the beer is told when a vat full of fresh beer is ready. The bottle registers with the beer so that the bottle can be told when the beer is ready to be poured into the bottle. I'm not sure I like this approach. If I look at these objects as humans, as the author encourages, I see behavior that I don't like to see in humans. It's that whole, "I'm ready (and tapping my foot while I'm waiting for you to meet my expectations)." If you link enough of these kinds of events together, I'm concerned that you'll have a lot of weak links and hard-to-follow logic.
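
For what it's worth, here is the kind of event wiring I have in mind (a plain observer setup in Java; the listener names are my own invention):

    import java.util.ArrayList;
    import java.util.List;

    // The beer registers with the vat; the bottle registers with the beer.
    // Each link is a small observer relationship.

    interface VatListener  { void vatReady(); }
    interface BeerListener { void beerReadyToPour(); }

    class Vat {
        private final List<VatListener> listeners = new ArrayList<>();
        void register(VatListener listener) { listeners.add(listener); }
        void finishBrewing() { for (VatListener l : listeners) l.vatReady(); }
    }

    class Beer implements VatListener {
        private final List<BeerListener> listeners = new ArrayList<>();
        void register(BeerListener listener) { listeners.add(listener); }

        public void vatReady() {
            // The beer reacts to the vat, then announces its own readiness.
            for (BeerListener l : listeners) l.beerReadyToPour();
        }
    }

    class Bottle implements BeerListener {
        public void beerReadyToPour() { /* accept the pour */ }
    }

    // Wiring: beer.register(bottle); vat.register(beer);
    // This is the chain of "I'm ready" notifications I'm worried could get hard to follow.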

That is why I want to implement some simple system based on the principles found in "Object Thinking": to see if my concerns are valid.


In a podcast (part two of two) the author talks more about the beer.

Thinking about "Object Thinking"

A few weeks ago a new acquaintance recommended that I read David West's "Object Thinking" book. Now I had thumbed through the book before and had dismissed it largely because of its strong focus on philosophy and history. I'm a "show me the code" kind of guy, after all, and I was reluctant to take his advice.

But here was this smart, pragmatic, in-the-trenches guy basically telling me that he wanted to work with people who agreed with this book. The thought of using a book on philosophy and history as a litmus test for knowing if you wanted to work with someone made a strong impression on me so I bought the book, and read through it quickly.

I found the author's perspective on OO interesting and unorthodox. I got to the end of the book and had a lot of questions. I decided to read the book again focusing on deeper comprehension and I'm very glad I did.

The strange thing is that the second reading answered very few of my questions. It did make a lot of things more clear but I still had questions. My questions fall into the following categories:

  • Does "Object Thinking" work in the real world?
  • How would you implement "Object Thinking"? To really understand something, I like seeing it from its inception to its completion. For programming topics that means from concept to code (with apologies to Jesse Liberty). This book covered the concepts very well (all that philosophy, history, and culture stuff) but strayed away from code.
  • Can I just pick and choose some of the author's principles and still reap the benefits? Or do I have to adopt the principles that make me uncomfortable?
Having questions after reading a book is not necessarily a bad thing. In fact, that's why I'm finally blogging after who knows how many years of putting it off.

My goal is to try to approach one or more simple projects using the "Object Thinking" book as my guide. I'd like to document that process on this blog.