How to become a programmer in 21 years

Wednesday, November 5, 2008

More Java XML

Adding to my previous post about Java XML.

Take the example where you want to take an existing class and customize your own XML output. You could use XStream out of the box without any configuration and just trust it to create the correct XML. This is fine if you want the object to be fully represented in the XML, but what if you only want a subset of the data in the object?

Well, XStream allows you to write custom converters for this. So, you need to create a converter class (that implements the Converter interface), and in the marshal() method, take the fields that you desire and manually create the XML nodes. This gives you much more control over the output.

Another option is to use StAX to directly map the input elements to the output XML. Again this gives you full control as to what is going into the actual XML document.

I ran a few tests to determine the performance difference between Woodstox (a StAX implementation) and XStream. Basically the test involved both libraries marshalling an object into XML data 100,000 times. The difference was quite large, with Woodstox outperforming XStream by about 30-40%.

Friday, October 31, 2008

XML Data Binding and Serialization Java

When working with XML data, there are a number of libraries that can be used, depending on the complexity of your needs.

For a one-off reading of an XML file, you can just use a SAX/DOM based parser, such as those defined in the javax.xml.parsers package.

If you want to get a particular element or attribute out of an XML document, then you can use the javax.xml.xpath library to query the document. This makes it easy to quickly find a single element, or set of elements, in an XML document.

If you want to do something more heavyweight, and you are going to use a particular XML schema repeatedly, then you will probably need to have some sort of binding/marshalling library. Binding refers to the action of taking an XML schema and automatically making a set of classes that correspond to the schema. Marshalling goes the opposite way and creates an XML file from an object. Note that to marshall an object, you don't necessarily need to have bound it previously. You can just declare any old class and marshall it.

The most common binding/marshalling library is JAXB (javax.xml.bind.JAXB). This includes a tool for automatically binding an XML schema to a set of classes, as well as a Marshaller that will create XML from the created objects.

If you just want to marshall an object, then there are libraries, such as XStream, that make this very simple. Just create any old object, and use XStream to define/create an XML document for it. It is a quick and easy solution, especially if you want to define your own classes to hold the XML data (instead of going down the automatic binding route).

Sunday, March 23, 2008

Friends in C++

Friend functions and classes are not the most common of C++ features but there are situations where the friend keyword is useful.

A friend is something that allows your class to be accessed by a different class or function. The friend keyword can be used in 2 cases, to declare a friend function or a friend class.

When you declare a friend function in your class, then it can do anything that a member function of the class can do. So in the following, friendFn() can do anything to the object passed to it that memberFn() can do.
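The listing for this example is not reproduced here. A minimal sketch of what it could look like follows. PopularClass, mData, memberFn() and friendFn() are names taken from the text; the constructor, the getData() accessor, the starting values and the friendDemo() helper are assumptions, chosen so that mData finishes at 6.

```cpp
#include <cassert>

class PopularClass
{
public:
    explicit PopularClass(int start) : mData(start) {}

    // An ordinary member function: it can touch the private data.
    void memberFn(int x) { mData += x; }

    // Accessor added for this sketch so the result can be inspected.
    int getData() const { return mData; }

    // Declaring friendFn() as a friend grants it the same access.
    friend void friendFn(PopularClass &pObject, int x);

private:
    int mData;
};

// Not a member of PopularClass, but allowed to modify mData directly.
void friendFn(PopularClass &pObject, int x)
{
    pObject.mData += x;
}

// Hypothetical driver: 1, plus 2 via the member, plus 3 via the friend.
int friendDemo()
{
    PopularClass popObject(1);
    popObject.memberFn(2);
    friendFn(popObject, 3);
    assert(popObject.getData() == 6);
    return popObject.getData();
}
```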

At the end of the above example, popObject's private member mData is 6. Note that the object passed in is only changed because it is passed by reference. If the function declaration for friendFn was as follows:
friendFn(PopularClass pObject, int x);
then friendFn() would have received a copy of the object and changed the copy's member variable, leaving popObject itself untouched.
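To make the by-value case concrete, here is a small sketch. The name friendFnByValue(), the void return type, the getData() accessor and the numbers are assumptions made for illustration.

```cpp
#include <cassert>

class PopularClass
{
public:
    explicit PopularClass(int start) : mData(start) {}
    int getData() const { return mData; }

    friend void friendFnByValue(PopularClass pObject, int x);

private:
    int mData;
};

// pObject is a copy of the caller's object, so only the copy changes.
void friendFnByValue(PopularClass pObject, int x)
{
    pObject.mData += x;
}

int byValueDemo()
{
    PopularClass popObject(3);
    friendFnByValue(popObject, 3);
    assert(popObject.getData() == 3); // the original is untouched
    return popObject.getData();
}
```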

Friend classes follow the same principle: they allow another class to have access to the data in a class. So in the following, PopularClass allows the class FriendClass access to its private data. When fObject.addVals(popObject) gets its hands on popObject, it can add together the private data from its own class (which it, of course, always had access to) and the private data of the passed in popObject, which is an object of a different class.
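The listing is missing here as well, so the following is only a sketch of the idea. PopularClass, FriendClass, fObject, popObject and addVals() come from the text; the member names, constructors and values are assumptions.

```cpp
#include <cassert>

class PopularClass
{
public:
    explicit PopularClass(int start) : mData(start) {}

    // Every member function of FriendClass may now access mData.
    friend class FriendClass;

private:
    int mData;
};

class FriendClass
{
public:
    explicit FriendClass(int start) : mOwnData(start) {}

    // Adds this object's private data to popObject's private data.
    int addVals(const PopularClass &popObject) const
    {
        return mOwnData + popObject.mData;
    }

private:
    int mOwnData;
};

int friendClassDemo()
{
    PopularClass popObject(6);
    FriendClass fObject(4);
    assert(fObject.addVals(popObject) == 10);
    return fObject.addVals(popObject);
}
```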

So friends can be useful if you treat them right (sorry, way too corny I know...)

Saturday, March 1, 2008

C++ inheritance oddity

There are plenty of things that can make C++ a complicated language. But in general the creators of C++ would have gone through a painstaking process to ensure that the language is as intuitive and usable as possible (without sacrificing functionality, of course). However, when I came across the following C++ behaviour I was confused.

Let's just say that I have a base class and a derived class. The base class has 3 member functions, each called doSomething() as below.
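The original class listing is not shown, so this is a minimal sketch under assumptions: the int return values (1, 2 and 3) exist only so the three overloads can be told apart, and inheritedOverloadsDemo() is a hypothetical helper.

```cpp
#include <cassert>

class BaseClass
{
public:
    // Three overloads sharing one name.
    int doSomething() { return 1; }
    int doSomething(int) { return 2; }
    int doSomething(double) { return 3; }
};

// Inherits all three overloads unchanged.
class DerivedClass : public BaseClass
{
};

int inheritedOverloadsDemo()
{
    DerivedClass d;
    // Encode which overloads ran as the digits of one number.
    int result = d.doSomething() * 100 + d.doSomething(5) * 10 + d.doSomething(2.5);
    assert(result == 123);
    return result;
}
```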

The functionality of the doSomething() methods isn't important right now, but assume that the methods are implemented. What is important is that you can create an object of type DerivedClass and it can use the 3 doSomething() methods defined in the base class. This is all straightforward inheritance stuff you say, and you'd be right, but wait, the craziness is about to begin...

Now, let's say I want to redefine one of the doSomething() methods, but I want to do it in the derived class, i.e. I want to do the following:
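A sketch of such a redefinition, again with assumed return values so the calls can be distinguished. The two commented-out calls are the ones that refuse to compile; the qualified call at the end shows that the base versions still exist.

```cpp
#include <cassert>

class BaseClass
{
public:
    int doSomething() { return 1; }
    int doSomething(int) { return 2; }
    int doSomething(double) { return 3; }
};

class DerivedClass : public BaseClass
{
public:
    // Redefines only the no-argument version.
    int doSomething() { return 10; }
};

int hidingDemo()
{
    DerivedClass d;
    int bare = d.doSomething();   // DerivedClass version

    // d.doSomething(5);     // does not compile
    // d.doSomething(2.5);   // does not compile either

    // Qualifying the name still reaches the base-class overload.
    int viaBase = d.BaseClass::doSomething(5);

    assert(bare == 10 && viaBase == 2);
    return bare + viaBase;
}
```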

Again, assume that the DerivedClass's doSomething() method is implemented. Now, assume I make an object of type DerivedClass. If I call the bare doSomething() method everything is ok, but if I try to call doSomething(int x) or doSomething(double x) the program won't compile!!

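This is because declaring a function in a derived class hides every base class function with the same name, whatever their parameter lists. The other overloads still exist, but a plain call on a DerivedClass object no longer finds them.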

So what, you say. Sure re-implementing non-virtual functions in derived classes is a bad idea anyway. But this behaviour is the same even if the three doSomething() functions are virtual. I'll say that again: this behaviour is the same even if the three doSomething() functions are virtual. So the following will see the exact same behaviour.
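The virtual variant might look like the sketch below (return values and the helper name are assumptions). The hidden overloads are still unreachable through a DerivedClass object, although all three names remain visible through a BaseClass reference.

```cpp
#include <cassert>

class BaseClass
{
public:
    virtual ~BaseClass() {}
    virtual int doSomething() { return 1; }
    virtual int doSomething(int) { return 2; }
    virtual int doSomething(double) { return 3; }
};

class DerivedClass : public BaseClass
{
public:
    // Overrides one overload; the other two become hidden.
    virtual int doSomething() { return 10; }
};

int virtualHidingDemo()
{
    DerivedClass d;
    // d.doSomething(5);   // still does not compile

    BaseClass &asBase = d;
    // Name lookup now starts in BaseClass, so every overload is found,
    // and the overridden one dispatches to DerivedClass.
    int result = asBase.doSomething() + asBase.doSomething(5);
    assert(result == 12);
    return result;
}
```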

The derived object is only able to call the doSomething() method that it defines itself, and not the other 2 defined in the base class. This seemed really strange to me, as I would have assumed that the doSomething() functions that accept the int and double could still be called. After all, they are all virtual functions, which means that they are supposed to be selectively re-definable!!

The reason for this strangeness is so that you do not get caught out inheriting overloads from distant base classes by mistake. But I'd still prefer if it was not the case, as it seems kind of counter-intuitive.

There is a simple way around this, and it is to add in a using BaseClass::doSomething; in the derived class definition. This will work for both the virtual and non-virtual cases above, and will allow the DerivedClass to re-implement some of the doSomething() functions and still allow the other ones to be used.
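A sketch of the fix, using the virtual case (the return values and the usingDemo() helper are assumptions for illustration):

```cpp
#include <cassert>

class BaseClass
{
public:
    virtual ~BaseClass() {}
    virtual int doSomething() { return 1; }
    virtual int doSomething(int) { return 2; }
    virtual int doSomething(double) { return 3; }
};

class DerivedClass : public BaseClass
{
public:
    // Bring every BaseClass::doSomething overload into this scope.
    using BaseClass::doSomething;

    // Re-implement just the no-argument version.
    virtual int doSomething() { return 10; }
};

int usingDemo()
{
    DerivedClass d;
    // All three calls compile now: derived, base(int), base(double).
    int result = d.doSomething() * 100 + d.doSomething(5) * 10 + d.doSomething(2.5);
    assert(result == 1023);
    return result;
}
```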

--
Lurning Man
--

Sunday, February 17, 2008

Inheritance and Casting

In order to go through inheritance and casting, I will use the following classes:
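The class listing itself is missing, so here is a minimal sketch that matches the names used later in the post (baseMember, derivedMember, setDerivedMember()). The other accessors, the constructors and the value 111 for the base object are assumptions; 222 is the value the text mentions for the derived object.

```cpp
#include <cassert>

class BaseClass
{
public:
    BaseClass() : baseMember(0) {}
    void setBaseMember(int v) { baseMember = v; }
    int getBaseMember() const { return baseMember; }

private:
    int baseMember;
};

class DerivedClass : public BaseClass
{
public:
    DerivedClass() : derivedMember(0) {}
    void setDerivedMember(int v) { derivedMember = v; }
    int getDerivedMember() const { return derivedMember; }

private:
    int derivedMember;
};

// Creates one object of each class and fills in its members.
int createObjectsDemo()
{
    BaseClass baseObject;
    baseObject.setBaseMember(111);

    DerivedClass derivedObject;
    derivedObject.setBaseMember(222);    // inherited from BaseClass
    derivedObject.setDerivedMember(222);

    int sum = baseObject.getBaseMember() + derivedObject.getBaseMember()
            + derivedObject.getDerivedMember();
    assert(sum == 555);
    return sum;
}
```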

It is possible to create 2 objects of these classes like so:

You can view these classes like in the diagram below with the base object on the left and the derived object on the right. The derived class can be viewed as a base class with the additional member variables/functions of the derived class (of course it cannot access the private member variables of the base class object within it, but they do exist in memory).

All is well and good so far. Lovely jubbly. So now we want to copy the objects. We can either copy the baseObject to the derivedObject or vice versa. One which is fine, but the other is not.
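A sketch of the two copies, using the same assumed class definitions (accessors, constructors and the value 111 are illustrative; 222 comes from the text). The commented-out line is the direction that is rejected by the compiler.

```cpp
#include <cassert>

class BaseClass
{
public:
    BaseClass() : baseMember(0) {}
    void setBaseMember(int v) { baseMember = v; }
    int getBaseMember() const { return baseMember; }

private:
    int baseMember;
};

class DerivedClass : public BaseClass
{
public:
    DerivedClass() : derivedMember(0) {}
    void setDerivedMember(int v) { derivedMember = v; }
    int getDerivedMember() const { return derivedMember; }

private:
    int derivedMember;
};

int copyDemo()
{
    BaseClass baseObject;
    baseObject.setBaseMember(111);

    DerivedClass derivedObject;
    derivedObject.setBaseMember(222);
    derivedObject.setDerivedMember(222);

    baseObject = derivedObject;      // fine: only the base part is copied
    // derivedObject = baseObject;   // does not compile

    assert(baseObject.getBaseMember() == 222);
    return baseObject.getBaseMember();
}
```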

Why is this so? Well, if you have a derivedObject and want to convert it to a baseObject, you can just strip away the extra functionality from the derived class and you are left with a base object. The value of baseMember will be the one set in derivedObject, i.e. 222.

We cannot go in reverse though: if we want to create a derived object from a base object, where is the extra derived functionality going to come from?? All we have is a plain old base object, not a derivedMember in sight....

Ok, hopefully that is nice and clear. Now if we want to create pointers that point to these objects we can do the following:

This is obviously ok, and can be viewed as such:

pBaseObject expects to point to an object of size BaseClass, and pDerivedObject expects to point to an object of size DerivedClass and whaddaya know, both of them are. All is well.

So now, let's do something crazy and switch the pointers around. Point pBaseObject to derivedObject and pDerivedObject to baseObject. Again, only one of these is ok to do, and the other will result in an error. Pointing pBaseObject to derivedObject is ok as derivedObject also contains a baseObject, so when the BaseClass pointer points to it, it can just point to the BaseClass object contained within it (as on the left in the picture below).

However, you cannot do the opposite. If pDerivedObject tries to point to a BaseClass object it expects to point to something of size DerivedClass. Where is all of the extra derived functionality going to come from?? You cannot call setDerivedMember() because it does not exist for that object. This will result in a compile error.
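The pointer cases can be sketched as follows. The class definitions, accessors and the value 111 are the same assumptions as before; pBaseObject and pDerivedObject are the names used in the text.

```cpp
#include <cassert>

class BaseClass
{
public:
    BaseClass() : baseMember(0) {}
    void setBaseMember(int v) { baseMember = v; }
    int getBaseMember() const { return baseMember; }

private:
    int baseMember;
};

class DerivedClass : public BaseClass
{
public:
    DerivedClass() : derivedMember(0) {}
    void setDerivedMember(int v) { derivedMember = v; }
    int getDerivedMember() const { return derivedMember; }

private:
    int derivedMember;
};

int pointerDemo()
{
    BaseClass baseObject;
    baseObject.setBaseMember(111);
    DerivedClass derivedObject;
    derivedObject.setBaseMember(222);
    derivedObject.setDerivedMember(222);

    // Each pointer aimed at an object of its own type: always fine.
    BaseClass *pBaseObject = &baseObject;
    DerivedClass *pDerivedObject = &derivedObject;
    assert(pBaseObject->getBaseMember() == 111);
    assert(pDerivedObject->getDerivedMember() == 222);

    // Switched around: only one direction is accepted.
    pBaseObject = &derivedObject;     // fine: points at the base part
    // pDerivedObject = &baseObject;  // does not compile

    return pBaseObject->getBaseMember();
}
```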

The big difference between casting between pointers and copying between objects is that when you copy a derived object to a base one, all of the derived functionality gets sliced off and all that is left is the BaseClass object. When you point a BaseClass pointer to a DerivedClass object, the entire DerivedClass object remains. You can cast that pointer back to a DerivedClass pointer and the object will still contain all of its functionality/member values.

So, in the above, pDerivedObject2 points to the original object with the derived class functionality intact (i.e. a derivedMember and baseMember value of 222).

The rules for casting with references are the same as for casting real objects (after all, references are just another name for the objects). Thus you can bind a BaseClass reference to a DerivedClass object, but you cannot go vice versa. Note, however, that binding a reference in this way will not 'slice' the derived object, and all of its settings/functionality will remain intact.
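A sketch of pointing a DerivedClass pointer back at the object. pDerivedObject2 is the name used in the text; the use of static_cast and the class definitions are assumptions. The downcast is only valid here because pBaseObject really does point at a DerivedClass object.

```cpp
#include <cassert>

class BaseClass
{
public:
    BaseClass() : baseMember(0) {}
    void setBaseMember(int v) { baseMember = v; }
    int getBaseMember() const { return baseMember; }

private:
    int baseMember;
};

class DerivedClass : public BaseClass
{
public:
    DerivedClass() : derivedMember(0) {}
    void setDerivedMember(int v) { derivedMember = v; }
    int getDerivedMember() const { return derivedMember; }

private:
    int derivedMember;
};

int castBackDemo()
{
    DerivedClass derivedObject;
    derivedObject.setBaseMember(222);
    derivedObject.setDerivedMember(222);

    BaseClass *pBaseObject = &derivedObject;   // nothing is sliced

    // Explicit downcast back to the real type of the object.
    DerivedClass *pDerivedObject2 = static_cast<DerivedClass *>(pBaseObject);

    assert(pDerivedObject2 == &derivedObject);
    assert(pDerivedObject2->getBaseMember() == 222);
    return pDerivedObject2->getDerivedMember();
}
```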
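The reference case, sketched with the same assumed classes (baseRef and the value 333 are made up for illustration). Changes made through the reference land on the original derived object, which keeps its derived part.

```cpp
#include <cassert>

class BaseClass
{
public:
    BaseClass() : baseMember(0) {}
    void setBaseMember(int v) { baseMember = v; }
    int getBaseMember() const { return baseMember; }

private:
    int baseMember;
};

class DerivedClass : public BaseClass
{
public:
    DerivedClass() : derivedMember(0) {}
    void setDerivedMember(int v) { derivedMember = v; }
    int getDerivedMember() const { return derivedMember; }

private:
    int derivedMember;
};

int referenceDemo()
{
    BaseClass baseObject;
    baseObject.setBaseMember(111);
    DerivedClass derivedObject;
    derivedObject.setBaseMember(222);
    derivedObject.setDerivedMember(222);

    BaseClass &baseRef = derivedObject;        // fine, and nothing is sliced
    // DerivedClass &derivedRef = baseObject;  // does not compile

    baseRef.setBaseMember(333);                // modifies derivedObject itself

    assert(derivedObject.getBaseMember() == 333);
    assert(derivedObject.getDerivedMember() == 222);
    return derivedObject.getBaseMember() + derivedObject.getDerivedMember();
}
```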

That's all for now, hope you like the pictures, they took ages!!!

--
Lurning Man
--

Thursday, February 14, 2008

More Pointers and references

I talked a little last time (in Pointers and References Basics) about, well, the basics of pointers and references. I'll go into a bit of detail now about some practical uses of them.

Probably the most useful thing about references/pointers is when you pass them to functions. If you declare a function like so:

void someFunction(SomeClass a, SomeClass b);

If you call that function like so:

SomeClass a;
SomeClass b;
someFunction(a, b);

both a and b are passed by value to the function. This means that the copy constructor is called, and copies of a and b are made which are used inside the function. This is desired behaviour and all, but there is some overhead involved in calling the copy constructors every time, especially in a performance intensive program (and remember that the destructors of the copies will have to be called each time too...).
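One way to see the copies happen is to count them. In this sketch the static copies counter and the passByValueDemo() helper are additions for illustration and not part of the original SomeClass.

```cpp
#include <cassert>

class SomeClass
{
public:
    SomeClass() {}
    // Count every copy-constructor call.
    SomeClass(const SomeClass &) { ++copies; }

    static int copies;
};

int SomeClass::copies = 0;

// Both parameters are taken by value.
void someFunction(SomeClass, SomeClass)
{
}

int passByValueDemo()
{
    SomeClass a;
    SomeClass b;
    SomeClass::copies = 0;
    someFunction(a, b);          // one copy per argument
    assert(SomeClass::copies == 2);
    return SomeClass::copies;
}
```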

So, as a solution to this you can do one of two things. You can pass a pointer to the objects into the function, or you can pass a reference to the objects to the function. This will mean changing the way the function is defined. In the case of passing a pointer it means defining:

void someFnByPointer(SomeClass *pA, SomeClass *pB);

And for references it means defining it as:

void someFnByRef(SomeClass &a, SomeClass &b);

However, when calling the functions, there is a difference. To pass by pointer, you either have to create a pointer to the 2 objects being passed in, and pass that in, or pass in the address of the objects. To pass by reference, you call it as normal.

//By pointer
someFnByPointer(&a, &b);
//Or by reference
someFnByRef(a, b);

In my opinion the second way looks more natural, and you don't have to worry about making sure the pointer address is correct etc. Also in the function itself you can still use the '.' notation rather than the '->' notation. So I tend to prefer passing by reference.

Actually, there is another benefit to passing by reference/pointer (from now on, I'm just going to refer to 'passing by reference', even though it is equally applicable to pointers). When passing by reference, the passed object will not get 'sliced' if the reference is to a derived object. For example, if you specify that a base class object must be passed in by value, if you pass in a derived object, the additional functionality in the derived object gets 'sliced' off (as the copy constructor will only take the values from the base class). However, if you pass by reference the copy constructor is never called and you still have the full derived object. This is especially important if there are virtual functions floating about and you want the derived ones to be called.
So if you have a class, SomeDerivedClass that inherits from SomeClass, you can do the following and still have all of the derived class functionality (provided a derived class object is passed in):

void someFunction(SomeClass &a)
{
//if a is SomeDerivedClass object, then whole object
//remains, and the derived functionality is maintained
//so casting the reference back (not copying) is fine:
SomeDerivedClass &c = static_cast<SomeDerivedClass &>(a);
}
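The effect on virtual functions can be sketched like this. The id() function and the byValue()/byRef() helpers are hypothetical names introduced for the example.

```cpp
#include <cassert>

class SomeClass
{
public:
    virtual ~SomeClass() {}
    virtual int id() const { return 1; }
};

class SomeDerivedClass : public SomeClass
{
public:
    virtual int id() const { return 2; }
};

// The parameter is a fresh SomeClass copy: the derived part is sliced off.
int byValue(SomeClass a) { return a.id(); }

// The parameter refers to the caller's object: the derived override runs.
int byRef(const SomeClass &a) { return a.id(); }

int slicingDemo()
{
    SomeDerivedClass d;
    assert(byValue(d) == 1);
    assert(byRef(d) == 2);
    return byValue(d) * 10 + byRef(d);
}
```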

Finally, one last plug that should be filled regards what you can and can't do with the objects that are passed to a function. The benefit of passing by value is that when you pass an object you can forget about it, safe in the knowledge that it cannot be changed by the function. The same cannot be said if you pass by reference.

SomeClass someObjectA, someObjectB;
someFnByRef(someObjectA, someObjectB);
//The function can do anything it wants to someObjectA
//and someObjectB.

This is clearly undesirable in many cases. someFnByRef() needs to guarantee that the objects passed to it won't change. To do that, just add our old friend const when declaring the function.

void someFnByRef(const SomeClass &a, const SomeClass &b);

So basically you get all of the benefits of passing by reference (i.e. faster, no slicing etc.) with none of the drawbacks (it is still safe, cannot change the passed object). It's just like a late night infomercial!!
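As a rough illustration of what const buys you: in this sketch getValue() and sumByConstRef() are hypothetical additions to the SomeClass from the basics post, and the commented-out line is the kind of change the compiler now rejects.

```cpp
#include <cassert>

class SomeClass
{
public:
    SomeClass() : value(0) {}
    void setValue(int i) { value = i; }
    int getValue() const { return value; }   // const: callable on const objects

private:
    int value;
};

// Reads from both objects without copying them or being able to change them.
int sumByConstRef(const SomeClass &a, const SomeClass &b)
{
    // a.setValue(1);   // does not compile: setValue() is not a const member
    return a.getValue() + b.getValue();
}

int constRefDemo()
{
    SomeClass someObjectA, someObjectB;
    someObjectA.setValue(2);
    someObjectB.setValue(3);
    int sum = sumByConstRef(someObjectA, someObjectB);
    assert(someObjectA.getValue() == 2 && someObjectB.getValue() == 3);
    return sum;
}
```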

(Note: For an excellent discussion on this, and many other useful topics, check out Scott Meyers' 'Effective C++')

--
Lurning Man
--

Thursday, February 7, 2008

C++ Pointers and References Basics

This will talk about how references and pointers relate to each other and highlight some of their differences.

Pointers and references are quite similar in what they do, in that they both refer to an already existing object/variable. However, they have a number of big differences. Pointers are more flexible, but can be a lot more dangerous, whereas references are somewhat more restrictive, but still very useful.

Ok, so here is an example of using both of them, starting with pointers. I will use the following class for illustration:
class SomeClass
{
public:
void setValue(int i){value = i;};
private:
int value;
};

So you can declare an object of this class, and then a pointer to the object like this:
SomeClass someObject;
someObject.setValue(55);

SomeClass *pToObj; //Not pointing to anything yet
pToObj = &someObject; //Pointer is initialised now
pToObj->setValue(11); //someObject.value is now changed

Pointers can be moved around and set any amount of times, so in the above example, you could set pToObj to point to another object, or something different entirely (which is why they can be so dangerous).

You are given a lot of freedom with pointers that can lead to problems, like the following where an uninitialised pointer is operated on:

SomeClass *pToObj2;
pToObj2->setValue(12); //BAD IDEA - CRASH!!

References on the other hand are much safer. They must be initialised, can only be initialised once (as a reference to some object/variable), and once they are initialised they cannot be changed.
So you can do something like this:
SomeClass someObject;
SomeClass &someRef = someObject;

But once this is done you cannot change someRef to reference another object of SomeClass (or anything else for that matter). Note that you have to initialise a reference. If you do not then the compiler will throw out an error.

There is a difference in notation when dealing with pointers and references. When accessing member variables / functions using a pointer it is done using the arrow notation. i.e.

pToObj->memberVariable;
pToObj->memberFn();

Whereas when using references, it is done using the same '.' notation as if you were using the object itself. i.e.

someRef.memberVariable;
someRef.memberFn();

References can be viewed as an alias for an object. It's like making an object of the LeadSinger class called paulHewson, having him make a band with some of his friends, and then making a reference to the paulHewson object called bono.

LeadSinger paulHewson;
paulHewson.makeBand();
LeadSinger &bono = paulHewson;
bono.sing(); //Both bono and paulHewson sing here as they are the same object!!

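paulHewson and bono are just two different names for the same object. You cannot then go on and make bono refer to something else. An assignment like the one below does compile, but it does not re-seat the reference: it copies noelGallagher's state into paulHewson, and bono still refers to paulHewson.

LeadSinger noelGallagher;
bono = noelGallagher; //Copies noelGallagher into paulHewson, bono is not re-seated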
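A sketch of what that assignment actually does: it compiles, but it copies the other object's state into the object bono already refers to and does not point bono anywhere new. The name member and the constructor are assumptions added so the effect can be observed.

```cpp
#include <cassert>
#include <string>

class LeadSinger
{
public:
    explicit LeadSinger(const std::string &n) : name(n) {}
    std::string name;
};

bool referenceStaysPut()
{
    LeadSinger paulHewson("Paul");
    LeadSinger noelGallagher("Noel");

    LeadSinger &bono = paulHewson;
    bono = noelGallagher;   // assigns to paulHewson through the alias

    assert(&bono == &paulHewson);      // still the same object
    assert(paulHewson.name == "Noel"); // whose contents were overwritten
    return &bono == &paulHewson && &bono != &noelGallagher;
}
```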

You can, however, change pointers to point to other objects. So the following is possible:

Drummer johnPepys;
Drummer ericChilds;
Drummer peterBond;
Drummer *pSpinalTapDrummer = &johnPepys;

johnPepys.DieInGardeningAccident();
pSpinalTapDrummer = &ericChilds; //Same pointer, different object
ericChilds.ChokeOnVomit();
pSpinalTapDrummer = &peterBond;
...
and so on
...

I think that covers the basics, I've got a fair bit more to write about pointers and references in posts to come, but that's enough for now!

--
Lurning Man
--