XML Feeds

Here are some notes on using XML feeds, drawn from real-life, hands-on experience.

I'm playing with geospatial data. I have an application under development that displays where Radio Amateurs are. (Why? Well, why not?) All this tests my XML knowledge, using Rapid Application Development to automatically generate source within a C# .NET 2.0 solution.

I already have a web service that provides live position information. Radio amateurs drive around with radios that fire off their position and call sign every few minutes. I can then plot where they've been on a map.

I now want to superimpose traffic information on that map.

Traffic Information

There are several XML feeds available. I'm using the BBC and NTCC.

http://www.bbc.co.uk/travelnews/xml/

http://www.roadtraffic-technology.com/projects/traffic_control/

http://www.itsproj.com/otap/

Two separate feeds, from two separate organisations.

Smoke and Mirrors

Be warned - there are a lot of TLAs around. Allow me to informally introduce them:

The XML feed - that's the information such as "God awful smash up at 52N 1W".

The schema definition is information about the feed. For "God awful smash up at 52N 1W" it is something like "Descriptive text, position". We don't use quotes and commas but XML.

Schema definitions come in several flavours. Here I'll cover my exploits with DTD and XSD.

The BBC uses DTD (Document Type Definition) from TPEG; NTCC uses XSD (XML Schema Definition) from OTAP, which includes TPEG.

http://www.w3.org/XML/Schema

http://www.oasis-open.org/cover/schemas.html

Clarify

So allow me to clarify:

These are two organisations that are supplying live traffic information. They collect and report information. (Some traffic incidents may be reported on both feeds, but the details will most likely differ - it depends on who keyed in the data.)

The format of each XML feed is different. Superficially they're similar, with lots of "< / >", but the layout is different. This layout is defined by a schema document that says what goes where. The schema document could be a DTD or an XSD. An XSD is itself written in XML and is defined by its own schema (a DTD has its own, non-XML syntax). Some validation tool (application or person) can say "yes, this is a valid XSD document." I'm sure we could keep going like this until we reach Adam.

Schemas are good - they tell the application I'm writing what to expect. In the .NET 2.0 C# world this is a BIG THING. From the schema definition I can use a utility (XSD.EXE) at design time to create C# classes that know about the feed.

What's more, at run time - when the user puts his fat pudgy fingers on my OnButtonEvents - I can use XmlSerializer to dynamically compile code that reads the generated classes and "knows" how to populate them with the live data. All I have to do, as a programmer, is add code that walks through the structure the XmlSerializer has populated, and bombard the user with it.
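To make the idea concrete, here is a minimal sketch of XmlSerializer filling in a class from XML. The TrafficIncident class and its element names are made up for illustration; they stand in for the much larger classes that XSD.EXE would generate from a real schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for a class that XSD.EXE would generate from a schema.
[XmlRoot("incident")]
public class TrafficIncident
{
    [XmlElement("description")]
    public string Description;

    [XmlElement("latitude")]
    public double Latitude;

    [XmlElement("longitude")]
    public double Longitude;
}

public static class IncidentReader
{
    // Let XmlSerializer populate a TrafficIncident from an XML string.
    public static TrafficIncident Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(TrafficIncident));
        using (StringReader reader = new StringReader(xml))
        {
            return (TrafficIncident)serializer.Deserialize(reader);
        }
    }
}
```

Calling IncidentReader.Parse with a small `<incident>` document hands back an object whose fields can be read directly, with no hand-written parsing.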

This makes my life easy. If the schema changes all I have to do is rerun XSD.EXE and recompile. (I'm afraid support for existing customers with old feeds is thrown out of the window.)

Automatic code generation is all very good, but...

XML is very good, but there are a few wrinkles. The data feeds are fine - don't get me wrong. It's how an application is designed to consume the feed that is a bit rough. I'm sure the tools will improve.

I'm very positive that all this "xml clutter" is worthwhile. But let me share my sad sagas.

My saga with .DTD

I agree with Ryan. He has trodden the path. I follow, and I agree with him. I ended up ignoring the schema altogether - I wanted to get something going quickly!

My path

How do I get the DTD (Document Type Definition) into a form that I can access using C#?

I looked at various Java tools.

XMLSPY looks like it could pass muster.

I tried XSD.EXE. XSD.EXE is bundled with dev studio. (Isn't the Microsoft Development Studio supposed to be an integrated environment?)

XSD.EXE does not read DTD files. I tried feeding it with a copy of live data XML. That failed: I was presented with:

C:\temp\tpeg>xsd rtm_tpeg.xml /c
Microsoft (R) Xml Schemas/DataTypes support utility
[Microsoft (R) .NET Framework, Version 2.0.50727.42]
Copyright (C) Microsoft Corporation. All rights reserved.
Error: There was an error processing 'rtm_tpeg.xml'.
- DataSet cannot expand entities. Use XmlValidatingReader and set the EntityHandling property accordingly.

If you would like more help, please type "xsd /?".

I went down an avenue of writing a console app hosting XmlValidatingReader - before I saw sense. I just hacked it!

Yes - I ditched all this schema palaver. Here's my hack:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Globalization;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;
using GeoTransform;

...

// Scan the BBC feed, pairing each WGS84 position with the most recent text node.
private List<ThirdPartyInfo> Read(string szPath)
{
    List<ThirdPartyInfo> listLL = new List<ThirdPartyInfo>();

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.DTD;
    settings.ProhibitDtd = false;   // the feed carries a DOCTYPE, so DTDs must be allowed
    settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

    using (XmlReader reader = XmlReader.Create(szPath, settings))
    {
        string message = "";
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Text)
            {
                // Use Value rather than ReadContentAsString(): the latter moves the
                // reader on to the next node, which the loop's Read() would then skip.
                message = reader.Value;
            }
            else if (reader.NodeType == XmlNodeType.Element && reader.Name == "WGS84")
            {
                // Assumes the first attribute is the latitude and the second the longitude.
                reader.MoveToFirstAttribute();
                float lat = float.Parse(reader.Value, CultureInfo.InvariantCulture);
                reader.MoveToNextAttribute();
                float lon = float.Parse(reader.Value, CultureInfo.InvariantCulture);

                ThirdPartyInfo tpi = new ThirdPartyInfo(new ll(lon, lat));
                tpi._description = message;
                listLL.Add(tpi);
            }
        }
    }
    return listLL;
}
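The Read method above hooks up a ValidationCallBack that isn't shown. Here is a minimal sketch of what such a handler might look like, assuming all it needs to do is collect the messages. The DtdValidationLog class and its Validate helper are made-up names for illustration. Note also that DtdProcessing only exists from .NET 4 onwards; on .NET 2.0 the equivalent is ProhibitDtd = false, as used above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper: collects DTD validation problems instead of throwing.
public class DtdValidationLog
{
    public readonly List<string> Messages = new List<string>();

    // Matches the ValidationEventHandler delegate signature.
    public void ValidationCallBack(object sender, ValidationEventArgs e)
    {
        Messages.Add(e.Severity + ": " + e.Message);
    }

    // Read the whole document, validating against the DTD named in its DOCTYPE.
    public static List<string> Validate(string xml)
    {
        DtdValidationLog log = new DtdValidationLog();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;   // .NET 4+; ProhibitDtd = false on 2.0
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += new ValidationEventHandler(log.ValidationCallBack);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }   // validation happens as the reader advances
        }
        return log.Messages;
    }
}
```

Because a handler is attached, validation problems are reported through the callback rather than as exceptions, so the feed can still be read even when it doesn't quite match its DTD.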

Do you like the way I copied and pasted the above code? Microsoft Development Studio to Microsoft FrontPage. Seamless integration. At least the syntax colouring is right!

The code is "good enough" for a first cut. I have traffic information from the BBC!

My saga with .XSD

I now try getting data from the NTCC.

I successfully use XSD.EXE, which is bundled with Microsoft Development Studio.

The XSD (XML Schema Definition) documents had to be brought down to a local drive from the information provider's server. There doesn't appear to be a way of providing authentication details as a parameter to XSD.EXE.

I chose an appropriate namespace. (You don't want a traffic Point class mixed up with your System.Drawing.Point do you?)

I run xsd.exe:

C:\temp>xsd publication.xsd /c /n:trafficinfo
Microsoft (R) Xml Schemas/DataTypes support utility
[Microsoft (R) .NET Framework, Version 2.0.50727.42]
Copyright (C) Microsoft Corporation. All rights reserved.
Writing file 'C:\temp\publication.cs'.

That looks fine! I get a huge C# file that's 273k big. Here's a small chunkette that blathers about how snow information will be presented in a C# class.

Good, now I need to throw an XML data feed at this class.

This is very similar to the code I presented above for the DTD case. I've added some fluff to do with authentication and validation.

// Deserialize the NTCC feed into the XSD.EXE-generated classes, then walk them.
public List<ThirdPartyInfo> Read(string szPath)
{
    List<ThirdPartyInfo> listLL = new List<ThirdPartyInfo>();

    // Credentials for the information provider's server.
    XmlUrlResolver resolver = new XmlUrlResolver();
    resolver.Credentials = new System.Net.NetworkCredential("username", "password", "");

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.XmlResolver = resolver;
    settings.ValidationType = ValidationType.Schema;
    settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

    // Report anything in the feed that the generated classes don't know about.
    XmlDeserializationEvents xde = new XmlDeserializationEvents();
    xde.OnUnknownAttribute += new XmlAttributeEventHandler(UnknownAttributeHandler);
    xde.OnUnknownElement += new XmlElementEventHandler(UnknownElementHandler);
    xde.OnUnknownNode += new XmlNodeEventHandler(UnknownNodeHandler);
    xde.OnUnreferencedObject += new UnreferencedObjectEventHandler(UnreferencedObjectHandler);

    XmlSerializer ser = new XmlSerializer(typeof(trafficinfo.SituationPublication));
    trafficinfo.SituationPublication sit;
    using (XmlReader reader = XmlReader.Create(szPath, settings))
    {
        reader.MoveToContent();
        sit = (trafficinfo.SituationPublication)ser.Deserialize(reader, xde);
    }

    if (sit == null || sit.situation == null)
    {
        return listLL;
    }

    foreach (trafficinfo.Situation s in sit.situation)
    {
        if (s.situationElement == null)
        {
            continue;
        }
        foreach (trafficinfo.SituationElement e in s.situationElement)
        {
            // Only elements located by a TPEG framed point carry a usable position.
            if (e.elementlocation is trafficinfo.TPEGFramedPoint)
            {
                trafficinfo.TPEGFramedPoint tp = (trafficinfo.TPEGFramedPoint)e.elementlocation;
                ThirdPartyInfo tpi = new ThirdPartyInfo(new ll((float)tp.framedPoint.wgs84.longitude, (float)tp.framedPoint.wgs84.latitude));

                if (tp.framedPoint.name != null)
                {
                    foreach (trafficinfo.OtherPointDescriptor str in tp.framedPoint.name)
                    {
                        tpi._description += str.descriptor + " ";
                    }
                }
                if (e is trafficinfo.Accident)
                {
                    tpi._description += "Accident ";
                }
                tpi._description += e.ToString();
                listLL.Add(tpi);
            }
        }
    }
    return listLL;
}
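The Unknown...Handler methods wired up above aren't shown either. As a rough sketch, assuming they only need to note what the generated classes didn't recognise, they could look like this. MiniSituation and UnknownNodeLog are made-up names, with MiniSituation standing in for the real generated SituationPublication.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for an XSD.EXE-generated class.
[XmlRoot("situation")]
public class MiniSituation
{
    [XmlElement("comment")]
    public string Comment;
}

// Hypothetical helper: records feed content that the class has no member for.
public class UnknownNodeLog
{
    public readonly List<string> Messages = new List<string>();

    public void UnknownElementHandler(object sender, XmlElementEventArgs e)
    {
        Messages.Add("Unknown element <" + e.Element.Name + "> at line " + e.LineNumber);
    }

    public void UnknownAttributeHandler(object sender, XmlAttributeEventArgs e)
    {
        Messages.Add("Unknown attribute '" + e.Attr.Name + "' at line " + e.LineNumber);
    }

    // Deserialize xml into a MiniSituation, logging anything unexpected into 'log'.
    public static MiniSituation Deserialize(string xml, UnknownNodeLog log)
    {
        XmlDeserializationEvents events = new XmlDeserializationEvents();
        events.OnUnknownElement = new XmlElementEventHandler(log.UnknownElementHandler);
        events.OnUnknownAttribute = new XmlAttributeEventHandler(log.UnknownAttributeHandler);

        XmlSerializer serializer = new XmlSerializer(typeof(MiniSituation));
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return (MiniSituation)serializer.Deserialize(reader, events);
        }
    }
}
```

Handlers like these are a cheap early warning: if the provider adds an element to the feed, the application keeps working and the log says which element the classes are missing.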

Excellent stuff! I now have access to all the feed data in a humongous structure that I can easily pick bits out of!

I can walk the structure and see there's an issue with the A64 northbound between A1237 near York (south) and A1036...

This is all very good, and is what I learnt you should be able to do with XML and .NET.

I have traffic information from the NTCC!

But there's a step I missed out! Things did not go so smoothly. The XmlSerializer Deserialize method blew up!

When things go wrong...

... they go very very wrong.

The XmlSerializer Deserialize method threw an exception. Behind the scenes, XmlSerializer generates and compiles serialization code for the SituationPublication class. See the parameter passed in to the XmlSerializer ctor:

XmlSerializer ser = new XmlSerializer(typeof(trafficinfo.SituationPublication));

When the XmlSerializer pre-compiler came to compile that generated code... well, it failed.

How do you find out what went wrong? This is code created by the XmlSerializer pre-compiler, not mine!

There is an answer.

Add the following to your app.config file:

<system.diagnostics>
    <switches>
        <add name="XmlSerialization.Compilation" value="4"/>
    </switches>
</system.diagnostics>

With app.config key XmlSerialization.Compilation set to value 4, the source files used to create the faulty code are kept around. Look in your %temp% directory.

There are now enough clues to help fix the problem. (Don't forget to remove the switch: each time the XmlSerializer pre-compiler runs, another megabyte of junk gets left in your temp directory.)

Inspecting the generated code soon gave clues to a fix. Hack the XSD file used by XSD.EXE to generate the classes. Yes, that means breaking the contract between the information provider and my application.

The actual repair was to edit the .xsd, changing this:

<xs:complexType name="SuplementaryAdvice">
        <xs:sequence>
                <xs:element name="supplementaryInformation" type="SadEnum" minOccurs="0"
maxOccurs="unbounded"/>
        </xs:sequence>
</xs:complexType>

to this:

<xs:complexType name="SuplementaryAdvice">
        <xs:sequence>
                <xs:element name="supplementaryInformation" type="SadEnum" minOccurs="0"/>
        </xs:sequence>
</xs:complexType>

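It seems that an element with minOccurs="0" and maxOccurs="unbounded", on its own with no other elements in the sequence, causes the problem. Removing maxOccurs="unbounded" gets the classes compiling, at the cost of allowing only one supplementaryInformation entry.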

This is a bug with the XmlSerializer class: I will diligently report this to Microsoft.

What next?

Take a look at SGen

SGen

XML Serializer Generator Tool (Sgen.exe)

"XmlSerializer generates serialization code and a serialization assembly for each type every time an application is run. To improve the performance of XML serialization startup, use the Sgen.exe tool to generate those assemblies in advance. These assemblies can then be deployed with the application."

Is having schema validation helpful?

Yes - for initial development, but I'm not so sure about when the application's deployed. If XmlSerializer went off to the XSD on the feed web site, then ran XSD.EXE, then compiled - maybe. But where does all this clever dynamic compilation get you? XmlSerializer is relying on output from XSD.EXE that may be months out of date.
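One way to at least notice that a feed has drifted away from the schema the classes were generated from is to validate the live XML against the locally saved .xsd. A rough sketch, assuming both the schema and the feed are already in memory as strings; LocalSchemaValidator is a made-up name for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper: checks a feed against a locally held copy of its XSD.
public static class LocalSchemaValidator
{
    // Returns the validation messages; an empty list means the feed matched the schema.
    public static List<string> Validate(string xsd, string xml)
    {
        List<string> problems = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // A null target namespace means "use the one declared in the schema itself".
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            problems.Add(e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }   // validation happens as the reader advances
        }
        return problems;
    }
}
```

A non-empty result doesn't fix anything by itself, but it is a prompt to fetch the provider's current XSD and rerun XSD.EXE.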

http://www.firstobject.com/dn_xmlvalidation.htm

XML and .NET - Take your pick!

MSDN Magazine - Five ways to emit test results as XML

This article covers five ways to write data to an XML file. Each is based on a different .NET Framework class.