XML - which is better?

Cleverbeans
Posts: 1378
Joined: Wed Mar 26, 2008 1:16 pm UTC

Postby Cleverbeans » Thu Aug 05, 2010 3:01 pm UTC

So, I'm working on my first serious bit of XML, which will be used to export data from one application for import into another. It's a pretty straightforward task, but now that I'm digging into schemas they seem much more robust than I expected, so I'm a bit torn between a few approaches.

Firstly, to populate the data into the receiving application, I need the name of the object to create and the properties of that object as defined within the application. A common name would be something like "My Wall" and a common property pair might be "Type Mark" -> "A2". I have several objects that share the same groups of properties. Common properties include things like "Manufacturer", "Model", and "URL". However, some objects have additional properties, and some properties occur in more than one place based on a "group" in the application (not to be confused with a schema group). Here are the cases I'm considering.

Code:

<property name="Type Mark" value="A2" group="Identity Data"/>
<!-- OR -->
<PropertyGroup name="Identity Data">
    <property name="Type Mark" value="A2"/>
</IdentityData>
<!-- OR -->
<IdentityData>
    <TypeMark>A2</TypeMark>
</IdentityData>
<!-- OR -->
<IdentityData>
    <TypeMark value="A2"/>
</IdentityData>


So, which of these is best and why? What information would be useful in choosing a scheme? Also, is there a standardized way to reference back to other data within the same document? I was thinking it would make sense to have a "Common Properties" section used to populate all elements with the same data, since some items are essentially constant. There isn't going to be a whole ton of it, though, so if there isn't some nice scheme to parse it I'm probably just going to write the data out redundantly for ease of use. Any thoughts? Thanks in advance.
"Labor is prior to, and independent of, capital. Capital is only the fruit of labor, and could never have existed if labor had not first existed. Labor is the superior of capital, and deserves much the higher consideration." - Abraham Lincoln

thedufer
Posts: 263
Joined: Mon Aug 06, 2007 2:11 am UTC
Location: Northern VA (not to be confused with VA)

Re: XML - which is better?

Postby thedufer » Thu Aug 19, 2010 1:45 am UTC

I would go with the second one (except that the </IdentityData> should be </PropertyGroup>).

If things are grouped, then grouping by XML structure rather than by an extra attribute is cleaner, and thus harder to screw up and easier to parse. This narrows it down to options 2, 3, and 4.

In 3 you have constrained the names of your properties to XML naming rules (no spaces, etc.) and made the document harder to parse (what if the property group contains only properties now, but later you add another element type to it?). The same argument rules out number 4.
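
To make the parsing point concrete, here is a minimal sketch of a reader for option 2 using Python's standard xml.etree.ElementTree. The function name read_group and the second "Manufacturer" property are only illustrative, and the closing tag is corrected to </PropertyGroup>. Note that the property name keeps its space, and that the reader only looks at <property> children, so another element type added to the group later would not break it.

```python
import xml.etree.ElementTree as ET

# Option 2 from the first post, with the closing tag corrected.
# The second property is an illustrative addition.
DOC = """
<PropertyGroup name="Identity Data">
    <property name="Type Mark" value="A2"/>
    <property name="Manufacturer" value="ACME Corp."/>
</PropertyGroup>
"""

def read_group(xml_text):
    """Return (group name, {property name: value}) for one PropertyGroup."""
    group = ET.fromstring(xml_text)
    # Only <property> children are read; other element types are ignored.
    props = {p.get("name"): p.get("value") for p in group.findall("property")}
    return group.get("name"), props

name, props = read_group(DOC)
print(name, props)
```

With option 3 the same reader would have to treat every child tag as a property name, and could not tell a property apart from any other kind of element.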

I understand that some of this is subjective. Other thoughts?

0xBADFEED
Posts: 687
Joined: Mon May 05, 2008 2:14 am UTC

Re: XML - which is better?

Postby 0xBADFEED » Thu Aug 19, 2010 2:24 pm UTC

Barring the typo, the second version is probably the most idiomatic as far as XML is concerned, and probably the best choice. As thedufer points out, it's an incredibly bad idea to just create arbitrary tags at runtime. It makes it virtually impossible to write a schema (which you'll thank yourself for doing in a few months when you have to come back to the code), and it makes writing a reader a real chore.
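
As a rough illustration of why a fixed vocabulary is easier to check, here is a hand-rolled structural check in Python. It is only a stand-in for a real schema, not an XSD, and check_group is a hypothetical helper. Because option 2 uses just two tag names, the whole rule set fits in a few lines; with data-driven tag names there would be no fixed list to check against.

```python
import xml.etree.ElementTree as ET

def check_group(elem):
    """Return a list of problems found in one PropertyGroup element."""
    problems = []
    if elem.tag != "PropertyGroup":
        problems.append(f"unexpected element <{elem.tag}>")
        return problems
    if not elem.get("name"):
        problems.append("PropertyGroup is missing a name")
    for child in elem:
        if child.tag != "property":
            problems.append(f"unexpected child <{child.tag}>")
        elif child.get("name") is None or child.get("value") is None:
            problems.append("property needs both name and value")
    return problems

good = ET.fromstring(
    '<PropertyGroup name="Identity Data">'
    '<property name="Type Mark" value="A2"/>'
    '</PropertyGroup>'
)
bad = ET.fromstring('<IdentityData><TypeMark>A2</TypeMark></IdentityData>')

print(check_group(good))  # no problems reported
print(check_group(bad))   # the root element is not in the fixed vocabulary
```
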
Cleverbeans wrote:Also, is there a standardized way to reference back to other data within the same document?

Typically people use unique 'id' attributes and then allow references through these 'id' attributes, e.g.:

Code:

<Manufacturer name="ACME Corp." id="1234">
    <Address>
    <....>
</Manufacturer>
....
<Product name="Widget" manufacturer-id="1234">
    <....>
</Product>

So then you just hook up references through the id attributes after you've parsed the entities. At the point that you start actually needing references you're really moving more into database territory (or any other system that's equally capable of modeling relations). If you find your data has lots of these references XML may not be the best persistent store for your data.

Though, even in the database case, XML (or another markup language) is useful for serialization, and this technique comes in handy.
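
Hooking up the references after parsing could look roughly like the sketch below, again with Python's xml.etree.ElementTree. The <Catalog> wrapper element and the manufacturer_of helper are assumptions made so the example is a complete document; the element and attribute names otherwise follow the snippet above.

```python
import xml.etree.ElementTree as ET

# <Catalog> is a hypothetical root element added for this example.
DOC = """
<Catalog>
    <Manufacturer name="ACME Corp." id="1234"/>
    <Product name="Widget" manufacturer-id="1234"/>
</Catalog>
"""

root = ET.fromstring(DOC)

# First pass: index every Manufacturer by its id attribute.
manufacturers = {m.get("id"): m for m in root.iter("Manufacturer")}

def manufacturer_of(product):
    """Resolve a Product's manufacturer-id against the index."""
    ref = product.get("manufacturer-id")
    if ref not in manufacturers:
        raise KeyError(f"dangling manufacturer-id {ref!r}")
    return manufacturers[ref]

# Second pass: follow each reference.
for product in root.iter("Product"):
    print(product.get("name"), "->", manufacturer_of(product).get("name"))
```

Building the index first means a Product can appear before the Manufacturer it points to, and a dangling reference fails loudly instead of silently.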

