RE: [xml-dev] Normalizing XML [was: XML information modeling best practices]
by Joshua Allen
Apr 30 2002 11:05PM
And in my opinion, normalization is overrated. Too many times I have
seen people fresh from a database theory class build a system that is
completely normalized, with no thought as to how the data is going to
come in or go out. They end up with a system that is unmanageable and
doesn't scale. That is not to say that database normalization is bad,
but it is harmful when people just follow rote guidelines without
understanding *why* those guidelines exist and, more importantly, when
those guidelines do *not* apply and other information modeling
techniques are more useful.
Joshua Allen
Microsoft WebData XML
425.705.7857
> -----Original Message-----
> From: Michael Rys [mailto:mrys@[...].com]
> Sent: Tuesday, April 30, 2002 3:27 PM
> To: Ronald Bourret; xml-dev@[...].org
>
> Ah, for once an interesting topic.
>
> I don't have much time now, but I would like to point out the
> following:
>
> 1. The best definition of normal forms is based on the functional
> dependencies and what constraints the normal forms impose on allowed
> functional dependencies. The one exception is the 1st NF.
>
> 1. 1st NF: All nested relational models are also known as NF^2
> (non-first-normal-form). Since XML is nested and allows repetition, I
> would consider this to also not apply to XML data. However, some of
> the guidelines given below are certainly good guidelines.
>
> 2. 2nd NF etc: I think they all certainly apply, since FD analysis
> basically allows you to determine what property groups should be
> considered "entities" (ie complex elements) and how they should relate
> to each other (nesting vs IDREF).
>
> Note that this of course assumes that you still apply the notion of
> functional dependencies and some form of ER modeling which may or may
> not be appropriate for your XML domain (ie, markup may not care too
> much).
>
> Best regards
> Michael
>
> PS: I think the above basically agrees with Michael Kay's reply...
>
> > -----Original Message-----
> > From: Ronald Bourret [mailto:rpbourret@[...].com]
> > Sent: Tuesday, April 30, 2002 2:19 AM
> > To: xml-dev@[...].org
> > Subject: [xml-dev] Normalizing XML [was: XML information modeling
> > best practices]
> >
> > Manos Batsis wrote:
> > > > XML is pretty good for tables, but not so good
> > > > for enforcing relational normalization rules.
> > >
> > > Of course. Enforcing relational normalization rules shuts down
> > > most good reasons to use XML in the first place, with the possible
> > > exception of exchanging DB data between servers via http.
> >
> > I don't think this is quite true. One of the questions I wrestled
> > with in trying to understand native XML databases was what
> > "normalization" meant. While there's undoubtedly a lot of room for
> > thought here, I did take a look at the first, second, and third
> > (relational) normal forms and tried to see what they meant in XML
> > terms.
> >
> > The following example supposes you are viewing a sales order
> > hierarchy:
> >
> > <Order>
> >   <Number>123</Number>
> >   <Date>1/2/03</Date>
> >   <Customer>
> >     <Number>456</Number>
> >     <Name>Customers, Inc.</Name>
> >   </Customer>
> >   <Item>
> >     <Number>1</Number>
> >     <Part>1b-10</Part>
> >     <Quantity>10</Quantity>
> >   </Item>
> >   <Item>
> >     <Number>2</Number>
> >     <Part>zyx23</Part>
> >     <Quantity>5</Quantity>
> >   </Item>
> > </Order>
> >
> > First normal form:
> > ------------------
> > Data is in first normal form if it (a) has a primary key and (b) has
> > no repeating fields.
> >
> > A primary key basically means that there is a set of fields that
> > uniquely identify the other fields in the row. In XML terms, this
> > implies that you only store one "thing" per document -- such as one
> > sales order or one chapter -- unless the collection of "things"
> > itself has identity. (It's not clear that all "things" stored in XML
> > documents can, in fact, be identified by a proper subset of their
> > data.)
> >
> > This is not to say that it's not useful to place multiple "things"
> > in a single XML document -- for example, it is quite useful to batch
> > a bunch of sales orders together to ship them over the wire -- just
> > that such documents are not normal.
> >
> > "No repeating fields" just means that you don't see field names like
> > Author1 and Author2. In XML terms, this means you use repeating
> > children (*, +) rather than enumerated children.
> >
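A minimal sketch of the repeating-versus-enumerated point above, in Python with the standard library's ElementTree (not part of the original message). Author1 and Author2 come from the text; the Book and Title elements and both helper names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Enumerated children: the element name encodes a position (Author1, Author2).
enumerated = ET.fromstring(
    "<Book><Title>T</Title>"
    "<Author1>A. One</Author1><Author2>B. Two</Author2></Book>"
)

# Repeating children: one element name that may occur any number of times.
repeating = ET.fromstring(
    "<Book><Title>T</Title>"
    "<Author>A. One</Author><Author>B. Two</Author></Book>"
)

def authors_enumerated(book):
    """Probe Author1, Author2, ... and stop at the first missing index."""
    found = []
    index = 1
    while (node := book.find(f"Author{index}")) is not None:
        found.append(node.text)
        index += 1
    return found

def authors_repeating(book):
    """A single query returns every author, however many there are."""
    return [node.text for node in book.findall("Author")]
```

With enumerated children the reader has to guess at names and a content model has to cap the count; with repeating children the same query works for one author or twenty.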
> > Second normal form:
> > -------------------
> > Data is in second normal form if the entire primary key is needed to
> > predict each field value. The effect is to split the one and many
> > parts of a one-to-many relationship into separate tables. For
> > example, store sales order header information and line item
> > information in separate tables.
> >
> > This form exists in the relational model to avoid duplicate data: if
> > you store sales order header and line item data in the same table,
> > the header information gets repeated on each line item row. XML
> > doesn't have this problem -- it stores hierarchies quite nicely
> > without duplicate data -- so I don't think the second normal form
> > really applies.
> >
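To make the duplication argument concrete, here is a rough Python sketch (not part of the original message) that flattens the sales order from the example into one row per line item, the way a single unnormalized table would hold it. The function name `flatten` and the tuple layout are assumptions for illustration; the Customer element is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<Order>
  <Number>123</Number>
  <Date>1/2/03</Date>
  <Item><Number>1</Number><Part>1b-10</Part><Quantity>10</Quantity></Item>
  <Item><Number>2</Number><Part>zyx23</Part><Quantity>5</Quantity></Item>
</Order>
"""

def flatten(order):
    """Return one tuple per line item, each carrying the order header."""
    # findtext() only looks at direct children, so this is the order's
    # own Number, not an item's.
    header = (order.findtext("Number"), order.findtext("Date"))
    return [
        header + (
            item.findtext("Number"),
            item.findtext("Part"),
            item.findtext("Quantity"),
        )
        for item in order.findall("Item")
    ]

rows = flatten(ET.fromstring(ORDER_XML))
```

Every row repeats the order number and date, which is the redundancy second normal form removes; in the XML hierarchy those two values appear once.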
> > Third normal form:
> > ------------------
> > Data is in third normal form if you can't predict one non-key field
> > from another non-key field. The effect of this is to split the many
> > and one parts of a many-to-one relationship into separate tables.
> > For example, store customer data in a separate table from sales
> > order data.
> >
> > This poses a real problem in the XML world, since many real-world
> > documents contain duplicate data. For example, many sales orders
> > contain customer information -- name, address, phone number, etc.
> >
> > I think that this does apply to XML, but that you need to decide
> > when it is useful to apply this form. That is, if you want truly
> > normal XML data, you should store this sort of data in a separate
> > document and link to it from your main document. For example, store
> > the data for each customer in a separate document and link to it
> > from your sales order documents.
> >
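One way the "separate document plus link" idea might look, sketched in Python (not part of the original message). The message does not say how the link is expressed, so the CustomerRef element, the in-memory `customers` dict standing in for separate customer documents, and both function names are assumptions.

```python
import xml.etree.ElementTree as ET

def split_customer(order, customers):
    """Move the embedded <Customer> into `customers`, leaving a reference."""
    customer = order.find("Customer")
    number = customer.findtext("Number")
    customers[number] = customer            # the single shared copy
    position = list(order).index(customer)
    order.remove(customer)
    ref = ET.Element("CustomerRef")         # hypothetical link element
    ref.text = number
    order.insert(position, ref)
    return order

def customer_name(order, customers):
    """Follow the reference from an order to the shared customer data."""
    return customers[order.findtext("CustomerRef")].findtext("Name")

customers = {}
orders = [
    split_customer(
        ET.fromstring(
            f"<Order><Number>{n}</Number>"
            "<Customer><Number>456</Number>"
            "<Name>Customers, Inc.</Name></Customer></Order>"
        ),
        customers,
    )
    for n in (123, 124)
]

# A single update to the shared copy is seen from every order.
customers["456"].find("Name").text = "Customers, Ltd."
```

Because the name lives in one place, the update cannot leave two orders disagreeing about it, which is the update anomaly the third normal form is meant to prevent.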
> > However, I also think that this only makes sense if XML is the
> > primary storage format for your data, since it allows you to avoid
> > update anomalies (as Jonathan Robie pointed out in another email).
> > If XML is a secondary storage format, then you probably don't need
> > to worry about the duplicate data, since it is really a historical
> > record, not a set of live data.
> >
> > To explain: Consider our sales order documents. It is unlikely that
> > the data for these documents lives in XML. More likely, the data
> > lives in a relational database. In this case, the sales order
> > document is a historical record of a given transaction, so the fact
> > that the same customer data is used in multiple sales order
> > documents doesn't matter -- nobody is going to try to update it and
> > there is little or no risk of update anomalies.
> >
> > Now consider genealogical data that I am storing in a native XML
> > database because it is too irregular to fit into a relational
> > database. In this case, I probably do want to store shared data in
> > separate documents so it lives in only one place in the database.
> >
> > For example, my documents for each person contain data such as
> > birthplace, birthdate, parents, siblings, career information, etc.,
> > but point to separate documents for things like information about
> > the agency where the birth certificate is stored and the contact
> > information for the administrator of the cemetery where the person
> > is buried.
> >
> > Comments?
> >
> > -- Ron
> >
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this list use the subscription
> > manager: <http://lists.xml.org/ob/adm.pl>