This paper first describes the impact of XML on data management for both well-structured and more loosely structured data. The longest section, the introduction, outlines what XML does and does not address. Later sections cover grids, data integration and data interoperability, and the conclusion summarizes the paper and its outcomes.
1.0 Introduction
XML stands for Extensible Markup Language. It is a markup language much like HTML, but it was designed to describe data. Its tags are not predefined; you define your own. XML provides a foundation for creating documents, and by defining your own tags you can in effect define your own markup language. It offers a scheme for storing data or information in a well-structured way. The XML specification defines an XML document as a data object that conforms to the rules for well-formed documents. A well-formed document is not automatically valid: it is valid only if it also satisfies the constraints declared in its Document Type Definition (DTD). XML is derived from the Standard Generalized Markup Language (SGML); it is a simplified version of SGML designed especially for Web documents, and it is often described as a subset of SGML. XML documents usually consist of two main parts:
- The prolog
Contains the version declaration and the Document Type Definition (DTD)
- The body
Contains the rest of the markup
A DTD contains markup declarations that provide a grammar for a class of documents. A declaration can be an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration. XML works at two levels: it provides a syntax for documents, and it provides a syntax for declaring the structure of documents. Data is included in XML documents as strings of text, surrounded by text markup that describes the data [Dan Suciu 2001]. One unit of data together with its markup is called an element, and an XML document is made up of one or more elements. XML was originally designed to meet the challenges of large-scale electronic publishing, but it also plays an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere [Kuchling A, 1998].
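The idea of a well-formed document made up of elements can be illustrated with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and does not validate against a DTD; the sample documents and the helper names `is_well_formed` and `element_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a prolog (the XML declaration) followed by the body.
WELL_FORMED = """<?xml version="1.0"?>
<book id="b1">
  <title>Learning XML</title>
  <author>Example Author</author>
</book>"""

# The <title> element is never closed, so this text is not well formed.
NOT_WELL_FORMED = "<book><title>Learning XML</book>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def element_names(text):
    """List the tag of every element in document order."""
    return [elem.tag for elem in ET.fromstring(text).iter()]
```

Checking validity against a DTD would need a validating parser, which is outside the scope of this sketch.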
The success of XML rests on three powerful concepts: data can be reused in different ways; XML provides building blocks that can be used to design new languages; and it offers a way of encoding electronic documents in a standardized format [Stephan Taylor 2003].
XML tools allow developers to create web pages and much more. XML gives developers the freedom to set standards that define what information should appear in a document, and in what sequence. Developers and users can work with XML directly, without any special software. XML is friendly to both programmers and document authors. Its strict syntax rules make documents easier to read. XML is a commonly agreed-upon format for data exchanged between systems, and it is well suited to delivery and interoperability over the Web.
One of the biggest concerns for any organization, large or small, whether a bank, a recruitment company or a showroom, is the safety of its data and internal information. Using different tools for different parts of a document and allowing different users to access the data raises security issues that can sometimes cause serious problems. XML-based systems can address this by limiting different users' access to different levels: each level of user has a different level of access to the same data.
XML also has problems when used with databases, notably XML publishing, XML type checking and XML storage. XML defines structure based on the information being described. This allows authors to define the rules for displaying XML data separately, using style sheets [Iraklis Varlamis 2001]. XML gives its users the following advantages, among many others:
1.1 Advantages of XML
Here are some strengths of XML that have made it so popular.
- Simplicity
- Extensibility
- Flexibility
- Separation of Data and Display
- Openness
- Reusability of Data
- Data Security
1.2 Problems with XML
XML is a flexible standard data format, but it is not the best data format for all uses, and it cannot bring complete interoperability between all applications. XML also has problems when used with databases. Internet applications, and the new ways in which they handle data, are changing the role of database theory and its relationship with practice. Three problems stand out when XML works with databases: XML publishing, XML type checking and XML storage. XML publishing is a normalization problem: a relational database is normalized, whereas an XML schema is organized as a tree, so the two may conflict with each other's internal organizational rules. XML type checking is a problem because checking against an XML Schema or DTD happens dynamically, so the chance of errors at run time can be high. XML storage is a problem because XML data is a labeled tree, whereas relational data sits in tables, so deciding how to store the data in one or several tables is a challenge. Nothing made by humans is perfect, and XML is no exception. It has real problems in the following ways:
- XML documents are verbose; they are large rather than compact.
- XML standards are still under development, so changes are ongoing.
- XML business standards will prove elusive.
- XML requires marshalling.
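The storage problem described above, fitting a labeled tree into tables, can be sketched as follows. This is only one possible mapping, assuming Python's standard `sqlite3` and `xml.etree.ElementTree` modules; the `library` document, the table layout and the function name `shred` are hypothetical. The repeating `author` element is what forces a second table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML data: each book may have several authors.
DOC = """<library>
  <book id="b1"><title>First</title><author>A. One</author><author>B. Two</author></book>
  <book id="b2"><title>Second</title><author>C. Three</author></book>
</library>"""


def shred(xml_text, conn):
    """Store each <book> in one table and its repeating <author> children in another."""
    conn.execute("CREATE TABLE book (id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE author (book_id TEXT, name TEXT)")
    for book in ET.fromstring(xml_text).findall("book"):
        book_id = book.get("id")
        conn.execute("INSERT INTO book VALUES (?, ?)", (book_id, book.findtext("title")))
        for author in book.findall("author"):
            conn.execute("INSERT INTO author VALUES (?, ?)", (book_id, author.text))


conn = sqlite3.connect(":memory:")
shred(DOC, conn)
book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
author_count = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]
```

A different mapping, for example one generic table of nodes and edges, would be equally possible; choosing between such layouts is exactly the challenge the text refers to.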
2.0 Grid
Grid computing is a way of organizing computing resources so that they are flexible, easy to access and useful to users. Its objective is to make resources available so that they can be used more efficiently. The original purpose of Grid computing was to link supercomputers spread across wide distances, but its aims have since moved beyond this scope. Many organizations maintain data grids for their users, with the main aim of making it easy to use any sort of data for any purpose. There are predefined standards for grids. The data grid is a good example of an interoperable system. Many Grid implementations are oriented toward supplying specific types of resources, and Grids can be categorized accordingly. The most common types are Computational Grids, Data Grids and Application Grids. Many data grids currently serve large numbers of users across the world, e.g. NASA Information Power Grid, AstroGrid and European Data Grid.
Grids can be divided into two main categories according to the way they store data:
Structured Grid
In a structured grid we always know which neighbors surround any grid point.
Unstructured Grid
In an unstructured grid the neighboring points are not immediately available.
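The difference between the two categories can be shown with a minimal sketch. It assumes a two-dimensional structured grid addressed by row and column, and an unstructured grid whose connectivity is stored in a lookup table; the point identifiers and function names are hypothetical.

```python
def structured_neighbors(i, j, rows, cols):
    """Neighbors of cell (i, j) in a rows x cols structured grid, found by index arithmetic."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


# Unstructured grid: neighbors cannot be computed, so they are stored explicitly.
UNSTRUCTURED_ADJACENCY = {
    "p0": ["p1", "p2"],
    "p1": ["p0", "p2", "p3"],
    "p2": ["p0", "p1"],
    "p3": ["p1"],
}


def unstructured_neighbors(point):
    """Neighbors of a point, looked up in the stored connectivity table."""
    return UNSTRUCTURED_ADJACENCY[point]
```

In the structured case nothing beyond the grid dimensions has to be stored, while the unstructured case depends entirely on the stored table.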
2.1 Data Grid
The data grid is a specialization and extension of the Grid, which has emerged as an integrating infrastructure for distributed computation. A data grid is a subset of the grid whose basic goal is to provide an integrated infrastructure for distributed computing. Access to distributed data is typically as important as access to distributed computational resources.
The term Data Grid is used for several purposes:
- Data Grid as a virtual data grid
The automation of the execution of processes is managed in virtual data grids.
- Data Grid as a distributed, resilient, scalable architecture
Federated server architecture refers to the ability of distributed servers to talk among themselves without having to communicate through the initiating client.
- Data Grid as an information repository abstraction
It is a software system used to manage combinations of semantic tags and the associated values of data attributes.
- Data Grid as a storage repository abstraction
It is a storage system that holds digital entities.
- Data Grid as a logical name space
It is a naming convention for grouping digital entities [Moore, R.W., Merzky, A., 2002].
XML has worked hard to capture most of the capabilities required for distributed computing, but it still has a lot of room for improvement in these fields. XML copes well with data integration, data interoperability and grid problems. The Grid is very important for heterogeneous computing.
3.0 Data Integration
The most important and recurring problem that XML is able to solve is middle-tier data integration. This problem is difficult for several reasons. Data coming from different sources can have different formats, and the application code may be exposed directly to these different formats. It is bad enough that a single application has to have code to handle multiple formats [Reaz Hoque 2000].
Data integration plays a very important role in the management of heterogeneous databases. There are many ways to integrate individual databases. One is the system overhaul technique, in which a new system is created that consolidates the existing systems into one. This option can be costly. The federation technique allows users to choose from a variety of individual database schemas. Others include the composite technique, the mediated technique, the cover-up technique and the data warehousing technique. The goal of all of these, and of any that may emerge over time, is the same: the technique should be affordable, quick and easy to handle, and should not require a large investment of capital and other resources when integrating databases.
XML adopts the composite technique: it creates a virtual data warehouse that gives the feel of a single repository while allowing the data to remain in its natural distributed setting. XML enables intelligent client-side processing, and it allows the presentation of data to be changed dynamically without unwieldy and time-consuming interaction with the server. On the middle tier, XML is ideally suited to address many of the persistent data integration problems that afflict enterprise applications. XML is also a good medium for data storage and retrieval.
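Middle-tier integration of this kind can be sketched in a few lines: two sources in different formats are translated into one common XML document, so that application code handles a single format. The sample data, the `customers` vocabulary and the function names below are hypothetical, and only Python's standard library is assumed.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two hypothetical sources describing customers in different formats.
CSV_SOURCE = "id,name\n1,Alice\n2,Bob\n"
DICT_SOURCE = [{"customer_id": 3, "full_name": "Carol"}]


def to_common_xml(csv_text, records):
    """Translate both sources into one <customers> document."""
    root = ET.Element("customers")
    for row in csv.DictReader(io.StringIO(csv_text)):
        ET.SubElement(root, "customer", id=row["id"]).text = row["name"]
    for rec in records:
        ET.SubElement(root, "customer", id=str(rec["customer_id"])).text = rec["full_name"]
    return root


def customer_names(root):
    """Application code reads only the common format."""
    return [customer.text for customer in root.findall("customer")]


common = to_common_xml(CSV_SOURCE, DICT_SOURCE)
```

Only `to_common_xml` knows about the source formats; adding a third source would leave `customer_names` unchanged.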
4.0 Data Interoperability
There are many definitions of interoperability, and different groups of people look at it in different ways. One definition is the ability of a system to use the parts or equipment of another system [Loesgen Dictionary 2000]; another is the ability to transfer data products between different types of storage systems, or to share data across different users or networks. Its function does not change with the wording of its definition. Interoperability is one of the most important and fundamental features required for distributed computing in a heterogeneous environment. XML gives a meaningful way to present data and allows it to be distributed. XML allows data to be shared across the Web, not only within a local network but also across networks of networks, quite easily, just by giving the data a meaningful structure. It is platform independent and application independent, and it is well capable of mediating interactions between various Grid components. Types of interoperability:
- Supporting multiple devices
- Enterprise:
- Inter application
- Inter process
- Inter departmental
- Inter enterprise
Here are some major achievements of XML in terms of interoperability.
XML interoperability makes it possible to talk to multiple devices. SOAP can be used to implement a publish/subscribe (pub/sub) message monitor, and the BizTalk initiative combines with these technologies to facilitate XML interchange.
Conclusion
Although XML was initially envisioned as simply a replacement for HTML, its impact is turning out to be far greater than that. It has had a healthy effect on the client tier, the middle tier and the back-end tier, and handling every tier well is a big advantage of XML. Many programming languages now use XML because of its open-ended structure. Compared with the object-oriented model, the relational model is not the most appropriate for storing XML data. The reason is that when it simply stores XML data, the data is converted into plain text, so many things can go wrong, for example with relations. Experience shows that there are many interesting problems in using XML and databases together. Given the use of XML and its associated technologies in databases, and its importance in data warehousing, we cannot ignore XML's importance; what is needed is that future work makes XML a complete data model and improves how it works with other technologies. We can treat XML as semi-structured data, which uses the Object Exchange Model for data modeling. This model is used for semi-structured data, and different approaches exist to make it usable in a DBMS. Using XML sources to feed data warehouse systems will become standard in the next few years. Much work has been done, and is still in progress, by different people and organizations to make maximum use of XML with other technologies. XML has brought many changes compared with HTML; it is a truly flexible and extensible markup language. In the end, XML is flourishing day by day, and its use has increased greatly in recent years because of its flexibility and extensibility. XML deals well with data integration, interoperability and grid computing; it is not perfect in these fields, but progress continues.
XML in a Different View
Abstract
As needs have changed over time, new technologies have quickly taken over from earlier ones. XML is a key technology that is enhancing the capacity of the Internet and improving its usage within e-commerce and other online applications. It is finding its way into different businesses and enterprises. XML took over from SGML, which is a complex markup language, carrying over features of SGML and adding new ones to make it more useful and easier to use. The best things about XML are its simplicity, its reusability and its flexibility to work in different environments and a variety of applications. It is now a much-needed language on the Internet. It is used to integrate data from different sources and present them. It is a meta-language that also uses a semi-structured data model, so how it functions with its related technologies in databases and data warehousing is interesting to understand. Different databases now provide special support for XML in their products. It can be used in different kinds of programs and is also taking over from Electronic Data Interchange systems. Traditional data models cannot fully support and handle the new problems arising in transactions, especially e-business on the Web and at desktop level, so XML is taking over from traditional systems, using data models such as OEM (Object Exchange Model). XML has various advantages and disadvantages when working with these semi-structured technologies. Other powerful languages are also derived from it. It is a revolution in Web-based technologies. The W3C made XML easy to use for complex Web programs; it is application and vendor independent.
Introduction
XML is a structure-based meta-language that lets users define their own tag structure so that data can be reused. Nowadays it is highly recommended for data transactions on the Internet. XML is a subset of SGML, the language in which HTML was also defined. XML is a fast-emerging technology, so understanding its structure and how it works is essential.
Several things need to be considered about XML. One is whether it is a complete data model, that is, whether it fulfills the requirements of a data model: structure, rules and manipulation. The other is how XML and its associated technologies, such as DTD, DOM, XHTML, XPath, XPointer, XQuery, XSL and XML Schema, work with databases and data warehousing.
In this report we look into the different ways XML is used in different environments. Extensible Markup Language is fast emerging as the dominant standard for presenting data on the Internet, Jyaval (1999). XML is descriptively identified as an extremely simple dialect (subset) of SGML, the goal of which is to enable generic SGML to be served, received and processed on the Web in the way that is now possible with HTML, Florian Wass (2001). For this reason XML has been designed for ease of implementation, and for interoperability with both SGML and HTML.
Many organizations use XML for the reusability and flexibility of their data on the Internet. This report considers various aspects of XML's role in databases and data warehousing, to give a general awareness of its usage. Relational databases and object-oriented databases can both use XML to manage their data effectively and efficiently.
The success of XML rests on three powerful concepts: data can be reused in different ways; XML provides building blocks that can be used to design new languages; and it offers a way of encoding electronic documents in a standardized format, Stephan Taylor (2003).
Data modeling for XML databases is the issue considered in this paper, that is, whether XML's structure is a complete data model or something less. Different kinds of problems arise when XML implementation is discussed, such as XML publishing, XML type checking and XML storage.
XML
Like its predecessor SGML, XML is a meta-language used to define other languages. However, XML is much simpler and more straightforward than SGML. XML is a markup language that specifies neither a tag set nor a grammar for that language. According to Brett McLaughlin (2000), “XML by itself is of limited value”: it defines only a framework. However, the various technologies that rest upon XML give developers and content managers flexibility in data management and transmission. Different technologies work with XML to make it usable by every application. One associated technology is the DTD (Document Type Definition), which establishes a set of constraints for an XML document. The DTD is not a specification of its own; it defines the way an XML document should be constructed. Another is XSL, the Extensible Stylesheet Language, which transforms and translates XML data from one XML format into another. It uses XSLT to transform a textual XML document. DOM, the Document Object Model, is used to represent the content and model of a document across XML and programming languages. Using the DOM for XML requires a set of interfaces and classes that define and implement the DOM itself. There are other technologies that work with XML; we will see them in actual use in databases and data warehousing.
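As a small illustration of the DOM interfaces mentioned above, the sketch below uses `xml.dom.minidom`, the DOM implementation shipped with Python's standard library. The `order` document is a hypothetical example.

```python
from xml.dom import minidom

# Hypothetical document used only for illustration.
DOC = "<order><item qty='2'>pen</item><item qty='1'>ink</item></order>"

dom = minidom.parseString(DOC)

# Read through the DOM interfaces: elements, attributes and text nodes.
items = dom.getElementsByTagName("item")
quantities = {item.firstChild.data: int(item.getAttribute("qty")) for item in items}

# Modify the tree through the same interfaces, then serialize it again.
new_item = dom.createElement("item")
new_item.setAttribute("qty", "5")
new_item.appendChild(dom.createTextNode("paper"))
dom.documentElement.appendChild(new_item)

item_count = len(dom.getElementsByTagName("item"))
serialized = dom.documentElement.toxml()
```

The same calls (`getElementsByTagName`, `createElement`, `setAttribute`, `appendChild`) are defined by the DOM itself, so the equivalent code in another language's DOM binding looks very similar.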
XML with Databases and Data warehousing
XML is fast emerging as the standard for representing data. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. XML is an important topic in current database systems, and it is increasingly being used as a tool for both data storage and data transmission. Over the past few years, XML has sprung up as a topic in database systems, textbooks, Web/Internet and computing literature, Paul J Wagner (2003).
XML databases use XML documents as their fundamental units and store data as either binary code or text files, J. Fong (2003). XML documents have been widely used on the Internet for business, in both B2B and B2C. That is why a strong need arises to migrate from relational databases to XML documents.
XML is now also used in major projects as a common representation and data transmission language between multiple applications, database systems and continuous data streams (satellite download data or network activity logs).
With XML, communication and information exchange can be established regardless of the underlying storage platform. However, the applications that communicate using XML have to transform XML to the underlying information model, which is usually a relational DBMS. This makes relational data representation in XML important, since DBMS technology is the core part of many applications in use today.
XML is widely used with relational databases, notably Oracle 9i, which allows a programmer to store and manipulate XML. These extensions appear to be complementary to current XML standards. The simplest extension is a native XML data type that allows a data modeler to specify and store XML as the content of a field. For example, a simple table can hold a named survey specified as an XML document, Thomas K Moore (2003). The most widely supported technologies for describing relational DBMS data are DTD and XML Schema. A DTD suits a relational database when any document conforming to the DTD can be stored in the relational database, and any XML semi-structured query over a document conforming to the DTD can be evaluated over the relational database instance. When converting an XML DTD to relations, it is tempting to map each element in the DTD to a relation and to map attributes of the element to attributes of the relation, Kristen Tufte (1999). However, there is no direct correspondence between the elements and attributes of DTDs and the entities and attributes of the ER model. A DTD can represent both the attributes and the entities of the ER model. The techniques used to map XML to relational databases include simplifying DTDs, schema conversion, the inlining technique, the shared inlining technique and the hybrid inlining technique, David Dewitt (1998). It is important to understand how SQL queries are generated from XML queries. The results of SQL queries can be converted into complex XML documents, but this is a drawback of using current relational technology to provide XML querying, David Dewitt (1998): constructing arbitrary XML results is difficult, so the problem is converting relational results into XML documents.
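Converting relational results into an XML document can be sketched as follows. This is a minimal illustration, not the mechanism of any particular database product: it assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules, and the `survey` and `question` tables and the function name `publish` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational data: one survey with two questions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE question (survey_id INTEGER, text TEXT)")
conn.execute("INSERT INTO survey VALUES (1, 'Staff survey')")
conn.executemany(
    "INSERT INTO question VALUES (?, ?)",
    [(1, "How long have you worked here?"), (1, "Which tools do you use?")],
)


def publish(conn):
    """Build a nested <surveys> document from the two flat tables."""
    root = ET.Element("surveys")
    surveys = conn.execute("SELECT id, name FROM survey ORDER BY id").fetchall()
    for survey_id, name in surveys:
        survey = ET.SubElement(root, "survey", id=str(survey_id), name=name)
        questions = conn.execute(
            "SELECT text FROM question WHERE survey_id = ? ORDER BY rowid",
            (survey_id,),
        ).fetchall()
        for (text,) in questions:
            ET.SubElement(survey, "question").text = text
    return root


xml_text = ET.tostring(publish(conn), encoding="unicode")
```

Even in this small case the nesting has to be rebuilt by hand from flat rows, which hints at why constructing arbitrary XML results from relational queries is described as difficult.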
Keeping data in XML files is a file-based approach to storing data, which is not recommended in some cases. With multiple documents, it becomes very difficult to pass the documents around, and finding information in them is a difficult and time-consuming job. Using different tools for different parts of a document and allowing different users to access the data raises security issues that can sometimes cause serious problems. Another problem is data concurrency. Because of the file-based structure, an XML file can be accessed by one user at a time, so if one user is modifying data, the others have to wait until the first finishes. This is a big problem that can sometimes break the chain of data and corrupt data in the system. Although data can be recovered from old versions, versioning is also a problem with XML data, because we cannot keep track of the different versions of a document. These problems are due to XML's file-based storage. It is an old way of storing data with problems in data security, data access and data navigation, so it is not only XML that faces them; all data storage techniques that rely on a file-based system face the same problems. A schema describes the structure of an XML document. It indicates which elements appear in the document and which sub-elements, attributes and relations are allowed within each element, Iraklis Varlamis (2001). The schema can be used to validate the structure of an XML document automatically, and also to decompose an XML document into the pieces of information it comprises. We can use XSL, here serving as an XML query language, in relational databases for tag variables, simple structure, grouping, element construction, heterogeneous results and nested queries, Eric T Ray (2002). However, complex element construction remains a problem. For databases, DOM is a widely used technology with XML.
When the selected XML tree is mapped into an XML schema in the form of a DTD, the joined tables are loaded into DOMs, integrated into a single DOM and then transformed into an XML document.
Using XML Schema with an object-oriented database has many advantages, Iraklis Varlamis (2001). It offers a strict way of defining the structure of interchanged information; once the information model is described in an object-oriented notation, it can be mapped into an XML Schema and consequently into a database schema; and it is easier for an object-oriented database to export its information structure to an XML Schema file.
A large amount of the data needed in decision-making processes is stored in XML format, which is widely used for e-commerce and Internet-based information exchange. As more organizations view the Web as an integral part of their communication and business, the importance of integrating XML data in data warehousing environments is becoming increasingly high, Matteo Golfarelli (2001). Multidimensional design for data warehouses can be carried out starting directly from an XML source. DTD and XML Schema are the technologies mostly used in data warehousing with XML.
Alagic, in his paper ‘Institutions’ (2002), provides strong groundwork for an approach to integrating the two major technologies, XML and databases; he considers integrating them a challenge. In his paper he presents a general model theory. The paper also presents some examples, a survey of tools and potential application areas for the proposed model.
XML also has problems when used with databases. Internet applications, and the new ways in which they handle data, are changing the role of database theory and its relationship with practice. Three problems stand out when XML works with databases: XML publishing, XML type checking and XML storage, Dan Suciu (2001). XML publishing is a normalization problem: a relational database is normalized, whereas an XML schema is organized as a tree, so the two may conflict with each other's internal organizational rules. XML type checking is a problem because checking against an XML Schema or DTD happens dynamically, so the chance of errors at run time can be high. XML storage is a problem because XML data is a labeled tree, whereas relational data sits in tables, so deciding how to store the data in one or several tables is a challenge.
XML as Data Model
A data model should have structure, rules to implement, and data manipulation. The structure aspect brings information modeling into consideration, which is about understanding the structure and meaning of the information carried in documents. According to Richard et al. (a, 2000), the first rule of information modeling is to focus on the real world, not the technology. An information model is a description of the information used in an organization, independent of any IT systems. The Object Exchange Model is used for semi-structured data. XML by definition has no very specific information modeling, which is why we consider it semi-structured. For rule specification we use the term document design, which Richard et al. (b, 2000) describe as translating your information model into a set of rules for creating actual documents; we also call the schema the rules for XML. There are also different approaches to building a DBMS for semi-structured data, such as LORE and LOREL (Thomas et al., 2002). A system has two kinds of XML data: data stores, which hold long-term persistent data for reference purposes, and message flows, which move transient information from one subsystem to another. We have DTD and XML Schema for implementing rules in XML, and they work as expected by the definition of rules in data modeling.
Several issues commonly arise. The first is that most traditional conceptual models (LDS) and ER relational implementation models require data items to be single valued, Paul J (2003). XML does not enforce this. Only a single data value can be specified for a given attribute, but there can be multiple instances of an element. Multi-valued relationships are also a problem in XML. XML does not provide direct access within documents: if the requirement is to access a specific portion of data in a document, the whole document must be parsed, and if the document contains a huge amount of data this can take hours. Creating multiple document files, each holding specific data, can solve this problem. An index file can be used to link all the documents in order, and XSLT is a good solution for building this kind of application. With relational databases, the database can be used to hold references to XML files. An XML server is another option to use with relational databases. It holds the data not in raw unparsed form but in the form of a persistent DOM. An XML Schema is often not in first normal form. When DOM is used to manipulate an XML text file, the first thing it does is parse the file, breaking it out into individual elements, attributes, comments and so on. Manipulation and rule implementation are served better by DOM. When working with relational databases and XML, primary and foreign keys can be used, but XML treats all data in the same manner, without considering the primary and foreign key concept. For keys, XML has ID and IDREF attributes: an ID works as a primary key within a document, and an IDREF works as a foreign key.
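The way ID and IDREF attributes play the role of primary and foreign keys can be sketched as below. One assumption should be stated clearly: Python's `xml.etree.ElementTree` does not read attribute types from a DTD, so the attributes named `id` and `dept` are simply treated here as if they had been declared as ID and IDREF. The `company` document and the function name `department_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" is assumed to be an ID, "dept" an IDREF.
DOC = """<company>
  <department id="d1"><name>Sales</name></department>
  <department id="d2"><name>Research</name></department>
  <employee id="e1" dept="d2"><name>Alice</name></employee>
  <employee id="e2" dept="d1"><name>Bob</name></employee>
</company>"""

root = ET.fromstring(DOC)

# Index every element that carries an id, like a primary-key lookup.
by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id") is not None}


def department_of(employee_id):
    """Follow the employee's dept reference to the department name."""
    employee = by_id[employee_id]
    return by_id[employee.get("dept")].findtext("name")
```

Unlike a foreign key in a relational database, nothing in this sketch checks that a `dept` value actually points at a department; a validating parser with a DTD would only check that the referenced ID exists somewhere in the document.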
XML has many advantages over other transmission formats such as flat files and database dumps. First, XML files are platform independent: an XML file can be read and understood by virtually any system, simply by reading the text and parsing it into a node tree using the DOM implementation on that system. Second, XML files are self-describing: a well-designed XML file needs little external documentation to decipher. Finally, XML shows hierarchical information in a natural way via the node tree.
Wuwongse (2003) suggests that XML should generalize the element by incorporating variables for the representation of implicit information and the enhancement of its expressive power (variable-free expression). Axioms and relationships among elements in the collection, as well as structural integrity constraints, are formalized, and complex queries are supported. As to whether XML is a data model or not, I think the model of XML is semi-structured, as stated above, and it is not a complete data model, but it can act like a data model, and rightly so, to some extent.
Conclusion
Many programming languages now use XML because of its open-ended structure. Compared with the object-oriented model, the relational model is not the most appropriate for storing XML data. The reason is that when it simply stores XML data, the data is converted into plain text, so many things can go wrong, for example with relations. Experience shows that there are many interesting problems in using XML and databases together. Having described the use of XML and its associated technologies in databases, and its importance in data warehousing, I conclude that we cannot ignore XML’s importance; what is needed is that future work makes XML a complete data model and improves how it works with other technologies. We can treat XML as semi-structured data, which uses the Object Exchange Model for data modeling. This model is used for semi-structured data, and different approaches exist to make it usable in a DBMS. Using XML sources to feed data warehouse systems will become standard in the next few years. Much work has been done, and is still in progress, by different people and organizations to make maximum use of XML with other technologies. The main reasons for its success are reusability and flexibility.