No. 16, Współczesne trendy w informatyce ekonomicznej (Contemporary Trends in Business Informatics)
Article title
Title variants
Information integration from the Web
Publication languages
The growing role of information in the economy is no longer in dispute. Terms such as the new economy, the network economy, and the information economy have even been coined to describe its current impact. The work of Nobel laureates Akerlof and Stiglitz, produced since the 1970s, is devoted to the effect of information on markets. The more information is exchanged, the more visible its influence on the economy becomes. This became especially noticeable with the growth of the Internet, another medium through which the availability and exchange of information rose to an entirely new level. Because of the breadth and duplication of information on the Internet, filtering it and reaching the right sources in the shortest possible time become important. This has a significant effect on decision-making by economic entities and, through it, on economic relations as a whole. (original abstract)
One of the most important issues when processing information obtained from the Web is its integration. It is vital for both companies and customers using the Internet environment for commercial purposes. The knowledge emerging from comparison and aggregation requires integration of several heterogeneous information sources. Currently existing retrieval systems index only the surface of the Web, while the wealth of data lies in the "hidden Web" - pages that are generated from databases and accessible only via web forms or more sophisticated interfaces. Research conducted by Kevin Chang and his colleagues in 2004 shows that only 5% of this "invisible Web" is properly indexed by the most popular retrieval engines, while the amount of information buried in the "invisible Web" is estimated to exceed that of the accessible Web pages many times over. The problem of accessing "hidden Web" resources is closely related to that of database integration, which has been studied extensively in the database community for years. The main problem there is the heterogeneity of data sources. It may be defined at the level of database interface, database schema, data formats, data model (relational, hierarchical or object), data scope and subject, or may even emerge from the software platform and concurrency handling within particular systems. The approaches to information integration may be split into two categories: vertical integration and horizontal integration. Solutions taking the former approach try to bring together sources which deal with similar topics and whose scopes overlap. The latter is devoted to combining sources of different information into a single, broader view. The problem of integrating relational sources (the most popular category) may be reduced to the problem of finding a mapping between elements of the sources' schemas. Such a mapping may be defined for the physical, logical and presentation layers.
Apart from the well-known problems of data integration, the Web environment has its own peculiarities. The problem of information integration from the Web therefore breaks down into several issues: identification of the sources, discovery of their schemas, integration of the schemas, definition of the sources' query capabilities and restrictions, the choice of sources when answering a query, query translation for all the selected sources, data extraction from Web pages, and consolidation of the result. (original abstract)
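The reduction of relational source integration to a schema mapping, and the query translation step listed above, can be illustrated with a minimal Python sketch. It is not the method of the article: the name-similarity heuristic, the 0.6 threshold, the function names (`match_schemas`, `translate_record`) and the two example schemas are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case an attribute name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "")


def match_schemas(source_a, source_b, threshold=0.6):
    """Map each attribute of source_a to its most similar attribute in source_b.

    Similarity is plain string similarity of the attribute names (an assumed
    heuristic); pairs scoring below the threshold are left unmapped.
    """
    mapping = {}
    for a in source_a:
        best, best_score = None, 0.0
        for b in source_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            mapping[a] = best
    return mapping


def translate_record(record, mapping):
    """Rename the keys of a record from schema A into schema B's vocabulary."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


if __name__ == "__main__":
    # Hypothetical schemas of two bookstore sources.
    shop_a = ["title", "author_name", "price"]
    shop_b = ["book_title", "author", "price_usd", "isbn"]
    mapping = match_schemas(shop_a, shop_b)
    print(mapping)
    print(translate_record({"title": "Dune", "price": 9.99, "stock": 3}, mapping))
```

Matching on names alone addresses only one layer of heterogeneity; differences in data formats, data models or scope would need additional evidence such as instance values or data types.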
Physical description
- Akkiraju R., Farrell J., Miller J., Nagarajan M., Schmidt M., Sheth A., Verma K., Web Service Semantics - WSDL-S, "A joint UGA-IBM Technical Note" 2005.
- Batini C., Lenzerini M., Navathe S.B., A comparative analysis of methodologies for database schema integration, "ACM Computing Surveys" 1986, Vol. 18, pp. 323-364.
- Chang K.C., He B., Li C., Patel M., Zhang Z., Structured databases on the web: observations and implications, "SIGMOD Record" 2004, Vol. 33, pp. 61-70.
- Cohen W.W., Data integration using similarity joins and a word-based information representation language, "ACM Trans. Inf. Syst." 2000, Vol. 18, pp. 288-321.
- Dimopoulos Y., Kakas A., Information Integration and Computational Logic, "Computational Logic, Special Issue: Technological Roadmap for CL" 2001, pp. 105-135.
- Gravano L., Papakonstantinou Y., Mediating and Metasearching on the Internet, "Data Engineering Bulletin" 1998, Vol. 21, pp. 28-36.
- Hull R., Managing Semantic Heterogeneity in Databases: a theoretical perspective, ACM Press, pp. 51-61.
- Iizuka Y., Tsunakawa M., Seo S., Ikeda T., An approach to integration of Web information source search and Web information retrieval, ACM Press 2000, pp. 289-293.
- Knoblock C.A., Minton S., Ambite J.L., Ashish N., Muslea I., Philpot A., Tejada S., The Ariadne Approach to Web-Based Information Integration, "International Journal of Cooperative Information Systems" 2001, Vol. 10, pp. 145-169.
- Raghavan S., Garcia-Molina H., Crawling the Hidden Web, in: Proceedings of the 27th International Conference on Very Large Databases 2001.
- Rahm E., Bernstein P.A., A survey of approaches to automatic schema matching, "VLDB Journal" 2001, Vol. 10, No. 4, pp. 334-350.
- Widom J., Integrating heterogeneous databases: lazy or eager?, "ACM Computing Surveys" 1996, Vol. 28, p. 91.
- Wiederhold G., Huhns M.N., Singh M.P., Mediators in the Architecture of Future Information Systems, "IEEE Computer" 1992, pp. 39-49.
Document type
YADDA identifier