GLOSSARY of DATABASE TERMS
ARANGEN
3GL
B2B
B2C
B2P
BI
CDI
CICS
CORBA
CRM
CRP
CSS
CWM
DBMS
DCOM
DSS
EAI
EII
EIS
ERP
ETL
GIOP
IIOP
JDBC
METADATA
MDM
MOF
MRP
ODBC
OMG
ORB
OLTP
RPC
SFA
SIC
SOA
SOAP
SQL
STP
UML
XMI
XML
ARANGEN (Middle English) to put into proper order or into a correct or suitable sequence, relationship, or adjustment. [Merriam-Webster’s Ninth New Collegiate Dictionary, 1986]
3GL A third generation language (3GL) is a programming language designed to be easier for a
human to understand, including things like named variables. A fragment might be:
let b = c + 2 * d
Fortran, ALGOL, and COBOL are early examples of this sort of language. Most
"modern" languages (BASIC, C, C++, Delphi, and Java, as well as COBOL, Fortran, and
ALGOL) are third generation. Most 3GLs support structured programming.
B2B Business-to-business electronic commerce (B2B) typically takes the form of automated
processes between trading partners and is performed in much higher volumes than
business-to-consumer (B2C) applications. For example, a company that makes chicken
feed would sell it to a chicken farm, another company, rather than directly to consumers.
An example of a B2C transaction would be a consumer buying grain-fed chickens at a
grocery store. B2B can also encompass marketing activities between businesses, and not
just the final transactions that result from marketing. B2B is also used to identify sales
transactions between businesses. For example, a company selling photocopiers would likely
be a B2B sales organisation as opposed to a B2C (business-to-consumer) sales
organisation.
B2C Business-to-consumer (B2C), also business-to-customer, describes activities of
commercial organizations serving the end consumer with products and/or services.
B2P Business-to-Partner (B2P), describes the activities of commercial organizations providing
access to their on-line resources for their partners.
BI Business intelligence (BI) refers to intelligence in the sense of information valued for its
currency and relevance. It comprises the expert information, knowledge, and technologies
used to manage organizational and individual business efficiently. In this sense, business
intelligence is a broad category of applications and technologies for gathering, providing
access to, and analyzing data for the purpose of helping enterprise users make better
business decisions. The term implies having a comprehensive knowledge of all of the
factors that affect your business. It is imperative that you have in-depth knowledge
about factors such as your customers, competitors, business partners, economic
environment, and internal operations to make effective and good quality business
decisions. Business intelligence enables you to make these kinds of decisions.
CDI (Customer Data Integration) is the combination of the technology, processes and services
needed to create and maintain an accurate, timely, complete, and comprehensive
representation of a customer across multiple channels, business lines, and enterprises
typically where there are multiple sources of associated data in multiple application
systems and databases. CDI is commonly used in Master Data Management, and enables
access to information describing everything known about a customer including all
attributes and cross references, along with the critical definition and identification
necessary to uniquely differentiate one similar customer from another. Customer Data
Integration relies heavily on the standardization of data and overall data quality.
Therefore, large corporations and those with large amounts of data often set up data
governance teams to manage the CDI process.
CICS (Customer Information Control System) is a transaction server that runs primarily on
IBM mainframe systems under z/OS or z/VSE. CICS is available for other operating
systems, notably i5/OS, OS/2, and as the closely related IBM TXSeries software on AIX,
Windows, and Linux, among others. The z/OS implementation is by far the most popular
and significant.
CICS is a transaction processing system designed for both online and batch activity. On
large IBM zSeries and System z9 servers, CICS easily supports thousands of transactions
per second, making it a mainstay of enterprise computing. CICS applications can be
written in numerous programming languages, including COBOL, PL/I, C, C++,
Assembler, REXX, and Java.
CORBA In computing, Common Object Request Broker Architecture (CORBA) is a standard for
software componentry, created and controlled by the Object Management Group (OMG).
It defines APIs, a communication protocol, and object/service information models to
enable heterogeneous applications written in various languages running on various
platforms to interoperate. CORBA therefore provides platform and location transparency
for sharing well-defined objects across a distributed computing platform.
In a general sense CORBA “wraps” code written in some language into a bundle
containing additional information on the capabilities of the code inside, and how to call it.
The resulting wrapped objects can then be called from other programs (or CORBA
objects) over the network. In this sense, CORBA can be considered as a
machine-readable documentation format, similar to a header file but with considerably
more information.
CORBA uses an interface definition language (IDL) to specify the interfaces that objects
will present to the world. CORBA then specifies a “mapping” from IDL to a specific
implementation language like C++ or Java. This mapping precisely describes how the
CORBA data types are to be used in both client and server implementations. Standard
mappings exist for Ada, C, C++, Lisp, Smalltalk, Java, and Python. There are also
non-standard mappings for Perl and Tcl implemented by ORBs written for those
languages.
CRM The generally accepted purpose of Customer Relationship Management (CRM) is to
enable organizations to better manage their customers through the introduction of reliable
processes and procedures for interacting with those customers.
In today's competitive business environment, a successful CRM strategy cannot be
implemented by only installing and integrating a software package designed to support
CRM processes. A holistic approach to CRM is vital for an effective and efficient CRM
policy. This approach includes training employees, modifying business
processes based on customers' needs, and adopting relevant IT systems (including
software and possibly hardware) and/or using IT services that enable the organization or
company to follow its CRM strategy. CRM services can even replace the acquisition of
additional hardware or CRM software licences.
The term CRM is used to describe either the software or the whole business strategy (or
lack of one) oriented toward customer needs. The second description is the correct one.
The main misconception about CRM is that it is only software rather than a whole
business strategy.
Major areas of CRM focus on automated service processes, personal information
gathering and processing, and self-service. CRM attempts to integrate and automate the
various customer-serving processes within a company.
Architecture of CRM
There are three parts to the application architecture of CRM:
* operational - automation of the basic business processes (marketing, sales, service)
* analytical - support for analyzing customer behavior, implemented with business
intelligence-like technology
* collaborative (co-operational) - ensures contact with customers (phone, email, fax, web,
SMS, post, in person)
Operational CRM
Operational CRM means supporting the so-called "front office" business processes,
which include customer contact (sales, marketing and service). Tasks resulting from these
processes are forwarded to the employees responsible for them, together with the
information necessary for carrying out the tasks. Interfaces to back-end applications are
provided, and activities with customers are documented for further reference.
Operational CRM provides the following benefits:
* Delivers personalized and efficient marketing, sales, and service through multi-channel
collaboration
* Enables a 360-degree view of your customer while you are interacting with them
* Sales people and service engineers can access the complete history of all customer
interactions with your company, regardless of the touch point
According to Gartner Group, the operational part of CRM typically involves three general
areas of business:
* Sales force automation (SFA): SFA automates some of the company's critical sales and
sales force management functions, for example, lead/account management, contact
management, quote management, forecasting, sales administration, keeping track of
customer preferences, buying habits, and demographics, as well as sales staff
performance. SFA tools are designed to improve field sales productivity. Key
infrastructure requirements of SFA are mobile synchronization and integrated product
configuration.
* Customer service and support (CSS): CSS automates some service requests,
complaints, product returns, and information requests. The traditional internal help desk and
traditional inbound call-center support for customer inquiries have now evolved into the
"customer interaction center" (CIC), using multiple channels (Web, phone/fax,
face-to-face, kiosk, etc.). Key infrastructure requirements of CSS include computer
telephony integration (CTI), which provides high-volume processing capability and
reliability.
* Enterprise marketing automation (EMA): EMA provides information about the
business environment, including competitors, industry trends, and macroenvironmental
variables. It is the execution side of campaign and lead management. The intent of EMA
applications is to improve marketing campaign efficiencies. Functions such as
demographic analysis, variable segmentation, and predictive modeling occur on the
analytical (Business Intelligence) side.
Integrated CRM software is often also known as a "front office solution" because
it deals directly with the customer.
Many call centers use CRM software to store all of their customers' details. When a
customer calls, the system can be used to retrieve and store information relevant to the
customer. By serving the customer quickly and efficiently, and by keeping all
information on a customer in one place, a company aims to save costs and to
attract new customers.
CRM solutions can also be used to allow customers to perform their own service via a
variety of communication channels. For example, you might be able to check your bank
balance via your WAP phone without ever having to talk to a person, saving money for
the company, and saving you time.
Analytical CRM
In analytical CRM, data gathered within operational CRM are analyzed to segment
customers or to identify cross- and up-selling potential. Data collection and analysis are
viewed as a continuing and iterative process. Ideally, business decisions are refined over
time, based on feedback from earlier analysis and decisions. Business Intelligence offers
additional functionality as separate application software.
Collaborative CRM
Collaborative CRM facilitates interactions with customers through all channels (personal,
letter, fax, phone, web, e-mail) and supports co-ordination of employee teams and
channels. It is a solution that brings people, processes and data together so companies can
better serve and retain their customers. The data/activities can be structured,
unstructured, conversational, and/or transactional in nature.
Collaborative CRM provides the following benefits:
* Enables efficient, productive customer interactions across all communications channels
* Enables web collaboration to reduce customer service costs
* Integrates call centers, enabling multi-channel personal customer interaction
* Integrates the view of the customer during interaction at the transaction level
Improving customer service
CRMs are intended to improve customer service. Proponents say they can do so
by facilitating communication in several ways:
* Provide product information, product use information, and technical assistance on web
sites that are accessible 24 hours a day, 7 days a week.
* Help to identify potential problems quickly, before they occur.
* Provide a user-friendly mechanism for registering customer complaints (complaints that
are not registered with the company cannot be resolved, and are a major source of
customer dissatisfaction).
* Provide a fast mechanism for handling problems and complaints (complaints that are
resolved quickly can increase customer satisfaction).
* Provide a fast mechanism for correcting service deficiencies (correct the problem before
other customers experience the same dissatisfaction).
* Identify how each individual customer defines quality, and then design a service
strategy for each customer based on these individual requirements and expectations.
* Use internet cookies to track customer interests and personalize product offerings
accordingly.
* Use the Internet to engage in collaborative customization or real-time customization
* Provide a fast mechanism for managing and scheduling follow-up sales calls to assess
post-purchase cognitive dissonance, repurchase probabilities, repurchase times, and
repurchase frequencies.
* Provide a fast mechanism for managing and scheduling maintenance, repair, and
on-going support (improve efficiency and effectiveness).
* Provide a mechanism to track all points of contact between a customer and the
company, and do it in an integrated way so that all sources and types of contact are
included, and all users of the system see the same view of the customer (reduces
confusion).
* The CRM can be integrated into other cross-functional systems and thereby provide
accounting and production information to customers when they want it.
Improving customer relationships
CRMs are also claimed to be able to improve customer relationships. Proponents say this
is so because:
* CRM technology can track customer interests, needs, and buying habits as they
progress through their life cycles, and tailor the marketing effort accordingly. This way
customers get exactly what they want as they change.
* The technology can track customer product use as the product progresses through its
life cycle, and tailor the service strategy accordingly. This way customers get what they
need as the product ages.
* In industrial markets, the technology can be used to micro-segment the buying centre
and help coordinate the conflicting and changing purchase criteria of its members.
* When any of the technology-driven improvements in customer service (mentioned
above) contribute to long-term customer satisfaction, they can ensure repeat purchases,
improve customer relationships, increase customer loyalty, decrease customer turnover,
decrease marketing costs (associated with customer acquisition and customer “training”),
increase sales revenue, and thereby increase profit margins.
Technical functionality
A CRM solution is characterised by the following functionality:
* scalability - the ability to be used on a large scale, and to be reliably expanded to
whatever scale is necessary.
* multiple communication channels - the ability to interface with users via many different
devices (phone, WAP, internet, etc)
* workflow - the ability to trigger a process in the back-office system, e.g. an email
response
* assignment - the ability to assign requests (Service Requests, Sales Opportunities) to a
person or group.
* database - the centralised storage (in a data warehouse) of all information relevant to
customer interaction
* customer privacy considerations, e.g. data encryption and the destruction of records to
ensure that they are not stolen or abused
Privacy and ethical concerns
CRMs are not, however, considered universally good: some feel they invade customer
privacy and enable coercive sales techniques due to the information companies now have
on customers (see persuasion technology). However, CRM does not necessarily imply
gathering new data; it can be used merely to make "better use" of data the corporation
already has. In most cases, though, CRMs are used to collect new data.
Some argue that the most basic privacy concern is the centralised database itself, and that
CRMs built this way are inherently privacy-invasive. See the commercial version of the
debate over the carceral state, e.g. Total Information Awareness program of the United
States federal government.
Setting up a framework for CRM
* When you start setting up the CRM segment of your business, first decide which
profile aspects are relevant to your business. Which information will
give you the keys to serving your customers in the best way possible? If you can look at
your financial history for this information, what would you have liked to know about
your customers in the past? What would have been the effects? And what information is
not useful? Being able to eliminate unwanted information is a big part of implementing
your CRM systems.
* When designing your CRM's structure, always remember who your primary customers
are. You want to keep more extensive information on them because they are your
high-margin customers. You can keep less extensive details on the clients you identify as
“low-margin”.
CRM in Business
Internet sites and, in particular, e-mail are touted as less expensive communication
methods than traditional ones such as telephone calls. This type of service can be very
helpful, but it is useless if you are having trouble reaching your customers. Some major
companies have determined that the majority of clients trust other means of
communication, like the telephone, more than they trust e-mail. Clients, however, are not the
ones to blame, because it is often the manner of connecting with consumers on a personal
level that makes them feel cherished as customers. It is up to the
companies to focus on reaching every customer and developing a relationship.
CRM software can run your entire business, from prospect and client contact tools to
billing history and bulk email management. The CRM system allows you to maintain all
customer records in one centralized location that is accessible to your entire organization
through password administration. Front office systems are set up to collect data from the
customers for processing into the data warehouse. The data warehouse is a back office
system used to fulfill and support customer orders. All customer information is stored in
the data warehouse. Back office CRM makes it possible for a company to follow sales,
orders, and cancellations. Special regressions of this data can be very beneficial for the
marketing division of a firm.
CRP Capacity Requirements Planning (CRP) is a computerized technique for projecting resource
requirements for critical work stations. It is a tool for:
* determining the capacity that is available and required,
* alleviating bottleneck work centers, and
* helping planners make the right decisions on scheduling before problems develop.
It verifies that you have sufficient capacity available to meet the capacity requirements
for MRP plans.
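As an illustration of the capacity check described above, the following minimal Python sketch compares the hours a plan requires with the hours available at each work center. The work center names and hour figures are invented for the example; a real CRP system would derive them from MRP orders and routings.

```python
def find_bottlenecks(available_hours, required_hours):
    """Return work centers whose required load exceeds available capacity.

    Both arguments map a work center name to hours for one planning period.
    The result maps each overloaded work center to its shortfall in hours.
    """
    shortfalls = {}
    for center, required in required_hours.items():
        available = available_hours.get(center, 0.0)
        if required > available:
            shortfalls[center] = required - available
    return shortfalls


# Hypothetical figures for a single planning period.
available = {"cutting": 80.0, "welding": 40.0, "assembly": 120.0}
required = {"cutting": 72.0, "welding": 55.0, "assembly": 120.0}

bottlenecks = find_bottlenecks(available, required)
print(bottlenecks)  # {'welding': 15.0}
```

Here only the welding center is flagged, so a planner could reschedule or add capacity there before the problem develops.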
CSS Customer service and support (CSS): CSS automates some service requests, complaints,
product returns, and information requests. The traditional internal help desk and traditional
inbound call-center support for customer inquiries have now evolved into the "customer
interaction center" (CIC), using multiple channels (Web, phone/fax, face-to-face, kiosk,
etc.). Key infrastructure requirements of CSS include computer telephony integration
(CTI), which provides high-volume processing capability and reliability.
CWM The Common Warehouse Metamodel (CWM) is a specification for modeling metadata
for relational, non-relational, and multidimensional systems, and most other objects found in
a data warehousing environment. In addition, CWM models enable users to trace the
lineage of data – CWM provides objects that describe where the data came from and
when and how the data was created. Instances of the metamodel are exchanged via XMI
(XML Metadata Interchange) documents.
DBMS A database management system (DBMS) is a computer program (or more typically, a
suite of them) designed to manage a database (a large set of structured data), and run
operations on the data requested by numerous clients. Typical examples of DBMS use
include accounting, human resources and customer support systems. Originally found
only in large organizations with the computer hardware needed to support large data sets,
DBMSs have more recently emerged as a fairly standard part of any company back
office.
DBMSs are found at the heart of most database applications. Sometimes DBMSs are
built around a private multitasking kernel with built-in networking support, although
nowadays these functions are left to the operating system.
DCOM Distributed Component Object Model (DCOM) is a Microsoft proprietary technology for
software components distributed across several networked computers to communicate
with each other. It extends Microsoft's COM, and provides the communication substrate
under Microsoft's COM+ application server infrastructure. It has been deprecated in favor
of Microsoft .NET.
The addition of the "D" to COM was due to extensive use of DCE/RPC - more
specifically Microsoft's enhanced version, known as MSRPC.
In terms of the extensions it added to COM, DCOM had to solve the problems of
* Marshalling - serializing and deserializing the arguments and return values of method
calls "over the wire".
* Distributed garbage collection - ensuring that references held by clients of interfaces are
released when, for example, the client process crashed, or the network connection was
lost.
One of the key factors in solving these problems is the use of DCE/RPC as the underlying
RPC mechanism behind DCOM. DCE/RPC has strictly defined rules regarding
marshalling and who is responsible for freeing memory.
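The marshalling problem can be illustrated with a minimal Python sketch. This is not DCOM or DCE/RPC code: it uses JSON as a stand-in wire format (DCOM itself uses NDR), runs client and server in one process, and all function names are hypothetical.

```python
import json


def marshal_call(method, args):
    """Serialize a method name and its arguments into bytes for the wire."""
    return json.dumps({"method": method, "args": args}).encode("utf-8")


def unmarshal_call(payload):
    """Deserialize bytes from the wire back into a method name and arguments."""
    message = json.loads(payload.decode("utf-8"))
    return message["method"], message["args"]


def handle_request(payload, methods):
    """Server side: unmarshal the call, run it, and marshal the return value."""
    method, args = unmarshal_call(payload)
    result = methods[method](*args)
    return json.dumps({"result": result}).encode("utf-8")


# Hypothetical remote object exposing a single method.
methods = {"add": lambda a, b: a + b}

request = marshal_call("add", [2, 3])        # client marshals the call
response = handle_request(request, methods)  # bytes travel "over the wire"
result = json.loads(response.decode("utf-8"))["result"]
print(result)  # 5
```

The sketch leaves out the second problem entirely: nothing here tracks or releases references when a client disappears.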
DCOM was a major competitor to CORBA. Proponents of both of these technologies saw
them as one day becoming the model for code and service-reuse over the Internet.
Ironically, however, the difficulties involved in getting either of these technologies to
work over Internet firewalls, and on unknown and insecure machines, meant that normal
HTTP requests in combination with web browsers won out over both of them. This despite
Microsoft's attempts to add an extra transport - Network Computing Architecture,
Connection-based, over HTTP aka ncacn_http - to DCE/RPC, which was made available
seamlessly to DCOM services.
Alternate versions and implementations
The Open Group have a DCOM implementation called COMsource. The source code is
available for COMsource, along with full and complete documentation, sufficient to use
and also sufficient to implement an interoperable version of DCOM. According to that
documentation, COMsource comes directly from the Windows NT 4.0 source code, and
even includes the source code for a Windows NT Registry Service.
The Wine Team are also implementing DCOM. They are doing so for binary
interoperability purposes, and are not currently interested in the networking side of
DCOM, which is provided by MSRPC. They are restricted to implementing NDR
(Network Data Representation) through Microsoft's API, but are committed to making it
as compatible as possible with MSRPC.
DSS Decision support systems are a class of computerized information systems that support
decision making activities.
EAI Enterprise Application Integration (EAI) is the use of software and architectural
principles to bring together (integrate) a set of enterprise computer applications. It is an
area of computer systems architecture that gained wide recognition from about 2004
onwards. EAI is related to middleware technologies such as message-oriented
middleware (MOM), and data representation technologies such as XML. Newer EAI
technologies involve using web services as part of service-oriented architecture as a
means of integration. Enterprise Application Integration tends to be data centric. In
coming years it will come to include content integration and business processes.
Without integration, enterprise computing often takes the form of islands of automation,
where the value of individual systems is not maximised because they are working in
partial or full isolation. However, if integration is carried out without following a
structured EAI approach, many point-to-point connections grow up across an
organisation. Dependencies are added on an ad-hoc basis, resulting in a tangled,
unmaintainable mess, commonly referred to as spaghetti, by analogy with the
programming equivalent known as spaghetti code.
The number of connections needed to fully mesh n applications point-to-point is
given by (n * (n-1)) / 2. Thus for 10 applications to be fully integrated point-to-point,
(10 * 9) / 2, or 45 point-to-point connections are needed.
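The formula can be checked with a few lines of Python. The function name is chosen for this example only.

```python
def point_to_point_connections(n):
    """Number of links needed to connect n applications pairwise."""
    return n * (n - 1) // 2


for n in (2, 5, 10, 50):
    print(n, point_to_point_connections(n))
# 2 1
# 5 10
# 10 45
# 50 1225
```

The count grows roughly with the square of the number of applications, which is why ad-hoc point-to-point integration becomes hard to maintain.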
Current thinking is that the best approach to EAI is to use an Enterprise service bus
(ESB) to connect numerous separate systems together. Other approaches have been
explored, connecting at the database level or at the user-interface level. However, the
ESB approach has generally been adopted as the strategic winner. Individual applications
can publish messages to the bus, and also subscribe to receive certain messages from the
bus.
With EAI each application only requires one connection, which is to the bus. Attending
to EAI involves looking at the system of systems. Such message bus approaches can be
extremely scalable, and also highly evolvable.
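The publish/subscribe idea behind a message bus can be sketched in Python as follows. This is a toy in-process illustration, not an ESB product: the class, topic name, and message fields are all assumptions made for the example.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process bus: applications publish and subscribe by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive every message published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver message to all handlers subscribed to topic."""
        for handler in self._subscribers[topic]:
            handler(message)


bus = MessageBus()
billing_inbox = []
shipping_inbox = []

# Billing and shipping each connect only to the bus, not to each other
# or to the order system.
bus.subscribe("order.created", billing_inbox.append)
bus.subscribe("order.created", shipping_inbox.append)

# The order system publishes once; the bus fans the message out.
bus.publish("order.created", {"order_id": 1001, "total": 250.0})

print(billing_inbox)
print(shipping_inbox)
```

Adding a further application means one more subscription to the bus rather than a new link to every existing system.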
EAI is not just about sharing data between applications. EAI focuses on sharing both
business data and business process.
EII is the industry acronym for Enterprise Information Integration. It describes the process of
using data abstraction to address the data access challenges associated with data
heterogeneity and data contextualization. Data is the foundation upon which the
"Information Age" and critical components such as the burgeoning Web 2.0 and a future
Semantic Web are being built. Uniform data access and uniform information
representation are critical aspects of this journey.
Data takes many forms within an enterprise, but it is safe to identify the following forms
as most dominant:
* SQL - as a result of the prominence of Relational Databases in modern business
applications
* Non-SQL data - most dominant in legacy mainframe environments with a variety of
proprietary storage, indexing, and data access methods.
Irrespective of data form, the issue of data access is pivotal en route to producing
Information; hence the emergence of standardized Data Access APIs such as ODBC,
JDBC, OLE DB, and more recently ADO.NET.
Standardization of Data Access APIs and the emergence of XML as a universal
representation format collectively provide a foundation for Information creation,
persistence, and dissemination. It is this capability expressed via a software offering that
describes an EII product.
Product Characteristics
An EII product offers virtualization of heterogeneous data where data takes the form of
SQL, XML, Data-returning Web services, and other URI-referenceable resources. Such
SQL data is typically accessible via ODBC, JDBC, ADO.NET, or OLE DB. XML is
generally URI based, and is thus accessible via WebDAV.
EII, Virtual Database, and Universal Server products are more alike than different. In all
cases, they provide single -- homogeneous -- data representations and/or access points
(SQL, ODBC, JDBC, ADO.NET, XML, or Web Services) for disparate data sources. For
instance, a single JDBC or ODBC or XML resource URI could provide access to data in
several relational database tables, each associated with a different database engine, from a
different database vendor, and associated with a myriad of enterprise applications.
EII products enable loose coupling between homogeneous-data consuming client
applications and services and heterogeneous-data stores. Such client applications and
services include Desktop Productivity Tools (Spreadsheets, Word processors,
Presentation Software, etc.), Development Environments and Frameworks (J2EE, .NET,
Mono, SOAP or RESTian Web services, etc.), business intelligence (BI), business
activity monitoring (BAM) software, enterprise resource planning (ERP), Customer
Relationship Management (CRM), Business Process Management (BPM and/or BPEL)
Software, and Web Content Management.
Utilization Mechanics
The steps that follow are common across all EII product offerings. Naturally, the
implementation specifics will differ on a per vendor basis.
1. Determine shape and form of information to be processed
2. Identify associated data sources
3. Create EII product references (data source linking process) for respective data sources
4. Process information - for instance via a discrete or composite Web Service, Dynamic
HTML/XHTML/XML Web Page, XML transformation (e.g. RSS/Atom/RDF feed) etc.
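The idea of a single access point over several data sources can be sketched with Python's standard sqlite3 module. This is only an analogy: a real EII product federates different engines from different vendors, whereas this sketch attaches two SQLite databases to one connection. The database names, tables, and rows are invented for the example.

```python
import sqlite3

# One connection acts as the single access point.
conn = sqlite3.connect(":memory:")

# Attach two further in-memory databases to stand in for separate systems.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE erp.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# The client issues one query and does not need to know where each table lives.
rows = conn.execute(
    """
    SELECT c.name, SUM(i.amount)
    FROM crm.customers AS c
    JOIN erp.invoices AS i ON i.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```

The steps map loosely onto the list above: the query defines the shape of the information, the two attached databases are the identified sources, and the ATTACH statements play the role of data source linking.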
EIS An Executive Information System (EIS) is a computer-based system intended to facilitate
and support the information and decision making needs of senior executives by providing
easy access to both internal and external information relevant to meeting the strategic
goals of the organization. It is commonly considered as a specialized form of Decision
Support System (DSS).
The emphasis of EIS is on graphical displays and easy-to-use user interfaces. They offer
strong reporting and drill-down capabilities. In general, EIS are enterprise-wide DSS that
help top-level executives analyze, compare, and highlight trends in important variables so
that they can monitor performance and identify opportunities and problems. EIS and data
warehousing technologies are converging in the marketplace.
ERP Enterprise resource planning systems (ERPs) are management information systems that
integrate and automate many of the business practices associated with the operations or
production aspects of a company.
Overview
Enterprise resource planning is a term derived from manufacturing resource planning
(MRP II) that followed material requirements planning (MRP). ERP systems typically
handle the manufacturing, logistics, distribution, inventory, shipping, invoicing, and
accounting for a company. Enterprise Resource Planning or ERP software can aid in the
control of many business activities, like sales, delivery, billing, production, inventory
management, and human resources management.
ERPs are often called back office systems indicating that customers and the general
public are not directly involved. This is contrasted with front office systems like customer
relationship management (CRM) systems that deal directly with the customers, or the
eBusiness systems such as eCommerce, eGovernment, eTelecom, and eFinance, or
supplier relationship management (SRM) systems that deal with the suppliers.
ERPs are cross-functional and enterprise-wide. All functional departments that are
involved in operations or production are integrated in one system. In addition to
manufacturing, warehousing, logistics, and Information Technology, this would include
accounting, human resources, marketing, and strategic management.
In the early days of business computing, companies used to write their own software to
control their business processes. This is an expensive approach. Since many of these
processes occur in common across various types of businesses, common reusable
software may provide cost-effective alternatives to custom software. Thus some ERP
software caters to a wide range of industries from service sectors like software vendors
and hospitals to manufacturing industries and even to government departments.
Implementation
Because of their wide scope of application within the firm, ERP software systems rely on
some of the largest bodies of software ever written. Implementing such a complex and
huge software system in a company usually involves an army of analysts, programmers,
and users, and often comprises a very expensive project in itself for bigger companies,
especially transnationals.
Enterprise resource planning systems are often closely tied to supply chain management
and logistics automation systems. Supply chain management software can extend the
ERP system to include links with suppliers.
To implement ERP systems, companies often seek the help of an ERP vendor or of
third-party consulting companies. Consulting in ERP involves two levels, namely
business consulting and technical consulting. A business consultant studies an
organization's current business processes and matches them to the corresponding
processes in the ERP system, thus 'configuring' the ERP system to the organisation's
needs. Technical consulting often involves programming. Most ERP vendors allow
changing their software to suit the business needs of their customer.
ETL Extract, transform, and load (ETL) is a process in data warehousing that involves
* extracting data from outside sources,
* transforming it to fit business needs, and ultimately
* loading it into the data warehouse.
ETL is important, as it is the way data actually gets loaded into the warehouse. This
entry assumes that data is always loaded into a data warehouse, whereas the term ETL
can in fact refer to a process that loads any database.
Extract
The first part of an ETL process is to extract the data from the source systems. Most data
warehousing projects consolidate data from different source systems. Each separate
system may also use a different data organization or format. Common data source formats
are relational databases and flat files, but other source formats exist. Extraction converts
the data into records and columns (also known as fields).
Transform
The transform phase applies a series of rules or functions to the extracted data to derive
the data to be loaded. Some data sources will require very little manipulation of data.
However, in other cases any combination of the following transformation types may be
required:
* Select only certain columns to load (or if you prefer, null columns not to load)
* Translate coded values (e.g. if the source system stores M for male and F for female but
the warehouse stores 1 for male and 2 for female)
* Derive a new calculated value (e.g. sale_amount = qty * unit_price)
* Join together data from multiple sources (e.g. lookup, merge, etc)
* Summarize multiple rows of data (e.g. total sales for each region)
* Generate a surrogate key value
* Transpose / cross tabulate (turn multiple columns into multiple rows or vice versa)
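Several of the transformations listed above can be sketched in a few lines of Python. This is only an illustration: the rows, the field names (region, sex, qty, unit_price, note) and the helper functions are hypothetical, and a real ETL job would read from actual source systems.

```python
from collections import defaultdict

# Hypothetical extracted rows; field names are illustrative only.
rows = [
    {"region": "North", "sex": "M", "qty": 2, "unit_price": 5.0, "note": "x"},
    {"region": "North", "sex": "F", "qty": 1, "unit_price": 10.0, "note": "y"},
    {"region": "South", "sex": "F", "qty": 3, "unit_price": 4.0, "note": "z"},
]

SEX_CODES = {"M": 1, "F": 2}  # translate coded values


def transform(rows):
    """Select columns, translate codes, derive a value, add a surrogate key."""
    out = []
    for key, row in enumerate(rows, start=1):
        out.append({
            "sale_key": key,                                # surrogate key
            "region": row["region"],                        # selected column
            "sex_code": SEX_CODES[row["sex"]],              # translated code
            "sale_amount": row["qty"] * row["unit_price"],  # derived value
        })
    return out


def total_by_region(rows):
    """Summarize multiple rows: total sales for each region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["sale_amount"]
    return dict(totals)


transformed = transform(rows)
totals = total_by_region(transformed)
```

The "note" column is simply not copied, which is one way to express "select only certain columns to load".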
Load
The load phase loads the data into the data warehouse. Depending on the requirements of
the organization, this process ranges widely. Some data warehouses merely overwrite old
information with new data. More complex systems can maintain a history and audit trail
of all changes to the data.
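The difference between overwriting and keeping a history can be sketched as follows. The in-memory dictionary and list stand in for warehouse tables, and the keys, values and dates are made up for the example.

```python
from datetime import date

current = {}    # overwrite strategy: only the latest value per key survives
history = []    # history strategy: every change is appended as an audit entry


def load(key, value, loaded_on):
    """Apply one incoming record under both strategies."""
    previous = current.get(key)
    if previous != value:
        # Record what changed and when, so the old value can be recovered.
        history.append({"key": key, "old": previous, "new": value,
                        "loaded_on": loaded_on})
    current[key] = value  # the previous value is lost in this structure


load("cust-1", "Leeds", date(2024, 1, 1))
load("cust-1", "York", date(2024, 2, 1))
```

After both loads, the overwritten structure only knows the latest value, while the history still shows that the earlier value existed.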
Challenges
ETL processes can be quite complex, and significant problems can occur. Improperly
designed ETL systems or an unexpected change in format of one of the source systems
can cause serious problems in the ETL process potentially destroying or corrupting
significant amounts of data in the target system. An additional difficulty is making sure
the data being uploaded is relatively consistent. Since multiple source databases all have
different update cycles (some may be updated every few minutes, while others may take
days or weeks), an ETL system may be required to hold back certain data until all sources
are synchronized.
Tools
While an ETL process can be created using almost any programming language, creating
one from scratch is quite complex. Increasingly, companies are buying ETL tools to
help in the creation of ETL processes.
A good ETL tool must be able to communicate with the many different relational
databases and read the various file formats used throughout an organization. ETL tools
have started to migrate into Enterprise Application Integration, or even Enterprise Service
Bus, systems that now cover much more than just the extraction, transformation and
loading of data. Many ETL vendors now have data profiling, data quality and metadata
capabilities.
GIOP In distributed computing, GIOP (General Inter-ORB Protocol) is the abstract protocol by
which object request brokers (ORBs) communicate. Standards associated with the
protocol are maintained by the Object Management Group (OMG).
IIOP (Internet Inter-ORB Protocol) is the implementation of GIOP for TCP/IP.
JDBC Java Database Connectivity, or JDBC, is an API for the Java programming language that
defines how a client may access a database. (To be strictly correct, JDBC is not an
acronym.) It provides methods for querying and updating data in a database. JDBC is
oriented towards relational databases.
JDBC allows multiple implementations to exist and be used by the same application. The
API provides a mechanism for dynamically loading the correct Java packages and
registering them with the JDBC Driver Manager. The DriverManager is used as a
connection factory for creating JDBC connections.
JDBC connections support creating and executing statements. These statements may be
update statements such as SQL INSERT, UPDATE and DELETE or they may be query
statements using the SELECT statement. Additionally, stored procedures may be invoked
through a statement. Statements are one of the following types:
* Statement - the statement is sent to the database server each time it is executed.
* PreparedStatement - the statement is compiled on the database server allowing it to be
executed multiple times in an efficient manner.
* CallableStatement - used for executing stored procedures on the database.
Update statements such as INSERT, UPDATE and DELETE return an update count that
indicates how many rows were affected in the database. These statements do not return
any other information.
Query statements return a JDBC row result set. The row result set is used to walk over the
result set. Individual columns in a row are retrieved either by name or by column number.
There may be any number of rows in the result set. The row result set has metadata that
describes the names of the columns and their types.
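JDBC itself is a Java API. As a rough, language-neutral illustration of the same ideas (a parameterized statement that is executed several times, an update count, and a result set with column metadata), the sketch below uses Python's standard sqlite3 module instead. It is an analogy, not JDBC code, and the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# Parameterized statement executed several times (compare PreparedStatement).
cur.executemany("INSERT INTO customer (name) VALUES (?)",
                [("Ada",), ("Grace",), ("Edgar",)])

# An update statement reports only how many rows it affected.
cur.execute("UPDATE customer SET name = ? WHERE id = ?", ("Edgar F.", 3))
update_count = cur.rowcount

# A query returns rows; here column values are read by position.
cur.execute("SELECT id, name FROM customer ORDER BY id")
column_names = [d[0] for d in cur.description]  # result-set metadata
names = [row[1] for row in cur.fetchall()]
conn.close()
```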
There is an extension to the basic JDBC API that allows for scrollable result sets and
cursor support among other things. Refer to the Sun documentation [2] for more details.
METADATA (Greek: meta- + Latin: data "information"), literally "data about data", is information that describes another set of data. A common example is a library catalog
card, which contains data about the contents and location of a book: It is data about the
data in the book referred to by the card. Other common contents of metadata include the
source or author of the described dataset, how it should be accessed, and its limitations.
Other machine-generated data about data, such as the inverted index created by a free-text
search engine, is generally not considered metadata. Another important type of data
about data is the links or relationships among data. Some metadata schemes attempt to
embrace this concept (such as Dublin Core element link). Since metadata is also data, it is
possible to have "metadata of the metadata of data".
The metadata which is embedded with content is called embedded metadata. A data
repository typically stores the metadata detached from the data.
MDM Master Data Management (MDM), also known as Reference Data Management, is a
discipline in Information Technology (IT) that focuses on the management of reference or
master data that is shared by several disparate IT systems and groups. MDM is required
to ensure consistent computing between diverse system architectures and business
functions.
Large companies often have IT systems that are used by diverse business functions (e.g.,
finance, sales, R&D, etc.) and span across multiple countries. These diverse systems
usually need to share key data that is relevant to the parent company (e.g., products,
customers, and suppliers). It is critical for the company to consistently use these shared
data elements through various IT systems.
MDM also becomes important when two or more companies want to share data across
corporate boundaries. In this case, MDM becomes an industry issue, as is the case
with the Finance industry and the required STP (Straight Through Processing) or T+1.
In the Y computing model, MDM is one of three computing types: OLTP transactional
computing (typically ERP), DSS (Decision Support Systems), and MDM itself. These types
range from operational reporting to EIS (Executive Information Systems). Master data
management is not only required to coordinate different ERP systems, but also necessary
to supply meta-data for aggregating and integrating transactional data. This use of MDM
is necessary for Data Warehouse projects typically incorporated in Decision Support
Systems. For this reason, MDM systems sometimes provide a meta-data abstraction
layer. This design provides an entity relationship (ER)-scheme for systems that use the
master data.
MOF The Meta-Object Facility (MOF) is an Object Management Group (OMG) standard. MOF
originated in the Unified Modeling Language (UML); the OMG was in need of a
Meta-Modeling architecture to define the UML. MOF is designed as a four-layered
architecture. It provides a meta-meta model at the top layer, aka the M3 layer. This
M3-model is the language used by MOF to build meta-models, called M2-models. The
most prominent example of a Layer 2 MOF model is the UML meta-model, the model
that describes the UML itself. These M2-models describe elements of the M1-layer, and
thus M1-models. These would be, for example, models written in UML. The last layer is
the M0-layer or data layer. It describes application data, which are thus instances
of M1-models.
Beyond the M3-model, MOF describes the means to create and manipulate
(meta-)models by defining CORBA interfaces that describe those operations. Because of
the similarities between the MOF M3-model and UML structure models, MOF
meta-models are usually modeled as UML class diagrams. A supporting standard of MOF
is XMI, which defines an XML-based exchange format for models on the M3-, M2-, or
M1-Layer.
MOF is a closed meta-modelling architecture; it defines an M3-model, which is a model
(or instance) of itself. MOF is a strict meta-modelling architecture; every model element
on every layer is strictly an instance of a model element of the layer above. MOF only
provides a means to define the structure, or abstract syntax, of a language or of data.
Simplified, MOF uses the notion of classes, as known from object orientation, to define
concepts (model elements) on a meta-layer. These classes (concepts) can then be
instantiated through objects (instances) of the model layer below. Due to the fact that an
element on the M2 layer is an object (instance of an M3 model element) as well as a class
(it is an M2 layer concept) the notion of a clabject is used. Clabject is a merge of the
words class and object.
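The layering and the clabject idea can be loosely illustrated with Python's own object model, in which classes are themselves objects. This is an analogy only: Python is not MOF, and the names MetaClass, Customer and alice are hypothetical.

```python
# A loose analogy only: Python's object model is not MOF.
class MetaClass(type):
    """Plays the role of an M2 element: its instances are classes."""


class Customer(metaclass=MetaClass):
    """Plays the role of an M1 element: a class in a user model."""

    def __init__(self, name):
        self.name = name


alice = Customer("Alice")  # plays the role of M0 application data

# Each element is an instance of an element one layer up.
instance_of = {
    "M0 -> M1": type(alice) is Customer,
    "M1 -> M2": type(Customer) is MetaClass,
    "M2 -> M3": type(MetaClass) is type,
    "M3 -> M3": type(type) is type,  # the top layer is an instance of itself
}

# Customer behaves like a clabject: it is an object (an instance of
# MetaClass) and at the same time a class that can be instantiated.
is_clabject = isinstance(Customer, MetaClass) and isinstance(alice, Customer)
```

The built-in type being an instance of itself mirrors, in a small way, the closed nature of the M3 layer described above.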
MRP Material Requirements Planning (MRP) is a software based production planning and
inventory control system used to manage manufacturing processes. Although it is not
common nowadays, it is possible to conduct MRP by hand as well.
An MRP system is intended to simultaneously meet 3 objectives:
* Ensure materials and products are available for production and delivery to customers.
* Maintain the lowest possible level of inventory.
* Plan manufacturing activities, delivery schedules and purchasing activities.
The scope of MRP in manufacturing
All manufacturing organizations, whatever it is they produce, face the same daily
practical problem - that customers want products to be available in a shorter time than it
takes to make them. This means that some level of planning is required.
Companies need to control the types and quantities of materials they purchase, plan
which products are to be produced and in what quantities and ensure that they are able to
meet current and future customer demand, all at the lowest possible cost. Making a bad
decision in any of these areas will lose the company money. A few examples are given
below:
* If a company purchases insufficient quantities of an item used in manufacturing, or the
wrong item, they may be unable to meet contracts to supply products by the agreed date.
* If a company purchases excessive quantities of an item, money is being wasted - the
excess quantity ties up cash while it remains as stock and may never even be used at all.
This is a particularly severe problem for food manufacturers and companies with very
short product life cycles. However, some purchased items have a minimum order quantity
that must be met, so purchasing an excess is sometimes unavoidable.
* Beginning production of an order at the wrong time can mean customer deadlines being
missed.
MRP is used by many organizations as a tool to deal with these problems. The questions
it provides answers for are: WHAT items are required, HOW MANY are required and
WHEN are they required by. This applies to items that are bought in and to
sub-assemblies that go into more complex items.
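The WHAT, HOW MANY and WHEN questions can be sketched as a small calculation. The bill of materials, stock levels and lead times below are invented, and a real MRP run would also handle multi-level assemblies, lot sizes and scheduled receipts.

```python
from datetime import date, timedelta

# Hypothetical bill of materials: component quantity per finished unit.
BOM = {"frame": 1, "wheel": 2}
ON_HAND = {"frame": 10, "wheel": 5}          # current stock
LEAD_TIME_DAYS = {"frame": 7, "wheel": 3}    # supplier lead times


def plan(order_qty, due):
    """Return {item: (net quantity to order, latest order date)}."""
    result = {}
    for item, per_unit in BOM.items():                # WHAT is required
        gross = order_qty * per_unit                  # HOW MANY are needed
        net = max(0, gross - ON_HAND[item])           # less stock on hand
        order_by = due - timedelta(days=LEAD_TIME_DAYS[item])  # WHEN
        result[item] = (net, order_by)
    return result


schedule = plan(order_qty=20, due=date(2024, 3, 15))
```

For an order of 20 units, the sketch nets the gross requirement against stock and backs the order date off from the due date by each item's lead time.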
ODBC Open Database Connectivity (ODBC) is a standard software API specification for using database management systems (DBMS). ODBC is designed to be independent of
programming language, database system and operating system.
ODBC is an API specification for using SQL queries to access data. An implementation
of ODBC will contain one or more applications, a core ODBC library, and one or more
"database drivers". The core library is independent of the applications and DBMSes, and
acts as an "interpreter" between the applications and the database drivers. The
DBMS-specific details are contained in the database drivers. Thus, it is possible to write
applications that use standard types and features without concern for the specifics of each
DBMS that might be used. Likewise, database driver implementors need only know how
to attach to the core library. This makes ODBC modular.
To write ODBC code that exploits DBMS-specific features requires more advanced
programming. An application must use introspection, calling ODBC metadata functions
that return information about supported features, available types, syntax, limits, isolation
levels, driver capabilities and other information.
ODBC is the foremost example of ubiquitous data access because there are hundreds of
ODBC drivers for a large variety of data sources. ODBC is available for a variety of
operating systems and there are drivers for non-relational data such as spreadsheets, text
and XML files. Because ODBC dates back more than ten years, it offers connectivity to a
wider variety of data sources than other data access APIs. There are more drivers for
ODBC than drivers or providers for newer APIs such as OLE DB, JDBC and ADO.NET.
Despite the benefits of ubiquitous connectivity and platform independence, ODBC has
certain drawbacks. Administering a large number of client machines can involve a
diversity of drivers and DLLs. This complexity can increase system administration
overhead. Large organizations with thousands of PCs have often turned to ODBC server
technology to simplify the administration problem.
The layered architecture of ODBC can introduce a minor performance penalty. The
overhead of executing an additional layer of code is generally insignificant compared to
network latency and other factors that influence query performance. Driver architecture is
also a consideration. Many first-generation ODBC drivers operated with database client
libraries supplied by a DBMS vendor. An ODBC driver for Oracle, for example, would
use Oracle's network library (SQL*Net, Oracle Net) and OCI (Oracle Call Interface)
client library. Similarly, a driver for Sybase or Microsoft SQL Server would use a
vendor-supplied network library to emit Tabular Data Stream (TDS) packets. Those
earlier drivers have been largely supplanted by wire protocol drivers that do not use
database client libraries. The newer type of driver communicates using protocols such as
TDS, TNS (Oracle Transparent Network Substrate), and DRDA without needing database
client libraries.
Differences between drivers and driver maturity are also important issues. Newer ODBC
drivers are often less stable than drivers that have been in production for years. Years of
testing and deployment mean a driver is less likely to contain bugs.
To use DBMS-specific features with ODBC, a developer must understand adaptive
programming techniques such as introspection and writing interoperable SQL statements.
Even when using adaptive techniques, however, some advanced DBMS features might
not be available with ODBC. The ODBC 3.x API is well-suited to traditional SQL
applications such as OLTP but it has not evolved to support richer types introduced by
SQL:1999 and SQL:2003.
Developers needing features or types not accessible with ODBC can use other SQL APIs.
When platform independence is not a goal, developers can use proprietary APIs. If
creating portable, platform-independent code is a goal, developers can use the JDBC API.
OMG Object Management Group (OMG) is a consortium aimed at setting standards in
object-oriented programming as well as system modeling. In 1989, this consortium,
which included Hewlett-Packard Company, IBM Corporation, Apple Computer Inc. and
Sun Microsystems Inc., mobilised to create a cross-compatible distributed object
standard. The goal was a common binary object with methods and data that work using
all types of development environments on all types of platforms. Using a committee of
organisations, OMG set out to create the first Common Object Request Broker
Architecture (CORBA) standard which appeared in 1991. As of March 2003, the latest
standard is CORBA 3.0.
ORB In distributed computing, an object request broker (ORB) is a piece of middleware
software that allows programmers to make program calls from one computer to another,
via a network. An important special case of this is client-server computing, where a client
program calls a server program over a network. ORBs handle the transformation of
in-process data structures to the byte sequence that is transmitted over the network (and
the reverse transformation). This is called marshalling or serialization. Some
ORBs, such as CORBA-compliant systems, use an Interface Description Language (IDL)
to describe the data which is to be transmitted on remote calls. Before object-oriented
programming became mainstream, a similar technology called RPC (Remote Procedure
Call) was popular.
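Marshalling can be illustrated with Python's standard struct module. The wire layout used here (a 4-byte id, an 8-byte double and a length-prefixed string) is a made-up example and is not the encoding that CORBA or GIOP actually uses.

```python
import struct

# Hypothetical wire layout: 4-byte unsigned id, 8-byte double, 2-byte string
# length, then the UTF-8 string bytes, all in network (big-endian) byte order.
HEADER = "!IdH"


def marshal(account_id, amount, owner):
    """Turn in-process values into a byte sequence."""
    name = owner.encode("utf-8")
    return struct.pack(HEADER, account_id, amount, len(name)) + name


def unmarshal(data):
    """Reverse transformation: rebuild the values from the bytes."""
    account_id, amount, length = struct.unpack_from(HEADER, data, 0)
    start = struct.calcsize(HEADER)
    owner = data[start:start + length].decode("utf-8")
    return account_id, amount, owner


wire = marshal(42, 99.5, "Ada")
restored = unmarshal(wire)
```

An IDL plays the role that the HEADER format string plays here: both sides agree in advance on what the bytes mean.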
In addition to marshalling data, ORBs often expose many more features, such as
distributed transactions, directory services or realtime scheduling.
OLTP Online Transaction Processing (OLTP) is a form of transaction processing conducted via
computer network. Some applications of OLTP include electronic banking, order
processing, employee time clock systems, e-commerce, and eTrading.
In large applications, efficient OLTP may depend on sophisticated transaction
management software (such as CICS) and/or database optimization tactics to facilitate the
processing of large numbers of concurrent updates to an OLTP-oriented database.
For even more demanding decentralized database systems, OLTP brokering programs can
distribute transaction processing among multiple computers on a network. OLTP is often
integrated into service-oriented architecture and Web services.
The term Online Transaction Processing is somewhat ambiguous: some understand
"transaction" as a reference to computer or database transactions, while others (such as
the Transaction Processing Performance Council) define it in terms of business or
commercial transactions.
RPC A remote procedure call (RPC) is a protocol that allows a computer program running on
one host to cause code to be executed on another host without the programmer needing to
explicitly code for this. When the code in question is written using object-oriented
principles, RPC is sometimes referred to as remote invocation or remote method
invocation.
RPC is an easy and popular paradigm for implementing the client-server model of
distributed computing. An RPC is initiated by the caller (client) sending a request
message to a remote system (the server) to execute a certain procedure using arguments
supplied. A result message is returned to the caller. There are many variations and
subtleties in various implementations, resulting in a variety of different (incompatible)
RPC protocols.
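The request and result messages can be sketched as follows. To keep the example self-contained, the "network" is just a function call inside one process, and the JSON message layout and the procedure table are assumptions of this sketch rather than any particular RPC protocol.

```python
import json

# Server side: procedures that remote callers may invoke (hypothetical).
PROCEDURES = {"add": lambda a, b: a + b}


def handle_request(message):
    """Decode a request message, run the procedure, encode the result."""
    request = json.loads(message)
    procedure = PROCEDURES.get(request["method"])
    if procedure is None:
        return json.dumps({"result": None, "error": "unknown procedure"})
    return json.dumps({"result": procedure(*request["params"]), "error": None})


def call(method, *params, transport=handle_request):
    """Client stub: looks like a local call, but exchanges messages."""
    request = json.dumps({"method": method, "params": params})
    reply = json.loads(transport(request))  # a real stub would use a socket
    if reply["error"]:
        raise RuntimeError(reply["error"])
    return reply["result"]


total = call("add", 2, 3)
```

The caller writes call("add", 2, 3) without coding the message exchange explicitly, which is the point of the paradigm. Swapping the transport argument for a function that sends the message over a socket would not change the calling code.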
SFA Sales force automation (SFA) systems, also called sales force management systems, are
information systems used in marketing and management that automate some sales and
sales force management functions. They are frequently combined with a marketing
information system, in which case they are often called customer relationship management systems.
Advantages to sales people
Proponents claim that sales force automation systems can improve the productivity of
sales personnel. Here are some examples:
* Rather than write out sales reports, activity reports, and/or call sheets, sales people can
fill in prepared e-forms. This saves time.
* Rather than printing out reports and taking them to the sales manager, sales people can
use the company intranet to transmit the information. This saves time.
* Rather than waiting for paper based product inventory data, sales prospect lists, and
sales support information, they will have access to the information when they need it.
This could be useful in the field when answering prospects’ questions and objections.
* The additional tools could help improve sales staff morale if they reduce the amount of
record keeping and/or increase the rate of closing. This could contribute to a virtuous
spiral of beneficial and cumulative effects.
* These sales force systems can be used as an effective and efficient training device. They
provide sales staff with product information and sales technique training without them
having to waste time at seminars.
* Better communication and co-operation between sales personnel facilitates successful
team selling.
* More and better qualified sales leads could be automatically generated by the software.
* This technology increases the sales person’s ratio of selling time to non-selling time.
Non-selling time includes activities like report writing, travel time, internal meetings,
training, and seminars.
Advantages to the sales manager
Sales force automation systems can also affect sales management. Here are some
examples:
* The sales manager, rather than gathering all the call sheets from various sales people and
tabulating the results, will have the results automatically presented in easy to understand
tables, charts, or graphs. This saves time for the manager.
* Activity reports, information requests, orders booked, and other sales information will
be sent to the sales manager more frequently, allowing him/her to respond more directly
with advice, product in-stock verifications, and price discount authorizations. This gives
management more hands-on control of the sales process if they wish to use it.
* The sales manager can configure the system so as to automatically analyze the
information using sophisticated statistical techniques, and present the results in a
user-friendly way. This gives the sales manager information that is more useful in:
o Providing current and useful sales support materials to their sales staff
o Providing marketing research data: demographic, psychographic, behavioural,
product acceptance, product problems, detecting trends
o Providing market research data: industry dynamics, new competitors, new
products from competitors, new promotional campaigns from competitors,
macro-environmental scanning, detecting trends
o Co-ordinating with other parts of the firm, particularly marketing, production, and
finance
o Identifying your most profitable customers, and your problem customers
o Tracking the productivity of their sales force by combining a number of
performance measures such as: revenue per sales person, revenue per territory,
margin by product category, margin by customer segment, margin by customer,
number of calls per day, time spent per contact, revenue per call, cost per call,
entertainment cost per call, ratio of orders to calls, revenue as a percentage of
sales quota, number of new customers per period, number of lost customers per
period, cost of customer acquisition as a percentage of expected lifetime value of
customer, percentage of goods returned, number of customer complaints, and
number of overdue accounts. More complex models like the PAIRS model (by
Parasuraman and Day) and the Call Plan model (by Lodish) can also be used.
Advantages to the marketing manager
Sales force automation is also claimed to be useful for the marketing manager. It gives the
marketing manager information that is useful in:
* Understanding the economic structure of your industry
* Identifying segments within your market
* Identifying your target market
* Identifying your best customers
* Doing marketing research to develop profiles (demographic, psychographic, and
behavioural) of your core customers
* Understanding your competitors and their products
* Developing new products
* Establishing environmental scanning mechanisms to detect opportunities and threats
* Understanding your company's strengths and weaknesses
* Auditing your customers' experience of your brand in full
* Developing marketing strategies for each of your products using the marketing mix
variables of price, product, distribution, and promotion
* Co-ordinating the sales function with other parts of the promotional mix (such as
advertising, sales promotion, public relations, and publicity)
* Creating a sustainable competitive advantage
* Understanding where you want your brands to be in the future, and providing an
empirical basis for writing marketing plans on a regular basis to help you get there
* Providing input into feedback systems to help you monitor and adjust the process
Strategic advantages
Sales force automation systems can also create competitive advantage. Here are some
examples:
* As mentioned above, productivity will increase. Sales staff will use their time more
efficiently and more effectively. The sales manager will also become more efficient and
more effective. This increased productivity can create a competitive advantage
in three ways: it can reduce costs, it can increase sales revenue, and it can increase market
share.
* Field sales staff will send their information more frequently. Typically information will
be sent to management after every sales call (rather than once a week). This provides
management with current information, information that they will be able to use while it is
still valuable. Management response time will be greatly reduced. The company will
become more alert and more agile.
* These systems could increase customer satisfaction if they are used with wisdom. If the
information obtained and analyzed with the system is used to create a product that
matches or exceeds customer expectations, and the sales staff use the system to service
customers more expertly and diligently, then customers should be satisfied with the
company. This will provide a competitive advantage because customer satisfaction leads
to increased customer loyalty, reduced customer acquisition costs, reduced price elasticity
of demand, and increased profit margins.
Disadvantages
Detractors claim that sales force management systems:
* are difficult to work with
* require additional work inputting data
* dehumanize a process that should be personal
* require continuous maintenance, information updating, and system upgrading
* are costly
* are difficult to integrate with other management information systems
Encouraging use
For all the reasons stated above, many organisations have found it difficult to persuade
sales people to enter data into the system. For this reason, many have questioned the value
of the investment. Recent developments have embedded sales process systems that give
something back to the seller within the CRM screens. Because these systems help the
sales person plan and structure their selling in the most effective way they give a reason
to use the CRM.
SGML The Standard Generalized Markup Language (SGML) is a metalanguage in which
one can define markup languages for documents. SGML is a descendant of IBM's
Generalized Markup Language (GML), developed in the 1960s by Charles Goldfarb,
Edward Mosher and Raymond Lorie (whose surname initials also happen to be GML).
SGML should not be confused with the Geography Markup Language (GML) developed
by the Open GIS Consortium, or with the Game Maker scripting language, GML.
SGML provides a variety of markup syntaxes that can be used for many applications. By
changing the SGML Declaration one does not even need to use "angle brackets", although
they are the norm in the so-called reference concrete syntax.
SGML was originally designed to enable the sharing of machine-readable documents in
large projects in government, legal and the aerospace industry, which have to remain
readable for several decades—a very long time in information technology. It has also
been used extensively in the printing and publishing industries, but its complexity has
prevented its widespread application for small-scale general-purpose use.
SGML is an ISO standard: "ISO 8879:1986 Information processing—Text and office
systems—Standard Generalized Markup Language (SGML)".
SIC The Standard Industrial Classification (SIC) was a United States government system for classifying industries by a four-digit code. Established in the 1930s, it was supplanted by
the six-digit North American Industry Classification System in 1997.
SOA In computing, the term Service-Oriented Architecture (SOA) expresses a software
architectural concept that defines the use of services to support the requirements of
software users. In a SOA environment, nodes on a network[1] make resources available
to other participants in the network as independent services that the participants access in
a standardized way. Most definitions of SOA identify the use of Web services (i.e. using
SOAP or REST) in its implementation. However, one can implement SOA using any
service-based technology. The OASIS SOA Reference Model Technical Committee is
working on defining SOA independent of any specific technologies.
Unlike traditional point-to-point architectures, SOAs comprise loosely coupled, highly
interoperable application services. These services interoperate based on a formal
definition independent of the underlying platform and programming language (e.g.,
WSDL). The interface definition encapsulates (hides) the vendor and language-specific
implementation. A SOA is independent of development technology (such as Java and
.NET). The software components become very reusable because the interface is defined
in a standards-compliant manner. So, for example, a C# (C Sharp) service could be used
by a Java application.
SOA provides a methodology and framework for documenting enterprise capabilities and
can support integration and consolidation activities.
SOAP is a protocol for exchanging XML-based messages over a computer network, normally
using HTTP. SOAP forms the foundation layer of the web services stack, providing a
basic messaging framework that more abstract layers can build on. SOAP facilitates the
Service-Oriented architectural pattern.
There are several different types of messaging patterns in SOAP, but by far the most
common is the Remote Procedure Call (RPC) pattern, where one network node (the
client) sends a request message to another node (the server), and the server immediately
sends a response message to the client.
SOAP originally was an acronym for Simple Object Access Protocol, but the acronym
was dropped in Version 1.2 of the SOAP specification. Originally designed by Dave
Winer, Don Box, Bob Atkinson, and Mohsen Al-Ghosein in 1998 with backing from
Microsoft (where Atkinson and Al-Ghosein worked at the time), the SOAP specification
is currently maintained by the XML Protocol Working Group of the World Wide Web
Consortium.
Transport methods
HTTP was chosen as the primary application layer protocol for SOAP since it works well
with today's Internet infrastructure; specifically, SOAP works well with network
firewalls. This is a major advantage over other distributed protocols such as GIOP/IIOP or
DCOM, which are normally filtered by firewalls.
XML was chosen as the standard message format because of its widespread acceptance
by major corporations and open source development efforts. Additionally, a wide variety
of freely available tools significantly ease the transition to a SOAP-based
implementation.
The somewhat lengthy syntax of XML can be both a benefit and a drawback. Its format is
easy for humans to read, but it is verbose and can slow down processing. By
contrast, CORBA, GIOP and DCOM use much shorter, binary message formats. On the
other hand, hardware appliances are available to accelerate processing of XML messages.
Binary XML (the use of the word "XML" is controversial here) is also being explored as
a means for streamlining the throughput requirements of XML.
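The size trade-off described above can be illustrated with a minimal sketch. The record,
its field names and the packed binary layout are assumptions made for illustration; the
layout is not the actual wire format of GIOP or DCOM.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: an order with an integer id and a floating-point price.
order_id, price = 1042, 19.99

# XML encoding: the element names travel with every message.
order = ET.Element("order")
ET.SubElement(order, "id").text = str(order_id)
ET.SubElement(order, "price").text = str(price)
xml_bytes = ET.tostring(order, encoding="utf-8")

# Binary encoding: a fixed layout agreed in advance
# (big-endian 4-byte unsigned int followed by an 8-byte double).
binary_bytes = struct.pack(">Id", order_id, price)

print(len(xml_bytes), "bytes as XML")
print(len(binary_bytes), "bytes as packed binary")
```

The XML form can be read without knowing the layout beforehand, whereas the packed form is
only meaningful to a receiver that already knows the field order and sizes.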
Structure of a SOAP message
A SOAP message is contained in an envelope. Within this envelope are two additional
sections: the header and the body of the message. SOAP messages use XML namespaces.
The header contains relevant information about the message. For example, a header can
contain the date the message is sent, or authentication information. The header is
optional but, if present, must be the first element inside the envelope.
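The envelope, header and body can be sketched with Python's standard library as follows.
The envelope namespace is the one published for SOAP 1.2; the application namespace and the
element names Sent, GetStockPrice and Symbol are hypothetical and serve only as an example.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace, as published by the W3C.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, for illustration only.
APP_NS = "http://example.com/stock"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("app", APP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

# Optional header: when present it is the first child of the envelope.
header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
ET.SubElement(header, f"{{{APP_NS}}}Sent").text = "2006-01-01T12:00:00Z"

# Body: carries the request itself (here an RPC-style call).
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
request = ET.SubElement(body, f"{{{APP_NS}}}GetStockPrice")
ET.SubElement(request, f"{{{APP_NS}}}Symbol").text = "IBM"

message = ET.tostring(envelope, encoding="unicode")
print(message)
```

In an RPC-style exchange, a client would typically send such a message in the body of an
HTTP POST request and receive a response envelope of the same shape.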
SQL (short for Structured Query Language) is the most popular computer language used to
create, modify and retrieve data in relational database management systems. The
language has evolved beyond its original purpose to support object-relational database
management systems. It is an ANSI/ISO standard.
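The three uses named above (create, modify, retrieve) can be sketched with Python's
built-in sqlite3 module. The table and column names are hypothetical, and SQLite is used
here only because it needs no server.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table and add rows.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)

# Modify: change an existing row.
cur.execute("UPDATE customer SET city = ? WHERE name = ?", ("Arlington", "Grace"))
conn.commit()

# Retrieve: query the rows back in a defined order.
rows = cur.execute("SELECT name, city FROM customer ORDER BY name").fetchall()
print(rows)
conn.close()
```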
STP Straight Through Processing (STP) enables the entire trade process for capital markets
and payments transactions to be conducted electronically without the need for re-keying
or manual intervention, subject to legal and regulatory restrictions. The concept has also
been transferred into other asset classes including energy (oil, gas) trading.
Presently, the entire trade lifecycle, from initiation to settlement, is a complex labyrinth
of manual processes, taking several days. STP aims for 'same-day' settlement or faster,
ideally within minutes or even seconds. To minimise settlement risk, the goal is for the
execution of a trade and its settlement and clearing to occur simultaneously. However, for
this to be achieved, multiple market participants must realise high levels of STP. In
particular, transaction data would need to be made available on a just-in-time basis, which
is a considerably harder goal to achieve for the financial services community than the
application of STP alone. After all, STP itself is merely the efficient application of
computer-based technology to transaction processing.
Historically, STP solutions were needed to help financial markets firms meet the move to
one-day trade settlement of equities transactions, as well as to meet the global demand
that had resulted from the explosive growth of online trading. Now the concepts of STP
are applied to reduce systemic and operational risk, improve certainty of settlement,
and minimize operational costs.
When fully realized, STP will provide asset managers, broker/dealers, custodians and
other financial services players with tremendous benefits, including greatly shortened
processing cycles, reduced settlement risk and lower operating costs. Some industry
analysts believe that STP is not an achievable goal, in the sense that firms are unlikely to
find that the benefits justify the cost of reaching 100% automation. Instead they promote the idea of
improving levels of internal STP within a firm while encouraging groups of firms to work
together to improve the quality of the automation of transaction information between
themselves, either bilaterally or as a community of users (external STP).
UML Unified Modeling Language (UML) is a non-proprietary, object modeling and
specification language used in software engineering.
UML is not restricted to modeling software. As a graphical notation, UML can be used
for modeling hardware (engineering systems) and is commonly used for business process
modeling, systems engineering modeling, and representing organizational structure.
UML was designed to be used to specify, visualize, construct, and document the artifacts
of an object-oriented software-intensive system under development. It represents an
integrated compilation of best engineering practices that have proven to be successful in
modeling large, complex systems, especially at the architectural level.
XMI The XML Metadata Interchange (XMI) is an OMG standard for exchanging metadata
information via Extensible Markup Language (XML). It can be used for any metadata
whose metamodel can be expressed in Meta-Object Facility (MOF). The most common
use of XMI is as an interchange format for UML models, although it can also be used for
serialization of models of other languages (metamodels).
In the OMG vision of modeling, data is split into abstract models and concrete models.
The abstract models represent the semantic information, whereas the concrete models
represent visual diagrams. Abstract models are instances of arbitrary MOF-based
modeling languages such as UML. For diagrams, the Diagram Interchange (DI, XMI[DI])
standard is used. At the moment there are severe incompatibilities between different
modeling tool vendors' implementations of XMI, even for the interchange of abstract
model data. The usage of Diagram Interchange is almost nonexistent. Unfortunately, this
means that exchanging files between UML modeling tools using XMI is rarely possible.
XML The Extensible Markup Language (XML) is a W3C-recommended general-purpose
markup language for creating special-purpose markup languages, capable of describing
many different kinds of data. It is a simplified subset of SGML. Its primary purpose is to
facilitate the sharing of data across different systems, particularly systems connected via
the Internet. Languages based on XML (for example, RDF/XML, RSS, MathML,
XHTML, SVG, and cXML) are defined in a formal way, allowing programs to modify
and validate documents in these languages without prior knowledge of their form.
Features of XML
XML provides a text-based means to describe and apply a tree-based structure to
information. At its base level, all information manifests as text, interspersed with markup
that indicates the information's separation into a hierarchy of character data,
container-like elements, and attributes of those elements. In this respect, it is similar to
the LISP programming language's S-expressions, which describe tree structures wherein
each node may have its own property list.
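The hierarchy of elements, attributes and character data can be made visible with a short
sketch that parses a document and walks the resulting tree. The sample document and the
helper function outline are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
doc = """<person id="42">
  <name>Ada</name>
  <birthday><month>12</month><day>10</day></birthday>
</person>"""

def outline(element, depth=0):
    """Return one line per element: tag, attributes, and any character data."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Each printed line corresponds to one node of the tree, indented by its depth, with its
attributes playing the role of the node's property list.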
The fundamental unit in XML is the character, as defined by the Universal Character Set.
Characters are combined in certain allowable combinations to form an XML document.
The document consists of one or more entities, each of which is typically some portion of
the document's characters, encoded as a series of bits and stored in a text file.
The ubiquity of text file authoring software (word processors) facilitates rapid XML
document authoring and maintenance, whereas prior to the advent of XML, there were
very few data description languages that were general-purpose, Internet protocol-friendly,
and very easy to learn and author. In fact, most data interchange formats were proprietary,
special-purpose, "binary" formats (based foremost on bit sequences rather than
characters) that could not be easily shared by different software applications or across
different computing platforms, much less authored and maintained in common text
editors.
By leaving the names, allowable hierarchy, and meanings of the elements and attributes
open and definable by a customizable schema, XML provides a syntactic foundation for
the creation of custom, XML-based markup languages. The general syntax of such
languages is rigid — documents must adhere to the general rules of XML, assuring that
all XML-aware software can at least read (parse) and understand the relative arrangement
of information within them. The schema merely supplements the syntax rules with a set
of constraints. Schemas typically restrict element and attribute names and their allowable
containment hierarchies, such as only allowing an element named 'birthday' to contain one
element named 'month' and one element named 'day', each of which has to contain only
character data. The constraints in a schema may also include data type assignments that
affect how information is processed; for example, the 'month' element's character data
may be defined as being a month according to a particular schema language's
conventions, perhaps meaning that it must not only be formatted a certain way, but also
must not be processed as if it were some other type of data.
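The 'birthday' constraints above can be sketched as a hand-written check. This is not a
schema language or a schema validator, only an illustration of the kind of rules a schema
expresses; the function name check_birthday and the rule that a month is a number from 1 to
12 are assumptions.

```python
import xml.etree.ElementTree as ET

def check_birthday(xml_text):
    """Return a list of constraint violations (empty if the document passes)."""
    try:
        root = ET.fromstring(xml_text)  # general XML syntax rules
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors = []
    if root.tag != "birthday":
        errors.append("root element must be 'birthday'")
    # Containment rule: exactly one 'month' and one 'day', nothing else.
    if sorted(child.tag for child in root) != ["day", "month"]:
        errors.append("'birthday' must contain exactly one 'month' and one 'day'")
    for child in root:
        # Each child may hold only character data, no nested elements.
        if len(child) > 0:
            errors.append(f"'{child.tag}' must contain only character data")
        # Data type rule (assumed convention): 'month' is an integer from 1 to 12.
        if child.tag == "month":
            text = (child.text or "").strip()
            if not (text.isdecimal() and 1 <= int(text) <= 12):
                errors.append("'month' must be a number from 1 to 12")
    return errors

print(check_birthday("<birthday><month>12</month><day>10</day></birthday>"))
print(check_birthday("<birthday><month>13</month></birthday>"))
```

The first step (parsing) corresponds to the general syntax rules that every XML document
must follow; the remaining checks correspond to the additional constraints a schema adds.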
In this way, XML contrasts with HTML, which has an inflexible, single-purpose
vocabulary of elements and attributes that, in general, cannot be repurposed. With XML,
it is much easier to write software that accesses the document's information, since the
data structures are expressed in a formal, relatively simple way.
XML places no restrictions on how it is used. Although XML is fundamentally
text-based, software quickly emerged to abstract it into other, richer formats, largely
through the use of datatype-oriented schemas and object-oriented programming
paradigms (in which the document is manipulated as an object). Such software might
only treat XML as serialized text when it needs to transmit data over a network, and some
software doesn't even do that much. Such uses have led to "binary XML", the relaxed
restrictions of XML 1.1, and other proposals that run counter to XML's original spirit and
have therefore drawn some criticism.
Strengths and weaknesses
Some features of XML that make it well-suited for data transfer are:
* its simultaneously human- and machine-readable format;
* its support for Unicode, allowing almost any information in any human language to
be communicated;
* the ability to represent the most general computer science data structures: records, lists
and trees;
* the self-documenting format that describes structure and field names as well as specific
values;
* the strict syntax and parsing requirements that allow the necessary parsing algorithms to
remain simple, efficient, and consistent.
XML is also heavily used as a format for document storage and processing, both online
and offline, and offers several benefits:
* its robust, logically-verifiable format is based on international standards;
* the hierarchical structure is suitable for most (but not all) types of documents;
* it manifests as plain text files, unencumbered by licenses or restrictions;
* it is platform-independent, thus relatively immune to changes in technology;
* it and its predecessor, SGML, have been in use since 1986, so there is extensive
experience and software available.
For certain applications, XML also has the following weaknesses:
* Its syntax is fairly verbose and partially redundant. This can hurt human readability and
application efficiency, and yields higher storage costs. It can also make XML difficult to
apply in cases where bandwidth is limited, though compression can reduce the problem in
some cases. This is particularly true for multimedia applications on cell phones
and PDAs that use XML to describe images and video.
* Parsers should be designed to recursively handle arbitrarily nested data structures and
must perform additional checks to detect improperly formatted or differently ordered
syntax or data (this is because the markup is descriptive and partially redundant, as noted
above). This causes a significant overhead for most basic uses of XML, particularly
where resources may be scarce, for example in embedded systems. Furthermore,
additional security considerations arise when XML input is fed from untrustworthy
sources, and resource exhaustion or stack overflows are possible.
* Some consider the syntax to contain a number of obscure, unnecessary features born of
its legacy of SGML compatibility. However, an effort to settle on a subset called
"Minimal XML" led to the discovery that there was no consensus on which features were
in fact obscure or unnecessary.
* The basic parsing requirements do not support a very wide array of data types, so
interpretation sometimes involves additional work in order to process the desired data
from a document. For example, there is no provision in XML for mandating that
"3.14159" is a floating-point number rather than a seven-character string. XML schema
languages add this functionality.
* Modeling overlapping (non-hierarchical) data structures requires extra effort.
* Mapping XML to the relational or object oriented paradigms is often cumbersome.
* Some have argued that XML can be used for data storage only if the file is
small, but this is only true given particular assumptions about architecture, data,
implementation, and other issues.
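The data type weakness noted in the list above can be seen in a minimal sketch: a parser
returns "3.14159" as character data, and the application (or a schema-aware layer) has to
decide that it is a number. The element names circle and ratio are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<circle><ratio>3.14159</ratio></circle>")
raw = root.find("ratio").text

# The parser hands back character data only.
print(type(raw).__name__, len(raw))

# The application must convert it explicitly before doing arithmetic.
value = float(raw)
print(value * 2)
```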