A classical systems and
software engineering approach is recommended to assure the development of a
management information system that is fully responsive to a client's
performance objectives and resource constraints. This approach includes the
following major components:
o Systems analysis, which
includes information needs assessment, requirements analysis, and requirements
specification
o Systems design, which
includes synthesis of alternatives, cost-effectiveness analysis of
alternatives, specification of criteria for selecting a preferred alternative,
selection of a preferred alternative, top-level design, and detailed design
o Systems implementation,
which includes forms development, specification of data collection and entry
procedures, development of editing and quality control procedures, software
coding and testing, development of training materials and training, integration
of the software components with other system components (e.g., personnel,
communications, data transfer and assembly, report preparation and
distribution, feedback), and system-level testing
o Systems operation and
support, which includes not only routine operating procedures but also
provision for on-going system financing and management, quality control,
software maintenance and updating, personnel training, and system maintenance
and improvement (including periodic review of system performance and diagnosis
and correction of problems).

While the preceding system development phases are
completed in sequence, there is some time overlap between them. The following
paragraphs discuss aspects of each of the above major components. Our approach
to management information system design is based on the modern software/systems
engineering discipline, which consists of structured analysis and structured
design (top-down design). (See the list of references for several books on the
modern systems and software engineering discipline.) The first step in an MIS
development task is the development of an MIS management plan, which
describes the major tasks
and schedule of work for the MIS activity.
Systems Analysis
Systems analysis includes
a review of the present information system to assess its capabilities and
shortcomings; specification of system goals, objectives, and constraints; a
survey of potential system users to assess their information needs;
identification and analysis of alternative system concepts; specification of a
system concept; and system requirements analysis and specification. This phase
includes an analysis of major system functions and the development of a system
architecture (identification of the major system components and their
interrelationships). Heavy emphasis is placed on end-user requirements. It is
essential to involve the end-user in the system requirements activity, to
ensure the development of a system that is fully responsive to the user's needs.
The review of the current system and survey of potential users can be done by a
variety of means, including review of documentation, site visits, questionnaire
surveys, interviews, and focus-group discussions.
Systems Design
The systems design phase
is generally broken into two subphases, top-level design and detailed design.
Top-level design consists of the identification of the major system components
and their functions. In order to specify the top-level design, a number of
alternative system design concepts are synthesized and evaluated in terms of a
variety of selection criteria, which include cost (implementation, operation
and maintenance), performance, satisfaction of requirements, development risk,
flexibility for expansion/upgrading, and political acceptability. The important
aspect of top-level design is to present several feasible solutions to the
system managers and users, to describe their advantages and disadvantages, and
to obtain a consensus on a preferred design concept. An example of a design
decision is the decision concerning which functions should be implemented using
computers and which should be manual (e.g., should data collected at a regional
level and needed at a central level be transmitted via the Internet (e.g.,
virtual private network or e-mail) or hand-carried on a memory stick).
Detailed design consists
of specifying all of the system components and functions in detail. In the
detailed design phase, decisions are made concerning what data elements are to
be collected, how they are to be coded, how frequently they are to be
collected, and at what levels of detail they are to be aggregated. A critical
design decision concerns the "units of analysis" -- the item on which
the data are to be measured, such as an individual, a household, a school, a
clinic, a farm, a village, or a region. The decision on the unit of analysis
has a significant impact on both the cost of the system operation (especially
the data collection burden) and on the flexibility of ad-hoc reporting. This
design decision is particularly important. While it is an easy matter to revise
a data entry screen or report format, it is not possible to produce a desired
report about a particular type of unit if data on that unit are not included in
the data base. For example, if it is desired to produce
a report about the
frequency distribution of villages by some characteristic, village-level data
must be included in the data base (or capable of being constructed by
aggregation of lower-level units). If it is desired to produce a frequency
distribution of facilities by size, facility data must be included. If it is
desired to produce distributions of families with particular characteristics,
data on families must be included in the data base. It may not be practical to
include data in the database on all of the members of a population. If a
population is very large (e.g., families, households), consideration should be
given to the use of sampling procedures for data collection and reporting. A
major advantage of most management information systems, however, is that they
include the total population of interest, so that sample-based statistical
inference is not required to analyze the data.

For a software subsystem, the structured analysis / structured design approach
involves the use of techniques such as data flow
diagrams, functional decompositions, and structure charts. Since we
recommend making heavy use
of fourth-generation database management software, the amount of detail
depicted in the detailed software design is generally minimal.
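The frequency-distribution reports discussed above depend on the chosen unit of analysis being stored, or constructible by aggregating lower-level units. A minimal sketch in Python (the record layout, village identifiers, and the "has piped water" rule are hypothetical, for illustration only):

```python
from collections import Counter

# Hypothetical household-level records; each carries the key of the
# village it belongs to, so village-level figures can be constructed
# by aggregation even though no village table is stored directly.
households = [
    {"village_id": "V1", "has_piped_water": True},
    {"village_id": "V1", "has_piped_water": False},
    {"village_id": "V2", "has_piped_water": False},
    {"village_id": "V2", "has_piped_water": False},
    {"village_id": "V3", "has_piped_water": True},
]

# Aggregate households up to the village level: here a village "has
# piped water" if any of its households does (an illustrative rule).
village_has_water = {}
for h in households:
    vid = h["village_id"]
    village_has_water[vid] = village_has_water.get(vid, False) or h["has_piped_water"]

# Frequency distribution of villages by the derived characteristic.
distribution = Counter(village_has_water.values())
print(distribution)  # Counter({True: 2, False: 1})
```

Note that the reverse is impossible: if only village totals had been stored, no household-level report could ever be produced, which is why the unit-of-analysis decision is so consequential.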
The detailed design phase
also identifies the initial reports to be produced by the system (reporting
levels, frequency, content, and format). With fourth-generation database software
it is an easy matter to change reports or develop new reports, so the
specification of the output reports is not critical (since it will almost
surely change over time). The amount of effort expended in detailed software
design for management information systems is often not very great, for two
reasons. First, through the use of fourth-generation software it is relatively
easy to implement modifications of report content and format (as long as the
necessary data are available). Second, it is recommended to adopt a
rapid-prototyping approach to the
software development.
Fourth-generation languages are ideally suited to this approach, which consists
of developing an initial version of the software, testing it, modifying it, and
then producing a second, improved, version. This iterative process is repeated
one or more times until a desired version is obtained. With the
rapid-prototyping approach, the "design" is continually evolving, and
a minimum amount of effort is expended in documenting each of the prototypes.
The system design phase specifies what computer equipment is to be used.
Because of the very high computing power (fast speed, large memory, long
word-length) of current-day
microcomputers, the
large capacity of hard drives, the
tremendous variety and capabilities of available application software, and the
reasonable cost of hardware and software, current microcomputer-based systems
will be able to accomplish many of the desired system requirements at
acceptable cost and level of complexity. Because of the large diversity of
choice, however, and because the acquisition and training costs are not
negligible, it is necessary to carefully consider the alternatives and make a
good selection. Experience with a wide variety of software and hardware is a
valuable asset in guiding the hardware/software selection process. The
significant processing capabilities of microcomputers make them appropriate
candidates for many practical MIS applications. Major categories of software
involved in a microcomputer-based
MIS are database
software, spreadsheet / presentation graphics, and statistical analysis packages. Depending on the system size and
number and location of users, networking may be a useful option. Larger applications
may exceed the processing capabilities of microcomputer-based systems, in which
case larger (e.g., minicomputer-based) systems may be appropriate.
The following paragraphs
mention some practical aspects of system design. System Size. Modern
microcomputers are so powerful that most MIS applications can be done using
commercial off-the-shelf (COTS) microcomputers (e.g., those using the Microsoft
Windows operating system). Previously, a major constraint on system size was
the computer word length.
For example,
microcomputers using a 32-bit word length could express signed
integers
only as large as
2,147,483,647, and the maximum file size was similarly limited (two gigabytes).
Scientific computers used
64-bit word lengths back in the 1960s (e.g., Control Data Corporation
computers), whereas
commercial computers (e.g., IBM, Sun) have been very slow to use large
word lengths. Now that
desktop microcomputers have moved to 64-bit word lengths, the word
length is no longer the
constraint that it once was. A desktop computer is now sufficiently powerful
to handle almost all MIS
applications, from the point of view of data precision, file size, and
processing power. About
the only reason for using a large database management
system such as Oracle is
its advanced features, such as security. Smaller systems such as Microsoft
Access or SQL Server can
accommodate most applications – they are just as fast as the larger
systems and far less
expensive. A major factor affecting database design is the organizational
scale of the application –
whether it is a “desktop” application or an “enterprise” application.
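The 32-bit limit mentioned above can be demonstrated with Python's standard `struct` module, which packs values into fixed-width binary fields (a sketch, not tied to any particular DBMS):

```python
import struct

INT32_MAX = 2**31 - 1  # 2,147,483,647: largest signed 32-bit integer

# A signed 32-bit field ("<i") can hold INT32_MAX...
packed = struct.unpack("<i", struct.pack("<i", INT32_MAX))[0]
assert packed == 2_147_483_647

# ...but one more overflows the field, just as a 32-bit record
# counter or byte offset overflows at two gigabytes.
try:
    struct.pack("<i", INT32_MAX + 1)
    overflowed = False
except struct.error:
    overflowed = True
print(overflowed)  # True
```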
Data Models. When someone
speaks of a “database” today, he is almost always referring to one
based on a “relational”
model. There are several types of database models, including indexed-sequential,
network, and relational.
For general-purpose applications, such as management
information systems, the
relational model is almost always used. Relational databases are based
on a mathematical
framework (“relational calculus”) that makes it easy to maintain data integrity
and perform ad-hoc
queries. The only serious drawback of relational databases is that they are
not very fast. They are
generally not good choices for large-scale, real-time transaction
processing systems, such
as an airline reservation system. For a manager sitting at his desk, the
few seconds required to
execute a query or generate a report is not a problem, but for a massive
system with thousands of
concurrent users, the system simply would not function.
In order to use a relational
database system, such as Microsoft Access or Oracle, it is necessary
to structure the data in a
certain form. This process is called “normalization.” There are several
degrees of normalization,
but the one most commonly used is called “third normal form.” The data
are stored in a set of
tables. The table rows are generally called “records,” and the table columns
are generally called
“fields.” It is necessary that the entities stored in the database have unique
identifiers, called
“keys.” For a database to be in first normal form, each field is “indivisible”
– it
contains a unique type of
information, and there are no “repeating groups” of information. For
example, one field would
not contain a person’s complete address (street address plus city);
the address would be split into two fields, one containing the street address
and another containing the
city. Furthermore, a record would not contain multiple fields
representing successive
product sales – each sale would be stored in a “Sales” table, identified by
an appropriate key
(linking it to other data tables). For a database to be in second normal form,
each table must have a
unique identifier, or primary key, that is comprised of one or more fields in
the table. In the simplest
case, the key is a single field that uniquely identifies each record of the
table. (When the value of
a primary key of a table is stored in another table, it is called a
“foreign key.”) The
non-key fields provide information only about the entity defined by the key
and the subject of the
table, and not about entities in other tables. For example, data about the
population of the city in
which a person lives would not be included in a table of house addresses,
because city population is
an attribute of cities, not of house addresses. For a database to be in
third normal form, each
non-key field must be functionally dependent on the key, that is, the values
of the key completely
determine the values of the non-key fields. Said another way, each non-key
field is relevant to the
record identified by the key and completely describes the record, relative to
the topic of the table.
Most commercial relational database systems require that the database be
in third normal form. In
some cases, to improve efficiency, a database may be denormalized (e.g.,
by storing an element of
data in two fields), but this should rarely be done (and code should be
included in the database
to ensure that data integrity is maintained (e.g., in this example by
updating the redundant
data on a regular basis, and making sure that all queries and reports use
the latest data)).
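The normalization rules described above can be sketched with Python's built-in sqlite3 module; the table and field names below are illustrative, not taken from any particular MIS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Third-normal-form sketch: street address and city are separate,
# indivisible fields (first normal form); each table has a primary
# key (second normal form); city population lives in Cities, not in
# Persons, because it is an attribute of the city, not the person
# (third normal form). "city_id" in Persons is a foreign key.
conn.executescript("""
CREATE TABLE Cities (
    city_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    population INTEGER
);
CREATE TABLE Persons (
    person_id      INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    street_address TEXT,
    city_id        INTEGER REFERENCES Cities(city_id)
);
-- Repeating sale fields inside Persons would violate first normal
-- form; each sale is its own row, linked back by a foreign key.
CREATE TABLE Sales (
    sale_id   INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES Persons(person_id),
    amount    REAL
);
""")

conn.execute("INSERT INTO Cities VALUES (1, 'Cairo', 9500000)")
conn.execute("INSERT INTO Persons VALUES (1, 'A. Smith', '5 Nile St', 1)")
conn.execute("INSERT INTO Sales VALUES (1, 1, 120.0), (2, 1, 80.0)")

# City population is reached through the key, never duplicated
# per person, so updating it in one place keeps the data consistent.
row = conn.execute("""
    SELECT c.population, COUNT(s.sale_id)
    FROM Persons p JOIN Cities c ON p.city_id = c.city_id
                   LEFT JOIN Sales s ON s.person_id = p.person_id
    GROUP BY p.person_id
""").fetchone()
print(row)  # (9500000, 2)
```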
Selection of Software. For
most applications, the choice is between using a very expensive
system, such as Oracle, or
a very inexpensive one, such as Microsoft Access or SQL Server.
Another option is to use
open-source (free) software, such as MySQL.
For most applications, the
Microsoft Access database system is a good choice. Microsoft
introduced its Access
database management system in the early 1990s, and introduced major upgrades in
2000 and 2003. The 2000
version was prone to fail (catastrophic “freezes” with no way to repair
the system but to go back
to a much earlier version) for larger applications, but the 2003 version
was quite stable. The
newer version introduced in 2007 is very similar to the 2003
version. Until recently,
Microsoft included Access as one of the modules of its Office suite (with
Word word processor, Excel
electronic spreadsheet, PowerPoint presentation slides, and Outlook
e-mail system). The cost
of the entire suite was only a few hundred dollars, so the cost of the
Access system was
essentially zero (since few purchasers used this module, and purchased the
suite for the other
programs). Recently, Microsoft has “unbundled” the Access system, so that it
must be purchased
separately in some instances. Even so, the cost is just a few hundred dollars,
for a standalone or
small-network system.
Recently, Microsoft has
decided to discontinue further development of Access, with the goal of
shifting users to the SQL
Server. A free version of SQL Server (SQL Server Express
Edition) is available, but
the cost of the full product is much higher than Access. This is a very
unfortunate move for
consumers, given the tremendous power and ease-of-use of Access. After
ten years of development
(from its introduction in 1997), it had evolved to a reliable, easy-to-use
system, and it is a shame
to see Microsoft withdrawing support for it. SQL Server is not a good
substitute for Access. It
may be more powerful in some respects, but it is much less easy to use
(and more expensive).
Although open-source
software is free to acquire, there are other costs involved. There are two
major drawbacks associated
with it. First, the major open-source database system is MySQL, and
it is a very “primitive”
system. It is programmed using the SQL language, and does not contain the
useful database
development tools that Microsoft and other proprietary systems contain. It is
hard
to use. The second
drawback is that the pool of personnel who are familiar with open-source
software such as MySQL is
much smaller than the pool of personnel who are familiar with
Microsoft systems such as
Access. In a developing-country context, this can present serious
problems. A firm will
typically have a more difficult time recruiting information-technology staff
who
are familiar with
open-source systems than with Microsoft systems. This situation will result in
higher staffing and
training costs.
Query Design. The major
factor affecting database performance is the quality of the design of the
queries (procedures used
to retrieve data). In a relational database system, queries are
implemented using the SQL
programming language. Systems such as Access have automated
tools for constructing
queries, called “query builders.” If a query is not properly constructed, the
time to retrieve the data
could be many hours instead of a few seconds. I was once asked to
consult on a project in
Egypt, where the performance of a database had declined over the two
years since its
installation to the point where routine queries that had once taken seconds to
do
now took hours. The
problem was that the developer had tested the system on a small data set,
and had not examined the
system performance for large (simulated) data sets. As the set
increased over the years
(as data were added to the database), the time required to retrieve data
increased to completely
unacceptable levels. I had a similar experience in Malawi, where the
simple process of entering
data (done by a few data-entry clerks) was bringing the system (a
state-of-the art dual
processor computer and an Informix application) to its knees.
There are two things that
must be done, in order to keep query times fast. The first is to create
“indexes” for keys that are
used in the processing of data. The second is to perform all “selects”
and “aggregates” before
performing any table “joins.” (A “select” is the process of selecting
records from a table; an
“aggregation” is the process of aggregating data in a table, such as
calculating totals or
means of subsets; a “join” is the process of combining data from different
tables, which are related
by keys.) The process of joining tables can be very time-consuming for
large tables. In an
uncritical approach to constructing a query, the user may construct a single
SQL statement that
includes the selection, aggregation and joining. For large tables, such queries
can require much time,
even if the keys are indexed. It is important to break the query into a
series of simpler queries,
such that the join is made on as small a table as possible. Most queries
do not involve accessing
all of the data in the database. If indexing is done and if selections and
aggregations are performed
before joins, the tables to be joined are often small, and the query is
very fast. Expensive
database systems such as Oracle make a big deal out of “optimizing” SQL
statements, and they
charge massive amounts of money for such systems. This is wasted money.
In all cases, an
analytically minded database user can break a query into a series of simpler
queries and accomplish the
same level of performance.
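The advice above (index the keys used in joins; select and aggregate before joining) can be sketched with sqlite3. The tables and figures below are hypothetical, chosen only to show the query structure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facilities (facility_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE Visits (visit_id INTEGER PRIMARY KEY,
                     facility_id INTEGER, year INTEGER, patients INTEGER);
-- Index the key used in the join, as recommended above.
CREATE INDEX idx_visits_facility ON Visits(facility_id);
""")

conn.executemany("INSERT INTO Facilities VALUES (?, ?)",
                 [(1, "North"), (2, "North"), (3, "South")])
conn.executemany("INSERT INTO Visits VALUES (?, ?, ?, ?)",
                 [(1, 1, 2006, 100), (2, 1, 2007, 150),
                  (3, 2, 2007, 80), (4, 3, 2007, 60)])

# Select (year = 2007) and aggregate (totals per facility) first, in
# a subquery, so the join runs over the small aggregated table
# rather than over every raw visit record.
rows = conn.execute("""
    SELECT f.region, SUM(v.total)
    FROM (SELECT facility_id, SUM(patients) AS total
          FROM Visits
          WHERE year = 2007
          GROUP BY facility_id) AS v
    JOIN Facilities f ON f.facility_id = v.facility_id
    GROUP BY f.region
    ORDER BY f.region
""").fetchall()
print(rows)  # [('North', 230), ('South', 60)]
```

With millions of visit records, the subquery shrinks the join input to one row per facility; a single flat query that joined first and filtered afterward would touch every row of the raw table.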
There are many other
factors to be considered in conducting the detailed design of a management
information system, such
as whether data should be archived year by year or included in a single
large database (to
facilitate time series analysis); whether the data should be entered at a
central
facility or on-line via
many distributed workstations (e.g., in different districts); procedures to be
used for primary data
collection (e.g., questionnaires); data-cleaning routines; and security.
Systems Implementation
Systems implementation
consists of developing all of the system components -- data collection forms;
data collection, transfer and processing procedures; data entry procedures and
screens (including on-line edit checking); software; report forms; report
distribution; quality control procedures. As mentioned, we recommend the use of
an iterative, rapid-prototyping approach to the software implementation. It is
highly recommended to field-test major systems in a single geographic area
before going full scale. This field testing involves not only software, but all
aspects of the system (e.g., data collection procedures, training, quality
control). The importance of allowing for prototyping and field testing cannot
be overstated. A key problem faced in developing countries is data integrity.
From previous experience, we know that users become much more interested in
data integrity after seeing the data they reported in a printout. The system
procedures will allow for regular feedback of data to the source levels, for
review and correction. In a developing country, a considerable amount of time
must be allowed for implementation. Data that are taken for granted in a highly
developed country often do not exist in a developing country. For example, in a
recent MIS development effort in Egypt, no correct, complete
list of settlements existed. Since
settlement-level statistics were desired (e.g., percentage of settlements
without piped water), it was necessary to construct a settlement list before
the data collection effort could begin. Because of the many uses to which an
MIS may be applied, all of the collected data may not be included in a single
file. Instead, there may be a variety of data files, involving a variety of
units of analysis. For example, there may be national-level, regional-level,
and local-level files; facility files, personnel files, program files and
client files. Data from these files may be aggregated and combined as desired,
as long as there is a "key" (identifier) linking the file records. An
essential element in linking geographically-oriented data files is the
availability of a geographic code, or "geocode." With a suitably
defined geocode (containing subparts referring to the political subdivisions),
it is possible to perform a wide variety of geographic-oriented analyses. Many
useful analyses can be readily performed with a standard database management system.
Many more can be efficiently conducted if a microcomputer-based geographic
information system (GIS) is available. The decision to require the use of
geocodes is an example of a system design decision that has a profound effect
on the kind of questions that can be answered (i.e., the type of ad-hoc reports
that can be generated) after the system implementation effort is completed.
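A hierarchical geocode of the kind described above might encode region, district, and village in fixed-width subparts. A minimal parsing sketch in Python (the seven-digit layout and the codes themselves are hypothetical):

```python
from collections import Counter

# Hypothetical geocode layout "RRDDVVV": first 2 digits = region,
# next 2 = district, last 3 = village.
def parse_geocode(code: str) -> dict:
    return {"region": code[:2], "district": code[2:4], "village": code[4:7]}

facilities = ["0103007", "0103012", "0211001"]

# Because the subparts nest, aggregation to any political level is
# just a matter of truncating the code.
per_district = Counter(code[:4] for code in facilities)
print(parse_geocode("0103007"))
print(per_district)  # Counter({'0103': 2, '0211': 1})
```

This nesting is what makes geographic roll-ups (facility to district to region) possible without any lookup tables, which is why mandating a well-structured geocode at design time pays off later in ad-hoc reporting.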
System Operation and Support
System support includes
all of the resources required to operate, maintain, and improve the system. A
crucial aspect of the system operation phase is the fact that the MIS design
team will, at some point, be ending its involvement in the MIS. At that point,
it is important that the client have the personnel, know-how, and resources to
continue operation of the system, maintain it, and improve it, without further
assistance from the contractor. Because of the known eventual departure of the
system development contractor, it is necessary to develop a system that is easy
to use and improve, and that will not collapse with the departure of a key
government person. To accomplish this goal, training materials and procedures
will be developed that can be used to assure the continued operation of the
system in the future.