MDP on A Spreadsheet and Statistical Software Approach for Business Analysis and Optimization

MDP on A Spreadsheet and Statistical Software Approach for Business Analysis and Optimization
MDP on A Spreadsheet and Statistical Software Approach for Business Analysis and Optimization

Spreadsheet

spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. The term spreadsheet may also refer to one such electronic document.

Spreadsheet users can adjust any stored value and observe the effects on calculated values. This makes the spreadsheet useful for “what-if” analysis since many cases can be rapidly investigated without manual recalculation. Modern spreadsheet software can have multiple interacting sheets and can display data either as text and numerals or in graphical form.

Besides performing basic arithmetic and mathematical functions, modern spreadsheets provide built-in functions for common financial accountancy and statistical operations. Such calculations as net present value or standard deviation can be applied to tabular data with a pre-programmed function in a formula. Spreadsheet programs also provide conditional expressions, functions to convert between text and numbers, and functions that operate on strings of text.

Spreadsheets have replaced paper-based systems throughout the business world. Although they were first developed for accounting or bookkeeping tasks, they now are used extensively in any context where tabular lists are built, sorted, and shared.

Business analysis

Business analysis is a professional discipline focused on identifying business needs and determining solutions to business problems. Solutions may include a software-systems development component, process improvements, or organizational changes, and may involve extensive analysis, strategic planning and policy development. A person dedicated to carrying out these tasks within an organization is called a business analyst or BA.

Business analysts are not found solely within projects for developing software systems. They may also work across the organisation, solving business problems in consultation with business stakeholders. Whilst most of the work that business analysts do today relates to software development / solutions, this is due to the ongoing massive changes businesses all over the world are experiencing in their attempts to digitize.

Although there are different role definitions, depending upon the organization, there does seem to be an area of common ground where most business analysts work. The responsibilities appear to be:

  • To investigate business systems, taking a holistic view of the situation. This may include examining elements of the organisation structures and staff development issues as well as current processes and IT systems.
  • To evaluate actions to improve the operation of a business system. Again, this may require an examination of organisational structure and staff development needs, to ensure that they are in line with any proposed process redesign and IT system development.
  • To document the business requirements for the IT system support using appropriate documentation standards.

In line with this, the core business analyst role could be defined as an internal consultancy role that has the responsibility for investigating business situations, identifying and evaluating options for improving business systems, defining requirements and ensuring the effective use of information systems in meeting the needs of the business.

Software development process

In software engineering, a software development process is a process of dividing software development work into smaller, parallel, or sequential steps or sub-processes to improve design and/or product management. It is also known as a software development life cycle (SDLC). The methodology may include the pre-definition of specific deliverables and artifacts that are created and completed by a project team to develop or maintain an application.

Most modern development processes can be vaguely described as agile. Other methodologies include waterfall, prototyping, iterative and incremental development, spiral development, rapid application development, and extreme programming.

A life-cycle “model” is sometimes considered a more general term for a category of methodologies and a software development “process” a more specific term to refer to a specific process chosen by a specific organization. For example, there are many specific software development processes that fit the spiral life-cycle model. The field is often considered a subset of the systems development life cycle.

Prototyping

Software prototyping is about creating prototypes, i.e. incomplete versions of the software program being developed.

The basic principles are:

  • Prototyping is not a standalone, complete development methodology, but rather an approach to try out particular features in the context of a full methodology (such as incremental, spiral, or rapid application development (RAD)).
  • Attempts to reduce inherent project risk by breaking a project into smaller segments and providing more ease-of-change during the development process.
  • The client is involved throughout the development process, which increases the likelihood of client acceptance of the final implementation.
  • While some prototypes are developed with the expectation that they will be discarded, it is possible in some cases to evolve from prototype to working system.

A basic understanding of the fundamental business problem is necessary to avoid solving the wrong problems, but this is true for all software methodologies.

Free statistical software

Free statistical software is a practical alternative to commercial packages. Many of the free to use programs aim to be similar in function to commercial packages, in that they are general statistical packages that perform a variety of statistical analyses. Many other free to use programs were designed specifically for particular functions, like factor analysis, power analysis in sample size calculations, classification and regression trees, or analysis of missing data.

Many of the free to use packages are fairly easy to learn, using menu systems. Many others are command-driven. Still others are meta-packages or statistical computing environments, which allow the user to code completely new statistical procedures. These packages come from a variety of sources, including governments, universities, and private individuals.

Limitations of packages

Most of the packages have limitations of some sort.

Several of the programs, including Easyreg, Epidata and Instat, do not appear to handle missing data or do not handle it well. While EpiInfo has many statistical procedures, correlation is not one of them. Rather correlation is found by regression. This means that EpiInfo will not produce a single table showing correlations among multiple variables. According to the Zelig installation manual, use of Zelig requires that R and several of its libraries already be installed, and the installation also requires some degree of background in R. One limit of MicrOsiris is in handling the output. When calculations are complete, the output pages through the results, but various menu boxes also appear over the results, and so the results cannot be accessed. The output can be saved, though, as a text file and then used.

One limitation is specific to programs that were developed by individuals. Support for these programs is limited to the time that the author has available. While the authors may, and often do, respond fairly quickly when there are few people asking questions, if too many people ask questions or the author is otherwise busy, support would correspondingly be slower.

R is both written by and used by a large number of people all over the world, and many forums and other internet facilities can be used to get support from other users. While R is powerful, the learning curve can be rather steep for those not already familiar with other kinds of scientific programming.

Analytics

Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns toward effective decision-making. It can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance.

Organizations may apply analytics to business data to describe, predict, and improve business performance. Specifically, areas within analytics include descriptive analytics, diagnostic analytics, predictive analytics, prescriptive analytics, and cognitive analytics. Analytics may apply to a variety of fields such as marketing, management, finance, online systems, information security, and software services. Since analytics can require extensive computation (see big data), the algorithms and software used for analytics harness the most current methods in computer science, statistics, and mathematics. According to IDC, global spending on big data and business analytics (BDA) solutions is estimated to reach $215.7 billion in 2021. As per Gartner, the overall analytic platforms software market grew by $25.5 billion in 2020.

Analytics vs analysis

Data analysis focuses on the process of examining past data through business understanding, data understanding, data preparation, modeling and evaluation, and deployment. It is a subset of data analytics, which takes multiple data analysis processes to focus on why an event happened and what may happen in the future based on the previous data. Data analytics is used to formulate larger organization decisions.

Data analytics is a multidisciplinary field. There is extensive use of computer skills, mathematics, statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data through analytics. There is increasing use of the term advanced analytics, typically used to describe the technical aspects of analytics, especially in the emerging fields such as the use of machine learning techniques like neural networks, decision trees, logistic regression, linear to multiple regression analysis, and classification to do predictive modeling. It also includes unsupervised machine learning techniques like cluster analysis, Principal Component Analysis, segmentation profile analysis and association analysis.

Challenges

In the industry of commercial analytics software, an emphasis has emerged on solving the challenges of analyzing massive, complex data sets, often when such data is in a constant state of change. Such data sets are commonly referred to as big data. Whereas once the problems posed by big data were only found in the scientific community, today big data is a problem for many businesses that operate transactional systems online and, as a result, amass large volumes of data quickly.

The analysis of unstructured data types is another challenge getting attention in the industry. Unstructured data differs from structured data in that its format varies widely and cannot be stored in traditional relational databases without significant effort at data transformation. Sources of unstructured data, such as email, the contents of word processor documents, PDFs, geospatial data, etc., are rapidly becoming a relevant source of business intelligence for businesses, governments and universities. For example, in Britain the discovery that one company was illegally selling fraudulent doctor’s notes in order to assist people in defrauding employers and insurance companies is an opportunity for insurance firms to increase the vigilance of their unstructured data analysis.

These challenges are the current inspiration for much of the innovation in modern analytics information systems, giving birth to relatively new machine analysis concepts such as complex event processing, full text search and analysis, and even new ideas in presentation. One such innovation is the introduction of grid-like architecture in machine analysis, allowing increases in the speed of massively parallel processing by distributing the workload to many computers all with equal access to the complete data set.

Analytics is increasingly used in education, particularly at the district and government office levels. However, the complexity of student performance measures presents challenges when educators try to understand and use analytics to discern patterns in student performance, predict graduation likelihood, improve chances of student success, etc. For example, in a study involving districts known for strong data use, 48% of teachers had difficulty posing questions prompted by data, 36% did not comprehend given data, and 52% incorrectly interpreted data. To combat this, some analytics tools for educators adhere to an over-the-counter data format (embedding labels, supplemental documentation, and a help system, and making key package/display and content decisions) to improve educators’ understanding and use of the analytics being displayed.