Innovation Goals in Software Development for Business Applications

Having been in touch with the technical side of many companies in various sectors, we know that the industry is facing a period of unprecedented change. We have compiled a list of common technological challenges that companies in the sector face in adapting to this change. The purpose of this contribution is to discuss and communicate where the technological challenges in the software development sector lie. We see this kind of inventory as beneficial to the academic community, as it provides an account of industry challenges. Such an analysis would also help players in the sector be better prepared to formalize and document their learning and know-how development. Besides its knowledge management benefits, the analysis would help in taking advantage of research and development funding and in attracting investors; both academic and industrial organizations can take advantage of this aspect. We call this type of analysis “capabilities analysis”. A single company might not solve the world’s technical problems, but being aware of them, and measuring the steps taken to advance, even slightly, towards solving the problems presented in this analysis, would make the company stand out. When data is formulated this way, strong evidence is created that the project goes beyond standard engineering, because it distinguishes risk that can be eliminated through experiment from standard engineering risk.


Introduction
Companies are facing challenges in accessing new technology, delivering process improvements, creating value-added products, and developing new business models. At the R&D group of KPMG, we are always interested in assessing emerging trends in the industry. We work with many major companies in various sectors. We help our clients understand emerging trends, build for the future, and stay ahead of their competitors by taking optimal advantage of available government funding mechanisms.
This contribution is intended to serve as an account of technological uncertainties, challenges, and difficulties in industrial research and development in computing science and software development. Having worked with the R&D side of hundreds of IT product developers, systems integrators, and IT departments within companies, we find it striking how limited the number of common technological challenges in the sector is. Companies are trying to solve similar problems, which clearly define the state of the art in the sector, but in many cases they do not share their solutions with each other because of IP ownership issues. A list of current technological problems in the industry would help both academics and students focus on more industry-relevant projects, which would attract more interest and investment from industry and government granting agencies. Some funding agencies and programs require industrial partners and/or partial funding from industry, and others require a formal expression of interest from industry in the form of a letter of interest (e.g. some NSERC funding mechanisms). The authors would be pleased to put research groups in touch with possible industrial partners. Last but not least, such a list would lead to better absorption of students into the industry job market.
It has been shown that technology projects for companies can be grouped into four categories (see Table 1). Here we focus on three of these areas. There are two categories of approach to solving problems in these challenging areas: 1) approaches that need only routine engineering (such as trial and error) to tackle the problem, and 2) approaches that require moving beyond standard methods and merit research and development. Problems of the second kind are the focus of this contribution.
Stat., Optim. Inf. Comput. Vol. 2, December 2014.

Table 2 below evaluates technical risks that can be mitigated through standard engineering approaches (first category). Where a standard engineering approach is not completely effective in mitigating the specified risk, further non-routine engineering may be necessary; we have noted this by adding the tag "see below" to the mitigation.

Authorization & authentication: general
Transport layer security is a common approach for mitigating authorization and authentication risks. This may be sufficient for simple applications.
Internal configuration details may be hidden by using XML to rewrite URLs and other information exposed to the web (filtering).

Authorization & authentication: denial-of-service attacks
Denial-of-service attacks may be thwarted by using a proxy XML security gateway that checks and limits messages on the basis of connection duration and message size. In addition to schema validation, checks should be made on well-formedness, identity or resource references, protocol (e.g. SOAP) validity, and other aspects of message validity.
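A minimal sketch of the message checks such a gateway applies, using only Python's standard library. The size limit, function name, and rejection strings are hypothetical; schema validation and connection-duration limits are omitted, and a production gateway would use a parser hardened against entity-expansion attacks.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope element name in ElementTree's "{namespace}local" form.
SOAP11_ENVELOPE = "{http://schemas.xmlsoap.org/soap/envelope/}Envelope"
MAX_MESSAGE_BYTES = 64 * 1024  # hypothetical gateway limit; tune per service


def gateway_check(raw: bytes, max_bytes: int = MAX_MESSAGE_BYTES) -> str:
    """Return "accept" or a rejection reason for an incoming message."""
    # 1. Size limit: checked before parsing so oversized payloads cost little.
    if len(raw) > max_bytes:
        return "reject: message too large"
    # 2. Well-formedness: the parser raises ParseError on malformed XML.
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        return "reject: not well-formed"
    # 3. Protocol validity (shallow): the root must be a SOAP 1.1 Envelope.
    if root.tag != SOAP11_ENVELOPE:
        return "reject: not a SOAP 1.1 envelope"
    return "accept"
```

Ordering the cheap size test first is a deliberate choice: the gateway should spend as little work as possible on messages it will reject anyway.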

Message Repudiation
Signing messages prevents undetected modification of content and prevents message repudiation. The transaction history, augmented by synchronizing all network nodes using the Network Time Protocol (NTP), is useful for verifying the sequence of steps comprising a transaction. Although this is effective, it may not be efficient, as the processing requirements may be quite high (see below).
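The ordering and integrity aspect of the transaction history can be sketched with a hash chain, assuming Python's standard library only. This is not a digital signature: genuine non-repudiation needs the sender's private-key signature, and anyone able to rewrite the whole log could recompute every hash. The entry layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(step: str, timestamp: float, prev: str) -> str:
    body = json.dumps({"step": step, "timestamp": timestamp, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: list, step: str, timestamp: float) -> None:
    """Append a transaction step, chaining it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"step": step, "timestamp": timestamp, "prev": prev,
                "hash": _digest(step, timestamp, prev)})


def verify_log(log: list) -> bool:
    """True if no entry was altered, dropped, or reordered and time never runs backwards."""
    prev, last_time = GENESIS, float("-inf")
    for entry in log:
        if entry["prev"] != prev:
            return False  # chain broken: an entry was removed or reordered
        if entry["hash"] != _digest(entry["step"], entry["timestamp"], entry["prev"]):
            return False  # content of this entry was modified
        if entry["timestamp"] < last_time:
            return False  # timestamps assumed to come from NTP-synchronized nodes
        prev, last_time = entry["hash"], entry["timestamp"]
    return True
```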

Eavesdropping
Message encryption protects against eavesdropping. XML encryption computations can be accelerated using either hardware or software XML accelerators; however, the general approach is to parse the XML transaction first, select the portions to encrypt, and then apply a set of XML and cryptographic functions.
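The parse, select, and encrypt steps can be sketched as below. Python's standard library has no symmetric cipher, so the `encrypt` function is injected by the caller (an assumption of this sketch), and the `encrypted` marker attribute is hypothetical. The W3C XML Encryption standard instead replaces the selected element with an `EncryptedData` element; this sketch only shows the general plumbing.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable


def encrypt_selected(xml_text: str, path: str, encrypt: Callable[[bytes], bytes]) -> str:
    """Parse a message, select the elements matching `path`, and replace their
    text with base64 ciphertext produced by the supplied `encrypt` function."""
    root = ET.fromstring(xml_text)                     # 1. parse the transaction
    for elem in root.findall(path):                    # 2. select portions to encrypt
        plaintext = (elem.text or "").encode("utf-8")
        elem.text = base64.b64encode(encrypt(plaintext)).decode("ascii")  # 3. apply crypto
        elem.set("encrypted", "true")                  # hypothetical marker attribute
    return ET.tostring(root, encoding="unicode")
```

For a quick check, a byte-reversing lambda can stand in for `encrypt`; it is a placeholder to exercise the plumbing and provides no secrecy whatsoever.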

General attacks
Practices include the use of Secure Sockets Layer for sensitive messages, limiting connect permissions to specific users or groups, firewalls, explicitly disabling or dropping endpoints, and using endpoint defaults that limit the processing of unexpected messages. Kerberos authentication is one of the recommended practices for securing XML web services.

Revocation of credentials
A method of synchronizing the revocation of credentials among jurisdictions in a federation, one able to handle non-responding jurisdictions and similar errors, has not been developed. Issues exist in the area of distributed synchronization. A method for securely distributing configuration files and federation keys has not been developed.

Revocation of federation keys on withdrawal of a jurisdiction
The browser is the weak spot

Attacks: cross-site scripting and potential theft of cookies
DACS (Distributed Access Control System) relies on cookies. See Table 3.

2.1. Experimental development approaches
Table 3 below outlines current research in the area, including any conclusions and the limitations on those conclusions. Metrics for the problem are given, and a first further research step is outlined. Problems in the second category represent current research in the area. As noted above, the approaches outlined in Table 3 are those that are not amenable to the standard engineering techniques appearing in Table 2.

Current research
Transport layer security is a common approach, and may be sufficient for less complicated uses. The standard practice is to use a multi-tiered security architecture. A web client sends a request to a middle-tier application, which redirects it to a distributed access control system. The middle tier then forwards the request, with the token received, to the web service. Issues exist in three areas: distributed synchronization, revocation of federation keys on withdrawal of a jurisdiction, and weakness at the level of the browser, which opens the system to cross-site scripting and potential theft of cookies. Modifying web services to accept cookies for the purposes of authorization is non-uniform and complex.
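The multi-tier flow just described can be sketched as follows, under our own assumptions: the access control system issues an HMAC-signed, time-limited token, and the web service verifies it. The shared secret, token format, lifetime, and function names are hypothetical and do not reproduce the token format of any particular distributed access control system. In the deployments discussed, such a token would typically travel in a cookie, which is where the browser-level weakness enters.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret shared by the access control system and the web service.
SHARED_SECRET = b"replace-with-a-real-key"
TOKEN_TTL_SECONDS = 300


def issue_token(user: str, now: int) -> str:
    """Access control system: sign 'user|expiry' after authenticating the user."""
    payload = f"{user}|{now + TOKEN_TTL_SECONDS}"
    sig = hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_token(token: str, now: int) -> Optional[str]:
    """Web service: return the user if the token is authentic and unexpired."""
    try:
        user, expiry, sig = token.rsplit("|", 2)
        expires_at = int(expiry)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SHARED_SECRET, f"{user}|{expiry}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected) or now >= expires_at:
        return None
    return user


def web_service(token: str, now: int) -> str:
    user = verify_token(token, now)
    return f"200 OK for {user}" if user else "403 Forbidden"


def middle_tier(user: str, authenticated: bool, now: int) -> str:
    """Redirect to the access control system, then forward the request with its token."""
    if not authenticated:
        return "401 Unauthorized"
    token = issue_token(user, now)
    return web_service(token, now)
```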
Other models of SOA security are detailed in the literature. Huhns et al. discuss the relationship between service-oriented computing and multi-agent systems and present a research agenda for the next 15 years on service-oriented multi-agent systems [1]. The computing environment resembles the grid-based computing environment [2] shown in Figure 1 below. Confidentiality and message integrity can be promoted in more complex environments characterized by more than two parties or by multiple web services.
Messages or portions of messages may be signed and encrypted and tokens may also be added to messages to assert claims, such as those made about the identity of the message sender by a trusted authority [3].
Access control models based on the stacking or delegation of authority (distribution of a capability) have been presented [4]. Delegation is done by passing a WSDL description and creating a new authority by generating a restricted capability from an original capability (stacking an object on an original object). However, an application in the domains of interest with which the writers are familiar has not been demonstrated.
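The idea of generating a restricted capability from an original one can be illustrated with a small attenuation model. This is a generic sketch, not the model of [4]: the `Capability` class, its rights strings, and the resource name are hypothetical, and the WSDL-passing step is not modeled. The key property is that a delegated capability can never hold rights its parent lacks.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Capability:
    """A right to act on a resource; `parent` records the delegation chain."""
    resource: str
    rights: FrozenSet[str]
    parent: Optional["Capability"] = None

    def restrict(self, rights: Iterable[str]) -> "Capability":
        """Delegate a new, weaker capability stacked on this one."""
        requested = frozenset(rights)
        extra = requested - self.rights
        if extra:
            raise PermissionError(f"cannot delegate rights not held: {sorted(extra)}")
        return Capability(self.resource, requested, parent=self)

    def allows(self, right: str) -> bool:
        return right in self.rights

    def depth(self) -> int:
        """Number of delegation steps back to the original capability."""
        return 0 if self.parent is None else 1 + self.parent.depth()
```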

Metrics
The appropriate metrics would be the same as those used to compare code created using one architectural approach with code created using another. These include the number of parameters, exceptions thrown, cyclomatic complexity, lines of code, paths, static invocations, and anonymous classes. When designing distributed web services, three properties are commonly desired: consistency, availability, and partition tolerance. It was conjectured that it is impossible to achieve all three; the authors of the referenced paper proved this conjecture in the asynchronous network model and then discussed solutions involving a relaxation of requirements to a partially synchronous model [5].
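Of the metrics listed, cyclomatic complexity is the least obvious to compute. A simplified McCabe count for Python source can be sketched with the standard `ast` module, assuming a single function per input string: start at 1 and add one for each decision point. Which node types count as decisions is our simplification; established tools differ in details such as comprehensions.

```python
import ast

# Each of these nodes adds one decision point (a simplified McCabe count).
_DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of a single function's source: decisions + 1."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit branches
            complexity += len(node.values) - 1
    return complexity
```

Comparing two architectures then amounts to running the same counter over both code bases and comparing the distributions.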

First test
We will implement a system in which users are attached to user proxies, and resources to resource proxies, via long-lived credentials. Resource proxies will be attached to user proxies and to other resource proxies through short-lived credentials. The resulting architecture will be tested and compared with the architecture of the current distributed access control system.

Creating Secure Operating Systems From Insecure Components
Current research
A new operating system has been proposed that prevents confidential processes from conveying any information to non-confidential processes [6]. Message-passing systems require communication with trusted user-level services, such as file servers, in order to perform data transfer operations, which makes it difficult to enforce one-way information flow.
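The one-way flow requirement, and why relaying through shared services complicates it, can be illustrated with a two-level label check. This is a simplified "no write down" rule chosen for illustration, not the mechanism of the operating system in [6]; the `Level` names and functions are hypothetical.

```python
from enum import IntEnum
from typing import Sequence


class Level(IntEnum):
    PUBLIC = 0
    CONFIDENTIAL = 1


def may_flow(source: Level, destination: Level) -> bool:
    """One-way rule: information may move to an equal or higher level, never down."""
    return source <= destination


def path_allowed(path: Sequence[Level]) -> bool:
    """A transfer relayed through intermediaries is allowed only if every hop is."""
    return all(may_flow(a, b) for a, b in zip(path, path[1:]))
```

A file server labeled public cannot receive confidential data, while one labeled confidential can no longer answer public clients; either way the path from a confidential process through the server to a public process is rejected, which is the difficulty the text describes.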

Application Deployment Automation, Release Automation
Table 4 below evaluates all technical risks that can be mitigated through standard engineering approaches (first category); see Table 5 for the remainder. Table 5 below outlines current research in the area, including any conclusions and the limitations on those conclusions. Metrics for the problem are given, and a first further research step is outlined. Problems in the second category represent current research in the area.

B. ARNOLD AND M. REZA SHADNAM

Current research
Recent literature describes an algorithm for automatic test input generation for database applications and offers an analysis based on a view with two languages or layers: Java programs and the database. It is noted that most enterprise applications are built in several different layers, including JavaScript code, browser forms, and a server such as Tomcat that mediates data flow. A second recent piece of academic work describes the application of automated environment generation to commercial software [7]. Environment generation is found to be a non-trivial task. A test environment should be general enough to subject the module under test to a range of conditions, in combination, yet limited enough that both the development of the test environment and the testing itself can be carried out in a limited amount of time with limited resources. It would not be surprising to find that a portal-type system that funnels goods and payments, including partial deliveries and payments, could generate as many as three billion test states, making it impractical to address each one. Constraints are introduced to promote a scalable solution. The static analyses of a typical test environment generator do not take into account all possible dependencies between a module and its environment, and because of this the testing would not be safe. java.sql has also been used to implement the test bed, with JDBC modeled as empty stubs, a significant limitation on environment generation.
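How quickly such a portal's state space grows, and how much a single constraint helps, can be shown by counting environment states. The three progress levels and the rule that payment never runs ahead of delivery are hypothetical assumptions chosen for illustration, not figures from [7].

```python
from itertools import product

# Hypothetical progress levels for the delivery and the payment of one order.
LEVELS = ("none", "partial", "full")


def consistent(state) -> bool:
    """Hypothetical business constraint: payment never runs ahead of delivery."""
    delivery, payment = state
    return LEVELS.index(payment) <= LEVELS.index(delivery)


def environment_states(n_orders: int, constrained: bool) -> int:
    """Number of joint environment states when n_orders run concurrently."""
    per_order = list(product(LEVELS, repeat=2))      # (delivery, payment) pairs
    if constrained:
        per_order = [s for s in per_order if consistent(s)]
    return len(per_order) ** n_orders
```

With ten concurrent orders the unconstrained count is 9**10, about 3.5 billion; the one constraint reduces it to 6**10, about 60 million, which is still far too many to test exhaustively and shows why further constraints are needed.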

Metrics
A metric would be the number of errors discovered or, in the case of verified code, test coverage.

First test
Developers wanting to compare their current hand-coded test environment with an automatically generated one would measure the number of errors discovered or, in the case of verified code, test coverage. The number of states and transitions, path coverage, and branch coverage would also be measured.
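State and transition coverage can be computed once the module's behavior is written down as a state machine. A minimal sketch, assuming the model is a set of `(state, event, next_state)` triples and that each test run is logged as a list of the transitions it exercised; the checkout model used to exercise it is hypothetical.

```python
from typing import Iterable, Set, Tuple

Transition = Tuple[str, str, str]  # (state, event, next_state)


def coverage_report(model: Set[Transition], traces: Iterable[Iterable[Transition]]) -> dict:
    """State and transition coverage of a state-machine model by executed test traces."""
    exercised = {t for trace in traces for t in trace} & model
    model_states = {s for s, _, n in model} | {n for s, _, n in model}
    visited = {s for s, _, n in exercised} | {n for s, _, n in exercised}
    return {
        "transition_coverage": len(exercised) / len(model),
        "state_coverage": len(visited) / len(model_states),
        "uncovered": sorted(model - exercised),
    }
```

Running the same report over the hand-coded and the generated environments gives directly comparable numbers.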

Current research
This problem is particularly acute in asynchronous systems, notably GUIs and B2B systems. The general reference article [7] alludes to two problems in testing these types of systems: 1) the number of features is large, and 2) the non-sequential nature of a transaction leads to a large number of test scenarios. A third problem lies in the regression testing of GUIs, whereby the input-output mapping does not remain constant over successive versions of the software [8]. The standard approach of generating test cases that model all states becomes difficult when a GUI is the module under test; it is only as feasible as the degree to which the number of states can be limited.
The next article also identifies the complexity of the path leading to a testable state as one of the problems with GUI testing, a consequence of the multiple dialog sequences available concurrently in a GUI-based application. Both the application program and the test set grow when concurrent dialogs resulting from exits from uncompleted functions, and interleaved or asynchronous user events, are involved. Approaches based on a UI Management System (UIMS) and formal process specifications were bypassed because of the need to reverse engineer a UIMS or formal model from an existing system in order to generate tests automatically [9]. The proposed solution comprises i) a way of simulating user inputs and binding the input generated by genetic algorithms to the UI during execution, ii) a processing logic layer to capture the state of the user interface during application execution, and iii) a method that allows the tester to generate and save naïve scripts. Although the ability to use a small number of inputs to generate a large number of naïve test scripts was demonstrated, and this approach simulated the perambulations of the naïve user subjectively better than automated test tools and expert test scripts did, the applications under test were small.
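The genetic-algorithm idea can be illustrated on a toy dialog model: evolve event sequences whose fitness is the number of distinct dialogs they reach. This is a textbook genetic algorithm (truncation selection, one-point crossover, per-gene mutation), not the specific algorithm of [9]; the dialog model, fitness function, and all parameters are hypothetical.

```python
import random
from typing import List

# Hypothetical toy GUI: (dialog, user event) -> next dialog.
TRANSITIONS = {
    ("main", "open_file"): "file_dialog",
    ("file_dialog", "cancel"): "main",
    ("file_dialog", "ok"): "editor",
    ("editor", "save"): "save_dialog",
    ("save_dialog", "ok"): "editor",
    ("editor", "close"): "main",
}
EVENTS = sorted({event for (_, event) in TRANSITIONS})


def states_visited(seq: List[str], start: str = "main") -> int:
    """Fitness: number of distinct dialogs reached by replaying an event sequence."""
    state, seen = start, {start}
    for event in seq:
        state = TRANSITIONS.get((state, event), state)  # inapplicable events are ignored
        seen.add(state)
    return len(seen)


def evolve(length: int = 8, pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.1, seed: int = 0) -> List[str]:
    """Evolve an input script that explores as many dialogs as possible."""
    rng = random.Random(seed)
    pop = [[rng.choice(EVENTS) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=states_visited, reverse=True)
        elite = pop[: pop_size // 2]                      # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]                     # one-point crossover
            child = [rng.choice(EVENTS) if rng.random() < mutation_rate else e
                     for e in child]                      # per-gene mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=states_visited)
```

Because the elite half survives each generation unchanged, the best fitness never decreases; the evolved sequences play the role of the automatically generated naïve scripts.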

Metrics
The metric would be the number of errors discovered or, in the case of verified code, test coverage. As with other test technologies, measures of the number of states and transitions, path coverage, and branch coverage are relevant.

First test
A first test would be to create three test platforms: one hand-coded, and two based on forms of automated test environment, one using formal process specifications or a UIMS and another that requires building a driver into the GUI so that commands or events can be sent to the software from another program [10]. The metric will be the number of errors discovered or, in the case of verified code, test coverage. We will also measure the number of states and transitions, path coverage, and branch coverage.
Test Tools for Service Oriented Architectures
Current research
Most testing tools are incapable of building composite, interdependent tests across technology platforms, languages, and systems. Testing may be considered part of the development of service-oriented architectures; however, if system testing means end-to-end testing, it is made difficult by the distribution of components, the diversity of implementation platforms, the high number of system states, and deployment on distributed platforms, some of which may not even be available at the time of testing [11]. Some researchers have found that although tools are available to perform testing at multiple levels, and to test for qualities such as availability, performance, and security, the tools they analyzed cannot perform composite tests across technologies. The test tools the authors are aware of presuppose control over all components of the service-oriented system and do not contemplate falling back to the interfaces. They suggest overcoming the limitation of having access only to interfaces through the use of gray-box testing, and this is the subject of current research.
Limitations of current research
As noted above, the tools that have been considered cannot perform composite tests across technologies, and they assume as a precondition control over all constituent parts of the system under test, not just the interfaces to those parts. Current research is being carried out on the applicability of gray-box testing to the problem of testing SOAs, namely the simulation of service-oriented system environments and practices for exception handling. There are further research avenues to follow: dynamic testing in distributed, heterogeneous environments; service certification and the possibility of service repositories that provide test cases for services; and test-aware interfaces.

Metrics
As with the other areas of test platform research, the metric would be the number of errors discovered through testing compared with the number found "in the wild" (in operation) or, in the case of verified code, test coverage. As with other test technologies, measures of the number of states and transitions, path coverage, and branch coverage are relevant.

First test
SOA test platform developers wanting to compare their test environment with one using techniques such as gray-box testing would measure the number of errors discovered through testing compared with the number found "in the wild" (in operation). The number of states and transitions, path coverage, and branch coverage would also be measured.

Data Warehouse/Big Data Design: Data Quality
Table 6 below evaluates all technical risks that can be mitigated through standard engineering approaches (first category).

Data quality
Need to decompose the data subjects into data entities comprising facts and multiple dimensions.
Dimensional (logical) model (cube); entity-relationship model, star schemas, snowflake schemas, fact-constellation schemas, persistent multidimensional stores, summary tables. The challenge, given a set of databases that are already decomposed into data entities, is to transpose this model into a star schema arrangement of facts and dimensions without losing the meaning of the data.
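The transposition into facts and dimensions can be sketched for the simplest case: flat rows whose dimension columns are replaced by surrogate keys. Real dimension tables carry attributes and hierarchies, whereas here each dimension holds only its value; the column names in the example are hypothetical. The inverse function checks that the meaning of the data survives the round trip.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def to_star_schema(rows: List[Row], dims: List[str],
                   measures: List[str]) -> Tuple[List[Row], Dict[str, Dict[object, int]]]:
    """Split flat rows into a fact table plus one dimension table per dimension column."""
    dimensions: Dict[str, Dict[object, int]] = {d: {} for d in dims}  # value -> surrogate key
    facts = []
    for row in rows:
        fact: Row = {}
        for d in dims:
            keys = dimensions[d]
            fact[f"{d}_key"] = keys.setdefault(row[d], len(keys) + 1)  # new values get the next key
        for m in measures:
            fact[m] = row[m]
        facts.append(fact)
    return facts, dimensions


def from_star_schema(facts: List[Row], dimensions: Dict[str, Dict[object, int]],
                     measures: List[str]) -> List[Row]:
    """Rebuild the flat rows, showing that no meaning was lost in the transposition."""
    inverse = {d: {k: v for v, k in table.items()} for d, table in dimensions.items()}
    return [
        {**{d: inverse[d][f[f"{d}_key"]] for d in dimensions}, **{m: f[m] for m in measures}}
        for f in facts
    ]
```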
Another risk is having multiple sources of the same data.
Mitigation is through analysis to find the best source, and in the longer term, build synchronization functions or work towards consolidation to a single "source of truth" database.
Missing or incorrect data leads to errors in information derived from data
Profiling to discover anomalies, validity assessment by measuring conformance with business rules, and accuracy assessment through sampling.

Missing or incorrect data leads to errors in information derived from data
Assign trust factors to each data source and include a "decay factor": as data gets older, its trust factor declines ("Sanlam gets a new look on life with customer data").

Table 7 below outlines current research in the area, including any conclusions and the limitations on those conclusions. Metrics for the problem are given, and a first further research step is outlined. Problems in the second category represent current research in the area.
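The trust-factor mitigation from Table 6 can be made concrete with a decay model. The source does not say how the decay factor behaves; exponential decay with a configurable half-life is our assumption, as are the source names and trust values in the example. The same function also supports the "find the best source" analysis for duplicated data.

```python
import math
from typing import Dict, Tuple


def decayed_trust(base_trust: float, age_days: float, half_life_days: float) -> float:
    """Exponential decay (assumed model): the trust factor halves every `half_life_days`."""
    return base_trust * math.exp(-math.log(2) * age_days / half_life_days)


def best_source(sources: Dict[str, Tuple[float, float]], half_life_days: float = 30.0) -> str:
    """Pick the source whose (base_trust, age_days) gives the highest current trust."""
    return max(sources, key=lambda name: decayed_trust(*sources[name], half_life_days))
```

With a 30-day half-life, a highly trusted but 60-day-old record (0.9 decays to 0.225) loses to a less trusted but 5-day-old one (0.7 decays to about 0.62).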

Table 7. Experimental development approaches
Data Quality

Current research
Statistical process control (SPC) can be used for early identification of data anomalies: an automated statistical control framework could be developed to detect when the process variables, or in this case the inputs to the second stage of the data warehouse, are out of control. Although limits on the data can define bands of routine variation for both the individual values and the moving ranges, the choice of which processes to monitor, the monitoring frequency, the natural batch, and which variables to observe have not been studied much in practice [10]. Some authors feel that no tools for data mining exist, or that the existing tools are not effective, and propose a rethinking of methodologies, models, and techniques, together with a set of requirements for the technology implementing the data flow process [11]. The main components introduced are described as: an integrator that integrates, in a coordinated fashion, data from operational databases, from the DW, and from other data streams; a repository capable of storing short-term data for quick retrieval, for the purpose of rule application and mining; a module that computes hierarchies of indicators to feed dashboards and reports; tools for extracting patterns out of the data streams; and a module that monitors events and sends messages to users. Data latency, i.e. the interval between an event and the completion of the process that transforms the data coming from databases and from input data streams, is a process variable that we seek to minimize. This might be achieved through dynamic integration techniques, such as query rewriting on heterogeneous sources, which have been reported as implemented in prototypes; however, the authors of one paper conjecture that most of the cleaning techniques devised (purge/merge and duplicate detection) rely on a materialized integrated level.
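The bands of routine variation for individual values and moving ranges correspond to an individuals and moving-range (XmR) control chart. A minimal sketch, assuming an in-control baseline of some monitored input (for example, hypothetical daily row counts arriving at the second stage of the warehouse); 2.66 and 3.267 are the standard XmR chart constants. The open questions named above, such as which variables to monitor and at what frequency, remain.

```python
from statistics import mean
from typing import Dict, List, Sequence


def xmr_limits(baseline: Sequence[float]) -> Dict[str, float]:
    """Individuals and moving-range (XmR) chart limits from an in-control baseline."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    center = mean(baseline)
    return {
        "center": center,
        "lcl": center - 2.66 * mr_bar,   # 2.66 = 3 / d2, with d2 = 1.128 for ranges of 2
        "ucl": center + 2.66 * mr_bar,
        "mr_ucl": 3.267 * mr_bar,        # D4 for subgroups of size 2
    }


def out_of_control(values: Sequence[float], limits: Dict[str, float]) -> List[int]:
    """Indices of individual values outside the natural process limits."""
    return [i for i, v in enumerate(values) if v < limits["lcl"] or v > limits["ucl"]]
```

For the baseline 100, 102, 98, 101, 99, 100, the average moving range is 2.4, so the limits are 100 ± 6.384; a new batch value of 120 is flagged while 97 is not.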

Limitations of current research
As stated above, although reduction in data latency might be achieved through the techniques of dynamic integration, it is conjectured that most of the cleaning techniques devised (purge/merge and duplicate detection) rely on a materialized integrated level.
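One classical in-memory merge/purge technique is the sorted-neighborhood method: sort records by a key and compare only records that fall within a small sliding window. It is our choice of illustration, not necessarily the technique of the cited work; the similarity measure (`difflib` ratio), window size, threshold, and record layout are assumptions. The function also returns its elapsed time, since latency is the quantity of interest.

```python
import time
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def sorted_neighborhood(records: List[Dict[str, str]], key: Callable[[Dict[str, str]], str],
                        window: int = 3,
                        threshold: float = 0.85) -> Tuple[Set[Tuple[int, int]], float]:
    """In-memory duplicate detection: sort by a key, compare only records within a
    sliding window, and return (duplicate index pairs, elapsed seconds)."""
    start = time.perf_counter()
    keys = [key(r) for r in records]
    order = sorted(range(len(records)), key=lambda i: keys[i])
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:   # the next window-1 neighbors
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return pairs, time.perf_counter() - start
```

Only the detection step is shown; the purge/merge step would then collapse each group of matched records into a single record.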
Notable events are not limited to patterns that may be detected in short-term information. Events of interest include time-dependent patterns that deviate from the norm; such patterns by definition arise over time and may only be detected by relying on some historical data. Work on high-performance time series mining has been carried out; however, the problem of storing data for fast retrieval arises. Since data will be accessed in different ways by different modules concurrently, straightforward buffering techniques will not work in this context.

Metrics
Measures for ETL technologies, specifically data cleaning, are therefore the latency of the cleaning process and the depth of the sample in which patterns that deviate from the norm can be detected.

First test
A next step would be to prototype in-memory merge/purge and duplicate detection, measuring the latency of the process. A technique for buffering the input data stream to a certain depth could also be prototyped.

ETL - Web Services, Real-time Data

Current research
One paper suggests that web services might be used to populate a data warehouse containing the real-time data of an electric power company [16]. This work touches on data transformation and flow (ETL), with an update strategy based on message queues and XML.
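An update strategy based on a message queue and XML can be sketched in-process with Python's standard library. The message schema (`reading` with `meter`, `ts`, and `kwh` attributes), the in-process `queue.Queue`, and the upsert rule are assumptions for illustration; the cited work's actual schema and queue technology are not described here, and a production system would use a message broker.

```python
import queue
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

FactKey = Tuple[str, str]  # (meter id, timestamp)


def publish(q: "queue.Queue[str]", readings: Iterable[Tuple[str, str, float]]) -> None:
    """Source side: wrap each real-time reading in a small XML message."""
    for meter, ts, kwh in readings:
        elem = ET.Element("reading", meter=meter, ts=ts, kwh=str(kwh))
        q.put(ET.tostring(elem, encoding="unicode"))


def load(q: "queue.Queue[str]", facts: Dict[FactKey, float]) -> Dict[FactKey, float]:
    """Warehouse side: drain the queue, transform each message, and upsert the fact."""
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return facts
        elem = ET.fromstring(message)
        facts[(elem.get("meter"), elem.get("ts"))] = float(elem.get("kwh"))  # later readings win
        q.task_done()
```

Decoupling the producer from the loader through the queue is what allows the warehouse to absorb bursts of real-time data without blocking the source systems.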

Limitations
Although it seems that strategies capable of performing ETL on real-time data would be superior, the message from this research is that the traditional methods (above) are increasingly inadequate as the volume and immediacy of data increase. It is not clear whether, and how, these strategies could be implemented in the domain of interest.

Metrics
The number of edge cases (non-linear behavior) associated with a specific domain.

Current research
Applications of data mining are ubiquitous in the textbooks, and the metrics vary with each application. For example, many papers explore the classification problem, a major sub-field of data mining, through the use of mathematical-programming-based methods. The paper we chose to reference offers a good comparison of methods, including LDA, decision trees, SVMLight, and LibSVM, against its proposed Multi-criteria Convex Quadratic Programming (MCCQP) model. Although the experimental results indicate that the proposed MCCQP model achieves classification accuracies as good as or better than those of the other methods, the work does not address the objective of the related work we also reviewed, namely the development of a learning system that allows the non-confidential aspects of multiple data sets to be shared. Problems of this nature are likely to become more common in the future.
Statistical process control (SPC) can be used for early identification of data anomalies; an automated statistical control framework could be developed at the beginning of the information chain to monitor the inputs.

Limitations
Although the limits on the data can define bands of routine variation for both the individual values and the moving ranges, the choice of which processes to monitor, the monitoring frequency, the natural batch, and which variables to observe have not been studied much [17].

Metrics
Metrics in one of the following areas will be chosen, depending on the most likely use of the dataset: 1) predictive accuracy, 2) classification accuracy, 3) rank accuracy, and 4) non-accuracy measures [18].

First test
Datasets used for classification, such as the medical datasets Appendicitis, Breast Cancer (Wisconsin), and Heart Disease (Cleveland), and other datasets such as Ionosphere, Satellite Image (Statlog version), Sonar, Telugu, and Vowel, will be used to compare classification methods, broadening the work above. Metrics will be chosen as above.

Data Warehouse/Big Data Design: Analytics
Table 8 below evaluates all technical risks that can be mitigated through standard engineering approaches (first category).
Statistical process control (SPC) can be used for early identification of data anomalies; an automated statistical control framework could be developed at the beginning of the information chain to monitor the inputs [10]. Although the limits on the data can define bands of routine variation for both the individual values and the moving ranges, the choice of which processes to monitor, the monitoring frequency, the natural batch, and which variables to observe have not been studied much.
Golfarelli et al. [11] characterize the BPM solutions proposed by software vendors as classical OLAP tools combined with specialized ETL and data integration systems, and cite two examples [12], [13]. They go on to say that such solutions, or combinations of them, will not solve the problem.
They suggest a complete functional framework for synthesizing Business Intelligence (BI), one that combines the functionality of Data Warehousing (DW) with Business Activity Monitoring (BAM) [14]. The main components introduced by BAM are:
• Right-Time Integrator (RTI), which integrates data at the right time from all sources: operational databases, the DW, and data streams;
• Dynamic Data Store (DDS), a short-term data store that supports rule inference;
• KPI manager, which computes the variables that feed dashboards and reports;
• mining tools, which extract relevant patterns out of the data streams;
• rule engine, which monitors the events raised by the RTI or mining tools and sends messages to users.
The authors claim that although BAM reduces data latency through the provision of the above functionality, chiefly the RTI, the adoption of dynamic techniques raises
Figure 1. Computational Grid Security Architecture

Table 3. Experimental development approaches

Table 4. Standard engineering approaches

Table 5. Experimental development approaches

Table 6. Standard engineering approaches

Table 8. Standard engineering approaches

Table 9. Experimental development approaches: Analytics