This report analyzes the answers from a survey conducted through an online questionnaire in the period from October 2022 until December 2022. The aim of the survey was to empirically validate elements of a previously created quality model for cloud-native application architectures. More specifically, the quality model consists of so-called factors which represent architectural characteristics of software systems and so-called quality aspects which represent higher-level quality attributes mainly adopted from the ISO 25010 standard. Factors and quality aspects are connected through impacts which describe how the presence of a factor in a software architecture impacts the respective quality aspect. Through this, a hierarchical quality model can be created. The survey focused on these impacts with the goals of:
In the following, the setup of the survey is described in section 2 and some general statistics on how the survey was answered by participants are reported in section 3 together with demographic information in section 4. The main results of the survey regarding the validation of the quality model are presented in section 5 and a conclusion with a short discussion of these results is given in section 6.
Because the quality model is intended to cover the breadth of the topic of cloud-native applications and considers multiple quality aspects in combination, it has a large number of factors, currently 76, which are arranged in a hierarchical graph with the quality aspects they impact at the top. Validating all of these factors in a survey is challenging; therefore we:
However, this still led to 45 factors for the survey, together with 24 quality aspects which can be impacted. To enable participants to state any impact without being biased, all potential impacts, that is 45 × 24 = 1080 combinations, need to be considered, leading to the problem of how to design such a survey.
To tackle this, we developed a custom survey frontend which aims to make the rating of impacts from factors on quality aspects simple and intuitive. At the core, a participant is presented with a factor explained by a short description and the quality aspects grouped by their high-level quality aspects. By clicking on a quality aspect, it is possible to rate the impact from that factor on that quality aspect. Therefore, in each question a participant has to consider only one factor. Because the question page is structurally the same for each factor, the participant should get used to the format reasonably quickly.
In addition, we decided that participants could freely choose for how many factors they wanted to provide an answer. In an ideal case, each participant would provide an answer for each of the 45 factors, but the time and effort one is willing to invest in such a survey is limited, which limits the number of factors for which a participant is willing to provide an answer. Then again, there are differences between participants: some interested participants might be willing to provide more answers, while others are not and might even abort the survey if the number of questions is too high. Thus, instead of asking for a random number of factors, we designed the survey to be flexible so that each participant could choose the factors to answer based on their topics of interest and their available time. Also, for each factor it was possible to choose any number of quality aspects impacted by this factor.
The intended target audience for the survey was IT professionals who have practical experience with implementing and deploying web applications on cloud infrastructures. This covers a rather broad range of professionals, which is why we also distributed the survey broadly. To ask for participation, we:
On the welcome page for the survey we stated the purpose of the survey and who would be eligible to participate so that interested persons could decide for themselves whether they wanted to fill out the survey. During the period in which the survey was available online, we received 42 complete submissions. The resulting data from these submissions as well as this report can be found online: https://github.com/r0light/qmsurvey-results
Because of the flexible survey design, it is interesting to analyze how participants made use of this flexibility, that is, for how many factors each participant provided an answer and how many quality aspects they selected to rate for each factor. This can be seen in Figure 3.1. The left boxplot shows that the median number of factors answered per participant is 5.5, which is quite low compared to the total of 45 factors. This illustrates the difficulty of validating larger quality models using such an online questionnaire. Nevertheless, there were also participants who provided answers to significantly more factors, which is helpful and would not have been possible with a non-flexible survey design in which the number of factors to answer was fixed beforehand. The right boxplot shows for how many quality aspects ratings were stated across all factors. For a better understanding of this data, it has to be noted that for each quality aspect participants could specify the impact based on the following scheme:
The impact type no impact was selected as a default for each quality aspect if no other impact was explicitly stated. The boxplot on the right of Figure 3.1 shows, per factor, for how many of the 24 quality aspects a participant stated impacts explicitly. In the instructions of the survey we told participants that we would expect “typically between one and five” quality aspects for which impacts are stated. As the median of 3 shows, most participants stayed within this range, but there are numerous outliers where participants stated impacts for many quality aspects. In one case, a participant stated impacts for all 24 quality aspects, which might be the result of a misunderstanding of how to fill out the survey.
In addition, it is interesting to analyze which factors were selected by participants. Figure 3.2 shows for each factor by how many participants it was selected to state impacts on quality aspects. While it is difficult to interpret why certain factors were chosen more often than others, it can be seen that there are significant differences between the factors.
Potential reasons might be that certain factors, such as Service replication, are easier to understand, while others, such as Communication partner abstraction, are more abstract and therefore more difficult to evaluate. The number of times answers have been provided for a factor will also be important for the interpretation of the results in section 5, because the results are only interpretable with respect to the set of participants who provided an answer, which is not necessarily representative of the whole set of participants. Additionally, only a larger number of answers leads to more confidence in the interpretation of the results.
Finally, the survey tool also captured a timestamp each time a factor was answered by a participant. This way, we can to some extent evaluate how much time participants spent thinking about factors. However, an exact evaluation of such answer times would only be possible in a controlled experiment, while with the online distribution of the survey we do not know how participants filled out the survey. For example, it was also possible to start the survey and continue at a different time, because the progress was stored locally in the participant’s browser. Therefore, some captured answer times showed values such as 188262 or 10861 seconds, which are the result of such behaviour and therefore need to be excluded. As a countermeasure, we first excluded obviously erroneous answer times greater than 3600 seconds and then, based on the remaining data points, removed outliers greater than twice the standard deviation. The remaining data is plotted in Figure 3.3. The median is at 46 seconds, so slightly below one minute. The lower quartile is at 28 seconds and the upper quartile is at 82 seconds. The measured times seem reasonable, although there is also a significant number of outliers which might be the result of factors that are difficult to answer. This data could be useful for planning future surveys with similar types of questions.
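As a minimal sketch of this two-step filtering in R (the answer time values below are illustrative, not the actual survey data):

```r
# Two-step filtering of captured answer times (in seconds); values are illustrative
answerTimes <- c(35, 46, 82, 28, 120, 51, 300, 44, 10861, 188262)

# Step 1: exclude obviously erroneous values greater than one hour
plausible <- answerTimes[answerTimes <= 3600]

# Step 2: remove remaining outliers greater than twice the standard deviation
filtered <- plausible[plausible <= 2 * sd(plausible)]

summary(filtered)  # inspect the resulting distribution (cf. Figure 3.3)
```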
In addition to the main part of the survey, we also asked some final demographic questions, including the primary area to which participants would assign their current job, their current job title, as well as the years of experience they have with software development in general on the one hand and with cloud platforms on the other hand. It has to be noted that answering these questions was optional and only 34 participants reported demographic data at least partially. Nevertheless, Figure 4.1 shows that nearly half of the participants work in industry, in contrast to the other half stemming from academia, which provides a diversified field of participants. And as Figure 4.2 shows, the participants also have significant experience. An expected, but nevertheless interesting, observation is that the experience with cloud computing is consistently lower than the general software development experience, due to the relative novelty of the technology.
The main results of our survey are the aggregated ratings stated by the participants per factor-quality aspect combination. Overall, there is a quite broad distribution of impacts stated by participants. For 629 out of the 1080 potential factor-quality aspect combinations there was at least one impact explicitly stated by participants.
To illustrate what the results for these combinations look like, some exemplary combinations with the aggregated data from the survey are shown in Figures 6-11.
The graphs in these figures show, for a single combination, how many times each possible impact rating has been stated by participants. As already mentioned, the impact type 0 was set as a default for all ratings. This means that there is a certain probability that a quality aspect was not considered for a factor by a participant, simply because he or she prioritized other quality aspects, but the rating was nevertheless captured as 0.
Based on these combinations, we want to derive which impacts exist between the different factors and quality aspects. As stated in the introduction, we want to use these results to validate the initially stated impacts of the quality model for cloud-native application architectures as well as to include additional impacts. To prepare the analysis, we calculated the following measures for each factor-quality aspect combination:
- weightedMean (numeric): The mean value of the ratings weighted by the number of times they have been stated. To enable this calculation, we assigned values to the impact types in the following way: -- is interpreted as the value -2, - as the value -1, 0 as the value 0, + as the value +1, and ++ as the value +2.
- aboveThreshold (logical): Is TRUE if the total number of ratings provided for that factor-quality aspect combination is at least the threshold we defined to show some significance. For the analysis we set this threshold to 5.
- pValue (numeric): The probability of the observed distribution under the null hypothesis. See the following explanation.

To have an indicator for the significance of a result, we used the exact multinomial test of goodness-of-fit, which is suitable when there are multiple values of one nominal variable (the different types of impact) and the sample size is small. Additionally, the individual observations (answers) need to be independent, which we can assume because participants did not see answers from other participants and presumably filled out the survey alone. For this test we need to formulate a theoretically expected distribution which represents the null hypothesis. We can then calculate the probability of obtaining the actually observed distribution under the null hypothesis to assess whether the null hypothesis can be rejected. As the distribution for the null hypothesis, we assume that the impact of a factor on a quality aspect is undecidable: all possible ratings are equally likely, with the exception of the rating no impact (0). We consider the rating 0 to be twice as likely as the other ratings, because it is selected as the default if a participant does not explicitly state a different rating. For the ratings --:-:0:+:++, we thus assume a ratio of 1:1:2:1:1 under the null hypothesis. For each factor-quality aspect combination, we can therefore calculate the p-value using this test, which we added as pValue to each combination. Again, it has to be noted that these values depend on the number of participants who decided to provide an answer for the respective factor and are therefore not representative of the sample as a whole.
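To make these measures concrete, the following R sketch computes them for a single factor-quality aspect combination from its rating counts. The rating counts and the helper function name are illustrative; the exact multinomial test is implemented here by enumerating all possible outcomes, but any equivalent implementation could be used.

```r
# Rating counts for one factor-quality aspect combination: (--, -, 0, +, ++); illustrative values
counts <- c(0, 1, 5, 2, 4)
values <- c(-2, -1, 0, 1, 2)  # numeric interpretation of the impact types

# weightedMean: mean of the ratings weighted by how often they were stated
weightedMean <- sum(counts * values) / sum(counts)

# aboveThreshold: at least 5 ratings were provided for this combination
aboveThreshold <- sum(counts) >= 5

# Null hypothesis: ratings --:-:0:+:++ occur in the ratio 1:1:2:1:1
nullProbs <- c(1, 1, 2, 1, 1) / 6

# Exact multinomial goodness-of-fit test: sum the probabilities of all possible
# outcomes that are at most as likely as the observed one under the null hypothesis
exactMultinomialTest <- function(observed, probs) {
  n <- sum(observed)
  k <- length(observed)
  grid <- expand.grid(rep(list(0:n), k - 1))   # all count vectors for the first k-1 categories
  grid$last <- n - rowSums(grid)               # remaining counts for the last category
  grid <- grid[grid$last >= 0, ]               # keep only vectors that sum to n
  pObserved <- dmultinom(observed, prob = probs)
  pAll <- apply(grid, 1, function(x) dmultinom(as.numeric(x), prob = probs))
  sum(pAll[pAll <= pObserved + 1e-12])
}

pValue <- exactMultinomialTest(counts, nullProbs)
```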
In section 5.3, we describe how we validated the initially stated impacts from the quality model using these measures, and in section 5.4 we explore additionally stated impacts in order to integrate them into the quality model.
Exemplary factor-quality aspect combinations with the results from the survey to illustrate the data.
To assess whether the impacts stated in our initial quality model are confirmed and therefore validated by the survey, we mapped the survey results for each factor-quality aspect combination to the corresponding hypothesized impact from the initial quality model. For each combination, we then used the following R function to decide whether the hypothesized impact can be validated or not:
validateImpact <- function(statedImpact, weightedMean,
                           aboveThreshold, pValue) {
  pThreshold <- 0.1
  validationResult <- ""
  if (weightedMean < 0.5 & weightedMean > -0.5) {
    # No clear trend in the ratings: the hypothesized impact is (potentially) invalid
    validationResult <- if (aboveThreshold & pValue < pThreshold) "✗" else "(✗)"
  } else {
    # A clear trend exists: compare its direction with the hypothesized impact
    impactTrend <- if (weightedMean > 0) "positive" else "negative"
    if (aboveThreshold & pValue < pThreshold) {
      validationResult <- if (impactTrend == statedImpact) "✓" else "⇄"
    } else {
      validationResult <- if (impactTrend == statedImpact) "(✓)" else "(⇄)"
    }
  }
  return(validationResult)
}
The function receives the following input parameters:
- statedImpact (character): The impact stated in the initial quality model; it is either positive or negative.
- weightedMean, aboveThreshold, and pValue: the measures for the respective factor-quality aspect combination as described in section 5.1.

The function returns a character value describing the validation result.
In essence, an impact is considered valid if the weightedMean value has an absolute value of at least 0.5 and the same sign as expected based on the hypothesized impact. Additionally, we used a significance level of α = 0.1. If the p-value is not below α or the number of answers for that factor is below the set threshold, results are marked as only probable, using brackets. The p-value additionally works as an indicator for how probable a result is. Finally, it is also possible that the analysis of the results shows an impact inverse to the hypothesized one. The possible outcomes are also listed in Table 1.
Result | Explanation |
---|---|
✓ | The impact is valid. |
(✓) | The impact is potentially valid. |
✗ | The impact is invalid. |
(✗) | The impact is potentially invalid. |
⇄ | The impact is inverse. |
(⇄) | The impact is potentially inverse. |
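To illustrate how the function can be applied, the following sketch assumes the aggregated survey results are available as a data frame with one row per hypothesized impact and columns matching the measures from section 5.1; the data frame, its column names, and the example values are illustrative.

```r
# Illustrative application of validateImpact to aggregated survey results
hypothesizedImpacts <- data.frame(
  factor = c("Automated restarts", "Communication partner abstraction"),
  qualityAspect = c("Recoverability", "Analyzability"),
  statedImpact = c("positive", "negative"),
  weightedMean = c(1.5, 0.0),
  aboveThreshold = c(TRUE, FALSE),
  pValue = c(0.0007, 0.44),
  stringsAsFactors = FALSE
)

# Apply the validation function row-wise to obtain the validation result per impact
hypothesizedImpacts$validation <- mapply(
  validateImpact,
  hypothesizedImpacts$statedImpact,
  hypothesizedImpacts$weightedMean,
  hypothesizedImpacts$aboveThreshold,
  hypothesizedImpacts$pValue
)
```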
In the following, we have grouped the outcomes by the individual factors for a better overview and have further separated the factors based on the validation results.
Table 5.1 includes all factors for which at least one hypothesized impact could be confirmed. Table 5.2 lists the remaining factors with at least potentially valid impacts, and finally Table 5.3 lists those factors for which no hypothesized impacts could be validated or where the result is unclear. Especially the controversial results, such as the impact from Resource limits on Resource utilization in Table 5.3, indicate that these combinations need further investigation. Possibly, the descriptions provided for these factors and quality aspects can be interpreted in different ways, or the factors cover different characteristics which impact quality aspects differently. In any case, these factors need to be revised, either by describing more clearly which characteristics they cover or by replacing them with separate factors that cover individual characteristics with differing impacts on quality aspects.
Overall, several impacts stated in the initial quality model can be validated with the survey and for many there is at least a tendency towards their validation. By tendency we mean that for these impacts individual ratings supported the initially stated impacts, but there were simply not enough answers to rate the impact as validated based on our evaluation.
However, many impacts also could not be validated and might need to be removed from the quality model. One reason why impacts that were stated in the initial quality model could now not be validated might be that we did not consider mediating factors in the survey. Mediating factors are those factors in a hierarchical quality model which impact quality aspects, but are also impacted by other factors. In a sense they “explain” the relationship between lower-level factors and higher-level quality aspects. But when they are missing, an indirect impact from a lower-level factor on a quality aspect might be less obvious. Furthermore, especially the factors in Table 5.3 consider more abstract characteristics which might be less intuitive to understand and therefore more difficult to rate. Nevertheless, it can make the quality model simpler and therefore better understandable if impacts that could not be validated are removed.
Factor | Quality Aspect | Hypothesized | -- | - | 0 | + | ++ | Impact | Validation | p-Value |
---|---|---|---|---|---|---|---|---|---|---|
Authentication delegation | Authenticity | positive | 0 | 0 | 0 | 2 | 3 | ++ | ✓ | 0.0288066 |
Automated infrastructure | Modifiability | positive | 0 | 2 | 12 | 1 | 5 | + | ✗ | 0.0224132 |
Automated infrastructure | Recoverability | positive | 0 | 0 | 7 | 3 | 10 | + | ✓ | 0.0005842 |
Automated restarts | Recoverability | positive | 0 | 0 | 2 | 2 | 8 | ++ | ✓ | 0.0006659 |
Built-in autoscaling | Elasticity | positive | 0 | 0 | 3 | 2 | 12 | ++ | ✓ | 0.0000051 |
Built-in autoscaling | Resource utilization | positive | 0 | 0 | 3 | 3 | 11 | + | ✓ | 0.0000271 |
Cloud vendor abstraction | Adaptability | positive | 0 | 0 | 4 | 1 | 6 | + | ✓ | 0.0175511 |
Consistent centralized metrics | Analyzability | positive | 0 | 0 | 1 | 3 | 3 | + | ✓ | 0.0559842 |
Consistent centralized metrics | Recoverability | positive | 0 | 0 | 6 | 0 | 1 | + | (✗) | 0.1163980 |
Distributed tracing of invocations | Analyzability | positive | 0 | 0 | 1 | 2 | 5 | ++ | ✓ | 0.0121694 |
Distributed tracing of invocations | Recoverability | positive | 0 | 0 | 7 | 0 | 1 | + | ✗ | 0.0422811 |
Health and readiness Checks | Analyzability | positive | 0 | 0 | 5 | 1 | 3 | + | (✓) | 0.2329247 |
Health and readiness Checks | Recoverability | positive | 0 | 0 | 5 | 0 | 4 | + | ✓ | 0.0414333 |
Infrastructure abstraction | Adaptability | positive | 0 | 0 | 2 | 4 | 2 | + | ✓ | 0.0969603 |
Infrastructure abstraction | Modifiability | positive | 0 | 0 | 5 | 2 | 1 | + | (✓) | 0.3182442 |
Infrastructure abstraction | Recoverability | positive | 0 | 0 | 6 | 2 | 0 | + | (✗) | 0.1054955 |
Managed infrastructure | Resource utilization | positive | 0 | 0 | 7 | 4 | 3 | + | ✓ | 0.0765035 |
Managed infrastructure | Simplicity | positive | 0 | 0 | 11 | 1 | 2 | + | ✗ | 0.0092780 |
Physical service distribution | Availability | positive | 0 | 0 | 1 | 1 | 8 | ++ | ✓ | 0.0001016 |
Service replication | Time-behaviour | positive | 0 | 0 | 4 | 4 | 6 | + | ✓ | 0.0140215 |
Use infrastructure as code | Modifiability | positive | 0 | 0 | 12 | 3 | 1 | + | ✗ | 0.0047745 |
Use infrastructure as code | Recoverability | positive | 0 | 0 | 8 | 3 | 5 | + | ✓ | 0.0378335 |
Use infrastructure as code | Installability | positive | 0 | 0 | 8 | 1 | 7 | + | ✓ | 0.0039382 |
Vertical data replication | Time-behaviour | positive | 0 | 0 | 1 | 0 | 4 | ++ | ✓ | 0.0288066 |
Factor | Quality Aspect | Hypothesized | -- | - | 0 | + | ++ | Impact | Validation | p-Value |
---|---|---|---|---|---|---|---|---|---|---|
Access control management consistency | Integrity | positive | 0 | 0 | 4 | 4 | 2 | + | (✓) | 0.1548783 |
Account separation | Accountability | positive | 0 | 0 | 4 | 0 | 3 | + | (✓) | 0.1243999 |
API-based communication | Interoperability | positive | 0 | 0 | 6 | 1 | 3 | + | (✓) | 0.1655474 |
API-based communication | Testability | positive | 0 | 0 | 4 | 4 | 2 | + | (✓) | 0.1548783 |
Circuit breaked communication | Fault tolerance | positive | 0 | 0 | 3 | 1 | 4 | + | (✓) | 0.1455047 |
Configuration stored in specialized services | Adaptability | positive | 0 | 0 | 1 | 0 | 1 | + | (✓) | 1.0000000 |
Configuration stored in specialized services | Availability | positive | 0 | 1 | 1 | 0 | 0 | - | (⇄) | 1.0000000 |
Consistent centralized logging | Analyzability | positive | 0 | 0 | 1 | 2 | 3 | + | (✓) | 0.1184842 |
Consistent centralized logging | Recoverability | positive | 0 | 0 | 5 | 0 | 1 | + | (✗) | 0.1718107 |
Dynamic scheduling | Modifiability | positive | 0 | 0 | 6 | 0 | 0 | 0 | ✗ | 0.0696159 |
Dynamic scheduling | Recoverability | positive | 0 | 0 | 6 | 0 | 0 | 0 | ✗ | 0.0696159 |
Dynamic scheduling | Resource utilization | positive | 0 | 0 | 2 | 1 | 3 | + | (✓) | 0.3158436 |
Immutable artifacts | Replaceability | positive | 0 | 0 | 3 | 0 | 1 | + | (✓) | 0.6296296 |
Limited data scope | Modularity | positive | 0 | 0 | 2 | 2 | 1 | + | (✓) | 0.7222222 |
Logical grouping | Modifiability | positive | 0 | 0 | 3 | 0 | 1 | + | (✓) | 0.6296296 |
Logical grouping | Co-existence | positive | 0 | 0 | 4 | 0 | 0 | 0 | (✗) | 0.2160494 |
Managed backing services | Resource utilization | positive | 0 | 0 | 7 | 2 | 0 | + | ✗ | 0.0432623 |
Managed backing services | Simplicity | positive | 0 | 0 | 4 | 3 | 2 | + | (✓) | 0.3169439 |
Mostly stateless services | Modularity | positive | 0 | 0 | 5 | 1 | 3 | + | (✓) | 0.2329247 |
Mostly stateless services | Replaceability | positive | 0 | 0 | 6 | 2 | 1 | + | (✗) | 0.2329247 |
Mostly stateless services | Elasticity | positive | 0 | 0 | 4 | 1 | 4 | + | (✓) | 0.1325017 |
Physical data distribution | Availability | positive | 0 | 0 | 2 | 1 | 4 | + | (✓) | 0.1099966 |
Retries for safe invocations | Fault tolerance | positive | 0 | 0 | 1 | 2 | 2 | + | (✓) | 0.2695473 |
Rolling upgrades enabled | Availability | positive | 0 | 0 | 0 | 0 | 3 | ++ | (✓) | 0.0185185 |
Secrets stored in specialized services | Confidentiality | positive | 0 | 0 | 2 | 2 | 2 | + | (✓) | 0.4547325 |
Secrets stored in specialized services | Availability | positive | 1 | 1 | 4 | 0 | 0 | - | (⇄) | 0.6399177 |
Separation by gateways | Reusability | positive | 0 | 0 | 2 | 0 | 0 | 0 | (✗) | 1.0000000 |
Separation by gateways | Availability | positive | 0 | 0 | 0 | 2 | 0 | + | (✓) | 0.1111111 |
Separation by gateways | Integrity | positive | 0 | 0 | 2 | 0 | 0 | 0 | (✗) | 1.0000000 |
Sharded data store replication | Time-behaviour | positive | 0 | 0 | 3 | 0 | 1 | + | (✓) | 0.6296296 |
Factor | Quality Aspect | Hypothesized | -- | - | 0 | + | ++ | Impact | Validation | p-Value |
---|---|---|---|---|---|---|---|---|---|---|
Addressing abstraction | Interoperability | positive | 0 | 0 | 5 | 0 | 1 | + | (✗) | 0.1718107 |
Addressing abstraction | Modularity | positive | 0 | 0 | 6 | 0 | 0 | 0 | ✗ | 0.0696159 |
Asynchronous communication | Modularity | positive | 0 | 0 | 5 | 1 | 1 | + | (✗) | 0.3458505 |
Backing service decentralization | Modifiability | positive | 0 | 0 | 5 | 0 | 1 | + | (✗) | 0.1718107 |
Backing service decentralization | Co-existence | positive | 0 | 0 | 6 | 0 | 0 | 0 | ✗ | 0.0696159 |
Command Query Responsibility Segregation | Modularity | positive | 0 | 0 | 5 | 2 | 0 | + | (✗) | 0.1430041 |
Communication partner abstraction | Interoperability | positive | 0 | 0 | 3 | 0 | 0 | 0 | (✗) | 0.4444444 |
Communication partner abstraction | Modularity | positive | 0 | 0 | 2 | 1 | 0 | + | (✗) | 1.0000000 |
Communication partner abstraction | Analyzability | negative | 0 | 0 | 3 | 0 | 0 | 0 | (✗) | 0.4444444 |
Contract-based links | Interoperability | positive | 0 | 0 | 5 | 0 | 0 | 0 | (✗) | 0.1100823 |
Contract-based links | Testability | positive | 0 | 0 | 5 | 0 | 0 | 0 | (✗) | 0.1100823 |
Horizontal data replication | Time-behaviour | positive | 0 | 0 | 7 | 0 | 1 | + | ✗ | 0.0422811 |
Limited request trace scope | Modifiability | positive | 0 | 0 | 6 | 0 | 0 | 0 | ✗ | 0.0696159 |
Limited request trace scope | Co-existence | positive | 0 | 0 | 5 | 1 | 0 | + | (✗) | 0.1718107 |
Mediated communication | Interoperability | positive | 0 | 0 | 4 | 1 | 0 | + | (✗) | 0.3518519 |
Mediated communication | Modularity | positive | 0 | 0 | 4 | 1 | 0 | + | (✗) | 0.3518519 |
Persistent communication | Modularity | positive | 0 | 0 | 4 | 1 | 0 | + | (✗) | 0.3518519 |
Persistent communication | Recoverability | positive | 0 | 0 | 3 | 2 | 0 | + | (✗) | 0.3518519 |
Resource limits | Resource utilization | positive | 2 | 2 | 2 | 3 | 2 | + | (✗) | 0.7243259 |
Specialized stateful services | Modularity | positive | 0 | 0 | 2 | 0 | 0 | 0 | (✗) | 1.0000000 |
Specialized stateful services | Replaceability | positive | 0 | 0 | 2 | 0 | 0 | 0 | (✗) | 1.0000000 |
Specialized stateful services | Elasticity | positive | 1 | 0 | 1 | 0 | 0 | - | (⇄) | 1.0000000 |
Usage of existing solutions for non-core capabilities | Reusability | positive | 0 | 0 | 4 | 3 | 0 | + | (✗) | 0.1243999 |
Next, we analyzed the survey results for impacts that were not previously considered in the quality model. As written before, there is a broad range of factor-quality aspect combinations for which ratings were stated by the participants. The challenge therefore is to filter out the significant ones which can potentially be included in the quality model.
To determine the type and strength of each impact, we again used the measures described in section 5.1. We categorized the different impacts according to their type and probability by comparing the weightedMean value with the same limits and the p-value with the same significance level as in the previous validation analysis.
The resulting factor-quality aspect combinations can be seen in Tables 5.4 and 5.5. However, we only included impacts which were not already considered in the quality model (and therefore in the validation step), and from those only impacts showing a positive or negative trend.
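A sketch of this categorization in R is shown below. The bracketing logic mirrors the validation function above; the cutoff used to distinguish a strong impact (++ or --) from a weaker one (+ or -) is our assumption for illustration and not taken from the report.

```r
# Illustrative categorization of additionally stated impacts
categorizeImpact <- function(weightedMean, aboveThreshold, pValue) {
  pThreshold <- 0.1
  if (weightedMean < 0.5 && weightedMean > -0.5) {
    return(NA_character_)  # no clear positive or negative trend, not included in the tables
  }
  trend <- if (weightedMean > 0) "+" else "-"
  # assumed cutoff of |1.5| for a strong impact (++ / --)
  strength <- if (abs(weightedMean) >= 1.5) strrep(trend, 2) else trend
  # bracket the result if it is not significant or based on too few answers
  if (aboveThreshold && pValue < pThreshold) strength else paste0("(", strength, ")")
}

categorizeImpact(1.3, TRUE, 0.0022)   # "+"
categorizeImpact(1.3, TRUE, 0.2302)   # "(+)"
```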
The impacts are grouped by the factors as well as the high-level quality aspects of the quality aspects impacted by the factors. It can therefore be seen that for several factors, impacts on quality aspects of the same high-level quality aspect have been rated. This shows a potential difficulty of using the quality aspects from the ISO 25000 standard: because the quality aspects which we used for the survey are strongly related through their common high-level quality aspect, it is more difficult to clearly distinguish between them in terms of how they are affected by software characteristics. For example, the factor Managed infrastructure in Table 5.5 has been rated as positively impacting all quality aspects grouped under the Reliability high-level aspect. In a hierarchical quality model, however, it would be ideal if there is only one path from a factor to a single top-level quality aspect. For the quality model, this case could thus mean that either the quality aspects as considered in the survey should be used as the top-level aspects, excluding the high-level aspects, or that the factor Managed infrastructure might need to be split into separate factors which then exclusively impact the different quality aspects. The survey results therefore show aspects which need to be reconsidered in the quality model and can guide the future work on it to develop a conceptually sound quality model.
Factor | Quality Aspect | High-level Aspect | -- | - | 0 | + | ++ | Impact | p-Value |
---|---|---|---|---|---|---|---|---|---|
Access control management consistency | Authenticity | Security | 0 | 0 | 6 | 2 | 2 | (+) | 0.2302288 |
Access control management consistency | Confidentiality | Security | 0 | 0 | 4 | 2 | 4 | (+) | 0.1548783 |
Account separation | Confidentiality | Security | 0 | 0 | 4 | 2 | 1 | (+) | 0.5498971 |
Account separation | Integrity | Security | 0 | 0 | 2 | 1 | 4 | (+) | 0.1099966 |
Addressing abstraction | Modifiability | Maintainability | 0 | 0 | 4 | 1 | 1 | (+) | 0.6399177 |
Addressing abstraction | Simplicity | Maintainability | 0 | 0 | 3 | 3 | 0 | (+) | 0.1322016 |
Addressing abstraction | Replaceability | Portability | 0 | 0 | 3 | 2 | 1 | (+) | 0.6399177 |
API-based communication | Modularity | Maintainability | 0 | 1 | 2 | 0 | 7 | + | 0.0022373 |
API-based communication | Reusability | Maintainability | 0 | 0 | 4 | 1 | 5 | + | 0.0480364 |
API-based communication | Simplicity | Maintainability | 0 | 1 | 5 | 2 | 2 | (+) | 0.6684242 |
API-based communication | Replaceability | Portability | 0 | 0 | 7 | 0 | 3 | + | 0.0376729 |
Asynchronous communication | Elasticity | Performance efficiency | 0 | 0 | 5 | 0 | 2 | (+) | 0.1430041 |
Asynchronous communication | Resource utilization | Performance efficiency | 0 | 0 | 3 | 2 | 2 | (+) | 0.5498971 |
Asynchronous communication | Fault tolerance | Reliability | 0 | 0 | 4 | 2 | 1 | (+) | 0.5498971 |
Automated infrastructure | Modularity | Maintainability | 0 | 0 | 13 | 2 | 5 | + | 0.0031067 |
Automated infrastructure | Reusability | Maintainability | 0 | 0 | 11 | 2 | 7 | + | 0.0028087 |
Automated infrastructure | Testability | Maintainability | 0 | 2 | 9 | 3 | 6 | + | 0.0996570 |
Automated infrastructure | Capability | Performance efficiency | 0 | 0 | 14 | 1 | 5 | + | 0.0007953 |
Automated infrastructure | Elasticity | Performance efficiency | 0 | 0 | 11 | 2 | 7 | + | 0.0028087 |
Automated infrastructure | Resource utilization | Performance efficiency | 0 | 0 | 13 | 2 | 5 | + | 0.0031067 |
Automated infrastructure | Time-behaviour | Performance efficiency | 0 | 0 | 13 | 2 | 5 | + | 0.0031067 |
Automated infrastructure | Installability | Portability | 0 | 1 | 8 | 2 | 9 | + | 0.0050431 |
Automated infrastructure | Availability | Reliability | 0 | 0 | 7 | 3 | 10 | + | 0.0005842 |
Automated infrastructure | Fault tolerance | Reliability | 0 | 0 | 11 | 1 | 8 | + | 0.0005992 |
Automated restarts | Availability | Reliability | 0 | 1 | 0 | 3 | 8 | ++ | 0.0000560 |
Automated restarts | Fault tolerance | Reliability | 0 | 0 | 3 | 3 | 6 | + | 0.0137452 |
Backing service decentralization | Elasticity | Performance efficiency | 0 | 0 | 4 | 1 | 1 | (+) | 0.6399177 |
Backing service decentralization | Availability | Reliability | 0 | 0 | 3 | 1 | 2 | (+) | 0.6399177 |
Backing service decentralization | Fault tolerance | Reliability | 0 | 0 | 3 | 0 | 3 | (+) | 0.1322016 |
Factor | Quality Aspect | High-level Aspect | -- | - | 0 | + | ++ | Impact | p-Value |
---|---|---|---|---|---|---|---|---|---|
Built-in autoscaling | Capability | Performance efficiency | 0 | 0 | 7 | 3 | 7 | + | 0.0095213 |
Built-in autoscaling | Time-behaviour | Performance efficiency | 0 | 0 | 8 | 4 | 5 | + | 0.0291854 |
Built-in autoscaling | Availability | Reliability | 0 | 0 | 8 | 5 | 4 | + | 0.0291854 |
Built-in autoscaling | Fault tolerance | Reliability | 0 | 0 | 8 | 5 | 4 | + | 0.0291854 |
Circuit breaked communication | Recoverability | Reliability | 0 | 0 | 5 | 1 | 2 | (+) | 0.3182442 |
Cloud vendor abstraction | Interoperability | Compatibility | 0 | 0 | 6 | 3 | 2 | (+) | 0.1908058 |
Cloud vendor abstraction | Reusability | Maintainability | 0 | 0 | 6 | 1 | 4 | + | 0.0804314 |
Cloud vendor abstraction | Installability | Portability | 0 | 1 | 3 | 3 | 4 | (+) | 0.2344083 |
Cloud vendor abstraction | Replaceability | Portability | 0 | 0 | 6 | 1 | 4 | + | 0.0804314 |
Command Query Responsibility Segregation | Simplicity | Maintainability | 0 | 4 | 3 | 0 | 0 | - | 0.0559842 |
Command Query Responsibility Segregation | Elasticity | Performance efficiency | 0 | 0 | 4 | 2 | 1 | (+) | 0.5498971 |
Command Query Responsibility Segregation | Availability | Reliability | 0 | 0 | 4 | 1 | 2 | (+) | 0.5498971 |
Command Query Responsibility Segregation | Fault tolerance | Reliability | 0 | 0 | 4 | 2 | 1 | (+) | 0.5498971 |
Consistent centralized logging | Testability | Maintainability | 0 | 0 | 4 | 0 | 2 | (+) | 0.3158436 |
Consistent centralized logging | Accountability | Security | 0 | 0 | 3 | 2 | 1 | (+) | 0.6399177 |
Contract-based links | Adaptability | Portability | 0 | 0 | 3 | 0 | 2 | (+) | 0.3518519 |
Dynamic scheduling | Testability | Maintainability | 1 | 1 | 4 | 0 | 0 | (-) | 0.6399177 |
Dynamic scheduling | Capability | Performance efficiency | 0 | 0 | 1 | 0 | 5 | ++ | 0.0026578 |
Dynamic scheduling | Elasticity | Performance efficiency | 0 | 0 | 3 | 0 | 3 | (+) | 0.1322016 |
Dynamic scheduling | Time-behaviour | Performance efficiency | 0 | 0 | 4 | 0 | 2 | (+) | 0.3158436 |
Dynamic scheduling | Adaptability | Portability | 0 | 0 | 4 | 1 | 1 | (+) | 0.6399177 |
Health and readiness Checks | Testability | Maintainability | 0 | 0 | 6 | 1 | 2 | (+) | 0.2329247 |
Health and readiness Checks | Availability | Reliability | 0 | 0 | 3 | 1 | 5 | + | 0.0414333 |
Health and readiness Checks | Fault tolerance | Reliability | 0 | 0 | 3 | 4 | 2 | (+) | 0.1325017 |
Horizontal data replication | Elasticity | Performance efficiency | 0 | 0 | 5 | 0 | 3 | (+) | 0.1054955 |
Horizontal data replication | Availability | Reliability | 0 | 0 | 2 | 3 | 3 | (+) | 0.1455047 |
Horizontal data replication | Fault tolerance | Reliability | 0 | 0 | 4 | 2 | 2 | (+) | 0.4862826 |
Infrastructure abstraction | Interoperability | Compatibility | 0 | 0 | 5 | 2 | 1 | (+) | 0.3182442 |
Infrastructure abstraction | Simplicity | Maintainability | 0 | 0 | 4 | 3 | 1 | (+) | 0.2798354 |
Infrastructure abstraction | Installability | Portability | 0 | 0 | 5 | 1 | 2 | (+) | 0.3182442 |
Infrastructure abstraction | Replaceability | Portability | 0 | 0 | 4 | 2 | 2 | (+) | 0.4862826 |
Limited data scope | Analyzability | Maintainability | 0 | 0 | 2 | 3 | 0 | (+) | 0.1923868 |
Limited data scope | Testability | Maintainability | 0 | 0 | 3 | 1 | 1 | (+) | 0.8456790 |
Limited request trace scope | Analyzability | Maintainability | 0 | 1 | 2 | 1 | 2 | (+) | 0.8868313 |
Limited request trace scope | Time-behaviour | Performance efficiency | 0 | 0 | 3 | 3 | 0 | (+) | 0.1322016 |
Managed backing services | Availability | Reliability | 0 | 1 | 4 | 2 | 2 | (+) | 0.8023548 |
Managed infrastructure | Co-existence | Compatibility | 0 | 1 | 8 | 2 | 3 | (+) | 0.2434166 |
Managed infrastructure | Elasticity | Performance efficiency | 0 | 0 | 7 | 4 | 3 | + | 0.0765035 |
Managed infrastructure | Installability | Portability | 0 | 1 | 8 | 2 | 3 | (+) | 0.2434166 |
Managed infrastructure | Availability | Reliability | 0 | 0 | 4 | 3 | 7 | + | 0.0073683 |
Managed infrastructure | Fault tolerance | Reliability | 0 | 0 | 9 | 2 | 3 | + | 0.0498688 |
Managed infrastructure | Maturity | Reliability | 0 | 0 | 8 | 3 | 3 | + | 0.0765035 |
Managed infrastructure | Recoverability | Reliability | 0 | 0 | 8 | 3 | 3 | + | 0.0765035 |
Mediated communication | Time-behaviour | Performance efficiency | 0 | 3 | 2 | 0 | 0 | (-) | 0.1923868 |
Mediated communication | Replaceability | Portability | 0 | 0 | 3 | 1 | 1 | (+) | 0.8456790 |
Mostly stateless services | Analyzability | Maintainability | 0 | 1 | 4 | 2 | 2 | (+) | 0.8023548 |
Mostly stateless services | Reusability | Maintainability | 0 | 0 | 4 | 3 | 2 | (+) | 0.3169439 |
Mostly stateless services | Testability | Maintainability | 0 | 0 | 2 | 0 | 7 | ++ | 0.0006255 |
Mostly stateless services | Resource utilization | Performance efficiency | 0 | 0 | 4 | 2 | 3 | (+) | 0.3169439 |
Mostly stateless services | Recoverability | Reliability | 0 | 0 | 6 | 1 | 2 | (+) | 0.2329247 |
Persistent communication | Elasticity | Performance efficiency | 0 | 0 | 3 | 1 | 1 | (+) | 0.8456790 |
Persistent communication | Availability | Reliability | 0 | 0 | 2 | 2 | 1 | (+) | 0.7222222 |
Persistent communication | Fault tolerance | Reliability | 0 | 0 | 1 | 2 | 2 | (+) | 0.2695473 |
Physical data distribution | Resource utilization | Performance efficiency | 1 | 4 | 2 | 0 | 0 | (-) | 0.1099966 |
Physical data distribution | Fault tolerance | Reliability | 0 | 0 | 1 | 1 | 5 | ++ | 0.0095165 |
Physical data distribution | Recoverability | Reliability | 0 | 1 | 3 | 1 | 2 | (+) | 0.9039780 |
Physical service distribution | Fault tolerance | Reliability | 0 | 0 | 2 | 3 | 5 | + | 0.0232529 |
Physical service distribution | Maturity | Reliability | 0 | 0 | 6 | 1 | 3 | (+) | 0.1655474 |
Physical service distribution | Recoverability | Reliability | 0 | 2 | 3 | 2 | 3 | (+) | 0.6204132 |
Retries for safe invocations | Availability | Reliability | 0 | 0 | 2 | 1 | 2 | (+) | 0.7222222 |
Retries for safe invocations | Recoverability | Reliability | 0 | 0 | 3 | 0 | 2 | (+) | 0.3518519 |
Secrets stored in specialized services | Accountability | Security | 0 | 0 | 4 | 0 | 2 | (+) | 0.3158436 |
Secrets stored in specialized services | Authenticity | Security | 0 | 0 | 3 | 1 | 2 | (+) | 0.6399177 |
Secrets stored in specialized services | Integrity | Security | 0 | 0 | 1 | 2 | 3 | (+) | 0.1184842 |
Service replication | Availability | Reliability | 0 | 0 | 1 | 4 | 9 | ++ | 0.0000276 |
Service replication | Fault tolerance | Reliability | 0 | 0 | 5 | 3 | 6 | + | 0.0238423 |
Usage of existing solutions for non-core capabilities | Interoperability | Compatibility | 0 | 0 | 4 | 2 | 1 | (+) | 0.5498971 |
Usage of existing solutions for non-core capabilities | Simplicity | Maintainability | 0 | 0 | 3 | 4 | 0 | + | 0.0559842 |
Use infrastructure as code | Reusability | Maintainability | 0 | 0 | 9 | 2 | 5 | + | 0.0241678 |
Use infrastructure as code | Testability | Maintainability | 0 | 0 | 9 | 3 | 4 | + | 0.0404223 |
Use infrastructure as code | Adaptability | Portability | 0 | 0 | 8 | 6 | 2 | + | 0.0163453 |
Use infrastructure as code | Replaceability | Portability | 0 | 0 | 10 | 3 | 3 | + | 0.0285723 |
Use infrastructure as code | Availability | Reliability | 0 | 0 | 9 | 3 | 4 | + | 0.0404223 |
Use infrastructure as code | Fault tolerance | Reliability | 0 | 0 | 12 | 0 | 4 | + | 0.0009318 |
Vertical data replication | Analyzability | Maintainability | 0 | 3 | 2 | 0 | 0 | (-) | 0.1923868 |
Vertical data replication | Availability | Reliability | 0 | 0 | 3 | 0 | 2 | (+) | 0.3518519 |
Vertical data replication | Fault tolerance | Reliability | 0 | 0 | 3 | 1 | 1 | (+) | 0.8456790 |
The results of this survey will be used to continue the work on the quality model for cloud-native application software architectures. While parts of the quality model could be validated by the results of the survey, others need reconsideration. Our plan for future work is to apply the quality model to software architectures serving as use cases, based on which we want to further validate the elements of the quality model. The insights from this survey can guide this work by showing which factors to focus on or, for example, which hypotheses to formulate for experiments with such software architectures.
Although we received fewer submissions than we had initially hoped for, the general approach of the survey proved feasible. The lower number of submissions also means, however, that the results need to be interpreted with caution, which we did by considering the significance of results and marking results as uncertain where applicable.