Chapter 2 A Brief Example of Discrete Choice Experiments using the support.CEs and apollo Packages

Author: Hideo Aizaki

Latest Revision: 9 March 2021 (First Release: 29 August 2019)

2.1 Before you get started

This tutorial, which is a revised version of Aizaki (2012, sec. 4.1), presents the process of discrete choice experiments (DCEs) using R (R Core Team 2021a) combained with the support.CEs (Aizaki 2012) and apollo (Hess and Palma 2019, 2021) packages. The apollo package, available from the Comprehensive R Archive Network (CRAN), enables the user to estimate various discrete choice models.

Please refer to Aizaki (2012) and Aizaki, Nakatani, and Sato (2014) for detailed explanation on the support.CEs package, and visit www.ApolloChoiceModelling.com for that on the apollo package, including the full user manual.

2.2 Outline of the example

We explain how to use the packages on the basis of a hypothetical DCE study about consumers’ preferences for an agricultural product (for details, see Aizaki (2012, sec. 4.1)). The respondents were asked to select their preferred alternative from two agricultural products and an option “none of these.” The agricultural product alternative consists of three attributes, in which each one has three levels:

Region of origin: region A, region B, and region C
Cultivation methods: conventional method, more eco-friendly method, and most eco-friendly method
Price per piece: $1, $1.1, and $1.2

The responses to the DCE questions were analyzed using a conditional/multinomial logit model with the following systematic components of utility functions:

\[\begin{align} V_1 &= asc + \beta_{Reg\_B} Reg\_B_{1} + \beta_{Reg\_C} Reg\_C_{1} + \beta_{More} More_{1} + \beta_{Most} Most_{1} + \beta_{Price} Price_{1}, \\ V_2 &= asc + \beta_{Reg\_B} Reg\_B_{2} + \beta_{Reg\_C} Reg\_C_{2} + \beta_{More} More_{2} + \beta_{Most} Most_{2} + \beta_{Price} Price_{2}, \\ V_3 &= 0, \end{align}\]

where $asc$ and $\beta$s are coefficients to be estimated; $Reg\_B$ and $Reg\_C$ are dummy-coded variables corresponding to region B and region C, respectively; $More$ and $Most$ are dummy-coded variables corresponding to more eco-friendly cultivation method and most eco-friendly cultivation method, respectively; and $Price$ is the price variable.

2.3 Loading packages

Let us load the packages needed for this tutorial into the current R session:

library(support.CEs)
library(apollo)

Note that the following line of code is needed to reproduce the results on R 3.6.0 or later:

if(getRversion() >= "3.6.0") RNGkind(sample.kind = "Rounding")

## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used

2.4 Designing choice sets

Choice sets for the study were created using the mix-and-match method. The function rotation.design() in the support.CEs package generates the choice sets according to the attributes and their levels:

des1 <- rotation.design(
   attribute.names = list(
     Region = c("Reg_A", "Reg_B", "Reg_C"),
     Eco    = c("Conv.", "More", "Most"),
     Price  = c("1", "1.1", "1.2")),
   nalternatives = 2,
   nblocks = 1,
   row.renames = FALSE,
   randomize = TRUE,
   seed = 987)
des1

## 
## Choice sets:
## alternative 1 in each choice set
##   BLOCK QES ALT Region   Eco Price
## 1     1   1   1  Reg_B  More   1.1
## 8     1   2   1  Reg_B  Most     1
## 9     1   3   1  Reg_A  Most   1.1
## 6     1   4   1  Reg_A Conv.     1
## 3     1   5   1  Reg_C  Most   1.2
## 7     1   6   1  Reg_A  More   1.2
## 5     1   7   1  Reg_C Conv.   1.1
## 4     1   8   1  Reg_B Conv.   1.2
## 2     1   9   1  Reg_C  More     1
## 
## alternative 2 in each choice set
##   BLOCK QES ALT Region   Eco Price
## 1     1   1   2  Reg_C  Most   1.2
## 5     1   2   2  Reg_A  More   1.2
## 8     1   3   2  Reg_C Conv.   1.1
## 4     1   4   2  Reg_C  More     1
## 3     1   5   2  Reg_A Conv.     1
## 9     1   6   2  Reg_B Conv.   1.2
## 2     1   7   2  Reg_A  Most   1.1
## 7     1   8   2  Reg_B  Most     1
## 6     1   9   2  Reg_B  More   1.1
## 
## Candidate design:
##   A B C
## 1 2 2 2
## 2 3 2 1
## 3 3 3 3
## 4 2 1 3
## 5 3 1 2
## 6 1 1 1
## 7 1 2 3
## 8 2 3 1
## 9 1 3 2
## class=design, type= oa 
## 
## Design information:
## number of blocks = 1 
## number of questions per block = 9 
## number of alternatives per choice set = 2 
## number of attributes per alternative = 3

The resultant nine choice sets corresponds to nine DCE questions, which were assumed to be assigned to each of the 100 respondents.

2.5 Preparing a dataset for analysis

The responses to the DCE questions are stored in the dataset syn.res1 in the support.CEs package:

data(syn.res1, package = "support.CEs")
dim(syn.res1)

## [1] 100  12

head(syn.res1)

##   ID BLOCK q1 q2 q3 q4 q5 q6 q7 q8 q9 F
## 1  1     1  2  1  2  2  2  1  2  3  1 0
## 2  2     1  1  1  1  1  3  1  3  2  2 1
## 3  3     1  1  2  1  2  2  3  3  3  1 0
## 4  4     1  3  3  1  2  1  1  2  2  1 1
## 5  5     1  3  3  1  1  1  3  2  3  1 1
## 6  6     1  3  1  1  1  3  1  3  3  3 0

The dataset contains a variable ID (the respondent identification number), a variable BLOCK (the block number of choice sets), variables q1 to q9 (the responses to nine DCE questions), and a variable F (the respondent’s gender: 1 for female, 0 otherwise).

The choice sets were converted into a design matrix, which was combined with the dataset to create a dataset for analysis. The function make.design.matrix() is used to create the design matrix from the choice sets, while the function make.dataset() is used to merge the dataset with the design matrix:

desmat1 <- make.design.matrix(
   choice.experiment.design = des1,
   categorical.attributes = c("Region", "Eco"),
   continuous.attributes = c("Price"),
   optout = TRUE,
   unlabeled = TRUE)
dataset1 <- make.dataset(
   respondent.dataset = syn.res1,
   choice.indicators = c("q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9"),
   design.matrix = desmat1)

The format of the resultant dataset dataset1, in which a row corresponds to an alternative, differs from that for the apollo package, in which a row corresponds to a choice set. Therefore, datasete1 needs to be converted into a dataset with a suitable format for apollo, using the function reshape() in the stats package:

dataset1r <- reshape(
   dataset1, timevar = "ALT", idvar = "STR",
   v.names = c("RES", "ASC", "Reg_B", "Reg_C", "More", "Most", "Price"),
   drop = "F",
   direction = "wide")

Further, we have to create a variable indicating the responses to the DCE questions that are in an appropriate format for apollo. The following R code chunk creates a variable choice:

dataset1r$choice <- dataset1r$RES.1 * 1 + dataset1r$RES.2 * 2 + dataset1r$RES.3 * 3
head(dataset1r)

##    ID BLOCK QES STR RES.1 ASC.1 Reg_B.1 Reg_C.1 More.1 Most.1 Price.1 RES.2
## 1   1     1   1 101 FALSE     1       1       0      1      0     1.1  TRUE
## 4   1     1   2 102  TRUE     1       1       0      0      1     1.0 FALSE
## 7   1     1   3 103 FALSE     1       0       0      0      1     1.1  TRUE
## 10  1     1   4 104 FALSE     1       0       0      0      0     1.0  TRUE
## 13  1     1   5 105 FALSE     1       0       1      0      1     1.2  TRUE
## 16  1     1   6 106  TRUE     1       0       0      1      0     1.2 FALSE
##    ASC.2 Reg_B.2 Reg_C.2 More.2 Most.2 Price.2 RES.3 ASC.3 Reg_B.3 Reg_C.3
## 1      1       0       1      0      1     1.2 FALSE     0       0       0
## 4      1       0       0      1      0     1.2 FALSE     0       0       0
## 7      1       0       1      0      0     1.1 FALSE     0       0       0
## 10     1       0       1      1      0     1.0 FALSE     0       0       0
## 13     1       0       0      0      0     1.0 FALSE     0       0       0
## 16     1       1       0      0      0     1.2 FALSE     0       0       0
##    More.3 Most.3 Price.3 choice
## 1       0      0       0      2
## 4       0      0       0      1
## 7       0      0       0      2
## 10      0      0       0      2
## 13      0      0       0      2
## 16      0      0       0      1

2.6 Preparing an analysis

Before explaining the R code of a conditional logit model analysis using the apollo package, we will outline the analysis scheme using apollo according to Hess and Palma (2019) and the vignette of apollo.

Figure 2.1: A flow for estimating a conditional logit model using apollo

Figure 2.1 shows a flow for implementing a discrete choice model analysis using apollo in a simple case (i.e., a conditional logit model). In the flow, rectangles indicate R objects (i.e., a vector, data frame, and list), while ovals indicate R functions. Solid ovals and rectangles correspond to functions in apollo and objects created by these functions, respectively. Dashed oval and rectangles refer to function and objects that the user is required to create. The core function apollo_estimate() is used to estimate discrete choice models and it requires four R object inputs: apollo_beta, apollo_fixed, apollo_inputs, and apollo_probabilities(). The apollo_inputs object is created using the function apollo_validateInputs(), whose inputs are apollo_beta, apollo_fixed, apollo_control, and database.

The apollo_beta is a named vector whose elements are named after parameters to be estimated, and whose values correspond to the starting values. The apollo_fixed is a character vector containing parameter names (similar to those used in apollo_beta) that are fixed when estimating the model. If the model does not contain any fixed parameters, NULL is assigned to apollo_fixed. The apollo_control is a list object containing the main conditions for the estimation. The database is a data frame containing variables used in the estimation. The apollo_probabilities() is a user-defined function, in which probabilities of selecting alternatives are calculated according to the discrete choice model defined by the user.

Although the apollo package has various functions for the output, which is generated from apollo_estimate(), we will use only two: apollo_modelOutput() and apollo_deltaMethod(). The former one summarizes the output, while the latter one implements the delta method. In the example below, the apollo_deltaMethod() will be employed to calculate the marginal willingness to pays (MWTPs) for non-monetary attributes and/or their levels.

Let us now go back to the explanation of the R code chunks for estimating the model. The first task for preparing the analysis is to initialize the current session:

apollo_initialise()

The second task is to prepare the inputs, which are apollo_control, database, apollo_beta, and apollo_fixed, for the function apollo_validateInputs(). The list object apollo_control consists of three elements: the name of the model (modelName), its description (modelDescr), and the name of the respondent identification number variable (indivID):

apollo_control <- list(
   modelName = "CL_apollo",
   modelDescr = "Consumers' preferences for an agricultural product",
   indivID = "ID"
)

Although our dataset dataset1r has already been created, it is stored into another object database whose name is mandatory:

database <- dataset1r

According to the systematic components of the utility functions, the parameter names and starting values of the named vector apollo_beta are set as:

apollo_beta <- c(
   asc     = 0,
   b_RegB  = 0,
   b_RegC  = 0,
   b_More  = 0,
   b_Most  = 0,
   b_Price = 0
)

There are no parameters with fixed values in our model settings, therefore, the apollo_fixed vector is set to NULL:

apollo_fixed <- c()

These inputs are validated and, afterwards, merged into the list object apollo_inputs using the function apollo_validateInputs():

apollo_inputs <- apollo_validateInputs()

## Several observations per individual detected based on the value of ID.
##   Setting panelData in apollo_control set to TRUE.
## All checks on apollo_control completed.
## All checks on database completed.

The final task for the preparation is to define the function apollo_probabilities(). The following R code chunk is responsible for this definition, according to the model specification:

apollo_probabilities <- function(apollo_beta, apollo_inputs, functionality = "estimate"){
   apollo_attach(apollo_beta, apollo_inputs)
   on.exit(apollo_detach(apollo_beta, apollo_inputs))

   P <- list()

   V <- list()
   V[['alt1']] = asc + b_RegB*Reg_B.1 + b_RegC*Reg_C.1 + b_More*More.1 + b_Most*Most.1 +
                 b_Price*Price.1
   V[['alt2']] = asc + b_RegB*Reg_B.2 + b_RegC*Reg_C.2 + b_More*More.2 + b_Most*Most.2 +
                 b_Price*Price.2
   V[['alt3']] = 0

   mnl_settings <- list(
     alternatives = c(alt1 = 1, alt2 = 2, alt3 = 3),
     avail        = list(alt1 = 1, alt2 = 1, alt3 = 1),
     choiceVar    = choice,
     V            = V
   )

   P[['model']] <- apollo_mnl(mnl_settings, functionality)
   P <- apollo_panelProd(P, apollo_inputs, functionality)
   P <- apollo_prepareProb(P, apollo_inputs, functionality)

   return(P)
}

Refer to Hess and Palma (2019) for details about the components in apollo_probabilities().

2.7 Conducting the analysis using apollo

The conditional logit model is estimated with the function apollo_estimate(). The results are subsequently stored into an object model in list format:

model <- apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs,
                         estimate_settings = list(writeIter = FALSE))

## Preparing user-defined functions.
## 
## Testing likelihood function...
## 
## Overview of choices for MNL model component :
##                                    alt1   alt2   alt3
## Times available                  900.00 900.00 900.00
## Times chosen                     260.00 275.00 365.00
## Percentage chosen overall         28.89  30.56  40.56
## Percentage chosen when available  28.89  30.56  40.56
## 
## 
## Pre-processing likelihood function...
## 
## Testing influence of parameters
## Starting main estimation
## 
## BGW using analytic model derivatives supplied by caller...
## 
## 
##     it    nf     F            RELDF    PRELDF    RELDX    MODEL stppar
##      0     1 9.887510598e+02
##      1     4 9.544575685e+02 3.468e-02 3.708e-02 1.00e+00   G   5.18e-01
##      2     5 9.408575194e+02 1.425e-02 1.657e-02 6.32e-01   G   4.14e-02
##      3     7 9.253435532e+02 1.649e-02 1.629e-02 6.02e-01   S   3.10e-03
##      4     8 9.220095593e+02 3.603e-03 3.595e-03 2.18e-01   S   0.00e+00
##      5     9 9.220053970e+02 4.514e-06 4.320e-06 2.34e-03   S   0.00e+00
##      6    10 9.220053801e+02 1.835e-08 1.810e-08 3.41e-05   G   0.00e+00
##      7    11 9.220053801e+02 1.986e-11 1.987e-11 7.31e-06   S   0.00e+00
## 
## ***** Relative function convergence *****
## 
## Estimated parameters with approximate standard errors from BHHH matrix:
##            Estimate     BHHH se BHH t-ratio (0)
## asc          4.6653      0.7304           6.388
## b_RegB      -0.5951      0.1310          -4.544
## b_RegC      -0.3518      0.1158          -3.039
## b_More       0.5818      0.1365           4.263
## b_Most       0.8064      0.1252           6.439
## b_Price     -4.7505      0.6634          -7.161
## 
## Final LL: -922.0054
## 
## Calculating log-likelihood at equal shares (LL(0)) for applicable
##   models...
## Calculating log-likelihood at observed shares from estimation data
##   (LL(c)) for applicable models...
## Calculating LL of each model component...
## Calculating other model fit measures
## Computing covariance matrix using numerical jacobian of analytical
##   gradient.
##  0%....25%....50%....75%.100%
## Negative definite Hessian with maximum eigenvalue: -1.02264
## Computing score matrix...
## 
## Your model was estimated using the BGW algorithm. Please acknowledge
##   this by citing Bunch et al. (1993) - DOI 10.1145/151271.151279

The summary of the results can be shown on your R console using the function apollo_modelOutput():

apollo_modelOutput(model, list(printPVal = TRUE))

## Model run by R using Apollo 0.3.4 on R 4.4.2 for Windows.
## Please acknowledge the use of Apollo by citing Hess & Palma (2019)
##   DOI 10.1016/j.jocm.2019.100170
##   www.ApolloChoiceModelling.com
## 
## Model name                                  : CL_apollo
## Model description                           : Consumers' preferences for an agricultural product
## Model run at                                : 2025-03-31 12:58:53.131365
## Estimation method                           : bgw
## Model diagnosis                             : Relative function convergence
## Optimisation diagnosis                      : Maximum found
##      hessian properties                     : Negative definite
##      maximum eigenvalue                     : -1.02264
##      reciprocal of condition number         : 0.00176474
## Number of individuals                       : 100
## Number of rows in database                  : 900
## Number of modelled outcomes                 : 900
## 
## Number of cores used                        :  1 
## Model without mixing
## 
## LL(start)                                   : -988.75
## LL at equal shares, LL(0)                   : -988.75
## LL at observed shares, LL(C)                : -978.3
## LL(final)                                   : -922.01
## Rho-squared vs equal shares                  :  0.0675 
## Adj.Rho-squared vs equal shares              :  0.0614 
## Rho-squared vs observed shares               :  0.0575 
## Adj.Rho-squared vs observed shares           :  0.0535 
## AIC                                         :  1856.01 
## BIC                                         :  1884.83 
## 
## Estimated parameters                        : 6
## Time taken (hh:mm:ss)                       :  00:00:0.41 
##      pre-estimation                         :  00:00:0.32 
##      estimation                             :  00:00:0.04 
##      post-estimation                        :  00:00:0.05 
## Iterations                                  :  7  
## 
## Unconstrained optimisation.
## 
## Estimates:
##            Estimate        s.e.   t.rat.(0)  p(1-sided)    Rob.s.e.
## asc          4.6653      0.7330       6.365   9.775e-11      0.6729
## b_RegB      -0.5951      0.1317      -4.520   3.090e-06      0.1288
## b_RegC      -0.3518      0.1145      -3.073    0.001059      0.1136
## b_More       0.5818      0.1379       4.220   1.223e-05      0.1349
## b_Most       0.8064      0.1246       6.471   4.860e-11      0.1229
## b_Price     -4.7505      0.6680      -7.111   5.755e-13      0.6042
##         Rob.t.rat.(0)  p(1-sided)
## asc             6.934   2.051e-12
## b_RegB         -4.622   1.897e-06
## b_RegC         -3.096  9.7942e-04
## b_More          4.313   8.050e-06
## b_Most          6.561   2.668e-11
## b_Price        -7.862   1.887e-15

All coefficients are significantly different from 0 at a 1% level. Negative values of estimates regarding regions B and C indicate that the consumer valuations of regions B and C are relatively lower than that of region A. Positive values of estimates regarding more and most eco-friendly cultivation methods indicates that consumers prefer these methods over the conventional one.

To understand difference in preference among the levels of the attributes clearly, MWTPs for non-monetary attribute levels are calculated using the function apollo_deltaMethod(). In the specification of the model, a MWTP for a non-monetary attribute level $k$ is defined as the negative value of the ratio of the parametr for the level $k$ and that for the price variable. The following R code chunk calculates MWTPs for the non-monetary attribute levels:

for (i in c("b_RegB", "b_RegC", "b_More", "b_Most")) {
   deltaMethod_settings <- list(operation = "ratio", parName1 = i, parName2 = "b_Price",
                                multPar1 = -1)
   apollo_deltaMethod(model, deltaMethod_settings)
}

## 
## Running Delta method computations using robust standard errors
##                                                    Value    s.e. t-ratio (0)
## Ratio of b_RegB (multiplied by -1) and b_Price:  -0.1253 0.02775      -4.514
## 
## INFORMATION: The results of the Delta method calculations are returned invisibly as
##   an output from this function. Calling the function via
##   result=apollo_deltaMethod(...) will save this output in an object
##   called result (or otherwise named object). 
## 
## Running Delta method computations using robust standard errors
##                                                     Value    s.e. t-ratio (0)
## Ratio of b_RegC (multiplied by -1) and b_Price:  -0.07406 0.02543      -2.912
## 
## INFORMATION: The results of the Delta method calculations are returned invisibly as
##   an output from this function. Calling the function via
##   result=apollo_deltaMethod(...) will save this output in an object
##   called result (or otherwise named object). 
## 
## Running Delta method computations using robust standard errors
##                                                   Value    s.e. t-ratio (0)
## Ratio of b_More (multiplied by -1) and b_Price:  0.1225 0.02991       4.095
## 
## INFORMATION: The results of the Delta method calculations are returned invisibly as
##   an output from this function. Calling the function via
##   result=apollo_deltaMethod(...) will save this output in an object
##   called result (or otherwise named object). 
## 
## Running Delta method computations using robust standard errors
##                                                   Value    s.e. t-ratio (0)
## Ratio of b_Most (multiplied by -1) and b_Price:  0.1698 0.03484       4.873
## 
## INFORMATION: The results of the Delta method calculations are returned invisibly as
##   an output from this function. Calling the function via
##   result=apollo_deltaMethod(...) will save this output in an object
##   called result (or otherwise named object).

The outputs show that all MWTPs are significantly different from 0 at a 1% level. Compared with region A, the MWTPs for regions B and C are $-0.1253$ and $-0.0741$, respectively. Compared with the conventional cultivation method, the MWTPs for more and most eco-friendly cultivation methods are $0.1225$ and $0.1698$, respectively.

References

Aizaki, H. 2012. “Basic Functions for Supporting an Implementation of Choice Experiments in .” Journal of Statistical Software, Code Snippets 50 (2): 1–24. https://doi.org/10.18637/jss.v050.c02.

Aizaki, H., T. Nakatani, and K. Sato. 2014. Stated Preference Methods Using r. Florida, USA: CRC Press. https://doi.org/10.1201/b17292.

Hess, S., and D. Palma. 2019. “Apollo: A Flexible, Powerful and Customisable Freeware Package for Choice Model Estimation and Application.” Journal of Choice Modelling 32: 100170. https://doi.org/10.1016/j.jocm.2019.100170.

———. 2021. : A Flexible, Powerful and Customisable Freeware Package for Choice Model Estimation and Application. Choice Modelling Centre. http://www.ApolloChoiceModelling.com.

R Core Team. 2021a. : A Language and Environment for Statistical Computing. Vienna, Austria: Foundation for Statistical Computing. https://www.R-project.org/.