The starting point for estimating dynamic discrete choice models is John Rust’s nested fixed point algorithm. This page provides a programmatic overview of the linear model in his paper “Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher,” although it can be adapted to other specifications easily enough. This page assumes that you are familiar with the technical aspects of the estimation already, and only discusses the programmatic aspects. The estimation technique has been simplified in a number of ways to ease exposition. First, I only use the contraction mapping to find the fixed point in the dynamic program (rather than Newton’s Method). Second, I use an optimization package rather than program the BHHH algorithm myself. If this means nothing to you it does not matter. The rest of the material is self contained, it just doesn’t follow Rust’s paper step by step. My primary source for this program is Rust’s original paper along with its online documentation. I also found Holger Sieg’s slides_io_ddc helpful.

The optimization package that I use is Optim. This is the only dependency that you will need to download. You can download it by removing the comment and running “Pkg.add()”. If it is already downloaded you can just run this next line.

using Optim using ExcelReaders

It helps to start off with organized code. In Julia, you have the option to create user defined types which provide a convenient way to store variables and keep your code general. For instance, if you

Foresight helps when determining what should be in your user-defined type. Because of additive separability and conditional independence, we know that Rust’s model is completely characterized by the expected value function

The unknown components in this equation are , (which will determined the value of ), , and . Remember that the nested fixed point algorithm depends on a discrete state space. Rust takes the continuous value of “accumulated mileage” and assigns each value to a bin, e.g. , , etc. Over a discrete state space, any function can be represented by a vector or a matrix. For instance, assume we want to represent the function for the values . We could either program function that takes a single input and outputs or we could define the vector , where each value of this vector corresponds to of its index position. It might not not seem helpful to define a function in the second way, however it is far more convenient when you don’t know the functional form of the function you are representing. Naturally won’t have a functional form, but it will still be representable by a vector.

With these considerations in mind, I define the type RustModel and have it store values for , , , and .

type RustModel beta::Float64 # Discount Factor params::Array{Float64} # Utility Function Parameters pi::Array{Float64} # Transition Probabilities EV::Array{Float64} # Expected Value Array K::Int32 # Size of State Space # Initialize Model Parameters function RustModel( beta = .9999, K = 90 ) EV = ones(K, 1) pi = [.348, .639, .013] params = [3.6, 10] return new( beta, params, pi, EV, K ) end end; m = RustModel() println("Beta is ", m.beta) println("Guesses for theta are ", m.params) println("Transition probability guesses are ", m.pi) println("Size of State Space: ", m.K)

Beta is 0.9999 Guesses for theta are [3.6, 10.0] Transition probability guesses are [0.348, 0.639, 0.013] Size of State Space: 90

I similarly define a type for the data. The two variables will be the endogenous decision variable and the exogenous state variable

type RustData endog::Array{Int32} exog::Array{Int32} function RustData( endog = [], exog = []) return new( endog, exog ) end end

#### Per Period Returns Function

The return function is given by the following discrete linear form

and

where is the replacement cost of an engine, and is the current state variable. is the decision at time $t$. Again, the utility function is translated into a by matrix, where the first row is the utility from in each state and the second row is the utility from in each state. Since utility is equal to when the second row is a constant vector.

The function takes a RustModel variable as the input and returns a by matrix (where is the size of the state space). The parameters that are used from the RustModel are and (called params). The first entry in params is and the second entry is .

function u(model::RustModel) S = collect(1:model.K)' # Generate State Space range, i.e. [1, 2, 3, 4, ...] d_0 = .001 * model.params[1] * S # Utility from not replacing d_1 = model.params[2] * ones(1, model.K) # Utility from replacing U = vcat(d_0, d_1) # Utility matrix return -U end u(m)

2×90 Array{Float64,2}: -0.0036 -0.0072 -0.0108 -0.0144 … -0.3168 -0.3204 -0.324 -10.0 -10.0 -10.0 -10.0 -10.0 -10.0 -10.0

#### Transition Probabilities

The transition matrix is assumed to have a simple form. Given any state today, the probability of staying in that state is , the probability of moving forward one state is and the probability of moving forward two states is . Because the state space is discrete, there is a maximum state that can be attained. This state, by necessity, becomes an absorbing state, i.e. if we ever arrive at this state we will be stuck in it with probability . The transition matrix becomes

Note that we can (and do) estimate , , and via maximum likelihood. Naturally, the maximum likelihood estimator is etc. These are the probabilities that will be used when estimating the model. The following program creates the transition matrix from the probabilities given in the RustModel. Note that the number of probabilities to be estimated depends on the discretization of the state space. More bins means more probabilities to be estimated. Also, a computational sidenote, the majority of entries in this matrix are , so a sparse matrix leads to efficieny gains in this instance. The sparse matrix is not implemented until the contraction mapping function below for the sake of illustrative purposes.

The function transition_probs takes a RustModel variable and returns a by transition matrix. The variables used from the RustModel variable are and .

function transition_probs(model::RustModel) t = length(model.pi) ttmp = zeros(model.K - t, model.K) # Transition Probabilities for i in 1:model.K - t for j in 0:t-1 ttmp[i, i + j] = m.pi[j+1] end end atmp = zeros(t,t) # Absorbing State Probabilities for i in 0:t - 1 atmp[i+ 1,:] = [zeros(1,i) m.pi[1:t - i - 1]' ( 1 - sum(m.pi[1:t- i - 1]) ) ] end return [ttmp ; zeros(t, model.K - t) atmp] end; transition_probs( m )

90×90 Array{Float64,2}: 0.348 0.639 0.013 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.639 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.639 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.639 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ⋮ ⋱ ⋮ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.639 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.348 0.639 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.639 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.639 0.013 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.348 0.652 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

#### Social Surplus Function

Remember that the expected value function takes the form

where is the social surplus function. I show in the notes that the social surplus function has the form

Note that . So the expected value of replacing a bus engine is the the same as not replacing a bus engine that has zero total mileage. The function is therefore a function of one variable, , not two. We can therefore treat it as a by vector.

The program ss takes a RustModel variable and returns a by vector. The variables from the RustModel that are used are and the vector. The function also uses the previously defined function . Note that there are issues of overflow here. Because is chosen to be large, we generate large negative values for . Because a computer has limited precision, this can result in undefined results. For instance, will result in for many calculators (and Google). Therefore results in -Inf, instead of and the estimation will stop. Because the social surplus function takes logs of the final result, we can subtract the current value of before exponentiation and then add it back on after taking logs. This will be numerically identical to the correct number and fixes the overflow issue, e.g. .

function ss(model::RustModel) ss_val = ( exp( u( model )[1,:] + model.beta * model.EV - model.EV) + exp( u( model )[2,:] + model.beta * model.EV[1] - model.EV) ) return model.EV + log( ss_val ) end;

#### Contraction Mapping

With the social surplus vector (i.e. the vector of the functions values for each state) and the transition probabilities we can construct the expected value vector

The contraction mapping uses an initial guess for and calculates . It continues this iteration until the maximum difference between and is sufficiently small (you can choose whatever tolerance you would like, although the smaller the tolerance, the longer it takes to converge).

This function takes a RustModel variable and doesn’t return anything. It merely updates the value of for the RustModel it was provided with the fixed point it calculates.

function contraction_mapping( model::RustModel ) P = sparse( transition_probs( model ) ) # Transition Matrix (K x K) eps = 1 # Set epsilon to something greater than 0 while eps > .000001 EV1 = P * ss( model ) eps = maximum(abs(EV1 - model.EV)) model.EV = EV1 end end; m.EV = ones(1, m.K)' contraction_mapping( m ) println(m.EV)

[-1718.29; -1718.54; -1718.78; -1719.02; -1719.25; -1719.48; -1719.71; -1719.92; -1720.14; -1720.34; -1720.54; -1720.74; -1720.93; -1721.12; -1721.3; -1721.47; -1721.65; -1721.81; -1721.97; -1722.13; -1722.28; -1722.42; -1722.57; -1722.7; -1722.84; -1722.96; -1723.09; -1723.21; -1723.32; -1723.43; -1723.54; -1723.64; -1723.74; -1723.84; -1723.93; -1724.02; -1724.11; -1724.19; -1724.27; -1724.35; -1724.42; -1724.49; -1724.56; -1724.63; -1724.69; -1724.76; -1724.82; -1724.87; -1724.93; -1724.98; -1725.04; -1725.09; -1725.14; -1725.18; -1725.23; -1725.27; -1725.32; -1725.36; -1725.4; -1725.44; -1725.47; -1725.51; -1725.55; -1725.58; -1725.62; -1725.65; -1725.68; -1725.71; -1725.74; -1725.77; -1725.8; -1725.83; -1725.85; -1725.88; -1725.91; -1725.93; -1725.95; -1725.98; -1726.0; -1726.02; -1726.04; -1726.06; -1726.08; -1726.1; -1726.11; -1726.13; -1726.14; -1726.15; -1726.15; -1726.15]

#### Choice Probabilities

Choice probabilities can be derived from the social surplus function. The expression for these probabilities is

where .

The following function returns a choice probability vector of the form

Again, overflow is an issue, so the maximum value of is subtracted from the exponents in the numerator and denominator. This is equivalent to multiplying through by and resolves the overflow.

function choice_p( model::RustModel ) max_EV = maximum( model.EV ) P_k = exp( u( model )[1,:] + model.beta * model.EV - max_EV ) ./ ( exp( u( model )[1, :] + model.beta * model.EV - max_EV ) + exp( u( model )[2,:] + model.beta * model.EV[1] - max_EV ) ) return P_k end;

#### Partial Log-Likelihood

The likelihood function, given the choice probabilities and the transition probabilities, is given by

A simpler formulation, which still provides consistent estimates, is the partial log-likelihood function

First we need to limit the data to the relevant probabilities given the current state. After taking the log of these probabilities, we then need to select the log-probability of the agent’s choice and sum over these. This will be the log-likelihood.

function partialLL( model::RustModel, data::RustData) decision_obs = data.endog state_obs = data.exog cp_tmp = choice_p( model ) relevant_probs = [ cp_tmp[convert(Int, i)] for i in state_obs ] pll = [ if decision == 0 log(r_p) else log(1 - r_p) end for (decision, r_p) in zip(decision_obs, relevant_probs)] return -sum(pll) end function ll( model::RustModel, data::RustData ) function objFunc( params ) model.params = params contraction_mapping( model ) pll = partialLL( model, data ) return pll end params0 = [.01, 4] optimum = optimize( objFunc, params0 ) return optimum end

#### John Rust’s Bus Engine Replacement Data

This section just runs code to clean Rust’s original datasets. It can largely be ignored. Just note that the omax and n variables below need to match your choice of K in the RustModel type. The data has been compiled from files that can be found on the companion website to Aguirregabiria and Mira’s paper “Dynamic discrete choice structural models: A survey” (see here).

using DataFrames d309 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "d309", header = false); g870 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "g870", header = false); rt50 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "rt50", header = false); t8h203 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "t8h203", header = false); a452372 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a452372", header = false); a452374 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a452374", header = false); a530872 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a530872", header = false); a530874 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a530874", header = false); a530875 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a530875", header = false);

Each column of the matrix corresponds to an individual bus. The first 11 rows correspond to header information. Respectively,

1. Bus Number 2. Month Purchased 3. Year Purchased 4. Month of 1st Engine Replacement 5. Year of 1st Engine Replacement 6. Odometer at Replacement 7. Month of 2nd Engine Replacement 8. Year of 2nd Engine Replacement 9. Odometer at Replacement 10. Month Odometer Data Begins 11. Year Odometer Data Begins

# Reshape to Proper Matrix Format d309 = reshape(d309[1], 110, 4); g870 = reshape(g870[1], 36, 15); rt50 = reshape(rt50[1], 60, 4); t8h203 = reshape(t8h203[1], 81, 48); a452372 = reshape(a452372[1], 137, 18); a452374 = reshape(a452374[1], 137, 10); a530872 = reshape(a530872[1], 137, 18); a530874 = reshape(a530874[1], 137, 12); a530875 = reshape(a530875[1], 128, 37); n = 90; # Fixed point dimension omax = 450000; # Upper bound on the odometer reading rt=zeros(n,1); nt=copy(rt); rc=copy(rt); nc=copy(rt); milecnt=zeros(n,1); function build(input) dt = copy(input) # Get odometer values at replacement ov1=dt[6,:]' ov2=dt[9,:]' # Get dimensions of the underlying data nr = size(dt, 1); nc = size(dt, 2); # dtc= (dt[12:nr,:] .>= ov1) .* (ov1 .> 0) + (dt[12:nr,:] .>= ov2) .* (ov2 .> 0); dtx = dt[12:nr,:] + ov1 .* dtc .* (dtc-2) - .5*ov2.*dtc.*(dtc-1); dtx = ceil(n*dtx/omax); # dtc=vcat((dtc[2:nr-11,:]-dtc[1:nr-12,:]), zeros(1, nc)); mil=(dtx[2:nr-11,:]-dtx[1:nr-12,:]) + dtx[1:nr-12,:].*dtc[1:nr-12,:]; df = DataFrame(dtc = copy(vec(dtc[2:nr-11,:])), dtx = copy(vec(dtx[2:nr-11,:])), mil = copy(vec(mil[:,:]))); return df end inputDataset = build(g870); for i in [rt50, t8h203, a530875 ] inputDataset = vcat(inputDataset, build(i)) end

#### Estimating the Rust Model

m = RustModel() rd = RustData() rd.endog = inputDataset[1] rd.exog = inputDataset[2] pi_1 = sum( [x == 0 for x in inputDataset[3] ] )/length(inputDataset[3]) pi_2 = sum( [x == 1 for x in inputDataset[3] ] )/length(inputDataset[3]) pi_3 = 1 - pi_1 - pi_2 m.pi = [ pi_1, pi_2, pi_3 ] @time ll( m, rd )

218.624701 seconds (2.45 G allocations: 217.564 GB, 8.66% gc time) Results of Optimization Algorithm * Algorithm: Nelder-Mead * Starting Point: [0.01,4.0] * Minimizer: [2.6274875140792378,9.758217072326453] * Minimum: 3.002501e+02 * Iterations: 1000 * Convergence: false * √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: false * Reached Maximum Number of Iterations: true * Objective Function Calls: 1017

#### Calculating Standard Errors

The derivation of the standard errors depends on your choice of , so I will only derive them for the linear case. The variance of a maximum likelihood estimator is the inverse of the information matrix, where the information matrix is given by

We must therefore compute this matrix in order to get our standard errors. Actually, we can make a simplification by noting that the information identity requires the covariance of the scores is equal to the negative of the Hessian at the true parameter values (see Train 2009 for a derivation of this property). The observations score is the derivative of the log choice probability and given our functional form above, it is straightforward to see that

(remember that ). For both it is easy to derive , so it remains to calculate . We can calculate this derivative using the implicit function theorem, which states that if then there exists such that

In our case the operator is (because the fixed point requires ). Rust calls the derivative of $latex(I – T)$ with respect to , and I follow his notation. Applying the theorem shows that

where is the partial derivative with respect to . We have an expression for and so we find analytic expressions for each piece and combine them. Remember that and in matrix notation we can write that as

Let . Then the derivative of this expression with respect to is given by

If you stare at the above matrix you will notice that the rows sum to , i.e. it is a transition probability matrix. This makes computing it easier, as columns through are just the element-wise product of the transition probability matrix and the choice probability vector. The first column is then just one minus the sum of the other columns.

The final piece we need is , which is straightforward to compute. Remember that so has components

Combining these expressions into a single matrix we can calculate the score for each observation and calculate their covariance to estimate the negative Hessian. The square-root of the diagonal elements of the inverse of the Hessian are our standard errors.

m.params = ot.minimizer tmpP = choice_p(m) byOb = tmpP[rd.exog,:] T = length(rd.exog) tmpT2 = transition_probs(m)[:,2:end] .* tmpP[2:end,:]' dR = -(1-transition_probs(m) * tmpP) dTheta = -(transition_probs(m)*(1:1:90)*-.001) .* tmpP dEV = inv(eye(m.K) - m.beta * hcat(1 - sum(tmpT2,2), tmpT2)) * hcat(dTheta, dR) # Derivative of utility with respect to parameters tmp = -(1 - byOb .* (rd.endog .== 0) .- (1-byOb) .* (rd.endog .== 1)) score = hcat( -.001*rd.exog, -ones(T, 1)).*tmp # Add the derivative of the difference in Expected Value score .+= m.beta*(dEV[1,:]' .* ones(T,1) .- dEV[rd.exog, :]) .*tmp # Calculate inverse of the covariance H = inv(score'score) se = sqrt(diag(H)) hcat( ot.minimizer, se)

2×2 Array{Float64,2}: 2.62749 0.616073 9.75822 1.22672

When replacement is chosen, why do you take only the first element of EV (EV[1])?

LikeLike

When the agent chooses to replace the engine, the state is reset to 0 and the expected value of being in state 0 is given by the first element of EV. So the agent is chooses between two values when in state x: u(0,x) + beta*EV(0, x) (the value of not replacing), and u(1,0) + beta*EV(1,0) (the value of replacing). Because EV(1,x) = EV(0, 0), we can represent EV by a single vector (rather than a vector for each choice), and just use the first value of EV whenever the decision to replace is made.

LikeLike

Thank you very much

LikeLike

Can I make you another question? In order to sample the first state, we usually use a stationary distribution, to which we arrive through the transition probabilities. However, in this case, the stationary probabilities are just (0, 0, …, 0, 1). So, when you simulate, do you just assume that the realized states start from x0?

LikeLike

I think it makes more sense to start from the stationary distribution of the controlled process. The stationary probabilities are (0,0,…,1) because engine mileage always increases for the uncontrolled process, and so the stationary distribution will put all the mass on the upper bound. But if you consider the transitions of the controlled process, the process is almost never at the upper bound. For instance

# Choice probabilities

cp = 1 – choice_p(m);

# Controlled system transition matrix

Π = transition_probs( m ) .* (1-pps) + hcat(ones(500,1), zeros(500, 499)) .* pps;

# Stationary Distribution

statdst = inv((eye(m.K) – Π + ones(m.K,m.K))’)*ones(m.K)

The expected state of this system is ~ 121. However, for many of my applications, I simulate from the last observed state because I am studying how the state evolves from that point. So where I start the simulation typically depends on the problem I’m studying. Hope this answers your questions.

LikeLike

Thank you very much, Mark. I assume that, by pps, you mean cp, and that hcat(ones(500,1), zeros(500, 499)) should be hcat(ones(90,1), zeros(90, 89)), right?

LikeLike

Yep, sorry about that. It should be:

Π = transition_probs( m ) .* (1-cp) + hcat(ones(m.K), zeros(m.K, m.K-1)) .* cp;

Thanks for catching that.

LikeLike

Dear Mark,

hi again. In line 10 of your last code chunk, you have an element-by-element product by the choice probabilities, as Rust does in his manual. Why isn’t it -0.001*transition_probs(m)*(tmpP.*(1:1:90))? This which would result in a vector with the following lines:

p(0|x1)*x1*f(x1|x1,0) + p(0|x2)*x2*f(x2|x1,0) + p(0|x3)*x3*f(x3|x1,0)

p(0|x2)*x2*f(x2|x2,0) + p(0|x3)*x3*f(x3|x2,0) + p(0|x4)*x4*f(x4|x2,0)

.

.

.

Instead, in your and Rust’s code, the vector is

p(0|x1) * [ x1*f(x1|x1,0) + x2*f(x2|x1,0) + x3*f(x3|x1,0) ]

p(0|x2)* [ x2*f(x2|x2,0) + x3*f(x3|x2,0) + x4*f(x4|x2,0) ]

.

.

.

In line 9, I agree with dR, where you have a linear combination of the columns of the transition matrix weighted by the choice probabilities.

Thank you,

Eliza

LikeLike

The code (or the equations) could be wrong. It’s been awhile since I looked at it. Have you tried taking the numerical derivative of the contraction mapping and comparing that result to the analytic derivative? That would be an easy way to check which is correct.

LikeLike

Hello –

Is the code associated with this post available as a direct download somewhere?

Thanks for putting this together.

Matt

LikeLike

Dear Mark.

Thank you for your code sharing.

Except for some minor changes in julia syntax (mostly on elementwise calculation in array), it worked quite well in general.

However, the last section on standard errors is problematic for me.

First, in line 18 for the section, doesn’t one need to multiply *m.beta* given form of v=u+\beta*EV?

Second, I ran the code with just simple changes for elementwise syntax, but it returns much smaller standard errors compared to the original: (0.0121674,0.187222). I have tackled this for a while, however cannot find why such huge differences are made.

LikeLike

Thanks for catching this. Unfortunately, most of the code was written under Julia 0.6, so a lot of the syntax needs to be updated to be compatible with Julia 1.0. I updated the code block for three mistakes. First, in line 16, I wasn’t multiplying the score by (1 – P(d|x)). Second, the derivatives for theta and RC were switching in line 16. Finally, I added in m.beta following your suggestion. Once I made these adjustments I get the correct standard errors. I really appreciate the feedback. Let me know if you find anything else.

LikeLike

Could I ask what does &gt mean in John Rust’s Bus Engine Replacement Data section?

LikeLike

Looks like the webpage converted the “greater than” sign to that string of text. Should be fixed now. Thanks for the heads up.

LikeLike