Working in Linux, in a Windows World

If you’re interested in making the switch, there are several things you should know. I use the most popular flavor of Linux: Ubuntu (currently 16.04 Xenial Xerus) so my advice will be tailored to that distribution, though there will be some degree of generality among distributions. Interestingly, nowadays it seems all the popular OS have “gone back to” the Linux model of getting programs (i.e. through some kind of repository software). So, you’re probably familiar with using an “App Store” or something of the like to get your programs now, even on a computer. This actually used to be one of my criticisms of the “feel” of using an OS like Ubuntu, but now it’s pretty standard. Anyway, the easiest way to get new programs for Ubuntu is to search in “Ubuntu Software,” which is basically their “App Store” program. If what you’re looking for isn’t in that main repository, you can download *.deb files from websites, which are basically like Windows setup files.

For stuff that’s more focused on my work as an ecologist, I often use R. A curiosity of sorts is that the install messages for packages are often more verbose; I’m not sure why. In any case, be prepared to track down dependencies that are often automatically taken care of for you in Windows.

rk4 vs. lsoda Solver Specified in Method

It turns out that the solver really was the issue. Running an ODE w/default “lsoda” produces nothing, whereas running w/rk4 at least produces…something. I’m sure it’s not correct, though, at the very least because the delays aren’t properly defined. Now to try to simplify the model a little to better understand what’s going on.

Classical Runge-Kutta 4th Order Integration

I’ve been using the “rk4” solver, with luck, to solve the system of ODE’s translated from Stella. The original authors of the translation script used this setting, though I’m not entirely sure what it does. Is this the same type of solver that Stella uses? Looking back at the doc, I have flashbacks to my undergrad differential equations course (which was admittedly my favorite math class I ever took). It is a fixed step solver “with less overhead and less functionality (e.g. no interpolation and no events),” i.e. seemingly exactly what I want (deSolve vignette). It is also actually its own function, meaning it can be used outside of ode(), so I’m not sure what the advantage is to calling it within the ODE function.

In any case, using the dede() function with lsoda hasn’t been working out for me, even without any lag. I just wanted to run it to see if it would run, and it’s not producing anything. Now, I’m going to try running ode() w/lsoda (or the default solver) to see if the solver is indeed the problem.

I’m also curious about changing the model to “a compiled function in a dynamically loaded shared library” to potentially speed things up (rk4 section in deSolve vignette). First, perhaps, though, I’ll look around to get to know the model better and simplify it as I see fit.

Using dede() from the deSolve package in R

The problem is, it seems the delay slows things down a lot. I found out the reason is because the lag variables calculated in dede() during integration indeed are calculating the lags of all the state variables, and there’s no way around it. I’m curious if the lag() R function would work while it was integrating, and thus be a simpler solution.

I tried increasing the hmax to Inf but not including any lags in the dede() model. At this point, the dede() is returning blank variable values, and I’m not sure if it’s a consequence of that. So, I took hmax = Inf back out, and we’ll see what happens to run time, and if the correct values return to the output.

Using ode() from the deSolve Package in R

My latest adventures have revolved around solving ordinary differential equations (ODE’s), for the most part, in R. To do this, I’ve been using the ode() function from the deSolve package. In the next step of what I’m trying to do, though, it seems I need dede() for delayed differential equations.

From what I understand, you can pass non-delayed functions to dede(), which is just what I need. I have a system of equations that incorporates both delayed and non-delayed equations, so I think to solve the delayed, the whole thing needs to be in dede().

Part 2: Fixing the Stella R Python Script

Background

Welcome to my continued adventures in translating Stella models to R. To catch you up in case you haven’t been following along (i.e. a high probability that this applies to you) I’ve been retooling the StellaR.py script that I found on the internet. The latest version on the linked site is buggy and old, and written in Python 2. Additionally, I was given such a huge Stella model to work with that it would throw almost every exception in the book! I am almost finished with the new version of the StellaR.py conversion script (upgraded to Python 3), and though it’s not perfect (i.e. surely someone in the future will make some edits and thereby push the versioning forward), I’m happy to say it works: it translates old-version Stella models to R scripts (+1 for the open source movement). It does this by formatting the equation layer to be input into the ode() function in the package deSolve in R.

What I’m Currently Working On

In the old StellaR.py translation script, there was a note that the “delay function should be developed”…well, yes, it should! DELAY() was a function native to Stella, and the concept is really simple: it’s just a delayed version of whatever variable is being input. So, in Stella, let’s say the line is…

DELAY(X,1)

It’s just the variable X delayed by one time step, and the default initial value is 0. Practically in our model, it’s most commonly used to indicate e.g. the previous time step’s value alongside the current value in the system of equations. In an R data frame, this is a piece of cake, done with 1 line of indexing and a fill value at the beginning (I have a line just like it in several of my scripts).

The problem for me right now is conceptualizing how this is done in the context of the ode() function, which I am just at the beginner stage of learning. My big question about ode(), I guess, is “how does it work?” More specifically, is it iterating through time steps and calculating each value of each equation (and related, is that why it’s so slow? the code that runs it should be fast in terms of the languages (C/FORTRAN) they’re written in…)? Is this then generating a data frame as it goes as part of the object that results from the ode() function, or is it doing all of this “under the hood,” and if so, what hope do I have to index these variables as they’re being written? It feels like it should be an easy answer, and hopefully it will be, but it also seems like I’ll need to be able to access the values, and the resulting data frame, as it’s being calculated.

So I guess my biggest question is: can I reference prior values of the equations within/as the ode() is running, or do I need to handle it separately with the dede() function? My current line of thought is to write my own custom DELAY() function, which takes as arguments all of X’s specifications for the ode() function (e.g., the derivative equation, initial value, etc.) and just have it pass all of this to a dede() with the specified lag, returning the variable values at each time step. Here’s a pseudo-code jotting down of an example of my thoughts:

DELAY <- function(model_name,derivative,IV,lag,times) {
      lag_model <- <-function(t,Y,parms,...) {
      Time <<- t
      delayed <- model_name
      ddelayed <- derivative
      }
   dede(y=IV,times=time,func=lag_model)
}

T1_2previous_yeardry0 <- DELAY(T1_yeardry0,dT1_yeardry0,0,2,times)

A followup question though is whether or not this will run “alongside” the same time steps as the ode() function is taking, and thus return the values in step.

Part 1: Fixing the Stella R Python Script

To fix along yourself, first download StellaR.py v. 1.3. First, I’ll start with in-place changes so I can reference the original line numbers. Line 321 needs an added test to see if there’s really anything “to” the model, even if the INIT is found:

if init_match and len(lines[i-1].strip().split()[6:-2]) > 0:

Line 354: I added the R modulus %% to the optlist since in my version of the script, I have already translated it.

In various places in the script, look-behinds are needed:

Line 357: txtline=re.split('(=|\*|>(?!=)|<(?!=)|,|\^|\+|\-|[)]|[(]|[{]|[}]|/|\s+|>=|<=)',txtline)

Also, at various places, a parenthesis is needed after “if” when searching a line. Otherwise, you get variable names with the “if” string in them.

Line 363: if_num=len(re.findall('(if\s+|IF\s+|If\s+|if\(|IF|If)',txtline))
Line 367: conv_temp = re.split('(=|\*|>(?!=)|<(?!=)|,|\^|\+|\-|[)]|[(]|[{]|[}]|/|\s+|>=|<=|%%)', txtline)
Line 395: if (len(re.findall('(if\s+|IF\s+|If\s+|if\(|IF|If)',txtline)) > 0):
Line 420: if (len(re.findall('(if\s+|IF\s+|If\s+|if\(|IF|If)',txtline)) > 0):

There are some errors with temporary variable naming starting at the loop in line 443. The uses of variable names “l” and “c” (line 634 on) don’t work for Python. Search carefully for these occurrences as variable names and replace them down the script. Where “l” occurs in the mentioned loop, I’d recommend replacing it with 2 different variable names. They in fact reference different things and earn different names. I get that maybe “parts” isn’t important and can be overwritten, but for the sake of clarity, let’s keep it for now.

parts=lines[ln].strip().split() #splits the line into a list of strings by spaces
var=re.split('([)]|[(]|[{]|[}]|\s+)',parts[0])[0] #pulls the key out of the string

The method used to check for an installed package seems deprecated, so I just changed this line to loading the package, with a comment note to install it if it’s missing (this should be basic knowledge of anyone wanting to use an R script).

Line 495: ff.writelines(["library(deSolve) #If you don't have the package 'deSolve' installed, install it.\n"])

 

Before, this line didn’t check for the possibility that the “xname” variable referenced wasn’t itself data.

Line 523: if (convertors[conv].Data.xname not in convT2 and convertors[conv].Data.xname in convlist and not convertors[(convertors[conv].Data.xname)].isData)
Line 620: ff.writelines(["\t return(list(c(d", EqsL[0]])
Line 622: ff.writelines([")))\n}"])

From here, I can’t really easily reference line numbers from the original script here because I changed blank line spacing/formatting. Yet, the whole script will be posted, so this is just an explanation of the changes. I only kept 1 of the “explore” functions since they were redundant, and updated it to be more efficient.

def convExplore(lines, convname):
    i = [line for line,s in enumerate(lines) if s.startswith(convname + " =")][0]
    convEq = lines[i].split("=",1)[1].strip()
    tempc=i+1
    multilineconv=True
    while (multilineconv):
       if tempc == len(lines): multilineconv=False
       elif ('=' not in lines[tempc].strip().split() or re.search("(ELSE|else)$",lines[tempc-1].strip()) or re.search("^else",lines[tempc].strip()) or re.search("^then",lines[tempc].strip())):
          convEq = convEq + ' ' + ' '.join(lines[tempc].strip().split())
          tempc+=1
       else: multilineconv=False
    convEq=re.sub('{.*}', '', convEq).strip()
    return (convEq)

I deleted “single_string()” because join basically does the same thing. I rewrote the “if_extract” function, and deleted the “convWrite” function.

def if_extract(stella_var,conv):
   stella = stella_var
   txtline = str(stella[conv].Eq)
   if stella[conv].isIf:
      pattern = '(if\s+|IF\s+|If\s+|if\(|IF|If)'
      if_pattern = re.compile(pattern)
      function_pattern = '(\^|\*|DELAY|exp|min|max|mean|sum|abs|sin|cos|tan|log10|sqrt|round|log|atan|acos|floor)\(.*\)'
      protectedstrings = []
      for m in re.finditer(function_pattern,txtline):
         if m.group().count("(") < m.group().count(")"):
            endpos = [m.start() for m in re.finditer('\)', m.group())][m.group().count("(")-m.group().count(")")] + m.start()
         else: endpos = m.group().rfind(")") + m.start() + 1
         startpos = m.group().find("(") + m.start()
         protectedstrings.append(startpos)
         protectedstrings.append(endpos)
      protectedstrings.insert(0,0)
      if protectedstrings[-1] == len(txtline):
         parts = [txtline[i:j] for i,j in zip(protectedstrings, protectedstrings[1:])]
      else: parts = [txtline[i:j] for i,j in zip(protectedstrings, protectedstrings[1:]+[None])] 
      for i in range(0, len(parts), 2):
         if ('(' in parts[i]): parts[i] = parts[i].replace('(', ' ')
         if (')' in parts[i]): parts[i] = parts[i].replace(')', ' ')
      txtline = ''.join(parts)
      ifs = if_pattern.finditer(txtline)
      ifspos = []
      for y in ifs: ifspos.append(y.start())
      for ifst in reversed(ifspos):
         if_clause = re.match('(if\s+|IF\s+|If\s+|if(?!else)|IF|If)(?P<ifclause>.*?)\s+(then\s+|THEN\s+|Then\s+|then|THEN|Then)', txtline[ifst:])
         ifclause = if_clause.group('ifclause')
         then_clause = re.search('(then\s+|THEN\s+|Then\s+|then|THEN|Then)(?P<thenclause>.*?)\s*((?<!if)else\s+|ELSE\s+|Else\s+|(?<!if)else|ELSE|Else)', txtline[ifst:])
         thenclause = then_clause.group('thenclause')
         if len(re.split('((?<!if)else\s+|ELSE\s+|Else\s+|(?<!if)else|ELSE|Else)',txtline[ifst:])) > 2:
            elseclause = re.split('((?<!if)else\s+|ELSE\s+|Else\s+|(?<!if)else|ELSE|Else)',txtline[ifst:])[2].strip()
         else: elseclause = None
         ifelse = "ifelse(" + ifclause + "," + thenclause + "," + elseclause + ")" + ' '.join(re.split('((?<!if)else\s+|ELSE\s+|Else\s+|(?<!if)else|ELSE|Else)',txtline[ifst:])[3:])
         txtline = txtline[:ifst] + ifelse
      txtline = re.sub('(?<!<|>)=', '==', txtline) 
      if (' OR ' in txtline): txtline = txtline.replace(' OR ', ' | ')
      if (' or ' in txtline): txtline = txtline.replace(' or ', ' | ')
      if (' AND ' in txtline): txtline = txtline.replace(' AND ', ' & ')
      if (' and ' in txtline): txtline = txtline.replace(' and ', ' & ')
   return(txtline)

I deleted all the “function” functions. I wrote my own function to broaden the criteria to find meaningless lines.

def special_match(strg, search=re.compile(r'^(?:(?:.*[^A-Za-z0-9:()_\s\\])|(?:THEN|ELSE|then|else)).*$').search):
   return not bool(search(strg))

Accordingly, I deleted the “WN” line, and changed the test in the first loop to…

lines[ln] = lines[ln].expandtabs()
if special_match(lines[ln]) and not re.search("ELSE$",lines[ln-1].strip()): lines.remove(lines[ln])

Since at this point, the script is already looping over all lines in the file, I just did in-place line changes in lieu of all the “function” functions. Also, % isn’t a special character in Stella, so it was popping up in variable names, etc. In my equation layer, I had arrays, but in this case, I didn’t see much difference between the way an array was being used vs. just an ordinary variable. So, that’s why the brackets are being changed/replaced.

lines[ln]=lines[ln].replace('%', 'percent')
lines[ln]=lines[ln].replace('[', '_')
lines[ln]=lines[ln].replace(']', '')
lines[ln]=lines[ln].replace(' MOD ', ' %% ')
lines[ln]=lines[ln].replace('delay(', 'DELAY(')
lines[ln]=lines[ln].replace(' EXP', ' exp')
lines[ln]=lines[ln].replace(' MIN', ' min')
lines[ln]=lines[ln].replace(' MAX', ' max')
lines[ln]=lines[ln].replace(' MEAN', ' mean')
lines[ln]=lines[ln].replace(' SUM', ' sum')
lines[ln]=lines[ln].replace(' ABS', ' abs')
lines[ln]=lines[ln].replace(' SIN', ' sin')
lines[ln]=lines[ln].replace(' COS', ' cos')
lines[ln]=lines[ln].replace('TAN', 'tan')
lines[ln]=lines[ln].replace('LOG10', 'log10')
lines[ln]=lines[ln].replace(' SQRT', ' sqrt')
lines[ln]=lines[ln].replace(' ROUND', ' round')
lines[ln]=lines[ln].replace(' LOGN', ' log')
lines[ln]=lines[ln].replace(' ARCTAN', ' atan')
lines[ln]=lines[ln].replace(' ARCCOS', ' acos')
lines[ln]=lines[ln].replace('TIME', 't')
lines[ln]=lines[ln].replace('(time', '(t')
lines[ln]=lines[ln].replace(' PI', ' pi')
lines[ln]=lines[ln].replace(' INT', ' floor')
lines[ln]=re.sub(r'\(0\-([a-zA-Z0-9_]+)\)<0', r'\1>0', lines[ln])

At the end of that block is somewhat of a stylistic preference: the authors were testing if 0 – x was less than 0, so I just changed it to be if x > 0.  According to my in-place line changes above, I made a list called…

supported_func=['DELAY','delay','exp','min','max','mean','sum','abs','sin','cos','tan','log10','sqrt','round','log','atan','acos','t','pi','floor','dt']

I make this for use in a restricted word list, to test what’s in the equation. In the equation layer I was given, the previous authors multiplied by 0 to “turn off” flows. So when going through the flows, I added…

if "0*" in txtline:
 init_position = txtline.find("0*")
 txtline = txtline.replace(txtline[init_position:], "0")

…to just change anything multiplied by 0 to 0. When finding a function in the line, I updated the format so that it would be anchored to the parenthesis.

if ((conv_temp[i]) in supported_func):
   flows[fl].hasFunction=True
   Rformatting = []
   for x in re.split('(\()',flows[fl].Eq): Rformatting.append(x.strip())
   flows[fl].Eq = ''.join(Rformatting)
conv_temp = re.split('=|\*|>|<|,|\+|\^|\-|\)|\(|[{]|[}]|/|\s+|%%', convertors[conv].Eq)
restricted_words = ['if','If','IF','AND','and','THEN','then','ELSE','else','OR','','or'] + supported_func
if ((conv_temp[i] not in restricted_words) and (not is_number(conv_temp[i]))):
   if (conv_temp[i] not in list(convertors.keys()) and conv_temp[i] not in list(models.keys()) and conv_temp[i] not in list(flows.keys())):  convertors[conv_temp[i]]=convertor(conv_temp[i])

I changed that block of text for each loop where the convertors are analyzed. When determining what will go into the “additional lines”…

var=re.split('([)]|[(]|[{]|[}]|\s+|\*)', parts[0])[0]
 special_words = ['INIT','','THEN','then','if','IF','else','ELSE'] + supported_func + list(flows.keys()) + list(convertors.keys()) + list(models.keys())
 if (var not in special_words and not lines[ln].startswith(var + '(t)')):

As mentioned above, in the data write out loop, I had to add some complexity in case the variable it references is itself data:

if (convertors[conv].Data.xname not in convT2 and convertors[conv].Data.xname in convlist and not convertors[(convertors[conv].Data.xname)].isData):
   convT2.append(convertors[conv].Data.xname)
   ff.writelines(["\t", convertors[conv].Data.xname, " <- ", convertors[convertors[conv].Data.xname].Eq, "\n"])
elif (convertors[conv].Data.xname not in convT2 and convertors[conv].Data.xname in convlist and convertors[(convertors[conv].Data.xname)].isData):
   ff.writelines(["\t", convertors[conv].Data.xname, " <- inputData(", convertors[convertors[conv].Data.xname].Data.xname, ", '", convertors[conv].Data.xname, "')\n"])

Then, I overhauled the part that writes out the convertors and flows.

convsflows = convlist + flowlist
while convsflows:
   for fl in convsflows:
   if fl in flowlist: dependents = flows[fl].in_flows + flows[fl].in_convertors
   elif fl in convlist: dependents = convertors[fl].in_convertors
   flowWrite=False
   if len([i for i in dependents if i in initialized]) < len(dependents) and [i for i in dependents if i in convsflows]: flowWrite=False
   else: flowWrite = True
   if (flowWrite):
      if fl in flowlist:
         if flows[fl].isIf: flows[fl].Eq = if_extract(flows,fl)
         converted = flows[fl]
      elif fl in convlist:
         if convertors[fl].isIf: convertors[fl].Eq = if_extract(convertors,fl)
         converted = convertors[fl]
      ff.writelines(["\t", fl, " <- ", str(converted.Eq), "\n"])
      initialized.append(fl)
      convsflows.remove(fl)

I had a line change earlier that went along with the fact that I decided to keep parameter names separate, and then join them up…

ff.write('parm_names <- c("' + convs[0] + '"')
for i in range(1, len(convs)): ff.write(',\n"' + convs[i] + '"')
ff.writelines(")\n")
ff.write("names(parms) <- parm_names\n")

This avoids some mess in assigning tricky names. I took the liberty of taking out several functions of the “x_functions.R” file; for instance, the MOD function at the end is just as easily replaced with the %% in line. There were also several others that I thought would warrant translation within the script written out.

Some Lingering Questions: Stella –> R

  1. I converted array variables to just regular variables; I replaced the front bracket with an underscore and removed the closing bracket. Does this have any consequence?
  2. I still may have some order of operations problems since I took parentheses out unless there are functions to be translated within if statements…
  3. Does case not matter  for variable names in Stella equation layers? I see what I assume is the same variable referenced but the case of the first letter varies.
  4. I see repeated equations in the layers. How does this work?