Mike Gerwitz

Free Software Hacker+Activist

aboutsummaryrefslogtreecommitdiffstats
blob: d56bdad8ca5f13c9200a064b6b7f5737dfc68350 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
# Environment access and mutation using regular expressions
#
# Copyright (C) 2018 Mike Gerwitz
#
#  This program is free software: you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation, either version 3 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# This script, when applied recursively, implements a system for
# accessing and manipulating an environment using only a series of regular
# expressions (in the formal sense).  This poses some interesting challenges
# and builds off of previous example scripts (which are noted below).  In
# particular, since regular expressions are context-free, we cannot use
# for backreferences, and so much of the logic here is spent performing
# comparisons (see `cmp.sed').  (Note that sed does support backreferences;
# we do not use them here.)
#
# Consider this example:
#
#   foo=quick
#   bar=fox
#   baz=lazy
#
#   foo  <- the *foo
#   bar  <- brown *bar
#   verb <- jumped
#   subj <- the *baz dog
#   full <- *foo *bar *verb over *subj
#
# Recursively evaluating this until completion will yield the output
#
#   @0
#   foo=the quick
#   bar=brown fox
#   baz=lazy
#   verb=jumped
#   subj=the lazy dog
#   full=the quick brown fox jumped over the lazy dog
#
# along with a couple trailing newlines.
#
# To accomplish this task, we define a series of states that drive each
# regular expression.  The state is represented at the beginning of the
# string, prefixed by a `@'.  Some states take arguments so that they can
# act like functions with continuations, allowing them to be re-used in
# multiple contexts.  To transition to a new state, a regular expression
# replaces the state at the beginning of the string.  Iteration is
# accomplished by creating cycles in this state machine.  We can visualize
# the system roughly as this NFA, where the accepting states are enclosed in
# parenthesis:
#
#     ,-----,--------------------------------------------------,
#    v     v                                                    \
#   @A -> @E -> @PRECMP -> @CMP -> { @*, @_, @=, @! } -> @CLEAN -`
#    \
#     `-> { (@0), (@1) }
#
# This use of state prefixes in each regular expression makes everything
# quite verbose.  We could have accomplished the same thing with
# considerably less verbosity by using sed's block support, but that
# distracts from the fact that all of this can be done using only
# context-free regular expressions.  So we don't.
#
# Another consequence of using state prefixes is that the match prohibits
# multiple replacements within the same state on the same invocation of
# sed.  For example, @CLEAN's expressions do not use the `/g' flag because
# doing so would still only match the beginning of the pattern space once;
# so rather than being able to clean each env line in one shot, we have to
# clean one line at a time.  This could also be mitigated by using sed's
# block support (but again, we won't).  Consequently, each pass does very
# little---this script must be run many times to produce the final
# result.  However, it does have the nice benefit that you can observe the
# program processing step-by-step.
#
# This is intentionally esoteric; please do not ever use this methodology in
# a production system---it is simply the wrong thing to do (unless you're
# working under such heavy constraints that you have no choice, in which
# case it may be better to choose a different system).  I want to say that I
# had fun making this, but the inability to abstract away some fundamental
# verbosity gets quite frustrating.  (Attempts at further abstraction would
# have distracted too much from the implementation in this somewhat small
# example.)  I attempted to mitigate some of this frustration for you (the
# reader) by adding comments below the expressions to loosely document the
# groups, but as you may know (unless your implementation supports e.g. /x):
# writing them is often considerably easier than reading them.  Especially
# the BRE flavor.
#
# To run:
#
#   $ ./animate -c env-dyn.sed input-file
#
# See also `env-fixed.sed', which presents a much simpler implementation if
# the environment has a fixed set of variables.
##


# Read all lines into the pattern space.
:a; N; $!ba

# Exit code 1 means we finished successfully, whereas 2 means we finished in
# error.
/^@0/q1;  /^@1/q2

# If no state is found at the beginning of the string, then this is our
# first pass: initialize, beginning in state @A (assignment).  Always end in
# a newline to simplify regular expressions for assignments.
s/^[^@]/@A\n&/
s/[^\n]$/&\n/



##
# State @A - Validate and Normalize Assignment
##

# If there are no assignments remaining, transition @A->@0 to quit
# successfully the next iteration.
s/^@A\(.*\n\n\)[ \n]*$/@0\1/

# Check that the next assignment is well-formed.  If so, transition @A->@E
# and normalize the form by removing whitespace around the assignment
# operator.
s/^@A\(.*\n\n[a-z]\+\) *<- */@E\1<-/
#     (      1       )

# If we're still in State @A, then the assignment must have been
# syntatically invalid.  Transition to state @1 (to exit in error next
# iteration).
s/^@A/@1/



##
# State @E - Expand Assignment
#
# Expands variable references in the next assignment.  Note how this uses
# @PRECMP for both the expansions and for transitioning to the actual
# assignment to the environment.
##

# If a reference is found, transition @E->@PRECMP with two arguments
# representing the reference replacement (@*) and unknown reference (@_)
# states.
s/^@E\(.*\n\n[^\n]\+\*\([a-z]\+\)\)/@PRECMP \2 @* @_\1/
#     (        1     ^ (    2   ) )
#                       ref name

# If there are no more references (that is, no more `*'s in the assignment
# value), then transition @E->@PRECMP with two arguments representing the
# assignment (@=) and append (@!) states.
s/^@E\(.*\n\n\([a-z]\+\)<-[^\n*]\+\n\)/@PRECMP \2 @= @!\1/
#     (   1   (    2   )      ^      )
#              var name



##
# State @PRECMP(S) - Prepare for Comparison
#
# This state takes a single argument S that is the name of the variable to
# compare against.  This allows it to set up comparisons for multiple uses.
##

# Set up env var line for comparison by prefix it with ``~S'' (where S is
# the argument to @PRECMP), followed by a duplicate of the env var
# name.  This duplicate is necessary since the name will be modified during
# @CMP as part of the comparison process.
s/^\(@PRECMP \([a-z]\+\).*\n\)\([a-z]\+\)=/\1~\2 \3 \3=/
#   (    1    (   2    )     ) (    3   )
#              cmp name         env var

# Transition @PRECMP->@CMP if all environment variables have a comparison
# string before them.
s/^@PRECMP [a-z]\+\([^\n]\+\n\(~[^\n]*\n\)*\n\)/@CMP\1/
#                  (     1    (     2    )    )
#                               env vars



##
# State @CMP(C A)
#
# The comparison state is responsible for performing the actual comparison
# between the values set up by @PRECMP.  It takes two arguments C and A
# that represent the states to transition to for the consequent (if a match
# is successful) or alternative (if no match is found), respectfully.  This
# allows @CMP to be used in multiple contexts, like a function call.
##

# At this point, all values to be compared will be of the form ``~x y''.
# Compare the first character of x and y, replacing with an exclamation
# point at the beginning of the line in the case of a non-match.
#
# For more information on this strategy, see `cmp.sed'.  We limit ourselves
# here to [a-z].
s/^\(@CMP.*\n\)~a[^ ]* [^a][^ ]* /\1!/
s/^\(@CMP.*\n\)~b[^ ]* [^b][^ ]* /\1!/
s/^\(@CMP.*\n\)~c[^ ]* [^c][^ ]* /\1!/
s/^\(@CMP.*\n\)~d[^ ]* [^d][^ ]* /\1!/
s/^\(@CMP.*\n\)~e[^ ]* [^e][^ ]* /\1!/
s/^\(@CMP.*\n\)~f[^ ]* [^f][^ ]* /\1!/
s/^\(@CMP.*\n\)~g[^ ]* [^g][^ ]* /\1!/
s/^\(@CMP.*\n\)~h[^ ]* [^h][^ ]* /\1!/
s/^\(@CMP.*\n\)~i[^ ]* [^i][^ ]* /\1!/
s/^\(@CMP.*\n\)~j[^ ]* [^j][^ ]* /\1!/
s/^\(@CMP.*\n\)~k[^ ]* [^k][^ ]* /\1!/
s/^\(@CMP.*\n\)~l[^ ]* [^l][^ ]* /\1!/
s/^\(@CMP.*\n\)~m[^ ]* [^m][^ ]* /\1!/
s/^\(@CMP.*\n\)~n[^ ]* [^n][^ ]* /\1!/
s/^\(@CMP.*\n\)~o[^ ]* [^o][^ ]* /\1!/
s/^\(@CMP.*\n\)~p[^ ]* [^p][^ ]* /\1!/
s/^\(@CMP.*\n\)~q[^ ]* [^q][^ ]* /\1!/
s/^\(@CMP.*\n\)~r[^ ]* [^r][^ ]* /\1!/
s/^\(@CMP.*\n\)~s[^ ]* [^s][^ ]* /\1!/
s/^\(@CMP.*\n\)~t[^ ]* [^t][^ ]* /\1!/
s/^\(@CMP.*\n\)~u[^ ]* [^u][^ ]* /\1!/
s/^\(@CMP.*\n\)~v[^ ]* [^v][^ ]* /\1!/
s/^\(@CMP.*\n\)~w[^ ]* [^w][^ ]* /\1!/
s/^\(@CMP.*\n\)~x[^ ]* [^x][^ ]* /\1!/
s/^\(@CMP.*\n\)~y[^ ]* [^y][^ ]* /\1!/
s/^\(@CMP.*\n\)~z[^ ]* [^z][^ ]* /\1!/

# If we have not made a negative comparison replacement above, then the
# characters match; discard them to prepare for the next iteration.
s/^\(@CMP.*\n~\).\([^ ]*\) ./\1\2 /

# If we have ``~  '', then that means all characters have been compared and
# a match has been found.  Transition @CMP->C and remove the two trailing
# spaces.  If all are non-matches (all env lines prefixed with `!'),
# transition @CMP->A.  Otherwise, we're not done yet, so do nothing.
s/^@CMP \(@[^ ]\+\)[^\n]*\(.*\n~\)  /\1\2/
#        ( 1 = C  )  A    (   2  )
s/^@CMP [^ ]\+ \(@[^ ]\+\)\(\(\n![^\n]\+\)*\n\n\)/\1\2/
#         C     ( 1 = A  ) ( (     3     )   2  )



##
# State @CLEAN(S)
#
# This state cleans up after @CMP to restore environments to their original
# state.  It takes a single argument S to transition to after cleanup.
##

# Strip any non-letter prefix and extra comparison words off of env
# vars.  Note that we do not check for the empty line separator---a prefix
# check is enough since the assignment form must begin with a letter.
s/^\(@CLEAN.*\n\)[!~]\([a-z]\+ \)*\([a-z]\+=\)/\1\3/
#   (     1     )     (    2    )  (    3    )
#    state + args     extra words   var name=

# If all such prefixes have been stripped, then we're done with
# cleanup.  Transition @CLEAN->S.
s/^@CLEAN \(@[^ \n]\+\)\(\(\n[a-z][^\n]*\)*\n\n\)/\1\2/
#          (   1    ) ( (      3       )   2  )
#          next state     entire environment



##
# State @* - Replace Variable Reference
#
# Replace a variable reference with the value of that variable from the
# environment.
##

# Replace the last reference on the assignment line (which happens to be the
# reference that we matched on previously) with the value of the matched
# variable from the environment (prefixed by `~').  Then transition
# @*->@CLEAN with a single argument to return to @E for further expansion.
s/^@\*\(.*\n~[^=]\+=\([^\n]\+\).*\n\n[^\n]\+\)\*[a-z]\+/@CLEAN @E\1\2/
#      (      1      (   2    )              ) ^
#                     var value



##
# State @_ - Unknown Reference
#
# Replace unknown references with a string of the form "[UNKNOWN REF: name]".
##

# If we're in this state, then the last reference on the assignment line is
# unknown.  Replace it, then transition @_->@E for further expansion.
s/^@_\(.*\n\n[^\n]\+\)\*\([a-z]\+\)/@CLEAN @E\1[UNKNOWN REF: \2]/
#     (       1      ) ^ (    2   )
#                         ref name


##
# State @= (Assignment)
#
# Replace env var value with the text of the assignment and then transition
# @=->@CLEAN, providing a single state argument to transition to state @A
# after cleanup is complete.
##

s/^@=\(.*\n\)~\([^=]*\)=[^\n]*\(.*\n\n\)[^<]*<-\([^\n]*\)\n/@CLEAN @A\1\2=\4\3/
#     (  1  )  (   2  )        (   3   )        (   4   )
#                 var                            new val


##
# State @! (Non-Match Append)
#
# This state indicates that a match for an environment variable was not
# found and that it should be appended to the environment.
##

# Reformat the ``var<-val'' form as ``var=val'' and add it to the bottom of
# the environment.  The assignment line is removed.  Then, transition
# @!->@CLEAN with a single argument to transition back to @A after cleanup
# is complete.
s/^@!\(.*\)\n\n\([a-z]\+\)<-\([^\n]\+\)\n/@CLEAN @A\1\n\2=\3\n\n/
#     ( 1 )     ( 2 = var)   ( 3 = val)