Mike Gerwitz

Activist for User Freedom

Commit message / Author / Age / Files / Lines
* tamer: xir::parse::ele: Introduce sum nonterminalsmainMike Gerwitz2022-07-144-10/+369
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the nonterminal and `A ... Z` are the inner nonterminals---it produces a parser that provides a choice between a set of nonterminals. This is implemented efficiently by understanding the QName that is accepted by each of the inner nonterminals and delegating that token immediately to the appropriate parser. This is a benefit of using a parser generator macro over parser combinators---we do not need to implement backtracking by letting inner parsers fail, because we know ahead of time exactly what parser we need. This _does not_ verify that each of the inner parsers accept a unique QName; maybe at a later time I can figure out something for that. However, because this compiles into a `match`, there is no ambiguity---like a PEG parser, there is precedence in the face of an ambiguous token, and the first one wins. Consequently, tests would surely fail, since the latter wouldn't be able to be parsed. This also demonstrates how we can have good error suggestions for this parsing framework: because the inner nonterminals and their QNames are known at compile time, error messages simply generate a list of QNames that are expected. The error recovery strategy is the same as previously noted, and subject to the same concerns, though it may be more appropriate here: it is desirable for the inner parser to fail rather than retrying, so that the sum parser is able to fail and, once the Kleene operator is introduced, retry on another potential element. But again, that recovery strategy may happen to work in some cases, but'll fail miserably in others (e.g. placing an unknown element at the head of a block that expects a sequence of elements would potentially fail the entire block rather than just the invalid one). But more to come on that later; it's not critical at this point. I need to get parsing completed for TAME's input language. 
DEV-7145
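The dispatch described above can be sketched roughly as follows. This is not TAMER's generated code: `Inner`, `UnexpectedEle`, `select_inner`, and the two QName strings are hypothetical names chosen for illustration. It only shows how a `match` on the opening QName selects an inner parser without backtracking, and how the error can list every expected QName because the set is known at compile time.

```rust
/// Inner nonterminals that the sum nonterminal can delegate to.
/// (Hypothetical names, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum Inner {
    Classify,
    Rate,
}

/// Error listing every QName the sum nonterminal would have accepted.
#[derive(Debug, PartialEq, Eq)]
struct UnexpectedEle {
    found: String,
    expected: &'static [&'static str],
}

/// QNames accepted by the inner nonterminals, known at compile time.
const EXPECTED: &[&str] = &["lv:classify", "lv:rate"];

/// Choose the inner parser from the QName of the opening tag.
///
/// Arms are tried in order, so if two inner nonterminals accepted the
/// same QName, the first would win and the second would never run.
fn select_inner(qname: &str) -> Result<Inner, UnexpectedEle> {
    match qname {
        "lv:classify" => Ok(Inner::Classify),
        "lv:rate" => Ok(Inner::Rate),
        other => Err(UnexpectedEle {
            found: other.to_string(),
            expected: EXPECTED,
        }),
    }
}
```

Calling `select_inner("lv:rate")` selects the `Rate` parser immediately, while an unknown QName yields an error carrying the full list of expected names.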
* tamer: xir::parse::ele: Introduce open/close span bindingsMike Gerwitz2022-07-134-48/+113
| | | | | | | | | | | | | | This adds the ability to bind identifiers to represent `OpenSpan` and `CloseSpan`, available to the `@` and `/` maps. Since identifiers in TAME originate from attributes, this may not get a whole lot of use, but it's important to be available. There is some awkwardness in that the opening span appears to be scoped to the entire nonterminal, but it's actually only available in the `@` mapping. I'll change this if it's actually needed; this keeps things simple for now. DEV-7145
* tamer: xir::parse::ele: Initial Close mapping supportMike Gerwitz2022-07-133-21/+86
| | | | | | | | | | Since the parsers produce streaming IRs, we need to be able to emit tokens representing closing delimiters, where they are important. This notably doesn't use spans; I'll add those next, since they're also needed for the previous work. DEV-7145
* tamer: xir::parse::ele::test: TODO regarding recovery strategyMike Gerwitz2022-07-131-0/+7
| | | | | | | | | | | The comment explains the issue. I don't think the strategy is going to be a desirable one, but I want to move on and observe in retrospect how it ought to be handled. The important part right now is that recovery is accounted for and possible, which was a long-standing concern. DEV-7145
* tamer: xir::parse::ele: Initial element parser generator conceptMike Gerwitz2022-07-1311-32/+922
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This begins generating parsers that are capable of parsing elements. I need to move on, so this abstraction isn't going to go as far as it could, but let's see where it takes me. This was the work that required the recent lookahead changes, which has been detailed in previous commits. This initial support is basic, but robust. It supports parsing elements with attributes and children, but it does not yet support the equivalent of the Kleene star (`*`). Such support will likely be added by supporting parsers that are able to recurse on their own definition in tail position, which will also require supporting parsers that do not add to the stack. This generates parsers that, like all the other parsers, use enums to provide a typed stack. Stitched parsers produce a nested stack that is always bounded in size. Fortunately, expressions---which can nest deeply---do not need to maintain ancestor context on the stack, and so this should work fine; we can get away with this because XIRF ensures proper nesting for us. Statements that _do_ need to maintain such context are not nested. This also does not yet support emitting an object on closing tag, which will be necessary for NIR, which will be a streaming IR that is "near" to the source XML in structure. This will then be used to lower into AIR for the ASG, which gives structure needed for further analysis. More information to come; I just want to get this committed to serve as a mental synchronization point and clear my head, since I've been sitting on these changes for so long and have to keep stashing them as I tumble down rabbit holes covered in yak hair. DEV-7145
* tamer: parse::transition::Lookahead: ParseState=>Token type paramMike Gerwitz2022-07-133-12/+9
| | | | | | | | | | Having the lookahead token generic over the `ParseState` was a pain in the ass for stitching, since they shared the same token type but not the same parser. I don't expect there to be any need to be able to infer other parser-related types for a token of lookahead, so I'd rather just make my life easier until such a thing is needed. DEV-7145
* tamer: Replace ParseStatus::Dead with generic lookaheadMike Gerwitz2022-07-1212-327/+404
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). 
This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
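As a rough illustration of a single token of lookahead owned by the driver rather than by the state machine, consider the sketch below. The types `Tok`, `Obj`, `State`, and `Transition` and the functions `step` and `parse_all` are simplified assumptions and are not TAMER's `Parser`, `ParseState`, or `TransitionResult`. The point is only that a state transition may hand a token back, and the single toplevel loop feeds it in again before reading more input, so the state itself never recurses.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Open,
    Attr(u8),
    Close,
}

#[derive(Debug, PartialEq, Eq)]
enum Obj {
    /// Aggregated attribute values.
    Attrs(Vec<u8>),
    /// Any token seen after attribute aggregation completed.
    Node(Tok),
}

enum State {
    Ready,
    Attrs(Vec<u8>),
    Children,
}

/// Result of one transition: the next state, an optional object to
/// emit, and an optional token of lookahead to be fed again.
struct Transition {
    state: State,
    obj: Option<Obj>,
    lookahead: Option<Tok>,
}

fn step(state: State, tok: Tok) -> Transition {
    match (state, tok) {
        (State::Ready, Tok::Open) => Transition {
            state: State::Attrs(Vec::new()),
            obj: None,
            lookahead: None,
        },
        (State::Attrs(mut acc), Tok::Attr(v)) => {
            acc.push(v);
            Transition { state: State::Attrs(acc), obj: None, lookahead: None }
        }
        // Aggregation is complete only once a non-attribute token is
        // seen; emit the aggregate and hand the token back unconsumed.
        (State::Attrs(acc), other) => Transition {
            state: State::Children,
            obj: Some(Obj::Attrs(acc)),
            lookahead: Some(other),
        },
        (State::Children, tok) => Transition {
            state: State::Children,
            obj: Some(Obj::Node(tok)),
            lookahead: None,
        },
        // Ignore anything before the first opening tag.
        (State::Ready, _) => Transition { state: State::Ready, obj: None, lookahead: None },
    }
}

/// The only loop: prefer the pending lookahead token over new input.
fn parse_all(toks: impl IntoIterator<Item = Tok>) -> Vec<Obj> {
    let mut toks = toks.into_iter();
    let mut state = State::Ready;
    let mut lookahead: Option<Tok> = None;
    let mut out = Vec::new();

    while let Some(tok) = lookahead.take().or_else(|| toks.next()) {
        let t = step(state, tok);
        state = t.state;
        lookahead = t.lookahead;
        out.extend(t.obj);
    }
    out
}
```

A state that returned lookahead on every transition without changing state would loop forever in this driver, which mirrors the infinite-recursion concern raised in the message. The sketch also drops an aggregate that is still pending when input ends; a real parser would need to handle that case.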
* tamer: parse::state::transition: Extract module into own fileMike Gerwitz2022-07-072-238/+251
| | | | | | | That's it. Just preparing for changes that will alter how lookaheads and dead state transitions work. DEV-7145
* tamer: parse: Introduce lookahaed token in ParserMike Gerwitz2022-07-074-17/+512
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | *NB: This is the initial change to introduce the token of lookahead, but this does not fully integrate it. In particular, this is missing from the stitching/delegation layer.* This has been a long time coming, I suppose, though I had tried to avoid it with `Parser::delegate_lookahead`. But the problem with doing that is that it forced the ParserState to recurse, which both violates my rule that there be no looping constructs except for the toplevel, and performs additional stack allocation as it is not in tail position. The final straw was having to return both an error _and_ an aggregate object for the attribute parser when an unexpected element is encountered (this code is not yet committed). One option was to add a recovery object to the error object, and formalize that, but then we have other concerns; for example, what if that recovery object triggered an error? We'd have to mask either the old or the new error. But we wouldn't want to mask either, because the object causing the error would be the aggregate attributes, which is _not_ a recovery object, but actual data we want to emit. And so it's a kluge right off of the bat. The use of a token of lookahead is a more traditional approach and has uses outside of just this one scenario. It'll also allow for the removal of recursion from the existing ParserStates, and possibly the elimination of dead state associated data, though I may end up leaving that; more to come. Rust will also optimize away lookahead storage and processing in Parsers that do not utilize it. DEV-7145
* tamer: Ensure debug_assert! takes effect in test profileMike Gerwitz2022-07-051-0/+15
| | | | | | | | | I'd feel rather silly if I used `debug_assert!` for the sake of tests and they weren't actually being run due to optimization settings. This is just to catch potential future regressions; all is well today. DEV-7145
* tamer: parse::state::TransitionResult: Make opaqueMike Gerwitz2022-07-052-18/+21
| | | | | | | | There was only one test outside of the `parse` module using these fields. The next commit will be introducing lookahead, and I do not want to have to trust callers to ensure invariants are met. DEV-7145
* Revert "tamer: xir: Initial re-introduction of AttrEnd"Mike Gerwitz2022-06-296-191/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b973d36862a4a2aaf53fb0b25fba01b57e5a7463. Alright, I'm getting sick of fighting with myself on this. But rather than just removing the last commit, I'm going to keep it around, so that my thoughts are clearly documented for my future quarrels with myself. Firstly: this added more overhead than I wanted it to. While it wasn't significant, it did add 100--150ms to one of our largest systems, up from ~2.8s, which seems a bit much for a token that's really just meant to make life easier for the parser. Further, it seems that all I've managed to do is push my original problem to a different layer---this started as a means to resolve having to emit both an object and an error simultaneously in the case where aggregate attribute parsing has completed, but we encounter an error on the next token (e.g. an unexpected element). But XIRF, if it's missing AttrEnd, should throw an error, but should also recover. Recovery is easy---just assume that it was present---_but then we don't emit a XIRF `AttrEnd` token_, which is necessary for downstream systems. So we'd need to either: (a) emit both a token and an error; or (b) panic. But if we're doing (a), then the need for `AttrEnd` goes away, because it solves the original problem (though the other concerns of the previous commit still stand). (b) is not ideal at all, even though the missing token does represent an internal system error; it's not something the user can correct. But, given that it's something that the user cannot correct, doesn't that imply that it's an awkward thing to include in the token stream? So back to `AttrEnd` being an awkward PITA to have. 
So, given (a), I'll just do that: errors will become more of a "hey, this error just occurred, but I'm trying to recover---here's an object that you should use if you choose to continue parsing, but it may or may not be what you're looking for; proceed with caution". That flips the original script: I imagined having external systems feed recovery tokens, but this encapsulates recovery within the parser, which really is more appropriate, though less flexible than having an omniscient external recovery system; such a monolith was always an awkward concept and would be difficult to implement cleanly. This can also potentially be implemented as a generalization of the Dead state change that allowed an object to be emitted alongside the lookahead/error. Anyway, back to where I was...I'm sure I'll look back on this in the future shaking my head, reflecting on how naive I was. DEV-7145
* tamer: xir: Initial re-introduction of AttrEndMike Gerwitz2022-06-296-40/+191
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AttrEnd was initially removed in 0cc0bc9d5a92e666e4ec8319f6bd29c35cc331a8 (and the commit prior), because there was not a compelling reason to use it over a lookahead operation (returning a token via a dead state transition); `AttrEnd` simply introduced inconsistencies between the XIR reader (which produced AttrEnd) and internal XIR stream generators (e.g. the lowering operations into XIR->XML, which do not). But now that parsers are performing aggregation---in particular the attribute parser-generator `xir::parse::attr`---this has become quite a pain, because the dead state is an actionable token. For example: 1. Open 2. Attr 3. Attr 4. Open 5. ... In the happy case, token #4 results in `Parsed::Incomplete`, and so can just be transformed into the object representing the aggregated attributes. But even in this happy path, it's ugly, and it requires non-tail recursion on the parser which requires a duplicate stack allocation for the `ParserState`. That violates a core principle of the system. But if there is an error at #4---e.g. an unexpected element---then we no longer have a `Parsed::Incomplete` to hijack for our own uses, and we'd have to introduce the ability to return both an error and a token, or we'd have to introduce the ability to keep a token of lookahead instead of reading from the underlying token stream, but that's complicated with push parsers, which are used for parser composition. Yikes. And furthermore, the aggregation has caused me to introduce the ability to override the dead state type to introduce both a token of lookahead and aggregation information. This complicates the system and is going to be confusing to others. 
Given all of this, AttrEnd does now seem appropriate to reintroduce, since it will allow processing of aggregate operations when encountering that token without having to worry about the above scenario; without having to duplicate a `ParseState` stack; without having to hijack dead state transitions for producing our aggregate object; and everything else mentioned above. This commit does not modify those abstractions to use AttrEnd yet; it re-introduces the token to the core system, not the parser-generators, and it doesn't yet replace lookahead operations in the parsers that use them. That'll come next. Unlike the commit that removed it, though, we are now generating proper spans, so make note of that here. This also does not introduce the concept to XIRF yet, which did not exist at the time that it was removed, so XIRF is filtering it out until a following commit. DEV-7145
* tamer: Cargo.toml: Remove lazy_staticMike Gerwitz2022-06-242-8/+0
| | | | | | | | | | This is no longer needed after the previous commit, with static spans having been replaced by `const` spans. This used to be required before Rust acquired better const features, and before I had preinterned symbols. DEV-7145
* tamer: xir: Introduce {Ele,Open,Close}SpanMike Gerwitz2022-06-2417-424/+767
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This isn't conceptually all that significant of a change, but there was a lot to modify to get it working. I would generally separate this into a commit for the implementation and another commit for the integration, but I decided to keep things together. This serves a role similar to AttrSpan---this allows deriving a span representing the element name from a span representing the entire XIR token. This will provide more useful context for errors---including the tag delimiter(s) means that we care about the fact that an element is in that position (as opposed to some other type of node) within the context of an error. However, if we are expecting an element but take issue with the element name itself, we want to place emphasis on that instead. This also starts to consider the issue of span contexts---a blob of detached data that is `Span` is useful for error context, but it's not useful for manipulation or deriving additional information. For that, we need to encode additional context, and this is an attempt at that. I am interested in the concept of providing Spans that are guaranteed to actually make sense---that are instantiated and manipulated with APIs that ensure consistency. But such a thing buys us very little, practically speaking, over what I have now for TAMER, and so I don't expect to actually implement that for this project; I'll leave that for a personal project. TAMER's already taken a lot of my personal interests and it can cause me a lot of grief sometimes (with regards to letting my aspirations cause me more work). DEV-7145
* tamer: asg::ident: {prolog=>prologue} typo fixMike Gerwitz2022-06-232-5/+3
| | | | Somewhat humorous.
* tamer: xir::reader: Opening and closing tag whitespaceMike Gerwitz2022-06-222-27/+116
| | | | | | | | | | | | | | | Non-attribute and non-empty start/end tags will have their whitespace as part of the produced span. This sets us up for a following change that will allow for deriving the name span from this span given a QName, which gives us a span that both represents the entire XIR token and allows deriving the element name. An accurate token span is necessary for parsing errors where an element was not expected, while an element name span is more appropriate for issues of grammar and semantic errors that deal not with the fact that an element was encountered, but _what_ element was encountered. DEV-7145
* tamer: xir::reader: Correct empty element whitespace handlingMike Gerwitz2022-06-222-10/+53
| | | | | | | | This both adds clarifying tests and corrects the case of `<foo/>`, where the offset was erroneously off by one---it saw that there were no attributes and added a byte thinking it'd include `>`, as in `<foo>`. DEV-7145
* tamer: xir::parse: Attribute parser generatorMike Gerwitz2022-06-219-10/+871
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first parser generator for the parsing framework. I've been waiting quite a while to do this because I wanted to be sure that I understood how I intended to write the attribute parsers manually. Now that I'm about to start parsing source XML files, it is necessary to have a parser generator. Typically one thinks of a parser generator as a separate program that generates code for some language, but that is not always the case---that represents a lack of expressiveness in the language itself (e.g. C). Here, I simply use Rust's macro system, which should be a concept familiar to someone coming from a language like Lisp. This also resolves where I stand on parser combinators with respect to this abstraction: they both accomplish the exact same thing (composition of smaller parsers), but this abstraction doesn't do so in the typical functional way. But the end result is the same. The parser generated by this abstraction will be optimized and inlined in the same manner as the hand-written parsers. Since they'll be tightly coupled with an element parser (which too will have a parser generator), I expect that most attribute parsers will simply be inlined; they exist as separate parsers conceptually, for the same reason that you'd use parser combinators. It's worth mentioning that this awkward reliance on dead state for a lookahead token to determine when aggregation is complete rubs me the wrong way, but resolving it would involve reintroducing the XIR AttrEnd that I had previously removed. I'll keep fighting with myself on this, but I want to get a bit further before I determine if it's worth the tradeoff of reintroducing (more complex IR but simplified parsing). DEV-7145
* tamer: xir::st: Add missing docs for generated QName constantsMike Gerwitz2022-06-211-0/+14
| | | | | | | | This was missed. It was not possible, using the documentation alone (without looking at the linked source) to tell what the QName actually represented, though you could assume by the name. DEV-7145
* tamer: fmt: New type-based formatting systemMike Gerwitz2022-06-104-0/+555
| | | | | | | | | | | | This is partly an experiment, but is designed to simplify producing English sentences in various contexts. It makes use of a not only unstable, but incomplete, Rust feature---adt_const_params, for a static str const type parameter. Hopefully that ends up being stabilized. This uses types, but it's the same as function composition due to Rust's monomorphization. DEV-7145
* tamer: parse::Parser: Add remaining field docsMike Gerwitz2022-06-071-0/+13
| | | | DEV-7145
* tamer: parse::ParseState: Remove Default trait boundMike Gerwitz2022-06-076-18/+70
| | | | | | | | | | | | | | | | | | | `ParseState` originally required `Default` for use with `mem::take` in `Parser::feed_tok`. This unfortunately cannot last, since more specialized parsers require context during initialization in order to provide useful diagnostic information. (The other option is to require the caller to augment errors with diagnostic information, but that would have to be duplicated by every caller and complicates parser composition; I'd prefer those diagnostic details remain encapsulated.) Replacing `Default` with `Option` is uglier, but it ends up producing the same assembly as `mem::take` did, at least at the time of writing. Because Rust is able to elide unnecessary moves using this implementation, there is no need for `unwrap_unchecked` or other unsafe methods, which is great, since it shows that this parsing methodology is viable entirely in safe Rust. DEV-7145
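The `Option`-based approach can be sketched as below. The names `ParseState`, `Parser`, and `feed_tok` come from the message, but the signatures, the `parse_token` method, and the `Counter` state are simplified assumptions made for illustration, not TAMER's actual API. The state is moved out of the parser, transitioned by value, and moved back in, all in safe Rust and without requiring `Default`.

```rust
/// Simplified stand-in for a parser state that transitions by value.
trait ParseState: Sized {
    type Token;

    /// Hypothetical transition function: consume the state and a
    /// token, producing the next state.
    fn parse_token(self, tok: Self::Token) -> Self;
}

struct Parser<S: ParseState> {
    /// `None` only transiently, while a transition is in progress.
    state: Option<S>,
}

impl<S: ParseState> Parser<S> {
    /// The initial state is supplied by the caller, so it can carry
    /// context that a `Default` implementation could not provide.
    fn new(init: S) -> Self {
        Self { state: Some(init) }
    }

    fn feed_tok(&mut self, tok: S::Token) {
        // Take ownership of the state, leaving `None` behind.
        let state = self.state.take().expect("state is always restored");
        self.state = Some(state.parse_token(tok));
    }

    fn state(&self) -> Option<&S> {
        self.state.as_ref()
    }
}

/// Example state that needs context at initialization.
struct Counter {
    context: String,
    seen: usize,
}

impl ParseState for Counter {
    type Token = char;

    fn parse_token(self, _tok: char) -> Self {
        Counter { seen: self.seen + 1, ..self }
    }
}
```

Constructing `Parser::new(Counter { context: "attrs".to_string(), seen: 0 })` and feeding it two tokens leaves `seen` at 2 with the context intact. Whether the compiler elides the intermediate moves, as the message reports for TAMER, is not something this sketch demonstrates.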
* tamer: parse::state::ParseState::DeadToken: New associated typeMike Gerwitz2022-06-077-31/+85
| | | | | | | | | | | | | | | | | | | | | | | | Previously, `ParseStatus::Dead` always yielded `ParseState::Token`. However, I'm working on introducing parsers that aggregate (parsing XML attributes into structs), and those parsers do not know that they have completed aggregation until they reach a dead state; given that, I need to yield additional information at that time. I played around with a number of alternative ideas, but this ended up being the cleanest, relative to the effort involved. For example, introducing another parameter to `ParseStatus::Dead` was too burdensome on APIs that ought not concern themselves with the possibility of receiving an object in addition to a lookahead token, since many parsers are not capable of doing so (given that they map M:(N<=M)). Another option that I abandoned fairly quickly was having `is_accepting` (potentially renamed) return an aggregate object, since that's on the side and didn't feel like it was part of the parsing pipeline. The intent is to abstract this some in a new `ParseState` method for delegation + aggregation. DEV-7145
* tamer: Consistent span diagram representationMike Gerwitz2022-06-067-67/+72
| | | | | | | | | | | I'll document it more formally eventually, but this settles on a mix of the two: square brackets and dashes for intervals, `+` for intersecting lines, byte offsets below interval endpoints, and names below that. The docblock for `Span` itself is still off; I'll probably just take one of the test cases and paste it there at some point. DEV-7145
* tamer: xir::attr::Attr: Introduce AttrSpanMike Gerwitz2022-06-064-44/+163
| | | | | | | | | | | | | | | | | This replaces a tuple with a tuple struct that allows for calculating more complete span information, such as the span encompassing the entire attribute and the value span including the surrounding quotes. This includes logic that ought to be abstracted into `Span` itself, and it's not as formal as I'd like it to be (e.g. not ensuring context), but this is a good starting point. Note that parsers call `Token::span`, which in turn calculates the attribute span, each time an attribute is encountered during lowering. But Rust does a good job at optimizing away unnecessary operations, so this didn't have an observable impact on time. DEV-7145
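A minimal sketch of deriving the larger spans from a name span and a value span follows. The `Span` layout (byte offset plus length), the field order of `AttrSpan`, and the method names are assumptions for illustration and may differ from TAMER's definitions. The sketch also assumes single-byte quote delimiters and that the value does not start at offset 0.

```rust
/// Assumed span representation: byte offset and length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: usize,
    len: usize,
}

impl Span {
    /// Offset one past the last byte of the span.
    fn end(self) -> usize {
        self.offset + self.len
    }
}

/// Tuple struct pairing the attribute name span with the span of the
/// value (excluding quotes).
struct AttrSpan(Span, Span);

impl AttrSpan {
    /// Value span widened by one byte on each side to include quotes.
    fn value_with_quotes(&self) -> Span {
        Span {
            offset: self.1.offset - 1,
            len: self.1.len + 2,
        }
    }

    /// Span of the whole attribute, from the first byte of the name
    /// through the closing quote.
    fn span(&self) -> Span {
        let end = self.1.end() + 1; // include the closing quote
        Span {
            offset: self.0.offset,
            len: end - self.0.offset,
        }
    }
}
```

For `foo="bar"` starting at offset 0, the name occupies bytes 0 through 2 and the value bytes 5 through 7, so the quoted value starts at offset 4 with length 5 and the whole attribute starts at offset 0 with length 9.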
* tamer: xir::st::qname: New moduleMike Gerwitz2022-06-065-104/+122
| | | | | | This moves and deduplicates the static `QName`s into a common area. DEV-7145
* tamer: xir::flat::{State=>XirToXirf}: RenameMike Gerwitz2022-06-026-94/+96
| | | | | | | Like the previous two commits, this states the intent of this parser, which results in more clear pipeline composition. DEV-7145
* tamer: asg::air::{AirState=>AirAggregate}: RenameMike Gerwitz2022-06-022-14/+16
| | | | | | Like the previous commit, this emphasizes what is happening. DEV-7145
* tamer: obj::xmlo::{lower=>air}: Rename {LowerState=>XmloToAir}Mike Gerwitz2022-06-023-56/+58
| | | | | | | | This provides much more clarity as to what is going on. Further, it's less ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR for writing the actual xmlo files. DEV-7145
* tamer: Integrate xir::reader as a parser in the lowering pipelineMike Gerwitz2022-06-027-161/+260
| | | | | | | | | | | | | | | | | This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145
* tamer: parse: Split into multiple modulesMike Gerwitz2022-06-015-1041/+1193
| | | | | | | | | | | | | | | | | | | This abstraction has grown quite a bit, and it's time to start formalizing it a bit. This split doesn't change any behavior, but it does start to make it easier to reason about by clearly stating the broad components and how they interact with one another. This doesn't yet move the tests; those will come next, but they are very few. The reason I gave previously for this was that (a) they're tested indirectly via the systems that utilize them and (b) the abstraction was not yet settled and the process was already very expensive. No test coverage was lost---it's only that test failures were potentially harder to debug, but in practice not even this was true, because the deeply expressive types all but ensured that, if it compiles, it will function in a way that is expected. Unit tests and documentation for this system will be added once I'm sure that this abstraction is in a proper state. DEV-7145
* tamer: parse: Move parse::lower into LowerMike Gerwitz2022-06-012-59/+62
| | | | | | | | This also modifies `poc` such that `Lower` is invoked as an associated function rather than a method to emphasize the pattern that is forming, so that it can be later abstracted away. DEV-11864
* tamer: parse: Rename {lower_*_while_ok=>lower_*}Mike Gerwitz2022-05-272-6/+6
| | | | | | | | The `while_ok` can just be implied with a lowering operation, and that reduces the name complexity so that we can maybe introduce even more specialized methods without resulting in a huge sentence as a name. DEV-11864
* tamer: Refactor asg_builder into obj::xmlo::lower and asg::airMike Gerwitz2022-05-2712-1200/+1599
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864
* current/compiler/worksheet: Generate lv:package/@nameMike Gerwitz2022-05-262-2/+4
| | | | | | | | | | This is present on all other packages. Rather than complicating TAMER to accommodate a missing name, it's trivial to just add it. This will, unfortunately, invalidate and require rebuilding of all xmlo files, based on the `.rev-xmlo` bump. DEV-11864
* tamer: Add Display impl for each ParseState for generic ParseErrorsMike Gerwitz2022-05-258-41/+209
| | | | | | | | | | | | | This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864
* tamer: parse::LowerIter: Generic inner TripIter iterator (Mike Gerwitz, 2022-05-24, 1 file, -4/+4)
|
| This commit is preparing to compose `LowerIter` directly.
|
| DEV-11864
* tamer: iter::trip: Flatten Result (Mike Gerwitz, 2022-05-20, 7 files, -65/+88)
|
| The `*_iter_while_ok` functions now compose like monads, flattening `Result` at each step and drastically simplifying handling of error types. This also removes the bunch of `?`s at the end of the expression, and allows me to use `?` within the callback itself.
|
| I had originally not used `Result` as the return type of the callback because I was not entirely sure how I was going to use them, but it's now clear that I _always_ use `Result` as the return type, and so there's no use in trying to be too accommodating; it can always change in the future.
|
| This is desirable not just for cleanup, but because trying to refactor `asg_builder` into a pair of `Parser`s is really messy to chain without flattening, especially given some state that has to leak temporarily to the caller. More on that in a future commit.
|
| DEV-11864
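The flattening described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual `iter::trip` API: `iter_while_ok` here is a simplified, hypothetical stand-in that takes an iterator of `Result`s, hands the callback an iterator over the `Ok` values only, and returns a single `Result` even though the callback is itself fallible.

```rust
/// Hypothetical stand-in for an `*_iter_while_ok` function.
/// The callback returns `Result<U, E>`, and so does this function:
/// the two layers of failure are flattened into one.
fn iter_while_ok<T, U, E>(
    items: impl IntoIterator<Item = Result<T, E>>,
    f: impl FnOnce(&mut dyn Iterator<Item = T>) -> Result<U, E>,
) -> Result<U, E> {
    let mut source = items.into_iter();
    let mut first_err: Option<E> = None;

    // Yield `Ok` items until the source is exhausted or an `Err` is seen.
    let mut ok_items = std::iter::from_fn(|| {
        if first_err.is_some() {
            return None;
        }
        match source.next()? {
            Ok(item) => Some(item),
            Err(e) => {
                first_err = Some(e);
                None
            }
        }
    });

    let result = f(&mut ok_items);
    drop(ok_items);

    // An error from the source takes precedence; otherwise the callback's
    // own `Result` is returned as-is, so no nested `Result` is produced.
    match first_err {
        Some(e) => Err(e),
        None => result,
    }
}

/// Example caller: the callback can use `?` directly, and no trailing
/// `?`s are needed to unwrap a nested result.
fn sum_checked(items: Vec<Result<u32, String>>) -> Result<u32, String> {
    iter_while_ok(items, |nums| {
        let mut total = 0u32;
        for n in nums {
            total = total.checked_add(n).ok_or("overflow".to_string())?;
        }
        Ok(total)
    })
}
```

Had the callback been allowed to return an arbitrary `U`, a fallible callback would make the overall type `Result<Result<_, E>, E>`, which is what forces the extra `?`s mentioned in the message.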
* tamer: asg: Hoist Root from Ident into Object (Mike Gerwitz, 2022-05-19, 6 files, -52/+27)
|
| This was always the intent, but I didn't have a higher-level object yet. This removes all the awkwardness that existed with working the root in as an identifier.
|
| DEV-11864
* tamer: asg::Object: Introduce Object::Ident (Mike Gerwitz, 2022-05-19, 6 files, -169/+114)
|
| This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its nodes are of type `Object`. This unfortunately requires runtime type checking. Whether or not that's worth alleviating in the future depends on a lot of different things, since it'll require my own graph implementation, and I have to focus on other things right now. Maybe it'll be worth it in the future.
|
| Note that this also gets rid of some doc examples that simply aren't worth maintaining as the API evolves.
|
| DEV-11864
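The runtime type checking mentioned above can be pictured with a small sketch. The names `Object`, `Ident`, and `Root` come from the commit messages on this page (`Root` from the subsequent commit listed above this one); the fields, the `Graph` stand-in, and the `as_ident` and `ident_name` helpers are assumptions made purely for illustration and are not TAMER's API.

```rust
/// Simplified identifier; the real `Ident` carries far more state.
#[derive(Debug, PartialEq)]
struct Ident {
    name: String,
}

/// Graph nodes are objects, one kind of which is an identifier.
#[derive(Debug, PartialEq)]
enum Object {
    Root,
    Ident(Ident),
}

impl Object {
    /// Runtime check: the node type only says "some object", so
    /// callers must ask whether it is actually an identifier.
    fn as_ident(&self) -> Option<&Ident> {
        match self {
            Self::Ident(ident) => Some(ident),
            _ => None,
        }
    }
}

/// Hypothetical stand-in for the graph: nodes stored by index.
struct Graph {
    nodes: Vec<Object>,
}

impl Graph {
    /// Look up a node and narrow it to an identifier name, if it is one.
    fn ident_name(&self, index: usize) -> Option<&str> {
        self.nodes
            .get(index)?
            .as_ident()
            .map(|ident| ident.name.as_str())
    }
}
```

Every lookup that expects an identifier pays for the `match`; avoiding that would need node indices that encode the object kind in their type, which is presumably why the message ties it to a custom graph implementation.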
* tamer: num: Header typo correction (Mike Gerwitz, 2022-05-19, 1 file, -1/+1)
|
* tamer: asg::Ident{Object=>}: Rename (Mike Gerwitz, 2022-05-19, 9 files, -236/+203)
|
| I think this may have been renamed _from_ `Ident` some time ago, but I'm too lazy to check. In any case, the name is redundant.
|
| DEV-11864
* tamer: asg: Move SymAttrs conversion into asg_builder (Mike Gerwitz, 2022-05-19, 2 files, -57/+56)
|
| This is a lowering operation and does not belong here. What a tangled mess this all was (see recent commits); no wonder it was so confusing.
|
| DEV-11864
* tamer: asg::object: Merge into asg::ident (Mike Gerwitz, 2022-05-19, 4 files, -1593/+1588)
|
| Everything in this file relates to identifiers, and I'm about to introduce a higher-level object, one of which may be an identifier.
|
| DEV-11864
* tamer: obj::xmlo::asg_builder::IdentKindError: Merge into AsgBuilderError (Mike Gerwitz, 2022-05-19, 1 file, -49/+18)
|
| Now that these are in the same module, there's no need for them to be separate from one another.
|
| DEV-11864
* tamer: Move Dim and {Sym=>}Dtype into num module (Mike Gerwitz, 2022-05-19, 15 files, -197/+183)
|
| A previous commit mentioned that there's not a place for `Dim`, and duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in both, and so here's a home for now.
|
| `Dtype` has always been an inappropriate detail for the system and will one day be removed entirely in favor of higher-level types; the machine representation is up to the compiler to decide.
|
| DEV-11864
* tamer: Move SymAttrs lowering into asg_builder (Mike Gerwitz, 2022-05-19, 3 files, -245/+236)
|
| asg_builder is about to be replaced, but in the process of simplifying the destination IR (the ASG), I'm moving things into the proper place.
|
| This never belonged here---it belongs with the actual lowering operation. Previously, this was not reasoned about in terms of a lowering operation, and was written when I was first introducing myself to Rust and trying to get a proof-of-concept linker working.
|
| DEV-11864
* tamer: asg::ident::Dim: Narrow type (Mike Gerwitz, 2022-05-19, 6 files, -97/+62)
|
| This matches `xmlo::Dim`, and could be the same thing if we can find a home for it in the future. It's not worth creating such a home right now when I'm not yet sure what else ought to live there; the duplication may be fine.
|
| The conversion from xmlo needs to be moved, and `Dim` is going to be used for more than just identifiers (expressions will have type inference performed).
|
| DEV-11864
* tamer: parse: Persistent context (Mike Gerwitz, 2022-05-18, 1 file, -8/+112)
|
| This allows retrieving and providing a context to a `Parser`. This is intended for use with an aggregating parser, in particular to construct the ASG and return it.
|
| This is a component of a change that replaces `asg_builder` with a `Parser`-based lowering into the ASG, but there are still changes that need to be made to simplify things and complete its integration.
|
| DEV-11864
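As a rough picture of what providing and retrieving a context might look like for an aggregating parser, consider the sketch below. It does not reproduce TAMER's `Parser` API: `Aggregate`, `Parser::with_context`, `run`, `finalize`, and `parse_all` are all hypothetical names chosen for this example, with `Aggregate` standing in for something like the ASG.

```rust
/// Hypothetical aggregate built up across parses (a stand-in for a
/// structure such as the ASG).
#[derive(Debug, Default, PartialEq)]
struct Aggregate {
    names: Vec<String>,
}

/// Hypothetical parser that owns a context while it runs.
struct Parser<I: Iterator<Item = String>> {
    tokens: I,
    ctx: Aggregate,
}

impl<I: Iterator<Item = String>> Parser<I> {
    /// Provide an existing context rather than starting from scratch.
    fn with_context(tokens: I, ctx: Aggregate) -> Self {
        Self { tokens, ctx }
    }

    /// Consume every token, aggregating into the context.
    fn run(&mut self) {
        for tok in self.tokens.by_ref() {
            self.ctx.names.push(tok);
        }
    }

    /// Retrieve the context once parsing is complete.
    fn finalize(self) -> Aggregate {
        self.ctx
    }
}

/// The context persists across several independent token streams
/// by being threaded out of one parser and into the next.
fn parse_all(sources: Vec<Vec<String>>) -> Aggregate {
    let mut ctx = Aggregate::default();
    for src in sources {
        let mut parser = Parser::with_context(src.into_iter(), ctx);
        parser.run();
        ctx = parser.finalize();
    }
    ctx
}
```

Threading the context by value keeps ownership explicit: only one parser can hold the aggregate at a time, and the caller receives it back when the parser is done.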