Mike Gerwitz

Activist for User Freedom

Commit message | Author | Age | Files | Lines
* tamer: ir::asg::ident: Use symbols in place of string slice mapping | Mike Gerwitz | 2021-09-29 | 1 | -3/+3
| `IdentKind` needs to be written to `xmle` files and displayed in error messages. String slices were used when quick-xml was used for writing, which will be going away with the new writer.
* tamer: Start of XIR-based xmle writer | Mike Gerwitz | 2021-09-28 | 1 | -4/+24
| This has been a long time coming, and has been repeatedly stashed as other parts of the system have evolved to support it. The introduction of the XIR tree was to write tests for this (which are sloppy atm).
|
| This currently writes out the `xmle` header and _most_ of the `l:dep` section; it's missing the object-type-specific attributes. There is, relatively speaking, not much more work to do here.
|
| The feature flag `wip-xir-xmle-writer` was introduced to toggle this system in place of `XmleWriter`.
|
| Initial benchmarks show that it will be competitive with the quick-xml-based writer, but remember that is not the goal: the purpose of this is to test XIR in a production system before we continue to implement it for a frontend, and to refactor so that we do not have multiple implementations writing XML files (once we echo the source XML files).
|
| I'm excited to get this done with so that I can move on. This has been rather exhausting.
* tamer: Remove Ix generalization throughout system | Mike Gerwitz | 2021-09-23 | 1 | -6/+5
| This had the writing on the wall all the same as the `'i` interner lifetime that came before it. It was too much of a maintenance burden trying to accommodate both 16-bit and 32-bit symbols generically.
|
| There is a situation where we do still want 16-bit symbols---the `Span`. Therefore, I have left generic support for symbol sizes, as well as the different global interners, but `SymbolId` now defaults to 32-bit, as does `Asg`. Further, the size parameter has been removed from the rest of the code, with the exception of `Span`.
|
| This cleans things up quite a bit, and is much nicer to work with. If we want 16-bit symbols in the future for packing to increase CPU cache performance, we can handle that situation then in that specific case; it's a premature optimization that's not at all worth the effort here.
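The idea of a size parameter that defaults to 32 bits while a span still opts into 16 bits can be sketched as follows. This is only an illustration: the field layout of `Span` and the `sizes` helper are assumptions made for the example and are not taken from TAMER's source.

```rust
use std::mem::size_of;

/// Symbol identifier with a defaulted index type: writing `SymbolId`
/// with no parameter means `SymbolId<u32>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SymbolId<Ix = u32>(Ix);

/// Hypothetical span packed into 64 bits. The fields are invented for
/// this sketch; only the use of a 16-bit symbol follows the commit text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    offset: u32,
    len: u16,
    ctx: SymbolId<u16>,
}

/// Returns the sizes, in bytes, of the default symbol, the 16-bit
/// symbol, and the span.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<SymbolId>(),
        size_of::<SymbolId<u16>>(),
        size_of::<Span>(),
    )
}
```

Most code can then name plain `SymbolId` and never mention the size, while the one structure that needs tight packing spells out `SymbolId<u16>`.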
* tamer: tameld: Use buffered writes | Mike Gerwitz | 2021-08-20 | 1 | -2/+2
| This was an oversight. The difference is significant. I had my suspicions about this when I noticed the huge difference in time between writing to /dev/null vs. an actual file during profiling.
|
| On one of our systems, here's the number of syscalls _before_ this change:
|
|   $ strace -c target/release/tameld --emit xmle -o foo foo.xmlo
|   % time     seconds  usecs/call     calls    errors syscall
|   ------ ----------- ----------- --------- --------- ----------------
|    85.05    4.966192          16    318473           write
|     7.23    0.421977          13     32298           lstat
|     6.53    0.381424          15     25113           read
|     0.75    0.043691          13      3350           readlink
|     0.25    0.014713          61       241           close
|     0.12    0.007167          30       241           openat
|     0.05    0.003175         151        21           munmap
|     0.01    0.000488          14        35           brk
|     0.01    0.000292           9        33           mmap
|     0.00    0.000266          38         7           mremap
|     0.00    0.000004           1         3           sigaltstack
|     0.00    0.000000           0         6           fstat
|     0.00    0.000000           0         1           poll
|     0.00    0.000000           0        11           mprotect
|     0.00    0.000000           0         7           rt_sigaction
|     0.00    0.000000           0         1           rt_sigprocmask
|     0.00    0.000000           0         6         6 access
|     0.00    0.000000           0         1           execve
|     0.00    0.000000           0         1           arch_prctl
|     0.00    0.000000           0         1           sched_getaffinity
|     0.00    0.000000           0         1           set_tid_address
|     0.00    0.000000           0         1           set_robust_list
|     0.00    0.000000           0         2           prlimit64
|   ------ ----------- ----------- --------- --------- ----------------
|   100.00    5.839389                379854         6 total
|
| And _after_:
|
|   $ strace -c target/release/tameld --emit xmle -o foo foo.xmlo
|   % time     seconds  usecs/call     calls    errors syscall
|   ------ ----------- ----------- --------- --------- ----------------
|    45.21    0.435010          13     32298           lstat
|    40.09    0.385752          15     25113           read
|     6.14    0.059113          21      2809           write
|     4.75    0.045687          14      3350           readlink
|     2.51    0.024115         100       241           close
|     0.84    0.008045          33       241           openat
|     0.26    0.002468         118        21           munmap
|     0.06    0.000580          17        35           brk
|     0.06    0.000566          17        33           mmap
|     0.03    0.000279          40         7           mremap
|     0.02    0.000181          16        11           mprotect
|     0.01    0.000087          15         6         6 access
|     0.01    0.000082          12         7           rt_sigaction
|     0.01    0.000075          13         6           fstat
|     0.00    0.000027           9         3           sigaltstack
|     0.00    0.000024          12         2           prlimit64
|     0.00    0.000018          18         1           execve
|     0.00    0.000016          16         1           poll
|     0.00    0.000013          13         1           sched_getaffinity
|     0.00    0.000012          12         1           rt_sigprocmask
|     0.00    0.000012          12         1           arch_prctl
|     0.00    0.000012          12         1           set_robust_list
|     0.00    0.000011          11         1           set_tid_address
|   ------ ----------- ----------- --------- --------- ----------------
|   100.00    0.962185                 64190         6 total
|
| What a difference! There's still a lot of other red flags in there; those can be addressed separately.
|
| This was originally written as I was learning Rust, and I suspect that I didn't realize that File wasn't buffered at the time.
|
| For the above link: times go from 1.23s pre-change to 0.85s after:
|
|   0.77user 0.44system 0:01.23elapsed 99%CPU (0avgtext+0avgdata 48520maxresident)k
|   0inputs+43952outputs (0major+12825minor)pagefaults 0swaps
|
|   0.69user 0.15system 0:00.85elapsed 98%CPU (0avgtext+0avgdata 48396maxresident)k
|   0inputs+43952outputs (0major+12823minor)pagefaults 0swaps
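The effect described in this commit can be reproduced in miniature. The sketch below is not tameld's code: `CountingSink` is a hypothetical stand-in for an unbuffered `File` that counts calls to `write` (each of which would be a syscall on a real file), and the `<l:sym />` fragment is just a placeholder payload.

```rust
use std::io::{self, BufWriter, Write};

/// Number of `write` calls and total bytes seen by a sink.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct WriteStats {
    calls: usize,
    bytes: usize,
}

/// Hypothetical sink standing in for an unbuffered `File`.
#[derive(Default)]
struct CountingSink {
    stats: WriteStats,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.stats.calls += 1;
        self.stats.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Emit `n` small fragments, much as an XML writer does.
fn emit_fragments<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    for _ in 0..n {
        out.write_all(b"<l:sym />")?;
    }
    out.flush()
}

/// Returns the stats for (unbuffered, buffered) output of `n` fragments.
fn compare_write_calls(n: usize) -> io::Result<(WriteStats, WriteStats)> {
    let mut raw = CountingSink::default();
    emit_fragments(&mut raw, n)?;

    // Wrapping the same kind of sink in `BufWriter` batches small writes
    // into 8 KiB chunks before they reach the underlying sink.
    let mut buffered = BufWriter::with_capacity(8 * 1024, CountingSink::default());
    emit_fragments(&mut buffered, n)?;

    Ok((raw.stats, buffered.get_ref().stats))
}
```

With 1000 fragments the unbuffered sink sees one `write` call per fragment, while the buffered one sees only a handful, which mirrors the drop in `write` syscalls in the strace output above.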
* tamer: Global interners | Mike Gerwitz | 2021-08-11 | 1 | -41/+39
| This is a major change, and I apologize for it all being in one commit. I had wanted to break it up, but doing so would have required a significant amount of temporary work that was not worth doing while I'm the only one working on this project at the moment.
|
| This accomplishes a number of important things, now that I'm preparing to write the first compiler frontend for TAMER:
|
|   1. `Symbol` has been removed; `SymbolId` is used in its place.
|   2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
|   3. Using symbols no longer requires dereferencing.
|   4. **Lifetimes no longer pollute the entire system! (`'i`)**
|   5. Two global interners are offered to produce `SymbolStr` with `'static` lifetimes, simplifying lifetime management and borrowing where strings are still needed.
|   6. A nice API is provided for interning and lookups (e.g. "foo".intern()) which makes this look like a core feature of Rust.
|
| Unfortunately, making this change required modifications to...virtually everything. And that serves to emphasize why this change was needed: _everything_ used symbols, and so there's no use in not providing globals.
|
| I implemented this in a way that still provides for loose coupling through Rust's trait system. Indeed, Rustc offers a global interner, and I decided not to go that route initially because it wasn't clear to me that such a thing was desirable. It didn't become apparent to me, in fact, until the recent commit where I introduced `SymbolIndexSize` and saw how many things had to be touched; the linker evolved so rapidly as I was trying to learn Rust that I lost track of how bad it got.
|
| Further, this shows how the design of the internment system was a bit naive---I assumed certain requirements that never panned out. In particular, everything using symbols stored `&'i Symbol<'i>`---that is, a reference (usize) to an object containing an index (32-bit) and a string slice (128-bit). So it was a reference to a pretty large value, which was allocated in the arena alongside the interned string itself.
|
| But, that was assuming that something would need both the symbol index _and_ a readily available string. That's not the case. In fact, it's pretty clear that interning happens at the beginning of execution, that `SymbolId` is all that's needed during processing (unless an error occurs; more on that below); and it's not until _the very end_ that we need to retrieve interned strings from the pool to write either to a file or to display to the user. It was horribly wasteful!
|
| So `SymbolId` solves the lifetime issue in itself for most systems, but it still requires that an interner be available for anything that needs to create or resolve symbols, which, as it turns out, is still a lot of things. Therefore, I decided to implement them as thread-local static variables, which is very similar to what Rustc does itself (Rustc's are scoped). TAMER does not use threads, so the resulting `'static` lifetime should be just fine for now. Eventually I'd like to implement `!Send` and `!Sync`, though, to prevent references from escaping the thread (as noted in the patch); I can't do that yet, since the feature has not yet been stabilized.
|
| In the end, this leaves us with a system that's much easier to use and maintain; hopefully easier for newcomers to get into without having to deal with so many complex lifetimes; and a nice API that makes it a pleasure to work with symbols.
|
| Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I end up regretting that down the line, but it exists for an important reason: the `Span` and other structures that'll be introduced need to pack a lot of data into 64 bits so they can be freely copied around to keep lifetimes simple without wreaking havoc in other ways, but a 32-bit symbol size needed by the linker is too large for that. (Actually, the linker doesn't yet need 32 bits for our systems, but it's going to in the somewhat near future unless we optimize away a bunch of symbols...but I'd really rather not have the linker hit a limit that requires a lot of code changes to resolve).
|
| Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid that for now. Most systems can just use one of the `PkgSymbolId` or `ProgSymbolId` type aliases and not have to worry about it. Systems that are actually shared between the compiler and the linker do, though, but it's not like we don't already have a bunch of trait bounds.
|
| Of course, as we implement link-time optimizations (LTO) in the future, it's possible most things will need the size and I'll grow frustrated with that and possibly revisit this. We shall see.
|
| Anyway, this was exhausting...and...onward to the first frontend!
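To make the shape of such an API concrete, here is a minimal sketch of a thread-local interner that hands out small `Copy` identifiers and supports `"foo".intern()`. It is not TAMER's implementation: the `GlobalIntern` trait, the `lookup_str` method, and the choice to leak strings (rather than allocate them in an arena) are assumptions made to keep the example short.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical 32-bit symbol identifier: `Copy`, with no lifetime.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<&'static str, SymbolId>,
    strings: Vec<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.map.get(s) {
            return id;
        }
        // Leaking gives the string a `'static` lifetime; this is a
        // simplification made for the sketch.
        let stored: &'static str = Box::leak(s.to_owned().into_boxed_str());
        let id = SymbolId(self.strings.len() as u32);
        self.strings.push(stored);
        self.map.insert(stored, id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &'static str {
        self.strings[id.0 as usize]
    }
}

thread_local! {
    // One interner per thread, so no locking is required.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Hypothetical extension trait providing `"foo".intern()`.
pub trait GlobalIntern {
    fn intern(self) -> SymbolId;
}

impl GlobalIntern for &str {
    fn intern(self) -> SymbolId {
        INTERNER.with(|i| i.borrow_mut().intern(self))
    }
}

impl SymbolId {
    /// Resolve the identifier back to its string, e.g. for output.
    pub fn lookup_str(self) -> &'static str {
        INTERNER.with(|i| i.borrow().lookup(self))
    }
}
```

Interning the same string twice yields equal identifiers, and code that only compares or stores symbols never touches the interner or a lifetime parameter; only the code that finally writes or displays a string needs `lookup_str`.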
* tamer: Remove default SymbolIndex (et al) index type | Mike Gerwitz | 2021-07-29 | 1 | -10/+11
| Oh boy. What a mess of a change.
|
| This demonstrates some significant issues we have with Symbol. I had originally modelled the system a bit after Rustc's, but deviated in certain regards:
|
|   1. This has a configurable base type to enable better packing without bit twiddling and potentially unsafe tricks I'd rather avoid unless necessary; and
|   2. The lifetime is not static, and there is no global, singleton interner; and
|   3. I pass around references to a Symbol rather than passing around an index into an interner.
|
| For #3---this is done because there's no singleton interner and therefore resolving a symbol requires a direct reference to an available interner. It also wasn't clear to me (and still isn't, in fact) whether more than one interner may be used for different contexts.
|
| But, that doesn't preclude removing lifetimes and just passing around indexes; in fact, I plan to do this in the frontend where the parser and such will have direct interner access and can therefore just look up based on a symbol index. We could reserve references for situations where exposing an interner would be undesirable.
|
| Anyway, more to come...
* Copyright year update 2021 | Mike Gerwitz | 2021-07-22 | 1 | -1/+1
|
* [DEV-8000] ir::asg: Introduce SortableAsgError | Mike Gerwitz | 2020-07-01 | 1 | -3/+3
| This will be used for the next commit, but this change has been isolated both because it distracts from the implementation change in the next commit, and because it cleans up the code by removing the need for a type parameter on `AsgError`.
|
| Note that the sort test cases now use `unwrap` instead of having `{,Sortable}AsgError` support one or the other---this is because that does not currently happen in practice, and there is not supposed to be a hierarchy; they are siblings (though perhaps their name may imply otherwise).
* [DEV-7504] Add GraphML generation | Joseph Frazer | 2020-05-13 | 1 | -3/+55
| We want to be able to build a representation of the dependency graph so we can easily inspect it.
|
| We do not want to make GraphML by default. It is better to use a tool. We use "petgraph-graphml".
* [DEV-7084] TAMER: obj::xmlo: Private inner modules | Mike Gerwitz | 2020-04-28 | 1 | -2/+1
|
* [DEV-7084] TAMER: AsgBuilderState::new: New constructor | Mike Gerwitz | 2020-04-28 | 1 | -1/+1
|
* [DEV-7084] TAMER: AsgBuilderError: Introduce proper error variants | Mike Gerwitz | 2020-04-28 | 1 | -3/+3
| This is a union (sum type) of three other error types, plus errors specific to this builder.
|
| This commit does a good job demonstrating the boilerplate, as well as a need for additional context (in the case of `IdentKindError`), that we'll want to work on abstracting away.
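The boilerplate mentioned here is typical of Rust error sum types. The sketch below shows the general pattern only: `BuilderError`, its variants, and `parse_count` are hypothetical, and the wrapped errors are standard-library types standing in for the real ones.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

/// Hypothetical builder error: a sum of lower-level errors plus a
/// variant specific to the builder itself.
#[derive(Debug)]
enum BuilderError {
    Encoding(Utf8Error),
    Parse(ParseIntError),
    Empty,
}

impl fmt::Display for BuilderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Encoding(e) => write!(f, "invalid encoding: {}", e),
            Self::Parse(e) => write!(f, "invalid number: {}", e),
            Self::Empty => write!(f, "empty input"),
        }
    }
}

impl std::error::Error for BuilderError {}

// One `From` impl per wrapped error is the repetitive part; each one
// lets `?` convert the lower-level error automatically.
impl From<Utf8Error> for BuilderError {
    fn from(e: Utf8Error) -> Self {
        Self::Encoding(e)
    }
}

impl From<ParseIntError> for BuilderError {
    fn from(e: ParseIntError) -> Self {
        Self::Parse(e)
    }
}

/// Parse a decimal count from raw bytes, surfacing each failure as
/// the matching `BuilderError` variant.
fn parse_count(bytes: &[u8]) -> Result<u32, BuilderError> {
    let text = std::str::from_utf8(bytes)?;
    if text.is_empty() {
        return Err(BuilderError::Empty);
    }
    Ok(text.parse::<u32>()?)
}
```

Note that the `From` conversions carry no extra context (such as which identifier was being processed), which is the kind of gap the commit message points out for `IdentKindError`.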
* [DEV-7084] TAMER: xmlo::AsgBuilder: Accept XmloResult iterator | Mike Gerwitz | 2020-04-28 | 1 | -1/+1
| This flips the API from using XmloWriter as the context to using Asg and consuming anything that can produce XmloResults. This not only makes more sense, but avoids having to create a trait for XmloReader, and simplifies the trait bounds we have to concern ourselves with.
* [DEV-7084] TAMER: Simplify path canonicalization | Mike Gerwitz | 2020-04-28 | 1 | -11/+7
| This abstracts away the canonicalizer and solves the problem whereby canonicalization was not being performed prior to recording whether a path has been visited. This ensures that multiple relative paths to the same file will be properly recognized as visited.
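The ordering issue (canonicalize first, then record the visit) can be sketched as follows. `VisitedPaths` and its `visit` method are hypothetical names for this example; only `std::fs::canonicalize` and `HashSet::insert` are real library calls.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical record of files that have already been visited.
#[derive(Default)]
struct VisitedPaths {
    seen: HashSet<PathBuf>,
}

impl VisitedPaths {
    /// Returns `Ok(true)` the first time a file is visited and
    /// `Ok(false)` on any later visit, however the path is spelled.
    fn visit(&mut self, path: &Path) -> io::Result<bool> {
        // Canonicalize *before* consulting the set, so that different
        // relative spellings of the same file map to the same key.
        let canonical = fs::canonicalize(path)?;
        Ok(self.seen.insert(canonical))
    }
}
```

Because `fs::canonicalize` fails for paths that do not exist, a missing file shows up as an error from `visit` instead of being silently recorded.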
* [DEV-7084] TAMER: ld::poc: Remove unused fragments arg | Mike Gerwitz | 2020-04-28 | 1 | -5/+2
|
* [DEV-7084] TAMER: ld::poc: Remove unnecessary initial path canonicalization | Mike Gerwitz | 2020-04-28 | 1 | -3/+1
| Less to refactor and test.
* [DEV-7084] TAMER: AsgBuilderState | Mike Gerwitz | 2020-04-28 | 1 | -40/+23
| This completes the POC extraction for AsgBuilder, but is still POC code. The commits that follow will clean it up and provide tests.
* [DEV-7084] TAMER: AsgBuilder extracted from POC | Mike Gerwitz | 2020-04-28 | 1 | -106/+14
| This extracts the changes nearly verbatim before doing refactoring so that it's easier to observe what changes have been made.
* [DEV-7084] TAMER: fs: impl File for BufReader | Mike Gerwitz | 2020-04-28 | 1 | -3/+2
| This further simplifies the POC linker.
* [DEV-7084] TAMER: CanonicalFile | Mike Gerwitz | 2020-04-28 | 1 | -4/+8
| This will be entirely replaced in an upcoming commit. See that for details. I don't feel like dealing with the conflicts for rearranging and squashing these commits.
* [DEV-7084] TAMER: fs: Basic filesystem abstraction | Mike Gerwitz | 2020-04-28 | 1 | -15/+12
| This also includes an implementation to visit paths only once. Note that it does not yet canonicalize the path before visiting, so relative paths to the same file can slip through, and relative paths to _different_ files could be erroneously considered to have been visited. This will be fixed in an upcoming commit.
* [DEV-7084] TAMER: From<B, &I> for XmloReader | Mike Gerwitz | 2020-04-20 | 1 | -1/+1
| This serves as a constructor for the time being, decoupling from POC. We may do something better once we have a better idea of how the various abstractions around this will evolve.
* [DEV-7086] TAMER: Remove WIP linker warning | Mike Gerwitz | 2020-04-06 | 1 | -2/+0
| While it is true that this is still being finalized, the warnings originally existed because tameld was not feature complete. It is now.
* [DEV-7087] TAMER: Remote optional Source from ASG and Object | Mike Gerwitz | 2020-03-26 | 1 | -5/+2
| This undoes work I did earlier today...but now we'll be able to support a Source on an extern.
|
| There is duplicate code between `BaseAsg::declare{,_extern}` that will be resolved in an upcoming commit. Upcoming commits will also simplify terminology and clean up methods on ObjectState.
* [DEV-7087] TAMER: Asg: Reintroduce declare_extern | Mike Gerwitz | 2020-03-26 | 1 | -16/+19
| There is some duplication here with `declare` that will be cleared up in a following commit. Reintroducing this method is necessary so that Source can be used to represent the source location of the extern itself; it's currently None to indicate an extern in `declare`.
* [DEV-7087] TAMER: Type compatability check during extern resolution | Mike Gerwitz | 2020-03-26 | 1 | -10/+12
| This properly verifies extern types, and cleans up Asg's API a little so that externs aren't handled much differently than other declarations.
|
| With that said, after making src optional, I realized that we will indeed want source information for externs themselves so we can direct the user to what package is expecting that symbol (as the old linker does). So this approach will not work, and I'll have to undo some of those changes.
* [DEV-7133] Clearly show the cycles in the output | Joseph Frazer | 2020-03-26 | 1 | -3/+28
|
* [DEV-7087] TAMER: {=>Ident}Object{,State,Data} | Mike Gerwitz | 2020-03-24 | 1 | -4/+4
| This is essential to clarify what exactly the different object types represent with the new generic abstractions. For example, we will have expressions as an object type.
* TAMER: Make Asg generic over object | Mike Gerwitz | 2020-03-24 | 1 | -2/+2
| There's a lot here to make the object stored on the `Asg` generic. This introduces `ObjectState` for state transitions and `ObjectData` for pure data retrieval. This will allow not only for mocking, but will be useful to enforce compile-time restrictions on the type of objects expected by the linker vs. the compiler (e.g. the linker will not have expressions).
|
| This commit intentionally leaves the corresponding tests in their original location to prove that the functionality has not changed; they'll be moved in a future commit.
|
| This also leaves the names as "Object" to reduce the cognitive overhead of this commit. It will be renamed to something like "IdentObject" in the near future to clarify the intent of the current object type and to open the way for expressions and a type that marries both of them in the future.
|
| Once all of this is done, we'll finally be able to make changes to the compatibility logic in state transitions to implement extern compatibility checks during resolution.
|
| DEV-7087
* TAMER: ld/poc: Simplify {get_interner_value=>get_ident} | Mike Gerwitz | 2020-03-19 | 1 | -27/+17
|
* [DEV-7085] Create `SortableAsg` trait | Joseph Frazer | 2020-03-13 | 1 | -56/+4
| Create a trait that sorts a graph into `Sections` that can then be used as an IR. The `BaseAsg` should implement the trait using what was originally in the POC.
* [DEV-7085] Move sections to IR module | Joseph Frazer | 2020-03-13 | 1 | -2/+2
| We need to use `Sections` in both the writer and the ASG so it needs to be in a place that makes sense.
* [DEV-7134] Propagate errors from the writer | Joseph Frazer | 2020-03-09 | 1 | -15/+13
| When errors occur during XML writing, they should be shown to the user.
* [DEV-7134] Propagate sorting errors | Joseph Frazer | 2020-03-09 | 1 | -6/+12
| If a node is found while sorting that is not expected, we should show the error to the user.
* [DEV-7134] Propagate errors setting fragments | Joseph Frazer | 2020-03-09 | 1 | -12/+12
| If we cannot set a fragment, we need to display the error to the user.
|
| We are currently ignoring "___head", "___tail", and objects that are both virtual and overridden. Those will be corrected in future changes.
* [DEV-7134] Pass read event errors up the stack | Joseph Frazer | 2020-03-06 | 1 | -5/+1
|
* [DEV-7134] Return error for XmloEvent::SymDecl | Joseph Frazer | 2020-03-06 | 1 | -2/+1
| We want more than warnings when an XmloEvent::SymDecl symbol has an unknown "kind".
* [DEV-7134] Add alias for LoadResult | Joseph Frazer | 2020-03-06 | 1 | -1/+4
| It looks better and was recommended by Rust's linter.
* [DEV-7134] Remove unwrap so we can bubble up error messages | Joseph Frazer | 2020-03-06 | 1 | -1/+1
|
* [DEV-7134] Escalate the error from finding the absolute path | Joseph Frazer | 2020-03-06 | 1 | -1/+1
| We do not want to have a panic here. The error should be displayed properly.
* Copyright year 2020 update | Mike Gerwitz | 2020-03-06 | 1 | -1/+3
|
* [DEV-7081] Add options to tameld | Joseph Frazer | 2020-03-06 | 1 | -10/+5
| We want to add an option to set the output file to the linker so we do not need to redirect output to awk any longer. This also adds integration tests for tameld.
* [DEV-7083] TAMER: xmle writer | Joseph Frazer | 2020-03-03 | 1 | -269/+64
| This introduces the writer for xmle files.
* TAMER: Separate static xmle section | Mike Gerwitz | 2020-02-26 | 1 | -1/+5
|
* TAMER: POC: Use FxHash to remove nondeterminism | Mike Gerwitz | 2020-02-26 | 1 | -7/+7
| The default SipHash is a cryptographic hash and causes ordering to change between runs.
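For context, the run-to-run variation in Rust's standard `HashMap` comes from `RandomState`, which seeds its SipHash-based hasher randomly; a hasher with a fixed initial state removes it. The sketch below illustrates this using only the standard library: a fixed-state `DefaultHasher` stands in for FxHash (which lives in an external crate), and `DeterministicMap` and `iteration_order` are names invented for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Map whose hasher always starts from the same state, unlike the
/// randomly seeded `RandomState` that `HashMap` uses by default.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Insert the keys in order and report the order in which the map
/// iterates over them.
fn iteration_order(keys: &[&str]) -> Vec<String> {
    let mut map: DeterministicMap<&str, ()> = DeterministicMap::default();
    for key in keys {
        map.insert(*key, ());
    }
    map.keys().map(|k| k.to_string()).collect()
}
```

Two maps built from the same insertions then iterate in the same order, which is what any output derived from map iteration needs in order to be reproducible.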
* TAMER: xmle output changes to support Summary Page | Mike Gerwitz | 2020-02-26 | 1 | -47/+73
| Co-Authored-By: Joseph Frazer <joseph.frazer@ryansg.com>
* TAMER: POC: Output xmle | Mike Gerwitz | 2020-02-26 | 1 | -15/+297
| This is a working proof-of-concept that will be finalized in future commits.
* TAMER: Symbol source data and metadata | Mike Gerwitz | 2020-02-26 | 1 | -2/+4
|
* TAMER: Initial abstract semantic graph (ASG) | Mike Gerwitz | 2020-02-26 | 1 | -234/+97
| This begins to introduce the ASG, backed by Petgraph. The API will continue to evolve, and Petgraph will likely be encapsulated so that our implementation can vary independently from it (or even remove it in the future).
* TAMER: xmlo: Add Package event | Mike Gerwitz | 2020-02-25 | 1 | -0/+2
|