# GHCQ's "Exceptional Access", End-To-End Encryption, Decentralization, and Reproducible Builds

Late last November,
  Ian Levy and Crispin Robinson of GCHQ (the British intelligence
  agency) published a proposal for intercepting end-to-end encrypted
  communications,
  agency) published a proposal for intercepting end-to-end encrypted
  communications,
    entitled ["Principles for a More Informed Exceptional
    Access Debate"][proposal].
Since then,
  there have been a series of notable rebuttals to this proposal
  arguing why this system would fail in practice and why it should be
  rejected.
Completely absent from these responses, however,
  is any mention of existing practices that would prohibit this attack
  outright---the
    combination of free/libre software, reproducible builds, and
    decentralized or distributed services.

[proposal]: https://www.lawfareblog.com/principles-more-informed-exceptional-access-debate

<!-- more -->

This proposal is just the latest episode in the [crypto
  wars][crypto-wars]:
    Users need secure communications to protect their privacy and defend
      against attackers,
        but law enforcement and governments argue that this leaves them in
        the dark.
But this one's a bit different.
The proposal states:

[crypto-wars]: https://en.wikipedia.org/wiki/Crypto_wars

> The U.K. government strongly supports commodity encryption. The Director
> of GCHQ has publicly stated that we have no intention of undermining the
> security of the commodity services that billions of people depend upon
> and, in August, the U.K. signed up to the Five Country statement on access
> to evidence and encryption, committing us to support strong encryption
> while seeking access to data. [...] We believe these U.K. principles will
> enable solutions that provide for responsible law enforcement access with
> service provider assistance without undermining user privacy or security.

The suggestions in the article are a pleasant deviation from past proposals,
  such as [key escrow schemes][key-escrow-eff];
    in fact,
      it categorically denounces such schemes:

[key-escrow-eff]: https://www.eff.org/deeplinks/2015/04/clipper-chips-birthday-looking-back-22-years-key-escrow-failures

> There is no single solution to enable all lawful access, but we definitely
> don’t want governments to have access to a global key that can unlock any
> user’s data.  Government controlled global key escrow systems would be a
> catastrophically dumb solution in these cases.

So how do the authors propose intercepting communications?
They suggest inserting a third party---a
  "ghost", as others have been calling it---into
  the conversation.

To understand the implications of adding a third party to an
  end-to-end (E2E) encrypted protocol,
    you have to understand how end-to-end encryption usually works in
    practice.^[
      For another perspective,
        see [Matthew Green's overview][green-ghost] in his response to
        the GCHQ proposal.]

[green-ghost]: https://blog.cryptographyengineering.com/2018/12/17/on-ghost-users-and-messaging-backdoors/


## Undermining End-to-End Encrypted Communication Systems
Let's say that three users named Alice, Bob, and Carol wish to communicate
  with one-another privately.
There are many ways to accomplish this,
  but for the sake of this discussion,
    we need to choose a protocol that attempts to fit into the model that
      Levy and Robinson had in mind.
Alice and the others will make use of a centralized messaging service that
  relays messages on behalf of users.[^centralized]
Centralized services are commonplace and include popular services like
  Signal, WhatsApp, Facebook Messenger, iMessage, and many others.
They all work in slightly different ways,
  so to simplify this analysis,
  I'm going to talk about an imaginary messaging service called FooRelay.

[^centralized]: See section [The Problem With Centralized
    Services](#centralized-services).

FooRelay offers a directory service that allows participants to find
  one-another by name or pseudonym.
The directory will let Alice know if Bob and Carol are online.
FooRelay also offers private chat rooms supporting two or more participants.

Alice, Bob, and Carol don't want anyone else to know what they are
  saying---that
    includes FooRelay's servers,
    their Internet Service Providers (ISPs),
    their employers,
    their governments,
    or whomever else may be monitoring the network that any of them are
      communicating over.[^threat-model]
Fortunately for them,
  FooRelay makes use of _end-to-end encryption_.[^primitive-e2e]

[^threat-model]: The process of determining potential threats and
    adversaries is called [threat modeling][].
  Since this article is about a proposal from a government spy agency,
    it's also worth noting that global passive adversaries like GCHQ and the
    NSA have the ability to monitor and store global traffic with the hopes
    of later decrypting it.
  [I have written about pre-Snowden revelations][national-uproar],
    and [the EFF has compiled a bunch of information on NSA spying][eff-nsa].

[threat modeling]: https://en.wikipedia.org/wiki/Threat_model
[national-uproar]: /2013/06/national-uproar-a-comprehensive-overview-of-the-nsa-leaks-and-revelations
[eff-nsa]: http://eff.org/nsa-spying

[^primitive-e2e]: Here I will describe a fairly elementary public-key
    end-to-end encrypted protocol that omits many important features
      (most notably, forward secrecy).
  For detailed information on a modern and well-regarded key exchange
    protocol,
      see [X3DH][] (Extended Triple Diffie-Hellman),
        which is employed by Signal.
  Following a key agreement,
    the [Double Ratchet][] algorithm is widely employed for forward
      secrecy even in the event of a compromised session key.

[X3DH]: https://signal.org/docs/specifications/x3dh/
[Double Ratchet]: https://www.signal.org/docs/specifications/doubleratchet/

Alice, Bob, and Carol each hold secret encryption keys known only to
  them---their
    _private keys_,
      which are generated for them automatically by the FooRelay client
      software running on their systems.
These keys can be used to _decrypt_ messages sent to them,
  and can be used to _sign_ messages to assert their authenticity.
But these private keys must never be divulged to others,
  including FooRelay's servers.
Instead,
  each private key has a _public key_ paired with it.
The public key can be used to _encrypt_ messages that can only be decrypted
  using the associated private key.[^pke]
Alice, Bob, and Carol each publish their public keys into FooRelay's
  directory so that others may discover and use them.
When Alice wants to start a chat with Bob and Carol,
  she can ask FooRelay to provide their public keys from the directory.

[^pke]: This is called [_public-key cryptography_][public-key-crypto]
    (or _asymmetric encryption_).

[public-key-crypto]: https://en.wikipedia.org/wiki/Public-key_cryptography
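
To make these roles concrete,
  here is a minimal sketch using [PyNaCl][] (a Python binding to libsodium).
The names are purely illustrative,
  and a real client would use a complete protocol such as X3DH rather than
  raw operations like these:

```python
# A sketch of the key roles described above (illustrative only).
from nacl.public import PrivateKey, SealedBox
from nacl.signing import SigningKey

# Keys are generated locally; private halves never leave the device.
alice_signing = SigningKey.generate()   # Alice's signing key pair
bob_private   = PrivateKey.generate()   # Bob's encryption key pair
bob_public    = bob_private.public_key  # published to the directory

# Alice encrypts to Bob's *public* key, then signs the ciphertext.
ciphertext = SealedBox(bob_public).encrypt(b"Hi, Bob!")
signed_msg = alice_signing.sign(ciphertext)

# Bob verifies using Alice's public verify key (which fails loudly if
# the message was tampered with), then decrypts with his *private* key.
verified  = alice_signing.verify_key.verify(signed_msg)
plaintext = SealedBox(bob_private).decrypt(verified)
assert plaintext == b"Hi, Bob!"
```

[PyNaCl]: https://pynacl.readthedocs.io/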

But making the public keys available in a directory is only part of the
  problem---how
    do Alice, Bob, and Carol know that the keys published to the directory
    are actually associated with the _real_ Alice, Bob, and Carol?^[
      This topic is known as [_key distribution_][key-distribution].]
**This is the first opportunity to spy**,
  if FooRelay is poorly designed.

[key-distribution]: https://en.wikipedia.org/wiki/Key_distribution

As stated by the proposal:

> It’s relatively easy for a service provider to silently add a law
> enforcement participant to a group chat or call. The service provider
> usually controls the identity system and so really decides who’s who and
> which devices are involved - they’re usually involved in introducing the
> parties to a chat or call.  You end up with everything still being
> end-to-end encrypted, but there’s an extra ‘end’ on this particular
> communication.


### Man-in-the-Middle

Let's start by assuming a pretty grim scenario.
This is not quite the plan of attack that Levy and Robinson had in mind,
  but it's important to understand why it would not work in practice.

The FooRelay client software running on Alice's computer retrieves Bob's
  public key from the identity service and initiates a chat.
FooRelay's server creates a new private chat room to accommodate the
  request and adds two initial participants---Alice and Bob.
The FooRelay client then generates an invitation message containing
  the identifier of the new room,
    signs it using Alice's private key to prove that it was from Alice,
    and sends it off to FooRelay's servers.
FooRelay's server verifies Alice's signature to make sure that she is
  authorized to invite someone to the room,
    and then sends the invitation off to Bob.[^whatsapp-group-chat]

[^whatsapp-group-chat]: As it turns out,
    getting invitations right can be difficult too.
  [WhatsApp had a vulnerability that allowed for users to insert themselves
    into group conversations][whatsapp-vuln] because it didn't implement a
    similar protocol.
  A better defense would be for Bob to publish the invitation from Alice
    when he joins the room,
      allowing anyone else in the room (like Carol) to verify that he was
      invited by someone authorized to do so.
  Only after verifying the invitation's signature would Carol decide to
    encrypt messages to him.

[whatsapp-vuln]: https://techcrunch.com/2018/01/10/security-researchers-flag-invite-bug-in-whatsapp-group-chats/

Bob is also running the FooRelay client on his computer.
It receives the invitation from Alice,
  looks up her public key from the identity service,
  and uses it to verify the signature on the invitation to make sure it
    originated from Alice.
If the signature checks out,
  FooRelay asks Bob if he'd like to join the chat.
Bob accepts.
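
The signature check is the crux of this exchange.
A sketch of that check,
  again using PyNaCl with a hypothetical wire format:

```python
# Sketch: a signed invitation that the server relays but cannot forge,
# since only Alice holds her signing (private) key.
from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

alice_signing = SigningKey.generate()
alice_public  = alice_signing.verify_key  # fetched from the directory

invitation = b"room:1234|invitee:bob"     # hypothetical format
signed     = alice_signing.sign(invitation)

# Bob's client verifies before ever prompting him to join.
try:
    alice_public.verify(signed)
except BadSignatureError:
    raise SystemExit("Invitation was not signed by Alice; ignoring.")
```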

Alice enters a message into the FooRelay client to send to the chat room.
But remember,
  Alice does not want the FooRelay server to know what message is being
  sent.
So the FooRelay client on Alice's computer encrypts the message using Bob's
  public key,
    signs it using Alice's private key to assert that it was from her,
    and sends it.
The FooRelay server---and
  anyone else watching---see
  junk data.
But Bob,
  upon receiving the message and verifying its signature,
  is able to decrypt and read it using his private key.[^sending]

[^sending]: This is omitting many very important details that are necessary
    for a proper implementation.
  While this portrayal isn't necessarily dishonest at a high level,
    there is a lot more that goes into sending a message.
  See information on the [Double Ratchet][] algorithm for information on
    one robust way to handle this exchange.

**Now let's explore how to intercept communications.**
Enter Mallory.
Mallory works for GCHQ.
FooRelay has been provided with a wiretap order against Carol.

Alice wants to bring Carol into the conversation with her and Bob,
  so she requests Carol's key from the identity service.
FooRelay's identity service,
  subject to the wiretap order,
  doesn't return Carol's public key;
    instead, it returns Mallory's,
      _who is pretending to be Carol_.
Alice sends the invitation to Mallory
  (again, thinking he's Carol),
  and the fake Carol (Mallory) joins the room.
Now when sending a message,
  Alice encrypts using both Bob and Mallory's public keys,
  so both of them can read it.

But when Alice and Carol meet up tomorrow for lunch,
  it will be pretty clear that Carol was not part of the conversation.
So Mallory is clever---he
  has FooRelay provide him with Carol's _real_ public key.
When Alice sends Mallory an invitation to the room,
  Mallory instructs FooRelay to create a covert _fake_ chat room with the
    same identifier.
Mallory then sends an invitation to Carol to that new chat room,
  _pretending to be Alice_.
But Mallory doesn't have access to Alice's private key,
  and so cannot sign it as her;
    he instead signs it using his own private key.

FooRelay on Carol's computer receives the invitation,
  which claims to be from Alice
  (but is really from Mallory).
When it attempts to retrieve the key from the identity service,
  rather than receiving Alice's key,
  _the identity service sends back Mallory's_.
Now Mallory is impersonating _both_ Alice and Carol.
The signature checks out,
  and Carol joins the covert chat.
FooRelay---still
  under the wiretap order---announces
  that Alice and Bob are both in the room,
    even though they aren't.

Now,
  when Mallory receives a message from Alice that is intended for Carol,
  he encrypts it using Carol's public key,
  signs it using his own,
  and sends it off to Carol.
Since Carol's FooRelay client thinks that Mallory's key is Alice's
  (remember the invitation?),
  the signature checks out and she happily decrypts the message and
  reads it.
If Bob sends a message,
  we repeat the same public key lookup procedure---FooRelay's identity
    service lies and provides Mallory's key instead,
      and Mallory proxies the message all the same.^[
        Of course,
          it may be suspicious if Alice and Bob both have the same key,
            so maybe Mallory has multiple keys.
        Or maybe the FooRelay software just doesn't care.]

This is a [man-in-the-middle (MITM)][mitm] attack.
But notice how **the conversation is still fully end-to-end encrypted**,
  between each of Alice, Bob, Carol, and Mallory.

[mitm]: https://en.wikipedia.org/wiki/Man-in-the-middle_attack

Why is this attack possible?
Because FooRelay has not offered any insight into the identity
  process---there
    is no _authentication_ procedure.
Blind trust is placed in the directory,
  which in this case has been compromised.
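
In code,
  the problem is plain:
    the directory lookup is the only thing binding a name to a key,
      so whoever controls the lookup controls identity.
A hypothetical sketch:

```python
# Sketch: blind trust in the directory (all names are hypothetical).
from nacl.public import PrivateKey

keys = {name: PrivateKey.generate().public_key
        for name in ("alice", "bob", "carol", "mallory")}

def lookup(name):
    """What the compromised identity service actually returns."""
    if name == "carol":            # subject to the wiretap order
        return keys["mallory"]     # Mallory impersonates Carol
    return keys[name]

# Alice asks for Carol's key and receives Mallory's, with nothing in
# the protocol that would let her detect the substitution.
assert bytes(lookup("carol")) == bytes(keys["mallory"])
```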


#### Mutual Authentication

If the FooRelay client allowed Alice, Bob, and Carol to inspect each other's
  public keys by displaying a [public key "fingerprint"][fingerprint],
    then that would have immediately opened up the possibility for them to
      discover that something odd was going on.
For example,
  if Alice and Carol had previously communicated before Mallory was
    involved,
      then maybe they would notice that the fingerprint changed.
If they met _after_ the fact,
  they would notice that the fingerprint Alice had for Carol was not the
    fingerprint that Carol had for _herself_.
Maybe they would notice---perhaps
  by communicating in person---that
  the fingerprint that Alice associated with Carol and the fingerprint that
    Carol associated with Alice were in fact the same (that is, Mallory's).

[fingerprint]: https://en.wikipedia.org/wiki/Key_fingerprint
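
A fingerprint is nothing exotic;
  it is just a short digest of the public key bytes that humans can compare.
A sketch of what a client might display
  (the grouping here is arbitrary):

```python
# Sketch: derive a human-comparable fingerprint from public key bytes.
import hashlib

def fingerprint(public_key_bytes: bytes) -> str:
    digest = hashlib.sha256(public_key_bytes).hexdigest()[:32]
    # Group into 4-character chunks for reading aloud or eyeballing.
    return " ".join(digest[i:i + 4] for i in range(0, 32, 4))

# Two users comparing in person just read these strings to each other;
# the output is something like "ab3f 09c2 11de ..." for a given key.
print(fingerprint(b"\x01" * 32))
```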

To mitigate the first issue,
  Mallory would have to MITM communications from the moment that Carol first
  signed up for FooRelay,
    and permanently thereafter.
The second could not be mitigated unless Mallory compromised Carol's device,
  or FooRelay cooperated with Mallory to plant a defective FooRelay client
  on Carol's device.
To mitigate the third,
  maybe Mallory would use separate keys.
But if Alice, Bob, or Carol ever compared public keys in person with someone
  else that was outside of their group of three,
    then they would notice that the fingerprints did not match.
So FooRelay would have to always provide the wrong key to _everyone_ trying
  to communicate with Carol,
    and for _everyone_ Carol tried to communicate with,
    in perpetuity---an
      everlasting wiretap.

This issue of mutual authentication is another complex topic that is very
  difficult to solve in a manner that is convenient for users.[^wot]
For example,
  Alice, Bob, and Carol could all meet in person and verify that
  one-another's fingerprints look correct.
Or they could post their fingerprints to something outside of FooRelay's
  control,
    like social media.
This is the ["safety number"][safety-number] concept that Signal employs.

[^wot]: One distributed model of associating a key with an owner is PGP's
    [Web of Trust][wot],
      which has been in use since the 1990s.
  While it does enjoy use in certain communities,
    it has failed to take off with average users due to the [complexities of
    implementing the model properly][debian-keysign].
  PGP's author also came up with a short authentication string (SAS)
    authentication protocol for VoIP systems called [ZRTP][],
      but it relies on users being able to identify the authenticity of
        one-another's voices,
          a luxury that may be undermined in the near future by speech
            synthesis systems [trained to reproduce real voices][ss-deep].

[safety-number]: https://signal.org/blog/safety-number-updates/
[wot]: https://en.wikipedia.org/wiki/Web_of_trust
[debian-keysign]: https://wiki.debian.org/Keysigning/
[zrtp]: https://en.wikipedia.org/wiki/ZRTP
[ss-deep]: https://en.wikipedia.org/wiki/Speech_synthesis#Deep_learning

FooRelay could also implement a [trust-on-first-use (TOFU)][tofu]
  policy---the
    client software would remember the last public key that it saw for a
      user,
        and if that key ever changed,
          then a prominent warning would be displayed.[^ssh-tofu]
For example,
  if Alice communicates once with the real Carol,
  the TOFU policy in the FooRelay client would record that real public key.
Then,
  when Mallory tries to MITM the conversation,
  Alice's FooRelay client would say:
    "Hold up; the key changed!  Something is wrong!"

[tofu]: https://en.wikipedia.org/wiki/Trust_on_first_use

[^ssh-tofu]: SSH users, for example, may be familiar with the almost-violent
    warning when the server fingerprint changes.
  Server fingerprints are stored in `~/.ssh/known_hosts` the first time they
    are contacted,
      and those fingerprints are used for verification on all subsequent
      connection attempts.
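
A TOFU policy is only a few lines of code.
This sketch pins the first fingerprint seen for each user
  (a real client would persist the pins to disk,
    as SSH does with `~/.ssh/known_hosts`):

```python
# Sketch: trust-on-first-use key pinning.
pinned: dict[str, str] = {}

def check_key(user: str, fp: str) -> None:
    if user not in pinned:
        pinned[user] = fp          # first contact: pin this key
    elif pinned[user] != fp:
        raise RuntimeError(
            f"Key for {user} changed!  Possible MITM; verify the new "
            "fingerprint out of band before trusting it.")

check_key("carol", "ab3f 09c2")    # first use: silently pinned
check_key("carol", "ab3f 09c2")    # same key: fine
# check_key("carol", "77e1 d4b0")  # would raise RuntimeError
```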

In any case,
  let's assume that FooRelay's cooperation in serving up the wrong public
  key is no longer sufficient because of these mitigations.
What does Mallory do without the ability to MITM?

No respectable communication software should be vulnerable to this sort of
  attack.
Knowing this,
  Levy and Robinson had a different type of attack in mind.


### A Ghost in the Room

Back when most people used land lines for communication via telephone,
  wiretapping was pretty easy.
Conversations were transmitted in an unencrypted,
  analog form;
    anyone could listen in on someone else's conversation if they had some
    elementary technical know-how and knew where to apply it.
By severing or exposing the line at any point,
  an eavesdropper could attach [alligator clips][]---or
    "crocodile clips", if you're east of the Atlantic---to
    route the analog signal to another phone or listening device.

[alligator clips]: https://en.wikipedia.org/wiki/Crocodile_clip

Levy and Robinson try to apply this same concept as a metaphor for Internet
  communications,
    presumably in an effort to downplay its significance.
But the concepts are very different.
Continuing from the previous quote of Levy and Robinson's proposal:

> This sort of solution seems to be no more intrusive than the virtual
> crocodile clips that our democratically elected representatives and
> judiciary authorise today in traditional voice intercept solutions and
> certainly doesn’t give any government power they shouldn’t have.
>
> We’re not talking about weakening encryption or defeating the end-to-end
> nature of the service. In a solution like this, we’re normally talking
> about suppressing a notification on a target’s device, and only on the
> device of the target and possibly those they communicate with. That’s a
> very different proposition to discuss and you don’t even have to touch the
> encryption.

This statement is disingenuous.
We can implement the quoted suggestion in two different ways:
The first is precisely the situation that was just previously
  described---allow
    MITM and remain ignorant about it.
The second way is to have the FooRelay server _actually invite Mallory_ to
  the chat room,
    but _have the FooRelay client hide him from other participants_.
**He would be a ghost in the room;**
  nobody would see him,
    but Alice, Bob, and Carol's FooRelay software would each surreptitiously
    encrypt to him using his public key,
      as a third recipient.
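
In code,
  the ghost amounts to a single dishonest detail:
    every key in the room receives a copy of the ciphertext,
      but the roster shown to users omits one entry.
A hypothetical sketch:

```python
# Sketch: the "ghost" anti-feature.  The message really is end-to-end
# encrypted to every recipient, including one the UI never displays.
from nacl.public import PrivateKey, SealedBox

keys = {name: PrivateKey.generate().public_key
        for name in ("alice", "bob", "carol", "mallory")}
GHOST = "mallory"

def send(message: bytes) -> dict:
    # Encrypts to the ghost too; the ciphertexts reveal nothing.
    return {name: SealedBox(pub).encrypt(message)
            for name, pub in keys.items()}

def roster() -> list:
    return [name for name in keys if name != GHOST]  # ghost suppressed

assert GHOST in send(b"meet at noon") and GHOST not in roster()
```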

Sure,
  the actual ciphers used to encrypt the communications are not weakened.
Sure,
  it is still end-to-end encrypted.
But this is _nothing_ like alligator clips on a phone line---instead,
  _an anti-feature has been built into the software_.
As the EFF notes,
  [this is just a backdoor by another name][eff-ghost].

[eff-ghost]: https://www.eff.org/deeplinks/2019/01/give-ghost-backdoor-another-name

If software has to be modified to implement this backdoor,
  then it has to either be done for _every_ user of FooRelay,
    or individual users have to be targeted to install a malicious version
    of the program.
If either of these things is possible,
  then _everyone_ is made less secure.
What if a malicious actor figures out how to exploit either of those
  mechanisms for their own purposes?
Or what if someone tricks FooRelay into thinking they're from GCHQ?

And since this is a backdoor in the software running on the user's computer,
  it is very difficult to be covert.
Nate Cardozo and Seth Schoen of the Electronic Frontier Foundation
  [analyze various ways to detect ghosts][detect-ghosts],
    which would tip Alice, Bob, and Carol off that Mallory is watching them.

[detect-ghosts]: https://www.lawfareblog.com/detecting-ghosts-reverse-engineering-who-ya-gonna-call

This is bad,
  and everyone knows it.
The proposal is a non-starter.
But this shouldn't be the end of the conversation---there
  is a much more fundamental issue at play which has received no
    attention from the mainstream responses.


## Betrayed By Software {#betrayed}
All of these mainstream discussions make an implicit assumption:
  _that users are not in control of the software running on their systems_.
The [detection methods][detect-ghosts] are discussed in terms of binary
  profiling and side-channels.
[The GCHQ proposal itself][proposal] fundamentally relies on the software
  being modified in ways that are a disservice to the user---adding
    a backdoor that surreptitiously exfiltrates messages to a third
    party (Mallory) without the consent of other participants (Alice, Bob,
    or Carol).

When a user has full control over their software---when
  they have the freedom to use, study, modify, and share it as they
    please---we
  call it [_free software_][free-sw].
If FooRelay's client were free software,
  then Alice, Bob, and Carol would all have the right to inspect it to make
  sure no nasty backdoors were added,[^proprietary-malware]
    or ask someone else to inspect it for them.
Or maybe they could depend on the fact that many other people are
  watching---essentially
    anyone in the world could at any moment look at FooRelay's client source
    code.
This helps to keep FooRelay honest---if
  they _did_ implement a feature that suppresses notifications as Levy and
    Robinson suggest,
      then they would have done so in plain sight of everyone,
        and they would immediately lose the trust of their users.

[free-sw]: https://www.gnu.org/philosophy/free-sw.en.html

[^proprietary-malware]: Unfortunately,
    [proprietary (non-free) software is often malware][proprietary-malware],
      hiding things that work in the interests of its developers but
        _against_ the interests of its users.

[proprietary-malware]: https://www.gnu.org/philosophy/proprietary.html

FooRelay could try to make the change in a plausibly deniable way---to
  make the change look like a bug---but
  then _anyone with sufficient skill in the community could immediately fix
    it_ and issue a patch.
That patch could be immediately circulated and adopted by other users
  without the blessing of FooRelay itself.
If FooRelay didn't implement that patch,
  then users would [_fork_][software-fork] it,
    making their own version and ditching FooRelay entirely.
Forking is a commonly exercised and essential right in the free
  software community.

[software-fork]: https://en.wikipedia.org/wiki/Software_fork

The popular program Signal is free software.[^moxie-signal]
The [OMEMO specification][omemo]---which
  implements many of the encryption standards that were developed by
    Signal---is
  also [implemented by multiple free software projects][omemo-yet],
    some of which include [Pidgin][] (GNU/Linux, Windows, Mac OS X),
    [Conversations][] (Android),
    [ChatSecure][] (iOS),
    and [Gajim][] (GNU/Linux, Windows).

[omemo]: https://conversations.im/omemo/
[omemo-yet]: https://omemo.top/
[pidgin]: https://pidgin.im/
[conversations]: https://conversations.im/
[chatsecure]: https://chatsecure.org/
[gajim]: https://gajim.org/

[^moxie-signal]: Unfortunately,
    its author has caused some friction in the free software community by
    [strongly discouraging forks and saying they are unwelcome to connect to
    Signal's servers][moxie-fdroid].
  This also relates to the issue of centralization,
    which is the topic of the next section;
      Moxie [explains in a blog post why he disagrees with a federated
      Signal][moxie-federation].

[moxie-fdroid]: https://github.com/LibreSignal/LibreSignal/issues/37
[moxie-federation]: https://signal.org/blog/the-ecosystem-is-moving/


If a program does not respect users' freedoms,
  we call it _non-free_, or _proprietary_.
**Most of the popular chat programs today are non-free**:
  Apple iMessage, Facebook Messenger, and WhatsApp are all examples of
  programs that keep secrets from their users.
Those communities are unable to inspect the program,
  or modify it to remove anti-features;
    they are at the mercy of the companies that write the software.

For example,
  a recent [bug in Apple's FaceTime][facetime-vuln] left users
  vulnerable to surveillance by other FaceTime users.
FaceTime likely has hundreds of millions of users.
If it were free software and only a tiny fraction of those users actually
  inspected the source code,
    it's possible that somebody would have noticed and maybe even fixed the
    bug before it was exploited.[^bugs-shallow]
Further,
  after it _was_ discovered,
  users had no choice but to wait for Apple themselves to issue a fix,
    which didn't come until a week later.
The person who did discover it [tried to contact Apple with no
  success][bad-apple],
    and the world only found out about the issue when a video demoing the
      exploit went viral eight days after its initial discovery.
This differs from free software communities,
  where bugs are typically posted to a public mailing list or bug tracker,
    where anybody in the community can both view and immediately act upon
    it.[^embargo]

[facetime-vuln]: https://9to5mac.com/2019/01/28/facetime-bug-hear-audio/
[bad-apple]: https://www.wsj.com/articles/teenager-and-his-mom-tried-to-warn-apple-of-facetime-bug-11548783393

[^bugs-shallow]: This is often cited as [Linus's Law][linus-law],
    which states that "given enough eyeballs, all bugs are shallow".
  While this may often be true,
    it is certainly not always the case.
  It is a common argument in support of open source,
    [which covers the same class of software][floss-class] as free software.
  However,
    it's important not to fixate too much on this argument---it
      [misses the point of free software][oss-misses-point],
        and is a shallow promise,
          since open source software is not always superior in technical
          quality to proprietary software.

[linus-law]: https://en.wikipedia.org/wiki/Linus's_Law
[floss-class]: https://www.gnu.org/philosophy/free-open-overlap.html
[oss-misses-point]: https://www.gnu.org/philosophy/open-source-misses-the-point.html

[^embargo]: Sometimes an exception is made for severe security
    vulnerabilities.
  For example,
    the [`linux-distros` mailing list][linux-distros] is used to coordinate
      security releases amongst GNU/Linux distributions,
        imposing an embargo period.
  This practice ensures that exploits are not made publicly available to
    malicious actors before users are protected.

[linux-distros]: https://oss-security.openwall.org/wiki/mailing-lists/distros

But free software alone isn't enough.
How does Alice know that she _actually_ has the source code to the
  program that she is running?


### Reproducibility and Corresponding Source Code {#reproducibility}

The source code to FooRelay can't provide Alice with any security
  assurances unless she can be confident that it is _actually_
  the source code to the binary running on her machine.
For example,
  let's say that FooRelay has agreed to cooperate with GCHQ to implement
    ghosts by introducing a backdoor into the FooRelay client.
But since FooRelay is a free software project,
  anyone can inspect it.
Rather than tipping off the community by publishing the _actual_ source
  code,
    _they publish the source code for a version that does not have the
    backdoor_.
But when Alice downloads the compiled (binary) program from FooRelay,
  she receives a backdoored version.

To mitigate this,
  **Alice wants to be sure that she has the _corresponding source code_**.

One way for Alice to be confident is for her to compile the FooRelay client
  herself from the source code.
But not everybody has the technical ability or desire to do
  this.[^bootstrap]
Most users are instead going to download binaries from their operating
  system's software repositories,
    or from FooRelay's website,
    or maybe even from other convenient third parties.
How can _all_ users be confident that the FooRelay client they download
  actually corresponds to the source code that has been published and vetted
  by the community?

[^bootstrap]: And then you have the issue of ensuring that you have the
    corresponding source to the rest of your system so that it does not
    [alter the behavior of the produced binary][trusting-trust].
  System-wide reproducibility is the topic of [_bootstrappable
    builds_][bootstrappable-builds].

[trusting-trust]: https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf
[bootstrappable-builds]: http://bootstrappable.org/

[_Reproducible builds_][reproducible-builds] are required to solve this
  problem.
When FooRelay is built,
  it is done so in a manner that can be completely reproduced by others.
Bit-for-bit reproducibility means that,
  if two people on different systems follow the same instructions for
  building a program in similar enough environments,
    every single bit of the resulting binary will match---
      they will be exact copies of one-another.[^unreproducible]

[reproducible-builds]: http://reproducible-builds.org/

[^unreproducible]: Additional effort often has to be put into building
    reproducibly because a build may produce timestamps corresponding to the
    time of the build,
      information specific to the environment in which the program is being
        built,
      and various other sources of nondeterminism.

This has powerful consequences.
Alice no longer has to build the program herself---she
  can trust that others have checked FooRelay's work.
FooRelay wouldn't dare try to distribute a tainted binary now,
  since the community could trivially detect it.
Further,
  Alice, Bob, and Carol could all verify that they have the _exact same
  version_ of the FooRelay client,
    and _all_ be confident that it was compiled from the same source code
    that was published.[^verify-checksum]
They could even accept FooRelay from complete strangers and _still_ be
  confident that it was compiled from the published source code!
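
Comparing versions reduces to comparing digests.
A minimal sketch,
  with a hypothetical filename and published checksum:

```python
# Sketch: verify a downloaded binary against a published SHA-512 digest.
import hashlib
import sys

EXPECTED = "..."  # hypothetical digest published by the project

def sha512_of(path: str) -> str:
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

if sha512_of("foorelay") != EXPECTED:
    sys.exit("Checksum mismatch: do not run this binary!")
```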

[^verify-checksum]: This verification can be done trivially by verifying
    the _checksum_ of a program or distribution archive.
  For example,
    running `sha512sum foorelay` on a GNU/Linux system would output a _hash_
    of the contents of the file `foorelay`.
  Alice, Bob, and Carol could then compare this value if they are all
    running the same operating system and CPU architecture.
  Otherwise they can compare it against published checksums,
    or with others they trust.

Reproducible builds have made a lot of progress in recent years.
As of February 2019,
  for example,
  [over 93% of all packages on Debian GNU/Linux are reproducible on the
    `amd64` architecture][debian-reproducible],
      which includes the aforementioned Pidgin and Gajim projects that
      implement OMEMO.
[Signal also offers a reproducible Android build][signal-reproducible].

[debian-reproducible]: https://tests.reproducible-builds.org/debian/reproducible.html
[signal-reproducible]: https://signal.org/blog/reproducible-android/


So let's go back to Levy and Robinson's proposal.
How do you implement a ghost in FooRelay when its client source code is
  publicly available and its builds are reproducible?
You don't,
  unless you can hide the implementation in a plausibly-deniable way and
  write it off as a bug.
But anyone that finds that "bug" will fix it and send FooRelay a patch,
  which FooRelay would have no choice but to accept unless it wishes to lose
  community trust (and provoke a fork).

Mallory could instead target specific users and compromise them
  individually,
    but this goes beyond the original proposal;
      if Mallory can cause Alice, Bob, or Carol to run whatever program he
        pleases,
          then he doesn't need to be a ghost---he
            can just intercept communications _before they are encrypted_.
Therefore,
  reproducible builds---if
    done correctly---make
    Levy and Robinson's attack risky and impractical long-term.

But there is still one weak link---the
  fact that Alice, Bob, and Carol are communicating with FooRelay's servers
  at all means that Mallory still has the ability to target them by coercing
  FooRelay to cooperate with him.


## The Problem With Centralized Services {#centralized-services}

The final issue I want to discuss is that of centralized services.

A centralized service is one where all users communicate through one central
  authority---all
    messages go through the same servers.
The hypothetical FooRelay is centralized.
Signal, iMessage, Facebook Messenger, WhatsApp, and many other popular chat
  services are centralized.
And while this offers certain conveniences for users,
  it also makes certain types of surveillance trivial to perform,
    as they are bountiful targets for attackers, governments, and law
    enforcement.

But services don't have to be centralized.
_Decentralized_ services contain many separate servers to
  which users connect,
    and those servers can communicate with one-another.
The term _"federated"_ is also used,
  most often when describing social networks.[^decentralize-term]
Consider email.
Let's say that Alice has an email address `alice@foo.mail` and Bob has an
  email address `bob@quux.mail`.
Alice uses `foo.mail` as her provider,
  but Bob uses `quux.mail`.
Despite this,
  Alice and Bob can still communicate with one-another.
This works because the `foo.mail` and `quux.mail` mailservers send and
  receive mail to and from one-another.

[^decentralize-term]: While the term "decentralized" has been around for
    some time,
      there's not really a solid agreed-upon definition for "federated".
  [Some people use the terms interchangeably][uu-federated].
  The term "federation" is frequently used when talking about social
    networking.

[uu-federated]: http://networkcultures.org/unlikeus/resources/articles/what-is-a-federated-network/

[XMPP][]---the protocol on which OMEMO is based---is
  a federated protocol.
Users can choose to sign up with existing XMPP servers,
  or they can even run their own personal servers.[^me-prosody]
Federation is also the subject of the [ActivityPub][] social networking
  protocol,
    which is implemented by projects like [Mastodon][], [NextCloud][], and
    [PeerTube][].
[Riot][] is an implementation of the [Matrix][] protocol for real-time,
  decentralized, end-to-end encrypted communication including chat, voice,
  video, file sharing, and more.
All of these things make Mallory's job much more difficult---
  instead of being able to go to a handful of popular services like
    FooRelay, Signal, WhatsApp, iMessage, Facebook Messenger, and others,
      Mallory has to go to potentially _thousands_ of server operators and ask
      them to cooperate.[^risk-popular]

[^me-prosody]: I run my own [Prosody][] server,
      for example,
      which supports OMEMO.

[^risk-popular]: Of course,
  there's always the risk of a few small instances becoming very
    popular,
      which once again makes Mallory's job easier.

[xmpp]: https://en.wikipedia.org/wiki/XMPP
[prosody]: https://prosody.im/
[activitypub]: https://www.w3.org/TR/activitypub/
[mastodon]: https://joinmastodon.org/
[nextcloud]: http://nextcloud.org/
[peertube]: https://joinpeertube.org/
[riot]: https://about.riot.im/
[matrix]: https://matrix.org/docs/guides/faq

[_Peer-to-peer (P2P)_][p2p] (or _distributed_) services forgo any sort of
  central server and users instead communicate directly with
  one-another.[^dht]
In this case,
  Mallory has no server operator to go to;
    Levy and Robinson's proposal is ineffective in this
    environment.[^excuse-me]
[Tox][] is an end-to-end encrypted P2P instant messaging program.
[GNU Jami][jami] is an end-to-end encrypted P2P system with text, audio, and
  video support.
Another example of P2P software is BitTorrent,
  a very popular filesharing protocol.
[IPFS][] is a peer-to-peer alternative to the Web.

[^excuse-me]: "Excuse me, kind sir/madam,
  may I please have your cooperation in spying on your
    conversations?"
  Another benefit of distributed systems is that they help to
    evade censorship,
      since no single server can be shut down to prohibit speech.

[^dht]: Though some P2P services offer discovery services.
  For example,
    [GNU Jami][jami] offers a distributed identity service using
    [_distributed hash tables_][dht] (DHTs).
  BitTorrent uses a DHT in place of centralized trackers.

[p2p]: https://en.wikipedia.org/wiki/Peer-to-peer
[tox]: https://tox.chat/
[jami]: https://jami.net/
[ipfs]: https://ipfs.io/
[dht]: https://en.wikipedia.org/wiki/Distributed_hash_table

_Decentralization puts users in control._
Users have a _choice_ of who to entrust their data and communications with,
  or can choose to trust no one and self-host.[^metadata-leak]
Alice, Bob, and Carol may have different threat models---maybe
  Carol doesn't want to trust FooRelay.
Maybe Alice, Bob, and Carol can't agree at _all_ on a host.
Nor should they have to.

[^metadata-leak]: Though it is important to understand what sort of data are
    leaked (including metadata) in decentralized and distributed systems.
  When you send a message in a decentralized system,
    that post is being broadcast to many individual servers,
      increasing the surface area for Mallory to inspect those data.
  If there are a couple popular servers that host the majority of users,
    Mallory can also just target those servers.
  For example,
    even if you self-host your email,
      if any of your recipients use GMail,
        then Google still has a copy of your message.

Self-hosting has another benefit: it helps to [put users in control of their
  own computing][saass].[^online-freedom]
Not only do they have control over their own data,
  but they also have full control over what the service does on their
  behalf.
In the previous section,
  I mentioned how free software helps to keep FooRelay honest.
What if FooRelay's _server software_ were _also_ free software?
If Alice can self-host FooRelay's server software and [doesn't like how
  FooRelay implements their group chat][whatsapp-vuln],
    for example,
    she is free to change it.
If Mallory forces FooRelay to implement a feature on their server to allow
  him to be added to group chats,
    the community may find that as well and Alice can remove that
      anti-feature from her self-hosted version.

[^online-freedom]: I go into more information on the problems with modern
    software on the web in [my LibrePlanet 2016 talk "Restore Online
    Freedom!"][rof].

[saass]: http://www.gnu.org/philosophy/who-does-that-server-really-serve.html
[rof]: https://mikegerwitz.com/talks#online-freedom


## Please Continue Debating

This article ended up being significantly longer and more substantive than I
  had originally set out to write.
I hope that it has provided useful information and perspective that
  was missing from many of the existing discussions,
    and I hope that I have provided enough resources for further research.

The prominent responses to which I referred (some of which were already
    referenced above) are analyses by
      [Susan Landau][landau],
      [Matthew Green][green-ghost],
      [Bruce Schneier][schneier],
      [Nate Cardozo and Seth Schoen of the EFF][detect-ghosts],
      and [another by Nate Cardozo][eff-ghost].
There are surely others,
  but these were the ones that motivated this article.

It is important to keep these encryption debates alive.
The crypto wars are far from over.
We must ensure that we provide users with the tools and information
  necessary to defend themselves and one-another---tools
    and practices that are immune from government interference unless they
      themselves become illegal.
What a grim and dangerous world that would be.

I'm most concerned by the lack of debate from community leaders about the
  issues of [software freedom](#betrayed),
  [reproducibility](#reproducibility), and
  [decentralization](#centralized-services).
These are essential topics that I feel must be encouraged if we are to
  ensure the [safety and security][sapsf] of people everywhere.[^disagree]
We need more people talking about them!
If you found these arguments convincing,
  I would appreciate your help in spreading the word.
If you didn't,
  please reach out to me and tell me why;
    I would very much like to hear and understand your perspective.

[landau]: https://www.lawfareblog.com/exceptional-access-devil-details-0
[schneier]: https://www.schneier.com/essays/archives/2019/01/evaluating_the_gchq_.html
[sapsf]: /talks/#sapsf

[^disagree]: But I also know that there are many people that disagree with
    me on each of these points!
  If that weren't the case,
    I wouldn't need to be an activist.