commit b9e5ffa2b37f5a558e9555f88cbf91a7d53fce00 Author: Juhani Krekelä Date: Tue Aug 28 12:27:54 2018 +0300 First commit diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..0b0f584 --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +__pycache__ +*.pyc +*.swp +*.sshwot diff --git a/CC0 b/CC0 new file mode 100644 index 0000000..670154e --- /dev/null +++ b/CC0 @@ -0,0 +1,116 @@ +CC0 1.0 Universal + +Statement of Purpose + +The laws of most jurisdictions throughout the world automatically confer +exclusive Copyright and Related Rights (defined below) upon the creator and +subsequent owner(s) (each and all, an "owner") of an original work of +authorship and/or a database (each, a "Work"). + +Certain owners wish to permanently relinquish those rights to a Work for the +purpose of contributing to a commons of creative, cultural and scientific +works ("Commons") that the public can reliably and without fear of later +claims of infringement build upon, modify, incorporate in other works, reuse +and redistribute as freely as possible in any form whatsoever and for any +purposes, including without limitation commercial purposes. These owners may +contribute to the Commons to promote the ideal of a free culture and the +further production of creative, cultural and scientific works, or to gain +reputation or greater distribution for their Work in part through the use and +efforts of others. + +For these and/or other purposes and motivations, and without any expectation +of additional consideration or compensation, the person associating CC0 with a +Work (the "Affirmer"), to the extent that he or she is an owner of Copyright +and Related Rights in the Work, voluntarily elects to apply CC0 to the Work +and publicly distribute the Work under its terms, with knowledge of his or her +Copyright and Related Rights in the Work and the meaning and intended legal +effect of CC0 on those rights. + +1. Copyright and Related Rights. 
A Work made available under CC0 may be +protected by copyright and related or neighboring rights ("Copyright and +Related Rights"). Copyright and Related Rights include, but are not limited +to, the following: + + i. the right to reproduce, adapt, distribute, perform, display, communicate, + and translate a Work; + + ii. moral rights retained by the original author(s) and/or performer(s); + + iii. publicity and privacy rights pertaining to a person's image or likeness + depicted in a Work; + + iv. rights protecting against unfair competition in regards to a Work, + subject to the limitations in paragraph 4(a), below; + + v. rights protecting the extraction, dissemination, use and reuse of data in + a Work; + + vi. database rights (such as those arising under Directive 96/9/EC of the + European Parliament and of the Council of 11 March 1996 on the legal + protection of databases, and under any national implementation thereof, + including any amended or successor version of such directive); and + + vii. other similar, equivalent or corresponding rights throughout the world + based on applicable law or treaty, and any national implementations thereof. + +2. Waiver. To the greatest extent permitted by, but not in contravention of, +applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and +unconditionally waives, abandons, and surrenders all of Affirmer's Copyright +and Related Rights and associated claims and causes of action, whether now +known or unknown (including existing as well as future claims and causes of +action), in the Work (i) in all territories worldwide, (ii) for the maximum +duration provided by applicable law or treaty (including future time +extensions), (iii) in any current or future medium and for any number of +copies, and (iv) for any purpose whatsoever, including without limitation +commercial, advertising or promotional purposes (the "Waiver"). 
Affirmer makes +the Waiver for the benefit of each member of the public at large and to the +detriment of Affirmer's heirs and successors, fully intending that such Waiver +shall not be subject to revocation, rescission, cancellation, termination, or +any other legal or equitable action to disrupt the quiet enjoyment of the Work +by the public as contemplated by Affirmer's express Statement of Purpose. + +3. Public License Fallback. Should any part of the Waiver for any reason be +judged legally invalid or ineffective under applicable law, then the Waiver +shall be preserved to the maximum extent permitted taking into account +Affirmer's express Statement of Purpose. In addition, to the extent the Waiver +is so judged Affirmer hereby grants to each affected person a royalty-free, +non transferable, non sublicensable, non exclusive, irrevocable and +unconditional license to exercise Affirmer's Copyright and Related Rights in +the Work (i) in all territories worldwide, (ii) for the maximum duration +provided by applicable law or treaty (including future time extensions), (iii) +in any current or future medium and for any number of copies, and (iv) for any +purpose whatsoever, including without limitation commercial, advertising or +promotional purposes (the "License"). The License shall be deemed effective as +of the date CC0 was applied by Affirmer to the Work. Should any part of the +License for any reason be judged legally invalid or ineffective under +applicable law, such partial invalidity or ineffectiveness shall not +invalidate the remainder of the License, and in such case Affirmer hereby +affirms that he or she will not (i) exercise any of his or her remaining +Copyright and Related Rights in the Work or (ii) assert any associated claims +and causes of action with respect to the Work, in either case contrary to +Affirmer's express Statement of Purpose. + +4. Limitations and Disclaimers. + + a. 
No trademark or patent rights held by Affirmer are waived, abandoned, + surrendered, licensed or otherwise affected by this document. + + b. Affirmer offers the Work as-is and makes no representations or warranties + of any kind concerning the Work, express, implied, statutory or otherwise, + including without limitation warranties of title, merchantability, fitness + for a particular purpose, non infringement, or the absence of latent or + other defects, accuracy, or the present or absence of errors, whether or not + discoverable, all to the greatest extent permissible under applicable law. + + c. Affirmer disclaims responsibility for clearing rights of other persons + that may apply to the Work or any use thereof, including without limitation + any person's Copyright and Related Rights in the Work. Further, Affirmer + disclaims responsibility for obtaining any necessary consents, permissions + or other rights required for any use of the Work. + + d. Affirmer understands and acknowledges that Creative Commons is not a + party to this document and has no duty or obligation with respect to this + CC0 or use of the Work. + +For more information, please see + diff --git a/src/entry.py b/src/entry.py new file mode 100644 index 0000000..9305ac5 --- /dev/null +++ b/src/entry.py @@ -0,0 +1,45 @@ +from collections import namedtuple + +import hashing + +# Entry(bytes[32], bytes[32], bytes[32], bytes[0…2¹⁶-1]) +Entry = namedtuple('Entry', ['salt', 'hashed_host', 'fingerprint', 'comment']) + +class UnacceptableComment(Exception): pass + +def create_entry(domain, port, fingerprint, comment): + """create_entry(str, u16, bytes[32], str) → Entry + Given an unprocessed domain, a port, a binary fingerprint and a comment, + creates an entry describing them""" + assert type(domain) == str + assert type(port) == int and 0 <= port <= (1<<16) - 1 + assert type(fingerprint) == bytes and len(fingerprint) == 32 + assert type(comment) == str + + # We want to have domain names reasonably normalized. 
This is why we + # convert all internationalized domain names to punycode and + # lowercase all domains. + # The lowercasing happens after the punycoding because + # that way we don't have to worry about Unicode case mapping: in + # case of IDN the IDNA codec handles that for us, and in case of an + # ASCII domain it passes through the IDNA codec unmodified + processed_host = domain.encode('idna').lower() + + # If the port is not 22, we store [host]:port instead + if port != 22: + processed_host = b'[%s]:%i' % (processed_host, port) + + # Hash the host and store the salt + salt, hashed_host = hashing.hash_host(processed_host) + + # Comment must not include newlines + if '\n' in comment: + raise UnacceptableComment('Comment contains newlines') + + comment_encoded = comment.encode('utf-8') + + # Comment may be at max 2¹⁶-1 bytes long + if len(comment_encoded) >= 1<<16: + raise UnacceptableComment('Comment length of %i bytes is too long' % len(comment_encoded)) + + return Entry(salt, hashed_host, fingerprint, comment_encoded) diff --git a/src/export_known_hosts.py b/src/export_known_hosts.py new file mode 100644 index 0000000..15501a9 --- /dev/null +++ b/src/export_known_hosts.py @@ -0,0 +1,19 @@ +import sys + +import process_known_hosts +import write_file + +def main(): + entries = [] + # TODO: Don't hardcode + # TODO: Handle errors + with open(sys.argv[1], 'r') as f: + for line in f: + entries.extend(process_known_hosts.process_line(line)) + + with open('known_hosts.sshwot', 'wb') as f: + write_file.write(f, entries) + +if __name__ == '__main__': + main() + diff --git a/src/hashing.py b/src/hashing.py new file mode 100644 index 0000000..e6a33d4 --- /dev/null +++ b/src/hashing.py @@ -0,0 +1,25 @@ +import hashlib +import os + +def hash_with_salt(host, salt): + """hash_with_salt(bytes, bytes) → bytes[32] + Hash the host using sha256 and the given salt""" + assert type(host) == bytes + assert type(salt) == bytes + m = hashlib.sha256() + m.update(host) + 
m.update(salt) + return m.digest() + +def generate_salt(): + """generate_salt() → bytes[32] + Generates 32 bytes of randomness using the system urandom""" + return os.urandom(32) + +def hash_host(host): + """hash_host(bytes) → (bytes[32]: salt, bytes[32]: hashed_host) + Generates a salt and hashes the host with it""" + assert type(host) == bytes + salt = generate_salt() + hashed_host = hash_with_salt(host, salt) + return salt, hashed_host diff --git a/src/process_known_hosts.py b/src/process_known_hosts.py new file mode 100644 index 0000000..ed6cb6f --- /dev/null +++ b/src/process_known_hosts.py @@ -0,0 +1,76 @@ +import base64 +import hashlib + +import entry + +class KnownHostsSyntaxError(Exception): pass + +class HashedHostError(Exception): pass + +def process_line(line): + """process_line(str) → [Entry] + Given a string containing one line of .ssh/known_hosts file, create + a list of Entries based on it.""" + assert type(line) == str + + # Remove the trailing newline, if any (safe on an empty string too) + if line.endswith('\n'): line = line[:-1] + + # Just skip over empty lines + if line == '': return [] + + # Each line has host(s), algorithm, public key, and possibly one + # more optional field + fields = line.split(' ') + if len(fields) != 3 and len(fields) != 4: + raise KnownHostsSyntaxError('Weird number of fields on a line (%i)' % len(fields)) + + hosts, algorithm, public_key = fields[0:3] + + # Generate public key fingerprint + # The key is stored base64 encoded, so decode it first + try: + public_key_binary = base64.b64decode(public_key, validate = True) + except (ValueError, base64.binascii.Error) as err: + raise KnownHostsSyntaxError('Malformed public key: %s' % public_key) from err + + # Fingerprint is sha256 hash of the public key + m = hashlib.sha256() + m.update(public_key_binary) + fingerprint = m.digest() + + # There can be several hosts separated with a comma + entries = [] + for host in hosts.split(','): + # A host can't be empty + if len(host) == 0: + raise KnownHostsSyntaxError('An 
empty host') + + # If the host begins with '|' it's hashed + # We cannot deal with those + if host[0] == '|': + raise HashedHostError('Cannot deal with hashed hosts') + + # If the host begins with '[' it's a nonstandard port + # The format will be [domain]:port + # Extract both + # Otherwise, default to port 22 + if host[0] == '[': + host_and_port = host[1:].split(']:') + if len(host_and_port) != 2: + raise KnownHostsSyntaxError('Unrecognized host format: ' + host) + + domain = host_and_port[0] + try: + port = int(host_and_port[1]) + except ValueError: + raise KnownHostsSyntaxError('Malformed port: %s' % host_and_port[1]) + + else: + domain = host + port = 22 + + # Default to no comment + entries.append(entry.create_entry(domain, port, fingerprint, '')) + + return entries diff --git a/src/write_file.py b/src/write_file.py new file mode 100644 index 0000000..74db2f8 --- /dev/null +++ b/src/write_file.py @@ -0,0 +1,39 @@ +def write_header(f): + """write_header(file(wb)) + Writes the header to the given file.""" + # b'WOT' magic + f.write(b'WOT') + # Version number, a single zero byte + f.write(bytes([0])) + +def write_entry(f, salt, hashed_host, fingerprint, comment): + """write_entry(file(wb), bytes[32], bytes[32], bytes[32], bytes[0…2¹⁶-1]) + Writes an entry to the given file.""" + assert type(salt) == bytes and len(salt) == 32 + assert type(hashed_host) == bytes and len(hashed_host) == 32 + assert type(fingerprint) == bytes and len(fingerprint) == 32 + assert type(comment) == bytes and 0 <= len(comment) <= (1<<16) - 1 + + # u8[32]: salt + f.write(salt) + + # u8[32]: hashed_host + f.write(hashed_host) + + # u8[32]: fingerprint + f.write(fingerprint) + + # u16le: len(comment) + comment_len = len(comment) + f.write(bytes([comment_len & 0xff, comment_len >> 8])) + + # u8[]: comment + f.write(comment) + +def write(f, entries): + """write(file(wb), [Entry]) + Creates a file containing all of the entries""" + write_header(f) + + for entry in entries: + write_entry(f, entry.salt, entry.hashed_host, 
entry.fingerprint, entry.comment) diff --git a/sshwot-format.text b/sshwot-format.text new file mode 100644 index 0000000..863327e --- /dev/null +++ b/sshwot-format.text @@ -0,0 +1,22 @@ +The file has a header like + u8[3]: magic = b'WOT' + u8: version = 0 + +After the header the entries are laid out as + u8[32]: salt + u8[32]: sha256(host concat salt) + u8[32]: sha256-fingerprint + u16le: comment-bytes + utf8[]: comment + +If port is not 22, the host is [host]:port. This is in accordance with how +OpenSSH stores it in .ssh/known_hosts. Internationalized domain names are +punycoded and all domain names are converted into lower case. This differs +from OpenSSH, which is not IDN-aware. + +Sha256 is used instead of a password hash since we want checking for whether +a host is present to be reasonably fast. + +The comment field can have any other valid Unicode, but must not contain +newline characters. An implementation should check for them when displaying +the comment.
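The format description above only covers writing; the commit contains no reader. As a rough sketch of how the same layout could be read back and queried, the following parses the header and entries and checks whether a host is present by recomputing sha256(host concat salt) for each stored salt. The names `read_entries`, `find_host`, `normalize_host`, `read_exact` and `FormatError` are hypothetical and not part of this commit, and the host normalization assumes the `[host]:port` form described in the format text.

```python
import hashlib
import io
import struct
from collections import namedtuple

# Same field layout as the Entry tuple in src/entry.py
Entry = namedtuple('Entry', ['salt', 'hashed_host', 'fingerprint', 'comment'])

class FormatError(Exception): pass  # hypothetical, not in the commit

def read_exact(f, n):
    """Read exactly n bytes or raise FormatError on a truncated file."""
    data = f.read(n)
    if len(data) != n:
        raise FormatError('Unexpected end of file')
    return data

def read_entries(f):
    """read_entries(file(rb)) → [Entry] (hypothetical reader)"""
    # Header: u8[3] magic, u8 version
    if read_exact(f, 3) != b'WOT':
        raise FormatError('Bad magic')
    version = read_exact(f, 1)[0]
    if version != 0:
        raise FormatError('Unsupported version %i' % version)

    entries = []
    while True:
        salt = f.read(32)
        if salt == b'':
            break  # clean end of file
        if len(salt) != 32:
            raise FormatError('Truncated entry')
        hashed_host = read_exact(f, 32)
        fingerprint = read_exact(f, 32)
        # u16le: comment length in bytes
        (comment_len,) = struct.unpack('<H', read_exact(f, 2))
        comment = read_exact(f, comment_len)
        entries.append(Entry(salt, hashed_host, fingerprint, comment))
    return entries

def normalize_host(domain, port):
    """Punycode and lowercase the domain; use [host]:port unless port is 22."""
    host = domain.encode('idna').lower()
    if port != 22:
        host = b'[%s]:%i' % (host, port)
    return host

def find_host(entries, domain, port=22):
    """Return the entries whose salted hash matches the given host."""
    host = normalize_host(domain, port)
    return [e for e in entries
            if hashlib.sha256(host + e.salt).digest() == e.hashed_host]

def demo():
    # Build a one-entry file in memory. The all-zero salt and the
    # fingerprint input are placeholders for illustration only.
    salt = bytes(32)
    host = normalize_host('Example.COM', 2222)
    hashed = hashlib.sha256(host + salt).digest()
    fingerprint = hashlib.sha256(b'not a real key').digest()
    comment = 'demo'.encode('utf-8')
    blob = (b'WOT' + bytes([0]) + salt + hashed + fingerprint
            + struct.pack('<H', len(comment)) + comment)
    return read_entries(io.BytesIO(blob))

if __name__ == '__main__':
    entries = demo()
    print(len(find_host(entries, 'example.com', 2222)))
```

Because every entry has its own salt, a lookup has to hash the candidate host once per entry, which is one reason a fast hash such as sha256 keeps presence checks cheap.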