| 1 | +--- |
| 2 | +title: "Protecting rubygems.org from the outside in: DoS prevention and compromised passwords" |
| 3 | +layout: post |
| 4 | +author: Colby Swandale |
| 5 | +author_email: colby@rubygems.org |
| 6 | +--- |
| 7 | + |
| 8 | +Every gem published to [rubygems.org](https://rubygems.org) ends up running on someone's computer. It's up to [rubygems.org](https://rubygems.org) to ensure that each gem contains what it claims, that its metadata is well-formed, and that the person who pushed it is who they say they are. |
| 9 | + |
| 10 | +We've been chipping away at that. Over the past few months, we shipped two changes that tighten [rubygems.org](https://rubygems.org)'s defences at very different layers: stronger validation of gem contents at push time, and integration with Have I Been Pwned to catch compromised passwords at login. |
| 11 | + |
| 12 | +## What [rubygems.org](https://rubygems.org) checks when you `gem push` |
| 13 | + |
| 14 | +A gem is just a regular tar archive containing three files: the code, the metadata, and the checksums. You can inspect one yourself. |
| 15 | + |
| 16 | +```bash |
| 17 | +$ gem fetch rails |
| 18 | +Fetching rails-8.1.3.gem |
| 19 | +Downloaded rails-8.1.3 |
| 20 | + |
| 21 | +$ tar -xvf rails-8.1.3.gem |
| 22 | +x metadata.gz |
| 23 | +x data.tar.gz |
| 24 | +x checksums.yaml.gz |
| 25 | +``` |
| 26 | + |
| 27 | +[rubygems.org](https://rubygems.org) closely inspects all three of these files when a gem is published, but the two we're looking at here are `metadata.gz` and `checksums.yaml.gz`. |
| 28 | + |
| 29 | +`checksums.yaml.gz` records SHA-256 digests of `data.tar.gz` and `metadata.gz`, computed during `gem build`. If someone tampers with either file afterwards, the digests won't match and [rubygems.org](https://rubygems.org) rejects the push immediately. Checksums are the easy part. |
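
As a rough illustration of that comparison, the sketch below recomputes SHA-256 digests and checks them against recorded values. The `verify_checksums` helper and the in-memory `files` hash are hypothetical, and the top-level `SHA256` key is an assumption about how the checksums document is laid out. It is not the rubygems.org implementation.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: returns the names of files whose SHA-256 digest
# does not match the value recorded in the checksums document.
def verify_checksums(files, checksums_yaml)
  # Assumed layout: { "SHA256" => { "file name" => "hex digest" } }
  recorded = YAML.safe_load(checksums_yaml).fetch("SHA256")

  files.reject do |name, bytes|
    recorded[name] == Digest::SHA256.hexdigest(bytes)
  end.keys
end

# Stand-in contents; a real gem would supply the raw bytes of each member.
files = {
  "metadata.gz" => "fake metadata bytes",
  "data.tar.gz" => "fake code bytes",
}

checksums_yaml = YAML.dump(
  { "SHA256" => files.transform_values { |bytes| Digest::SHA256.hexdigest(bytes) } }
)

untouched = verify_checksums(files, checksums_yaml)
tampered  = verify_checksums(files.merge("data.tar.gz" => "evil bytes"), checksums_yaml)
```

Here `untouched` is empty, while `tampered` names the one file whose bytes changed after the digests were recorded.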
| 30 | + |
| 31 | +`metadata.gz` contains the gem metadata as serialised YAML, generated from the gemspec during `gem build`. |
| 32 | + |
| 33 | +```yaml |
| 34 | +--- !ruby/object:Gem::Specification |
| 35 | +name: rails |
| 36 | +version: !ruby/object:Gem::Version |
| 37 | +  version: 8.1.3 |
| 38 | +platform: ruby |
| 39 | +authors: |
| 40 | +- David Heinemeier Hansson |
| 41 | +bindir: bin |
| 42 | +cert_chain: [] |
| 43 | +date: 1980-01-02 00:00:00.000000000 Z |
| 44 | +dependencies: |
| 45 | +- !ruby/object:Gem::Dependency |
| 46 | +  name: activesupport |
| 47 | +  requirement: !ruby/object:Gem::Requirement |
| 48 | +    requirements: |
| 49 | +    - - '=' |
| 50 | +      - !ruby/object:Gem::Version |
| 51 | +        version: 8.1.3 |
| 52 | +  type: :runtime |
| 53 | +  prerelease: false |
| 54 | +  version_requirements: !ruby/object:Gem::Requirement |
| 55 | +    requirements: |
| 56 | +    - - '=' |
| 57 | +      - !ruby/object:Gem::Version |
| 58 | +        version: 8.1.3 |
| 59 | +... |
| 60 | +``` |
| 61 | + |
| 62 | +When a gem is pushed, [rubygems.org](https://rubygems.org) deserialises the YAML and reconstructs a `Gem::Specification` object from it. It then validates the result, checking that the name and version are well-formed, that the declared dependencies are valid, and that the person pushing is authorised. This is where gem validation gets complex. |
| 63 | + |
| 64 | +## Exploiting the validation process |
| 65 | + |
| 66 | +Reconstructing a `Gem::Specification` object from the gemspec YAML invites a class of exploit called [insecure deserialisation](https://owasp.org/www-community/vulnerabilities/Insecure_Deserialization), in which a crafted YAML document attacks [rubygems.org](https://rubygems.org) itself. |
| 67 | + |
| 68 | +This isn't a theoretical concern. In 2017, a [security researcher discovered](https://blog.rubygems.org/2017/10/09/unsafe-object-deserialization-vulnerability.html) that rubygems.org was using a bare `YAML.load` to deserialise checksums inside gem files, a vulnerability that had potentially been present since 2012. The team patched it within hours by switching to `YAML.safe_load`, which restricts which Ruby objects can be instantiated from a document. But that only narrowed the problem. Even with a precise allowlist of classes and objects, malicious gems could still exploit the deserialisation process to exhaust memory or CPU before any validation even ran, causing rubygems.org servers to stop working. |
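
To make the allowlist behaviour concrete, here is a minimal sketch of how `YAML.safe_load` treats a document that asks for a Ruby object. The `safe_load_or_nil` helper and both sample documents are made up for illustration and are not taken from the rubygems.org codebase.

```ruby
require "yaml"

# A document whose tag asks the loader to build a Ruby object.
TAGGED_DOC = <<~DOC
  --- !ruby/object:Gem::Version
  version: 8.1.3
DOC

# A document made only of plain mappings and strings.
PLAIN_DOC = <<~DOC
  name: rails
  platform: ruby
DOC

# Hypothetical helper: returns the loaded data, or nil when safe_load
# refuses to instantiate a class that is not on its allowlist.
def safe_load_or_nil(doc)
  YAML.safe_load(doc)
rescue Psych::DisallowedClass
  nil
end

plain  = safe_load_or_nil(PLAIN_DOC)   # a Hash of strings
tagged = safe_load_or_nil(TAGGED_DOC)  # nil: Gem::Version was not permitted
```

An allowlist like this limits which classes can be built, but as noted above it does not bound the memory or CPU a crafted document can consume while it is being loaded.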
| 69 | + |
| 70 | +## Validating gems without `Gem::Specification` |
| 71 | + |
| 72 | +The fix was to stop trusting the YAML to tell [rubygems.org](https://rubygems.org) what to do with itself. |
| 73 | + |
| 74 | +This was largely [Aaron Patterson's](https://bsky.app/profile/tenderlove.dev) (tenderlove) work. He designed and built the AST-based approach from the ground up. Rather than handing the document to Ruby and letting it materialise objects, we traverse the parsed tree ourselves and extract only the values we expect to find. The YAML never gets to decide what gets instantiated. We also validate the structure against a schema derived from the real thing: Aaron audited all 180,000 gems published on [rubygems.org](https://rubygems.org) and built [a tool](https://github.com/tenderlove/gem-validator/tree/main) to validate them against it. Some very old gems turned up edge cases we deliberately chose not to handle. Those gems would be rejected if pushed today, but they haven't seen a new version in years and almost certainly never will. His contribution here is greatly appreciated. |
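
The general idea of walking the parse tree can be sketched with Psych's node API. Everything below is an illustrative assumption: the helper names, the choice to extract only `name` and `version`, and the error handling are hypothetical, and the real validator covers far more of the schema.

```ruby
require "psych"

# Hypothetical extractor: pulls `name` and `version` out of gemspec YAML by
# walking the parse tree. Tags are compared as strings; no objects are built.
def extract_name_and_version(yaml)
  doc = Psych.parse(yaml)
  root = doc && doc.root
  raise ArgumentError, "expected a top-level mapping" unless root.is_a?(Psych::Nodes::Mapping)

  result = {}
  # A mapping node stores its entries as a flat [key, value, key, value, ...] list.
  root.children.each_slice(2) do |key, value|
    case scalar_value(key)
    when "name"    then result[:name] = scalar_value(value)
    when "version" then result[:version] = version_value(value)
    end
    # Other keys are ignored in this sketch; a strict validator would check them too.
  end
  result
end

# Accept only plain scalars; aliases, sequences, and mappings are rejected.
def scalar_value(node)
  raise ArgumentError, "expected a scalar" unless node.is_a?(Psych::Nodes::Scalar)
  node.value
end

# The version is a nested mapping tagged as Gem::Version with one `version` key.
def version_value(node)
  unless node.is_a?(Psych::Nodes::Mapping) && node.tag == "!ruby/object:Gem::Version"
    raise ArgumentError, "expected a Gem::Version mapping"
  end
  raise ArgumentError, "unexpected version layout" unless node.children.size == 2

  key, value = node.children
  raise ArgumentError, "unexpected version key" unless scalar_value(key) == "version"
  scalar_value(value)
end

SAMPLE = <<~DOC
  --- !ruby/object:Gem::Specification
  name: rails
  version: !ruby/object:Gem::Version
    version: 8.1.3
  platform: ruby
DOC

extracted = extract_name_and_version(SAMPLE)
```

Because the code only ever reads scalar strings from positions it expects, a node of any other kind (an alias, a nested sequence, an unexpected tag) is an error rather than an instruction to the loader.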
| 75 | + |
| 76 | +The result is that an entire class of exploitation (using malformed metadata to attack the push endpoint itself) is no longer possible. The attack surface doesn't exist anymore. |
| 77 | + |
| 78 | +## Compromised passwords and the supply chain |
| 79 | + |
| 80 | +Gem validation protects [rubygems.org](https://rubygems.org) from what gets pushed. But there's a separate, persistent threat: who's doing the pushing. |
| 81 | + |
| 82 | +Package registries are high-value targets for credential stuffing. If an attacker gets hold of a developer's reused password from an unrelated breach, they can log in as that developer and push a malicious version of a legitimate gem. The release comes from a trusted account. The checksums match. Everything looks right, because as far as [rubygems.org](https://rubygems.org) can tell, it is. |
| 83 | + |
| 84 | +[Have I Been Pwned](https://haveibeenpwned.com) (HIBP) is a service run by security researcher [Troy Hunt](https://www.troyhunt.com) that tracks passwords exposed in known data breaches. At the time of writing, it contains over 10 billion compromised passwords. [rubygems.org](https://rubygems.org) now checks against it at login, registration and password resets. |
| 85 | + |
| 86 | +## Checking passwords without exposing them |
| 87 | + |
| 88 | +The obvious concern with checking your password against a third-party service is privacy. [rubygems.org](https://rubygems.org) never sends your password, or even a full hash of it, to HIBP. |
| 89 | + |
| 90 | +Instead, it uses [HIBP's k-anonymity model](https://www.troyhunt.com/understanding-have-i-been-pwneds-use-of-sha-1-and-k-anonymity/). When you log in, [rubygems.org](https://rubygems.org) computes a SHA-1 hash of your password and sends only the first 5 characters of that hash to the HIBP API. HIBP returns a list of all hashed passwords in its database that start with those 5 characters. [rubygems.org](https://rubygems.org) then checks that list locally. Your full password hash never leaves our servers. |
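
The prefix and suffix handling described above can be sketched as follows. The helper names are hypothetical and the response body is simulated so the example runs offline; the only assumptions about the service are that the range endpoint lives at `https://api.pwnedpasswords.com/range/<prefix>` and answers with one `SUFFIX:COUNT` line per known hash.

```ruby
require "digest"

# Split the uppercase SHA-1 hex digest into the 5-character prefix that is
# sent to the API and the 35-character suffix that stays local.
def hibp_prefix_and_suffix(password)
  digest = Digest::SHA1.hexdigest(password).upcase
  [digest[0, 5], digest[5..-1]]
end

# Compare the local suffix against each "SUFFIX:COUNT" line of a range response.
def pwned_in_range?(suffix, range_body)
  range_body.each_line.any? do |line|
    candidate, _count = line.strip.split(":", 2)
    !candidate.nil? && candidate.casecmp?(suffix)
  end
end

prefix, suffix = hibp_prefix_and_suffix("password")
# Only `prefix` would leave the server, as the last path segment of the range URL.

# Simulated response body with made-up counts, standing in for the HTTP call.
fake_body = "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n#{suffix}:42\r\n"

found     = pwned_in_range?(suffix, fake_body)
not_found = pwned_in_range?("F" * 35, fake_body)
```

Since many unrelated passwords share any given 5-character prefix, the service cannot tell from the request which suffix the caller is interested in.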
| 91 | + |
| 92 | +If your password appears in the results, [rubygems.org](https://rubygems.org) blocks the session and shows a warning explaining your password has been found in a known breach. You'll need to reset your password before you can log in again. |
| 93 | + |
| 94 | +Since shipping, the check has detected 1,166 accounts with compromised passwords. Because rubygems.org hashes passwords with bcrypt, we've never been able to inspect the strength of passwords in the database directly. This is the first real window into how widespread the problem is, and a way to start correcting it. |
| 95 | + |
| 96 | +## Shipping the work |
| 97 | + |
| 98 | +[rubygems.org](https://rubygems.org) serves almost a billion gem downloads every single day. Every Ruby application, from side projects to the infrastructure powering large parts of the internet, depends on the integrity of what we distribute. |
| 99 | + |
| 100 | +These two changes address the supply chain at different layers: one at the moment a gem is built and pushed, the other at the moment a person logs in. Neither is glamorous. Validating YAML ASTs and checking password hash prefixes don't make for a splashy announcement. But this is the work: closing specific, real attack vectors before someone finds them for you. If you want to follow along or get involved, everything happens in the open at [github.com/rubygems/rubygems.org](https://github.com/rubygems/rubygems.org). |