Massive Great Firewall Leak Exposes 500GB of Censorship Data

October 31, 2025 5 min read

Table of Contents

Nature and Scope of the Leaked Data
Organizational Fingerprints and Attribution
Implications and Future Consequences

In a historic breach of China’s censorship infrastructure, more than 500 gigabytes of internal data were leaked in September 2025 from Chinese firms associated with the Great Firewall (GFW).

Researchers now estimate that the full dump is closer to 600 GB, with a single archive accounting for about 500 GB.

The material includes more than 100,000 documents, internal source code, work logs, configuration files, emails, technical manuals, and operational runbooks. The number of files in the dump is reported to be in the thousands, though exact totals vary by source.

Among the revealed artifacts are files from an RPM packaging server, the infrastructure used to distribute software builds. The dump also contains project management data from Jira and Confluence, including internal tickets, feature requests, bug reports, and deployment histories. Communications and engineering documents show how censorship tools are tested against VPNs, Tor, and other circumvention methods, with material covering deep packet inspection (DPI), SSL fingerprinting, and filtering logic.

Deployment records indicate both domestic use, in provinces such as Xinjiang, Fujian, and Jiangsu, and the export of censorship or surveillance systems to other countries, including Myanmar, Pakistan, Ethiopia, and Kazakhstan.

Nature and Scope of the Leaked Data

The dataset is a sprawling, multifaceted archive that lays bare the technical scaffolding of China’s digital surveillance regime.

It includes raw IP access logs from state-run telecom providers such as China Telecom, China Unicom, and China Mobile, documenting real-time traffic monitoring and endpoint interactions.

Because of the risk of malware and information hazards, downloading and analyzing this data should be left to professionals working in isolated, protected environments.

Packet captures (PCAPs) and routing tables are paired with blackhole and sinkhole exports, detailing how traffic is intercepted, redirected, or silently dropped.
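How a routing table combined with blackhole and sinkhole entries decides a packet's fate can be illustrated with longest-prefix matching, the rule ordinary IP routers use: the most specific prefix containing the destination wins. The Python sketch below is hypothetical throughout. The table, the action labels `forward`, `sinkhole`, and `blackhole`, and the addresses (drawn from reserved documentation ranges) are illustrative assumptions, not entries from the leaked exports.

```python
import ipaddress

# Hypothetical action table (prefix -> action) using reserved documentation
# ranges; it does not reproduce any entries from the leaked exports.
ROUTES = {
    "0.0.0.0/0": "forward",            # default route
    "198.51.100.0/24": "sinkhole",     # redirect to a sinkhole host
    "198.51.100.128/25": "blackhole",  # silently drop
    "203.0.113.0/24": "forward",
}

def lookup_action(dst: str, routes: dict = ROUTES) -> str:
    """Return the action of the longest (most specific) prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_action = -1, "forward"  # assumed default when nothing matches
    for prefix, action in routes.items():
        net = ipaddress.ip_network(prefix)
        # Membership is False across IPv4/IPv6, so mixed tables are safe.
        if addr in net and net.prefixlen > best_len:
            best_len, best_action = net.prefixlen, action
    return best_action

print(lookup_action("198.51.100.200"))  # blackhole: the /25 beats the /24
```

Here `198.51.100.200` falls inside both the /24 sinkhole entry and the /25 blackhole entry, and the more specific /25 wins. A linear scan keeps the sketch short; real routers use tries or hardware lookup tables.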

A trove of Excel spreadsheets enumerates known VPN IP addresses, DNS query patterns, SSL certificate fingerprints, and behavioral signatures of proxy services, offering insight into identification and blocking heuristics.

Visio diagrams map out the internal firewall architecture, from hardware deployments to logical enforcement chains spanning various ministries and provinces.

Application-layer logs dissect tools like Psiphon, V2Ray, Shadowsocks, and corporate proxy gateways, capturing how these are tested, fingerprinted, and throttled.

The dataset also contains databases of fully qualified domain names (FQDNs), SNI strings, application telemetry, and “sketch logs” showing serialized behavioral data scraped from mobile apps.

System-level monitoring exports reveal server CPU usage, memory utilization, stream session logs, and real-time user states.

Crucially, metadata leaked from Word, Excel, and PowerPoint files exposes the usernames, organizational affiliations, and edit trails of engineers and bureaucrats working on censorship infrastructure.
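For Word, Excel, and PowerPoint files saved in the Office Open XML format (.docx, .xlsx, .pptx), this kind of authorship metadata lives in a small XML part named `docProps/core.xml` inside the file's ZIP container. A minimal Python sketch of reading it with the standard library follows. It assumes modern OOXML files (legacy binary .doc/.xls/.ppt files store metadata differently) and extracts only a few common fields; the function name and field labels are illustrative choices.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# XML namespaces used in docProps/core.xml of Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> element path inside <cp:coreProperties>.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}

def core_metadata(data: bytes) -> Dict[str, Optional[str]]:
    """Read authorship fields from the bytes of a .docx, .xlsx, or .pptx file.

    Raises KeyError if the package has no docProps/core.xml part.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    result = {}
    for key, path in FIELDS.items():
        element = root.find(path, NS)
        # Missing or empty elements are reported as None.
        result[key] = element.text if element is not None else None
    return result
```

Usage is a single call such as `core_metadata(open("report.docx", "rb").read())`, where the filename is hypothetical. Fields like `creator` and `last_modified_by` typically hold the account or display name configured in the authoring software, which is why such metadata can expose individual engineers.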

Finally, OCR-processed screenshots illustrate the UI panels of traffic control dashboards, logging mechanisms, and internal tooling, offering a visual window into how the Great Firewall is operated in practice.

Organizational Fingerprints and Attribution

Beyond the technical evidence of censorship and traffic manipulation, the leaked dataset offers a rare opportunity to construct a socio-technical map of the Great Firewall apparatus, revealing not just how it works, but who builds it, who maintains it, and how China’s censorship ecosystem is organizationally compartmentalized.

The metadata extracted from over 7,000 documents, spreadsheets, Visio network maps, text logs, dashboards, and software configuration files reveals a complex lattice of state-linked entities operating in tightly controlled silos.

The internal architecture of the Great Firewall is supported by a network of organizations ranging from state-owned enterprises to elite research institutions and private sector vendors.

Core traffic monitoring and enforcement responsibilities are handled by China Telecom, China Unicom, and China Mobile, whose infrastructure appears repeatedly in PCAP logs, IP registries, and system-level telemetry.

Metadata from Visio diagrams and scanning scripts links regional enforcement activities to provincial branches, indicating decentralized operational cells.

At the academic and research level, contributors from the Chinese Academy of Sciences, CNCERT, Tsinghua University, and USTC are implicated in traffic modeling, VPN fingerprinting, and algorithmic SNI detection, functioning in a science-to-policy pipeline.

Additional entities such as Huaxin, Venustech, and Topsec, believed to have ties to the Ministry of State Security (MSS), appear to be responsible for developing packet inspection hardware, smart gateways, and modular control interfaces.

System topology files suggest regional hubs under provincial control, and the metadata points to a tiered chain of command: central rule authors in Beijing and local operators who handle disruptions and resets.

The leaked dataset exposes a highly modular and deeply integrated censorship architecture underlying the Great Firewall of China. Rather than operating as a single centralized filter, the GFW is revealed to be a distributed system of surveillance and control spanning national, regional, and local network layers.

At the core of traffic interception are the state-run ISPs, which serve as both service providers and surveillance intermediaries. Logs from these providers document how traffic is intercepted and classified by packet content using deep packet inspection (DPI).

These techniques target TLS/HTTPS session metadata, such as Server Name Indication (SNI) fields, and distinguish potentially suspicious connections based on protocol anomalies, including entropy, timing patterns, and payload structures. The infrastructure supports detection of known circumvention tools such as Shadowsocks, V2Ray, and Psiphon.
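To show what inspecting the SNI field involves, here is a minimal Python sketch that pulls the server name out of a TLS ClientHello, which is sent unencrypted at the start of a connection. It follows the message layout in the TLS 1.3 specification (RFC 8446) and the server_name extension (RFC 6066). It is a generic illustration, not code from the leak, and it assumes the whole ClientHello arrives in a single, unfragmented TLS record; it returns `None` for anything else, including truncated input.

```python
import struct
from typing import Optional

def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a single TLS record carrying a ClientHello.

    Returns None if the record is not a ClientHello, has no host_name entry,
    or is truncated (e.g. a ClientHello split across several records).
    """
    try:
        # Record header: content type (0x16 = handshake), version (2), length (2).
        if record[0] != 0x16:
            return None
        pos = 5
        # Handshake header: msg type (0x01 = ClientHello) + 24-bit length.
        if record[pos] != 0x01:
            return None
        pos += 4
        pos += 2 + 32                       # legacy_version + random
        pos += 1 + record[pos]              # session ID
        (cs_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cs_len                   # cipher suites
        pos += 1 + record[pos]              # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:          # server_name extension
                # server_name_list length (2), name_type (1), name length (2)
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type == 0 and len(name) == name_len:  # 0 = host_name
                    return name.decode("ascii")
                return None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```

Passing the first bytes a client sends on port 443 to `extract_sni` yields a hostname such as `example.com`, which a filter could then compare against a domain list.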

Application-level analysis is conducted using fingerprinting heuristics derived from both raw network characteristics and behavioral modeling.
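One of the raw characteristics mentioned above, payload entropy, is easy to compute. The Python sketch below measures Shannon entropy in bits per byte and flags payloads that look close to uniformly random, as fully encrypted protocols such as Shadowsocks tend to. The 7.0 bits/byte threshold and the function names are illustrative assumptions; this is a generic heuristic, not the classifier described in the leaked documents.

```python
import math
from collections import Counter

def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())

def looks_fully_encrypted(payload: bytes, threshold: float = 7.0) -> bool:
    """Flag payloads whose bytes look close to uniformly random.

    The threshold is an illustrative assumption. A payload of n bytes can reach
    at most log2(n) bits per byte, so payloads shorter than 128 bytes can never
    reach 7.0 and a real classifier would need additional features.
    """
    return shannon_entropy(payload) >= threshold

print(looks_fully_encrypted(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # False
```

Plaintext protocols such as HTTP reuse a small set of byte values and score low, while ciphertext spreads evenly across all 256 values. This is why entropy is usually combined with timing, packet sizes, and protocol structure rather than used alone.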


Various Excel spreadsheets and telemetry exports include references to TLS fingerprinting rules, heuristic classifiers for VPN/proxy traffic, and statistical models used to flag encrypted tunnels.
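For TLS fingerprinting, one widely used public method is JA3, which joins fields from a ClientHello into a string and hashes it with MD5. The Python sketch below computes a JA3-style fingerprint from already-parsed fields. It is offered only as an example of the technique, since the leaked rule sets are not known to use JA3 specifically. Like JA3, it skips GREASE values (RFC 8701).

```python
import hashlib
from typing import List, Tuple

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}

def _join(values: List[int]) -> str:
    """Dash-join decimal values, dropping GREASE placeholders."""
    return "-".join(str(v) for v in values if v not in GREASE)

def ja3(version: int, ciphers: List[int], extensions: List[int],
        curves: List[int], point_formats: List[int]) -> Tuple[str, str]:
    """Return (JA3 string, MD5 hex digest) for already-parsed ClientHello fields."""
    ja3_string = ",".join([
        str(version), _join(ciphers), _join(extensions),
        _join(curves), _join(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()

text, digest = ja3(771, [0x0A0A, 4865, 4866], [0, 11, 10], [29, 23], [0])
print(text)  # 771,4865-4866,0-11-10,29-23,0
```

Clients built on the same TLS library tend to produce the same string, so the hash can identify a tool even when its payload is encrypted. This is also why circumvention tools often try to mimic the ClientHello of a mainstream browser.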

These analyses rely on databases of SNI patterns, handshake behaviors, and traffic volume profiles. This reveals a layered approach to detection, with different modules specializing in different levels of granularity and evasiveness.
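Matching an SNI string against a database of domains is usually done on whole DNS labels, so that a rule for a domain also covers its subdomains without catching unrelated names that merely end in the same characters. The Python sketch below shows one way to do this. The class name and the example domains (from the reserved `example` namespace) are hypothetical.

```python
from typing import Iterable, Optional

def _normalize(name: str) -> str:
    """Lowercase and drop surrounding whitespace and a trailing root dot."""
    return name.strip().rstrip(".").lower()

class DomainMatcher:
    """Match hostnames (e.g. SNI values) against a set of listed domains.

    A listed domain also matches its subdomains, but only on whole labels.
    """

    def __init__(self, domains: Iterable[str]):
        self.domains = {_normalize(d) for d in domains}

    def match(self, host: str) -> Optional[str]:
        """Return the most specific listed domain covering host, or None."""
        labels = _normalize(host).split(".")
        # Try "a.b.example.com", then "b.example.com", "example.com", "com".
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in self.domains:
                return suffix
        return None

matcher = DomainMatcher(["example.com", "Blocked.Example.ORG."])
print(matcher.match("www.example.com"))  # example.com
print(matcher.match("notexample.com"))   # None
```

A plain `endswith("example.com")` check would wrongly match `notexample.com`; splitting on dots avoids that. Each lookup costs one set probe per label.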

Implications and Future Consequences

The leak of over 500 gigabytes of internal data from China’s censorship infrastructure constitutes one of the most consequential exposures in the history of digital authoritarianism.

Encompassing more than 7,000 files, the dataset offers not an isolated glimpse but an extended, multidimensional forensic cross-section of the Great Firewall's operational anatomy, revealing system telemetry, logic flows, user sessions, document metadata, application analyses, and network schematics.

Technically, the leak has rendered much of China’s detection arsenal obsolete. VPN heuristics, DPI rule sets, SNI-based fingerprinting algorithms, and application proxy classifiers are now open to scrutiny, replication, and evasion.

Operationally, usernames, hostnames, and file authorship data risk exposing government contractors, telecom engineers, and researchers, increasing their vulnerability to naming and shaming, targeted sanctions, or exploitation by rival intelligence services.

The documentation of flawed infrastructure, such as packet loss under scan load, looped sinkhole rules, and session state anomalies, presents ripe opportunities for adversarial exploitation.

Strategically, this dataset arms censorship circumvention communities, policy advocates, and red teams with the ability to simulate and reverse-engineer enforcement logic, undermining the efficacy of centralized control.

In sum, this breach collapses the asymmetry between censor and censored, offering, for the first time, a detailed blueprint of China’s digital surveillance leviathan. This is not just a technical leak; it is a rare unmasking of the people behind the policy.

