Flexible Data Retrieval at Scale with HAQL

What is HAQL?

Back in 2022, we were faced with a challenge: we wanted to build useful, actionable dashboards for our customers, and we wanted to build them fast. We had the data, we had the context, and we had the designs, but we were missing a way to scale the process of data wrangling for each additional data visualization. We built helper classes, DRY’d, and abstracted away code, but we still found ourselves bogged down writing opaque Arel queries that were prone to error and difficult to debug. On top of that, trying to optimize for performance while navigating a complex database authorization layer led to difficult tradeoffs between load times and security.

Enter HAQL. Rather than struggle through defining each query in ActiveRecord, Arel, or risky raw SQL blocks, why not simplify the interface and focus on the specific needs of a fast analytics query engine? At its core, HAQL is just that: a simplified query interface for writing performant aggregate queries on tables modeled purposefully for data analysis.

On the backend, HAQL consists of a Ruby class that constructs Arel nodes from a given input, enabling fine-grained control over the available schema, authorization, database functions, data types, output formats, database connections, row limits, error handling, and more.

The query inputs themselves are highly structured and strictly typed, which makes it easier to validate requests, reject malicious payloads, and enforce access controls. And most importantly for our original use case, the structured inputs and outputs let us build new dashboards rapidly.

We leveraged this HAQL response contract to write reusable React components on the frontend for all our most common data visualizations; now a chart becomes a configuration, and creating a new dashboard becomes a low-code activity.
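To make the "chart as configuration" idea concrete, here is a minimal sketch. It is written in Python for illustration only (the actual components are React), and it assumes the response arrives as a list of key-value rows. The config keys (`type`, `title`, `x`, `y`), the `to_series` helper, and the sample rows are all hypothetical placeholders, not the real HAQL response contract or real data.

```python
# Hypothetical chart configuration: key names are illustrative assumptions.
chart_config = {
    "type": "bar",
    "title": "Bounties by asset",
    "x": "asset",          # result key used for the category axis
    "y": "total_bounty",   # result key used for the value axis
}


def to_series(config, rows):
    """Map key-value result rows onto the axes named in a chart config."""
    for key in (config["x"], config["y"]):
        if any(key not in row for row in rows):
            raise KeyError(f"result rows are missing key: {key!r}")
    return {
        "type": config["type"],
        "title": config["title"],
        "labels": [row[config["x"]] for row in rows],
        "values": [row[config["y"]] for row in rows],
    }


# Placeholder rows standing in for a query response (not real data).
rows = [
    {"asset": "api.example.com", "total_bounty": 1500},
    {"asset": "www.example.com", "total_bounty": 900},
]
series = to_series(chart_config, rows)
print(series)
```

Because the rendering code depends only on the config and the row keys, adding a chart in this sketch means writing a new config rather than new data-wrangling code.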

So how does it work?

Figure: A preview of platform features enabled by HAQL

The Anatomy of a HAQL Query

A HAQL query has many of the familiar components of a SQL query: a required select statement along with optional where predicates, join statements, order by specifications, and limit directives. Queries are typically executed via GraphQL, but can also be defined explicitly as JSON. Let’s look at an example.

[Figure: example HAQL query]

With this query, we’re retrieving the sum of bounties grouped by asset in the select statement. We join the assets table to the bounties table in the join statement, and specify the join conditions via a series of predicates. Finally, we order the results by the summed bounty amount.
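As a rough illustration of the shape such a query might take, here is a minimal sketch that builds the structure in Python and serializes it to JSON. Every key (`select`, `from`, `join`, `predicates`, `order_by`), every table and column name, and the descending sort direction are assumptions made for illustration; the actual HAQL schema is not reproduced in this post.

```python
import json

# Hypothetical query shape: all keys, table names, and column names are
# illustrative assumptions, not the real HAQL schema.
query = {
    "select": [
        {"field": "assets.name", "alias": "asset"},
        {"function": "sum", "field": "bounties.amount", "alias": "total_bounty"},
    ],
    "from": "bounties",
    "join": [
        {
            "table": "assets",
            # Join conditions expressed as a series of predicates.
            "predicates": [
                {"left": "bounties.asset_id", "right": "assets.id"},
            ],
        }
    ],
    "order_by": [{"field": "total_bounty", "direction": "desc"}],
}


def aggregate_aliases(q):
    """Return the aliases of all aggregate (function-based) select entries."""
    return [item["alias"] for item in q["select"] if "function" in item]


payload = json.dumps(query, indent=2)
print(payload)
```

The point of the sketch is that each component is a plain, typed value, so the same payload can be passed as a GraphQL argument or sent as JSON.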

Behind the scenes in our Rails backend, each query component is parsed, validated, and incorporated as a node in an Arel query. At this point, we also apply authorization predicates and other safeguards against improper access. Results are then returned in a key-value format that’s compatible with GraphQL.

In PostgreSQL, the above query would translate to:

[Figure: the equivalent PostgreSQL query]
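The steps described above (parse each component, validate it, add authorization predicates, and emit SQL) can be sketched roughly as follows. This is a simplified illustration in Python, not the Ruby/Arel implementation: the `SCHEMA` allowlist, the `team_id` authorization column, the query keys, and the `compile_query` helper are all hypothetical, and the sketch emits a parameterized SQL string directly rather than building Arel nodes.

```python
# Hypothetical allowlist: tables, columns, and functions a query may use.
SCHEMA = {
    "bounties": {"amount", "asset_id", "team_id"},
    "assets": {"id", "name"},
}
FUNCTIONS = {"sum": "SUM", "count": "COUNT", "avg": "AVG"}
DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def column(ref):
    """Validate a 'table.column' reference and return it as a quoted identifier."""
    table, _, name = ref.partition(".")
    if name not in SCHEMA.get(table, set()):
        raise ValueError(f"unknown column: {ref!r}")
    return f'"{table}"."{name}"'


def compile_query(query, team_id):
    """Return (sql, params) for a validated query, scoped to a single team."""
    base = query["from"]
    if base not in SCHEMA:
        raise ValueError(f"unknown table: {base!r}")

    selects, group_by, aliases, has_aggregate = [], [], set(), False
    for item in query["select"]:
        expr, alias = column(item["field"]), item["alias"]
        if not alias.isidentifier():
            raise ValueError(f"invalid alias: {alias!r}")
        if "function" in item:
            if item["function"] not in FUNCTIONS:
                raise ValueError(f"unknown function: {item['function']!r}")
            expr = f"{FUNCTIONS[item['function']]}({expr})"
            has_aggregate = True
        else:
            group_by.append(expr)  # non-aggregated columns become group keys
        selects.append(f'{expr} AS "{alias}"')
        aliases.add(alias)

    lines = ["SELECT " + ", ".join(selects), f'FROM "{base}"']
    for join in query.get("join", []):
        if join["table"] not in SCHEMA or not join["predicates"]:
            raise ValueError(f"invalid join: {join!r}")
        on = " AND ".join(
            f"{column(p['left'])} = {column(p['right'])}" for p in join["predicates"]
        )
        lines.append(f'JOIN "{join["table"]}" ON {on}')

    # The authorization predicate is added by the engine, never by the caller.
    lines.append(f"WHERE {column(base + '.team_id')} = %s")
    if has_aggregate and group_by:
        lines.append("GROUP BY " + ", ".join(group_by))

    order = []
    for o in query.get("order_by", []):
        if o["field"] not in aliases or o["direction"] not in DIRECTIONS:
            raise ValueError(f"invalid order_by: {o!r}")
        order.append(f'"{o["field"]}" {DIRECTIONS[o["direction"]]}')
    if order:
        lines.append("ORDER BY " + ", ".join(order))
    return "\n".join(lines), [team_id]


example = {
    "select": [
        {"field": "assets.name", "alias": "asset"},
        {"function": "sum", "field": "bounties.amount", "alias": "total_bounty"},
    ],
    "from": "bounties",
    "join": [
        {"table": "assets",
         "predicates": [{"left": "bounties.asset_id", "right": "assets.id"}]}
    ],
    "order_by": [{"field": "total_bounty", "direction": "desc"}],
}
sql, params = compile_query(example, team_id=42)
print(sql)
```

Because every identifier is checked against an allowlist and the only caller-supplied value travels as a bound parameter, a payload that names an unexposed table or column fails validation before any SQL is produced.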

Though a simple interface, HAQL allows for quite complex queries. We’ve found that in combination with reusable frontend components and well-defined patterns for grouping related queries, HAQL has brought down the time to build a new dashboard from weeks to hours in typical use cases.

Investing in Catalysts

It’s often the case in engineering that a small improvement has effects far larger than the problem it was built to solve. Discovering these force multipliers is more art than science, but with HAQL we found almost immediately that it had a number of applications, many of them completely unexpected.

One of the most exciting opportunities for HAQL is its relationship to Hai, HackerOne’s AI copilot. HAQL’s schema is naturally dense with information and highly structured, making it easy for Hai to learn the language via conventional Retrieval Augmented Generation (RAG) techniques and enabling Hai to fetch, analyze, and render data as an agent in real time.

Without this analytics query layer, our Hai developers would’ve had to hand-engineer data access rules and schema context to coax generated SQL from an LLM, write parsers for validation, and implement complex logic for handling diverse response formats. Instead, a relatively simple addition to the LLM’s system prompt unlocks powerful new functionality: context-aware, chat-based insights across the HackerOne platform.
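One way to picture that system prompt addition is as schema metadata rendered into plain text. The sketch below is an assumption about how this could look, not Hai’s actual prompt: the `SCHEMA` contents, the descriptions, and the `schema_prompt` helper are all hypothetical.

```python
# Hypothetical schema metadata: table names, descriptions, and column types
# are illustrative assumptions.
SCHEMA = {
    "assets": {
        "description": "Assets in scope for a program.",
        "columns": {"id": "integer", "name": "string"},
    },
    "bounties": {
        "description": "Bounties awarded for reports.",
        "columns": {"amount": "decimal", "asset_id": "integer"},
    },
}


def schema_prompt(schema):
    """Render schema metadata as a text block for an LLM system prompt."""
    lines = ["You can query the following tables with HAQL:"]
    for table, meta in sorted(schema.items()):
        lines.append(f"- {table}: {meta['description']}")
        for name, type_name in sorted(meta["columns"].items()):
            lines.append(f"    {table}.{name} ({type_name})")
    return "\n".join(lines)


prompt = schema_prompt(SCHEMA)
print(prompt)
```

Since the model only ever produces a structured query against the listed tables, the same validation and authorization checks that guard dashboard queries also apply to anything it generates.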

As an added benefit, simple yet strict authorization rules give us greater confidence in Hai’s ability to safely execute HAQL queries, and support for rich metadata allows us to “steer” LLMs towards more reliable queries that wouldn’t have been possible otherwise.

Limitations

HAQL is no exception to the age-old tradeoff of build vs. buy vs. open source. Are there other tools that could have potentially helped us solve this problem? Of course. Are there downsides to managing a homegrown custom query engine in a Rails app? For sure. Are there unknown risks we haven’t uncovered yet? Definitely.

The verbose syntax may also feel heavy-handed for experienced SQL users at first, and for more complex operations such as subqueries, CTEs, unions, and the like, HAQL is not the best option (yet). But there’s always the right tool for the right job, and at HackerOne, HAQL is a powerful one to have in the toolbox.

Looking Forward

We expect HAQL to have uses that go far beyond powering dashboards. It already powers a handful of REST API endpoints, and it will likely become queryable directly via the API. The number of datasets available in the HAQL schema has also been growing steadily to cover a greater share of HackerOne’s product suite. Finally, the integration with Hai is sure to attract additional product and engineering investment as we discover creative new ways to surface and interact with data.

We expect these capabilities to keep growing in what is turning out to be a very exciting time for cybersecurity and technology as a whole.
