Supporting `VIEW`s in Diesel

Prototype Fund is funding work to add support for querying SQL VIEWs from Diesel. As I'm slowly working through adding this feature to Diesel, I would like to use this blog post to publish some more context about this work.

Adding VIEW support to Diesel

Conceptually, the end result of this work will be pretty boring, as the "only" new thing that appears is a diesel::view! macro. This macro will work mostly in the same way like the existing diesel::table!. Users can then use the data structures generated by these macros to build their queries. In contrast to the table! macro, the view! macro will ensure that it can only be used in SELECT queries for now and reject any usage in other statements like INSERT, UPDATE, or DELETE. Implementing this part in Diesel was pretty straightforward. It already has landed on our main branch as part of the following pull requests:

The final piece to make using this construct convenient is to generate these view! definitions from our CLI tool, as we already do for table! macros. For table! macros, this implementation is pretty straightforward: We connect to your database and query information about tables from there. This includes information about primary and foreign keys, column types, and also if a certain column is allowed to contain NULL values or not. From a conceptual point of view, we can get the same information from the database for VIEW as well. The problem there is that in contrast to tables, databases often do not know if a field of a view might contain NULL values or not. As part of this blog post, I would like to review existing solutions to this problem and describe the solution I ended up working on.

Infer the nullability of VIEW fields.

The underlying problem I need to solve is figuring out for each expression in the SELECT clause of a given query if the expression can result in a NULL value or not. This problem is not unique to Diesel or even to database VIEWs but a general problem that also other crates and tools. I will try to present various solutions in the following sections.

Using database provided information about the VIEW

PostgreSQL and MySQL support the information_schema to query data about different schema parts. This includes a view to get information about VIEWs defined in your database. You can combine these information with the information from the information_schema.columns view to get information about the fields produced by a VIEW. SQLite exposes similar information via PRAGMA table_info('view_name). This closely matches what Diesel already does for tables. For tables this includes detailed information about whether a column might contain null values. Unfortunately that's often not the case for views as both PostgreSQL and SQLite always claim that a view field might contain NULL values. That's the case even for trivial cases like CREATE VIEW test AS SELECT 1. On the other hand MySQL seems to correctly infer this information.

Using this approach would result in Diesel always using a Nullable<> type for such fields, which in turn would require our users to always wrap fields of a VIEW in Option<T> values in their Rust code. While this behaviour is technically correct as Option<T> can represent all relevant values for T, it's adding additional overhead to what our users need to take care of.

Asking the database for information about what a specific query returns

Another approach is to use prepared statements to ask the database directly what a specific query would return. This approach is chosen by crates like SQLx and clorinde. In theory that sounds like a really nice solution as you offload all the work to the database system which needs to figure out this kind of information anyway to actually execute your query. In practice you run into exactly the same limitations than with the previous approach: PostgreSQL and SQLite will tell you that all fields of your query might return NULL values. I guess this is kind of expected as the tools used for the other approach are also likely just built on top of this feature.

A common problematic construct here seems to be queries like SELECT users.id, posts.id FROM users LEFT JOIN posts.user_id = users.id where posts.id itself is declared as a not nullable column in its table but might contain null values due to the join.

Both other crates then apply various heuristics on top of asking the database to figure out whether a column is nullable or not. SQLx started with considering columns in doubt as not nullable, which obviously leads to problems for more complex queries. To workaround this, SQLx accumulats a lot of complexity by inspecting query plans via EXPLAIN statements and hooking the SQLite virtual machine. Their issue tracker indicates that their inference still considers fields as non-nullable in specific cases.

Clorinde seems to choose a more conservative approach, defaulting to nullable columns if information do not exist.

Overall I personally consider purely relying on the database system as dead end. After all the main problem is that the database system is unable to provide the required information in the first case, so it looks for me that using various different ways to essentially access the same incomplete information won't lead to better results.

Analyzing the SQL itself

Another possible approach is to analyze the SQL query itself. This seems to be implemented by the sqlc tool written in Go. Their tool is able to correctly infer most of the problematic cases shown before and also seems to be able to handle more complex queries as well.

Now analyzing SQL sounds like a total nightmare as you would need to mirror every database specific detail to be 100% correct. It sounds like using this approach would essentially require reimplementing large parts of the database system for which you want to analyze queries.

From looking at the sqlc code they roll their own SQL parser to construct an AST out of the parsed queries. They then use the parsed AST like a real compiler, applying various passes on top of it.

Choosing an approach for Diesel

Especially SQLx seems to have extended the "Just ask the database" approach to its maximum. While this approach can generate reasonable results for some queries, it appears fundamentally limited in what can be supported. In the end, you will always end up with incomplete or incorrect results using this approach without changes to the database system itself.

Analyzing SQL sounds complicated as well: You need to deal with parsing complex queries and later analyze them. Fortunately, the Rust ecosystem already has a mature widely used crate for parsing SQL. This crate simplifies much of the complexity otherwise needed to implement this approach. Starting from the parsed AST it is much easier to implement targeted passes to correct nullability of specific constructs one by one as you can reason about expressions in isolation. It certainly also helps that all database systems supported by Diesel normalize queries used to define VIEWs to a certain degree. This involves already resolving wildcards (*) and always using qualified names for columns and other fields.

My personal approach for Diesel here is to gradually start resolving nullability for fields by concentrating on specific examples. This will include the following cases from the start:

  • Literal values
  • Columns from tables
  • Columns from joins

Later, I plan to add various heuristics to support functions, operators, and more complex query constructs. Information about how to propagate nullability for such expressions is already encoded by Diesel's DSL in the Rust type system for some of these cases, so we already know how it internally works.

I plan to publish this analyzing part as a new separate crate so that it can be used by other crates as well. A first draft of this crate functionality exists in this PR: Add a new diesel_infer_query crate.

Summary and Outlook

The next Diesel feature release will add a new diesel_infer_query crate to analyze SQL statements. For now this will only include information about whether a field returned by a certain query can contain NULL values or not. The crate doesn't perform any interaction with the database itself, it allows users to instead provide a resolver to get type information about certain basic constructs like table/view fields if required. This hopefully enables different users to provide their own source for this information.

For the next Diesel feature release, this crate will power the inference whether a field of a VIEW contains NULL values or not. We likely will consider this inference experimental for at least one Diesel version to give us some room to improve the accuracy on real-world use cases. You already can help with that by sharing your VIEW definitions.

In the longer run, I imagine that this functionality is also used to provide a better way to express plain SQL queries in Diesel. We could use our schema.rs files as data source instead of connecting to a database there. Possibly it's also feasible to utilize Rustdoc's JSON format to extract information already encoded in the Rust type system via Diesel's DSL, making the inference aware of all these cases already handled by Diesel. I'm looking forward to seeing people experimenting with this in the future.

Finally, I would like to use this opportunity to thank Prototype Fund again for funding this project. Without their help and funding, none of this wouldn't have been possible. They also put out a new call for applications for next year's round of funding: If you have ideas you want to work on make sure to apply for funding until 30.11.2025 here. I enjoyed working with them and can only recommend others to apply as well.