People are intrigued that NoSQL databases are exploring different data models that promise much easier scaling and fault tolerance. But these same people are justifiably daunted by the prospect of abandoning their extensive SQL-based applications, retraining their developers to interact with databases in new ways and then rebuilding applications from scratch (applications are rarely ported from SQL to NoSQL). So it’s an appealing thought: being able to port existing SQL apps with little or no development work, take advantage immediately of scaling and fault tolerance, and adapt to a NoSQL API to take advantage of schema flexibility and enhanced performance.
However, there are several possible migration paths, and migrating to NoSQL isn’t always an upgrade, either; there’s plenty great about SQL. Let’s look at a few novel approaches.
Optimizing the hotspots
Building applications often involves a lot of work. The architecture of the application can often be thrashed out in under a day (producing a nice whiteboard full of diagrams with databases and clouds, and if you’re lucky, even a specification document!). But actually designing and implementing every single page in a Web application is a lot of work. Even new applications are worth building in SQL, simply because it’s quite easy to use for many tasks. In addition, developers will already know it, so the cost of retraining doesn’t have to be added in.
However, the way GenieDB integrates SQL and NoSQL access means both can be used to access the same piece of data. It’s possible for an application to be written using the MySQL plugin interface. But when parts of the application turn out to be performance bottlenecks, those parts can be individually rewritten to use the native API, to gain sub-millisecond query latency and significantly enhanced throughput. For example, a URL shortening service might have an extensive user interface for creating and managing links -- but the vast majority of load on the system (and the crucial part!) comes from the quite simple part that, when a request comes in for a shortened URL, logs the fact and issues the HTTP redirect to the target URL. It would be simple to rewrite this one part of the application from an initial SQL implementation to one using the native API, even without reading more than the examples in the manual - as all it has to do is fetch a record by primary key (to look up the link) and insert a new record (to log the use of the link). Yet doing so can shave tens of milliseconds from the query processing time and decrease overall system resource consumption, meaning that users get forwarded more quickly and each server can handle more requests per second.
Queryable subset
Some applications benefit greatly from escaping the rigid data model of SQL in the first place. In particular, highly object-oriented systems such as multiplayer games are a bad fit to rigid schemas; the many different items a player may carry in his inventory all have some properties in common (such as weight and appearance) but also have many differing properties (weapons have damage statistics, while armor, potions, MacGuffins, and so on all have very different information). Most ways of dealing with this in SQL are rather cumbersome. A data model where each record can have whatever fields are appropriate to it (and even compound values in fields, such as lists of items) can make programming such an application much easier, and remove the burden of “schema migration” when uploading new software releases to the existing live system.
However, such exciting systems often have more conventional components -- such as billing and management reporting -- that have a naturally regular data model and benefit from the ability of the SELECT statement to perform complex analytic queries on tabular data. They also benefit from being able to use existing tools and techniques.
Tables that use a regular schema can be exposed via the MySQL plugin. Indeed, even tables with complex schemas can; it’s just that MySQL might not be able to make much sense of the complex parts. But if every object in the Inventory table has a primary key, a value, and an owner, the complex type-specific fields can just be ignored by MySQL - and the central reporting system can do a simple SELECT to count up the total value of objects owned by each player.
Multiple views
SQL gives you the ability to create “views” - virtual tables, in effect, generated on the fly from a SELECT statement. This is usually done in order to simplify application logic by removing detail that parts of the application need not be concerned with, or for security (by giving different access control rights to different views, parts of the database may be exported to different parts of the application in a fine-grained way). However, that’s all on a single server. Because using GenieDB as a distributed SQL database involves running lots of independent MySQL servers, you can actually give the same table a different schema on each MySQL server, and therefore see different columns.
This means that different components of a large multi-component application can run against their own MySQL servers, seeing different tables and different columns of shared tables, from a single shared “master database”.
Say you have an e-commerce system with a public Web interface where orders can be placed, a fulfillment center where orders are processed and invoices raised if the orders can be fulfilled, and a payment processing center where invoices are handled and (hopefully) closed. It is desirable to give the Web interface access to the catalogue and the orders table, while fulfillment needs access to orders and invoices, and payment needs access to invoices – and everyone needs access to the customers table, if only to get billing/invoicing addresses. Needless to say, the head office wants access to everything.
Using GenieDB, a single distributed database can be run over servers in the head office, in the data center for hosting the Web site, and at the fulfillment and payment processing offices; MySQL is configured to reveal only the respective tables and columns. This keeps the visible schema for each component of the system simple, and provides added security. This arrangement also ensures that, if the links between sites go down, business operations will still continue at each location - and when the links return, new orders will automatically flow to fulfillment, new invoices to payment processing and so on.
Conclusion
Having multiple interfaces to the same data gives application designers new flexibility to use different tools and techniques for different parts of their application - which is good when different parts of the application have differing requirements!