Table Toolkit: fast iteration data development
Table Toolkit is a data development tool that helps tech-savvy business analysts get more done. It makes it easy to work across the entire data spectrum in one tool, and it increases iteration speed.
Breaking down the walls between ETL and report development
Table Toolkit is built on the assumption that quick iteration on data analysis work requires being able to visualize your results easily. This allows you to make a quick judgement call on whether your result makes sense or needs more work, and makes it easy to show others your work and ask for feedback.
Most iterations require work at both the ETL layer and the reporting layer. In a traditional setup, these layers are handled by vastly different tools, and often by different people. This creates a lot of overhead due to communication and non-transferable knowledge, which slows things down.
Table Toolkit allows you to prototype the report and any ETL that you may need to power it from the same tool - from the same file even. This lets your most productive team members get things done without anything getting in their way. They can do everything needed to build an MVP for a report within one tool, and ask for feedback in quick iterations.
For somebody with experience in a traditional data organization, where separate ETL engineers and analysts use different tools, mixing ETL and reporting concerns in a single file may seem weird or wrong at first. Breaking down these walls is the key to productivity, though. This is similar to how React introduced components that keep markup and JavaScript logic, and often styling, together in a single file. This looked off-putting to purists, but it was the key design decision that allows front-end developers to be more productive in building web application frontends.
Including ETL concerns in your report is an example of leveraging technical debt. In the end, you may want to move the ETL parts of the report to your ETL framework so the resulting dataset is available for everyone to use. However, by being able to postpone this, you gain leverage: you can quickly verify that a dataset makes sense for its intended purpose before investing in a proper ETL script. Table Toolkit aims to make it easy to migrate the dataset-building parts of your report into the ETL framework of your choice.
Increase iteration speed by caching intermediate results
Table Toolkit was designed by looking at the process of developing a report and seeing how it can be accelerated, rather than optimizing for the end result. The idea is that the quicker report developers can iterate, the faster they can deliver a result that answers the questions you are asking.
The main realization is that this iterative process involves a series of small edits to the queries and calculations that make up the report. If every change requires the entire report to be re-evaluated, including performing queries and running expensive calculations, this process ends up being painfully slow. Table Toolkit automatically caches all intermediate results, and is intelligent about what it can reuse after a small edit is made. It will live-update the report in the browser as you work on a file to give you the quickest feedback possible on your work.
This is comparable to the virtual DOM in React, which allows you to describe a web page from scratch without taking its current state into account; the virtual DOM then figures out the minimal change needed to update the page to reflect the desired state. Similarly, Table Toolkit will figure out the “minimum calculation” needed to evaluate your report.
This is also very much like how spreadsheets work: they only re-evaluate cells that depend on cells that have changed. However, Table Toolkit will avoid some of the pitfalls of spreadsheets, and will bring the best parts of the software development process to the business analysis domain.
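To make the caching idea concrete, here is a minimal sketch in Python. It is not Table Toolkit's actual implementation or API; the `Pipeline` class, its methods, and the step names are all hypothetical and chosen only for illustration. Each step is cached under a key derived from its own source text and the keys of the steps it depends on, so editing one step invalidates that step and everything downstream of it, and nothing else.

```python
import hashlib


class Pipeline:
    """Toy report pipeline (hypothetical) that caches each step by content hash."""

    def __init__(self):
        self.steps = {}        # name -> (source text, function, dependency names)
        self.cache = {}        # cache key -> computed result
        self.evaluations = []  # steps that were actually recomputed, for demonstration

    def define(self, name, source, func, deps=()):
        # `source` stands in for the query or calculation text the analyst edits.
        self.steps[name] = (source, func, tuple(deps))

    def key(self, name):
        # The key covers this step's source and the keys of all upstream steps.
        source, _, deps = self.steps[name]
        payload = repr((source, [self.key(dep) for dep in deps]))
        return hashlib.sha256(payload.encode()).hexdigest()

    def evaluate(self, name):
        k = self.key(name)
        if k not in self.cache:
            _, func, deps = self.steps[name]
            inputs = [self.evaluate(dep) for dep in deps]
            self.evaluations.append(name)
            self.cache[k] = func(*inputs)
        return self.cache[k]


p = Pipeline()
p.define("orders", "load orders", lambda: [120, 80, 200])
p.define("total", "sum(orders)", lambda orders: sum(orders), deps=["orders"])
p.define("chart", "bar(total)", lambda total: f"total={total}", deps=["total"])

p.evaluate("chart")    # the first run computes all three steps
p.evaluations.clear()

# Edit only the chart step; the upstream results are reused from the cache.
p.define("chart", "label(total)", lambda total: f"Total revenue: {total}", deps=["total"])
result = p.evaluate("chart")
recomputed = list(p.evaluations)  # only the edited step was re-run
```

In this sketch, an edit to the "orders" step would change the keys of "total" and "chart" as well, so all three would be recomputed. That is the same dependency-driven behavior as the spreadsheet comparison above.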
Bringing the best of software development to business analysts
- Version control
- Your editor of choice