Amir Netz’s two billion and seven rows demo at last year’s BI conference was not just demoware. Unlike PowerPivot, where the amount of data you can load is bound by the workbook size, the amount of data you can load into tabular models is limited only by server capacity. This is true at both design time and at deployment time.
If you have a workspace database server that’s got enough RAM to hold and manipulate your billions of rows, you really can sort, filter, add columns, etc on your tabular model from the designer today. This is one of the big advantages of the out of proc server architecture. You can use a standard developer desktop, connect to a beefy workspace database server, and still get decent responsiveness for a large row set. The size of the metadata (number of tables, columns, etc) has a much greater impact on the performance of the designer than the size of the data.
There are a couple of things to keep in mind when working with large row sets. First, you should ensure that your Data Backup property is set to “Do not backup to disk” before dealing with models this size. This is the default, so don’t worry too much here. Data backups are useful sometimes (I’ll explain later), but the gains to be had by backing up are lost by the longer save times and higher disk space requirements on developer workstations.
Also, the tabular designer works against the metadata and the data, and there is no way to separate the two. That means the mandatory first step for any tabular project is to import from a data source, which processes a bunch of data. Although working with the data is fast enough, processing the data still takes time.
If you would like to cut your data set down to size to reduce processing time, you can do one or more of the following:
That last option, importing the first partition only and then adding more later, is a very simple and powerful way to create a subset of your data. I will go through this step-by-step in my next post. It is definitely not the only way to work with large data, but it is perhaps the most accessible way to get up and running quickly with a large row set.