Table limits record retrieval to 11,000 records in an object that has more than that

I have a table that will have millions of rows.  I built a stock Tab page for the object and started clicking “load more.”  Unfortunately, when the record count got to 11,000, the load more button disappeared and there was no indication that more records were available.

I can understand that there are reasons why limiting the number of records held in memory in a model would be desirable.  That said, having no indication that more records exist could lead to wrong business decisions (e.g. with an Inventory table: "oh, we must not have any more widgets, time to go buy some").

Questions:
1) Why did it stop allowing rows to be retrieved?
2) Why did it not give any indication that there are more records even if it wouldn’t allow more to be retrieved?

Steps to reproduce
1) Put a lot of records in any object (e.g. a hundred thousand or so)
2) Build a stock tab page for the object
3) Preview page
4) Click "load more"

Expected Behavior
All records in the object are available for loading, OR at least some indication that there is still more out there

Actual Behavior
Only 11,000 records are available and NO indication that there are more

Thanks!

Skuid uses SOQL’s OFFSET keyword to handle pagination, but there are limits on how many records we can offset. These limits are mentioned in Skuid’s Table Export documentation.

I agree, however, that we could do a better job of reporting when we’ve hit those limits. Obviously, Skuid is a presentation layer, and so is ill-suited for arbitrarily large ETL-like tasks. If your users need to download more than the current limits, it’d be better to use a tool with higher memory and computational limits than those of a browser.

We’ll look into better error reporting in these situations.

Thanks for the reply and info, JD.

Agreed that large datasets are better suited for ETL tools.  The key here, as you mentioned, is that the user needs visibility into when they need to use an ETL tool.

If Skuid can provide information when the limit is reached and there is still more data, this would drive the user to an ETL tool and avoid the potential for uninformed business decisions.

A few additional questions:

1) Unless I’m misunderstanding the docs, there is a limit on the number of rows retrieved in a single OFFSET query (2,000), but not on the total number of rows across multiple OFFSET queries.  Given that, why does Skuid stop at 10,000 (or, in my repro, 11,000)?

2) Given that Skuid uses OFFSET and not queryMore(), the risk of data changing while paging is potentially high.  For example, an OFFSET query runs, new records are then inserted, then another OFFSET query fetches the next page.  This leads to inconsistent results across pages and potentially to data being missed altogether (missing data is a risk with queryMore() as well). Is there a technical reason OFFSET was chosen over queryMore()?

3) Currently Skuid maintains all data in the model, and new pages are just folded into the existing model.  There are client-side memory management considerations, so loading up large datasets would not be ideal.  That said, from a true paging perspective, why not discard the model rows, make the SOQL call, and fill the model with the paged results? This would yield a better-performing solution all around since the model would only contain the "current page".

4) From the docs, the max row count is 50,000 for SOQL but 10,000 for a QueryLocator (or 110,000 in a managed package).  Does Skuid use a QueryLocator?  If not, then why is export limited to 10,000 when 50,000 in a single statement is possible?

5) For the export itself, the client sends the entire dataset to the server and then the server packages the data.  This is fairly inefficient - would it make more sense to just re-issue the SOQL in the ExportData call rather than send all the data down to the server?  This would also potentially make a larger dataset available in the export itself (max 50,000, unfortunately, without using the Async or Bulk API).

Thanks!

Barry, I’m with you on all these except #3. I often have scenarios where I want to select rows from multiple pages to perform mass actions. Discarding all the rows that are not visible on the page would prevent that helpful feature.

Hey Matt -

Agreed on your input on #3.  I wasn’t thinking of eliminating the current behavior, rather of offering some new options.  For example:

1) Only maintain the current page in the model; AND/OR
2) Maintain the current page plus any selected items from any page in the current model (limiting the number of selected items).

#2 is what the SFDC standard tab does.  In the standard tab, you can scroll through all the data in the object with a max of 200 selected items.  This means that you can have at most 400 records in "memory" at a time (200 from the page and 200 selected).

As JD mentions and as I alluded to in my OP, memory usage has to be considered, but at the same time there are use cases where people just want to view data (and there might be a lot of it).  Offering the option to control how model rows are managed would provide the best of both worlds :slight_smile:

JD - I wasn’t familiar with how queryMore() worked when I wrote the above, but I’ve read the docs since and I think I see the reasons why Skuid uses OFFSET vs. queryMore().  Given the current design (all model rows are maintained in the model), limiting to 10,000 makes sense.  That said, I would be interested in your thoughts on the following:

1) Adding options to Table to control how model rows are maintained (keep 'em all, keep only the current page, keep the current page plus selected)

2) For export, why Skuid chose to pass the data from the client to the server instead of just re-issuing the SOQL on the server during an export?

I think the most important thing in all of this, at least short term, is making sure the user is aware that there is more data when they reach the current limits.  I know that for my use case, this is absolutely critical.

Thanks!

Hey everyone - hopefully I can clear some of this up.

1. Skuid only uses OFFSET for pagination some of the time. In the vast majority of cases, it uses Standard Set Controllers: for all custom objects and for the small list of standard objects (shown below) that support them.

('Account', 'Asset', 'Campaign', 'Case', 'Contact', 'Contract', 'Idea', 'Lead', 'Opportunity', 'Quote', 'Pricebook2', 'Product2', 'Solution')

For all other standard objects (Task is a common one), Skuid has to use OFFSET because Standard Set Controllers are not supported for them. OFFSET is very limited because the maximum OFFSET value allowed is 2,000, which means the largest amount of data you can retrieve via this method is 2,000 plus your query’s LIMIT.
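
To make that concrete, here’s a minimal sketch of OFFSET pagination in Apex (Task and the page size are just examples, not Skuid’s actual code):

```apex
// Deepest page reachable with OFFSET: Salesforce caps the OFFSET value
// at 2,000, so this query returns rows 2,001 through 2,000 + LIMIT.
Integer pageSize = 2000;
Integer offsetValue = 2000; // the maximum value Salesforce allows
List<Task> deepestPage = [
    SELECT Id, Subject
    FROM Task
    ORDER BY Id
    LIMIT :pageSize
    OFFSET :offsetValue
];
// Salesforce rejects any query with an OFFSET above 2,000.
```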

2. Standard Set Controllers / OFFSET were chosen over queryMore() because they can be called straight from Apex and do not require an API call. I believe missing data would be a problem in all of these approaches. The best way to mitigate it is to put a good "Order By" on your model; I think "Id" is usually a good choice.
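
For comparison, the Standard Set Controller approach looks roughly like this (a sketch only; Widget__c is a hypothetical custom object, and this is not Skuid’s actual implementation):

```apex
// Standard Set Controller pagination: runs entirely in Apex with no
// API call, unlike queryMore(). The underlying QueryLocator is capped
// at 10,000 rows.
ApexPages.StandardSetController ssc = new ApexPages.StandardSetController(
    Database.getQueryLocator('SELECT Id, Name FROM Widget__c ORDER BY Id')
);
ssc.setPageSize(2000);
ssc.setPageNumber(3); // jump directly to the third page
List<Widget__c> page = (List<Widget__c>) ssc.getRecords();
```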

3. In my experience, modern browsers are great at handling large amounts of data in memory without many problems. Hundreds of thousands or even millions of records should be possible. You will start running into performance issues when you try to iterate over these models to do summaries, or try to use something like MODEL_LOOKUP ui-only formula fields. It’s the DOM that gets really slow, quickly. Skuid already doesn’t add records to the DOM until you actually paginate to that page. I don’t think clearing records out of models on pagination would be worthwhile in most cases.

4. Actually, since Skuid runs in "read-only" mode, the max row count is 1,000,000. But try processing anywhere near that amount of data in Apex (outside of Batch Apex) and you’ll run into heap size errors. The export has to come from the client, as I’ll explain in #5. So the limit here is going to be how much data you can get into your model, plus possible heap size issues in Apex.

5. I haven’t looked at the export code in a while, but I believe you can export template fields, ui-only fields, formatted reference fields (with display templates), etc. This is all processed on the client. If you just need a straight up DB dump of your data, there are other tools that may be better suited for that job.

As for the issue with Skuid not alerting users to truncated data, I think this is a bug. You actually do get this message when you create a table on the Task object: once you try to "load more" past 2,000 + query limit, you will see a message at the bottom of your table indicating that the pagination limit has been reached. This should also be happening for objects that use the Standard Set Controller, but it doesn’t seem to be.

I think the reason you’re getting 11,000 rows instead of 10,000 for your particular table is the difference in how default ordering works in SOQL vs. Standard Set Controllers. I would guess that if you told your model to order by "Id", it would stop at 10,000 records.

I forgot to mention, if you really want to get more data out of Apex, you can use a technique called “PK Chunking”. Here’s a blog post about it. https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-or…
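
For the curious, one common form of PK chunking replaces OFFSET with an Id cursor, since Id is indexed and there is no depth limit. A rough sketch (not production code; a very large table would need Batch Apex or separate calls to stay under governor limits):

```apex
// Walk Task in Id order, using the last Id seen as the cursor.
// Note: running this whole loop in one transaction would hit the
// 100-SOQL-query governor limit on a big table; in practice each
// chunk would be its own call or a Batch Apex execution.
Id lastId = null;
List<Task> chunk;
do {
    chunk = (lastId == null)
        ? [SELECT Id, Subject FROM Task ORDER BY Id LIMIT 2000]
        : [SELECT Id, Subject FROM Task WHERE Id > :lastId ORDER BY Id LIMIT 2000];
    if (!chunk.isEmpty()) {
        lastId = chunk[chunk.size() - 1].Id;
        // ... process the chunk here ...
    }
} while (chunk.size() == 2000);
```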

Ben -

First things first - this is great information; thank you (and JD from earlier) for providing the details behind how Skuid works and why the decisions were made the way they were.  The picture is much clearer now, and reading through this information vs. digging through code to figure it out (server side wouldn’t even be possible) makes the learning curve way more efficient.  To reiterate my comment to JD above, I’m not trying to make Skuid an ETL tool, just trying to leverage as much as possible from it.  Keeping users in Skuid is the preference vs. taking them to other tools/pages.  Understanding how Skuid functions is critical to accomplishing that goal - getting the most from it while also not pushing it beyond its intended limits.

Secondly, I’m glad you feel that the lack of messaging/warning regarding truncated data is a bug (I agree it’s a bug, not an enhancement/idea).  To that end, I’ve created a new issue to track it, since this thread has taken multiple tangents.

Thirdly, I appreciate the PK Chunking pointer - very cool stuff that may well (albeit unfortunately) come in handy for some of my use cases.

Fourthly (is that a word? lol), you are correct: Skuid exports UI fields, templates, etc.  There are also external data sources to consider, so the way Skuid handles exports makes perfect sense.

And lastly, a couple of questions - 

1) Agreed that modern browsers handle large datasets well (as long as everything isn’t sitting in the DOM, which Skuid does a great job of avoiding).  Since Skuid uses Set Controllers most of the time, why the 10,000 row limit for model data?  As long as the page size is small enough, why not let users pull data in a paged manner to the very end?  I believe this limit was in place prior to UI-only formula fields, so the only thing I can think of is protecting against exporting huge quantities of data (maybe export just gets disabled after 10,000 rows are in the model).  Am I not connecting enough dots on this?

2) I think I’m getting 11,000 because my initial model row count is 5,000 and I wasn’t using an order by (as you suspected), so it would go from 5,000 to 7,000 to 9,000 and finally 11,000.  Adding an Order By caused it to go from 5,000 to 6,000, then 8,000, and finally 10,000.  It’s unclear why the order by changed the first page to be only 1,000 instead of 2,000 - thoughts?  Also, should the unordered scenario stop at 10,000 or be allowed to go to 11,000?

Thank you!

Hi Barry,

1. This 10,000 limit is not a Skuid-enforced limit, it’s a Salesforce limit. We would love it if the Standard Set Controller could be initialized with QueryLocators of more than 10,000 rows. The only way around this is PK Chunking or some other way to change the conditions. You could have a condition that asked for records with names starting with A-D, then move on through E-G, and so on.
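
For example, the name-range idea might look something like this (illustrative bounds only, not Skuid functionality):

```apex
// Condition-based chunking: keep each QueryLocator under the
// 10,000-row cap by slicing the data alphabetically. Real code
// would iterate over successive ranges (A-D, E-G, ...).
ApexPages.StandardSetController slice = new ApexPages.StandardSetController(
    Database.getQueryLocator(
        'SELECT Id, Name FROM Account ' +
        'WHERE Name >= \'A\' AND Name < \'E\' ORDER BY Name'
    )
);
slice.setPageSize(2000);
```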

2.

Scenario 1: No Order By

a. SOQL Loads in 5000 records

b. Skuid limits the Standard Set Controller page size to 2,000 and sets the page number to 3 (Math.ceil(Model.data.length / pageSize); see the sketch after these scenarios). The Standard Set Controller’s default ordering is different from SOQL’s, so we get 2,000 new records that were not previously in the model.

c. We continue to get 2,000 new records with each batch until pages 3, 4, and 5 are loaded. If we had also loaded pages 1 and 2, we could have loaded a total of 15,000 records (assuming Skuid found no duplicates): 5,000 from SOQL and 10,000 from the Standard Set Controller. I’ve never researched the exact default orderings for SOQL and Standard Set Controllers, but I do know that they are different.

Scenario 2: Order By Included

a. SOQL Loads in 5000 records

b. Skuid limits the Standard Set Controller page size to 2,000 and sets the page number to 3 (Math.ceil(Model.data.length / pageSize)). Since the ordering is the same, we get records 4,001 through 6,000; 1,000 of these are duplicates already in the model. Skuid simply updates the 1,000 duplicates with any new data and adds in the 1,000 new records.

c. The rest of the loads are straightforward: 2,000 new records are added with each load until the 10,000 limit is reached.
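
To make the page arithmetic in both scenarios concrete, here is the math as a sketch (illustrative names, not Skuid internals):

```apex
Integer pageSize = 2000;
Integer rowsInModel = 5000; // from the initial SOQL load
// Math.ceil(5000 / 2000) = 3, so "load more" resumes at page 3.
Integer nextPage = Math.ceil((Decimal) rowsInModel / pageSize).intValue();
// No ORDER BY: pages 3-5 share nothing with the first 5,000 rows, so
// each load adds a full 2,000: 5,000 -> 7,000 -> 9,000 -> 11,000.
// Shared ORDER BY: page 3 covers records 4,001-6,000, 1,000 of which
// are already in the model: 5,000 -> 6,000 -> 8,000 -> 10,000.
```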

Hey Ben -

Thanks again for quick reply and info.

#1 - Ok, I see the 10,000 limit with Standard Set Controllers (https://developer.salesforce.com/page/Paginating_Data_for_Force.com_Applications).  

Question - Why does a standard SFDC Tab page allow more than 10,000 records to be returned (page size 200)?  What technique is the standard tab page using?

#2 - Makes sense now.  The dot I wasn’t connecting was that the first call is SOQL, and the rest are controller method calls starting at whatever page the math says should be next.

Thanks!

Unfortunately, Salesforce often "cheats" and doesn’t use its own APIs in its standard pages. Maybe something like queryMore() could get us to more than 10,000 records, but we haven’t explored that yet.

Never liked cheaters lol

I created a new issue to explore adding the functionality of going beyond 10,000 rows.  Thanks again for all your help!