Bizarre behaviour with long running transactions

Gary · November 19, 2015, 8:12am

I’m having some bizarre issues that are going to be a pain to recreate, so I thought I’d solicit the community’s input whilst I try to reproduce the error!

I have a page that ends up creating a parent record and some child records. The child object is a child of two objects, both of which have roll-up summary fields, and one of those is itself a child of yet another object. There’s other stuff going on as well with Processes built with Process Builder, but you get the idea: the system has a lot to do when these records are saved. They are saved with an action that saves both models and has the “rollback on error” option checked.

There’s a managed package in this org that creates triggers based on an admin’s actions, i.e. the generated triggers live in the org’s own namespace. We have some of these triggers on the parent objects I mentioned before.

Back to Skuid. We were having issues with this page, getting “CPU Limit exceeded” messages in the JS console. Debugging what was going on is really difficult (is it me, or is debugging on Salesforce with managed packages a pain?). Occasionally the parent record would be created, but not the children (I think I’ve had this addressed before, but I need to revisit the answer now that I have a bit more data). In the debug logs, I was able to see a FATAL_ERROR.

I suspected these managed-package-but-not-really triggers were to blame, so I deployed the necessary metadata to disable them. Back on the page, it was still reporting the CPU error, but all records were actually created successfully, and there was no trace of the FATAL_ERROR in the debug logs either. I can see a DML_BEGIN followed by a DML_END more than 30 seconds later, with nothing in between, even though the system must be running workflow, processes, etc. (Stupid debug logs!)

I appreciate this is a rather esoteric issue and it’s unlikely someone will come along and say “Me too! Here’s why…”, but does anybody out there have any idea of possible causes or avenues of investigation? I’m actually thinking of moving the logic into a web service instead (as well as possibly doing other things to optimise performance, since the solution was built to a deadline), as that gives me more options when it comes to figuring out what’s going on, but I’d love to hear ideas as to what may be causing this issue here.

My best guess so far: I’ve seen a Salesforce Known Issue saying that debug logs misreport managed packages’ limits usage. Skuid is exceeding its CPU limit, but the transactions it initiates are not. Therefore Skuid receives an error, which is thrown back to the client, but the record creation actually succeeds. Now, this sounds nonsensical to me: the work to insert the records would come under Skuid’s limits and be considered a transaction in the skuid namespace, but maybe I misunderstand how managed packages work on the platform. Some weekend reading for me, perhaps…

Thanks in advance, and thanks for reading this far!

Zach_McElrath · November 24, 2015, 3:16pm

Gary,

Since the Salesforce Winter ’16 release went live, we’ve been seeing scattered occurrences of “Apex CPU time limit exceeded” error messages coming from various customer orgs, and other Salesforce ISV partners have been reporting the same issue. The most frustrating thing about these messages, both for us and for other ISV partners, is that we have yet to be able to replicate them on demand. They seem to occur in small bursts at certain times of day and then don’t recur (if they recur at all) until sometime the next day, not at the exact same time and not in the same portion of code. As a result, Salesforce Partner Support has deemed the issue “unreproducible”.

I mention all this in case it’s related to your issue. Your experience of a “reproducible” CPU time limit exceeded error message is encouraging in that it’s at least something to go on.

Which Salesforce Known Issue are you referring to? Your description brings to mind this known issue: https://success.salesforce.com/issues_view?id=a1p300000008Y6EAAU . Could you try disabling all debug logs / turning them down to the lowest possible debug level, and see if the issue still occurs?

As for a workaround, I think you’re wise to move the relevant portions of your code into asynchronous execution contexts if at all possible, because (a) each asynchronous job runs in its own transaction context, with its own CPU limit, and (b) asynchronous contexts have a higher CPU time limit (60 seconds vs 10 seconds for synchronous code; see the Salesforce docs). So any logic you can move into an asynchronous execution context should absolutely be moved there.
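To make those two points concrete in isolation, here is a toy Python model (not Apex, and not any Salesforce API): each transaction context gets its own simulated CPU budget, 10,000 ms for synchronous work and 60,000 ms for asynchronous work. The class and function names (`TransactionContext`, `run_in_one_context`, `run_split_async`) are hypothetical, and real Apex CPU accounting is far more involved. The sketch only shows why work that blows a single synchronous budget can fit once it is split across separate asynchronous contexts.

```python
from dataclasses import dataclass


class CpuLimitExceeded(Exception):
    """Raised when a simulated context runs past its CPU budget."""


@dataclass
class TransactionContext:
    # Hypothetical model: each context tracks its own CPU budget in ms.
    cpu_limit_ms: int
    used_ms: int = 0

    def charge(self, cost_ms: int) -> None:
        # Accumulate CPU time; fail once the budget is exceeded.
        self.used_ms += cost_ms
        if self.used_ms > self.cpu_limit_ms:
            raise CpuLimitExceeded(f"used {self.used_ms} ms of {self.cpu_limit_ms} ms")


SYNC_LIMIT_MS = 10_000   # synchronous CPU time limit (10 seconds)
ASYNC_LIMIT_MS = 60_000  # asynchronous CPU time limit (60 seconds)


def run_in_one_context(costs_ms: list[int]) -> bool:
    """All work shares one synchronous context, like a single save."""
    ctx = TransactionContext(SYNC_LIMIT_MS)
    try:
        for cost in costs_ms:
            ctx.charge(cost)
    except CpuLimitExceeded:
        return False
    return True


def run_split_async(costs_ms: list[int]) -> bool:
    """Each piece of work gets a fresh asynchronous context of its own."""
    try:
        for cost in costs_ms:
            TransactionContext(ASYNC_LIMIT_MS).charge(cost)
    except CpuLimitExceeded:
        return False
    return True


if __name__ == "__main__":
    work = [4_000, 4_000, 4_000]  # 12,000 ms in total
    print("one sync context:", run_in_one_context(work))
    print("split into async contexts:", run_split_async(work))
```

One consequence the model leaves out: work moved into an asynchronous job runs in its own transaction, so a failure there would not roll back the records already committed by the synchronous save (the “rollback on error” option only covers the synchronous part).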

Hope that helps,

Zach

Gary · November 24, 2015, 9:59pm

Thanks Zach.

I think you’re right: I don’t think this is related to the uptick in CPU errors you’ve seen recently, as this is somewhat reproducible, and somewhat understandable in terms of the config and code in the org.

As for the Known Issue I was talking about, I think this is the one I was thinking of: https://success.salesforce.com/issues_view?id=a1p300000008XTbAAM . You may note that (a) it’s about heap, not CPU, and (b) it doesn’t actually mention managed packages at all. But it does chime with me, as it suggests something is broken with limits profiling in the debug log, and I think I see this too. But hey, maybe I’m just too lazy/ignorant to really understand the debug log…

I’d love to spend some time digging away and working on a repro, but that tends to impact billable hours, so I’ll need to park this activity until later.

Thanks for the input, I’ll let you know if I make any headway.