Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation request - synchronous function in async #629

Closed
askhogan opened this issue Sep 21, 2014 · 7 comments
Closed

Documentation request - synchronous function in async #629

askhogan opened this issue Sep 21, 2014 · 7 comments
Assignees
Labels

Comments

@askhogan
Copy link

I recently ran into an issue with async.eachSeries where I ran into a maximum call stack exceeded. There are several closed issues related to this within the issues sections. Most of them come with github members writing the following comment

"Dont call a synchronous function in async"

or the program maintainer @caolan writing

"I've added some code to call async.nextTick if it detects a synchronous iterator which should avoid a stack overflow in these cases. But seriously, STOP CALLING SYNC FUNCTIONS IN ASYNC! ...make your functions consistently synchronous or consistently asynchronous instead ;)"

I understand that over time people consistently doing the same behavior is annoying, which leads to the all caps type response. However, for people running into this issue, my guess is that they didn't start out the day saying to themselves, "I really love calling sync functions in async", my guess is they were probably drawn to node by how fast it was and the ability to operate in async with good intentions. They probably wrote some methods, placed them in async, got the error, and then when they read the common response they had no idea what you were talking about, because this response doesn't give any examples (and there is nothing in the pitfalls section about this very important rule) It would be helpful if some examples of this behavior were cited in the Common Pitfalls section.

In the error I am receiving, obviously I am somehow calling a sync function in async (even after stepping through all code and not realizing how). Each one of the methods that I use, uses async callbacks. I do have a nested eachSeries in one of the methods (so this may be it). All of the guides on the net say to use process.nextTick() quite often, but I have read comments where @colan says this is unnecessary with async... so I am a bit stumped whether I should be converting all my callbacks to process.nextTick() to make them async or what I should be doing.

tl;dr - please add code examples of using async.eachSeries or async.each and what calling a sync function in async behavior is.

@arch1t3ct
Copy link

+1 Why is this being ignored?

@beaugunderson
Copy link
Collaborator

@askhogan & @arch1t3ct: I'm helping with docs now.

I think the primary confusion @ashkogan is or was having is that a function that calls a callback does not actually make it asynchronous in the way that the async library requires.

It's the difference between this:

function notActuallyAsync(cb) {
   cb(null, 'hello world');
}

and this:

function actuallyAsync(cb) {
   setImmediate(function () {
      cb(null, 'hello world');
   });
}

The first function does not give I/O event callbacks already in the event queue a chance to fire. The second one does. Anything that does actual asynchronous I/O is OK to use without wrapping in setImmediate (like the async fs functions, the request library, etc.). Furthermore, as long as an outer function does asynchronous I/O an inner function could be synchronous without issues.

The confusion appears to be with the word 'async' itself, which is doubly confusing because it's also the name of the library. Does this help clear things up at all? I agree that the requirement should be documented in the README.

@askhogan
Copy link
Author

This is the part that confuses me though. I feel like there are 2
disparate examples of the "Don't do synch in asynch!" and I still haven't
figured out what the maintainer's are talking about.

One
There is the literal example, which is easier to understand, but still
difficult to avoid, where you are iterating over a large collection.
Somewhere inside that async function you just want to do some synchronous
processing of the returned data.

async.eachSeries(arr, function(el, cb){

   request({uri: el.uri}, function (error, response) {
        if (error || response.statusCode !== 200) {
            return cb(error);
        }

        // This is doing synchronous in the literal sense, inside of async
        var news = _.findWhere(response.data, {newsroom: "The New York Times"})

        cb(null, news);
    });

}, callback);

So then I believe the solution is to simply avoid the synch processing inside of asynch and instead send it outside to a function that accepts a callback. But then, this is where it gets muddy, because
even here, people say a callback isn't async unless its wrapped... see two.

Two
Then there is this one, which makes me second guess whether I am writing
code properly. Maybe I am using an async framework, but not writing async
code! Like some sort of SHAM! :() This second guessing comes from
articles on the internet posting that the only true way to make a function
asynchronous is to wrap the callback in process.nextTick(). So

http://howtonode.org/understanding-process-next-tick

So when I write code that passes callbacks from one function to the next, I
start second guessing whether I am writing synchronous code in async. Like
even if I modularize all my synchronous code out of async as the
maintainer's mention, I am still working in a synchronous manner. See how
cb gets called and recalled.

    function updateOrganization(port, organization, cb) {
        //update organization port in background
        var phoneConfig = organization.phoneConfig;
        phoneConfig[0].port = port.toString();
        var meta = organization.meta;
        meta.syslogPort = port;

        organization.update({
            phoneConfig: phoneConfig,
            meta: meta
        }, cb);
    }

This one is harder to understand, because even in popular libs on NPM, I do a code search for this function's usage, and do not see it used in abundance.

Does this make sense? I guess for me, the statement "Don't write sync in async" - without any examples, is just specific enough to get me most of the way there, but without the same level of library experience as the maintainers have, I am left wondering what they are talking about.

@rlidwka
Copy link

rlidwka commented Nov 30, 2014

Somewhere inside that async function you just want to do some synchronous
processing of the returned data.

Synchronous processing is fine (all processing in io.js is synchronous anyway, because it's single-threaded).

It's all about not calling cb until the function is finished executing (i.e. on the same tick). Check this example:

console.log('start')
doSomething(function() {
  console.log('bang')
})
console.log('finish')

If it outputs "start, bang, finish", the function doSomething is synchronous. If it outputs "start, finish, bang", it is asynchronous.

So your example is good, because request is asynchronous function, and it's not supposed to call cb right away. If it does, it's a bug there, but your code is fine nonetheless.

That said, it's more like a code style thing, which many people (myself included) don't always follow. Just keep in mind to add setImmediate whenever you're getting stack overflow issues (and whenever you work with large collections), and you'll be fine. It's a good thing to have anyway.

@beaugunderson
Copy link
Collaborator

OK, let's look at both examples:

async.eachSeries(arr, function(el, cb){
   request({uri: el.uri}, function (error, response) {
        if (error || response.statusCode !== 200) {
            return cb(error);
        }

        // This is doing synchronous in the literal sense, inside of async
        var news = _.findWhere(response.data, {newsroom: "The New York Times"});

        cb(null, news);
    });
}, callback);

The above example is absolutely fine. Calls to request give queued I/O events time to run.

    function updateOrganization(port, organization, cb) {
        // Update organization port in background
        var phoneConfig = organization.phoneConfig;
        phoneConfig[0].port = port.toString();
        var meta = organization.meta;
        meta.syslogPort = port;

        organization.update({
            phoneConfig: phoneConfig,
            meta: meta
        }, cb);
    }

Whether this is OK or not depends on the implementation of organization.update--is it using request or a library for mongo, postgres, etc. that does I/O asynchronously? If yes, then this is fine to use inside of a call to async. If it instead looked like this:

organization.update = function (values, cb) {
  fs.writeFileSync('./output.json', JSON.stringify(values));

  cb();
};

then you'd need to wrap the call to organization.update inside setImmediate.

I should also note that the resource you linked to about process.nextTick is out of date; I believe setImmediate is preferable here.

@aearly
Copy link
Collaborator

aearly commented May 19, 2015

BTW, if you have a function that sometimes calls it's callback on the same tick, you can wrap your callback with dezalgo or your iterator with acomb.ensureAsync to work around this.

@aearly aearly closed this as completed in b96705a Jun 1, 2015
@aearly
Copy link
Collaborator

aearly commented Jun 1, 2015

ensureAsync will be released as part of v1.1.0, since we are now using it internally in a few places!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants