The Bootstrap Process

So George's request for /about-us has been handed to Drupal, and index.php is ready to bootstrap Drupal. What does that mean?

A quick summary

At a code level, we're talking about the drupal_bootstrap function, which lets you pass in a parameter to tell it which level of bootstrap you need. In almost all cases, we want a "full" bootstrap, which usually means "this is a regular page request, nothing weird, so just give me everything."

What is "everything"? I'm glad you asked. All of the possible values for the parameter for drupal_bootstrap() are listed below. Note that they are run sequentially, meaning if you call it with DRUPAL_BOOTSTRAP_CONFIGURATION then it will only do that one (#1), but if you call it with DRUPAL_BOOTSTRAP_SESSION then it will do that one (#5) and all of the ones before it (#1-4). And since DRUPAL_BOOTSTRAP_FULL is last, calling it gives you everything in this list.

  1. DRUPAL_BOOTSTRAP_CONFIGURATION: Set up some configuration
  2. DRUPAL_BOOTSTRAP_PAGE_CACHE: Try to serve the page from the cache (in which case the rest of these steps don't run)
  3. DRUPAL_BOOTSTRAP_DATABASE: Initialize the database connection
  4. DRUPAL_BOOTSTRAP_VARIABLES: Load variables from the variable table
  5. DRUPAL_BOOTSTRAP_SESSION: Initialize the user's session
  6. DRUPAL_BOOTSTRAP_PAGE_HEADER: Set HTTP headers to prepare for a page response
  7. DRUPAL_BOOTSTRAP_LANGUAGE: Initialize language types for multilingual sites
  8. DRUPAL_BOOTSTRAP_FULL: Includes a bunch of other files and does some other miscellaneous setup.

Each of these is described in more detail below.
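The "run everything up to and including the phase you asked for" behavior can be sketched in a few lines. This is only an illustration, not Drupal's implementation: sketch_bootstrap() and the $state array are hypothetical names, and each phase is reduced to recording its own name.

```php
<?php
// Hypothetical stand-in for drupal_bootstrap(): runs every phase that has not
// run yet, up to and including $target, and records each one in $state.
function sketch_bootstrap(int $target, array &$state): array {
  // Phase names in the order of the list above (index 0 is phase #1).
  $phases = array(
    'CONFIGURATION', 'PAGE_CACHE', 'DATABASE', 'VARIABLES',
    'SESSION', 'PAGE_HEADER', 'LANGUAGE', 'FULL',
  );
  $next = count($state);
  while ($next <= $target) {
    // A real phase would do its setup work here.
    $state[] = $phases[$next];
    $next++;
  }
  return $state;
}

$state = array();
sketch_bootstrap(4, $state);  // Runs phases #1 through #5 (SESSION).
sketch_bootstrap(1, $state);  // Already done, so nothing runs again.
sketch_bootstrap(7, $state);  // Runs the remaining phases, through FULL.
```

The point of tracking completed phases is that asking for a lower phase after a higher one has already run is a no-op.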

1. DRUPAL_BOOTSTRAP_CONFIGURATION

This guy just calls _drupal_bootstrap_configuration(), which in turn does the following:

Sets error and exception handlers.

set_error_handler('_drupal_error_handler');
set_exception_handler('_drupal_exception_handler');

These lines set a custom error handler (_drupal_error_handler()) and a custom exception handler (_drupal_exception_handler()) respectively. That means that those functions are called when Drupal encounters a PHP error or an uncaught exception.

These functions each go a few levels deep, but all they're really doing is attempting to log any errors or exceptions that may occur and, for fatal ones, sending a 500 response (which Drupal labels "500 Service unavailable").
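If you've never registered your own handlers, here is a minimal sketch of the mechanism in plain PHP. The SketchLog class and the sketch_* functions are hypothetical; Drupal's real handlers do much more (logging through the watchdog system, rendering an error page, and so on).

```php
<?php
// Hypothetical in-memory log, standing in for Drupal's real logging.
class SketchLog {
  public static $entries = array();
}

// Custom error handler: record the error instead of letting PHP print it.
function sketch_error_handler($level, $message, $file = '', $line = 0) {
  SketchLog::$entries[] = array('level' => $level, 'message' => $message);
  // Returning TRUE tells PHP not to run its built-in error handler.
  return TRUE;
}

// Custom exception handler: called for exceptions that nothing else catches.
function sketch_exception_handler($exception) {
  SketchLog::$entries[] = array('level' => 'exception', 'message' => $exception->getMessage());
}

set_error_handler('sketch_error_handler');
set_exception_handler('sketch_exception_handler');

// A user-level warning is now routed to sketch_error_handler().
trigger_error('Something went wrong', E_USER_WARNING);
```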

Initializes the PHP environment

drupal_environment_initialize()

The drupal_environment_initialize() function called here does a lot, most of which isn't very interesting. For example:

  • It tinkers with the global $_SERVER array a little bit.
  • It sets the configuration for error reporting
  • It sets some session configuration using ini_set()

Boring.

That said, it does have this nugget:

$_GET['q'] = request_path();

It might not look like much, but this is what makes Clean URLs work. We always need $_GET['q'] to be set to the current path because $_GET['q'] is used all over the place. If you have Clean URLs disabled, then that happens by default, because your URLs look like yoursite.com/?q=about-us. So the line of code above will call request_path(), which sees that $_GET['q'] already exists, and returns it directly.

But if you have Clean URLs enabled (you do, right?), and your URLs look like yoursite.com/about-us, then $_GET['q'] is empty by default, and that just won't do. To fix that, it gets populated with the value of request_path(), which basically just cleans up the result of $_SERVER['REQUEST_URI'] (i.e., removes query strings as well as script filenames such as index.php or cron.php) and returns that.
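To make the two cases concrete, here is a simplified sketch of that logic. It is not the real request_path(): sketch_request_path() is a hypothetical name, it takes $_GET and $_SERVER as arguments so it can be tried without a web server, and it assumes Drupal is installed at the web root rather than in a subdirectory.

```php
<?php
// Simplified, hypothetical version of request_path().
function sketch_request_path(array $get, array $server): string {
  if (isset($get['q']) && is_string($get['q'])) {
    // Clean URLs disabled: the path arrived as ?q=about-us.
    $path = $get['q'];
  }
  elseif (isset($server['REQUEST_URI'])) {
    // Clean URLs enabled: drop the query string from the request URI.
    $path = explode('?', $server['REQUEST_URI'], 2)[0];
    // Drop the leading slash, then the script filename if that is all
    // that is left (e.g. a request for /index.php).
    $path = ltrim(urldecode($path), '/');
    if ($path === basename($server['SCRIPT_NAME'] ?? 'index.php')) {
      $path = '';
    }
  }
  else {
    $path = '';
  }
  return trim($path, '/');
}

// Both URL styles end up with the same value for $_GET['q'].
$without_clean_urls = sketch_request_path(array('q' => 'about-us'), array());
$with_clean_urls = sketch_request_path(array(), array(
  'REQUEST_URI' => '/about-us?page=2',
  'SCRIPT_NAME' => '/index.php',
));
```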

Starts a timer

timer_start('page');

This is actually pretty nifty. Drupal has a global $timers variable that many people don't know about.

Here, a timer is started so that the time it takes to render the page can be measured.
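The idea behind a named timer is simple enough to sketch. The helpers below are hypothetical (they are not Drupal's timer functions), and $timers is a local array standing in for the global one.

```php
<?php
// Hypothetical timer helpers; $timers stands in for Drupal's global $timers.
function sketch_timer_start(array &$timers, string $name): void {
  $timers[$name] = array('start' => microtime(TRUE));
}

// Returns the elapsed time in milliseconds since the named timer was started.
function sketch_timer_read(array $timers, string $name): float {
  return round((microtime(TRUE) - $timers[$name]['start']) * 1000, 2);
}

$timers = array();
sketch_timer_start($timers, 'page');
usleep(2000);  // Pretend to build a page for a couple of milliseconds.
$elapsed = sketch_timer_read($timers, 'page');
```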

Initializes some critical settings

drupal_settings_initialize();

The drupal_settings_initialize() function is super important, for exactly 2 reasons:

  1. It includes the all-important settings.php file which contains our database connection info (which isn't used yet), among other things.
  2. It creates many of our favorite global variables, such as $cookie_domain, $conf, $is_https, and more!

And that's the end of the CONFIGURATION bootstrap. 1 down, 7 to go!

2. DRUPAL_BOOTSTRAP_PAGE_CACHE

When bootstrapping the page cache, everything happens inside _drupal_bootstrap_page_cache() which does a lot of work.

Includes cache.inc and any custom cache backends

require_once DRUPAL_ROOT . '/includes/cache.inc';
foreach (variable_get('cache_backends', array()) as $include) {
  require_once DRUPAL_ROOT . '/' . $include;
}

This bit of fanciness allows us to specify our own cache backend(s) instead of using Drupal's database cache.

This is most commonly used to support memcache, but someone could really go to town with this if they wanted, just by specifying (in the $conf array in settings.php) an include file to use (such as memcache.inc) for whatever cache backend they're wanting to use.
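For example, a settings.php fragment along these lines is how the contrib Memcache module is typically wired in. Treat the path as an assumption: it depends on where the module lives on your site, and the module's own README is the authority on the exact settings.

```php
<?php
// Assumed install location of the contrib Memcache module; adjust to your site.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
// Use the class provided by that include file as the default cache backend.
$conf['cache_default_class'] = 'MemCacheDrupal';
```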

Checks to see if cache is enabled

// Check for a cache mode force from settings.php.
if (variable_get('page_cache_without_database')) {
  $cache_enabled = TRUE;
}
else {
  drupal_bootstrap(DRUPAL_BOOTSTRAP_VARIABLES, FALSE);
  $cache_enabled = variable_get('cache');
}

You'll note that the first line there gives you a way to enable cache from settings.php directly. This speeds things up because that way it doesn't need to bootstrap DRUPAL_BOOTSTRAP_VARIABLES (i.e., load all of the variables from the DB table) which would also force it to bootstrap DRUPAL_BOOTSTRAP_DATABASE, which is a requirement for fetching the variables from the database, all just to see if the cache is enabled.

So assuming you don't have $conf['page_cache_without_database'] = TRUE in your settings.php file, then we will be bootstrapping the variables here, which in turn bootstraps the database. Both of those are covered in more detail shortly.

Blocks any IP blacklisted users

drupal_block_denied(ip_address());

This does exactly what you'd expect - checks to see if the user's IP address is in the list of blacklisted addresses, and if so, returns a 403 Forbidden response. This doesn't strictly have anything to do with caching, except for the fact that it needs to block cached responses from blacklisted users and this is its last chance to do that.

An interesting thing to note here is that the ip_address() function is super useful. On a normal site it just returns regular old $_SERVER['REMOTE_ADDR'], but if you're using some sort of reverse proxy in front of Drupal (meaning $_SERVER['REMOTE_ADDR'] will always be the same), then it fetches the user's IP from the (configurable) HTTP header. Pretty awesome.

But beware that if you have $conf['page_cache_without_database'] = TRUE in settings.php then it won't fetch blocked IPs from the database, because it wouldn't have bootstrapped DRUPAL_BOOTSTRAP_VARIABLES yet (re-read the previous section to see what I mean). Tricky, tricky!
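Going back to ip_address() for a moment, the reverse proxy handling can be sketched roughly like this. It is a simplified, hypothetical version: sketch_ip_address() is not a Drupal function, and the $settings keys stand in for the reverse_proxy, reverse_proxy_header and reverse_proxy_addresses variables.

```php
<?php
// Hypothetical, simplified take on ip_address().
function sketch_ip_address(array $server, array $settings = array()): string {
  $ip = $server['REMOTE_ADDR'];
  $header = $settings['reverse_proxy_header'] ?? 'HTTP_X_FORWARDED_FOR';
  $proxies = $settings['reverse_proxy_addresses'] ?? array();

  // Only trust the header if the request really came from a known proxy.
  if (!empty($settings['reverse_proxy']) && !empty($server[$header]) && in_array($ip, $proxies, TRUE)) {
    // The header holds a comma-separated chain of addresses.
    $chain = array_map('trim', explode(',', $server[$header]));
    $chain[] = $ip;
    // Discard the trusted proxies; the last address left is the client.
    $untrusted = array_diff($chain, $proxies);
    if ($untrusted) {
      $ip = array_pop($untrusted);
    }
  }
  return $ip;
}

// Behind a proxy at 10.0.0.1, the client address comes from the header.
$client = sketch_ip_address(
  array('REMOTE_ADDR' => '10.0.0.1', 'HTTP_X_FORWARDED_FOR' => '203.0.113.7, 10.0.0.1'),
  array('reverse_proxy' => TRUE, 'reverse_proxy_addresses' => array('10.0.0.1'))
);
```

Checking REMOTE_ADDR against the list of known proxies matters, because otherwise anyone could spoof their address just by sending the header themselves.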

Checks to see if there's a session cookie

if (!isset($_COOKIE[session_name()]) && $cache_enabled) {
  // ...fetch and return cached response if there is one...
}

It only returns a cached response (assuming one exists to return) if the user doesn't have a valid session cookie. This is a way of ensuring that only anonymous users see cached pages, and authenticated users don't. (What's that? You want authenticated users to see cached pages too?)

What's inside that "fetch and return cached response" block? Lots of stuff!

Populates the global $user object

$user = drupal_anonymous_user();

The drupal_anonymous_user() function just creates an empty user object with a uid of 0. We're creating it here just because it may need to be used later on down the line, such as in some hook_boot() implementation, and also because its timestamp will be checked and possibly logged.

Checks to see if the page is already cached

$cache = drupal_page_get_cache();

The drupal_page_get_cache() function is actually simpler than you'd think. It just checks to see if the page is cacheable (i.e., if the request method is either GET or HEAD, as told in drupal_page_is_cacheable()), and if so, it runs cache_get() with the current URL against the cache_page database table, to fetch the cache, if there is one.
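In sketch form, the lookup amounts to something like the following. The function names are hypothetical, and a plain array keyed by URL plays the role of the cache_page table.

```php
<?php
// Only GET and HEAD requests are candidates for the page cache.
function sketch_page_is_cacheable(string $method): bool {
  return in_array($method, array('GET', 'HEAD'), TRUE);
}

// Returns the cached entry for $url, or FALSE if the request is not
// cacheable or nothing has been cached for that URL yet.
function sketch_page_get_cache(string $method, string $url, array $cache_page) {
  if (!sketch_page_is_cacheable($method)) {
    return FALSE;
  }
  return $cache_page[$url] ?? FALSE;
}

// Hypothetical cache contents for one page.
$cache_page = array(
  'http://yoursite.com/about-us' => array('title' => 'About us', 'body' => '<html>...</html>'),
);
$hit = sketch_page_get_cache('GET', 'http://yoursite.com/about-us', $cache_page);
$miss = sketch_page_get_cache('POST', 'http://yoursite.com/about-us', $cache_page);
```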

Serves the response from that cache

If $cache from the previous section isn't empty, then we have officially found ourselves a valid page cache for the current page, and we can return it and shut down. This block of code does a few things:

  • Sets the page title using drupal_set_title()
  • Sets a HTTP header: X-Drupal-Cache: HIT
  • Sets PHP's default timezone to the site's default timezone (from variable_get('date_default_timezone'))
  • Runs all implementations of hook_boot(), if the page_cache_invoke_hooks variable isn't set to FALSE.
  • Serves the page from cache, using drupal_serve_page_from_cache($cache), which is scary looking but basically just adds some headers and prints the cache data (i.e., the page body).
  • Runs all implementations of hook_exit(), if the page_cache_invoke_hooks variable isn't set to FALSE.

And FINALLY, once all of that is complete, it runs exit; and we're done, assuming we got this far.

Otherwise, it doesn't do any of the above, and just sets the X-Drupal-Cache: MISS header.

Whew. That's a lot of stuff. Luckily, the next section is easier.

3. DRUPAL_BOOTSTRAP_DATABASE

We're not going to get super in the weeds with everything Drupal does with the database here, since that deserves its own chapter, but here's an overview of the parts that happen while bootstrapping, within the _drupal_bootstrap_database() function.

Checks to see if we have a database configured

if (empty($GLOBALS['databases']) && !drupal_installation_attempted()) {
  include_once DRUPAL_ROOT . '/includes/install.inc';
  install_goto('install.php');
}

Nothing fancy. If we don't have anything in $GLOBALS['databases'] and we haven't already started the installation process, then we get booted to /install.php since Drupal is assuming we need to install the site.

Includes the database.inc file

This beautiful beautiful database.inc file includes all of the database abstraction functions that we know and love, such as db_query() and db_select() and db_update().

It also holds the base Database and DatabaseConnection and DatabaseTransaction classes (among a bunch of others).

It's a 3000+ line file, so it's out of scope for a discussion on bootstrapping, and we'll get back to "How Drupal Does Databases" in a later chapter.

Registers autoload functions for classes and interfaces

spl_autoload_register('drupal_autoload_class');
spl_autoload_register('drupal_autoload_interface');

This is just a tricky way of ensuring that a class or interface actually exists, when we try to autoload one. Both drupal_autoload_class() and drupal_autoload_interface() just call _registry_check_code(), which looks for the given class or interface first in the cache_bootstrap table, then in the registry table if no cache is found.

If it finds the class or interface, it will require_once the file that contains that class or interface and return TRUE. Otherwise, it just returns FALSE so Drupal knows that somebody screwed the pooch and we're looking for a class or interface that doesn't exist.

So, in English, it's saying "Ok, it looks like you're trying to autoload a class or an interface, so I'll figure out which file it's in by checking the cache or the registry database table, and then include that file, if I can find it."
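Here is a minimal sketch of a registry-backed autoloader, assuming a plain array in place of the cache and the registry table. The sketch_registry_check_code() function and the SketchWidget class are hypothetical, and the class file is written to a temp file just so the example is self-contained.

```php
<?php
// Looks up $name in the registry and includes its file if there is one.
function sketch_registry_check_code(string $type, string $name, array $registry): bool {
  if (isset($registry[$type][$name]) && is_file($registry[$type][$name])) {
    require_once $registry[$type][$name];
    return TRUE;
  }
  // Unknown class or interface: let PHP carry on (and eventually complain).
  return FALSE;
}

// Create a throwaway file that defines one class.
$file = tempnam(sys_get_temp_dir(), 'sketch');
file_put_contents($file, "<?php\nclass SketchWidget {}\n");

// The registry maps each known class name to the file that defines it.
$registry = array('class' => array('SketchWidget' => $file));

spl_autoload_register(function (string $name) use (&$registry): void {
  sketch_registry_check_code('class', $name, $registry);
});

// Nothing has included the file yet; instantiating the class triggers it.
$widget = new SketchWidget();
```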

4. DRUPAL_BOOTSTRAP_VARIABLES

This one just calls _drupal_bootstrap_variables(), which actually does a good bit more than just loading the variables from the variable table.

Here's what it does:

Initializes the locking system

require_once DRUPAL_ROOT . '/' . variable_get('lock_inc', 'includes/lock.inc');
lock_initialize();

Drupal's locking system allows us to create arbitrary locks on certain operations, to prevent race conditions and other bad things. If you're interested in reading more about this, there is a very good API page about it.

The two lines of code here don't actually acquire any locks, they just initialize the locking system so that later code can use it. In fact, it's actually used in the very next section, which is why it's initialized in this seemingly unrelated phase of the bootstrap process.

Load variables from the database

global $conf;
$conf = variable_initialize(isset($conf) ? $conf : array());

The variable_initialize() function basically just returns everything from the variable database table, which in this case adds it all to the global $conf array, so that we can variable_get() things from it later.

But there are a few important details:

  1. It tries to load from the cache first, by looking for the variables cache ID in the cache_bootstrap table.
  2. Assuming the cache failed, it tries to acquire a lock to avoid a stampede if a ton of requests are all trying to read the variable table at the same time.
  3. Once it has the lock acquired, it grabs everything from the variable table.
  4. Then it caches all of that, so that step 1 won't fail next time.
  5. Finally, it releases the lock.
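Those five steps are a fairly standard cache-then-lock pattern, which can be sketched like this. Everything here is a stand-in: sketch_variable_initialize() is hypothetical, and $cache, $table and $locks are plain arrays playing the roles of cache_bootstrap, the variable table and the lock system.

```php
<?php
// Hypothetical sketch of the cache-then-lock pattern described above.
function sketch_variable_initialize(array $conf, array &$cache, array $table, array &$locks): array {
  if (isset($cache['variables'])) {
    // Step 1: cache hit, so no database work is needed.
    $variables = $cache['variables'];
  }
  else {
    // Step 2: take the lock so only one request rebuilds the cache.
    // (Waiting and retrying when the lock is already held is left out.)
    $locks['variable_init'] = TRUE;
    // Step 3: read every row; values are stored serialized.
    $variables = array_map('unserialize', $table);
    // Step 4: cache the result for the next request.
    $cache['variables'] = $variables;
    // Step 5: release the lock.
    unset($locks['variable_init']);
  }
  // Values already in $conf (from settings.php) win over stored ones.
  return $conf + $variables;
}

$cache = array();
$locks = array();
$table = array('site_name' => serialize('My site'), 'cache' => serialize(1));
$conf = sketch_variable_initialize(array('cache' => 0), $cache, $table, $locks);
```

Note the last line of the function: anything you hardcode in settings.php overrides whatever is stored, which is why $conf overrides are so handy.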

Load all "bootstrap mode" modules

require_once DRUPAL_ROOT . '/includes/module.inc';
module_load_all(TRUE);

Note that this may seem scary (OH MY GOD we're loading every single module just to bootstrap the variables!) but it's not. That TRUE is a big deal, because that tells Drupal to only load the "bootstrap" modules. A "bootstrap" module is a module that has the bootstrap column in the system table set to 1 for it.

On the typical Drupal site, this will only be a handful of modules that are specifically required this early in the bootstrap, like the Syslog module or the System module, or some contrib modules like Redirect or Variable.

Sanitize the destination URL parameter

Here's another one that you wouldn't expect to happen as part of bootstrapping variables.

The $_GET['destination'] parameter needs to be protected against open redirect attacks leading to other domains. So what we do here is to check to see if it's set to an external URL, and if so, we unset it.

The reason we have to wait for the variables bootstrap for this is that we need to call the url_is_external() function to check the destination URL, and that function calls drupal_strip_dangerous_protocols() which has a variable to store the list of allowed protocols.
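A simplified sketch of that check is below. Both functions are hypothetical, and the external-URL test is deliberately cruder than url_is_external(): it skips the allowed-protocols check entirely and only looks for a scheme or a protocol-relative prefix.

```php
<?php
// A path is treated as external if it is protocol-relative (//host) or has a
// scheme, i.e. a colon that appears before any "/", "?" or "#".
function sketch_url_is_external(string $path): bool {
  if (strpos($path, '//') === 0) {
    return TRUE;
  }
  $colon = strpos($path, ':');
  return $colon !== FALSE && !preg_match('![/?#]!', substr($path, 0, $colon));
}

// Drops an external destination from a copy of the query parameters.
function sketch_sanitize_destination(array $get): array {
  if (isset($get['destination']) && sketch_url_is_external($get['destination'])) {
    unset($get['destination']);
  }
  return $get;
}

$safe = sketch_sanitize_destination(array('destination' => 'node/1'));
$unsafe = sketch_sanitize_destination(array('destination' => 'http://evil.example/phish'));
```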

5. DRUPAL_BOOTSTRAP_SESSION

Bootstrapping the session means requiring the session.inc file and then running drupal_session_initialize(), which is a pretty fun function.

Register custom session handlers

The first thing that happens here is that Drupal registers custom session handlers with PHP:

session_set_save_handler('_drupal_session_open', '_drupal_session_close', 
  '_drupal_session_read', '_drupal_session_write', 
  '_drupal_session_destroy', '_drupal_session_garbage_collection');

If you've never seen the session_set_save_handler() PHP function before, it just allows you to set your own custom session storage functions, so that you can have full control over what happens when sessions are opened, closed, read, written, destroyed, or garbage collected. As you can see above, Drupal implements its own handlers for all 6 of those.

What does Drupal actually do in those 6 handler functions?

  • _drupal_session_open() and _drupal_session_close() both literally just return TRUE;.
  • _drupal_session_read() fetches the session from the sessions table, and does a join on the users table to include the user data along with it.
  • _drupal_session_write() checks to see if the session has been updated in the current page request or more than 180 seconds have passed since the last update, and if so, it gathers up session data and drops it into the sessions table with a db_merge().
  • _drupal_session_destroy() just deletes the appropriate row from the sessions DB table, sets the global $user object to be the anonymous user, and deletes cookies.
  • _drupal_session_garbage_collection() deletes all sessions from the sessions table that are older than whatever the max lifetime is set to in PHP (i.e., whatever session.gc_maxlifetime is set to).
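To get a feel for what those six callbacks are responsible for, here is an array-backed sketch. The SketchSessionStore class is hypothetical and is not wired into PHP's session machinery; $rows stands in for the sessions table, and the current time is passed in explicitly so the behavior is easy to follow.

```php
<?php
// Hypothetical array-backed session store. Each method mirrors one of the
// six callbacks described above.
class SketchSessionStore {
  public $rows = array();
  private $lastRead = array();

  public function open() { return TRUE; }
  public function close() { return TRUE; }

  public function read($sid) {
    $data = isset($this->rows[$sid]) ? $this->rows[$sid]['session'] : '';
    // Remember what was read so write() can tell whether anything changed.
    $this->lastRead = array('sid' => $sid, 'value' => $data);
    return $data;
  }

  // Writes only if the data changed or the row is older than $interval seconds.
  public function write($sid, $data, $now, $interval = 180) {
    $changed = $this->lastRead !== array('sid' => $sid, 'value' => $data);
    $stale = !isset($this->rows[$sid]) || $now - $this->rows[$sid]['timestamp'] > $interval;
    if ($changed || $stale) {
      $this->rows[$sid] = array('session' => $data, 'timestamp' => $now);
    }
    return TRUE;
  }

  public function destroy($sid) {
    unset($this->rows[$sid]);
    return TRUE;
  }

  // Removes every row older than $max_lifetime seconds.
  public function gc($max_lifetime, $now) {
    foreach ($this->rows as $sid => $row) {
      if ($row['timestamp'] < $now - $max_lifetime) {
        unset($this->rows[$sid]);
      }
    }
    return TRUE;
  }
}
```

The "changed or stale" check in write() is the interesting design choice: it avoids a database write on every single page view for a logged-in user.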

If we already have a session cookie, then start the session

We then check to see if there's a valid session cookie in $_COOKIE[session_name()], and if so, we run drupal_session_start(). If you're a PHP developer and you just want to know where session_start() happens, then you've found it.

That's basically all that drupal_session_start() does, besides making sure that we're not a command line client and we haven't already started the session.

Disable page cache for this request

Remember back in the DRUPAL_BOOTSTRAP_PAGE_CACHE section where I said that authenticated users don't get cached pages (unless you use something outside of Drupal core)? This is the part that makes that happen.

if (!empty($user->uid) || !empty($_SESSION)) {
  drupal_page_is_cacheable(FALSE);
}

So if we have a session or a nonzero user ID, then we mark this page as uncacheable, because we may be seeing user-specific data on it which we don't want anyone else to see.

If we don't already have a session cookie, lazily start one

This part's tricky. Drupal lazily starts sessions at the end of the request, so all the bootstrap process has to do is create a session ID and tell $_COOKIE about it, so that it can get picked up at the end.

session_id(drupal_random_key());

I won't go into detail here since we're talking about the bootstrap, but at the end of the request, drupal_page_footer() or drupal_exit() (depending on which one is responsible for closing this particular request) will call drupal_session_commit(), which checks to see if there's anything in $_SESSION that we need to save to the database, and will run drupal_session_start() if so.

Sets PHP's default timezone from the user's timezone

date_default_timezone_set(drupal_get_user_timezone());

You may remember that the cache bootstrap above was responsible for setting the timezone for cached pages. This is where the timezone gets set for uncached pages.

The drupal_get_user_timezone() function is very simple. It just checks to see if user-configurable timezones are enabled and the user has one set, and uses that if so. Otherwise it falls back to the site's default timezone setting.
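As a sketch, that fallback looks something like this. The function is hypothetical, and its arguments replace the global $user object and the variable lookups that the real function relies on.

```php
<?php
// Hypothetical sketch of the timezone fallback logic.
function sketch_user_timezone(array $user, bool $configurable, string $site_default): string {
  // Use the user's own timezone only if the feature is on, the user is
  // logged in, and they have actually picked one.
  if ($configurable && !empty($user['uid']) && !empty($user['timezone'])) {
    return $user['timezone'];
  }
  return $site_default;
}

$logged_in = sketch_user_timezone(array('uid' => 5, 'timezone' => 'Europe/Paris'), TRUE, 'UTC');
$anonymous = sketch_user_timezone(array('uid' => 0), TRUE, 'UTC');
```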

6. DRUPAL_BOOTSTRAP_PAGE_HEADER

This is probably the simplest of the bootstrap levels. It does 2 very simple things in the _drupal_bootstrap_page_header() function.

Invokes hook_boot()

bootstrap_invoke_all('boot');

If you've ever wondered how much of the bootstrap process has to complete before you can be guaranteed that hook_boot() will run, there's your answer. Remember that for cached pages, it will have already run back in the cache bootstrap, but for uncached pages, this is where it happens.

Sends initial HTTP headers

There's a little bit of a call stack here. drupal_page_header() calls drupal_send_headers() which calls drupal_get_http_header() to finally fetch the headers that it needs to send.

Note that in this run, it just sends a couple default headers (Expires and Cache-Control), but the interesting part is that static caches are used throughout, and anything can call drupal_add_http_header() later on down the line, which will also call drupal_send_headers(). This allows you to append or replace existing headers before they actually get sent anywhere.
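The append-or-replace idea can be sketched with a tiny header store. SketchHeaders is a hypothetical class that only collects headers in memory; it never calls header(), and Drupal keeps the equivalent data in a static cache rather than an object.

```php
<?php
// Hypothetical in-memory header store.
class SketchHeaders {
  private $headers = array();

  // Adds a header. With $append, the value is joined onto any existing one;
  // otherwise it replaces whatever was there.
  public function add(string $name, string $value, bool $append = FALSE): void {
    // Header names are case-insensitive, so store them lowercased.
    $key = strtolower($name);
    if ($append && isset($this->headers[$key])) {
      $value = $this->headers[$key] . ',' . $value;
    }
    $this->headers[$key] = $value;
  }

  public function get(string $name): ?string {
    return $this->headers[strtolower($name)] ?? NULL;
  }
}

$headers = new SketchHeaders();
$headers->add('Cache-Control', 'no-cache');
$headers->add('Cache-Control', 'must-revalidate', TRUE);  // Appended.
$headers->add('Expires', 'Sun, 19 Nov 1978 05:00:00 GMT');
$headers->add('expires', '0');                            // Replaced.
```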

7. DRUPAL_BOOTSTRAP_LANGUAGE

In this level, the drupal_language_initialize() function is called. This function only really does anything if we're talking about a multilingual site. It checks drupal_multilingual(), which just returns TRUE if more than one language is enabled, and FALSE otherwise.

If it's not a multilingual site, it cuts out then.

If it is a multilingual site, then it initializes the system using language_initialize() for each of the language types that have been configured, and then runs all hook_language_init() implementations.

This is a good time to note that the language system is complicated and confusing, with a web of "language types" (such as LANGUAGE_TYPE_INTERFACE and LANGUAGE_TYPE_CONTENT) and "language providers", and of course actual languages. It deserves a chapter of its own, so I'm not going to go into any more detail here.

8. DRUPAL_BOOTSTRAP_FULL

And we have landed. Now that we already have the building blocks like a database and a session and configuration, we can add All Of The Other Things. To do that, we require the common.inc file and call its _drupal_bootstrap_full() function.

Requires a ton of files

require_once DRUPAL_ROOT . '/' . variable_get('path_inc', 'includes/path.inc');
require_once DRUPAL_ROOT . '/includes/theme.inc';
require_once DRUPAL_ROOT . '/includes/pager.inc';
require_once DRUPAL_ROOT . '/' . variable_get('menu_inc', 'includes/menu.inc');
require_once DRUPAL_ROOT . '/includes/tablesort.inc';
require_once DRUPAL_ROOT . '/includes/file.inc';
require_once DRUPAL_ROOT . '/includes/unicode.inc';
require_once DRUPAL_ROOT . '/includes/image.inc';
require_once DRUPAL_ROOT . '/includes/form.inc';
require_once DRUPAL_ROOT . '/includes/mail.inc';
require_once DRUPAL_ROOT . '/includes/actions.inc';
require_once DRUPAL_ROOT . '/includes/ajax.inc';
require_once DRUPAL_ROOT . '/includes/token.inc';
require_once DRUPAL_ROOT . '/includes/errors.inc';

All that stuff that we haven't needed yet but may need after this, we require here, just in case. That way, we're not having to load ajax.inc on the fly if we happen to be using AJAX later, or mail.inc on the fly if we happen to be sending an email.

Load all enabled modules

The module_load_all() function does exactly what you'd expect - grabs the name of every enabled module using module_list() and then runs drupal_load() on each one to load it. There's also a static cache in this function so that it only runs once per request.

Registers stream wrappers

The file_get_stream_wrappers() function has a lot of meat to it, but it's all details around a fairly simple task.

At a high level, it's grabbing all stream wrappers using hook_stream_wrappers(), allowing the chance to alter them using hook_stream_wrappers_alter(), and then registering (or overriding) each of them using stream_wrapper_register(), which is a plain old PHP function. It then sticks the result in a static cache so that it only runs all of this once per request.

Initializes the path

The drupal_path_initialize() function is called, which just makes sure that $_GET['q'] is set up (if it's not, then it sets it to the front page), and then runs it through drupal_get_normal_path() to see if it's a path alias, and if so, replaces it with the internal path.

This also gives modules a chance to alter the inbound URL. Before drupal_get_normal_path() returns the path, it calls all implementations of hook_url_inbound_alter() to do just that.
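Put together, the path initialization boils down to something like the sketch below. The function is hypothetical: $aliases maps each alias to its internal path, $front is the configured front page, and $alterers is a list of callbacks standing in for hook_url_inbound_alter() implementations.

```php
<?php
// Hypothetical sketch of path initialization.
function sketch_path_initialize(string $q, string $front, array $aliases, array $alterers = array()): string {
  // An empty path means the front page was requested.
  if ($q === '') {
    $q = $front;
  }
  $original = $q;
  // Swap an alias for its internal path, if there is one.
  $path = $aliases[$q] ?? $q;
  foreach ($alterers as $alter) {
    // Each callback may change $path by reference.
    $alter($path, $original);
  }
  return $path;
}

$aliases = array('about-us' => 'node/5');
$internal = sketch_path_initialize('about-us', 'node', $aliases);

// An alterer that reroutes a legacy path, much like a module's hook might.
$legacy = function (&$path, $original) {
  if ($original === 'old-contact') {
    $path = 'node/9';
  }
};
$rerouted = sketch_path_initialize('old-contact', 'node', $aliases, array($legacy));
```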

Sets and initializes the site theme

menu_set_custom_theme();
drupal_theme_initialize();

These two fairly innocent looking functions are NOT messing around.

The purpose of menu_set_custom_theme() is to allow modules or theme callbacks to dynamically set the theme that should be used to render the current page. To do this, it calls menu_get_custom_theme(TRUE), which is a bit scary looking, but doesn't do anything important besides that and saving the result to a static cache.

After that, the drupal_theme_initialize() comes along and goes to town.

First, it just loads all themes using list_themes(), which is where the .info file for each theme gets parsed and the lists of CSS files, JS files, regions, etc., get populated.

Secondly, it tries to find the theme to use by checking to see if the user has a custom theme set, and if not, falling back to the theme_default variable.

$theme = !empty($user->theme) && drupal_theme_access($user->theme) ? 
  $user->theme : variable_get('theme_default', 'bartik');

Then it checks to see if a different custom theme was chosen on the fly in the previous step (the menu_set_custom_theme() function), by running menu_get_custom_theme() (remember that static cache). If there was a custom theme returned, then it uses that, otherwise it keeps the default theme.

$custom_theme = menu_get_custom_theme();
$theme = !empty($custom_theme) ? $custom_theme : $theme;

Once it has firmly decided on what dang theme is going to render the dang page, it can move on to building a list of base themes or ancestor themes.

$base_theme = array();
$ancestor = $theme;
while ($ancestor && isset($themes[$ancestor]->base_theme)) {
  $ancestor = $themes[$ancestor]->base_theme;
  $base_theme[] = $themes[$ancestor];
}

It needs that list because it needs to initialize any ancestor themes along with the main theme, so that theme inheritance can work. So then it runs _drupal_theme_initialize on each of them, which adds the necessary CSS and JS, and then initializes the correct theme engine, if needed.
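To see what that loop produces, here it is run against a made-up set of themes. The theme names are hypothetical, and initializing a theme is reduced to recording its name; the only assumption about ordering is that the most distant ancestor gets set up first so that each theme can build on its parent.

```php
<?php
// Hypothetical theme list: "sub" extends "mid", which extends "root".
$themes = array(
  'root' => (object) array('name' => 'root'),
  'mid' => (object) array('name' => 'mid', 'base_theme' => 'root'),
  'sub' => (object) array('name' => 'sub', 'base_theme' => 'mid'),
);

// The same ancestry loop as above, wrapped in a function.
function sketch_base_themes(array $themes, string $theme): array {
  $base_theme = array();
  $ancestor = $theme;
  while ($ancestor && isset($themes[$ancestor]->base_theme)) {
    $ancestor = $themes[$ancestor]->base_theme;
    $base_theme[] = $themes[$ancestor];
  }
  return $base_theme;
}

// Walk from the most distant ancestor down to the chosen theme.
$order = array();
foreach (array_reverse(sketch_base_themes($themes, 'sub')) as $base) {
  $order[] = $base->name;
}
$order[] = 'sub';
```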

After that, it resets the drupal_alter cache, because themes can have alter hooks, and we wouldn't want to ignore them because we had already built the cache by now.

drupal_static_reset('drupal_alter');

And finally, it adds some info to JS about the theme that's being used, so that if an AJAX request comes along later, it will know to use the same theme.

$setting['ajaxPageState'] = array(
  'theme' => $theme_key,
  'theme_token' => drupal_get_token($theme_key),
);
drupal_add_js($setting, 'setting');

A couple other miscellaneous setup tasks

  • Detects string handling method using unicode_check().
  • Undoes magic quotes using fix_gpc_magic().
  • Ensures mt_rand is reseeded for security.
  • Runs all implementations of hook_init() at the very end.

Conclusion

That's it. That's the entire bootstrap process. There are a lot of places that deserve some more depth, and we'll get there, but you should be feeling like you have a fairly good understanding of where and when things get set up while bootstrapping.

Keep in mind this is only a small part of the page load process. Most of the really heavy lifting happens after this, so keep reading!