Patent-Processor Mac OS


Here is the link to the patent. Thanks to u/freddyt55555 for the heads up on this one. I am extremely excited for this tech. Here are some highlights of the patent: the processor includes one or more reprogrammable execution units which can be programmed to execute different types of customized instructions. Moving forward, I believe Apple will really start taking laptop market share. Their focus will be reaching as many customers as possible, since they are shifting away from hardware toward services. Hence, iOS apps are able to run on Mac OS, and they are heavily pushing Apple TV+, Apple News, Apple Health, and Apple Music.

1. Field of the Invention

The invention generally relates to assigning tasks for processing in a distributed system, and, in particular, to assigning tasks based on a ranking associated with available resources.

2. Description of the Related Art

Distributed computing has become increasingly popular with the maturation of network technology. Oftentimes, it is desirable to exploit the processing power of various networked machines that may otherwise be idle or underutilized. For instance, it may be desirable to use the processing power of the networked machines to compute computationally taxing tasks, such as image processing or rendering, audio processing, video processing, encrypting, decrypting, or the like. One example of a distributed computing architecture is Xgrid™ (Version 1.0) provided by Apple Computer, Inc.

In a typical distributed computing environment, a central machine on a network divides a project into a number of tasks, which are assigned to one or more of the networked machines for processing or manipulation. The results are then returned to the central machine once the processing is complete.

There are several conventional ways of assigning tasks to volunteer machines. First, tasks may be delegated to pre-determined volunteer machines using a circular, round-robin scheme. In this round-robin approach, incoming tasks are assigned to volunteer machines on a rotating basis in the order those machines are in a list. Second, tasks may be delegated to volunteer machines based on limited information received from these machines regarding their operational capabilities (e.g., processor speed).
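The round-robin scheme described above can be sketched in a few lines of Python. This is a minimal illustration (the task and machine names are hypothetical), showing how assignment follows only the rotation order, never a machine's capabilities:

```python
from itertools import cycle

def round_robin_assign(tasks, machines):
    """Assign each incoming task to the next machine in rotation,
    regardless of that machine's capabilities or current load."""
    rotation = cycle(machines)
    return [(task, next(rotation)) for task in tasks]

# With four tasks and three machines, the fourth task wraps back to "A".
assignments = round_robin_assign(["t1", "t2", "t3", "t4"], ["A", "B", "C"])
```

Note that if machine "A" happens to be the slowest of the three, it still receives both "t1" and "t4", which is exactly the mismatch the next paragraph describes.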

Both of these ways can be costly in terms of overhead, and can often produce inefficient results. A round-robin scheme is not particularly efficient for delegating tasks because of the potential mismatch between the workload assigned to a particular volunteer machine and its processing capabilities. For example, based on a round-robin scheme, a client machine may delegate a task to a slower, less capable volunteer machine instead of a faster volunteer machine, simply because the slower machine is next in line to receive the task. Similarly, tasks may be routinely delegated to a volunteer machine that is presently overloaded instead of an under-utilized volunteer machine, based simply on the relative positions of the two volunteer machines in the round-robin scheme.

Like the round-robin scheme, the other scheme (where the controller selects a volunteer machine based on that machine's particular resource capability) also tends to be inefficient and inflexible. This is because the same, fixed criteria (such as the speed of the processor) are used to assign tasks to volunteer machines, regardless of the nature of the tasks that need to be assigned. For example, a graphics-intensive task that can be more readily processed by a particular graphics card may be assigned to a machine with a faster processor but not the desired graphics card. Similarly, other tasks to be assigned may not necessarily be suited for volunteer machines that have been identified based on fixed criteria.

Thus, there is a need to efficiently delegate tasks in distributed compilation systems. The present invention is directed to overcoming, or at least reducing the effects of, one or more of the deficiencies set forth above.

In one aspect of the instant invention, a method is provided for selecting a remote system suitable to process one or more tasks. The method includes transmitting a utility to a plurality of remote systems; receiving ranking values generated by the execution of the utility by each of the plurality of remote systems; and selecting a remote system from the plurality of remote systems to process the task based on the received ranking values.

In another aspect of the instant invention, an apparatus is provided for selecting a remote system suitable to process one or more tasks. The apparatus includes an interface and a control unit. The control unit is adapted to transmit a utility to a plurality of remote systems; receive ranking values generated by the execution of the utility by the plurality of remote systems; and determine one or more remote systems suitable to process a task based on the received ranking values.

In yet another aspect of the instant invention, an article comprising one or more machine-readable storage media containing instructions is provided for selecting a remote system suitable to process one or more tasks. The instructions, when executed, enable a processor to transmit a utility to a plurality of remote systems; receive ranking values generated by the execution of the utility by the plurality of remote systems; and determine one or more remote systems suitable to process a task based on the received ranking values.

In yet another aspect of the instant invention, a distributed compilation system is provided for selecting a remote system suitable to process one or more tasks. The system includes a plurality of remote systems and a controller system. The controller system is adapted to transmit a utility to the plurality of remote systems; receive ranking values generated by the execution of the utility by the plurality of remote systems; and determine one or more remote systems suitable to process a task based on the received ranking values.

The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:

FIG. 1 is a block diagram of a distributed compilation system, in accordance with one embodiment of the present invention;

FIG. 2 is a block diagram of a client system, a controller system, and/or remote system that may be employed in the distributed system of FIG. 1, in accordance with one embodiment of the present invention; and

FIG. 3 is an illustration of a flow diagram of a rating module executing on the controller system of FIG. 2, in accordance with one embodiment of the present invention; and

FIG. 4 is an illustration of a flow diagram of a delegating module executing on the controller system of FIG. 2, in accordance with one embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.

Referring to FIG. 1, a distributed system 3 includes a plurality of systems, such as a client system 5, a controller system 7, and remote systems 20, in which tasks may be assigned to one or more of the remote systems 20 by the client system 5 via the controller system 7. The types of tasks that are assigned to the remote systems 20 by the client system 5 may vary, depending on the implementation, and may include, but not be limited to, image processing or rendering tasks, audio processing tasks, video processing tasks, encrypting tasks, decrypting tasks, compilation tasks, or other computationally intensive tasks.

In the illustrated embodiment, the client system 5 provides a task requiring processing to the controller system 7, which may then split the task into one or more sub-tasks and submit them to one or more of the remote systems 20. The remote systems 20, upon executing the tasks or sub-tasks, provide the results to the controller system 7, which then provides the results to the client system 5. Although one client system 5 is illustrated in the distributed system 3 of FIG. 1, it should be appreciated that the distributed system 3 may include a plurality of client systems 5 that submit tasks to the controller system 7 for processing.

As utilized herein, the term “client” refers to an application (or routine) executing on a system that delegates one or more tasks to other systems for completion. For ease of illustration, the system 5 is designated as the “client” in FIG. 1, although it should be appreciated that any of the remote systems 20 may also be configured as a “client” so that it is able to delegate tasks to the other remote systems 20. Thus, the roles of client and remote systems 5, 20 may vary over time in that the various systems may occasionally take on the role of client and at other times operate as a remote system. It may also be possible that, in some instances, a given system 5, 20 performs a dual role of a client system and a remote system by assigning tasks to other systems 5, 20 and, at substantially the same time, performing tasks for the other systems 5, 20.

It should be appreciated that the three-system configuration (which includes the client, controller, and remote systems 5, 7, and 20) shown in FIG. 1 is exemplary, and that in alternative embodiments, other configurations may be used without deviating from the spirit and scope of the present invention. For example, in an alternative embodiment, the functionality of these systems 5, 7, and 20 can be combined or merged with one another. For instance, in one embodiment, the client system 5 may perform the role of the client system 5 as well as the controller system 7. As such, this configuration would include a client system 5 that communicates with the remote systems 20 without a separate, intermediary controller system 7.

The client system 5, the controller system 7, and remote systems 20, in one embodiment, can be coupled to each other by a data network (not shown), which may be a public or a private network. Examples of the data network may include local area networks (LANs), wide area networks (WANs), intranets, the Internet, or the like. The data network may be a packet-switched data network, such as a data network according to the Internet Protocol (IP). A “data network” may refer to one or more channels, links, or paths, and systems or devices (such as routers) used to route data over such networks, channels, links, or paths. If desired, the client system 5 and controller system 7 may, in one embodiment, multicast data packets to the remote systems 20.

The systems 5, 7 and 20 may be any processor-based systems, such as computers in the form of desktops, laptops, mainframes, personal digital assistants, or the like. In one embodiment, the systems 5, 7, 20 may be located at various locations 23, which may be representative of different departments or centers of an organization, or, alternatively, different offices of an organization. Thus, for example, the locations 23, in one embodiment, may represent different offices/centers within a building, within one or more building complexes, within a city or country, or the like.

As described below, in accordance with one embodiment of the present invention, the controller system 7 associates ranking information with the plurality of remote systems 20, and this ranking information is then utilized to identify remote systems 20 that are suitable to process task(s) provided by the client system 5. In general, remote systems 20 are “ranked” based on a ranking utility associated with a task. The ranking utility, which may be an executable routine or a runnable script, includes criteria (or an algorithm) that determine if the remote system 20 is adequately equipped with resource(s) to perform the task provided by the client system 5. The criteria may be definitive (such as the hardware configuration of a remote system 20), more fluid (such as the operational load of the remote system 20 at a given time), or a combination of both. The assigner of the task selects the criteria that are pertinent to the task at issue, such that the remote systems 20 that match the criteria most closely will have a higher rank relative to those that do not. In one embodiment, the ranking values can be scaled (e.g., scaled to a range between 0 and 100, with 100 being the highest ranking, or vice-versa).
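The paragraph above can be made concrete with a minimal Python sketch of a ranking utility. The weighting of criteria and the names here are illustrative assumptions, not the patent's implementation; the only point carried over from the text is that a definitive criterion (hardware) and a fluid one (current load) combine into a value scaled to 0-100:

```python
def ranking_utility(hw_score, load):
    """Rank a remote system by combining a definitive criterion (a
    hardware score in 0..1) with a fluid one (current load in 0..1,
    lower is better), scaled to 0-100 with 100 as the highest rank.
    The 0.6/0.4 weights are arbitrary for illustration."""
    raw = 0.6 * hw_score + 0.4 * (1.0 - load)
    return round(100 * raw)

# A well-equipped, idle system ranks near 100; a weak, loaded one near 0.
idle_fast = ranking_utility(hw_score=1.0, load=0.0)
busy_slow = ranking_utility(hw_score=0.1, load=0.9)
```

Because the assigner authors this function, a graphics-heavy task could weight GPU capability instead; that per-task flexibility is the advantage over the fixed-criteria scheme criticized earlier.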


As noted, the generated ranking values of the various remote systems 20 can then be utilized to determine which of the remote systems 20 are suitable to assist with processing the submitted task provided by the client system 5. In one embodiment, aside from generating a ranking value, the ranking utility may also provide additional information (referred to as “metadata” herein) about the ranking value or the remote system 20. For example, in addition to the ranking value, the ranking utility may indicate a variety of information about the remote system 20, such as the amount of configured memory (e.g., 12 gigabytes), which version of the relevant software is installed, the level of processor speed (e.g., 3 gigahertz), or the like. In other embodiments, the metadata can indicate if the resources of the remote system 20 exceed a threshold value, such as whether the configured memory exceeds a certain threshold, whether the amount of available hard disk space is at least a certain specified value, whether the processor speed is above a selected value, or the like. This metadata, in one embodiment, can be used to further refine which remote systems 20 are better suited than other qualified systems to perform the task to be assigned.
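A sketch of a ranking utility that returns metadata alongside its ranking value might look as follows. The field names, the CPU-based rank formula, and the 8 GB threshold are hypothetical; the shape of the result (a value plus descriptive metadata, including a threshold flag) is what the text describes:

```python
def run_ranking_utility(system_info, min_memory_gb=8):
    """Return a ranking value together with metadata about the remote
    system, including whether a resource exceeds a threshold."""
    rank = min(100, system_info["cpu_ghz"] * 25)  # illustrative formula
    metadata = {
        "memory_gb": system_info["memory_gb"],
        "software_version": system_info["software_version"],
        "memory_above_threshold": system_info["memory_gb"] >= min_memory_gb,
    }
    return rank, metadata

rank, meta = run_ranking_utility(
    {"cpu_ghz": 3.0, "memory_gb": 12, "software_version": "2.1"})
```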

One or more embodiments of the present invention allow an assignor of a task (such as the client system 5, in this case) to efficiently and effectively identify and assign tasks to one or more remote systems 20. This is because the task assignor has the option to define its own criteria to identify remote systems 20 that are better equipped to process the task at hand. Moreover, because the defined criteria can be embodied in a ranking utility that can be executed by the remote machines 20, the task assignor need not know in advance the configuration of the remote systems 20; rather, this information can be obtained when the ranking utility is executed by the remote systems 20. Additionally, the use of the ranking utility also makes it possible to collect up-to-date configuration information (or the current conditions) of the remote machines 20.

In the illustrated embodiment, the client system 5 includes an application module 24 that provides one or more tasks to the controller system 7 to delegate to the qualified remote systems 20. In one embodiment, the application module 24 also provides at least one ranking utility 25 that each remote system 20 can execute to generate its ranking value. The ranking value can be used to determine whether a given remote system 20 is suitable to participate in the execution of tasks. In one embodiment, the client system 5 may include more than one ranking utility, each embodying an algorithm or criteria useful in identifying remote systems 20 that are suitable to perform tasks assigned by the client system 5.

In the illustrated embodiment, the client system 5 transmits the ranking utility 25 to the controller system 7, which in turn manages the distribution of the utility 25 to the remote systems 20. In an alternative embodiment, the client system 5 may transmit its ranking utility 25 to one or more of the remote systems 20 without an intervening controller system 7. The manner in which the ranking utility 25 is provided to the remote systems 20 is implementation specific, and thus can vary based on the designer's desires or goals. In some instances, the ranking utility 25 may be preinstalled or manually installed on the remote systems 20 and thus it may not be necessary to transmit a copy of the ranking utility 25.

As noted, the application module 24 of the client system 5 provides one or more tasks that require completion. In one embodiment, in connection with submitting task(s), the application module 24 of the client system 5 also provides an identifier to the controller system 7. The identifier specifies the particular requirements of processing the task. For example, the identifier may indicate the ranking utility that is associated with the incoming task so that the appropriate ranking values can be utilized to determine which remote systems 20 are suitable to participate in the execution of the submitted task.

In the illustrated embodiment, the controller system 7 includes a rating module 26 that determines the ranking of the various remote systems 20 based on the ranking utility 25 provided by the client system 5. The controller system 7 also includes a delegating module 27 that assigns tasks (or sub-tasks) to the remote systems 20 based on the determined ranking values of the remote systems 20.

In the illustrated embodiment of FIG. 1, the remote systems 20 include a daemon module 35, which executes on the remote systems 20, and responds to requests from the client system 5. For example, the daemon module 35 accepts the ranking utility 25 from the controller system 7, executes that ranking utility 25, and provides the results (e.g., ranking value) to the controller system 7. Although not shown, in one embodiment, the client system 5 may also include the daemon module 35.

In the illustrated embodiment, the daemon module 35 utilizes a processing module 40 executing on the remote system 20 to complete the tasks that are assigned to the remote system 20. In the context of a graphics-based task, the processing module 40 performs the appropriate calculations and provides the results to the controller system 7, which in turn can provide the results to the application module 24 of the client system 5. As an additional example, in the context of a code compilation task, the processing module 40 may, for example, compile one or more source files to produce object code files, link files with object code segments to produce executable files, perform pre-processing tasks, assemble files, or the like, and then provide the results to the client system 5.

The application module 24, rating module 26, delegating module 27, daemon module 35, and processing module 40, in the illustrated embodiment, are implemented in software. While these modules 24, 26, 27, 35, and 40 are illustrated as five distinct modules for the purposes of this discussion, it should be appreciated that some or all portions of these modules may be combined or expanded into any number of module(s). The modules 24, 26, 27, 35, and 40 in the illustrated embodiment are executable on the systems 5, 7, and 20, each of which may be, for example, a laptop computer, a desktop computer, a mainframe computer, a handheld device, or any other processor-based system capable of executing instructions. In alternative embodiments, some or all portions of one or more of these modules 24, 26, 27, 35, 40 may be implemented in hardware or firmware.

Referring now to FIG. 2, a stylized block diagram of a system 200 is illustrated, in accordance with one embodiment of the present invention. The system 200 may be implemented as the client system 5, controller system 7, and/or remote systems 20 of FIG. 1. The system 200 comprises a control unit 215, which in one embodiment may be a processor, and is capable of interfacing with a north bridge 220. The north bridge 220 provides memory management functions for a memory 225, as well as serves as a bridge to a peripheral component interconnect (PCI) bus 230. In the illustrated embodiment, the system 200 includes a south bridge 235 coupled to the PCI bus 230.

A storage unit 250 is coupled to the south bridge 235. A variety of modules, such as the application module 24, rating module 26, delegating module 27, daemon module 35, and processing module 40, may be stored in the storage unit 250 and executed by the control unit 215. Additionally, the ranking utility 25 may also be stored in the storage unit 250. Although not shown, it should be appreciated that in one embodiment an operating system, such as Windows®, Disk Operating System®, Unix®, Linux®, MAC OS®, or the like, may be stored on the storage unit 250 and executable by the control unit 215. The storage unit 250 may also include device drivers for the various hardware components of the system 200.

In the illustrated embodiment, the system 200 includes a display interface 247 that is coupled to the south bridge 235. The system 200 may display information on a display device 248 via the display interface 247. The south bridge 235 of the system 200 may include a controller (not shown) to allow a user to input information using an input device (not shown), such as a keyboard and/or a mouse.

The south bridge 235 of the system 200, in the illustrated embodiment, is coupled to a network interface 260, which may be adapted to receive, for example, a local area network card. In an alternative embodiment, the network interface 260 may be a Universal Serial Bus interface or an interface for wireless communications. The system 200 communicates with the remote system 20 coupled to a data network through the network interface 260.

It should be appreciated that the configuration of the system 200 of FIG. 2 is exemplary in nature and that, in other embodiments the system 200 may include fewer, additional, or different components without deviating from the spirit and scope of the present invention. For example, in an alternative embodiment, the system 200 may not include a north bridge 220 or a south bridge 235, or may include only one of the two bridges 220, 235, or may combine the functionality of the two bridges. As another example, in one embodiment, the system 200 may include more than one control unit 215. Similarly, other configurations may be employed consistent with the spirit and scope of the present invention.

Referring now to FIG. 3, a flow diagram of one or more acts that are performed by the rating module 26 of the controller system 7 is illustrated, in accordance with one embodiment of the present invention. In particular, FIG. 3 illustrates one embodiment of a method for identifying the remote systems 20 that are suitable to perform the task(s) submitted by the client system 5. As noted earlier, the ranking values are calculated when the remote systems 20 execute the ranking utility provided by the client system 5. It should be appreciated that, in one embodiment, the client system 5 may provide the ranking utility 25 to the controller system 7 contemporaneously with the task it needs completed, or, alternatively, provide it separately from the task. For ease of illustration, it is assumed that in FIG. 3, the client system 5 provides the ranking utility 25 in advance of the task.

In FIG. 3, the rating module 26 of the controller system 7 receives (at 310) at least one ranking utility 25 (or a copy of the ranking utility 25) from the application module 24 of the client system 5. It should be appreciated that, in one embodiment, a plurality of client systems 5 may each transmit its own ranking utility (or utilities) to the controller system 7. Thus, at any given time, the controller system 7 may be handling a plurality of ranking utilities from a plurality of sources. However, for ease of illustration, it is assumed that one client system 5 transmits the ranking utility 25 (or utilities) to the controller system 7. It should be appreciated that, in one embodiment, the client system 5 may transmit a plurality of different ranking utilities 25 to the controller system 7.

The rating module 26 of the controller system 7 stores (at 310) the received ranking utility 25. The act of storing the received ranking utilities may include storing (at 312) an authenticating value associated with each of the ranking utilities. This authenticating value may be utilized to determine if the previously-stored ranking values are still valid. For example, if the authenticating value of a newly received ranking utility matches that of a previously received ranking utility, then that is an indication that the ranking values collected based on the previously received ranking utility are still valid. As such, the rating module 26 of the controller system 7 need not collect any new ranking values and need not overwrite the previously-received ranking utility. The authenticating value, in one embodiment, may be a hash value or a checksum value allowing comparison of a newly received ranking utility to a previously stored ranking utility.
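The hash-based authenticating value described above can be sketched as follows. Using SHA-256 is an assumption (the patent says only "a hash value or a checksum value"), and the helper names are illustrative:

```python
import hashlib

def utility_digest(utility_code: bytes) -> str:
    """Authenticating value for a ranking utility: a hash that lets the
    controller compare a newly received utility against a stored one."""
    return hashlib.sha256(utility_code).hexdigest()

def rankings_still_valid(stored_digest: str, new_utility_code: bytes) -> bool:
    """If the digests match, the previously collected ranking values
    remain valid and no re-ranking or overwrite is needed."""
    return stored_digest == utility_digest(new_utility_code)

stored = utility_digest(b"rank by cpu speed")
same = rankings_still_valid(stored, b"rank by cpu speed")
changed = rankings_still_valid(stored, b"rank by gpu model")
```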

As part of storing (at 310) the received ranking utility, the rating module 26 of the controller system 7 may also store (at 316) a timestamp of the last time a client system 5 submitted a task to the controller system 7 so that any previously-submitted, older ranking utilities can be removed after some period of idleness. In one embodiment, the rating module 26 updates the timestamp associated with the ranking utility each time the ranking values associated with the ranking utility are used to identify suitable remote systems 20 for processing a received task.

The rating module 26 of the controller system 7 provides (at 320) the received ranking utility 25 (or a copy of the ranking utility 25) to the one or more available remote systems 20 for execution. In an alternative embodiment, if the controller system 7 receives a plurality of different ranking utilities from the client system 5, the rating module 26 may provide these ranking utilities to the remote systems 20. In one embodiment, the controller system 7 provides the ranking utility 25 to the remote systems 20 as the systems become available or otherwise establish a communication link with (or bind to) the controller system 7. In an alternative embodiment, the rating module 26 of the controller system 7 may multicast a notification to the remote systems 20, which may be communicatively linked to the controller system 7 via a data network, that a ranking utility is available. In a multicasting embodiment, the controller system 7 announces to a router (not shown) that a ranking utility is available for transmission. The router in turn multicasts the announcement to the available nodes or remote systems 20 based on the remote systems 20 identified in a multicast group or distribution list. The remote systems 20, in response to receiving the notification, can retrieve the ranking utility 25. In one embodiment, the router may dynamically update the contents of its multicast group. That is, as remote systems 20 become available or inaccessible, the router updates its multicast group accordingly. In one embodiment, the multicast group or distribution list may contain destination addresses associated with each of the remote systems 20 included in the group or list. The router, in one embodiment, may indicate substantially simultaneously to the available remote systems 20 that task(s) are available. In one embodiment, the router may multicast the task notification to each of the available remote systems 20 using an efficient routing path.

Upon reception of the ranking utility, each of the remote systems 20 can execute its ranking utility 25 and provide the resulting ranking value to the controller system 7. The rating module 26 of the controller system 7 receives (at 330) results from the remote systems 20 that execute the ranking utility 25. The results returned will depend on the criteria specified in the ranking utility 25. In one embodiment, the results received will include the ranking value (see block 332) from the remote systems 20. In an alternative embodiment, the results received may also include metadata (see block 334) about the ranking value or the remote systems 20. The rating module 26 of the controller system 7 stores (at 340) the results that are received. These stored results can be utilized to delegate tasks to the remote systems 20, as described below.
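The distribute-execute-collect loop of blocks 320-340 can be sketched as below. The names are hypothetical and the remote execution is simulated with a local call, since the transport (direct transmission or multicast) is an implementation detail in the text:

```python
def collect_rankings(remote_systems, utility):
    """Blocks 320-340 of FIG. 3, sketched: provide the ranking utility
    to each available remote system, then receive and store the ranking
    value each one computes for itself."""
    results = {}
    for name, state in remote_systems.items():
        # In the real system the daemon on each remote system runs the
        # utility; here that execution is simulated with a local call.
        results[name] = utility(state)
    return results

remotes = {"alpha": {"cpu_ghz": 3.0}, "beta": {"cpu_ghz": 1.5}}
rankings = collect_rankings(remotes, lambda s: round(s["cpu_ghz"] * 20))
```

The stored `rankings` mapping is what the delegating module later consults when a task arrives.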

FIG. 4 illustrates a flow diagram of the delegating module 27 of the controller system 7 for assigning task(s) to the remote system 20, in accordance with one embodiment of the present invention. For ease of illustration, it is assumed that the controller system 7 has previously obtained the ranking values from the various remote systems 20 in the distributed system 3 of FIG. 1. One manner of obtaining the ranking values is described above in connection with FIG. 3.

In FIG. 4, the delegating module 27 receives (at 410) information regarding at least one task requiring processing from the application module 24 of the client system 5. In one embodiment, the received information may include information about the task itself (see block 412). In one embodiment, the received information may also include an identifier (see block 414) that specifies the requirements for processing the task. For example, the identifier may indicate that the ranking values generated using a particular ranking utility should be used when determining which of the remote systems 20 are qualified to process the received task. In one embodiment, the identifier may indicate that ranking values from two or more ranking utilities should be combined in determining remote systems 20 that are suitable to process the received task.

The delegating module 27 determines (at 420) if the ranking values that are to be used are current or valid. The ranking values may not be valid for any of a variety of reasons. For example, the lifetime of the ranking values may have expired such that they may not reflect current conditions of the remote systems 20. This may be particularly true for ranking values that are based on transient characteristics such as a remote system's current load or the quality of a network connection to that remote system. Another reason the ranking values may not be valid is if the ranking utility 25 that was executed to generate these values is outdated (either because a newer ranking utility has been received or because the lifetime of that ranking utility has expired). Similarly, there may be other reasons the ranking values may no longer be current or valid. In FIG. 4, if it is determined (at 420) that the ranking values are not current, the delegating module 27 updates (at 425) the ranking values. These values may be updated, for example, by requiring the remote systems 20 to execute the ranking utility 25 and provide the updated ranking values.
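The validity check of block 420 can be sketched as a simple predicate. The lifetime mechanism and parameter names are assumptions for illustration; the text specifies only that values expire over time or when their utility is superseded:

```python
import time

def rankings_current(collected_at, lifetime_s, utility_outdated):
    """Block 420 of FIG. 4, sketched: ranking values are stale once
    their lifetime has expired or the ranking utility that produced
    them has been superseded or has itself expired."""
    return (time.time() - collected_at) < lifetime_s and not utility_outdated

# Values collected just now from a current utility are usable...
fresh = rankings_current(time.time(), lifetime_s=300, utility_outdated=False)
# ...but hour-old values with a five-minute lifetime trigger block 425.
stale = rankings_current(time.time() - 3600, lifetime_s=300,
                         utility_outdated=False)
```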

If it is determined (at 420) that the ranking values stored on the controller system 7 are current or valid, or if the invalid ranking values have been updated (at 425), the delegating module 27 identifies (at 430) which remote systems 20 are suitable or qualified to process the received task based on the results received from the execution of the ranking utility by the remote systems 20 (see blocks 330, 332, and 334 of FIG. 3). As shown in blocks 330-334 of FIG. 3, the results may include the ranking value, as well as metadata associated with that ranking value. Thus, in one embodiment, the delegating module 27 may determine that only those remote systems 20 having a ranking value above a selected threshold level are qualified to process the task.

In another embodiment, the ranking value and the associated metadata may both be utilized to identify which of the remote systems 20 qualify to process the received task. For example, the delegating module 26 may initially use the ranking value to identify a select number of remote systems 20 that are qualified to process the received task. From this initial group of remote systems 20, the delegating module 26 may further narrow the number of qualifying remote systems 20 based on the received metadata. For instance, assuming that the metadata returned by each of the remote systems 20 related to the amount of available memory (e.g., 12 gigabytes) in that remote system 20, then only those remote systems 20 that have the requisite amount of available memory would be qualified to execute the task. It should be appreciated that the ‘memory’ metadata example provided herein is illustrative only, and that, in alternative embodiments, any variety of metadata may be employed to allow the task assignor greater flexibility in identifying suitable remote systems 20 to process the task.
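The two-stage qualification described above (threshold on the ranking value, then narrowing by metadata) can be shown as a small filter. The dictionary shape, the threshold, and the 12 GB memory floor are assumptions for this sketch, not values taken from the patent.

```python
# Illustrative two-stage filter (block 430): first keep remote systems whose
# ranking value clears a threshold, then narrow by metadata such as available
# memory. Field names and thresholds are hypothetical.

def qualify(systems, threshold, min_memory_gb):
    by_rank = [s for s in systems if s["rank"] > threshold]
    return [s for s in by_rank if s["meta"]["memory_gb"] >= min_memory_gb]
```

Any other metadata key (network latency, disk space, and so on) could replace the memory check without changing the overall shape.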

Once the remote systems 20 that are suitable to perform the task have been identified (at 430), the delegating module 26 assigns (at 440) the task to at least one of the identified remote systems 20. If the entire task is to be assigned to a single remote system 20, the delegating module 26 may select, for example, the remote system 20 with the highest ranking value among the qualifying remote systems 20. If the task is to be broken into several sub-tasks, the delegating module 26 may select, for example, from among those qualifying remote systems 20 that have the highest ranking values.
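The assignment rule at block 440 reduces to a sort: the whole task goes to the single highest-ranked qualifier, while a divisible task's sub-tasks go to the top few qualifiers. A minimal sketch, with hypothetical data shapes:

```python
# Sketch of the assignment step (block 440): sort qualifiers by ranking value
# and take the top one (whole task) or top k (one per sub-task).

def assign(qualified, subtasks=1):
    ranked = sorted(qualified, key=lambda s: s["rank"], reverse=True)
    return ranked[:subtasks]
```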

Once the task or sub-tasks are assigned, the responsible remote systems 20 execute the assigned task (or sub-task) and return the results to the delegating module 26 of the controller system 7. The delegating module 26, upon receiving (at 450) the results, provides (at 460) them to the application module 24 of the client system 5.

The foregoing description describes one or more embodiments for efficiently and effectively identifying one or more remote systems 20 in a distributed system 3 that are better suited to perform task(s) needing completion. In one illustrated embodiment, the task submitter is allowed to specify criteria in the form of a ranking utility that, when executed by each remote system 20, returns a ranking value for that remote system 20. The ranking value provides a basis to determine which of the remote systems 20 are adequately equipped to handle the task being assigned. Thus, one or more embodiments of the present invention allow the task submitter to specify prerequisite conditions for performing a task and allow the remote systems 20 to indicate, by way of the ranking utility, whether they meet those conditions. The use of the ranking utility also provides the task submitter a dynamic way to determine current (or up-to-date) operating conditions (e.g., available memory, network latency to a particular server or disk, etc.) of the remote systems 20 that are available to assist with processing the task.

Those skilled in the art will appreciate that the various system layers, routines, or modules illustrated in the various embodiments herein may be executable control units (such as the control unit 215 (see FIG. 2)). The control unit 215 may include a microprocessor, a microcontroller, a digital signal processor, a processor card (including one or more microprocessors or controllers), or other control or computing devices. The storage devices 250 referred to in this discussion may include one or more machine-readable storage media for storing data and instructions. The storage media may include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy, and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs). Instructions that make up the various software layers, routines, or modules in the various systems may be stored in respective storage devices. The instructions, when executed by a respective control unit 215, cause the corresponding system to perform programmed acts.

The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.

The present invention relates generally to call processor architectures, and in particular to methods and apparatus for high-availability call processor systems and methods.

Conventional call processor systems use special purpose, dedicated systems for implementing redundancy. For example, some conventional systems utilize two computer systems, one of which is active and the other standby, and special purpose hardware and software that interacts with each computer system to implement high-availability. The special purpose hardware and software communicates with the active computer system to capture status information so that in the event the active system goes down the standby system can start in place of the active system using the information collected by the special purpose hardware and software.

Thus, conventional high-availability architectures require special purpose hardware and software, which raises system costs and makes these systems very expensive. There is, therefore, a need for a high-availability architecture that avoids the problems associated with special purpose hardware and software high-availability systems.

The call processor system consistent with the invention takes advantage of commercial off the shelf (COTS) products while maintaining high availability. The call processor system comprises five main components: an X86 Single Board Compute Platform (CP PII), a System Area Network (SAN) interface, high-speed pipes (HSPs), a System Utility Card (SU), and a Core to Network Interface Card (cCNI).

The compact peripheral component interconnect (cPCI) standard specifies the packaging format and the electrical interface for the backplane and the on-card bus structure of the system. The cPCI standard also includes the hot swap specification, which standardizes a way to insert and remove circuit packs in a live system without affecting other circuit packs that share the same bus. The specification also specifies how an operating system (OS) can be notified so as to allow dynamic loading of drivers.

By using COTS components, future upgrades to newer technologies are easy and quick, thus leveraging the time-to-market opportunities that can be gained by using third party off-the-shelf components. The system implements a high-availability architecture based around system area network (SAN) technology that allows the call processor to perform better than conventional systems and to evolve easily. The system features include a 1+1 sparing strategy, graceful switchover that preserves all call states, ungraceful switchover in case of node failure that preserves all existing, established calls, hard disk drive redundancy, and IP takeover (the system presents a logical Ethernet connection to the OA&M connection). The system addresses all of these issues and in addition provides for querying the health of the inactive side at any time. Due to the high bandwidth of new high-speed network interface cards, the complete system memory image can be transferred to the inactive system on command within one second; this performance allows the active processor to run asynchronously to the standby system, allowing significant performance gains on the active side. When a graceful switchover is required, the complete system state of the active processor can be transferred to the standby processor, allowing the standby processor to continue as if it had been in control all along. This strategy works well for hardware failure.
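As a rough sanity check on the one-second figure, transfer time is just image size divided by link bandwidth. The text does not state the pipe's link rate, so the 1 Gbit/s figure below is an assumption for illustration only; at that rate, roughly 125 MB of memory image fits in one second.

```python
# Back-of-envelope check of the one-second memory-image transfer claim.
# The 1 Gbit/s link rate is an assumed figure, not stated in the text.

def transfer_seconds(image_bytes, link_bits_per_sec):
    return image_bytes * 8 / link_bits_per_sec

# A 100 MB image over an assumed 1 Gbit/s pipe takes 0.8 s.
t = transfer_seconds(100 * 10**6, 10**9)
```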

FIG. 1 is a logical hardware structure.

FIG. 2 is a block diagram of a call processor system.

FIG. 3 is a block diagram of a call processor system.

FIG. 4 is a top level view of the software used in a call processor system.

FIG. 5 is a state diagram showing the states of a logical call server.

FIG. 6 is a state diagram illustrating the state transition of a voter.

FIG. 7 illustrates disk redundancy.

FIG. 8 is a state diagram illustrating the redundancy states of a call processor.

FIG. 9 shows the states that two nodes move through when going from active to inactive or inactive to active.

FIG. 1 illustrates the logical hardware structure of various components that make up the system consistent with the present invention. At the top level, the compact Peripheral Component Interconnect (cPCI) backplane is the master of all the individual components.

FIG. 2 is a block diagram showing a call processor system 44 consistent with the present invention. Call processor system 44 is comprised of three primary elements: a first compact Peripheral Component Interconnect shelf (cPCI) 10, a second cPCI shelf 12, and network 14. First cPCI shelf 10 and second cPCI shelf 12 in one embodiment are redundant systems. Network 14 processes calls from plural phones 16-18.

In a redundant system, one side is in active mode running call processing, and the other side is in standby mode ready to take over. This is known as a 1+1 redundancy approach. Both the active side and the standby side run in parallel; however, the active side runs call processing applications while the inactive side runs only diagnostics and management tasks. Since the status of the inactive side is always known, the system can make better decisions on whether or not to switch over. Redundancy control in a call processor consistent with the present invention is software-based. As a result, all the redundancy-related functions are implemented by software. This has several advantages, such as loosening the coupling between the two sides, thus reducing adverse interference between them. Certain features of the redundancy control may still be implemented in hardware.

First cPCI shelf 10 includes a system utility 20, a compact Core to Network Interface (cCNI) 22, a compute platform (CP PII) 24 and a high-speed pipe interface (HSP) 26. cCNI 22 provides a TDM interface into network 14. System utility card 20 interfaces to various system elements, such as the system power monitor (XSM) security device 28. First cPCI shelf 10 is connected to second cPCI shelf 12 via Ethernet 40. XSM 28 is connected to the corresponding XSM of the redundant system. Intergroup switch (IGS) 30 is connected to network 14.

Network 14 is comprised of several groups of elements, each group comprising three-port extender (3PE) 32, XNET 34, XPEC 36, and line card (LC's) 38.

FIG. 3 is a block diagram illustrating call processor system 44 of FIG. 2 in greater detail. Many of the elements of FIG. 3 are similar to those shown in FIG. 2, and have therefore already been discussed. Call processor system 44 includes cPCI CP board 120, which is connected to backplane PCI bus 1 (104) via S. bridge 88. cPCI CP board 120 is comprised of CPU module 62 connected to PCI bus 0 (74). PCI bus 0 (74) is connected to ISA bus 102 via P2P bridge 76. CPU module 62 includes CPU 64, L2 cache 66, and DRAM 60. DRAM 60 is connected directly to PCI bus 0 (74), and to N. bridge 70 via bus 72.

PCI bus 0 (74) is connected to other buses in the system, and to certain devices via PCI interface cards. PCI bus 0 (74) is connected to ISA bus 102 via P2P bridge 76, and to slot 0 on PCI bus 1 (104) via S. bridge 88. PCI bus 0 (74) is also connected to P2P bridge 80 via PCI interface 78, to Ethernet interface 84 via PCI card 82, and directly to SCSI interface 86.

ISA bus 102 is connected to several devices, including IDE 90, TOD 92, flash 94, WD timers 96, keyboard/mouse 98, and serial interface 100. PCI bus 1 (104) has several slots and serves as a backplane. In this embodiment, each slot is occupied by a device connected to a PCI interface card. The devices include cCNI 106, cCNI 108, cCNI 110, cCNI 112, HSP GNIC 116, and system utility (SU) main 118. cCNI 112 is further connected to three-port extender (3PE) 126, XNET 128, XPEC 130, and LC 132 (corresponding to 3PE 32, XNET 34, XPEC 36, and LC's 38 of FIG. 2). Each of cCNI's 106, 108, and 110 can be further connected to a configuration of devices similar to 3PE 126, XNET 128, XPEC 130, and line card (LC) 132. SU main 118 is connected to SU transition card 124. Slot 3 is a spare slot 114. XNET 128 is the TDM-to-messaging interface. XPEC 130 converts the messages down to LC 132. HSP GNIC 116 connects to the other system, as shown in FIG. 2.

FIG. 4 illustrates a top level view of the software used in a call processor system consistent with the present invention. In one embodiment, each layer only uses the interface provided by the layer below or above it. Software module structure 260 includes applications 262, application adapters 264, operating system (OS) extensions 266, real-time OS kernel 268, TCP/IP stack 312, lightweight protocol stack (LWPS) 314, hardware abstraction layer 274, and board support package (BSP) 276. Real-time OS kernel 268 may, for example, be VxWorks 5.3 from Wind River Systems.

Application adapters layer 264 provides the necessary interfaces to applications. Applications layer 262 contains new and legacy applications.

Hardware abstraction layer 274 implements hardware-specific drivers. To achieve better fault handling and recovery capability, all the hardware devices are managed by HI. This requires that all the drivers comply with the HI driver template. Hardware abstraction layer 274 includes CNI 316, disk 318, Ether 320, HSP 322, serial 324, timer 326, system utility 328, time of day (TOD) 330, and hardware (H/W) watchdog 332.

BSP 276 is responsible for controlling the start-up sequence, which includes hardware initialization, board configuration, and the vote. BSP 276 includes start up/vote 334 and BSP module 336. Start up/vote 334 ensures that at startup one of the sides becomes active and takes ownership of the system. Logical call server (LCS) 286 checks the health of the system and reports this information to start up/vote 334. A voter is used for deciding which CPU becomes active when the two sides boot up at the same time. Each side checks its own health and sends a health message to the other side. Each side's vote is based on its health measure, and the side with better health becomes the active side. If the two sides have the same health, two random numbers, one generated on each side, are compared, and the side with the greater random number becomes the active side. Side 0 may be chosen as the default active side in the case of equal health. The voting message contains health, redundant state, and a random number (if needed). If the HSP is down, voter 334 votes through the Ethernet channel. Since, in the pre-vote state, no side has yet been decided active, voting is a symmetric process between the two sides, so both sides share the same state diagram. FIG. 6 is a state diagram illustrating the state transition of voter 334. Voter 334 moves between the idle, HSPWaitRep, and etherWaitRep states when transferring voting information in the system.
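The voting rule described above (better health wins; equal health falls back to a random-number comparison, with side 0 as the final default) can be written as a tiny decision function. This is an illustrative encoding of the rule, not the patent's code, and the field names are assumptions.

```python
# Illustrative vote decision: higher health wins; on a tie, the greater random
# draw wins; if both are equal, side 0 is the default active side.

def vote(side0, side1):
    """Return 0 or 1: which side becomes active."""
    if side0["health"] != side1["health"]:
        return 0 if side0["health"] > side1["health"] else 1
    if side0["rand"] != side1["rand"]:
        return 0 if side0["rand"] > side1["rand"] else 1
    return 0  # equal health and equal draw: side 0 is the default
```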

The exception and interrupt handling mechanisms of the real-time OS kernel 268 provide better fault tolerance. Exception handler 310 looks for any condition that will cause an exception, and initiates corrective measures.

Communication stacks 270, 272, which include TCP/IP stack 312 and LWPS 314 respectively, are used for communicating with the LAN and as the backup pipe between the CPUs. The HSP is used for redundancy control. Traditional IP stacks are implemented as device drivers and use kernel resources for their operation. A kernel-independent lightweight stack is useful for implementing the stop-and-copy operation. In addition, a lightweight stack significantly reduces performance overhead.

OS extensions 266 include logical call server ((LCS) 286, switchover manager (SWOMgr) 288, disk redundancy 290, protected data manager (PdataMgr) 292, IP management 300, communication manager (CommMgr) 302, and heart beat 304.

Communication Manager 302 provides messaging service for local task communication and inter-CPU communications. When any task wants to talk with another entity, such as another task or an entity outside the system, communication manager 302 coordinates the communication between the task and the entity.

LCS 286 manages the call server's redundant state and coordinates redundancy-related operations, such as switchover, split, join, and IP address management. Redundant CPUs provide fault-tolerant capability. Therefore, a redundant call server with two physical CPUs is basically a logical unit for providing computing service for applications. This requires software components that manage the two CPUs as a whole. LCS 286 is designed to (1) maintain the redundant system's state and control state transitions, (2) accept all the redundancy-related management commands, such as manual switchover, split, and join, (3) start system-initiated switchovers, both graceful and ungraceful, and (4) control the inter-CPU tasks: voting, heartbeat, protected data synchronization, and protected data updating. This approach has several advantages, including (1) centralizing redundancy control in order to eliminate the possibility of inconsistent software states among different subsystems, and (2) isolating redundancy-specific software such that other modules can be redundancy-transparent.

FIG. 5 is a state diagram showing the states of LCS 286. Note that broken lines indicate manually initiated actions and solid lines indicate system-initiated actions. LCS 286 starts the initiation by moving to a vote state. If no vote occurs, meaning that there is a catastrophic problem such as one of the systems being down, the state moves to R/A, X, meaning a single side is up. If a vote is performed, the state moves to a switch state, and the system will move either to system active s(A) or system inactive s(I). If the system is active, LCS 286 moves to S/A, as indicated in state S/A, S/I. A cutover of the inactive (cutover I) causes the state to move to S/I, S/A (and back if cutover A occurs). If a join is performed, then LCS 286 moves to state synch in block R/A, R/I. State synch means that data will be sent to the other side.

From synch in block R/A, R/I, LCS 286 can cause the system to move to the single-side state R/A, X. The system may also move from synch to update once synch A is done. Update is the normal state of the system, in which updates are periodically sent to the other side.

From update, the system may move to state HSP down if it is determined that the high speed pipe is down. From HSP down, the system may move back to synch if the high speed pipe comes back up.

The system may also move from update in block R/A, R/I to update in block R/I, R/A if there is a graceful switchover active (g-swoA). For example, there might be a graceful switchover to the other system for routine maintenance. The system may move back to update in block R/A, R/I if there is a graceful switchover inactive (g-swoI).

From block R/I, R/A, there may be an ungraceful switchover (ug-swoI), which moves the system to the synch state in block R/A, R/I.

From the switch state, LCS 286 may also move the system along s(I) to S/I in state S/I, S/A. When a join inactive (joinI) is performed, the system moves to synch in block R/I, R/A. The movements among states in block R/I, R/A are similar to those in block R/A, R/I.

From synch in block R/I, R/A, if a split inactive (splitI) is performed the system moves to S/A in state S/I, S/A.
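The LCS transitions walked through above can be collected into a transition table keyed by (state, event). Only the transitions named in the text are included; the state and event names are shorthand paraphrases of the diagram's labels, not identifiers from the patent.

```python
# Partial LCS transition table from the FIG. 5 walk-through:
# (state, event) -> next state. Names are shorthand for the diagram labels.

LCS_TRANSITIONS = {
    ("vote", "no_vote"): "R/A,X",              # catastrophic: single side up
    ("vote", "voted"): "switch",
    ("switch", "s(A)"): "S/A,S/I",
    ("switch", "s(I)"): "S/I,S/A",
    ("S/A,S/I", "cutoverI"): "S/I,S/A",
    ("S/I,S/A", "cutoverA"): "S/A,S/I",
    ("S/A,S/I", "join"): "synch R/A,R/I",
    ("synch R/A,R/I", "synchA_done"): "update R/A,R/I",
    ("update R/A,R/I", "hsp_down"): "HSP down",
    ("HSP down", "hsp_up"): "synch R/A,R/I",
    ("update R/A,R/I", "g-swoA"): "update R/I,R/A",
    ("S/I,S/A", "joinI"): "synch R/I,R/A",
}

def step(state, event):
    return LCS_TRANSITIONS[(state, event)]
```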

Heart beat 304 periodically monitors the other side's status and reports any status change to LCS. Heartbeat manager 304 is responsible for monitoring the status of the other node. It sends a message via the HSP to exchange status between the two sides. Heartbeat manager 304 is designed to provide several functionalities. The heartbeat manager 304 on the inactive side sends a heartbeat request to the heartbeat manager 304 on the active side, which sends back a heartbeat reply. The heartbeat messages, both request and reply, contain health information. Heartbeat manager 304 informs LCS of any health change on the other side. LCS controls heartbeat manager 304 by sending it a start or stop message. To ensure that a failure in the high speed pipe does not cause an unnecessary ungraceful switchover, dual communication channels are used for heartbeat messages.

In one embodiment, two pipes are used for vote and heartbeat messaging: the High Speed Pipe and Ethernet. Using two pipes has several advantages, including: (1) eliminating a single point of failure, increasing system reliability, and (2) avoiding an unnecessary ungraceful switchover when the HSP pipe fails.
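The dual-channel idea can be sketched as a simple fallback: try the high-speed pipe first, fall back to Ethernet, and declare the peer unreachable only if both fail, so an HSP failure alone never triggers an ungraceful switchover. The transport hooks here are hypothetical stand-ins, not the patent's interfaces.

```python
# Sketch of heartbeat delivery over dual channels with fallback.
# send_hsp / send_ether are hypothetical transport callables that raise
# OSError when their link is down.

def send_heartbeat(msg, send_hsp, send_ether):
    """Return the channel that carried the heartbeat, or None if both failed."""
    for name, send in (("hsp", send_hsp), ("ether", send_ether)):
        try:
            send(msg)
            return name
        except OSError:
            continue
    return None  # only now is the peer considered unreachable
```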

SWOMgr 288 controls and coordinates system redundancy related activities, namely, graceful switchover (G-SWO), ungraceful switchover (UNG-SWO), Split, Join, and Cutover. Upon detection of a faulty condition in the active system, the present invention implements one of two types of switchovers, namely, graceful and ungraceful. A graceful switchover occurs when the DRAM in the backup system has the same contents as the DRAM in the active system, and the state of CPU 64 of the active system has also been saved in the other redundant system. When the active system determines a fault condition in a graceful switchover, the system is briefly halted. During the halt, all of the memory in the active system, including the memory contents of CP PII 24, is transferred to the inactive system over HSP 26. This allows switchover from the active to the inactive system within one or two instructions. All calls are maintained in a graceful switchover, even those that are in transition.

An ungraceful switchover occurs when a more catastrophic event occurs, such as when CP PII 24 dies completely. This requires the active system to switchover to the inactive system by copying only a section of DRAM 60 over to the other system. Calls that are in progress are maintained, but calls that are in transition are lost.

LCS 286 is the master of SWO manager 288. When LCS 286 decides that a switchover should occur, a message is sent to SWO manager 288 requesting that a switchover be effectuated. The switchover manager 288 on each side communicates information regarding the switchover to the other switchover manager 288 to coordinate the switchover, bringing down one side and bringing up the other side. SWOMgr 288 implements low level functions for redundancy related operations, e.g., saving and restoring registers, and stop and copy during a graceful switchover. Because the system has both graceful switchover and ungraceful switchover capability, Stop and Copy with PData Replication copies the memory region of the main OS from the active side to the inactive side at G-SWO time and keeps PData replicated all the time so that it is ready for an UNG-SWO at any time.

PDataMgr 292 is responsible for mirroring protected data from the active side to the inactive side. PDataMgr 292 makes sure that information is sent over the high-speed pipe for replication on the other side.

Disk redundancy 290 ensures that important files are replicated on two disks. Disk redundancy 290 is turned on by syncing the two disks, and turned off by stopping the mirroring mechanism. LCS 286 is responsible for sending messages to disk redundancy 290 to turn it on or off, directing it to sync, stop syncing, or stop updating. The primary interface to disk redundancy 290 is through the I/O system. Normal reads and writes are intercepted on their way down below the I/O system. Disk redundancy is transparent to the user application.

During disk synchronization all updates (writes and ioctl calls that modify the disk) on the protected and unprotected partitions will be mirrored to the slave CP when the system is in redundant mode.

Redundancy state is stored in LCS 286, and can be read by disk redundancy 290. With the state known, it can decide whether or not to propagate all update I/O across the high-speed pipe. FIG. 7 illustrates disk redundancy. From active node 140, when an application does an update I/O, the request is sent down through the layers (i.e., IOS 146, Part. Select. 148, DOS 150) until it reaches the redundancy agent 152. Redundancy agent 152 sends the request down to the block device driver 156 as normal, but also sends a command to the high-speed pipe (shown in broken line) to instruct inactive node 144 to do the update I/O as well. When inactive node 144 receives the command, it writes it to the redundant partitions. Inactive node 144 has its own I/O as well. Certain applications run on the inactive node and do all I/O to the independent partition. Partition manager 154 is the layer that routes incoming I/O to the appropriate physical partition (e.g., Protected Partition 180, Unprotected Partition 182, or Independent Partition 184).

In split mode, the system behaves a little differently. In this scenario, there are two active nodes, each totally independent. When the nodes are in split mode, they both only use their individual protected and unprotected partitions. The independent partition is not used. The redundancy agent does not propagate update I/O because the system is in split mode.
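The redundancy agent's behavior across the two paragraphs above can be sketched as a single write path: every write goes to the local block driver, and it is mirrored over the high-speed pipe only when the system is in redundant mode (never in split mode) and the target partition is one of the mirrored ones. The hooks below are hypothetical stand-ins for the driver layers of FIG. 7.

```python
# Sketch of the redundancy agent's write path (FIG. 7). local_write and
# hsp_send are hypothetical callables standing in for the block device driver
# and the high-speed pipe command, respectively.

MIRRORED = {"protected", "unprotected"}  # the independent partition stays local

def redundant_write(partition, data, local_write, hsp_send, mode):
    local_write(partition, data)            # always write locally, as normal
    if mode == "redundant" and partition in MIRRORED:
        hsp_send(partition, data)           # inactive node repeats the update I/O
```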

A multiple partitioning scheme is implemented that provides three separate partitions on the same hard disk. It can set up the disk with standard MS-DOS format partitions. In the CPP model, the three partitions are set up according to the following table:

Side     Partition requested     Partition used
Master   /p                      C
Master   /u                      D
Slave    /p                      C
Slave    /u                      E
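The table above reduces to a lookup from (side, requested partition) to the drive actually used. A minimal sketch of how the CPP model's mapping might be resolved in code:

```python
# The CPP partition table as a lookup: (side, requested partition) -> drive.

PARTITION_MAP = {
    ("master", "/p"): "C",
    ("master", "/u"): "D",
    ("slave", "/p"): "C",
    ("slave", "/u"): "E",
}

def resolve_partition(side, requested):
    return PARTITION_MAP[(side, requested)]
```

Note that both sides share drive C for the protected partition, while each side's unprotected partition maps to its own drive.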

The partitioning scheme also allows an application to retrieve the partition information from the disk, including the number of partitions, the size of each, where they begin and end, and the type of file system located on each. The only users of disk redundancy are other software modules.

Initialization of disk redundancy 290 is typically performed before disk I/O is needed, and after communication manager 302, because it depends on the high speed pipe. In one embodiment, it is instead initialized before the communication manager and lies dormant until a disk sync message is received.

The disk redundancy system consistent with the principles of the present invention has several advantages, including: providing high data availability with low performance overhead, keeping the cost low by providing a design that does not depend on customized hardware, and minimizing modifications to the existing software by providing application-transparent disk redundancy software.

In one embodiment, disk redundancy is built on an IDE controller which comes with the CP card. As a result, no additional cost is required and no change is necessary to the existing hardware.

Data mirroring between the two disks is achieved via messaging over a high speed pipe. This same high speed pipe is used for protected memory mirroring, heartbeat, and switchover. There are several advantages of this approach, including: (1) two CPU nodes can be physically located at much farther distances, and (2) combined with the loosely-coupled CPU redundancy strategy, the redundant system will become a message-based system, which is advantageous for moving toward distributed computing.

In one embodiment, disk redundancy software is built on top of the disk device driver and under the DOS file system. This approach provides simplicity. The software is self-contained, independent of the file system, and there is no need to manage file system states during switchover, split and join operations.

Disk redundancy software in each side of the redundant system reports faults to the fault handling module of its own side. The disk redundancy software itself does not exchange fault information between the two sides. One advantage of this approach is that the general fault handling mechanism is the same for both simplex and duplex systems, leaving redundancy-specific decisions to the logical call server. This also has the advantage that the logical call server module is the only module that has the information of both sides.

To effectively handle the differences in traffic requirements between disk redundancy, stop and copy, and the heartbeat/checkpointing facilities, in one embodiment three separate channels are implemented across the high speed pipe.

IP management 300 is responsible for managing the IP address for the call server. The main purpose of having an IP network in the switch is to provide a connection to an external management system such as craft access or other external applications. The IP management module 300 has several features. IP management 300 provides a virtual IP interface. IP management 300 manages IP addresses so that the call processor system can be addressed using a single IP address, even though each redundant side has its own IP address. IP management 300 on each side communicates information regarding IP addresses so that an external entity, such as craft access, can use a single IP address, and IP management ensures that the information is appropriately routed to the side that is active at the time.

To facilitate communication with external systems running on a LAN, the present invention is equipped with two Ethernet ports, which are connected to the LAN. Thus, certain IP management functionalities are provided. Since both nodes are running, two internal IP addresses allow external systems to talk to both nodes, even though the system as a whole is addressed using a single IP address. An IP address is not assigned to a physical node (node 0 or node 1); the internal IP address for addressing a node is determined by whether that node is active or inactive. The active-IP-address is used by the active side and the inactive-IP-address is used by the inactive side. This approach enables external systems to maintain a connection to the active node during a switchover.
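The role-based addressing rule can be shown in a few lines: a node's internal address follows its role, not its physical identity, so after a switchover the addresses swap with the roles. The addresses below are made-up examples for the sketch.

```python
# Sketch of role-based addressing: the address follows the role, not the node.
# The two addresses are illustrative placeholders.

ACTIVE_IP = "10.0.0.1"    # always reaches whichever node is currently active
INACTIVE_IP = "10.0.0.2"  # always reaches the standby node

def node_ip(role):
    return ACTIVE_IP if role == "active" else INACTIVE_IP
```

External systems only ever use the active address, which is why their connections survive a switchover.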

Gratuitous address resolution protocol (ARP) handles run-time IP address changes. When a host's IP address is changed, it broadcasts its IP address and its MAC address to the network so that other nodes on the network can continue to communicate with it. In the present invention, gratuitous ARP is used during a graceful switchover. Graceful switchover copies the entire memory image from the old active side to the new active side. As a result, the new active side gets the active-IP-address but still has its existing MAC address. To announce this change, the new active side sends a gratuitous ARP as one of its post-switchover actions. The gratuitous approach is useful in a multi-node architecture. During post-switchover, the new inactive side sets its IP address to the inactive-IP-address by calling a function, such as VxWorks' ifAddrSet(), which performs a gratuitous ARP.

The new active side goes through a warm restart to set its IP address to the active-IP-address. DRAM 60 is divided into three areas: protected, unprotected, and persistent. In conventional systems, a restart erases the contents of DRAM 60. In the system shown in FIG. 3, however, part of DRAM is preserved upon restart. This allows a warm restart of the system. During normal operation, when a transaction occurs that changes DRAM 60, the same data is also transmitted over PCI bus 0 (74), S. bridge 88, and PCI bus 1 (104), and over HSP GNIC 116 to the corresponding DRAM in the other redundant system.

FIG. 8 is a state diagram illustrating the redundancy states that call processor systems consistent with the invention go through. In a redundant state 210, there are two states, true redundant 214 and single redundant ready 216. The call processor system moves between true redundant 214 and single redundant ready 216 when one side is taken down or brought up.

In true redundant state 214, switchover causes a state change between redundant/active and redundant/inactive (R/A, R/I) 222 on one side and, correspondingly, redundant/inactive, redundant/active (R/I, R/A) 224 on the other side. Thus, when one side is in R/A, the other side is in R/I, and vice versa. In single redundant ready state 216, the sides are in R/A, down 226 or down, R/A 228.

Split mode is used when it is desirable to break redundancy in a controlled way. Split is used when one side needs to be brought down, such as for maintenance. In split state 212, the call processor system moves between true split 218 and single redundant not ready 220 when one side is taken down or brought up. In the true split state, cutover causes one side to move between split/active and split/inactive (S/A, S/I) 230 and the other side to move, correspondingly, between split/inactive and split/active (S/I, S/A) 232.

In single redundant not ready state 220, the call processor system exhibits the states S/A, down 234; down, S/I 236; S/I, down 238; and down, S/A 240.
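The true-redundant portion of the FIG. 8 state diagram can be sketched as a small state machine. Only the switchover transition between the two true redundant side-pairs is modeled; the enum and function names are illustrative, not from the patent.

```python
from enum import Enum

class RedundantState(Enum):
    """Side-pair states within true redundant state 214 of FIG. 8."""
    RA_RI = ("R/A", "R/I")  # node 0 active, node 1 inactive (222)
    RI_RA = ("R/I", "R/A")  # node 0 inactive, node 1 active (224)

def switchover(state: RedundantState) -> RedundantState:
    """In true redundant state, a switchover swaps which side is
    active: when one side is in R/A, the other is in R/I, and
    vice versa."""
    if state is RedundantState.RA_RI:
        return RedundantState.RI_RA
    return RedundantState.RA_RI
```

A second switchover returns the pair to its original configuration, matching the back-and-forth arrows between 222 and 224 in the figure.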

FIG. 9 shows the states the two nodes move through when going from active to inactive or from inactive to active. These are the states traversed when LCS 286 initiates a graceful switchover for minimum impact. When node 0 goes from active to inactive and node 1 goes from inactive to active, node 0 moves through steps 1-5 in the left block of the figure, and node 1 moves through steps 5-10 in the right block.

Node 0: (1) take semaphores for Pdata and file, (2) shut down CNI, (3) intLock, (4) save processor state, and (5) copy. Node 1 then (5) receives, (6) restores the processor state, (7) intUnlock, (8) sets Active_IP, (9) enables CNI (H/W), and (10) gives the semaphores. Thus, node 1 essentially reverses back out of the states node 0 went through.

Node 1: (1) suspend H/B task, (2) close files, (3) close interface, (4) intLock, (5) copy, (6) intUnlock, (7) kill reg tasks, (8) LCS: change to inactive, (9) start OS testing, and (10) set Inactive_IP.
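The mirror-image relationship between node 0's teardown and node 1's bring-up can be checked mechanically. This sketch encodes the two step lists above as data (the identifiers paraphrase the figure's labels) and verifies that the new active side undoes the old active side's steps in reverse order.

```python
# Node 0's teardown steps (1-4) and node 1's bring-up steps (6-10, after
# the copy), paraphrased from FIG. 9. Identifiers are illustrative.
TEARDOWN = ["take_semaphores", "shutdown_CNI", "int_lock", "save_state"]
BRINGUP = ["restore_state", "int_unlock", "enable_CNI", "give_semaphores"]

# Each teardown action and its undoing counterpart.
UNDO = {
    "take_semaphores": "give_semaphores",
    "shutdown_CNI": "enable_CNI",
    "int_lock": "int_unlock",
    "save_state": "restore_state",
}

def reverses(teardown: list, bringup: list) -> bool:
    """True if `bringup` undoes `teardown` in reverse order, i.e. the
    new active side backs out of the states the old side went through."""
    return [UNDO[step] for step in reversed(teardown)] == bringup
```

The last-taken action (saving the processor state) is the first one undone (restoring it), and the first-taken action (taking the semaphores) is the last one undone, which is why node 1 resumes in exactly the state node 0 left.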

For node 0, step 1 involves taking semaphores for Pdata and file access, so they are locked. This prevents tasks other than LCS 286 from performing these functions until the semaphores are released. Step 2 involves shutting down the core-to-network interface, that is, shutting down the interfaces into the TDM network. In step 3, the interrupts are locked, effectively stopping the OS from running. In this state, below the horizontal line in the left block of the figure, the OS is disabled. The high-speed pipe, though, is able to transfer the necessary data to the other node without assistance from the OS. In step 4, the context of the processor is saved. The processor in node 0 has an instruction that saves all registers, stacks, and other processor memory content to a particular memory area. The high-speed pipe then transfers this data in step 5.

Continuing with step 6 in the right block, node 1 restores the processor state, unlocks interrupts, sets the active IP address, enables CNI, and gives the semaphores. Giving the semaphores allows the other tasks in node 1 to begin operating. Thus, node 1 becomes active in the identical state that node 0 was in when node 0 went inactive.

For node 0, the interrupts are unlocked in step 6, the regular tasks are killed in step 7, LCS is changed to inactive in step 8, OS testing is started in step 9, and the IP address is set to inactive in step 10.

For node 1, going from inactive to active, the heartbeat task is suspended in step 1, files are closed in step 2, the interface is closed in step 3, interrupts are locked in step 4, and the processor state is then copied in step 5.