Skip to content

Async What is that and how to help fixing it?

Alexander Grund edited this page Nov 19, 2017 · 9 revisions

WORK IN PROGRESS

Asyncs

What is that?

In order to understand what an Async is and why it poses a problem for the game you have to understand how the game works: Each player runs the simulation of the game at their local computer and makes inputs like building a road or attack another player. If you play alone, everything is fine, but if you play with other players their computers also have to know that, for example, you build a road on your computer. As it would be too much data to transfer if the whole state of the game would be constantly transmitted, only the given commands are exchanged between each client. Everything else can be computed from these inputs. This only works if each client obtains the same result for the same inputs. Due to programming errors this might not be the case. This means that two players see different results of the same action. For example, you enable production for a building and due to a race condition on player As computer the donkey picks up the ware and for player Bs computer the carrier picks it up. In the long run this will lead to different behaviour on both PCs e.g. if the carrier is then required elsewhere and on 1 PC he is available and on the other he is not. Think of the wings of the butterfly that cause a storm on the other end of the earth.

How to help fixing it?

If you encounter an async state in the game you will be immediately notified and the game is stopped. Each player will now have a savegame async_<timestamp>.sav and a log async_<timestamp>Player.log in his SAVE and LOGS folders. Additionally the server will have an async_<timestamp>Server.log.
What you can do now

  • [Optionally] Try to reproduce with minimal (better: no) interaction required, e.g. by starting from an earlier save and redoing the same actions or not doing anything at all to see if this changes anything. Any bit of information helps us to fix this faster!
  • Gather async logs, savegame and replay at least from server, better from all players (especially the replays!)
  • Report with as much information as possible (version, OS, actions, ...)

How to fix it myself?

  • look at async log, find obj id
  • add code to observe behaviour of object id
  • load game, reproduce async add more code to backtrack source

Example of finding cause for Async #6

The following gives an impression of how to find an async. Of course there are many ways and each case may need some special steps, but this might help as a starting point. It general it may be easier/more convenient to use a debugger like gdb or the Visual Studio debugger.

Reproduce situation

I started two instances of the game running on Windows 7 x64, hosted the savegame and joined with the other client. As it was claimed that the game went async after 30-90 seconds I waited a couple of minutes, but nothing happened. Next I tried to recreate the situation better by using a Linux (inside a VM) and a Windows client. Without interaction the game went async several thousand gameframes after loading the savegame. Reproducible - that is important for the following steps.

Using the async log

For the next step I looked at the created async logs. That log contains calls and results to the pseudo random number generator used by the game. Each clients seeds the pseudo random number generator with the same seed, so it should always return the same 'random' number for each client. All differences are programming bugs. The log begins with the last call that was the identical for two clients.
All the log files I created by loading the savegame several times showed the same lines in the beginning, with the same IDs:

[I]: 14807:R(6)=0,z=1977 | /home/user/s25client/src/noAnimal.cpp Z: 270|id=1193327
[S]: 14808:R(2)=0,z=4992 | /home/user/s25client/src/RoadSegment.cpp Z: 323|id=765424
[C]: 14808:R(2)=0,z=4992 | ..\..\src\RoadSegment.cpp Z: 323|id=765938

The line starting with [I] represents the last identical call to the pseudo random number generator. The second and third line are showing the difference between two clients: The host [S] called the random generator from line 323 of file RoadSegment.cpp with an object with the object ID 765424 while the client [C] called the same method, but for a different object ID (765938). In short: At some point in the game the host used a road A, while the client used road B.

Tracking the Problem through the Code

Start: The Road Segment

To further track down the issue, I looked at the RoadSegment.cpp. Line 323 lies within in the RoadSegment::AddWareJob method, which is called when a ware arrives at a flag and needs to be transported by a carrier. The random call in this method is used to decide whether a donkey or the human carrier should be used to carry the ware (Yeah, you just could prefer one of them without pseudo random, but that's not the issue here). But why are there two different road segments used?

I added some debug code to output the location of the road segments:

if (this->obj_id == 765424 || this->obj_id == 765938)  
{   
	std::cout << "ID " << this->obj_id << ": " << std::endl;   
	std::cout << "Pos F1 " << this->f1->GetX() << "/" << this->f1->GetY() << std::endl;   
	std::cout << "Pos F2 " << this->f2->GetX() << "/" << this->f2->GetY() << std::endl; 
}

It turns out that the two road segments are in fact next to each other. In the next run I observed the two roadsegments in both clients and was able to see the problem visually: In one client a bread took the one road and in the other client the other road. Why?

Next: The Flag and the Bread

With the location known from the previous output I added another debug output, this time in noFlag.cpp to the noFlag::AddWare method. This method is called when a carrier (or anyone else) drops a ware at a flag.

if (x == 60 && y == 117)
{
	if (ware->type == GD_BREAD)
	{
		std::cout << "bread id: " << ware->GetObjId() << std::endl;
	}
}

In the next run I was able to see the object ID of the 'evil' bread (actually I saw two IDs, so I tracked both). Probably the target location of the bread was already determined after the creation of the bread. With that in mind I added another output to the constructor of a ware in Ware.cpp to find out when & where the bread was created:

if (obj_id == 1615239 || obj_id == 1619488)
{
	std::cout << "bread ID " << this->obj_id << ": " << std::endl;
	std::cout << "created at " << location->GetX() << "/" << location->GetY() << std::endl;
}

Now I knew where the bread was created: At a not-too-far-away bakery. Next I searched the code for the spot where the destination of a procuded bread is determined. I found the method GameClientPlayer::FindClientForWare in which for a given ware the best goal is searched based on the ingame distribution settings, distance and current requirements of each building.

Final Approach

With two more runs and some debug output I realized that the same bread was sent to different mines in the two clients. I added more debug output to see the actual 'need' of each building that requests the two breads and learned that both mines had the same need and the same distance to the bakery. Huh, I wondered how the algorithm decides which consumer to prefer in that kind of situation...

bool operator<(const ClientForWare b) const
{
	if (estimate == b.estimate)
	{
		return(points > b.points);
	}
	return(estimate > b.estimate);
}

This is bad. If estimate (distance) and points (need for ware) are the same for two buildings the outcome is decided by the order of calls. And this order can be different for different implementations of a sorting algorithm. So I added a tiebreaker based on the location of the buildings, which can never be located at the same absolute position.

bool operator<(const ClientForWare b) const
{
	if (estimate != b.estimate)
        {
		return(estimate > b.estimate);
        }
	else
	{
		if (points != b.points)
		{
			return(points > b.points);
		}
		else
		{
			if (bb->GetX() != b.bb->GetX())
				return bb->GetX() < b.bb->GetX();
			else
				return bb->GetY() < b.bb->GetY();
		}
	}
}

Problem solved. :)