Frege’s Concept Horse Paradox in the Simply-Typed λ-calculus

In “On Function and Concept,” Gottlob Frege makes a distinction between concepts and objects. Specifically, he claims that objects are saturated object-expressions while concepts are unsaturated function-expressions. My goal is to illuminate what the object-concept distinction entails in the context of simply-typed lambda calculus $(\lambda^{\rightarrow})$ and whether or not this entailment is susceptible to problems like the $\mathtt{\text{concept horse paradox}}$.

Introduction

Frege’s “On Function and Concept” (1892) claims that there is a substantive difference between concepts and objects. This distinction is described on pg. 147:

Statements in general, just like equations or inequalities or expressions in Analysis, can be imagined to be split up into two parts; one complete in itself, and the other in need of supplementation, or ‘unsaturated’. Thus, e.g., we split up the sentence ‘Caesar conquered Gaul’ into ‘Caesar’ and ‘conquered Gaul’.

Definition 1. Complete parts of statements are ‘saturated’ while incomplete ones are ‘unsaturated’.

Example 1. Consider the following statement:

$$\text{‘Fido jumped over the fence’}$$

By Definition 1, we can see that (1) is split up into the saturated ‘Fido’ object-expression and ‘jumped over the fence’ unsaturated function-expression.

Remark 1. These two categories, as it turns out, have simple analogues in $\lambda^{\rightarrow}$. Unsaturated function-expressions are terms with functional types, e.g. $N^{\sigma\rightarrow\tau}$ or $N:\sigma\rightarrow\tau$, while saturated object-expressions are terms with non-functional types, e.g. $M^{\sigma}$ or $M:\sigma$.

Definition 2. We say that if $M$ gets type $\sigma$ and $N$ gets type $\sigma\rightarrow\tau$ then the application $NM$ is legal (as $N$ is considered a function from terms of type $\sigma$ to terms of type $\tau$) and gets type $\tau$.

Definition 3. The set of types of $\lambda^{\rightarrow}$, denoted by Type($\lambda^{\rightarrow}$) is inductively defined as follows. We write $\mathbb{T}=\text{Type}(\lambda^{\rightarrow})$ where

                $\begin{array}{ll}\alpha,\alpha^{\prime},\alpha^{\prime\prime},\ldots\in\mathbb{T} & \text{(type variables);}\\\sigma,\tau\in\mathbb{T}\Rightarrow(\sigma\rightarrow\tau)\in\mathbb{T} &\text{(function space variables).}\end{array}$

Definition 4. We will now define a new variant of $\mathbb{T}$ that more closely models Frege’s natural language notions of concept and object. This new variant, $\mathbb{T_{\mathcal{F}}}$, has the following restrictions:

  1. There are three and only three types: $\begin{cases}\alpha\in\mathbb{T_{\mathcal{F}}} & \text{(}objects\text{);}\\\mathbf{\text{H}}\in\mathbb{T_{\mathcal{F}}} & \text{(truth-values);}\\(\alpha\rightarrow\mathbf{\text{H}})\in\mathbb{T_{\mathcal{F}}} & \text{(}concepts\text{).}\end{cases}$
  2. Objects have an unrestricted domain.
  3. A truth-value is a boolean type where $\mathbf{\text{H}}=\{\text{‘true’},\text{‘false’}\}$.
  4. A concept is a functional type that takes an object as an input and returns a truth value. This mapping is done by some valuation function $v$ where $v$ maps to $\text{‘true’}$ if the function-expression applied to the argument(s) is true, e.g., in (1), if Fido did jump over a fence then $v(\text{‘Fido’})=\text{‘true’}$; otherwise, $v$ maps to $\text{‘false’}$.
  5. Note that $(\mathbf{\text{H}}\rightarrow\alpha)\notin\mathbb{T_{\mathcal{F}}}$.

Remark 2. Until otherwise noted, we will work only with types in $\mathbb{T_{\mathcal{F}}}$.

Definition 5. Let $M$ be an untyped $\lambda^{\rightarrow}$-term. Given the non-functional type $\alpha$, we say that $M:\alpha$ is a saturated object-expression, or simply an object.

Definition 6. Let $N$ be an untyped $\lambda^{\rightarrow}$-term. Given the functional type $\alpha\rightarrow\mathbf{\text{H}}$, we say that $N:\alpha\rightarrow\mathbf{\text{H}}$ is an unsaturated function-expression, or simply a concept.

Definition 7. We will now make a semantic distinction between two kinds of uses of the token ‘is’ in natural language. To illustrate the distinction, consider:

  1. ‘Mark Twain is Samuel Clemens’: here, we have the ‘is’ of identity, or $x=x$. The equality operator $=$ is used as a mathematical formalism and may not necessarily be computable in $\lambda^{\rightarrow}$. So, $v(\text{‘Mark Twain’})=\text{‘true’}$ iff $\text{‘Mark Twain’}=\text{‘Samuel Clemens’}$.
  2. ‘Mark Twain is dead’: here, we have the ‘is’ of predication which, if combined with some property $P$ will yield a functional type of form $\alpha\rightarrow\mathbf{\text{H}}$ for some object $\alpha$. As in Definition 4, if $\alpha$ has property $P$, $v(\alpha)=\text{‘true’}$; otherwise, $v$ maps to $\text{‘false’}$.

Simple Natural Language Sentences in $\lambda^{\rightarrow}$

Lemma 1. Consider the same sentence from Example 1:

$$\text{‘Fido jumped over the fence’}$$

This sentence can be formulated in $\lambda^{\rightarrow}$ like so:

$$\Gamma\vdash J:\alpha\rightarrow\mathbf{\text{H}},\,\Gamma\vdash F:\alpha\Rightarrow\Gamma\vdash(JF):\mathbf{\text{H}}$$

Proof. We will set the object-expression ‘Fido’ $=F$ and the (one-place) function-expression ‘$\left(x\right)$ jumped over the fence’ $=J\left(x\right)$. Note that we will drop the $\left(x\right)$ in practice for $\lambda$-calculus wff satisfiability. By Definition 5, $F$ must be of type $\alpha$. And to make $(JF)$ legal, by Definition 2, $J$ must be of type $\alpha\rightarrow\mathbf{\text{H}}$. Therefore, a correctly typed $(JF)$ from $J:\alpha\rightarrow\mathbf{\text{H}}$ and $F:\alpha$ looks like
$$\left(JF\right):\mathbf{\text{H}}$$
which is precisely what we wanted to show in (3). $\square$

Proof. We can also derive a more formally-rigorous proof. Consider:
$$\dfrac{\Gamma\vdash J:\alpha\rightarrow\mathbf{\text{H}}\,\,\,\,\,\,\,\Gamma\vdash F:\alpha}{\Gamma\vdash JF:\mathbf{\text{H}}}{\scriptscriptstyle (\rightarrow e)}$$

where we get the same result: $JF$ is of type $\mathbf{\text{H}}$. $\square$

Remark 3. Natural language predicates (e.g. ‘is blue’, ‘jumped over the fence’, ‘is a concept’, ‘conquered Gaul’, ‘may have stolen my wallet’, etc.) are concepts in $\mathbb{T_{\mathcal{F}}}$ while subjects, be they proper nouns, improper nouns, pronouns, etc. (e.g. ‘the car’, ‘Fido’, ‘he’, ‘the tall man’, ‘the Brooklyn Bridge’, etc.) are objects. We will now look at more nontrivial examples of natural language propositions and attempt to model them in $\lambda^{\rightarrow}$ under $\mathbb{T_{\mathcal{F}}}$.

Example 2. Consider the statement

$$\text{‘The concept horse is a concept.’}$$

Claim 1. We will set the object-expression ‘The concept horse’ $=T_{ch}$ and the (one-place) function-expression ‘$\left(x\right)$ is a concept’ $=C$. Example 2 is witness to a semantic ambiguity between natural language and our type system ($\mathbb{T_{\mathcal{F}}}$). More specifically, (6) evaluates to ‘true’ in natural language, but ‘false’ under $\mathbb{T_{\mathcal{F}}}$, i.e. $CT_{ch}:\mathbf{\text{H}}$ has an ambiguous truth-value. So, if $T_{ch}:\alpha$ is the saturated object-expression ‘the concept horse’ and $C:\alpha\rightarrow\mathbf{\text{H}}$ is the unsaturated function-expression ‘is a concept’, then the mapping $v:\alpha\rightarrow\mathbf{\text{H}}$ is ambiguous.

Proof. Consider the following natural language statement

$$\text{‘The house is a house.’}$$

Trivially, (7) is true. Per Definition 7, the ‘is’ here seems to be an ‘is’ of identity, so $v(\text{‘house’})=\text{‘true’}$ since $\text{‘house’}=\text{‘house’}$.

Now, let’s define a new set of types $\mathbb{T_{\mathcal{F}}^{\mathrm{*}}}$ where $\mathbb{T_{\mathcal{F}}^{\mathrm{*}}}$ is just like $\mathbb{T_{\mathcal{F}}}$ except that $\mathbb{T_{\mathcal{F}}^{\mathrm{*}}}$ has one and only one additional type: $\beta\in\mathbb{T_{\mathcal{F}}^{\mathrm{*}}}$. We will call all terms of type $\beta$ houses. Now, we have an ambiguity in (7); it is unclear whether we’re asking if the identity $\text{‘house’}=\text{‘house’}$ holds or whether we’re asking if the saturated object-expression ‘the house’ is of type $\beta$. The former is true per the first half of this proof. The latter is false, as any object-expression is of type $\alpha$ in $\mathbb{T_{\mathcal{F}}}$ and, by extension, also in $\mathbb{T_{\mathcal{F}}^{\mathrm{*}}}$ and $\alpha\neq\beta$. This same kind of ambiguity happens in (7). $\square$

Definition 8. Given the proof of Claim 1, we will now denote two variations of $v$, essentially defining two valuations for $\alpha\rightarrow\mathbf{\text{H}}$:

  1. $v_{0}$ for meta-language predicates;
  2. $v_{\mathfrak{L}}$ for natural language predicates.

In light of this, we now have two interpretations for statements that involve objects, concepts, and truth-values. So, in some cases, $v_{0}(CT_{ch}:\mathbf{\text{H}})$ will evaluate to ‘false’, but $v_{\mathfrak{L}}(CT_{ch}:\mathbf{\text{H}})$ evaluates to ‘true’ or vice-versa. This is known as the $\mathtt{\text{concept horse paradox}}$.

Theorem 1. Given any object-expression, $\mathbb{O}:\alpha$, and if we let the function-expression ‘$\left(x\right)$ is a concept‘ $=C$, $v_{0}(C\mathbb{O}:\mathbf{\text{H}})$ will always evaluate to ‘false’.

Proof. From the proof of Claim 1 and Definition 8. $\square$

Example 3. Now consider the statement:

$$\text{‘Seabiscuit is the concept horse.’}$$

Claim 2. We will set the object-expression ‘Seabiscuit’ $=S$ and the (one-place) function-expression ‘$\left(x\right)$ is the concept horse’ $=C_{h}$. Example 3 avoids the ambiguity in (6) because its truth value happens to be false under both valuations, so $v_{0}(C_{h}S:\mathbf{\text{H}})=v_{\mathfrak{L}}(C_{h}S:\mathbf{\text{H}})$.

Proof. By Theorem 1, $v_{0}(C_{h}S:\mathbf{\text{H}})$, will always evaluate to ‘false’. In natural language, ‘Seabiscuit’ is not ‘the concept horse’ (at best, it is an instance of ‘the concept horse’), so $v_{\mathfrak{L}}(C_{h}S:\mathbf{\text{H}})$ will evaluate to ‘false’ here as well. Since both valuations return ‘false’, we avoid the paradox. $\square$

Example 4. Consider the statement:

$$\text{‘Anna is something Bill is not — namely, a student.’}$$

Remark 4. Example 4 is not ambiguous as it only uses terms found in natural language. This is trivial to prove (since the only valid valuation is $v_{\mathfrak{L}}$) However, (9) introduces another problem with $\mathbb{T_{\mathcal{F}}}$ (more specifically, with $\lambda^{\rightarrow}$): subtyping. It seems intuitive that ‘being a student’ should be a member of ‘things Bill is not’. As it turns out, $\lambda^{\rightarrow}$ has no way of representing this.

Proposed Solution: Semantic Culling

The most straightforward solution to the semantic problem of the $\mathtt{\text{concept horse paradox}}$ is a simple one. Instead of using terms like concept and object, that not only have a meaning in the meta-language, but also in natural languages, we will use terms that have no meaning in natural language. Consider a new type system, $T_{\mathcal{F}}^{\mathrm{**}}$.

The types of $\mathbb{T_{\mathcal{F}}^{\mathrm{*}}}$: $\begin{cases}\alpha\in\mathbb{T_{\mathcal{F}}} & \text{(}foo\text{);}\\\mathbf{\text{H}}\in\mathbb{T_{\mathcal{F}}} & \text{(baz);}\\(\alpha\rightarrow\mathbf{\text{H}})\in\mathbb{T_{\mathcal{F}}} & \text{(}foobaz\text{).}\end{cases}$

Under such a schema, statements like ‘The concept horse is [a] foobaz’ are meaningless in natural language, but are semantically well-formed in the meta-language. Thus, we can infer that we need to valuate the statement with $v_{0}$ as opposed to $v_{\mathfrak{L}}$ and avoid any ambiguity.

Remark 5. The caveat here is that we lose some expressiveness in natural languages. In Example 3, we were able to refer both to the semantics of natural language and the meta-language without ambiguity. This would no longer be possible.

Proposed Solution: Syntactic Sugar

Another proposed solution is syntactic in nature. We previously showed that ‘Fido jumped over the fence’ (Example 1) can be formulated as follows:

$$\dfrac{\Gamma\vdash J:\alpha\rightarrow\mathbf{\text{H}}\,\,\,\,\,\,\,\Gamma\vdash F:\alpha}{\Gamma\vdash JF:\mathbf{\text{H}}}{\scriptscriptstyle (\rightarrow e)}$$

We will now introduce a new type of $\lambda^{\rightarrow}$-term that will differentiate between what valuation $v$ one should use to determine the truth-value of some $M:\mathbf{\text{H}}$. To do this, consider a new type system $\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}}$.

The types of $\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}}$: $\begin{cases} \alpha\in\mathbb{\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}}} & \text{(}objects\text{);}\\\mathbf{\text{H}}\in\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}} & \text{(truth-values);}\\(\alpha\rightarrow\omega\rightarrow\mathbf{\text{H}})\in\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}} & \text{(unrestricted }concepts\text{);}\\(\omega\rightarrow\mathbf{\text{H}})\in\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}} & \text{(restricted }concepts\text{);} \\\omega\in\mathbb{T_{\mathcal{F}}^{\mathrm{\omega}}} & \text{(}\textbf{worlds}\text{).} \end{cases}$

A derivation of (10) would now look like the following:

$$\dfrac{\dfrac{\Gamma\vdash J:\alpha\rightarrow\omega\rightarrow\mathbf{\text{H}}\,\,\,\,\,\,\,\Gamma\vdash F:\alpha\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,}{\Gamma\vdash JF:\omega\rightarrow\mathbf{\text{H}}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\Gamma\vdash W:\omega}{\scriptscriptstyle (\rightarrow e)}}{\Gamma\vdash JFW:\mathbf{\text{H}}}{\scriptscriptstyle (\rightarrow e)}$$

The purpose of $W$ is to pick out one (or several, if there are no ambiguities) valuating function(s). So, in Example 1, $W$ picks out $v_{\mathfrak{L}}$, in Example 3, $W$ picks out $\{v_{\mathfrak{L}},v_{0}\}$, and in Example 2, $W$ either picks out $v_{\mathfrak{L}}$ or $v_{0}$ but not both. A benefit of $W:\omega$ is that it can pick out the meta-language ($v_{0}$), meta-meta-language ($v_{1}$), meta-meta-meta-langauge ($v_{2}$), etc. ($v_{n}$). Furthermore, $W$ can also pick out combinations of languages ($\{v_{\mathfrak{L}},v_{0},v_{2},v_{n}\}$), provided the truth value stays consistent.

Remark 6. A caveat of this syntactic addition is that $W$ tends to be implicit and thus, the $\mathtt{\text{concept horse paradox}}$ merely shifts (now we have an ambiguity of $W$) and does not completely disappear. A rather elegant fix is letting $W$ pick out $v_{\mathfrak{L}}$ by default. In such a case, the paradox would dissolve and we would still preserve the truth-value in sentences like Example 3.

Finishing what Intel started — Building an Arduino-powered anti-cheat

In 2007, during Research@Intel Day, an interesting tech demo took place; but in spite of the project being featured prominently in the media, users throwing money at their screens, and the concept being genuinely fascinating, it never came to fruition. Maybe it was the right idea at the wrong time. Maybe Intel didn’t pour enough man-power into it. Maybe it was the economy on the down-swing. Who knows. But it unquestionably fizzled out of existence…
schluessler-Resized
What was it? A hardware-based anti-cheat system dubbed “Fair Online Gaming” developed by Intel engineers Travis Schluessler and Daniel Pohl, pictured to the right. At the time, I only found out about it because I was working in the same domain: the Cyberathlete Amateur League had begun prototyping their client-side anti-cheat and I helped develop the first incarnation of the CAL-AC alongside Wim Barelds (one of the creators of zBlock) and Andres “misty” Gunaratne (now with Fantasy Esports). The CPL was acquired by WoLong Ventures and CAL went along with it. Professional and casual gamers alike left CAL for ESEA, CEVO, and the ESL. After a not-too-shabby career, I decided to quit gaming and go back to school. The story doesn’t end there, but let’s rewind a little. How does an anti-cheat like Intel’s FOG work, and, more importantly, why do we need a hardware anti-cheat?

Software Sucks

It’s not fair to accuse Intel of completely dropping the ball here. After all, FOG was, in some ways, a vision of the future that got many things right (and a few wrong, but more on that later). First of all, it was a hardware-based anti-cheat solution. Software ACs rely on a few methods of detecting whether or not a player is cheating and most of these methods are, unfortunately, quite easily beatable. Let’s look at a couple.

Signature-based detection

Signature-based detection (SBD) is a relatively simple method of virus or cheat detection. SBD begins when a change is noticed either on (a) the file system (e.g. a new file is downloaded from the internet) or (b) in user-space memory. Suppose we have a large database of known cheats. This database consists of some identifiable “fingerprint” that the cheat possesses when either on disk or in memory. For example, given the following database:

Cheat name Fingerprint
SuperWallhack 0xAH 0x44 0x32
Aimbot2000 0x00 0x04 0xFF
...

And given the following memory layout (note the highlighted sections):

...
0x33
0xF0
0x00
0x04
0xFF
0x58
0x33
0xF0
0x58
0x47
...

We can conclude that the player is aimbotting. To do this search, variations of Boyer-Moore are used; often times in conjunction with suffix trees or suffix arrays. As long as the fingerprint is long- and unique-enough, collisions should be a rare happenstance.

The drawback of SBD is, unsurprisingly, that it requires a database. For a virus to be detected, it must have already been detected (or, at the very least, isolated) in the past. This is not only problematic for anti-virus software, but exponentially more problematic for anti-cheat software. Consider a private cheat that is sold through various grey markets online. The “benefit” of a virus is that it behaves unruly: no computer user wants a virus on their PC. This means that whenever a PC is infected with an “under the radar” virus, it quickly becomes quarantined and recorded in an AV database. This is not the case with cheats. Players that want to cheat have an interest in keeping their cheats private (lest they get caught). Therefore, it is virtually impossible for any SBD-like methods to catch them.

Furthermore, SBD has another drawback. Consider cheats or viruses that are non-static in nature; that is, use self-modifying code (e.g. are polymorphic or metamorphic in nature), or download encrypted parts of their own source code over the internet. Catching viruses with SBD methods might be a first step, but as far as cheats are concerned, the method seems untenable from the get-go.

So, in conclusion:

  • False positives? None, given a large-enough key. Collisions should realistically never happen.
  • False negatives? Many. Almost all purchasable cheats will not be caught as they update fairly regularly.
  • How to beat? If you’re writing a cheat yourself, don’t release the source code. If you do release the code, make it polymorphic or metamorphic.

Monitoring API Calls

Monitoring API calls is a more heuristic approach used by anti-cheat software. A cheat may hook into, say, a rendering function with code that looks like the following (this code uses the Microsoft Detours library and assumes a double F(double) function signature):

#define ADDRESS 0xDEADB33F					// this is the address where our targeted function begins
double (__cdecl* originalFunction)(double);	// pointer to the function we are going to hook, must be declared same as original

// modified function code that is going to be executed before continuing to the code of original function
double hookedFunction(double a) {
	std::cout << "original function: argument = "<< a << std::endl; // we can access arguments passed to original function
	a = 42;															// modify arguments
	return originalFunction(a);										// before returning to normal execution of function
}

BOOL APIENTRY DllMain(HANDLE hModule, DWORD dwReason, LPVOID lpReserved) {
	switch (dwReason) {
		case DLL_PROCESS_ATTACH:
			// magic
			originalFunction = (double(__cdecl*)(double))DetourFunction((PBYTE)ADDRESS, (PBYTE)hookedFunction);
			break;
		}
	return TRUE;
}

To prevent such hooking, anti-cheat software may implement a white-list and monitor OS API calls like DetourFunction, LoadLibrary, CreateRemoteThread, and WriteProcessMemory. After all, some applications that hook into games are not malicious. Consider Fraps, the Mumble overlay, OBS, etc. If an application is not white-listed, bad things happen. Case in point: ENB series — a shader beautifier — was ban-worthy for several months. Hundreds of Dark Souls II players were unjustly banned on Steam when using a DLL proxy instead of the vanilla DirectX.

More advanced methods, e.g. virtual method table hooking, are still undetected (and will most likely always be undetected). Hooking vtables is difficult, but not impossible. The most difficult part is finding the method offsets (which often change from update to update). Since VAC3 has implemented some modules that may prevent internal vtable hooking (via sanity checks), this has not stopped cheaters from vtable hooking third-party libraries like Direct3D, OpenGL, and DirectInput. Here’s a code snippet that does just that:

class CGame;
class cVMT;
class cD3D;

class CGame {
public:
    cVMT* pVMT; //0x0000

};//Size=0x0004

class cVMT {
public:
    DWORD pD3D; //0x0000

};//Size=0x0004

const DWORD_PTR dwAddy = reinterpret_cast<DWORD_PTR>(GetModuleHandleA("shaderapidx9.dll")) + 0x1A0A78;
CGame* pGame = (CGame*)dwAddy;

typedef HRESULT(WINAPI* tEndScene)(LPDIRECT3DDEVICE9 pDevice);
tEndScene pEndScene;

HRESULT WINAPI hkEndScene(LPDIRECT3DDEVICE9 pDevice) {
    _asm pushad;
    MessageBox(0, L"Hello world from EndScene", 0, MB_OK);
    _asm popad;
    return pEndScene(pDevice);
}

DWORD WINAPI vThread() {
    DWORD_PTR dwEndScene = NULL;

    while (pGame != NULL) {
        pEndScene = (HRESULT(WINAPI*)(LPDIRECT3DDEVICE9 pDevice)) *(DWORD_PTR*)(pGame->pVMT->pD3D + 0xA8);

        VirtualProtect((LPVOID)(pGame->pVMT->pD3D + 0xA8), sizeof(DWORD_PTR), PAGE_EXECUTE_READWRITE, &dwEndScene);

        while (TRUE) {
            *(DWORD_PTR*)(pGame->pVMT->pD3D + 0xA8) = (DWORD_PTR)hkEndScene;
        }

        VirtualProtect((LPVOID)(pGame->pVMT->pD3D + 0xA8), sizeof(DWORD_PTR), dwEndScene, &dwEndScene);
    }
    return 0;
}

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH) {
        ::CreateThread(0, 0, (LPTHREAD_START_ROUTINE)vThread, 0, 0, 0);
    }
    return TRUE;
}

0xA8 happens to be the magic number here — it’s the offset for IDirect3DDevice9::EndScene, so anything we draw will be drawn on top of the current frame. An AC could monitor VirtualProtect or CreateThread, but a lot of legitimate code use these. Not only that, but there are many ways to cleverly get around using them. Often times, for example, cheat writers use undocumented features of an OS. It goes without saying that as cheats bury themselves in ring1 and ring0, they become virtually impossible to catch by non-invasive methods. So how does API monitoring look?

  • False positives? Assuming a perfect whitelist, none. Real-life cases show otherwise. Carrying heavy penalties can hurt innocent people and going back on a ban can hurt credibility.
  • False negatives? Many. Most purchasable cheats don’t rely on naive injection methods.
  • How to beat?
    1. Don’t write memory in the game’s space
    2. Don’t use obvious hooking methods
    3. Do run it in ring0

Stochastic Methods

And finally, we have stochastic methods. Stochastic methods of anti-cheat detection are based on past behavior and a well-defined probabilistic model. Even though these kinds of methods often catch the most disruptive of players, it would be a grave mistake to rely solely on a probabilistic model. Consider, for example, the following video:

The 15%-20% aim enhancement shown in the video can hardly be noticed by a human viewer, let alone be considered an outlier in a probabilistic model. This is how GameBlocks’ FairFight works — a popular XBOX anti-cheat tool. According to the company,

Fairfight uses two overlapping and mutually supportive approaches to identify cheaters: algorithmic analysis of player statistics and server-side cheat detection. FairFight uses algorithmic models to evaluate gameplay against multiple statistical markers to identify cheating as it occurs. FairFight crosschecks these indicators using objective server-side reporting tools. It takes action when both approaches correlate to cheating — and because you establish FairFight’s tolerances and the in-game actions to be taken against the hacker, you are in control of your game like never before. Learn more about integrating FairFight Into your game today.

Unfortunately, a tightly-tuned cheat could never be caught. In some cases, simply using SBD would be more advantageous than a purely stochastic method.

  • False positives? Depends on the tolerance.
  • False negatives? Depends on the tolerance.
  • How to beat? Make sure your cheat’s behavior is not an outlier.

Hardware Rocks

But let’s get back to Intel’s FOG. Here’s how it was supposed to work:

The fundamental premise behind the Fair Online Gaming technology research is that platform hardware can help detect cheating in online games. If this technology makes its way into PCs, gamers who want to play without cheaters can choose to turn it on and play with other gamers that also turn the technology on in their systems. The Fair Online Gaming technology itself is a collection of hardware, firmware and game software running on client PCs and servers. In our labs, we’ve tested the concept’s effectiveness at detecting entire classes of cheating, not just specific cheats.

I previously mentioned that Intel made a few mistakes, so here they are. First of all, making a vendor-specific anti-cheat creates a rift in the gaming community. Some gamers are diehard AMD fans, others Intel; some nVidia, some ATI, and so on. A cursory glance at the diverse demographic will yield the fact that a hardware anti-cheat would have to be vendor-agnostic. Second of all, Intel’s Fair Online Gaming was (when developed) using Intel’s “Active Management Technology” feature. A major part of AMT is the programmable (read: flashable) firmware. A hardware solution should not be bound by any PC-specific software constraints. It should work out of the box without any client-side software, lest we open the can of worms that is client-side anti-cheat. In short, what we should have is something like this:

Hardware Anti-cheat

The Prototype

What would this device look like? Could I actually build it? Before I did any development, I quickly scribbled down some notes; it had to meet the following requirements:

  • Price: somewhere between 30 and 100 dollars
  • Size: portable
  • Ports: USB in, USB out, Ethernet out
  • Input latency: < 1ms

Just a few months ago, I knew very little about Arduino, but I started doing some research. I wasn’t sure what the hardware platform would be and I was skeptical about reaching a sub-1ms latency, but I was absolutely sure that I could do some kind of “USB passthrough”. Ironically, what I thought would be tough to do ended up being trivial and what I thought would be trivial ended up being difficult. I settled on the Arduino Mega ADK (which has a host USB shield embedded) and an Ethernet shield. What followed was relatively painless. I used the USB_Host_Shield_20 library to grab mouse states from a Razer mouse I connected to my Mega ADK board. Using the OnMouseMove callback, I was getting a stream of dx and dy values. About 2000 lines of code later, I had an almost-fully-functioning prototype, temporarily dubbed “Valkyrie” . Here’s the class definition:

class Valkyrie : public UniversalReportParser {
    public:
        void Setup();
        void Loop();
    protected:
        virtual void Parse(HID *hid, bool is_rpt_id, uint8_t len, uint8_t *buf);
    private:
        // C++ is so fucking stupid
        static void Heartbeat();
        HIDMouseReport *in;
        HIDMouseReport out;
        EthernetClient client;
        Timer timer;
};

#endif // VALKYRIE_H_INCLUDED

There’s nothing too fancy about it, just a lot of housekeeping. It was basically a fancy USB HID pass-through:

void Valkyrie::Parse(HID *hid, bool is_rpt_id, uint8_t len, uint8_t *buf) {
	in = (HIDMouseReport *) buf;

	// in/out transaction
	out.dx      = in->dx;
	out.dy      = in->dy;
	out.buttons = in->buttons;
	out.wheel   = in->wheel;
	out.i0      = in->i0;
	out.i1      = in->i1;
	out.i2      = in->i2;
	out.i3      = in->i3;

	// send packet to AVR controller
	Serial.write((uint8_t *) &out, SMALL);
	// ... and to AC server
	client.sendData(String(out.dx) + " " + String(out.dy));
}

The real magic happens in the LUFA-powered firmware that enables the Arduino to be “seen” as an HID device (specifically a mouse). Some still-pending issues are making the mouse wheel work and making mouse buttons 4 and 5 work. Here’s a picture of the real thing:
Close-up

Testing Counter-Strike: Global Offensive

To test the prototype, I wrote a small server-plugin for CSGO that relays player view angles back to a local server (this would be the anti-cheat server in the diagram above):

public Action:OnPlayerRunCmd(client, &buttons, &impulse, Float:vel[3], Float:angles[3], &weapon, &subtype, &cmdnum, &tickcount, &seed, mouse[2]) {
	if (prev_x[client] != angles[0] || prev_y[client] != angles[1]) {
		new data[128];
		Format(data, 127, "%f:%f::%f:%f", angles[0], angles[1], prev_x[client], prev_y[client] );
		prev_x[client] = angles[0];
		prev_y[client] = angles[1];
		SocketSend(socket, data);
	}
}

As soon as I started getting data from both the Arduino as well as the game server, I had a realization: “holy crap, this is working.” First if all, I could catch triggerbots with 100% accuracy. This alone has amazing ramifications. Triggerbots don’t only affect first person shooters like CS and the Battlefield series, but they can be generalized to DOTA2 and League of Legends where new “instant click” or “instant ability” cheats have started cropping up — cheats that simulate either mouse button presses or keyboard presses.

The data started pouring in and I started graphing it. For example, here’s the y-axis movement of the mouse (straight from the optical sensor) alongside the angle movement in-game — both are given in differentials.
Graphs
Obviously, the two don’t “line up” perfectly and nfortunately, the two will never be isomorphic. Windows (and OSX/Linux) tends to provide various mouse features that consist of smoothing, acceleration, deceleration, interpolation, etc. Furthermore, the “raw” graph tends to be noisier. This is due to the fact that mice can fire new events up to 1000 times a second, whereas Counter-Strike’s best tick rate is about 128 ticks/second. One way of “tightening” the data is to simply look at whether or not the mouse is moving in a positive or negative (x or y) direction ignoring the actual distance the mouse moved. Given this constraint (1 = moving up, -1 = moving down, 0 = stationary), we have this:
Graph2
Why this graph is interesting is that we clearly see the causal relationship between the Arduino data (in red; fresh from the mouse) and the game data (in blue). When this causal relationship breaks, the player is cheating. The test? A resounding success.

Conclusions & Future Developments

So what are the next steps and what did I learn? Well first of all, I learned that Intel has really good ideas — FOG seemed awesome 8 years ago and building my own version of it is both humbling as well as exhilarating. I also learned that the Arduino community is the absolute best. The amount of help and resources that are out there is beyond staggering. I didn’t pick the easiest “first hardware project ever” but I couldn’t’ve done it without the code-base and tutorials that are already out there.

What are the next steps? Well, I need to brush up on some signal processing and figure out a good way of weeding out the aimbotters from the non-aimbotters, I need to implement HMAC (or a variation of it) to make sure that the device itself is tamper-proof, I need to fix some bugs in the firmware that are screwing up overflow bits in the HID report, and I need to also make sure that buttons 3, 4, and 5 work on every mouse I hook up to the Arduino.

And even though I’ve been away from gaming for a while now, I also learned that this may be the device the gaming community needs right now. Among cheating allegations at the highest levels in CS:GO, as well as new cheats coming out for mainstream games like DOTA2 and League of Legends — not to mention the fact that the cheating industry is growing at a ridiculous pace — it’s becoming clear that something needs to be done. Who knows, maybe there’s a Kickstarter on the horizon…

Happy Birthday, VICTOR & 2015!

It occurred to me that 5 years ago today (okay, technically ~5.01 years ago today) my sister and I built a cool little AI construct known as VICTOR. I use ‘cool’, ‘AI’, and ‘construct’ loosely. Really, VICTOR is a deceptively simple little program that has consistently seemed to capture the hearts and minds of academic advisors, potential employees, friends, and family. VICTOR’s been defunct for several years (he doesn’t crawl news sources any more; nor does he post on Twitter), but he always seems to find a place in my heart (and on my resume).

A typical conversation about my projects goes something like so:

Me: Yep, I do startups. To keep my skills honed I often do all kinds of fun little projects!
Me: For example, Jumpino, or skrch, or VICTOR, or this new project I’ve been working on: Jssembly. It’s really cool. It lets you run native Assembly in Java.
—: I see. VICTOR looks amazing. Tell me more about him.
Me: … well, it’s not really technically challenging or anything. And the math isn’t hard. It’s just a silly little toy. Why don’t we look at skrch instead? It’s a sketch-based OCR search-engine! How cool is that?
—: Yeah, it’s pretty cool I guess, but can you tell me how VICTOR works?

Everybody loves VICTOR

D’oh! Well I don’t know why people love you so much, VICTOR, but I’ll dedicate this post to you. So how does he work? It’s very simple, actually. Before we dive in the nitty-gritty, here are some facts about VICTOR:

  • VICTOR is 1832 days old!
  • VICTOR is written in PHP and Prototype. Back then, jQuery wasn’t cool and Angular wasn’t around.
  • VICTOR’s emotional spectrum is based on a schema devised by Dr. Klaus Sherer and his colleagues at the University of Geneva.
  • VICTOR’s design is (perhaps unsurprisingly) inspired by WALL-E’s EVE.
  • VICTOR has a Twitter account with over 4000 tweets. Unfortunately, it’s no longer active.

First, let’s observe VICTOR’s behavior. Why so many people may like him is because his emotional response is (at times) eerily accurate. Consider a few examples. These are 10 random non-cherry-picked samples and were taken simply by refreshing VICTOR’s page and pasting them here.

Emotion Headline Accuracy?
Worried Britney Spears’ Conservatorship is Ending. For Real This Time. (Maybe.) Pretty accurate, especially for a Britney fan, although the headline may be in jest.
Confident Obama denies crisis with Israel I can see that. “Denies” is a confidence-inspiring word.
Curious Ahmadinejad aide says Iran not ready to talk nuclear Curiosity is a neutral emotion. I’d say this is more miss than hit.
Serious Bears get next chance to upend red-hot Tom Brady, Patriots Don’t really see this one. The headline is about sports, not something particularly serious.
Excited Google’s Shopping Spree Explained In 10 Bullet Points (GOOG) A shopping spree is always exciting. This makes sense.
Impressed O’Gara returns for Italy opener Maybe VICTOR knows something about O’Gara that I don’t. This one is a miss, I’m afraid.
Convinced Wage gap shrinks between men and women Confirmation bias at work? If anything, the emotional reponse is funny.
Excited Better Get Ready Europe, TechCrunch Partybus is Heading Your Way Pretty head-on. Everything about the headline screams “exciting”.
Impressed Kurt Warner mindful of the end as Cardinals seek Super return Nope.
Curious Motorola Ming A1680, MT810, and XT806 begin their Android mercy mission in China Curious generally means there’s not enough data to move on the emotional spectrum.

Emotional baggage

That was fun. But how do VICTOR’s emotions work? We’ve already seen some variety in his behavior. Is there a systematic way to “graph” emotion? There might be. Based on research done by Dr. Klaus Sherer (see Toward emotion-sensitive multimodal interfaces: the challenge of the European Network of Excellence HUMAINE), let’s begin by drawing a cartesian coordinate system and labeling some axes. According to the model, we end up with this:

Emotional Spectrum

Given this model, emotions have two dimensions: excitatory power (passive/active), and valence (negative/positive). So, e.g., happiness would probably fall somewhere in the second quadrant, whereas apathy might be in the fourth. The hard part, as you may have guessed, is plotting a nontrivial amount of emotions on this spectrum. Eventually, and using a separate tool to assist in plotting new emotions, we came up with the following (populated) spectrum:

Emotion Plot

The algorithm

Step 1 — Tabula rasa: VICTOR doesn’t know anything. Suppose he reads the following headline:

David is awesome!

As he encounters new words (e.g. “David”, “is”, “awesome”), VICTOR puts them into a database and gives them the emotional charge of curiosity. The database, for example, would look like this:

word eX ey
david 0 0.333
is 0 0.333
awesome 0 0.333


Step 2 — Training
: I see the same headline on VICTOR’s dashboard (“David is awesome”) and I mark it as maximally exciting (coordinates: (-0.707, 0.707)). VICTOR amends the database, so that all words in the headline are closer to (-0.707, 0.707). The database now looks like the following; using the midpoint equation, we have:

word eX ey
david -0.3535 0.52
is -0.3535 0.52
awesome -0.3535 0.52

Step 3 — How do you feel? It’s trivial to see now that any headline with a word like “awesome” (accurate) or “David” (inaccurate) will make VICTOR more excited! This is where the training comes in. With enough data, neutral words like “is” or “David” will slowly migrate towards the center, whereas charged words like “awesome” will accurately model emotional states that VICTOR should feel when reading various headlines.

Wrapping it all up

If my intuitive explanation failed to satisfy some of the more technically-inclined, here’s how I originally explained his inner workings: VICTOR reads 20 headlines every hour. He looks at every word carefully and if he understands the word, he is affected emotionally by it accordingly; if he doesn’t understand the word, he becomes more curious. The emotional central tendency of any word w based on a [0,n] training set is defined by:

As VICTOR reads, he traverses the emotional spectrum via a simple recursive mid-point equation. His final emotion E after reading a headline and traversing over an arbitrary number of words is given by:

Well that’s it for me. Happy new year! And, of course, a warm “happy birthday” goes out to VICTOR.

(To see him in action, click here.)

Cracking the Coding Interview Problem 2.2

I don’t blog often. And I haven’t blogged about coding for a while; but I think I should. So here goes.

Having recently graduated from UCLA, I’m now trying to get a solid job at a solid company making some solid money. My skills primarily consist of writing code and running startups so that’s my target. Apart from the obvious choices (Google, Facebook, Microsoft, and so on), I’m also applying to a slew of startups around LA and the Bay Area. If you’re not familiar with the interview processes for tech companies, they go something like so: HR interview, phone tech interview 1, phone tech interview 2 … phone tech interview n, campus tech interview 1, campus tech interview 2 … campus tech interview n, offer or no offer.

Now I’m all up for hiring smart people that know how to get things done, but I really really loathe the technical interview. To explain why, let me cite an example from Cracking the Code Interview — the de facto bible of tech interviews.

2.2 Implement an algorithm to find the nth to last element of a singly linked list.

Easy, right? Well, not so fast, skipper. Here is the proposed solution (by an apparent Googler):

public static LinkedListNode nthToLast(LinkedListNode head, int n) {
	LinkedListNode p1 = head;
	LinkedListNode p2 = head;

	if (n <= 0) return null;

	// Move p2 n nodes into the list. Keep n1 in the same position.
	for (int i = 0; i < n - 1; i++) {
		if (p2 == null) {
			return null; // Error: list is too small.
		}
		p2 = p2.next;
	}
	if (p2 == null) { // Another error check.
		return null;
	}

	// Move them at the same pace. When p2 hits the end,
	// p1 will be at the right element.
	while (p2.next != null) {
		p1 = p1.next;
		p2 = p2.next;
	}
	return p1;
}

Feel free to scratch your head and go “what the hell” because I know I am. The above is considered a solution to Problem 2.2. First, let me mention what’s wrong with it:

  • Using two pointers is dumb.
  • It’s difficult to read. I needed 10 minutes to get my head around it when the function should be doing something trivial
  • It’s simply bad design. More on this in a few.

Now let’s solve the problem like real software engineers, not hipster programmers that thrive on esoteric nonsense:

// this is how normal people find the length of a linked list
public int getLength(Node head) {
	int length = 0;
	Node n = head;
		
	// iterate through the list
	while (n != null) {
		// yay more length
		++length;

		// go forward!
		n = n.next;
	}
	return length;
}
	
// this is how normal people find the nth to last node of a linked list
public Node getnthFromLast(Node head, int nth) {
	int length = getLength(head);
	Node n = head;
	int location = 0;
		
	// if list is empty or we want to go too far back or we want zeroth to last, return null
	if (length == 0 || nth > length || nth <= 0) return null;
		
	// 1st to last = last
	// 2nd to last = penultimate (length-1)
	// 3rd to last = length - 2
	// .. and so on
	while (n != null) {

		// right location?
		if (location == length - nth) return n;
			
		// next spot
		++location;
			
		// next node
		n = n.next;
	}
}

Simple, concise, readable. What irks me most is that not only will interviewers think code like the one from the book is good, but so will interviewees. Oh well, c’est la vie.

Back in black

After a Dreamhost blunder that left thousands with their websites hanging by a thread (or by a noose), my blog was also deemed infected with some sort of virus. I have no idea what kind of vulnerability DH was trying to fix, but I couldn’t find anything that was compromised. Even so, my WordPress installation wouldn’t start up. So I had to re-install the whole thing, find a new theme, mess around with it a bit and re-launch.

I’m back in black. And to all of my imaginary fans: I love you!