diff broken: considers two files or directories as delete+add rather than modify #565

Closed
tstromberg opened this issue Nov 2, 2024 · 4 comments · Fixed by #581
@tstromberg
Collaborator

I'm not sure when this happened (some time before v1.0.0), but I noticed that mal diff is basically broken. Given two directories:

/tmp/old
/tmp/old/lottie-player.min.js
/tmp/new
/tmp/new/lottie-player.min.js

If I run mal diff /tmp/old /tmp/new, it sees it as one file deleted and another one added, rather than a single file that changed:

% m diff /tmp/old /tmp/new                                                                                                                                    

Deleted: ../../private/tmp/old/lottie-player.min.js [⚠️ MEDIUM]
-------------------------------------------------------------------------------------------------------------------------------------
RISK  KEY                             DESCRIPTION                             EVIDENCE
-------------------------------------------------------------------------------------------------------------------------------------
-LOW  data/encoding/json/decode       Decodes JSON messages                   JSON.parse
-LOW  data/encoding/json/encode       encodes JSON                            JSON.stringify
-LOW  impact/words/plugin             references a 'plugin'                   function installPlugin
                                                                              getExpressionsPlugin
                                                                              plugins
                                                                              return expressionsPlugin
                                                                              setExpressionsPlugin
-LOW  net/url/embedded                contains embedded HTTPS URLs            https://www.jsdelivr.com/using-sri-with-dynamic-files
-LOW  net/url/parse                   Handles URL strings                     new URL
-MED  exec/remote_commands/code_eval  evaluate code dynamically using eval()  eval("
-MED  net/download/download           download files                          download_
-MED  os/time/clock/sleep             uses setInterval to wait                setInterval(
-------------------------------------------------------------------------------------------------------------------------------------

Added: ../../private/tmp/new/lottie-player.min.js [🚨 CRITICAL]
-----------------------------------------------------------------------------------------------------------------------------------------
RISK   KEY                               DESCRIPTION                                            EVIDENCE
-----------------------------------------------------------------------------------------------------------------------------------------
+LOW   c2/addr/url/unusual               Contains HTTP hostname with unusual top-level domain   https://api.mantlescan.xyz/
                                                                                                https://mantlescan.xyz/
                                                                                                https://openchain.xyz/
+LOW   credential/ssl/private_key        References private keys                                privateKey
+LOW   crypto/aes                        Supports AES (Advanced Encryption Standard)            AES
+LOW   crypto/ed25519                    Elliptic curve algorithm used by TLS and SSH           ed25519
+LOW   data/encoding/base64              Supports base64 encoded strings                        base64
+LOW   data/encoding/json/decode         Decodes JSON messages                                  JSON.parse
+LOW   data/encoding/json/encode         encodes JSON                                           JSON.stringify
+LOW   fs/file/open                      opens files                                            open(
+LOW   fs/mount                          mounts file systems                                    -o
                                                                                                mount
+LOW   impact/words/password             references a 'password'                                PasswordBasedCipher
                                                                                                to countless passwords
+LOW   impact/words/plugin               references a 'plugin'                                  plugin_relativeTime
                                                                                                plugin_updateLocale
                                                                                                plugins
+LOW   net/resolve/hostport/parse        Network address and service translation                getaddrinfo
+LOW   net/socket/socket/listen          listen on a socket                                     accept
                                                                                                socket
+LOW   net/socket/socket/send            send a message to a socket                             _send
+LOW   net/url/embedded                  contains embedded HTTPS URLs                           https://abitype.dev
                                                                                                https://andromeda-explorer.metis.io/api
                                                                                                https://andromeda.metis.io/?owner=1088
                                                                                                https://api-era.zksync.network/api
                                                                                                https://api-moonbeam.moonscan.io/api
                                                                                                https://api-moonriver.moonscan.io/api
                                                                                                https://api-optimistic.etherscan.io/api
                                                                                                https://api-zkevm.polygonscan.com/api
                                                                                                …
+LOW   net/url/parse                     Handles URL strings                                    new URL
+LOW   os/env/get                        Retrieve environment variable values                   env.DEBUG
                                                                                                env.MODE
                                                                                                env.NEXT
                                                                                                env.NODE
+LOW   os/fd/read                        reads from a file handle                               e.read()
+LOW   os/fd/write                       writes to a file handle                                a.write(o)
                                                                                                decoder.write(n)
                                                                                                decoder.write(t)
                                                                                                e.write(t)
                                                                                                i.write(e)
                                                                                                t.write(o)
                                                                                                this.write(e)
+MED   anti/static/obfuscation/generic/  converts hex data to ASCII                             toString("hex");
       hex_conversion
+MED   c2/addr/ip                        hardcoded IP address                                   114.243.154.69
                                                                                                13.182.181.343
                                                                                                13.23.32.42
                                                                                                14.22.33.243
                                                                                                14.52.54.92
                                                                                                146.288.257.686
                                                                                                15.15.34.34
                                                                                                15.21.28.36
                                                                                                …
+MED   credential/keychain/keychain      May access the macOS keychain                          keychain
+MED   data/embedded/embedded/base64/    Contains base64 url                                    odHRwOi8v::$http
       url
+MED   discover/system/platform          get system identification                              process.platform
                                                                                                process.versions
+MED   exec/remote_commands/code_eval    evaluate code dynamically using exec()                 exec(e))return
                                                                                                exec(e),e
                                                                                                exec(h)
                                                                                                exec(l),null
                                                                                                exec(o))
                                                                                                exec(r))
                                                                                                exec(t)
+MED   exfil/stealer/browser             Uses HTTP, archives, and references multiple browsers  .config
                                                                                                Brave
                                                                                                Chrome
                                                                                                Discord
                                                                                                Firefox
                                                                                                Opera
                                                                                                POST
                                                                                                Safari
                                                                                                …
+MED   fs/path/relative                  references and possibly executes relative path         ./aes
                                                                                                ./blowfish
                                                                                                ./cipher-core
                                                                                                ./core
                                                                                                ./evpkdf
                                                                                                ./format-hex
                                                                                                ./hmac
                                                                                                ./lib-typedarrays
                                                                                                …
+MED   impact/words/agent                references an 'agent'                                  useragent
+MED   impact/words/heartbeat            references a 'heartbeat'                               heartBeatTimeout
                                                                                                heartbeat_pulse
                                                                                                lastHeartbeatResponse
                                                                                                updateLastHeartbeat
+MED   net/download/download             download files                                         Downloads
                                                                                                downloads-view
                                                                                                mobile-download-links
+MED   net/http/http/form/upload         upload content via HTTP form                           POST
                                                                                                application/json
                                                                                                application/x-www-form-urlencoded
+MED   net/http/http/post                submits content to websites                            Content-Type
                                                                                                HTTP
                                                                                                POST
                                                                                                http
+MED   net/http/websocket                supports web sockets                                   WalletLinkWebSocket
                                                                                                WebSocket:gV
                                                                                                WebSocket:typeof
                                                                                                WebSocketClass:h
                                                                                                WebSocketClass:l
                                                                                                clearWebSocket
                                                                                                webSocket:e
                                                                                                webSocket:r
                                                                                                …
+MED   net/url/encode                    encodes URL, likely to pass GET variables              urlencode
+MED   net/url/request                   requests resources via URL                             requests.get(e)
+CRIT  exfil/stealer/wallet              makes HTTPS connections and references multiple        BraveWallet
                                         wallets by name                                        Coinbas
                                                                                                Ronin
                                                                                                http
-----------------------------------------------------------------------------------------------------------------------------------------

The same happens if I specify the files by path name:

mal diff /tmp/old/lottie-player.min.js /tmp/new/lottie-player.min.js                                                                                                       
Deleted: ../../../private/tmp/old/lottie-player.min.js [⚠️ MEDIUM]
-------------------------------------------------------------------------------------------------------------------------------------
RISK  KEY                             DESCRIPTION                             EVIDENCE
-------------------------------------------------------------------------------------------------------------------------------------
-LOW  data/encoding/json/decode       Decodes JSON messages                   JSON.parse
-LOW  data/encoding/json/encode       encodes JSON                            JSON.stringify
-LOW  impact/words/plugin             references a 'plugin'                   function installPlugin
                                                                              getExpressionsPlugin
                                                                              plugins
                                                                              return expressionsPlugin
                                                                              setExpressionsPlugin
-LOW  net/url/embedded                contains embedded HTTPS URLs            https://www.jsdelivr.com/using-sri-with-dynamic-files
-LOW  net/url/parse                   Handles URL strings                     new URL
-MED  exec/remote_commands/code_eval  evaluate code dynamically using eval()  eval("
-MED  net/download/download           download files                          download_
-MED  os/time/clock/sleep             uses setInterval to wait                setInterval(
-------------------------------------------------------------------------------------------------------------------------------------

Added: ../../../private/tmp/new/lottie-player.min.js [🚨 CRITICAL]
--------------------------------------------------------------------------

Here is the output of v0.10.0, showing the expected behavior (except that the filename is "."):

go run . --diff /tmp/old/lottie-player.min.js /tmp/new/lottie-player.min.js
Changed: . [⚠️ MEDIUM → 🚨 CRITICAL]

+++ ADDED: 24 behavior(s) +++
----------------------------------------------------------------------------------------------------------------------------
RISK   KEY                       DESCRIPTION                                            EVIDENCE
----------------------------------------------------------------------------------------------------------------------------
+LOW   crypto/aes                Supports AES (Advanced Encryption Standard)            AES
+LOW   crypto/ed25519            Elliptic curve algorithm used by TLS and SSH           ed25519
+LOW   encoding/base64           Supports base64 encoded strings                        base64
+LOW   env/get                   Retrieve environment variable values                   env.DEBUG
                                                                                        env.MODE
                                                                                        env.NEXT
                                                                                        env.NODE
+LOW   fs/mount                  mounts file systems                                    -o
                                                                                        mount
+LOW   net/hostport/parse        Network address and service translation                getaddrinfo
+LOW   net/socket/listen         listen on a socket                                     accept
                                                                                        socket
+LOW   net/socket/send           send a message to a socket                             _send
+LOW   ref/site/url/unusual      Contains HTTP hostname with unusual top-level domain   https://api.mantlescan.xyz/
                                                                                        https://mantlescan.xyz/
                                                                                        https://openchain.xyz/
+LOW   ref/words/password        references a 'password'                                PasswordBasedCipher
                                                                                        to countless passwords
+LOW   secrets/private_key       References private keys                                privateKey
+MED   combo/stealer/browser     Uses HTTP, archives, and references multiple browsers  .config
                                                                                        Brave
                                                                                        Chrome
                                                                                        Firefox
                                                                                        POST
                                                                                        Safari
                                                                                        http
                                                                                        zip
                                                                                        …
+MED   data/embedded/base64/url  Contains base64 url                                    odHRwOi8v::$http
+MED   kernel/uname/get          get system identification                              process.platform
                                                                                        process.versions
+MED   net/http/form/upload      upload content via HTTP form                           "application/x-www-form-urlencoded
+MED   net/http/post             Able to submit content via HTTP POST                   HTTP
                                                                                        POST
                                                                                        http
+MED   net/url/encode            encodes URL, likely to pass GET variables              urlencode
+MED   net/url/request           requests resources via URL                             requests.get(e)
+MED   ref/ip                    hardcoded IP address                                   114.243.154.69
                                                                                        13.182.181.343
                                                                                        13.23.32.42
                                                                                        14.22.33.243
                                                                                        14.52.54.92
                                                                                        146.288.257.686
                                                                                        15.15.34.34
                                                                                        15.21.28.36
                                                                                        …
+MED   ref/path/relative         references and possibly executes relative path         ./aes
                                                                                        ./blowfish
                                                                                        ./cipher-core
                                                                                        ./core
                                                                                        ./evpkdf
                                                                                        ./format-hex
                                                                                        ./hmac
                                                                                        ./lib-typedarrays
                                                                                        …
+MED   ref/words/agent           references an 'agent'                                  useragent
+MED   secrets/keychain          May access the macOS keychain                          keychain
+HIGH  ref/site/unusual          unusual http hostname                                  https://api.mantlescan.xyz/
                                                                                        https://mantlescan.xyz/
                                                                                        https://openchain.xyz/
+CRIT  combo/stealer/wallet      makes HTTPS connections and references multiple        BraveWallet
                                 wallets                                                Coinbas
                                                                                        Ronin
                                                                                        http
----------------------------------------------------------------------------------------------------------------------------

However we fix this, we need to add a test, because our diff code is fragile and really difficult to understand.
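
For whoever picks this up, here is a minimal sketch of the kind of check such a test could make. It is not taken from the malcontent code base: `classify` and its status strings are hypothetical names, and it assumes files are paired by their path relative to each scan root. A real diff would go on to compare the behaviors of each pair.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// classify is a hypothetical helper (not part of malcontent). It pairs files
// from two scan roots by their path relative to the root and labels each
// relative path "Changed" (present on both sides), "Added", or "Deleted".
func classify(oldRoot, newRoot string, oldFiles, newFiles []string) map[string]string {
	status := map[string]string{}
	for _, f := range oldFiles {
		rel, err := filepath.Rel(oldRoot, f)
		if err != nil {
			rel = f // fall back to the raw path if no relative form exists
		}
		status[rel] = "Deleted" // provisional until seen on the new side
	}
	for _, f := range newFiles {
		rel, err := filepath.Rel(newRoot, f)
		if err != nil {
			rel = f
		}
		if _, seen := status[rel]; seen {
			status[rel] = "Changed"
		} else {
			status[rel] = "Added"
		}
	}
	return status
}

func main() {
	got := classify("/tmp/old", "/tmp/new",
		[]string{"/tmp/old/lottie-player.min.js"},
		[]string{"/tmp/new/lottie-player.min.js"})

	// Sort keys so the output order is stable.
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %s\n", got[k], k)
	}
}
```

With the two directories above, this yields a single `Changed` entry for `lottie-player.min.js` rather than one `Deleted` and one `Added` entry.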

@tstromberg changed the title from "diff mode broken: now considers two files/directories as add/delete rather than modify" to "diff broken: considers two files or directories as delete+add rather than modify" on Nov 2, 2024
@tstromberg
Collaborator Author

@egibs - any chance you can help with this? I'm confident you can fix this far better and faster than I can.

@tstromberg
Collaborator Author

It looks like there is at least one example where diff gets things right:


m diff ../bincapz-samples/linux/clean/ls.x86_64 ../bincapz-samples/macOS/clean/ls                                                                            695ms  Sat Nov  2 10:45:37 2024
Changed: ../bincapz-samples/macOS/clean/ls [⚠️ MEDIUM → ✅ LOW]

+++ ADDED: 1 behavior(s) +++
---------------------------------------------------------------------------
RISK  KEY                    DESCRIPTION                    EVIDENCE
---------------------------------------------------------------------------
+LOW  fs/directory/traverse  traverse filesystem hierarchy  _fts_children
                                                            _fts_close
                                                            _fts_open
                                                            _fts_read
                                                            _fts_set
---------------------------------------------------------------------------

--- REMOVED: 3 behavior(s) ---
-------------------------------------------------------------------------------------------------------------------------------
RISK  KEY                           DESCRIPTION                          EVIDENCE
-------------------------------------------------------------------------------------------------------------------------------
-LOW  discover/system/hostname/get  get computer host name               gethostname
-LOW  net/url/embedded              contains embedded HTTPS URLs         https://gnu.org/licenses/gpl.html
                                                                         https://translationproject.org/team/
                                                                         https://wiki.xiph.org/MIME_Types_and_File_Extensions
                                                                         https://www.gnu.org/software/coreutils/
-MED  process/name/set              get or set the current process name  __progname
-------------------------------------------------------------------------------------------------------------------------------

@tstromberg
Collaborator Author

Some weirdness: if I use a relative path, diff works:

% cd /tmp
% mal diff old new
├─ 🛑 Changed: new/lottie-player.min.js [MEDIUM → CRITICAL]
│     ▲ anti-static [NONE → MEDIUM]
++       🟡 obfuscation/generic/hex_conversion — converts hex data to ASCII: toString("hex");
│     ▲ command & control [NONE → MEDIUM]
++       🟡 addr/ip — hardcoded IP address:
++           114.243.154.69, 13.182.181.343, 13.23.32.42, 14.22.33.243, 14.52.54.92, 146.288.257.686, 15.15.34.34, 15.21.28.36, …

If I specify absolute paths, it reverts to the deleted+added bug:

% cd /tmp
% mal diff /tmp/old /tmp/new
├─ 🟡 Deleted: ../../private/tmp/old/lottie-player.min.js [MEDIUM]
│     ≡ data [LOW]
│       🟢 encoding/json_decode — Decodes JSON messages: JSON.parse
│       🟢 encoding/json_encode — encodes JSON: JSON.stringify
│     ≡ execution [MEDIUM]
│       🟢 plugin — references a 'plugin':
│           function installPlugin, getExpressionsPlugin, plugins, return expressionsPlugin, setExpressionsPlugin
│       🟡 remote_commands/code_eval — evaluate code dynamically using eval(): eval("
│     ≡ networking [MEDIUM]
│       🟡 download — download files: download_
│       🟢 url/embedded — contains embedded HTTPS URLs: https://www.jsdelivr.com/using-sri-with-dynamic-files
│       🟢 url/parse — Handles URL strings: new URL
│     ≡ operating-system [MEDIUM]
│       🟡 time/clock_sleep — uses setInterval to wait: setInterval(
│
├─ 🛑 Added: ../../private/tmp/new/lottie-player.min.js [CRITICAL]
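
One possible reading of the paths printed above, not confirmed anywhere in this thread: on macOS `/tmp` is a symlink to `/private/tmp`, so if the file paths are symlink-resolved but the root arguments are not, the relative names computed for the old and new file no longer match. The sketch below reproduces those strings with `filepath.Rel`, which is purely lexical. `relKey` is a hypothetical helper and the resolved file paths are assumed inputs.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relKey is a hypothetical helper: it returns path relative to root using
// purely lexical rules (filepath.Rel never touches the filesystem).
func relKey(root, path string) string {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return path
	}
	return rel
}

func main() {
	// Assumption: the scanner reports symlink-resolved file paths.
	oldFile := "/private/tmp/old/lottie-player.min.js"
	newFile := "/private/tmp/new/lottie-player.min.js"

	// Roots exactly as typed on the command line (unresolved).
	fmt.Println(relKey("/tmp/old", oldFile))
	fmt.Println(relKey("/tmp/new", newFile))

	// Roots resolved the same way as the files.
	fmt.Println(relKey("/private/tmp/old", oldFile))
	fmt.Println(relKey("/private/tmp/new", newFile))
}
```

The first two keys differ, which would make the files look unrelated. Resolving the roots the same way as the files (for example with `filepath.EvalSymlinks`) gives the same key for both.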

@egibs
Member

egibs commented Nov 3, 2024

Interesting. I'll look into this first thing tomorrow.
