Welcome back, aspiring cyberwarriors!
In this article we will take a practical tour of a small set of classic Unix text tools: awk, sed, uniq, sort, and a few shell one-liners. We will show how they fit into a realistic reconnaissance workflow. To make the examples concrete, we will use LDAP-style output (the kind of verbose text you get from tools such as PowerView.py) and treat it as the noisy raw data you often face during a pentest. The goal is not to teach every option of every tool, but to give you patterns that turn messy text into useful intelligence. We will end up with clean host lists, the operating systems in use, reachable addresses, and a first set of targets for follow-up scanning.
The material here covers only a small glimpse of the kind of practical exercises you’ll find in the Advanced Linux for Hackers training. Consider it a friendly invitation.
A Practical Sample: LDAP Output
A very useful sample to experiment on is LDAP query output. LDAP dumps are intentionally verbose. They contain many attributes for many objects. Below you will see snippets from a typical LDAP query. PowerView.py was used to obtain this kind of output. We will extract the parts we need and explain why those parts matter to an assessment.

LDAP output is valuable, as it often contains hostname attributes, operating system strings, and user or service account names. This single file can tell you where legacy systems live, which is useful for identifying exploit opportunities. You will also find domain controllers, file servers, or parts of critical infrastructure.
Early in a test you often ping or resolve these hostnames from a compromised domain machine to learn how names map to IP addresses in the target network. Knowing the mix of operating systems and roles in the domain also helps you decide which payloads are likely to work and which will fail.
DNS – grep & awk & sed
To get through a large LDAP listing we usually start with grep to reduce the noise. Using grep with the -a and -i options is a small but practical improvement over plain grep. Here -a treats binary files as text (useful when file type detection is fuzzy) and -i makes the search case-insensitive so you don’t miss attributes that vary in capitalization.
For example:
bash$ > cat DComputers_raw.txt | grep -ai sAMAccountName
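A quick sanity check before extracting anything is to count the matching lines. Since each computer object carries one sAMAccountName attribute, grep -c gives you a rough count of the computer objects in the dump:
bash$ > grep -aic samaccountname DComputers_raw.txt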

We chose sAMAccountName instead of Name specifically to show you how sed will be used later. Now the output is getting closer to what we want. Let’s remove the unnecessary text using awk. A tiny awk cheatsheet is useful to keep nearby:
awk '{print $1}' file Show the 1st column.
awk '{print $1,$5}' file Show the 1st and 5th columns.

A typical pipeline that extracts the field containing the hostname looks like this:
bash$ > cat DComputers_raw.txt | grep -ai sAMAccountName | awk -F ' ' '{print $3}'

Here, awk -F ' ' sets the field separator to a space; you can swap in any other delimiter your data uses. Then {print $3} tells awk to print the third whitespace-separated field from each line. The pipeline therefore finds lines containing sAMAccountName and emits the third token from each of those lines.
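For instance, if your dump separates attribute names from values with a colon followed by a space (a layout you should verify against a raw line of your own output), you can split on that delimiter instead and take the second field:
bash$ > cat DComputers_raw.txt | grep -ai sAMAccountName | awk -F ': ' '{print $2}' # verify the delimiter in your dump first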
After this step you may still have trailing characters such as a dollar sign ($) appended to computer names (machine accounts in Active Directory end in $). We can remove those trailing characters with sed. A small sed cheatsheet:
sed 's/FOO/BAR/g' file Replace FOO with BAR.
sed 's/FOO//g' file Replace FOO with nothing.
sed '/^FOO/d' file Remove lines that start with FOO.

To remove a trailing dollar sign from the end of each line we use:
bash$ > cat list.txt | sed 's/\$$//g'

The sed expression s/\$$//g needs two dollar signs for a reason. In the regex portion \$$ the \$ matches a literal dollar character, while the final unescaped $ is the regex end-of-line anchor. Together \$$ matches a literal dollar that occurs at the end of the line. The replacement is empty, so the result strips a trailing $ while preserving the rest of the line. The g flag has no extra effect here, because a pattern anchored to the end of the line can only match once, but adding it is a common habit when doing substitutions.
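Putting the pieces together, a sketch like the following (assuming the three-field layout shown earlier) produces the cleaned list we will call computers.txt:
bash$ > cat DComputers_raw.txt | grep -ai sAMAccountName | awk '{print $3}' | sed 's/\$$//' > computers.txt # strip the machine-account $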
When working across domain trusts, it is important to append the domain suffix to the machine name. Otherwise name resolution will fail, and even the domain controller will not be able to ping the computer. Appending the domain the machine belongs to resolves the issue. Here is how we do it:
bash$ > sed '/^$/d' computers.txt | sed 's/$/.domain.local/' > DomainComputers.txt
The first sed deletes empty lines (/^$/d), and the second appends .domain.local to the end of each remaining line; the unescaped $ is once again the end-of-line anchor. Since computers.txt already holds one bare name per line, no further awk field extraction is needed at this point. With grep, awk and sed chained like this you end up with clean, resolvable computer names you can use in later steps.
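If you prefer a single pass, the whole extract, clean and append sequence collapses into one awk command. This is just a sketch: it assumes the same three-field layout, and .domain.local is a placeholder for your target's real DNS suffix:
bash$ > awk 'tolower($1) ~ /samaccountname/ { gsub(/\$$/, "", $3); print $3 ".domain.local" }' DComputers_raw.txt > DomainComputers.txt # .domain.local is a placeholder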
Analytics – sort & uniq & awk
Another attribute commonly present in LDAP dumps is operatingSystem. Inventorying operating systems is one of the quickest ways to find potential attack vectors, as legacy OS versions often carry known privilege escalation bugs and missing patches. The combination of sort and uniq is perfect for transforming a long list of repeated OS strings into a compact summary.
Two useful uniq options to know:
uniq -u Prints only the lines that appear exactly once.
uniq -d Prints only the lines that are duplicated.

To list the operating systems that appear exactly once in the dump (often the outliers worth a closer look), you might run:
bash$ > cat DComputers_raw.txt | grep -ai operatingsystem | awk -F ' ' '{print $3,$4,$5,$6,$7}' | sort | uniq -u
That pipeline finds lines that contain operatingSystem, prints fields three through seven (this is a pragmatic way to capture the multi-word OS description), sorts the results alphabetically, and uses uniq -u to show only OS descriptions that appear exactly once in the file. In practice sort is essential before uniq because uniq only removes adjacent duplicates, while sort groups identical lines so uniq can find them.
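As a shorthand, sort -u keeps one copy of every distinct line regardless of how often it appears, which is the more common way to get a deduplicated OS inventory:
bash$ > cat DComputers_raw.txt | grep -ai operatingsystem | awk -F ' ' '{print $3,$4,$5,$6,$7}' | sort -u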
If you instead want to see common OS entries (the ones repeated across many hosts) use:
bash$ > cat DComputers_raw.txt | grep -ai operatingsystem | awk -F ' ' '{print $3,$4,$5,$6,$7}' | sort | uniq -d

This shows only OS lines that have at least one duplicate in the sorted output. It helps you quickly identify the major families (for example “Windows 10” or “Windows Server 2019”) that usually dominate the environment.
Counting occurrences turns the list into statistics. You can count how many hosts run each OS with:
bash$ > cat DComputers_raw.txt | grep -ai operatingsystem | awk -F ' ' '{print $3,$4,$5,$6,$7}' | sort | uniq -dc | sort -n

This pipeline does three things of note: the first sort groups identical lines so duplicates become adjacent; uniq -dc then prints only duplicated lines (-d) together with a count of occurrences (-c); and the final sort -n orders the counted output numerically (smallest counts first). That helps you prioritize effort. If 20 machines use one vulnerable OS, that’s a different problem than if only one or two machines do.
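To see the dominant OS families first instead, count every line (uniq -c, without -d), reverse the numeric sort, and take the top entries:
bash$ > cat DComputers_raw.txt | grep -ai operatingsystem | awk -F ' ' '{print $3,$4,$5,$6,$7}' | sort | uniq -c | sort -rn | head -5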
Scan – loops & awk & sed
Once you have a cleaned list of hostnames, a natural next step is to resolve them to IP addresses and check basic reachability. Hostnames are useful for understanding naming conventions and the network layout, but sometimes DNS resolution behaves differently depending on where you are in the network or whether you are proxying traffic. Resolving and then pinging hosts from a domain-joined machine gives a first approximation of which systems respond and which are offline, hidden, or only objects in AD with no active DNS record.
A couple of useful one-liners to keep in mind are the following:
for i in $(seq 1 254); do echo 192.168.1.$i; done
for url in $(cat list.txt); do host $url; done

To iterate over hostnames and ping each one, you can use a loop like the one below. On the first run, you may see DNS resolution errors in the raw ping output. This is expected, since AD DNS names are not guaranteed to be resolvable from every vantage point. Redirecting stderr to /dev/null suppresses those error messages, making the loop output much cleaner.
bash$ > for hostname in $(cat pingme.txt); do ping -c 1 $hostname 2>/dev/null | grep ttl; done

This loop reads each hostname from pingme.txt, sends a single ICMP echo (-c 1), discards error output with 2>/dev/null and keeps only successful responses by grepping for ttl (most ping implementations include ttl= in successful replies). If a hostname fails to resolve, you will see no ttl output for it.
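A variation that records which hostnames answered, rather than the reply lines themselves, keys off ping's exit status. The -W 1 one-second timeout is the Linux iputils flag; adjust it on other systems:
bash$ > for hostname in $(cat pingme.txt); do ping -c 1 -W 1 $hostname >/dev/null 2>&1 && echo $hostname; done > alive.txt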
You can also extract only the IP addresses from successful ping replies and clean the formatting with awk and sed. The example below prints only the IP address from ping output:
bash$ > for hostname in $(cat pingme.txt); do ping -c 1 $hostname 2>/dev/null | grep ttl | awk '{print $5}' | sed 's/[():]//g'; done

See what happens here: ping -c 1 sends one probe, then grep ttl filters to the reply line(s) that contain ttl=, then awk '{print $5}' prints the fifth whitespace field which, in the common Linux (iputils) reply format 64 bytes from host (192.168.1.5): ..., is the IP wrapped in parentheses. After that, sed 's/[():]//g' strips the (, ) and : characters so the final output is a plain dotted IP like 192.168.1.5. Different ping implementations vary slightly, so you should test the exact awk field number on your system.
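If you would rather not depend on field positions at all, extracting anything that looks like a dotted quad with grep -oE sidesteps the formatting differences between ping implementations. This sketch writes the results to ips.txt, which we use next:
bash$ > for hostname in $(cat pingme.txt); do ping -c 1 $hostname 2>/dev/null | grep ttl | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -1; done > ips.txt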
Those IPs are now a compact target list you can feed into a port scanner. Netcat (nc) is a simple, scriptable option that is often quieter than full port scanners and can be chained into proxy tools if you need to reach hosts through a jump or SOCKS proxy. An example stealthy check for SMB (port 445) using a proxy looks like this:
bash$ > for ip in $(cat ips.txt); do proxychains4 -q nc -zv $ip 445; done

In this pipeline proxychains4 wraps the nc call so TCP connections go through the configured proxy, -z tells nc to perform a zero-I/O scan (just test connectability), and -v prints connection results. We used -q with proxychains to suppress its own output. This scan is intentionally basic; the point is to show how the simple building blocks combine. You can of course replace nc with more advanced tools, or add retries, timeouts and parallelism to scale the checks.
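To extend the idea slightly, you can check a handful of interesting ports per host and bound each attempt with nc's -w timeout; the port selection here is only an example:
bash$ > for ip in $(cat ips.txt); do for port in 139 445 3389 5985; do proxychains4 -q nc -zv -w 3 $ip $port; done; done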
Summary
We used grep to find attributes, awk to extract fields, sed to clean strings, sort and uniq to summarize, and simple shell loops to resolve and probe hosts. These patterns are not flashy, but they are composable and often faster to iterate with than heavier GUIs or larger tools when you are in the middle of a pentest. What you have read here is only a glimpse. The Advanced Linux for Hackers training expands these ideas and is ideal for those looking to take Linux to the next level in cybersecurity.
If you enjoyed this article, consider joining the full training for Subscriber Pro between February 17-19.