Extract email addresses from a large JSON file

Some requests to API endpoints may generate a very large output in JSON format.

As an example, the crAPI vulnerable web app that I have been practicing with lately has an endpoint that lists the details of all recent posts in the community section of the app. These details happen to include the e-mail addresses of the registered users that have posted in the forum.

Now suppose you want to extract these e-mail addresses to test your password brute-forcing skills, without having to read through all the JSON syntax and potentially miss some data.

Here’s a quick, dirty and very effective way to speed things up.

Export a JSON file

If you are testing the API in Postman and got a large JSON output as a response to one of the requests you sent, go to the Save Response menu at the top right of the response section and choose Save to a file.

You will obtain a file called response.json.

If you are using another tool to probe your target API, you can just copy and paste the JSON output into a text file.

Use a regex to search the file

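Now cd into the directory where you saved the file and run:
cat response.json | grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+"

The -o option makes grep print only the matched text rather than the whole line, and -e introduces the pattern. The regex matches a run of letters, digits, dots or underscores, then an @, then a run of letters, a literal dot (escaped as \. so it does not match any character) and another run of letters. This lets grep extract from the file all the strings in an e-mail address format. Note that the pattern is deliberately simple: addresses whose domain contains digits, hyphens or subdomains may be missed or truncated.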
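To see what this kind of extraction does, here is a minimal sketch that runs a similar pattern against a small made-up file. The file name sample.json, its contents and the slightly broader pattern are assumptions for illustration only. They are not actual crAPI output. The pattern is written with -E (extended regex) so that + needs no backslash.

```shell
# Hypothetical sample data standing in for response.json.
cat > sample.json <<'EOF'
{"posts":[{"author":{"nickname":"adam","email":"adam007@example.com"}},
{"author":{"nickname":"pogba","email":"pogba006@example.com"}},
{"author":{"nickname":"adam","email":"adam007@example.com"}}]}
EOF

# -o prints only the matched text, -E enables extended regex syntax.
# The dot before the top-level domain is escaped so it matches a literal ".".
emails=$(grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' sample.json)
printf '%s\n' "$emails"
```

Each match is printed on its own line, duplicates included, which is why a deduplication step is useful afterwards.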

Get a clean list

You will see in the output that some addresses may appear several times. To get a clean list without duplicates, just pipe the output into a sort -u command:
cat response.json | grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" | sort -u
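If you want to keep the list for later use, the same pipeline can write it to a file. The sketch below assumes a made-up input file (sample_posts.json) and an arbitrary output name (emails.txt). Neither comes from the original article, and the pattern is an extended-regex variant of the one above.

```shell
# Hypothetical sample data standing in for response.json.
printf '%s\n' \
  '{"email":"robot001@example.com"}' \
  '{"email":"adam007@example.com"}' \
  '{"email":"robot001@example.com"}' > sample_posts.json

# Extract the addresses, sort them, drop duplicates (-u), and save the result.
# emails.txt is an arbitrary file name chosen for this example.
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' sample_posts.json \
  | sort -u > emails.txt

cat emails.txt
```

The resulting file contains one address per line, which is the format most wordlist-driven tools expect.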

Voilà! You have a list of addresses ready for a brute-force attempt.

This little trick is part of the APIsec University lesson on API authentication attacks. Check out the video here.

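Also, you will notice in the video that Corey Ball uses a slightly different form of the command, passing the file name directly to grep instead of piping it through cat:
grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" response.json

This syntax works just as well, and it saves starting an extra process. Enjoy!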
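As a quick check that the different ways of feeding the file to grep are equivalent, the sketch below compares three of them on a one-line made-up file. The file name demo.json, its contents and the variable names are assumptions for illustration. The pattern uses -E so that + needs no backslash.

```shell
# Hypothetical one-line sample file.
printf '%s\n' '{"email":"adam007@example.com"}' > demo.json

pattern='[a-zA-Z0-9._]+@[a-zA-Z]+\.[a-zA-Z]+'

# 1. Pipe the file through cat.
via_pipe=$(cat demo.json | grep -oE "$pattern")
# 2. Pass the file name as an argument to grep.
via_arg=$(grep -oE "$pattern" demo.json)
# 3. Redirect the file to grep's standard input.
via_stdin=$(grep -oE "$pattern" < demo.json)

printf '%s\n' "$via_pipe" "$via_arg" "$via_stdin"
```

All three variables end up holding the same address. The only practical difference is that the first form starts an additional cat process.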

 

Hi! I'm a tech journalist, getting my feet wet in ethical hacking. What you will find here is me taking notes on the tools and techniques I’m learning and offering answers to the questions I had when I first got started not so very long ago.

2 Comments

  1. Cairo
    October 23, 2022

    Here's why this doesn't work:
    grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" cat response.json
    grep treats cat and response.json as two separate files. You probably don't have a file named cat, so grep reports an error for it.
    Try this instead:

    grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z]\+" < response.json

    That passes response.json to grep on stdin, and it works fine.

    1. Edward Lichtner
      October 23, 2022

      You’re totally right. Thanks for clarifying. I adjusted the article accordingly. 👍
