Generative AI Trends

All the latest news and trends
Prompting ChatGPT to write MongoDB aggregation pipeline

shared by Naveen Kohli on Saturday, July 8, 2023 •

Few days ago I had to develop some threat analysis for our security monitoring service. We collect lot of data related to threat incidents in real time. The data is stored in MongoDB. I needed to write code to develop data sets to prepare some views that will provide answers to following statements.

  • What is percentage of incidents from each geolocation?
  • What are monthly trends of the threat incidents?
  • What is frequency of threats during each hour of the day?

These are very important metrics for a person who has to make some critical decisions about security controls for an organization. I decided to use ChatGTP to write this code for me. In my earlier article What is Generative AI, I talked about importance of good prompting to get good answers. So I decided to try this out on a task related to writing some code to extract data from MongoDB. Following is the conversation I had with ChatGPT.

how can I group the database records by month over a period of multiple years

As you can see, this is not a good prompt. I did not provide context regarding use of MongoDB. I got the following code ChatGPT.

SELECT EXTRACT(MONTH FROM date_column) AS month,
       EXTRACT(YEAR FROM date_column) AS year,
       COUNT(*) AS record_count
FROM your_table
GROUP BY EXTRACT(MONTH FROM date_column), EXTRACT(YEAR FROM date_column)
ORDER BY year, month;

This is not what I had in mind. I can't blame ChatGPT because I did not provide it clear context and what I had in mind. Now, I changed the prompt.

write this query for mongodb database

In the new prompt, I told it about that I am using MongoDB. Notice that I did not repeat my previous question where I told it about data grouping and aggregation. ChatGPT will carry on the conversation with you from where it left of. It knows that I am asking new question in the context of type of data extraction I am asking it to do. This time I got the result I was looking for.

db.yourCollection.aggregate([
  {
    $group: {
      _id: {
        month: { $month: "$date_column" },
        year: { $year: "$date_column" }
      },
      record_count: { $sum: 1 }
    }
  },
  {
    $sort: {
      "_id.year": 1,
      "_id.month": 1
    }
  }
]);

So far so good. I got the information about how MongoDB query will look like. I also got the explanation of the query as well. That was very helpful. I still was not close to what I had in mind. Before asking for the type of data analysis I wanted to perform, I decided to prompt ChatGPT to show me how will I use this query in my .Net code.

write this for .Net driver

Now I am carrying on the conversation in the same context from where I started. ChatGPT knows that all I am asking it to do is to write the previous data query for the application that is using .Net driver for MongoDB.

using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("yourDatabase");
var collection = database.GetCollection("yourCollection");

var pipeline = new BsonDocument[]
{
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", new BsonDocument
            {
                { "month", new BsonDocument("$month", "$date_column") },
                { "year", new BsonDocument("$year", "$date_column") }
            }
        },
        { "record_count", new BsonDocument("$sum", 1) }
    }),
    new BsonDocument("$sort", new BsonDocument
    {
        { "_id.year", 1 },
        { "_id.month", 1 }
    })
};

var result = collection.Aggregate(pipeline).ToList();

I am getting closer and closer to what I am planning to do. I have some prototype code that seems reasonable. Before I proceed any further, I plugged the code into my .Net service. Code compiled and generated output. It is not the output that I needed to answer my original questions.

Now, I needed to prompt ChatGPT about the exact data extraction that I want to do. Following is the conversation I had with ChatGPT.

I need to group the record for each hour in a day of month and year
I need to aggregate this data for multiple years
I am getting error Duplicate element name
When calling GetValue for BsonDocument, inner property is returning null

Above conversation happened in steps. As you can see that I had to tell ChatGPT that the suggestions it provided are not working. I told it about the exact error I got. After knowing about the error, ChatGPT corrected the code. Eventually, I got to the point where I had some prototype code from ChatGPT that I could use as starting point and build the final solution.

You may start thinking that why it requires so many prompts to get an answer. My intent behind this was to show how one can have conversation with Generative AI engine. How you can refine your prompts if the answers are not relevant. If you are getting some errors, do not give up by thinking that Generative AI is not going to be able to help. Tell it about the exact problem you have run into. Give a chance to Generative AI to find a solution to fix the error.

I used the prototype to understand how MongoDB pipeline works. Wrote the code for my repository and built the final solution for security threat analysis.

I will conclude with some words of caution.

  • Do not copy the code as is from Generative AI in your production software. This code could have come from a repository that has terms of use that does not allow you to use it.
  • The code may not meet coding standards of your organization.
  • The code may not meet security standards of your organization.
  • This code may generate run time errors under certain scenarios.

Tags: generative ai chatgpt

Share this post
Recent Posts
July9
Limitations and risks associated with ChatGPT

Shared by Naveen Kohli

Time: 8:34 AM

July8
July8
July7
What is text to image Generative AI

Shared by Naveen Kohli

Time: 10:41 AM

Popular AI Tools