Items 0

CSV Match & Merge

 

‘Cross check two lists from two different CSV files’

 

The challenge we are solving:

If you have two long tables in separate CSV files, and you would like to cross check the values in one column against the values in another column in a different CSV file, what would you do? As this might take you very long time if you want to do it manually, and of course, depending on the length of the columns/lists.

Overview:

This tool will merge the data of two selected CSV files, and generate an output file also in CSV, after performing the required matching and merging criteria specified by the user.

The primary usage of this tool is to cross check two columns (lists) from two different CSV files, and find either an exact match, or a value in one column (list) in one file containing a value from the other column in the other file.   

The user will specify the matching criteria, which is either “Equals” or “Contained in or Equals”, along with specifying only one column in each file to perform the condition at, after specifying if the selected CSV files have headers for the tables or not.

The user will also specify the columns he/she would like to include in the output table, from “4. Select output” explained later.

The user has the option to use saved configuration parameters from an XML file instead of specifying the parameters/options manually very time from the GUI.

Below are few examples, scenarios, and a use case to further explain the features and advantages of this tool.

Example 1:

In this example we have 3 different CSV files. We will be selecting 2 out of these 3 CSV files and perform different matching and merging scenarios. 

Each file contains the following data shown in a tabular format:

File_1.csv:

Field.X

Field.Y

Field.Z

ID

ABC.com

USA

 

1

XYZ.net

France

45

2

ABC.com

China

34

3

txt.com

UAE

5

4

Ola.net

Russian

 

5

3XYZ.net

Egypt

45

6

Hxxp://ht.hhh.com/whatever

India

 

7

When opened with Notepad,

you will be able to tell the “Delimiter” type, which is comma “,” in this case.   

 

File_2.csv:

Field.1

Field.2

Field.3

ID

x.ABC.com

22

432181

1

tdsast123.XYZ.net

4

 

2

safas.txt.com

10

33442

3

xxx.Ola.com

23

 

4

bla bla

 

31112

5

y.ABC.com

 

9191

6

ht.hhh.com

 

 

7

When opened with Notepad:

 

File_3.csv:

Hxxp://ht.hhh.com/something?p=hxxp://eu.disk.com

XYZ.net

Txt.com

Note that this file does not have a Header like the two previous files.


Scenario 1:

  1. The user will select File_1.csv and File_2.csv shown above.
  2. File_1.csv has Header = Yes (selected by the user)
  3. File_2.csv has Header = Yes (selected by the user)
  4. Match criteria specified by user is:  { Field.X in File_1 ‘Contained in or Equals’  Field.1 in File_2 }
  5. Columns to be included in the output file as per “4. Selected output” = All

The output for this scenario will be a CSV file containing the following data, after clicking on “6. Merge CSV files”:

The matching criteria is not case sensitive, such that (XYZ = xyz).

Note:

Notice that the “Hxxp://ht.hhh.com/whatever” in Field.X in File_2, was considered by the tool as “ht.hhh.com”, which is the text between “Hxxp://” and “/”.  This is created by design, as the tool was initially created for DNS Domain Name matching between a list of malicious DNS domains against data from “dns.log”. This is the only exception.

 

Scenario 2:

  1.  The user will select “File_1.csv” and “File_3.csv”
  2.  File_1.csv has Header = Yes
  3.  File_3.csv has Header = No
  4.  Match criteria specified by user is:  {Column1 in File 3 ‘Contains or equal’ Field.X in File 1}
  5.  Columns to be included in the output file: Column1 from File_3, Filed.Y from File_1, Filed.Z from File_1

Note: Since “File_3.csv” didn’t contain a header with specified column names; the tool has created a column name and called it Column1. This is to provide the user with the ability to specify which columns he/she wants to apply the match condition on, despite not having headers, and which columns to be included in the output file.

This means that ‘File_3.csv’ would be processed as if it looks like:

Cloumn1

ht.hhh.com

XYZ.net

Txt.com

This table won’t be displayed to the user.

Note:

Notice that the “Hxxp://ht.hhh.com/something?p=hxxp://eu.disk.com”  in ‘File_3.csv’, was considered by the tool as “ht.hhh.com”, which is the text between “Hxxp://” and “/”.  This is created by design, as the tool was initially created for DNS Domain Name matching between a list of malicious DNS domains against data from “dns.log”. This is the only exception. 

 

Output for this scenario will be:

The matching criteria is not case sensitive, such that (XYZ = xyz).

How to use this tool:

Once you open the tool, you will need to:

1)     Specify the “Delimiter” type for each file you need to select.  One way of knowing the type of the “Delimiter”, is to open the file with a text editor (like Notepad) and see the type of the separator between the data, such as  means that the delimiter is the “Comma” –i.e. “,”.

 

2)     Specify if the files, which you will select contain headers or not. Once done, select the two CSV files by clicking on the “Open File X” button 

 

3)     If the selected CSV file does not contain a header (as specified by user), then the tool will create a sequential column names as a header, such that the user will refer to when using the tool. The referenced column names will be: Column1, Column2, Column3, … ColumnX as shown in this example when the user is specifying the match criteria: 

 

4)     Then specify the columns you need to include in the output file from “Select output”  and click “Apply”. You can leave this option to the default settings, which will include ALL the columns in both files. 

 

5)     Click on “Save to file” to specify the output CSV file name and location. 

 

6)     Finally, click on “Merge CSV files” for the tool to start merging the two files into 1 table and save it in a new CSV output file.

Notes:

1)     The processing time required by the tool highly depends on the number of rows existing in the selected two columns as part of the “match criteria”.  For example, if you have the “Match Criteria” specified by the user as:  {Column1 in File_A ‘Contained in or EqualsField.X in File_B }, while Column1  has 1024 rows, and Field.X has 2500 rows, then the total comparisons processed by the tool will be (1024 x 2500) 2,560,000 comparisons. This will take several minutes depending on the machine resource performance and availability.

2)     The code will only take the value between “Hxxp:// “ and “/”, as shown in the examples. So “Hxxp://ht.hhh.com/something” will become “ht.hhh.com” regardless if it is in File1 or File2.

3)     The tool is not case sensitive.

 

Using the XML configuration file:

The user can save all the settings in an XML configuration file, such that he/she can specify the configuration settings and parameters by simply selecting the XML file, without the need to specify almost every parameter manually from the GUI. Yet the user will be able to modify any parameter after selecting the XML configuration file, such as file name(s), match criteria…. etc. To load the XML configuration file, click on “Load XML”  

XML Example 1:

        <settings>

        <files>

                      <FilePath1 value=.ile_1.csv/>

                      <FilePath2 value=.ile_2.csv/>

                      <ResultFilePath value=.XML_output_1_2.csv/>

             </files>

             <headers>

                      <Header1 exists=True/>

                      <Header2 exists=True/>

             </headers>

             <Matching value=Contained_In_Or_Equal>

                      <Column1 value=field.x/>

                      <Column2 value=field.1/>

             </Matching>

             <OutputFile1Properties>

                           ALL

             </OutputFile1Properties>

                  <OutputFile2Properties>

                           ALL

                  </OutputFile2Properties>

                  <Delimiters>

                           <Delimiter1 value=,/>

                           <Delimiter2 value=, />

                  </Delimiters>

    </settings>

The above XML configuration will give the same exact result as the first example given in this document.

 

XML Example 2:

    <settings>

        <files>

                      <FilePath1 value=C:merge_csv_examplesile_1.csv/>

                      <FilePath2 value=C:merge_csv_examplesile_3.csv/>

                      <ResultFilePath value=C:merge_csv_examplesXML_2_results.csv/>

             </files>

             <headers>

                      <Header1 exists=True/>

                      <Header2 exists=False/>

             </headers>

             <Matching value=Equal>

                      <Column1 value=Field.X/>

                      <Column2 value=Column1/>

             </Matching>

             <OutputFile1Properties>

                           Field.Y, Field.Z

             </OutputFile1Properties>

                  <OutputFile2Properties>

                           Column1

                  </OutputFile2Properties>

                  <Delimiters>

                           <Delimiter1 value=,/>

                           <Delimiter2 value=, />

                  </Delimiters>

    </settings>

 

The above XML configuration will give the same exact result as the 2nd example given in this document.

    

Pre-requisites and Supported Platforms:

1-     The application supports 32-bit and 64-bit Windows platforms.

2-     MS .NET Framework 4.0 is required, which can be downloaded from https://www.microsoft.com/en-us/download/details.aspx?id=17851

3-     Windows Server 2008 R2 and above, or previous version with Powershell 4.0 installed and enabled.

For Windows Server 2003 SP2, you can install Powershell from the following links:
x86: http://www.microsoft.com/downloads/details.aspx?FamilyId=f002462b-c8f2-417a-92a3-287f5f81407e
x64: http://www.microsoft.com/downloads/details.aspx?FamilyId=909bbcf1-bd78-4e03-8c83-69434717e551

Online Tool:

This function is available at http://www.networkstr.com/dns-domain-matching-noiframe.php?certificate=11

Note: The online tool supports CSV files smaller than 2 MB in total, yet you can see the output displayed online, such that you can either download the output CSV file or you can copy it directly from the page, which means that you can reuse it in a HTML format.

The online tool works as follows:

1-     The user will upload 2 files, which contain data in a tabular format. The files can be in CSV, TXT, or Excel.

2-     The user will specify if the files have Headers or not

3-     The user will select only one column name in the 1st file and another column name from the 2nd file. Then the user will specify the matching condition.

4-     The code will then create a 3rd table/CSV file, where columns of both selected files will be in the same output table, based on the matching criteria.

 

Use Case: Identify Malicious Activities By Matching Bad Domains With DNS Log Data

Recommend to read: http://www.networkstr.com/dnscentric/identify_threats_with_dns_logging

In this use case, we will use this tool to perform domain name matching between two different CSV files, ‘dns.log’ (after converting it to CSV) and a list of malicious domains in CSV, then merge the two.

This use case will be very useful to identify what users are trying to communicate with (example: malicious/undesired domains), by cross matching a list of domains with your DNS log data, after converting the ‘dns.log’ to CSV.

The following is an example on how to identify the machines that are trying to communicate with malicious and suspicion sites, or infected by malwares:

1-     Enable DNS Debug Logging on your Windows DNS Server. More at http://www.networkstr.com/dnscentric/enabling_dns_debug_logging

2-     Convert the ‘dns.log’ file generated by your Windows DNS server to CSV. Get the tool from http://www.networkstr.com/dnscentric/dns_log_converter. Example

Note: In order to better manage your ‘dns.log’ files, use the “Auto Archive DNS Debug Logs” tool available at http://www.networkstr.com/dnscentric/auto_archive_dns_log_files

3-     Get a list of the latest malicious domains in CSV. Such as:

a.     http://www.malwaredomainlist.com/ or directly from http://www.malwaredomainlist.com/updatescsv.php  

b.     https://isc.sans.edu/suspicious_domains.html

c.     https://zeltser.com/malicious-ip-blocklists/

d.     …. Many other sites that list the malicious domains are available online

e.     Or you can have your own list of domains that you need to identify the source of the DNS queries for.

Example: 

4-     Once you have the DNS log data converted to CSV, for example “dns.csv”, and you have the list of malicious/monitored domains, use the “Merge_CSV” tool to cross check the list with your DNS Log data –i.e ‘dns.log’. The result will be saved in CSV. The output will then tell you which machines communicated with malicious domains for example.

 

Example:

Result of this example:

 

5-     For even a further analysis, if needed, you can convert the output CSV (created by the Merge_CSV tool) into a DNS Debugging Log format, such as “dns_malware_domains.log’ for example, and import it to Windows DNS Log Analyser. The Windows DNS Log Analyser is a free tool, which can help you in examining Microsoft Windows DNS log files in a graphical presentation with useful statistics.

6-    Last but not least, you can block the malicious domains queried by DNS clients at your DNS Server. More at: http://www.networkstr.com/domain-dns-blacklisting-sinkhole/tool

 

The free trial is available till 15/Sep/2017.

If you have logged in, you can download the User Guide (Solution Note) below to learn more about what you can do with both editions and how to use the tools. If you are not logged in, you won't be able to see the below User Guides.

Please Login to your account or Register for a new account in order to access the Downloads section.

Comments
Comment