Thursday, March 26, 2009

PPTX extractor code

Summary: The main idea behind our simple program is that a .pptx file is simply a zip archive (you can try unzipping one yourself). The archive contains a lot of XML files, so we need to find the ones that hold the text.

Note: You can use SharpZipLib to unzip files in your project. Get it from http://icsharpcode.net/OpenSource/SharpZipLib

I found that the text is in %file%/ppt/slides, the ppt/slides folder inside the archive, with one XML file per slide. The XML tag that contains the text data is "a:t", so we need an XML reader to read that data and write it to our rich text box.

Then I added some dialogs to turn it into a real program.

The main code:

    try
    {
        // FastZip instance from SharpZipLib
        FastZip unzip = new FastZip();
        // unzip into a fresh subfolder of the Windows temp folder, so slides left
        // over from a previously opened presentation are never counted or read
        string tmploc = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        // we only need the ppt/slides folder, NOT all files (kinder to slow computers)
        unzip.ExtractZip(openFileDialog1.FileName, tmploc, "ppt/slides");
        string slidesDir = Path.Combine(tmploc, @"ppt\slides");
        // count the slide XML files once; the loop stops after the last one
        int slideCount = Directory.GetFiles(slidesDir, "*.xml").Length;
        // slides are named slide1.xml, slide2.xml, ... so we loop by number
        for (int i = 1; i <= slideCount; i++)
        {
            // create a reader for the current slide; the file name changes on
            // every pass, and "using" closes the reader even if reading fails
            using (XmlReader rdr = XmlReader.Create(Path.Combine(slidesDir, "slide" + i + ".xml")))
            {
                while (rdr.Read())
                {
                    // we only care about element nodes with the tag ( a:t )
                    if (rdr.NodeType == XmlNodeType.Element && rdr.Name == "a:t")
                    {
                        // read the element contents as a string and add it to the rich text box
                        textdata.Text += rdr.ReadElementContentAsString() + "\n";
                    }
                }
            }
        }
    }
    // catch any error and show a message to the user instead of terminating the program
    catch (Exception err) { MessageBox.Show(err.Message); }

You can download the rest of the code from HERE
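
For comparison, here is a rough sketch of the same idea that never writes anything to the temp folder. It is not the code from this post: it assumes .NET 4.5 or later and swaps SharpZipLib for the built-in System.IO.Compression.ZipArchive. The class name PptxText and the method ExtractText are made up for illustration. It reads each ppt/slides/slideN.xml entry straight from the archive, sorts the slides by number, and collects the contents of every "a:t" element, matched by namespace rather than by the literal "a:" prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper class, not part of the original program.
public static class PptxText
{
    // DrawingML namespace that the "a:" prefix is bound to in slide XML.
    private const string DrawingNs =
        "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Matches "ppt/slides/slideN.xml" and captures N so slides sort numerically.
    private static readonly Regex SlidePath =
        new Regex(@"^ppt/slides/slide(\d+)\.xml$", RegexOptions.IgnoreCase);

    // Returns the text of every a:t element, slide by slide, in slide order.
    public static List<string> ExtractText(Stream pptx)
    {
        var lines = new List<string>();
        using (var zip = new ZipArchive(pptx, ZipArchiveMode.Read, true))
        {
            var slides = zip.Entries
                .Select(e => new { Entry = e, Match = SlidePath.Match(e.FullName) })
                .Where(x => x.Match.Success)
                .OrderBy(x => int.Parse(x.Match.Groups[1].Value));

            foreach (var slide in slides)
            {
                using (var stream = slide.Entry.Open())
                using (var reader = XmlReader.Create(stream))
                {
                    // ReadElementContentAsString already moves past the element,
                    // so we only call Read() when the current node is not a:t.
                    while (!reader.EOF)
                    {
                        if (reader.NodeType == XmlNodeType.Element
                            && reader.LocalName == "t"
                            && reader.NamespaceURI == DrawingNs)
                        {
                            lines.Add(reader.ReadElementContentAsString());
                        }
                        else
                        {
                            reader.Read();
                        }
                    }
                }
            }
        }
        return lines;
    }
}
```

In the program above, this could be wired up by opening openFileDialog1.FileName with File.OpenRead and assigning string.Join("\n", ...) of the result to textdata.Text. On .NET Framework the project would also need a reference to the System.IO.Compression assembly.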
