Note: All programs on this site need .NET Framework 2.
Many times we need to extract all the text from PowerPoint files and add some basic decoration to it. Unfortunately, PowerPoint only supports exporting the outline as .rtf, not all of the text. In other situations we need to extract media files from the presentation, to modify or reuse them.

I made this simple pptx file extractor (pptx 2 text), which extracts text (and media, if any exists) from PowerPoint 2007 files to an *.rtf (rich text format) text file and allows you to add some basic decoration to the extracted text, such as changing the fonts and colors.

I provide the explained source code, so you can easily edit it to fit your needs. Visual Basic users can use this converter: http://www.developerfusion.com/tools/convert/csharp-to-vb/ or you can write your wishes and I'll try to edit it for you.

This is version 1; there may or may not be other versions, so keep track of the blog for the latest news and programs.

Note: in the case of tables, the extracted text will look as in this photo.

The code is available under the GNU/GPL license, so you can modify and improve it or even use it freely in your programs, but your code will then also be under GNU/GPL and you should provide it for free too.

Download here

Note: when you download, an alert will appear saying that this file extension may harm your computer, but that is normal, as it is an .exe file.

Version 2 is coming soon with a lot of options... Please comment about any bugs or new ideas.

Thursday, March 26, 2009
PPTX extractor code
Summary:
The main idea behind our simple program is that a .pptx file is simply a zipped file (you can try to unzip it).
This zipped file contains a lot of XML files, so we need to find the files that contain the text.
Note: You can use SharpZipLib to unzip files in your project, from http://icsharpcode.net/OpenSource/SharpZipLib
I found that the text is in %file%ppt/slides
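To see which files inside a pptx hold the slides without unzipping anything to disk, the archive entries can be listed directly. This is only a sketch: it assumes .NET Framework 4.5 or later for System.IO.Compression.ZipArchive, which is newer than the .NET Framework 2 this program targets (the program itself uses SharpZipLib), and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the original program.
static class PptxEntriesSketch
{
    // Returns the names of the slide parts (ppt/slides/slideN.xml) in a pptx stream.
    public static List<string> ListSlideEntries(Stream pptx)
    {
        List<string> names = new List<string>();
        // leaveOpen = true: the caller keeps ownership of the stream.
        using (ZipArchive zip = new ZipArchive(pptx, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Entry names inside a zip use forward slashes.
                // The "slide" prefix skips ppt/slides/_rels/ and other folders.
                if (entry.FullName.StartsWith("ppt/slides/slide", StringComparison.OrdinalIgnoreCase)
                    && entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                {
                    names.Add(entry.FullName);
                }
            }
        }
        return names;
    }
}
```

Calling PptxEntriesSketch.ListSlideEntries(File.OpenRead(path)) on a presentation should return one entry per slide.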
The XML tag that contains the text data is "a:t", so we need an XML reader to read this data and write it to our rich text box.
Then I added some dialogs to make it a real program.
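The a:t lookup can be tried on its own, without a pptx file or any dialogs. The sketch below reads slide XML from any TextReader and collects the text of every a:t element. It matches on the element's namespace rather than on the literal "a:" prefix, on the assumption that the prefix is only a convention and the DrawingML namespace URI is what actually identifies the element. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original program.
static class SlideTextSketch
{
    // Namespace that the "a" prefix is normally bound to in slide XML.
    const string DrawingMlNs = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns the content of every <a:t> element, each followed by a newline.
    public static string ExtractText(TextReader slideXml)
    {
        StringBuilder text = new StringBuilder();
        using (XmlReader rdr = XmlReader.Create(slideXml))
        {
            while (!rdr.EOF)
            {
                if (rdr.NodeType == XmlNodeType.Element
                    && rdr.LocalName == "t"
                    && rdr.NamespaceURI == DrawingMlNs)
                {
                    // ReadElementContentAsString already moves past the end tag,
                    // so the reader must not be advanced again on this branch.
                    text.Append(rdr.ReadElementContentAsString()).Append('\n');
                }
                else
                {
                    rdr.Read();
                }
            }
        }
        return text.ToString();
    }
}
```

For a slide with two text runs "Hello" and "World", ExtractText returns "Hello\nWorld\n". Collecting into a StringBuilder and assigning to the rich text box once also avoids rebuilding the box's text for every run.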
The main code:
// needs: using System; using System.IO; using System.Xml; using System.Windows.Forms;
//        using ICSharpCode.SharpZipLib.Zip;
try
{
    // instance of the FastZip class from SharpZipLib
    FastZip unzip = new FastZip();
    // unzip to the Windows temp folder
    string tmploc = Path.GetTempPath();
    // extract only the ppt/slides folder, NOT the whole archive, to go easy on slow computers
    unzip.ExtractZip(openFileDialog1.FileName, tmploc, "ppt/slides");
    // count the slide XML files once; this count stops the loop after the last XML file
    int slideCount = Directory.GetFiles(tmploc + "ppt\\slides", "*.xml").Length;
    for (int i = 1; i <= slideCount; i++)
    {
        // create a reader for the current slide; the file name changes on every pass
        using (XmlReader rdr = XmlReader.Create(tmploc + "ppt\\slides\\slide" + i + ".xml"))
        {
            while (!rdr.EOF)
            {
                // we only want nodes of type "element" with the tag ( a:t )
                if (rdr.NodeType == XmlNodeType.Element && rdr.Name == "a:t")
                {
                    // read the element contents as a string and add it to the rich text box;
                    // this call already moves the reader past the element, so no Read() here
                    textdata.Text += rdr.ReadElementContentAsString() + "\n";
                }
                else
                {
                    rdr.Read();
                }
            }
        } // the reader is closed here, before the file location changes on the next pass
    }
}
// catch any error and show a message to the user instead of terminating the program
catch (Exception err) { MessageBox.Show(err.Message); }
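The loop above assumes the files are named slide1.xml to slideN.xml with no gaps. A more defensive variant, sketched below, takes whatever *.xml files Directory.GetFiles returns and sorts them by the number in the file name, so slide10.xml comes after slide2.xml. The helper names are hypothetical, and the file number is only an approximation of slide order: the order in which slides are shown is recorded in ppt/presentation.xml, which this sketch does not read.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
static class SlideOrderSketch
{
    // Extracts N from a name such as "slide12.xml".
    // Returns int.MaxValue when the name has no trailing number, so such files sort last.
    public static int SlideNumber(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        int start = name.Length;
        while (start > 0 && char.IsDigit(name[start - 1]))
        {
            start--;
        }
        int n;
        return int.TryParse(name.Substring(start), out n) ? n : int.MaxValue;
    }

    // Returns a copy of the paths sorted by slide number; the input array is left untouched.
    public static string[] SortBySlideNumber(string[] paths)
    {
        string[] sorted = (string[])paths.Clone();
        Array.Sort(sorted, delegate(string a, string b)
        {
            return SlideNumber(a).CompareTo(SlideNumber(b));
        });
        return sorted;
    }
}
```

In the main code this would mean looping over SlideOrderSketch.SortBySlideNumber(Directory.GetFiles(tmploc + "ppt\\slides", "*.xml")) with foreach instead of building each file name from the counter.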
Labels: Simple Programs