Friday, March 9, 2012

Question on IFilters

Hi,
Are IFilters COM/CORBA objects that I could call from my code? I'm trying to find out whether I can call them from my Java code to extract text from various document formats, such as PDF and MS Office, before storing them in the database. The documents we want to full-text search average 100 MB in size, and I'm looking for ways to cut down the size before storing them in the SQL Server database.
Appreciate your reply,
Anantha
************************************************** ********************
Sent via Fuzzy Software @. http://www.fuzzysoftware.com/
Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...
Anantha,
The best information source for this is the MSDN Platform SDK "Using Custom
Filters with Indexing Service" at:
http://msdn.microsoft.com/library/de...ufilt_912d.asp
Specifically, click on "Filter Samples" -> HtmlProp Sample:
http://msdn.microsoft.com/library/de...ufilt_0lwl.asp
This provides examples of how to "extract value-type properties. It converts
HTML meta properties to data types other than strings as specified by a
configuration file."
Hope that helps!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||They are COM objects, so you can call them from code. Here is an example of how to call them:
http://sqljunkies.com/HowTo/C4AC6E97...E63EC99B6.scuk
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
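For readers calling a filter from Java, the read loop has roughly the shape below. This is only a sketch under stated assumptions: `ChunkSource`, `nextChunk`, `nextText`, and `inMemory` are hypothetical names invented for illustration, not part of any real library. A real wrapper would have to reach the COM `IFilter` interface through JNI or a COM bridge, call `Init` first, and map `GetChunk` returning `FILTER_E_END_OF_CHUNKS` and `GetText` returning `FILTER_E_NO_MORE_TEXT` onto the `false` and `null` results used here. The sketch also ignores value chunks and treats every chunk as text.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of the IFilter read loop. ChunkSource is a HYPOTHETICAL Java-side
 * wrapper, not a real API; a real one would forward each call to the COM
 * IFilter interface through JNI or a COM bridge.
 */
public class FilterLoopSketch {

    /** Hypothetical wrapper mirroring IFilter::GetChunk and IFilter::GetText. */
    interface ChunkSource {
        /** Moves to the next chunk; false stands in for FILTER_E_END_OF_CHUNKS. */
        boolean nextChunk();

        /** Next piece of text in the current chunk; null stands in for FILTER_E_NO_MORE_TEXT. */
        String nextText();
    }

    /** Drains every chunk and ends each chunk's text with a newline. */
    static String extractText(ChunkSource source) {
        StringBuilder out = new StringBuilder();
        while (source.nextChunk()) {
            String piece;
            while ((piece = source.nextText()) != null) {
                out.append(piece);
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** In-memory stand-in (an assumption for this sketch) so the loop runs without COM. */
    static ChunkSource inMemory(List<List<String>> chunks) {
        Iterator<List<String>> chunkIt = chunks.iterator();
        return new ChunkSource() {
            private Iterator<String> textIt = null;

            public boolean nextChunk() {
                if (!chunkIt.hasNext()) {
                    return false;
                }
                textIt = chunkIt.next().iterator();
                return true;
            }

            public String nextText() {
                return (textIt != null && textIt.hasNext()) ? textIt.next() : null;
            }
        };
    }

    public static void main(String[] args) {
        ChunkSource fake = inMemory(Arrays.asList(
                Arrays.asList("Quarterly ", "report"),
                Arrays.asList("Revenue rose.")));
        System.out.print(extractText(fake));
    }
}
```

Keeping the loop behind a small interface like this means the extraction logic can be exercised with fake data first, and the COM-specific wrapper can be swapped in later without touching the code that stores the text.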
|||Thanks John/Hilary for your reply and links.
I have another question, though. If I store only txt documents in SQL Server (all other documents are converted to txt before being stored in the database), does the indexing service still use a filter (in this case the standard filter) to extract textual data for indexing purposes?
Anantha
|||Yes, it uses the default or null IFilter.
Hilary Cotter
|||You're welcome, Anantha,
Yes, it does. However, keep in mind that how you import or insert the text can matter, as can where you store the row text. You may want to consider
using TextCopy.exe, which ships with SQL Server 2000.
Thanks,
John
