Get Raw Text From Html

I'm at quite a basic level of Android development. I would like to get the text from a page such as 'http://www.google.com'. (The page I will be using will only have text, so no picture

Solution 1:

Judging from the sample code you gave, you are not even reading the response from the request. I would get the HTML with the following code:

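import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Open a connection and read the response body line by line.
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
                        new InputStreamReader(
                            conn.getInputStream()));
StringBuilder buffer = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null) {
    // readLine() strips the line terminator, so add it back.
    buffer.append(inputLine).append('\n');
}
in.close();
System.out.println(buffer.toString());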

From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy is a good library for this; however, I have never used any Java HTML parsing libraries.
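
If the page really is plain text with only light markup, a rough sketch of the "keep only the text" step could look like the code below. This is not JTidy or any other real parser: it is a naive regex-based tag stripper using only java.util.regex, and the class name NaiveHtmlText and the method toText are made up for illustration. It decodes only a handful of common entities and will get malformed or unusual HTML wrong, so treat it as a stopgap under those assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
public class NaiveHtmlText {
    // Whole <script>...</script> and <style>...</style> blocks,
    // matched case-insensitively and across line breaks.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Any remaining tag, including comments and doctype declarations.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toText(String html) {
        // Drop script/style content first so their bodies do not leak into the text.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Replace every tag with a space so adjacent words stay separated.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; goes last to avoid double decoding.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace into single spaces.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

You would call it on the downloaded string, for example NaiveHtmlText.toText(buffer.toString()). For anything beyond simple pages, a proper HTML parsing library is the safer choice.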

Solution 2:

Do you want to extract text from an HTML file? You can use a specialized tool such as the Jericho HTML Parser library. I'm not sure whether it can be used directly in an Android app, since it is quite big, but it is open source, so you can take only the code you need for your task.

Solution 3:

Here is one way:

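import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Downloads the page at urlString and returns its body as a single string.
public String scrape(String urlString) throws Exception {
   URL url = new URL(urlString);
   URLConnection connection = url.openConnection();
   StringBuilder data = new StringBuilder();

   // try-with-resources closes the reader even if reading fails.
   try (BufferedReader reader = new BufferedReader(new InputStreamReader(
         connection.getInputStream()))) {
      String line;
      while ((line = reader.readLine()) != null) {
         data.append(line).append("\n");
      }
   }

   return data.toString();
}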

Here is another.
