
Automating Blog Tasks – Blog Shop Talk

Automating Blog Tasks – Bring All Your Skills to the Table

Every week, I publish my top five favorite posts from other bloggers. It’s a feature I call Other Posts to Crow About. My goal is to draw attention to posts my readers might like, but from blogs other than mine. I wrote a post that talks about how I format the posts, as well as the approach I take to marketing them on social media. Now, I’d like to talk about how I get through all those sites. It has to do with automating blog tasks: in this case, going through my list of sites.

To write that post, I check all of the blogs in the Massive List of Anime Sites. Currently, it contains around 360 sites. I allocate about five hours every Friday night to get through all of them. If you’re one of those folks who are gifted in math, you might have concluded that works out to about 72 sites an hour, or about 1.2 a minute. You might be mildly curious how I do that. At least I hope you are, or this post isn’t going to keep your attention!

So if that’s not at all interesting to you, please feel free to bail!

Automating Blog Tasks: IntelliJ and Java prove their usefulness again and again.

Someday, I’ll have to give up programming. Until that day comes, I still enjoy an occasional foray into Java and IntelliJ.

The real point of the post isn’t the solution I came up with. It’s the idea that you have skills beyond writing. Maybe you draw or write programs. Maybe you host sock-puppet plays. I mention the last because I would love to see someone perform anime reviews with sock puppets.

Be that as it may, I used to write computer programs for a living. So, that’s the skill I brought to the blogging table! I hope that it’ll help you think about how you might approach some of the issues you’re facing.

360 Sites in 300 Minutes

When the Massive List of Anime Sites contained around 200 sites, getting through all of them in an evening was easy peasy. After I topped 300, though, I noticed that some of my Friday night sessions had started to expand into Saturday morning sessions. Since I’m trying to write a novel on weekends (plus Tuesdays!), I had to get a handle on that.

Now, I could have said, “Screw it! Random sampling wins the day!” Whether I check every site or just randomly sample them, who would know? Well, I would know, and it would show in my writing. I’m stupidly honest enough to say something like, “I totally did not just sample the blog sites this week!” And then you’d trust me even less than you do now.

So I thought about the most time-consuming portion of the process. Or, more precisely, the most time-consuming portion that didn’t contribute directly to me reading the posts. I could save a ton of time not reading anything, but talk about defeating the purpose! As you might have guessed, the most time-consuming part of the process is going to the Massive List of Anime Sites, finding the next site to click on, clicking on it, moving to that tab, then reading the site. Then, I have to close the tab and find the next site. Rinse and repeat.

After the third hour, I was surprised how often I accidentally closed more than one tab. Or lost my place. So I started thinking about how I’d fix that.

Automating Blog Tasks: Sharing Ideas Helps

Do you remember when Irina from I Drink and Watch Anime used to put together weekly roundups of what blogs were reviewing what shows — like this post, Anime Winter 2019 – Week 9 Roundup!? Creating each of those posts consumed an enormous amount of time. So, trying to be helpful, I came up with a prototype batch/script file to automate opening the sites so she could see where the most recent reviews were.

Automating Blog Tasks: 50 tabs goes by faster than you think!

50 tabs looks like a lot, but you’d be surprised how fast you can get through them when you don’t have to open them one at a time!

The idea wasn’t bad, but it had a fatal flaw: it had to be maintained manually. To be useful, it needed every site entered into the code. After that initial investment, it might have saved a little time, but the investment itself really sucked. So I’m afraid it wasn’t useful. But I kept the idea in the back of my mind, because I thought it could be useful if I could solve that issue.

Then it hit me: I already maintain the Massive List of Sites. I could capture the HTML code behind the Massive List of Anime Sites, feed it into a Java program, and create script files to open all 360+ sites for me!

Enter the Not Entirely Terrible Java Application

The workflow looks like this:

  1. I save the HTML source for the Massive List of Anime Sites to a temporary directory on my iMac
    1. I edit that HTML so only the code for the unordered list of the active anime blogs remains
    2. Basically, I keep the opening <ul> tag and everything up to its closing </ul> tag
  2. I run my Java program
    1. The program looks at each entry in the unordered list; the entries are enclosed in list item tags, <li></li>
    2. The program carves the 360+ sites into manageable chunks of 50; that means a script for every 50 sites
    3. For each site, the program creates a script line to open the anime site in a separate browser tab
  3. The program leaves the individual scripts in a temporary directory, and I open a Terminal prompt to get to those scripts
  4. As I go through all 50 tabs looking for posts I like, I note my favorites in a text editor
  5. I keep running the next script until I run out of them
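The parsing in step 2 boils down to splitting each line on `<li` and then pulling the `href` value out of every fragment that contains one. Here’s a self-contained sketch of just that extraction step — the class name and the sample HTML snippet are mine, not from the actual program, but the split-and-regex approach is the same one the code at the bottom of this post uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractorDemo
{
    // Split the line on "<li", then regex the href out of every
    // fragment that contains one
    public static List<String> extractHrefs(String strLine)
    {
        List<String> list = new ArrayList<String>();
        Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.DOTALL);
        String[] arrLines = strLine.split("<li");
        for (int x = 0; x < arrLines.length; x++)
        {
            if (arrLines[x].indexOf("href") > 0)
            {
                Matcher m = p.matcher(arrLines[x]);
                while (m.find())
                {
                    list.add(m.group(1));
                }
            }
        }
        return list;
    }

    public static void main(String[] args)
    {
        // Hypothetical snippet shaped like the Massive List's unordered list
        String strHtml = "<ul><li><a href=\"https://example.com/blog-a\">Blog A</a></li>"
                       + "<li><a href=\"https://example.com/blog-b\">Blog B</a></li></ul>";
        for (String strUrl : extractHrefs(strHtml))
        {
            System.out.println(strUrl);
        }
    }
}
```

Running that prints the two placeholder URLs, one per line — which is exactly the list the real program then turns into script files.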

If you’re interested in the code, I’ve reproduced it below.

Astonishingly, It Works

The net effect is that I can go through all of the anime blog sites 50 at a time. I’ve found 50 is about the maximum before my iMac starts complaining. My iMac has 16 GB of RAM, which is usually enough. If I had less, I could adjust the code to create batches of 25 or something.

Automating Blog Tasks: The number of iterations controls how many tabs an individual script will open.

The intCount % 50 (mod) is what controls how many tabs each script will open. You could adjust it to 25 or 30 or whatever.
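If I ever wanted to stop editing the code to change that number, the batching could be pulled out into its own method with the batch size as a parameter. A quick sketch of the idea — this helper class is mine, not part of the actual program:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo
{
    // Split a list of URLs into batches of the given size -- the same
    // effect as the intCount % 50 check, but with the size adjustable
    public static List<List<String>> chunk(List<String> urls, int size)
    {
        List<List<String>> chunks = new ArrayList<List<String>>();
        for (int x = 0; x < urls.size(); x += size)
        {
            chunks.add(new ArrayList<String>(urls.subList(x, Math.min(x + size, urls.size()))));
        }
        return chunks;
    }

    public static void main(String[] args)
    {
        // 360 placeholder sites, like the Massive List
        List<String> urls = new ArrayList<String>();
        for (int x = 0; x < 360; x++)
        {
            urls.add("https://example.com/site-" + x);
        }
        // 360 sites in batches of 50 means eight scripts, the last with 10 sites
        System.out.println(chunk(urls, 50).size());
    }
}
```

With that shape, switching to batches of 25 is a one-character change at the call site instead of an edit to the loop.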

The code opens tabs in Chrome; it should work just as well for Firefox or Safari if you change the name of the browser. Here’s what the script looks like:
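Based on the lines the Java program writes, each generated batch file is just a series of macOS `open` commands, something like this (the URLs here are placeholders, not actual list entries):

```shell
#!/bin/sh
# One line per site; each opens a new Chrome tab via the macOS "open" command
open -a "Google Chrome" "https://example.com/anime-blog-one"
open -a "Google Chrome" "https://example.com/anime-blog-two"
open -a "Google Chrome" "https://example.com/anime-blog-three"
# ...and so on, up to 50 sites per script
```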

It’s a really simple approach, but it saves me buckets of time.

What I like about this solution is that for the last several weeks, I’ve been able to get through all the sites in between four and five hours. Yes, I’m spending an average of 1.2 minutes per site, but more often than not, that’s enough to get a sense of what’s new. In fact, a large percentage of sites don’t even publish weekly. That means I get to that tab, see there’s nothing new, and close it.

The key idea isn’t the code. It’s the concept that there’s no telling how your skills other than writing might help you or your readers. I have a tendency to think in terms of boxes. This is my work box, so I code. This is my blog box, so I write. I thought a reminder that it’s okay to combine boxes might be helpful!

Do you have some non-writing skills you use for your blog? If so, I’d love to hear about them in the comments!

The Java Code

If you’re interested, here’s the code. I didn’t use any libraries other than what’s in Java 1.8, so it should be easy to play with:

package com.interstell.table2script;

import java.io.File;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class jobRunner
{

    // Reads the saved HTML for the Massive List and writes shell scripts,
    // one "open" command per site, in batches of 50 sites per script
    public void runJob(String strInputFile, String strOutputDirectory) throws Exception
    {
        File filInput = new File(strInputFile);
        if (!filInput.exists())
        {
            throw new Exception("Input file " + strInputFile + " does not exist");
        }

        File filOutDirectory = new File(strOutputDirectory);
        if (!filOutDirectory.isDirectory())
        {
            throw new Exception("The directory " + strOutputDirectory + " does not exist.");
        }

        // See https://www.w3schools.com/java/java_files_read.asp
        Scanner myReader = new Scanner(filInput);
        List<String> list = new ArrayList<String>();
        while (myReader.hasNextLine())
        {
            // Split each line on the list item tags, then pull the href out
            // of every fragment that contains one
            String strLine = myReader.nextLine();
            String[] arrLines = strLine.split("<li");
            for (int x = 0; x < arrLines.length; x++)
            {
                if (arrLines[x].indexOf("href") > 0)
                {
                    Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.DOTALL);
                    Matcher m = p.matcher(arrLines[x]);
                    while (m.find())
                    {
                        list.add(m.group(1));
                    }
                }
            }
        }
        myReader.close();

        int intCount = 1;

        // Note: strOutputDirectory is expected to end with a path separator
        FileWriter myWriter = new FileWriter(strOutputDirectory + "optca-" + Integer.toString(intCount) + ".sh");

        for (int x = 0; x < list.size(); x++)
        {
            String strLine = list.get(x);
            myWriter.write("open -a \"Google Chrome\" \"" + strLine + "\"\n");
            intCount++;
            // Every 50 sites, close the current script and start a new one
            if (intCount % 50 == 0)
            {
                myWriter.close();
                myWriter = new FileWriter(strOutputDirectory + "optca-" + Integer.toString(intCount) + ".sh");
            }
        }

        myWriter.close();

    }

}

16 thoughts on “Automating Blog Tasks – Blog Shop Talk”

  1. Very cool! Sounds like a nice little project you could show off on a Github profile 😅

    Ah, this takes me back to my high school days when I would write Java code much like this for assignments – from txt files, not SQL databases as is the case today. *sighs in nostalgia* Being a Node.js guy, I see a possibly easier way to do this using modules, but I won’t fault ya on that 😁

    I do have some questions though; why not use a web scraping tool instead of having to manually download the list and save it to a file? Seems a bit tedious if you follow a new site and have to redownload the HTML source.

    1. I’ve worked a bit with Node.js. I like it! Maybe I’m just old fashioned, but I prefer more strongly-typed languages.

      Or maybe I’m just set in my ways.

      Web scraping feels inelegant to me. I’d rather open up my list’s HTML and just grab everything between two tags. Takes every bit of 20 seconds. Feels more surgical to me.

      The source is really easy — just show source.

      Now, what I’d really prefer is to have my code query the WordPress database directly! But I didn’t want to open the MySQL ports to the Internet, and I didn’t want to mess with restoring the database locally.

      I already dump it every month and copy the result locally for backup, even though I have the JetPack edition that does that automatically. I like to occasionally test new plug-ins, and I prefer to do that in an environment that doesn’t endanger my “production” site!

      1. I see. Of course view source is easy to do, but I just personally think automating it would save some time in that regard.

        JavaScript does have its own strongly-typed framework too if that’s your kind of thing – it’s called TypeScript, and it pretty much does what Java or C# handle w.r.t. data types, if you’re interested. Though I’ll be honest, Node.js has made me a bit lazy to consider types and all 😐

        Yeah I’d refrain from tinkering with the production DB at all 😡 Especially with WordPress, from what I know it can be a bit weird to navigate. You’re right, opening up ANY DB server is painful to do, and more than required especially for something simple like this.

  2. “That means I get to that tab, see there’s nothing new, and close it.”

    or you see there is a new post but its in a language you don’t speak 🙈

    its always impressive to me thinking about you going through all these sites every week, makes sense you would implement ways to automate it

    someone mentioned python, and its kinda what i think of when seeing things being “automated”. im actually learning it this semester and will be including it in my final uni project, a web app for learning programming languages.

    java we did the last semester

    1. “or you see there is a new post but its in a language you don’t speak ”

      I’m tracking a few Spanish-language sites. I actually feel bad for not tracking more! Chrome’s built-in translation does a solid job. I guess I’m not confident enough in the translation to properly represent those posts. And I have no idea if customs about sharing vary in those countries. I really want this to be a positive experience, and I’d hate to accidentally insult someone!

  3. > I would love to see someone perform anime reviews with sock puppets.

    That would be so cool! Kinda like Arlo does on Youtube with Nintendo news? That reminds me of Marcelinho. He was popular I think from 2010 to 2014. A puppet that read 18+ stories and made jokes about them.

    Regarding the main subject of the post – automating blog stuff – that’s a neat solution for what you do! I didn’t know your list of sites was that big!

    A few questions though, as curiosity: did you think about using either a plain RSS reader like Feedly so you only have to go through new posts? And there’s also Goeland, a rss2email fork in Go – https://github.com/slurdge/goeland – that can send you weekly/daily email digests, so you could just see the emails.

    But yeah, I love automating stuff too, although for that I use python. I think I’m going to make a post about what I did for my previous blogs

    1. I did consider an RSS feed! Like I mentioned to Infintezenith, there’s an aspect to my Friday night visits that I didn’t talk about. I like to make sure folks’ sites are up and running. If I see a problem, I’ll try to notify them via Twitter.

      I’ve wanted to get into Python for years. It’s used a lot in big data research, and it’s one of the only major languages I have zero experience with. I think it’s a great idea for you to write such a post. I’d love to read it!

      1. Oh, that’s cool! I saw your reply there now.

        Also, I think actually browsing the sites is cooler than navigating rss feeds, right?

        I saw your list, it’s huge!! I saw my previous blog there (falconsensei.wordpress.com). If you want, my new (and I think definitive one) is geekosaur.com. I decided to move away from wordpress, and have my own domain in the end

        1. I do prefer looking at the site rather than just consuming it via an RSS feed.

          We bloggers put a lot of work into our sites. I see it as a sign of respect that I consume the site as they designed it.

          To be clear: I’m not saying anything negative about RSS feeds. I use them! For this feature, though, I want to experience the sites.

          Thanks for pointing out your site’s name! I’ve added it to the list.

          Really enjoyed your Python post, too. Cool example of interacting with Reddit!

  4. Funny, I just spent the entire day staring at JDK 1.8 code. It’s been a few years since I’ve dealt with Java: until very recently, my days consisted of Swift code. Now that I’m in a code review mindset, I see a few magic numbers in there (I’d stick the 50 at the top of the file as a variable to make it easier to change, for instance), and the class should start with a capitalised letter, but beyond this, the code itself looks pretty good 🙂

    For the future, one possible feature could be to do a GET request on the latest post and then do a calculation to see how recent it was. Then if it’s newer than a certain threshold, you could append it to a list and open the post in a new tab to read!

    1. What I should have done with the 50 is make it a command line option. Then I’d just call it with whatever number and not even have to change the code! But alas, I was lazy.

      You’re right about the class names. Someone in my past complained about me capitalizing the names, which I knew was stupid, but I wanted him to STFU, so I started a bad habit. I don’t even remember the details anymore!

      I thought about the GET option and using RSS like Falcon suggested. But I wanted to actually visit each site. Not only do I like to see the posts in the context of their site design, I also want to make sure the sites are working. I didn’t talk about this in the post because I was trying to be concise (which is astonishing for me!), but there are times I’ve caught cert errors or found the sound was broken. That gives me the chance to reach out to the owner via Twitter and make sure they know about it. Most of the time they do, but I’ve helped some folks fix an outage.

      It’s like a really inexpensive (albeit only weekly) up time monitoring service!

Please let me know what you think!
