Wednesday, January 23, 2008

MacBook Air vs. Lenovo X300

A while ago, a few friends showed me a YouTube video of Apple's release of its new ultra-thin & sexy MacBook Air.

Later on, Gizmodo had some *leaked* info & specs on Lenovo's ultra-thin ThinkPad X300.


Here's how the two stack up (+ marks an advantage, - a disadvantage):

ThinkPad X300                                          | MacBook Air
-------------------------------------------------------+------------------------------------------------------
+ Battery is swappable                                 | - Battery can't be changed
+ Can add a 2nd battery                                | - Can't add another battery
+ Built-in DVD writer                                  | - CD/DVD drive: sold as an accessory
+ Built-in fingerprint reader                          | - No fingerprint reader
+ Max of 4GB RAM @ 667MHz                              | - Max of 2GB RAM @ 667MHz
+ Max CPU speed: 2.0GHz, 800MHz FSB, 4MB L2 cache      | - Max CPU speed: 1.8GHz, 800MHz FSB, 4MB L2 cache
+ Max resolution: 1440x900 @ 128 DPI                   | - Max resolution: 1280x800
+ USB ports: 3                                         | - USB ports: 1
+ Mini PCI-Express: 2 full, 1 half                     | - Mini PCI-Express: none
+ Weight: 1.13 kg / 2.5 lbs                            | - Weight: 1.36 kg / 3 lbs
+ Antennas: Bluetooth, WiFi 802.11n, WiMAX, GPS, WWAN  | - Antennas: Bluetooth, WiFi 802.11n
+ Wired network: Gigabit Ethernet port                 | - Wired network: sold as an accessory
+ Enclosure: Magnesium-alloy, a sturdy & durable cage  | - Enclosure: Aluminum
+ Built-in camera                                      | + Built-in camera
- Video output: VGA                                    | + Video output: DVI, VGA
- HDD: 64GB Solid State Disk                           | + HDD: 64GB Solid State Disk, or 80GB PATA
- Looks: Sturdy, Rigid                                 | + Looks: Cute, Sexy
- Dimensions: 31.8 x 23.1 x 2.34 cm                    | + Dimensions: 32.5 x 22.7 x 1.94 cm
  Keyboard illumination: Backlit keyboard              |   Keyboard illumination: Backlit keyboard with sensor


Seems like the Apple crowd is sacrificing a lot here...

* Lenovo's laptop isn't out for consumers yet.
** ThinkPads were originally IBM laptops; Lenovo makes them now, instead of IBM.

Tuesday, January 22, 2008

Arabic Spam: Origination & Continuation

I remember receiving the first bunch in the late '90s and the start of the new millennium. If I'm not mistaken, the first batch of email addresses was harvested from Microsoft Hotmail's website, where they created default profiles for email users and associated them with their country, as set during the registration process. But that wasn't an easy way to collect the emails, and at the time, it wasn't worth the trouble, as companies didn't see the benefit of spam yet.

A few months later, things evolved and people started harvesting emails from email forwards. Whenever someone sends an email, they stuff all the addresses into the "TO" field, and off it goes. When a recipient clicks the forward button, the header of the old message is dumped as part of the new message (the forward), containing the addresses of all the people to whom it was sent previously.

I know 4-5 people who actually spend time cleaning that list out before sending a forward. The same 4-5 people use the "BCC" field instead of the "TO" field, so whoever receives the email doesn't see the list of addresses.

Unfortunately, this continues to this day, and emails are still being collected and spammed, because of 2 things: e-mail providers are too lazy to filter out lists of addresses, and people are too lazy to do the same!!!

If you're wondering why this is specific to Arabic spam, it's because non-Arab sites spam using spamming services! Those send emails using brute force and don't actually care whether you're an Arab or not.

Sunday, January 20, 2008

DSL on Linux

I've been using QualityNet's silver Speedtouch 330 modem for the past 4 years on Linux, and it wasn't an easy task setting it up back then.

I don't like using routers because I don't have much control over them and am limited by their restrictive functionality. Instead, I have a computer at home running as a gateway, through which I share my Internet connection with the rest of the computers on the network.

[i] Internet provided over telephone lines (DSL) uses PPP over ATM (PPPoA), while Internet over fiber uses PPP over Ethernet (PPPoE). Even if your ISP tells you to get a PPPoA router, if you have a fiber connection to your house, get a PPPoE one and put your fingers in the technician's eyes!!!

Today, I wanted to run my HP thin client as the gateway at home, instead of my old noisy & power-hungry Pentium 4 box. I have Slax as my Linux distribution, using kernel 2.6.16, which, according to the Linux SpeedTouch FAQs, has built-in support for running the USB ADSL modem! This saves me the pain I had to go through last time on kernel 2.4!

On this page, you'll find the instructions; I skipped to the part titled "The Firmware" -- this is where all the fun begins!

[i] I had to download the firmware & its extractor on my current P4 box, because Slax doesn't ship with gcc, which means you can't compile!

[i] If you don't have an existing Linux machine with gcc, I suggest you download Knoppix, compile on the live CD, then copy the 2 .bin files and the "Makefile" to a USB flash disk. Open the Makefile in a text editor to see where & how the files are supposed to go.

Create a file in /etc/ppp called "pap-secrets" and put the following inside it (including the double quotes!):
"username" "*" "password"


Moving to the next section, "Secrets" -- use the same settings as shown in the green field in the guide, but make sure to change 2 things:
0) The user should be in the form "user", without the @isp part
1) Replace 0.00 with 8.35; this is the VPI.VCI value used by the ISP in Kuwait.
Your "/etc/ppp/peers/speedtch" file should contain the following:
noipdefault
defaultroute
user "username"
noauth
updetach
usepeerdns
plugin pppoatm.so
8.35

Almost done! What's left now is to set the thing to dial automatically at boot; for that, edit the file "/etc/rc.d/rc.local" and add this line:
`which pppd` call speedtch

Note: The characters surrounding "which pppd" are grave accents, not single quotes.

[i] On kernels 2.4.x & pre-2.6.10, the modem needed to be initialized, which took 37 seconds. Now, the modem starts the handshaking process with the ISP immediately.

Tuesday, January 15, 2008

Mass Copy and On-The-Fly Rename

I call this: Mass CPR. It's not the one that involves frenching chicks. This is about copying files & directories from one location to another and renaming them on the fly, so that they reach the destination with a different name, with the renaming applied according to a pattern.

A co-worker had an external hard disk that she plugged into her Mac to hold 20GB of pictures. Later on, she wanted to view the pictures from her Windows laptop, which couldn't see the HFS+ filesystem. Luckily, I found another, FAT-formatted, disk in the Marketing department, onto which I could copy the files & directories.

The problem was that she had put colons in the file & directory names as separators, instead of hyphens, and FAT filesystems do not allow colons in file or directory names. To further complicate things, most of the files & directories were named in Japanese, and there was no temporary space to dump the data (other than the Marketing disk). Since my laptop is the only one with Linux installed, I was the only one capable of seeing both the FAT & HFS+ partitions.
I started by writing a small PHP script (in CLI), which only worked on a small segment and wasn't capable of recursion. The hunt for a program to do the dirty work began, and let me tell you this: it was NOT easy!!!

After struggling for 4 days, I decided to ask the guys at #linux (on DALnet), and one of the OPs helped and gave me the following line:
pax -rw -s ',:,-,g' src dst
This magnificent line recursively copies all files & directories from the source directory, src, to the target directory, dst, replacing every colon with a hyphen along the way. The -s option takes an ed-style substitution; here the comma is used as the delimiter, so ',:,-,g' means "replace ':' with '-', globally".
There was another idea: create symbolic links to the original files, rename the links, and copy the links to the target disk with the option to follow symlinks. I'm not positive that would've worked as intended, as pax did its job wonderfully. If you want to poke around with this idea, take a look at cpio.
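For the curious, here's roughly what my little PHP script should have looked like to handle the job recursively. This is just a minimal sketch, assuming the source is already mounted and readable (the paths in the usage line are examples) -- pax is what actually did the work:

<?php
// mass_cpr.php -- sketch: copy a tree, replacing ':' with '-' in every name.
// Usage: php mass_cpr.php /mnt/hfs/src /mnt/fat/dst   (paths are examples)
function mass_cpr($src, $dst)
{
    if (!is_dir($dst) && !mkdir($dst, 0777, true)) {
        die("Can't create $dst\n");
    }
    foreach (scandir($src) as $entry) {
        if ($entry == '.' || $entry == '..') {
            continue;
        }
        $from = $src . '/' . $entry;
        $to   = $dst . '/' . str_replace(':', '-', $entry); // rename on the fly
        if (is_dir($from)) {
            mass_cpr($from, $to);                           // recurse into sub-directories
        } else {
            copy($from, $to) or print("Failed to copy $from\n");
        }
    }
}

if ($argc != 3) {
    die("Usage: php {$argv[0]} <src> <dst>\n");
}
mass_cpr($argv[1], $argv[2]);
?>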

Friday, January 4, 2008

Google Docs: Interactive Transient Docs, Please!

Updated!

As I'm going to write more technical posts, I'll be sharing designs, views, etc. with you, and I thought of using Google Docs, since I can write from anywhere and share those documents & spreadsheets.

Yesterday, I emailed my manager a link to the design I made for a fileserver, which was first made using OpenOffice, then uploaded to Google Docs. The sheet had formulas that recalculate when you change the numbers, but then I realized that my manager can't change the numbers, since it's a view-only/public document.

Looking further into all the options available, nothing serves what I need: interactive public sharing, where each visitor gets a transient copy of the document and can make changes, but those changes are never saved.

Yes, I realize that I can add you to my list to be able to do that, but that would violate your privacy, which I respect. There's no need for me to see who's viewing my documents, and whoever is viewing should be able to use those documents to their full extent.

I already submitted a feature request at Google Docs, hoping they'd add that feature soon.

Update: I found a free online document system from a company called Zoho, which offers interactive spreadsheets, where visitors can change the values of cells and watch the formulas do their magic! My sheets have found a home that appreciates them :')

Tuesday, January 1, 2008

Backup Storage Space Estimation

 

a work in progress



Some special characters will be used to give certain meanings:
[ ]: Whatever is between the square brackets is optional.
< >: Whatever is between the angle brackets is mandatory.

Table of Contents:
1. Possible Methods of Backup
1.1. Scenarios of Foiling Backups
1.2. Why The Method Matters
2. Estimation Approach
3. Weapons of Mass Dysfunction
3.1. Guns Loaded
3.2. Marching
3.3. Kung-Fu Style


1. Possible Methods of Backup

I was asked to look into fileserver options, either to hold backups of employees' data or to hold the data itself.

There are 3 possible schemes that I could think of:
  1. Allocate a shared folder/directory on the fileserver that only the employee has access to, and force all data to be saved there, not on the local disk.

  2. Employees use the local disk for storage, but a copy is kept on a [shared] directory on the fileserver.

  3. Employees use the local disk for storage, but a copy is kept on a [shared] directory on the fileserver. File versioning is enabled here, so the latest copy of each file, as well as previous copies, are kept.

1.1. Scenarios of Foiling Backups

The most reliable method is the 3rd, because if an employee decided to take revenge on the company, s/he would either delete the files on the shared directory, or delete the information inside the existing files, leaving blank files that will be automatically backed up at a later time. These 2 scenarios defeat methods 1 & 2, respectively.
Method 3 is foiled when an employee saves the same file, without content (as an example), more times than the maximum number of versions kept on the fileserver. The result: a set of versions of the file, none of which have any content.

1.2. Why The Method Matters

The method should be chosen depending on the importance of the data. A combination of methods can be used, depending on the employee's rank and the priority of the data being held. The chosen method will affect the estimation of the amount of storage required for the backups.

2. Estimation Approach

My idea is to collect some data from the workstations, then calculate averages and perform sums, to get the most accurate estimation possible.
The data to be collected is a list of each file's path, size, date of creation, last modification & last access. Only the files residing in the user's directory, under "Documents and Settings", will be checked.
As you can see, I already assumed that the machines are running Windows, and I'm going to further assume it's Windows XP on an NTFS filesystem. The latter assumption is important, because the tools required to get the data you need could be specific to a filesystem type.

After the data is collected, it's separated by department; the average file size is calculated for each file extension, and correlations between file creation and modification dates are drawn, to estimate the number & average size of new files created & modified per period, and hence the amount of data growth per period. The files' last-access times can be used to estimate the hit rate on the server: one might need multiple network cards to serve a big crowd, and that's a must if the fileserver is to hold data for other servers as well (i.e., not just backups).
In case file versioning is used, keep in mind that a file modification could result in the whole file being copied as a new version; this depends on how the versioning utility works: some copy just the difference, others copy the whole file.
The period is a variable, and can be changed to produce different sets of data. A daily check could be beneficial to large corporations, while weekly or monthly checks seem more suitable for small to medium companies.
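As a made-up illustration of the math: if one department's users create, on average, 300 new files per week at an average of 1.5 MB each, and modify another 200 files averaging 2 MB (with the versioning tool keeping full copies), that's roughly 300 x 1.5 + 200 x 2 = 850 MB of growth per week, or around 3.5 GB a month, for that department alone.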

The file paths can be used to calculate the number of sub-directories and the number of files per directory. This can be handy when choosing the hardware for the fileserver, because a directory that holds many sub-directories with a lot of files inside requires a lot of RAM & CPU horsepower; however, this mostly matters when someone wants to restore his/her data, so the frequency & times of restores are the deciding factor.
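To make the idea more concrete, here's a minimal PHP sketch of the kind of analysis I have in mind. It assumes the collected listing has already been dumped into a simple tab-separated file (path, size, created, modified, accessed) -- that layout is just an assumption for illustration, not necessarily what the listing tool actually produces:

<?php
// analyze.php -- sketch: average file size per extension & files per directory.
// Input is assumed to be tab-separated: path<TAB>size<TAB>created<TAB>modified<TAB>accessed
// (an illustrative layout -- adjust to whatever the real listing tool produces).
$sizes = array();   // extension => array(total bytes, file count)
$dirs  = array();   // directory => number of files inside it

$fh = fopen($argv[1], 'r') or die("Can't open {$argv[1]}\n");
while (($line = fgets($fh)) !== false) {
    $fields = explode("\t", rtrim($line, "\r\n"));
    if (count($fields) < 2) {
        continue;                                   // skip malformed lines
    }
    $path = $fields[0];
    $size = (float)$fields[1];

    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($ext == '') {
        $ext = '(none)';
    }
    if (!isset($sizes[$ext])) {
        $sizes[$ext] = array(0, 0);
    }
    $sizes[$ext][0] += $size;
    $sizes[$ext][1]++;

    $dir = dirname($path);
    $dirs[$dir] = isset($dirs[$dir]) ? $dirs[$dir] + 1 : 1;
}
fclose($fh);

foreach ($sizes as $ext => $t) {
    printf("%-10s %8d files, average %10.1f KB\n", $ext, $t[1], $t[0] / $t[1] / 1024);
}
if ($dirs) {
    printf("Directories: %d, the busiest holds %d files\n", count($dirs), max($dirs));
}
?>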

3. Weapons of Mass Dysfunction

Now that the idea is clear, it's time to get some tools & start hacking away. What I need is something to get me all the information I need, quickly, without clutter, without damage, and preferably on the command line!
Quickly, in the sense that all the information is gathered in one shot, as opposed to traversing the target's directories over & over for each piece of info I'm after. Without clutter means getting the info in a tabular way, to minimize any need to filter and re-organize fields of information. As for without damage: it wouldn't be funny if I caused a faulty filesystem by running a tool, would it? Yeah, I guess not. And as for the command line, I just love command line interface (CLI) tools! They give the user so much control over the way the application is used; they can be scheduled, run from scripts, batch files, or other applications, and direct their output to text files, all customized by your set of parameters & options.

3.1. Guns Loaded

Here's a list of what I'm using for this project:
  • PHP: To write scripts to manipulate text, and a few more tricks.

  • FileList: Traverses the directories & gather the required info.

  • Bambalam PHP Compiler: Compile PHP scripts into standalone EXE files.

  • Curl: To send files to an HTTP/FTP server.

  • Apache HTTP Server: To put a PHP script to receive the files.

  • 7-zip: Compression tool.

PHP helps me filter unwanted text and use what I want as input to other applications, since I can call them from within my PHP script. FileList is a free tool that traverses directories and gathers the information I need; it supports patterns and, most importantly, is the only tool I found that can get the creation time of a file! And it's amazingly fast. Since I'll be executing applications from within PHP, and passing parameters and passwords, I decided to compile my script to an EXE so that no one can look at the source; a great side benefit of doing this is that I no longer need to put PHP on employees' machines! Apache HTTP Server is a famous web server, and since I already have a machine at work set up with HTTPD running, I'll use it to receive my files. There is no need to download Curl separately, as it's part of PHP and is used as an extension.

* Note: All the programs above are free, and all except FileList are open source.
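Since the receiving end is just a PHP script sitting behind HTTPD, here's a minimal sketch of what it could look like. The form-field name "payload" and the destination directory are made up for illustration; the real script simply has to match whatever the sender uses:

<?php
// receive.php -- sketch of the upload receiver on the HTTPD side.
// "payload" and the destination directory are illustrative choices.
$dest = '/srv/backup-stats';                    // where incoming archives pile up

if (!isset($_FILES['payload'])) {
    header('HTTP/1.0 400 Bad Request');
    die("No file received\n");
}

// keep only the base name, so a malicious client can't write outside $dest
$name = basename($_FILES['payload']['name']);

if (move_uploaded_file($_FILES['payload']['tmp_name'], "$dest/$name")) {
    echo "OK\n";
} else {
    header('HTTP/1.0 500 Internal Server Error');
    echo "Failed to store $name\n";
}
?>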

3.2. Marching

The plan is to execute as follows:
  1. Run "run.exe" with proper parameters
  2. "run.exe" fetches the machine's globally unique MAC Address
  3. "run.exe" feeds the MAC to "FileList.exe" to use it as an output file name
  4. "FileList.exe" traverses the target user directory & its sub-dirs, and gathers info
  5. "FileList.exe" exits and "run.exe" calls "7z.exe" to compress the output with a password
  6. "run.exe" deletes "FileList.exe"'s output file
  7. "run.exe" submits the compressed file to the web server
  8. PHP script on HTTPD receives the file and saves it on the server
  9. "run.exe" deletes all directory files, when server receives the file, then it exits
  10. The Puppet Master (me) gathers all files from the web server
  11. Files are extracted & sorted by departments
  12. One or more PHP scripts do the proper analysis on the files
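
And here's a rough PHP sketch of what "run.exe" boils down to, before being compiled with Bambalam. The FileList.exe switches, the server URL, the password, and the paths are all placeholders I made up for illustration -- the real thing follows whatever FileList's documentation says:

<?php
// run.php -- sketch of the collector, to be compiled into run.exe with Bambalam.
// FileList.exe's switches, the URL, password and paths below are placeholders.
$server   = 'http://statsbox/receive.php';
$password = 'CHANGE-ME';
$target   = 'C:\\Documents and Settings\\' . getenv('USERNAME');

// 1) Fetch the machine's MAC address (crude parse of getmac's CSV output).
$mac = 'unknown';
exec('getmac /fo csv /nh', $out);
if (isset($out[0]) && preg_match('/([0-9A-F-]{17})/i', $out[0], $m)) {
    $mac = str_replace('-', '', $m[1]);
}

$list    = $mac . '.txt';
$archive = $mac . '.7z';

// 2) Let FileList walk the user's directory and dump its listing.
exec("FileList.exe \"$target\" > $list");

// 3) Compress the listing with a password, then remove the plain-text listing.
exec("7z.exe a -p$password $archive $list");
unlink($list);

// 4) Submit the archive to the web server through PHP's curl extension.
$ch = curl_init($server);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('payload' => '@' . $archive));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$reply = curl_exec($ch);
curl_close($ch);

// 5) Clean up only if the server said it stored the file.
if (trim($reply) === 'OK') {
    unlink($archive);
}
?>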

3.3. Kung-Fu Style

The time has come, my child, to start the brain whipping and code something useful!

code goes here


To be continued