Tech knowledge dump

Week 04/24/2016

KeyStore vs TrustStore

  • keystore: holds the private key and certificate you use to prove your own identity
  • truststore: holds the certificates that determine whom you trust

Your own identity also has a chain of trust up to a root. That chain is separate from the chains to the roots that decide whom you trust.

Reference: Stackoverflow: unable-to-find-valid-certification-path-to-requested-target-error

Print/List/Import certs

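# print the details of a certificate file (-printcert reads a certificate, not a keystore)
keytool -printcert -v -file /etc/ssl/certs/connectifier.crt

# list all entries in a keystore (-list takes -keystore, not -file)
keytool -list -keystore "/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/security/cacerts" -storepass changeit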

keytool -importcert -trustcacerts -keystore "/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/security/cacerts" -storepass changeit -alias connectifier-dev -file /etc/ssl/certs/connectifier.crt

# find a specific cert by alias name
keytool -list -keystore "/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/security/cacerts" -storepass changeit -alias "digicertglobalrootg2 [jdk]"

# delete a specific cert by alias name
keytool -delete -keystore "/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/lib/security/cacerts" -storepass changeit -alias "digicertglobalrootg2 [jdk]"

sudo $JAVA_HOME/bin/keytool -importcert -trustcacerts -keystore /etc/riddler/cacerts -storepass changeit -alias connectifier-dev -file /etc/ssl/certs/connectifier.crt

For a Java application, cacerts is usually stored in the JDK, e.g. $JAVA_HOME/jre/lib/security/cacerts. The JVM loads it when the application starts, so a running application does not pick up newly imported certificates. For a containerized application, restart the container for the change to take effect.

Test truststore after importing

Open an SSL connection to a specified host and port using a specified truststore:

java -Djavax.net.debug=all -Djavax.net.ssl.trustStore=/etc/riddler/cacerts SSLPoke local.connectifier.com 443

SSLPoke is a Java class file which needs to be downloaded first.

https://confluence.atlassian.com/kb/unable-to-connect-to-ssl-services-due-to-pkix-path-building-failed-779355358.html

https://stackoverflow.com/questions/9210514/unable-to-find-valid-certification-path-to-requested-target-error-even-after-c


Forward proxy vs Reverse proxy

Differences

proxy_reverse_proxy1


proxy_reverse_proxy2

Use Case

Both:

  • single point of access and control
  • firewall (whitelist, blacklist)
  • hide identity
  • request header edit/rewrite
  • cache (static content)

Forward proxy

  • monitoring and filtering
  • content-control
  • bypassing filters and censorship
  • logging and eavesdropping
  • translation
  • accessing services anonymously (e.g. TOR or FreeNet)

Reverse proxy

  • load balancing
  • traffic routing
  • A/B testing and multivariate testing
  • SSL encryption
  • optimizing speed by compressing content

Forward-Proxy-vs-Reverse-Proxy

Stackoverflow: difference-between-proxy-server-and-reverse-proxy-server


Covariance vs Correlation

Covariance

Covariance is a measure of how much two random variables vary together. It’s similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.

\[ Cov(X, Y) = { \sum_{i = 1}^n{(x_i - \bar{X}) (y_i - \bar{Y})} \over (n - 1) } \]

A large covariance can mean a strong relationship between variables. However, you can’t compare covariances across data sets with different scales (like pounds and inches). A value that counts as a weak covariance in one data set may be a strong one in another data set with a different scale.

statisticshowto/covariance

The problem with covariances is that they are hard to compare. The solution to this is to 'normalize' the covariance: you divide the covariance by something that represents the diversity and scale in both the covariates, and end up with a value that is assured to be between -1 and 1: the correlation.

correlation-and-covariance

Correlation Coefficient

Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient. Pearson’s correlation (also called Pearson’s r) is the one commonly used in linear regression.

Pearson correlation

\[ r = { \sum_{i = 1}^n{(x_i - \bar{X}) (y_i - \bar{Y})} \over \sqrt{\sum_{i = 1}^n{(x_i - \bar{X})^2}} \sqrt{\sum_{i = 1}^n{(y_i - \bar{Y})^2}} } \]

Alternative formula:

\[ r = { \sum_{i = 1}^n{x_i y_i} - n \bar{X} \bar{Y} \over (n - 1) S_x S_y } \]

where \(S_x\) and \(S_y\) are the sample standard deviations of x and y.

Wiki/Correlation coefficient

statisticshowto/Correlation coefficient

Advantages of the Correlation Coefficient

The Correlation Coefficient has several advantages over covariance for determining strengths of relationships:

  • Covariance can take on practically any value, while a correlation is limited to the range -1 to +1.
  • Because of its numerical limits, correlation is more useful for determining how strong the relationship is between the two variables.
  • Correlation does not have units. Covariance always has units.
  • Correlation isn’t affected by changes in the center (i.e. mean) or scale of the variables.
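
The scale argument above can be checked numerically. The following is a minimal Python sketch, assuming a small made-up sample of heights and weights (the numbers and function names are illustrative only). It computes the sample covariance and Pearson's r from the formulas above, then repeats the calculation after converting inches to centimetres.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance with the (n - 1) denominator."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by both sample standard deviations."""
    s_x = math.sqrt(covariance(xs, xs))
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)


# Made-up sample: heights in inches and weights in pounds.
heights_in = [60.0, 62.0, 65.0, 70.0, 72.0]
weights_lb = [110.0, 120.0, 140.0, 165.0, 180.0]

# The same heights expressed in centimetres (a pure change of scale).
heights_cm = [h * 2.54 for h in heights_in]

cov_in = covariance(heights_in, weights_lb)
cov_cm = covariance(heights_cm, weights_lb)
r_in = pearson_r(heights_in, weights_lb)
r_cm = pearson_r(heights_cm, weights_lb)

print(f"covariance (in, lb): {cov_in:.3f}")
print(f"covariance (cm, lb): {cov_cm:.3f}")
print(f"correlation (in, lb): {r_in:.6f}")
print(f"correlation (cm, lb): {r_cm:.6f}")
```

The covariance grows by the conversion factor 2.54 while the correlation stays the same, which is why only the correlation can be compared across data sets.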

Week 01/23/2016

CIDR

What is CIDR?

Classless Inter-Domain Routing Definition

CIDR notation is a compact representation of an IP address and its associated routing prefix. The number is the count of leading 1 bits in the routing mask, traditionally called the network mask.

E.g. 192.168.100.14/24 represents the IPv4 address 192.168.100.14 and its associated routing prefix 192.168.100.0, or equivalently, its subnet mask 255.255.255.0, which has 24 leading 1-bits.

The address may denote a single, distinct interface address or the beginning address of an entire network.

E.g. The IPv4 block 192.168.100.0/22 represents the 1024 IPv4 addresses from 192.168.100.0 to 192.168.103.255.
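
These two examples can be reproduced with Python's standard ipaddress module. This is only a quick sketch for checking the arithmetic. The address 192.168.102.7 used in the membership test is an arbitrary pick, not something from the notes above.

```python
import ipaddress

# An interface address with its prefix length, as in 192.168.100.14/24.
iface = ipaddress.ip_interface("192.168.100.14/24")
print(iface.ip)                 # the host address
print(iface.network)            # the routing prefix it belongs to
print(iface.network.netmask)    # the equivalent dotted-decimal mask

# A whole block, as in 192.168.100.0/22.
block = ipaddress.ip_network("192.168.100.0/22")
hosts = list(block.hosts())     # usable hosts (network and broadcast addresses excluded)
print(block.netmask, block.num_addresses)
print(hosts[0], hosts[-1], block.broadcast_address)

# Membership test of the kind an IP whitelist would use.
print(ipaddress.ip_address("192.168.102.7") in block)
```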

Breaking down 192.168.100.0/22

Before the implementation of CIDR, 192.168.100.0/24 was often written as 192.168.100.0/255.255.255.0.

            Decimal
Address     192.168.100.0
Netmask     255.255.252.0 = 22
HostMin     192.168.100.1
HostMax     192.168.103.254
Broadcast   192.168.103.255

History

As the initial TCP/IP network grew to become the Internet during the 1980s, the need for more flexible addressing schemes became increasingly apparent. This led to the successive development of subnetting and CIDR. The network class distinctions were removed, and the new system was described as being classless, with respect to the old system, which became known as classful.

More to read:

Advantages

  • CIDR provides fine-grained routing prefix aggregation, which makes address allocation more flexible
  • Compact representation: blocks of addresses can be grouped into single routing table entries

How did I run into this?

CIDR is used in Connectifier's AdminPathFilter to represent the IP whitelist. It comes from a standard Java network library, with two common implementations: CIDR4 and CIDR6.


Mailgun testing script

Sending a plain text message:

Template:
curl -s --user 'api:YOUR_API_KEY' \
https://api.mailgun.net/v3/YOUR_DOMAIN_NAME/messages \
-F from='Excited User <mailgun@YOUR_DOMAIN_NAME>' \
-F to=YOU@YOUR_DOMAIN_NAME \
-F to=bar@example.com \
-F subject='Hello' \
-F text='Testing some Mailgun awesomness!'

Sample response:
{
"message": "Queued. Thank you.",
"id": "<20111114174239.25659.5817@samples.mailgun.org>"
}
Testing script:
curl -v -s --user 'api:key-4orjxsuw70ngigehr6o28q6qoz58jg75' \
https://api.mailgun.net/v3/connectifier.com/messages \
-F from='Excited User <mailgun@connectifier.com>' \
-F to=cyz19892002@gmail.com \
-F subject='Hello' \
-F text='Testing some Mailgun awesomness!'

Week 01/09/2016

Use Curl post with basic auth

--basic selects HTTP Basic authentication; -u supplies the username and password.

curl --basic -u asia@connectifier.com:asdf1234 -H "Content-Type: application/json" -X POST --data-raw '{"liSalesforceAccountId":"123456789012345","productType":"CONNECTIFIER_CORP_SEAT","members":[{"email":"test10@test4.com","memberId":78100001,"manager":false},{"email":"test20@test4.com","memberId":78100002,"manager":false}],"freeTrial":false}' https://local.connectifier.com/admin/provision
  • -H specifies a request header; you must send Content-Type: application/json when posting JSON
  • -X POST selects the POST method
  • --data-raw works like --data but without the special interpretation of the @ character
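
What --basic and -u put on the wire can be sketched in a few lines of Python. With Basic authentication the client sends an Authorization header whose value is the word Basic followed by the Base64 encoding of username:password. The function name and the credentials below are placeholders (the pair is the example from RFC 7617), not anything Connectifier-specific.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Placeholder credentials taken from the example in RFC 7617.
header = basic_auth_header("Aladdin", "open sesame")
print(header)
```

Base64 is an encoding, not encryption, so Basic authentication should only be used over HTTPS.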

Use Curl to bypass Connectifier HttpsFilter

Add the header -H "X-Forwarded-Proto:https"

curl --basic -u asia@connectifier.com:asdf1234 -H "Content-Type: application/json" -H "X-Forwarded-Proto:https" -X POST --data-raw '{ "liSalesforceAccountId" : "123456789012345", "productType" : "CONNECTIFIER_CORP_SEAT", "members" : [ { "email" : "yuliinfokiller@test.linkedin.com", "memberId" : 13542747, "isManager" : true } ] }' http://lca1-app0415.stg.linkedin.com:1584/in/provision

Week 12/19/2016

Debugging unit tests in Connectifier (Bitbucket) doesn't work

tag:play,sbt,debug,test

In the test task, each unit test is forked into a separate JVM by default, so a debugger attached to the sbt process never hits breakpoints in the test code. Setting fork := false in frontend/build.sbt solves the problem.

Note that you also need to start sbt with -jvm-debug 5005 or -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005.

More to read, scala-sbt.org

Drop Mongo Database in a batch script [MongoDB, Javascript]

Drop all databases whose name starts with "test"

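// Run in the mongo shell: drop every database whose name starts with "test"
var dbs = db.getMongo().getDBNames();
for (var i in dbs) {
    db = db.getMongo().getDB(dbs[i]);
    if (db.getName().match("^test.*")) {
        print("dropping db " + db.getName());
        db.dropDatabase();
    }
}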

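Alternative one-liner with the same name filter (without the filter it would drop every database):

mongo --quiet --eval 'db.getMongo().getDBNames().forEach(function(name){ if (/^test/.test(name)) { db.getSiblingDB(name).dropDatabase(); } })'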

Week 08/07/2017

Difference between git rm --cached and git update-index --assume-unchanged?

git rm --cached <file>

is used to untrack files in a Git branch. This command removes the file from the index (staging area) while leaving it in the working tree, and the file is removed from the repository the next time you commit.

git update-index --assume-unchanged <file>

also keeps the file's modifications out of the staging area, but it works differently: it tells Git to temporarily ignore any changes made to the file. The file stays tracked, so when you commit it remains part of the repository (assuming it was already there) with its last recorded content. When you want Git to see the changes made to the file again, you can run this:

git update-index --no-assume-unchanged <file>

After this, Git notices modifications to the file again, so they show up in git status and can be staged.

Here is a link for git rm --cached, and here is a link for git update-index --assume-unchanged.


Week 09/11/2017

HTTPS ValidatorException Debugging

Exception: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors

A useful tool: SSLPoke

java SSLPoke local.connectifier.com 443
  • To enable debugging messages: -Djavax.net.debug=all
  • To use a specific version of Java: run /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/bin/java instead of java (on macOS)
  • To specify a specific trust store: -Djavax.net.ssl.trustStore=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/security/cacerts

Week 10/24/2017

Servlet, servlet container and Jetty Server

Servlet

What is servlet?

A servlet is simply a class which responds to a particular type of network request - most commonly an HTTP request. It declares three essential methods for the life cycle of a servlet - init(), service(), and destroy().

In 1997, Sun announced the servlet interface.

Servlets were aimed at replacing CGI. Unlike CGI, which starts a new process for each request, servlets run inside a single process and handle each request on a lightweight thread.

JSP

Servlets require real Java programming skills. So, what can the graphics people (frontend engineers) do? They can thank Sun for JavaServer Pages (JSP), which was released in 1998. Inspired by (some say copied from) the immensely successful Microsoft ASP, JSP made it easy to write dynamic HTML pages.

jsp-life.png

References:

JSP and Servlet Overview

Servlet container

A web container (also known as a servlet container) is the component of a web server that is responsible for managing the lifecycle of servlets, mapping a URL to a particular servlet, and ensuring that the URL requester has the correct access rights.

A web container implements the web component contract of the Java EE architecture, specifying a runtime environment for web components that includes security, concurrency, lifecycle management, transaction, deployment, and other services.
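
The split of responsibilities can be illustrated with a toy model. The sketch below is a Python analogy, not the Java Servlet API. The class names ToyContainer and HelloServlet are made up, and access-rights checks, concurrency and deployment are left out. It only shows the container mapping a URL to a servlet and driving init(), service() and destroy().

```python
class Servlet:
    """Base class mirroring the three lifecycle methods of a servlet."""

    def init(self):
        pass

    def service(self, request):
        raise NotImplementedError

    def destroy(self):
        pass


class HelloServlet(Servlet):
    def init(self):
        self.greeting = "Hello"

    def service(self, request):
        return f"{self.greeting}, {request.get('name', 'world')}!"


class ToyContainer:
    """Maps URL paths to servlets and drives their lifecycle."""

    def __init__(self):
        self._routes = {}       # path -> servlet class
        self._instances = {}    # path -> initialised servlet instance

    def register(self, path, servlet_class):
        self._routes[path] = servlet_class

    def handle(self, path, request):
        if path not in self._routes:
            return "404 Not Found"
        if path not in self._instances:
            # First request for this path: create the servlet and call init() once.
            servlet = self._routes[path]()
            servlet.init()
            self._instances[path] = servlet
        return self._instances[path].service(request)

    def shutdown(self):
        # Call destroy() on every servlet that was initialised.
        for servlet in self._instances.values():
            servlet.destroy()
        self._instances.clear()


container = ToyContainer()
container.register("/hello", HelloServlet)
print(container.handle("/hello", {"name": "Jetty"}))
print(container.handle("/missing", {}))
container.shutdown()
```

The servlet itself never decides when it is created or destroyed. That is the container's job, which is the point of the lifecycle contract.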

servlet-container-life-cycle.jpg

Common servlet containers

  • Apache Tomcat
  • Glassfish
  • Jetty
  • JBoss

Jetty vs Tomcat

Jetty:

  • Full-featured and standards-based.
  • Embeddable and Asynchronous.
  • Open source and commercially usable.
  • Dual licensed under Apache and Eclipse.
  • Flexible and extensible, Enterprise scalable.
  • Strong support for tools, applications, devices and cloud computing.
  • Low maintenance cost.
  • Small and Efficient.

Tomcat:

  • Well-known open source project under Apache.
  • Easier to embed Tomcat in your applications, e.g. in JBoss.
  • Implements the Servlet 3.0, JSP 2.2 and JSP-EL 2.2 specifications.
  • Robust and widely used commercially.
  • Easily integrated with other frameworks such as Spring.
  • Flexible and extensible, Enterprise scalable.
  • Faster JSP parsing.
  • Stable.

Jetty vs Netty

Jetty is a web server (HTTP), similar to the likes of Tomcat and such, but lighter than most servlet containers. This is closer to the traditional Java way of doing server applications (servlets, WAR files). Like Netty it is sufficiently lightweight to be embedded into Java applications.

Netty is an NIO client-server framework which enables quick and easy development of network applications such as protocol servers and clients. It greatly simplifies and streamlines network programming such as writing TCP and UDP socket servers. In short, Netty focuses on helping you write non-blocking, asynchronous network programs.

If you deal a lot with network protocols and want it to be non-blocking use Netty (usually for high-performance cases). If you want a light HTTP servlet container use Jetty.

References:

whats-the-difference-between-jetty-and-netty


Week 10/30/2017

What does the curl --insecure option mean?

curl provides an option -k/--insecure which disables certificate validation. It will skip the certificate-chain validation, but the transferred data is still encrypted.
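
The same distinction exists in most TLS client libraries. As an illustration only, here is a Python sketch using the standard ssl module. It builds a default client context and a second one configured roughly the way curl -k behaves: the connection is still encrypted, but neither the certificate chain nor the host name is checked.

```python
import ssl

# Default client context: verifies the certificate chain and the host name.
strict = ssl.create_default_context()

# Rough equivalent of curl -k: still TLS, but no chain or host-name validation.
insecure = ssl.create_default_context()
insecure.check_hostname = False      # must be disabled before verify_mode is relaxed
insecure.verify_mode = ssl.CERT_NONE

print(strict.verify_mode, strict.check_hostname)
print(insecure.verify_mode, insecure.check_hostname)
```

Without validation the client cannot tell the real server from an impostor, so this setting is only appropriate for debugging against hosts you control.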

https://stackoverflow.com/questions/8520147/curl-insecure-option


Week 11/27/2017

What is middleware?

Middleware is software which lies between an operating system and the applications running on it. Essentially functioning as a hidden translation layer, middleware enables communication and data management for distributed applications.

Middleware is a terribly nebulous term. What is "middleware" in one case won't be in another.

In general, you can expect something classed as middleware to have the following characteristics:

  • Primarily (usually exclusively) software; usually doesn't need any specialized hardware.
  • If it weren't there, applications that depend on it would have to incorporate it as part of their application and would experience a lot of duplication.
  • Almost certainly connects two applications and passes data between them.

You'll notice that this is pretty much the same definition as an operating system. So, for instance, a TCP/IP stack or caching could be considered middleware. But your OS could provide the same features, too. Indeed, middleware can be thought of like a special extension to an operating system, specific to a set of applications that depend on it. It just provides a higher-level service.

Some examples of middleware:

  • distributed cache
  • message queue
  • transaction monitor
  • packet rewriter
  • automated backup system

An example quoted from Stack Overflow:

Let's say your company makes 4 different products, and your client has another 3 different products from another 3 different companies.

Someday the client thought, why don't we integrate all our systems into one huge system. Ten minutes later their IT department said that will take 2 years.

You (the wise developer) said, why don't we just integrate all the different systems and make them work together in a homogeneous environment? The client manager staring at you... You continued, we will use a Middleware, we will study the Inputs/Outputs of all different systems, the resources they use and then choose an appropriate Middleware framework.

Still explaining to the non-tech manager: with a Middleware framework in the middle, the first system will produce X stuff, systems Y and Z will consume those outputs, and so on.

https://stackoverflow.com/questions/2904854/what-is-middleware-exactly/2904938#2904938

https://azure.microsoft.com/en-in/overview/what-is-middleware/

Differences between Prototype and MVP

mvp-vs-prototype.jpg


Existing LinkedIn

Intellij SBT refreshing error

error log: Settings logger used after project was loaded.

It's possibly related to this file: LogManager.scala

It usually happens after you have done something in the play console and then refresh sbt in IntelliJ.

Running play reload sometimes solves the problem.

deploy config manually

lid-client deploy connectifier -f ei-lca1 --config lid-client control stop -f ei-lca1 -a in-mongodb

rain commands

rain slice show 6018e7f9-51b2-4c6b-9739-83beee88930a
rain slice list --application in-mongodb
rain slice show 2cd24555-57a6-4635-b142-729afa562bdd

rain instance list 2cd24555-57a6-4635-b142-729afa562bdd
rain instance create -f ei-lca1 2cd24555-57a6-4635-b142-729afa562bdd
rain instance delete 2cd24555-57a6-4635-b142-729afa562bdd lca1-app1222.stg.linkedin.com -f ei-lca1

HTTP

Query parameter vs Body

multipart/form-data vs application/x-www-form-urlencoded

curl:

  • -d, --data: application/x-www-form-urlencoded
  • -F, --form: multipart/form-data
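
To make the difference concrete, the Python sketch below encodes the same two form fields both ways. The field values are made up, and encode_multipart is a hypothetical helper that handles text fields only (real multipart bodies can also carry files, which is the main reason to use that format).

```python
import uuid
from urllib.parse import urlencode

fields = {"subject": "Hello", "text": "Testing & more"}

# application/x-www-form-urlencoded (what curl -d sends): one percent-encoded line.
urlencoded_body = urlencode(fields)


def encode_multipart(form_fields, boundary):
    """Build a multipart/form-data body (what curl -F sends) for text fields only."""
    lines = []
    for name, value in form_fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    return "\r\n".join(lines)


boundary = uuid.uuid4().hex      # any string that does not occur in the field values
multipart_body = encode_multipart(fields, boundary)

print("Content-Type: application/x-www-form-urlencoded")
print(urlencoded_body)
print()
print(f"Content-Type: multipart/form-data; boundary={boundary}")
print(multipart_body)
```

In the urlencoded body the values are escaped and joined with &. In the multipart body each field is its own part, separated by the boundary, and the values are sent as-is.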


Email multipart and MIME

Sending Multi-Part Mime Messages (Sending HTML and Text messages in a single email)

Multi-part mime refers to sending both an HTML and TEXT part of an email message in a single email. When a subscriber’s email client receives a multipart message, it accepts the HTML version if it can render HTML, otherwise it presents the plain text version. Additionally, some recently upgraded clients, such as Outlook 2003, enable users to choose to accept HTML or plain text messages by default.
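
A message of this kind can be built with Python's standard email package. The sketch below uses placeholder addresses and content. It creates a multipart/alternative message with a plain-text part followed by an HTML part and prints the resulting content types.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"       # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Hello"

# Plain-text part first, then the HTML alternative. Parts are listed in
# increasing order of preference, so HTML-capable clients pick the last one.
msg.set_content("Hello in plain text.")
msg.add_alternative("<p>Hello in <b>HTML</b>.</p>", subtype="html")

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type())
```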

Why send both?

It is assumed that by sending both an HTML version and a TEXT version you will reach your maximum audience as well as give those who have advanced email client software the opportunity to choose a desirable format.

Technical Details of MIME

Multipurpose Internet Mail Extensions (MIME) is an Internet standard for the format of e-mail. Virtually all Internet e-mail is transmitted via SMTP in MIME format. Internet e-mail is so closely associated with the SMTP and MIME standards that it is sometimes called SMTP/MIME e-mail.